
AUCPR of individual features using Random Forest (Error: unhashable Type)



I have a data set with 19 features (V1 to V19) and one class label (C1). I can easily get the precision-recall value using all variables together with the class label, but I want the AUCPR of each individual feature against the class label. The data is in this form:

V1       V2       V3    V4  V5  V6  V7  V8  V9  V10 V11 V12 V13 V14 V15 V16 V17 V18       V19       C1
4182    4182    4182    1   2   0   0   0   4   1   1   0   5   0   1   1   24  4.4654  28.18955043 1
11396   3798.6  3825    3   1   0   1   0   0   3   3   1   0   1   1   3   5   4.452   11.90765492 0
60416   5034.66 5393.5  12  1   0   0   0   0   12  12  3   6   1   4   12  2   4.4711  35.11543135 0
34580   4940    5254    7   1   4   0   2   0   10  12  8   0   1   1   10  45  4.4689  32.44228433 1
8667    4333.5  4333.5  2   1   0   1   0   0   2   2   1   0   1   0   2   1   4.4659  28.79708384 0
4011    4011    4011    1   1   30  0   0   0   2   2   1   8   1   0   2   1   4.4634  25.75941677 0
691347  5083.43 5300    136 2   0   0   0   9   44  44  12  0   1   12  44  32  4.4693  32.92831106 1
So far I have done this:

from collections import defaultdict
from sklearn.cross_validation import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np
from sklearn.metrics import average_precision_score

mydata = pd.read_csv("TEST_2.csv")
y = mydata["C1"]  # assumes the CSV has a header row and the label column is named "C1"

# Select all but the last column as features
X = mydata.iloc[:, :-1]
names = X.columns.tolist()
# -- Gridsearched parameters
model_rf = RandomForestClassifier(n_estimators=500,
                                 class_weight="auto",
                                 criterion='gini',
                                 bootstrap=True,
                                 max_features=10,
                                 min_samples_split=1,
                                 min_samples_leaf=6,
                                 max_depth=3,
                                 n_jobs=-1)
scores = defaultdict(list)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)
# -- Fit the model (could be cross-validated)



for i in range(X_train.shape[1]):
    X_t = X_test.copy()
    rf = model_rf.fit(X_train[:,i], y_train)
    scores[names[i]] = average_precision_score(y_test, rf.predict(X_t[:,i]))




print("Features sorted by their score:")
print(sorted([(round(np.mean(score), 4), feat) for
              feat, score in scores.items()], reverse=True))
It gives the error "unhashable type".

The output should be something like this:

V1: 0.82
V2: 0.74
:
:
V19: 0.55
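
A likely cause of the error: X_train is a pandas DataFrame, and X_train[:, i] is NumPy-style indexing that plain brackets on a DataFrame do not accept. pandas tries to treat the pair (slice, i) as a column key, which is where an unhashable-type complaint can come from. Below is a minimal sketch of one way around it, selecting each column by name with double brackets so the input stays two-dimensional. Several things here are assumptions and not from the original post: the synthetic `demo` frame standing in for TEST_2.csv, the helper name `per_feature_aucpr`, the smaller forest settings, the use of `class_weight="balanced"` and `sklearn.model_selection` (the names current scikit-learn releases use), and scoring with `predict_proba` instead of `predict`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split


def per_feature_aucpr(X, y, random_state=0):
    """Fit one small forest per column and return {feature: average precision}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=random_state, stratify=y
    )
    scores = {}
    for name in X.columns:
        rf = RandomForestClassifier(
            n_estimators=50,
            max_depth=3,
            min_samples_leaf=6,
            class_weight="balanced",
            random_state=random_state,
        )
        # Double brackets keep a 2-D (n_samples, 1) frame, which fit() expects.
        rf.fit(X_train[[name]], y_train)
        # Positive-class probabilities give a ranking for the PR curve.
        proba = rf.predict_proba(X_test[[name]])[:, 1]
        scores[name] = average_precision_score(y_test, proba)
    return scores


# Synthetic stand-in for TEST_2.csv (hypothetical data, not from the post).
rng = np.random.RandomState(0)
demo = pd.DataFrame(rng.rand(200, 3), columns=["V1", "V2", "V3"])
demo["C1"] = (demo["V1"] + 0.3 * rng.rand(200) > 0.6).astype(int)

scores = per_feature_aucpr(demo.drop(columns="C1"), demo["C1"])
for feat, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feat}: {score:.2f}")
```

With the real file, the call would presumably be per_feature_aucpr(mydata.iloc[:, :-1], mydata["C1"]). Each score measures a variable in isolation, so the values are not expected to add up to the AUCPR of the model trained on all 19 features.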