python - AUC-based Feature Importance using Random Forest
I'm trying to predict a binary variable with both random forests and logistic regression. I have heavily unbalanced classes (approx. 1.5% of y = 1).
The default feature importance techniques in random forests are based on classification accuracy (error rate), which has been shown to be a bad measure for unbalanced classes (see here and here).
The two standard VIMs for feature selection with RF are the Gini VIM and the permutation VIM. Roughly speaking, the Gini VIM of a predictor of interest is the sum, over the forest, of the decreases in Gini impurity generated by that predictor whenever it was selected for splitting, scaled by the number of trees.
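For reference, the Gini VIM is what scikit-learn already exposes as feature_importances_ on a fitted forest. A minimal sketch on a synthetic dataset (the data here is purely illustrative):

    # Impurity-based (Gini) importances via feature_importances_;
    # the dataset is synthetic and only illustrates the API.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=5000, n_features=10,
                               weights=[0.985, 0.015],  # ~1.5% positives, as above
                               random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # Mean decrease in Gini impurity per feature, averaged over the trees
    for i, imp in enumerate(rf.feature_importances_):
        print(f"feature {i}: {imp:.4f}")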
My question: is this kind of method implemented in scikit-learn (as it is in the R package party)? Or is there maybe a workaround?
PS: this question is kind of linked to another one.
scoring is a performance evaluation tool used on the test sample; it does not enter the internal DecisionTreeClassifier algorithm at each split node. What you can specify is the criterion (a kind of internal loss function at each split node), either Gini impurity or information entropy, for the tree algorithm.
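For instance, the criterion is set when the estimator is constructed; a minimal sketch (both values below are the ones scikit-learn accepts for classification trees):

    # The split criterion is fixed per estimator: 'gini' (default) or 'entropy'.
    from sklearn.ensemble import RandomForestClassifier

    rf_gini = RandomForestClassifier(criterion='gini', random_state=0)
    rf_entropy = RandomForestClassifier(criterion='entropy', random_state=0)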
scoring can used in cross-validation context goal tune hyperparameters (like max_depth). in case, can use gridsearchcv tune of hyperparameters using scoring function roc_auc.