python - AUC-based Feature Importance using Random Forest
I'm trying to predict a binary variable with both random forests and logistic regression. I have heavily unbalanced classes (approx. 1.5% of y = 1).
The default feature importance techniques in random forests are based on classification accuracy (error rate), which has been shown to be a bad measure for unbalanced classes (see here and here).
The two standard VIMs for feature selection with RF are the Gini VIM and the permutation VIM. Roughly speaking, the Gini VIM of a predictor of interest is the sum, over the forest, of the decreases in Gini impurity generated by that predictor whenever it was selected for splitting, scaled by the number of trees.
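For reference, the Gini VIM is what scikit-learn already exposes as feature_importances_ on a fitted forest. A minimal sketch on a synthetic dataset (the data here is purely illustrative):

    # Impurity-based (Gini) importances via feature_importances_;
    # the dataset is synthetic and only illustrates the API.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=5000, n_features=10,
                               weights=[0.985, 0.015],  # ~1.5% positives, as above
                               random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # Mean decrease in Gini impurity per feature, averaged over the trees
    for i, imp in enumerate(rf.feature_importances_):
        print(f"feature {i}: {imp:.4f}")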
My question: is this kind of method implemented in scikit-learn (as it is in the R package party)? Or is there maybe a workaround?
PS: this question is kind of linked to another one.
scoring is a performance evaluation tool used on the test sample; it does not enter the internal DecisionTreeClassifier algorithm at each split node. What you can specify is the criterion (a kind of internal loss function at each split node), either Gini impurity or information entropy, for the tree algorithm.
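For instance, the criterion is set when the estimator is constructed; a minimal sketch (both values below are the ones scikit-learn accepts for classification trees):

    # The split criterion is fixed per estimator: 'gini' (default) or 'entropy'.
    from sklearn.ensemble import RandomForestClassifier

    rf_gini = RandomForestClassifier(criterion='gini', random_state=0)
    rf_entropy = RandomForestClassifier(criterion='entropy', random_state=0)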
scoring can used in cross-validation context goal tune hyperparameters (like max_depth). in case, can use gridsearchcv tune of hyperparameters using scoring function roc_auc.