python - AUC-based Feature Importance using Random Forest
I'm trying to predict a binary variable with both random forests and logistic regression. I've got heavily unbalanced classes (approximately 1.5% of y = 1).

The default feature importance techniques in random forests are based on classification accuracy (error rate), which has been shown to be a bad measure for unbalanced classes (see here and here).
The two standard VIMs for feature selection with RF are the Gini VIM and the permutation VIM. Roughly speaking, the Gini VIM of a predictor of interest is the sum, over the forest, of the decreases in Gini impurity generated by that predictor whenever it was selected for splitting, scaled by the number of trees.
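For context, scikit-learn already exposes this Gini VIM as the feature_importances_ attribute (the normalized mean decrease in impurity). A minimal sketch on synthetic data, with the ~1.5% class balance mimicked purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with roughly 1.5% positives, for illustration only
X, y = make_classification(n_samples=10000, n_features=20,
                           weights=[0.985], random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ is the (normalized) mean decrease in Gini impurity
print(rf.feature_importances_)
```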
My question: is this kind of method (an AUC-based VIM) implemented in scikit-learn (like in the R package party)? Or is there maybe a workaround?

PS: this question is kind of linked to another one.
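One possible workaround, for what it's worth: newer scikit-learn versions ship sklearn.inspection.permutation_importance, which accepts any scorer, including ROC AUC. A minimal sketch (synthetic data and names are illustrative, not the asker's setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 1.5% positives, for illustration only
X, y = make_classification(n_samples=10000, n_features=20,
                           weights=[0.985], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

# AUC-based permutation VIM: mean drop in ROC AUC on held-out data
# when each feature's values are shuffled
result = permutation_importance(rf, X_test, y_test, scoring='roc_auc',
                                n_repeats=10, random_state=0)
print(result.importances_mean)
```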
scoring is a performance evaluation tool used on a test sample; it does not enter the internal DecisionTreeClassifier algorithm at each split node. You can specify criterion (a kind of internal loss function at each split node) as either Gini impurity or information entropy for the tree algorithm.
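A minimal sketch of setting that criterion, for both a single tree and a forest:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# criterion is the split-node loss inside the tree-growing algorithm:
# 'gini' (the default) or 'entropy'
tree = DecisionTreeClassifier(criterion='entropy')
rf = RandomForestClassifier(criterion='entropy')
```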
scoring can be used in a cross-validation context where the goal is to tune some hyperparameters (like max_depth). In that case, you can use GridSearchCV to tune the hyperparameters using a scoring function such as roc_auc.
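A minimal sketch of that tuning setup, again on illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.985], random_state=0)

# Cross-validated grid search: candidate models are compared by ROC AUC,
# while each tree still splits on its internal criterion
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={'max_depth': [3, 5, 10, None]},
                    scoring='roc_auc', cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```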