Use Random Forest Feature Importance for Feature Selection

Saving on computation is a priority for me, as I am practicing data science on an old machine and don't currently have access to cloud computing resources.

So, what to do when recursive feature elimination (RFE) runs for 20 minutes while I take a snack break and is still spinning when I return?

The answer is scikit-learn's model.feature_importances_, an attribute available on fitted tree-based models such as random forests.

My Random Forest models are among my best, so this attribute is a really nice way to save on computation and still end up with a high-performing model.
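As a minimal sketch of what reading the attribute looks like, assuming a synthetic dataset from make_classification and hypothetical names (forest, importances) that are not from my own project:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption, for illustration only)
X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

# Fit the forest once; the importances come for free after fitting
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# One impurity-based score per column; the scores sum to 1
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

No extra model fits are needed beyond the one forest, which is where the savings over RFE come from.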

After fitting, predicting, and running a confusion matrix and a classification report, I turn to feature importance. It lets me iterate and improve my model by rerunning it on fewer features, or report the most important features to my superiors.
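The "fewer features" step could be sketched roughly like this, assuming synthetic data and an arbitrary cutoff of k = 5 (the names full, reduced, and top_idx are hypothetical, and the right k depends on your data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, for illustration only)
X, y = make_classification(n_samples=400, n_features=20, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit once on every feature to get the importances
full = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Column indices of the k most important features, most important first
k = 5
top_idx = np.argsort(full.feature_importances_)[::-1][:k]

# Refit on the reduced feature set and compare held-out accuracy
reduced = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train[:, top_idx], y_train)
full_acc = full.score(X_test, y_test)
reduced_acc = reduced.score(X_test[:, top_idx], y_test)
print(f"all features: {full_acc:.3f}, top {k}: {reduced_acc:.3f}")
```

Comparing the two scores on held-out data is a quick check that dropping the low-importance features did not cost much accuracy.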

See this bit of documentation code from sklearn's "Feature importances with a forest of trees" web page.
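A rough sketch in the spirit of that page, not a verbatim copy, assuming synthetic data and my own variable names. Alongside each importance, it computes the spread of that importance across the individual trees in the forest:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption, for illustration only)
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Forest-level importances, averaged over the trees
importances = forest.feature_importances_

# Standard deviation of each feature's importance across the individual trees
std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)

# Print the features from most to least important
for i in np.argsort(importances)[::-1]:
    print(f"feature {i}: {importances[i]:.3f} +/- {std[i]:.3f}")
```

A large spread relative to the importance itself is a hint that the ranking of that feature is not very stable.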

One of my favorite things to do with this information is to produce a top-ten bar chart that plots the features by importance.

And here is the code I copied or wrote to produce it:

# Feature importance
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.style as style

# forest6 is the fitted random forest; X_train is the training DataFrame
features = pd.DataFrame({
    'Feature': X_train.columns.values,
    'Feature Importance': forest6.feature_importances_,
})
# Keep the ten most important features (nlargest returns them in descending order)
features = features.nlargest(n=10, columns='Feature Importance')
features

# style.available lists the other built-in styles
style.use('fivethirtyeight')

# Reverse the rows so the most important feature lands at the top of the chart
top = features.iloc[::-1]
plt.figure(figsize=(8, 8))
plt.barh(range(len(top)), top['Feature Importance'], align='center')
plt.yticks(np.arange(len(top)), top['Feature'])
plt.xlabel('Feature importance (Weight)')
plt.ylabel('Feature')
plt.title('Top Ten Features by Importance')
plt.show()

I hope this helps you enjoy scikit-learn's feature_importances_!
