Extracting the influential features of a dataset is an essential part of data preparation for training a model in machine learning. The scikit-learn API provides the RFE class, which ranks features by recursive feature elimination to select the best ones. The method recursively eliminates the least important features based on a weight attribute (such as coef_ or feature_importances_) exposed by the estimator.
In this tutorial, we'll briefly learn how to select the best features of a dataset by using RFE in Python. The tutorial covers:
- RFE Example with Boston dataset
- Source code listing
from sklearn.feature_selection import RFE
from sklearn.ensemble import AdaBoostRegressor
from sklearn.datasets import load_boston
from numpy import array
We'll load the Boston housing price dataset and check the dimensions of the feature data. The 'data' attribute of the boston object contains the feature data.
boston = load_boston()
x = boston.data
y = boston.target
print("Feature data dimension: ", x.shape)
Feature data dimension: (506, 13)
The feature data contains 506 rows and 13 columns; our goal is to reduce those columns to the best 8 by their influence rank.
Next, we'll define the model by using the RFE class. The class requires an estimator, and we can use the AdaBoostRegressor ensemble model for this purpose. The target number of features to select is defined by the n_features_to_select parameter, and step defines the number of features to remove in each iteration. We'll fit the model on the x and y training data.
estimator = AdaBoostRegressor(random_state=0, n_estimators=100)
selector = RFE(estimator, n_features_to_select=8, step=1)
selector = selector.fit(x, y)
After fitting, we can obtain the selected features and their ranking positions.
filter = selector.support_
ranking = selector.ranking_
print("Mask data: ", filter)
print("Ranking: ", ranking)
Mask data: [ True False False False True True False True True True True False
True]
Ranking: [1 5 3 6 1 1 4 1 1 1 1 2 1]
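In the ranking array, a value of 1 marks a selected feature; because step=1, larger values indicate features that were eliminated in earlier rounds. Pairing the printed ranking with the feature names makes the elimination order explicit:

```python
import numpy as np

# Ranking and feature names as printed above
ranking = np.array([1, 5, 3, 6, 1, 1, 4, 1, 1, 1, 1, 2, 1])
names = np.array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                  'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])

# Rank 1 means selected; a larger rank means the feature was
# eliminated in an earlier round of the recursion
for name, rank in sorted(zip(names, ranking), key=lambda pair: pair[1]):
    print(rank, name)
```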
To make the result readable, we'll use the mask to print the names of the selected features.
features = array(boston.feature_names)
print("All features:")
print(features)
print("Selected features:")
print(features[filter])
All features:
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
'B' 'LSTAT']
Selected features:
['CRIM' 'NOX' 'RM' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'LSTAT']
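The fitted selector can also shrink the feature matrix directly with its transform() method, which keeps only the selected columns. A minimal sketch on synthetic data (using make_regression here, since load_boston has been removed from recent scikit-learn releases):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.feature_selection import RFE

# Synthetic stand-in for the Boston data: 200 samples, 13 features
X, y = make_regression(n_samples=200, n_features=13, n_informative=8,
                       random_state=0)

selector = RFE(AdaBoostRegressor(random_state=0, n_estimators=50),
               n_features_to_select=8, step=1)
selector.fit(X, y)

# transform() drops the eliminated columns from the feature matrix
X_reduced = selector.transform(X)
print(X_reduced.shape)  # (200, 8)
```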
Source code listing
from sklearn.feature_selection import RFE
from sklearn.ensemble import AdaBoostRegressor
from sklearn.datasets import load_boston
from numpy import array
boston = load_boston()
x = boston.data
y = boston.target
print("Feature data dimension: ", x.shape)
estimator = AdaBoostRegressor(random_state=0, n_estimators=100)
selector = RFE(estimator, n_features_to_select=8, step=1)
selector = selector.fit(x, y)
filter = selector.support_
ranking = selector.ranking_
print("Mask data: ", filter)
print("Ranking: ", ranking)
features = array(boston.feature_names)
print("All features:")
print(features)
print("Selected features:")
print(features[filter])
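If the right number of features isn't known in advance, scikit-learn also provides the RFECV class, which chooses it by cross-validation instead of taking a fixed n_features_to_select. A brief sketch with a lighter linear estimator:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=150, n_features=10, n_informative=4,
                       noise=5, random_state=0)

# cv=5 scores each candidate feature count with 5-fold cross-validation
selector = RFECV(LinearRegression(), step=1, cv=5)
selector.fit(X, y)
print("Optimal number of features:", selector.n_features_)
```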