Extremely Randomized Trees (or Extra-Trees) is an ensemble learning
method. It builds many randomized decision trees: when a node is split, the candidate thresholds are drawn at random instead of being searched for the best cut point. The method then averages the outputs of the individual trees, which reduces the variance of the model and improves its predictive accuracy.
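The averaging step can be checked directly. The following is a minimal sketch, assuming synthetic data from make_regression (not the tutorial's dataset) and illustrative names such as forest and per_tree. It compares the ensemble prediction with the mean of the individual trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic data, used only for illustration (an assumption, not the tutorial's dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_fit, y_fit = X_demo[:150], y_demo[:150]
X_new = X_demo[150:155]  # five points the trees have not seen

forest = ExtraTreesRegressor(n_estimators=20, random_state=0)
forest.fit(X_fit, y_fit)

# Predict with every fitted tree separately, then average across the trees.
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The ensemble prediction should agree with the manual average.
averages_match = np.allclose(manual_average, forest.predict(X_new))
print("Ensemble equals tree average:", averages_match)
```

The individual rows of per_tree generally differ from one another, and averaging them is what smooths out the variance of any single tree.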
In this tutorial, we'll briefly learn how to fit and predict regression data by using
Scikit-learn's ExtraTreesRegressor class in Python. The tutorial covers:
- Preparing the data
- Training the model
- Predicting and accuracy check
- Source code listing
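We'll start by loading the required libraries.

# Note: load_boston was removed in scikit-learn 1.2, so this code needs an earlier release.
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt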
Preparing the data
In this tutorial, we'll use the Boston housing dataset as the regression data. First, we'll load the dataset and separate it into the features (x) and the target (y).
Then, we'll split them into train and test parts. Here, we'll hold out 15 percent of the dataset as test data.
boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
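The split above is random, so the scores change from run to run. The following is a small sketch, assuming synthetic data from make_regression and an arbitrary seed of 42. It shows that passing random_state to train_test_split makes the same 15 percent split repeatable.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the tutorial itself uses the Boston dataset).
x_demo, y_demo = make_regression(n_samples=200, n_features=13, noise=5.0, random_state=0)

# Fixing random_state (42 is an arbitrary choice) makes the split reproducible.
x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo, test_size=0.15, random_state=42)
x_tr2, x_te2, y_tr2, y_te2 = train_test_split(x_demo, y_demo, test_size=0.15, random_state=42)

print("Train shape:", x_tr.shape)
print("Test shape:", x_te.shape)
print("Same split on both calls:", np.array_equal(x_te, x_te2))
```

Leaving random_state unset, as in the tutorial code, is fine for a quick experiment but makes the reported metrics hard to reproduce exactly.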
Training the model
Next, we'll define the regressor by using the
ExtraTreesRegressor class. The class accepts arguments such as the number of trees and the maximum tree depth. In this example, we'll use it with the default parameters.
etr = ExtraTreesRegressor()
print(etr)

ExtraTreesRegressor(bootstrap=False, criterion='mse', max_depth=None,
                    max_features='auto', max_leaf_nodes=None,
                    min_impurity_decrease=0.0, min_impurity_split=None,
                    min_samples_leaf=1, min_samples_split=2,
                    min_weight_fraction_leaf=0.0, n_estimators='warn',
                    n_jobs=None, oob_score=False, random_state=None,
                    verbose=0, warm_start=False)
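If the defaults do not suit your data, the arguments can be set explicitly. The following sketch uses values chosen purely for illustration (they are assumptions, not tuned recommendations), and the name etr_tuned is hypothetical.

```python
from sklearn.ensemble import ExtraTreesRegressor

# Illustrative settings only; suitable values depend on the dataset.
etr_tuned = ExtraTreesRegressor(
    n_estimators=200,    # number of trees in the ensemble
    max_depth=10,        # limit tree depth to curb overfitting
    min_samples_leaf=2,  # require at least two samples in each leaf
    random_state=0,      # make the randomized splits reproducible
    n_jobs=-1,           # build the trees on all available CPU cores
)

# get_params() returns the configuration as a dictionary.
params = etr_tuned.get_params()
print(params["n_estimators"], params["max_depth"], params["min_samples_leaf"])
```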
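Then, we'll fit the model on the train data and check its score. For a regressor, the score method returns the coefficient of determination (R-squared), here computed on the training data.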
etr.fit(xtrain, ytrain)
score = etr.score(xtrain, ytrain)
print("Score: ", score)

Score: 1.0
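A training score of 1.0 mainly shows that fully grown trees fit the training samples exactly, so it says little about how the model generalizes. The following sketch, which assumes synthetic noisy data and illustrative variable names, compares the training score with the score on held-out data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

# Noisy synthetic data (an assumption, chosen so that the gap is visible).
x_demo, y_demo = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo, test_size=0.15, random_state=0)

model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(x_tr, y_tr)

# R-squared on the data the model was fitted on versus on unseen data.
train_r2 = model.score(x_tr, y_tr)
test_r2 = model.score(x_te, y_te)
print("Train R^2: %.3f" % train_r2)
print("Test R^2:  %.3f" % test_r2)
```

The held-out score is the more honest estimate, which is why the tutorial goes on to use cross-validation and a separate test set.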
We can also evaluate the model with 10-fold cross-validation on the train data and check the mean score.

cv_scores = cross_val_score(etr, xtrain, ytrain, cv=10)
print("Mean cross-validation score: %.2f" % cv_scores.mean())

Mean cross-validation score: 0.84
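Passing an integer as cv uses unshuffled folds and the estimator's default score (R-squared). As a hedged variation, the sketch below assumes synthetic data and uses an explicit shuffled KFold together with the "neg_mean_squared_error" scorer to obtain a cross-validated MSE as well.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption, not the tutorial's dataset).
x_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

model = ExtraTreesRegressor(n_estimators=50, random_state=0)
folds = KFold(n_splits=10, shuffle=True, random_state=0)

# Default scoring for a regressor is R-squared.
r2_scores = cross_val_score(model, x_demo, y_demo, cv=folds)

# scikit-learn reports MSE negated so that "higher is better" holds for every scorer.
neg_mse = cross_val_score(model, x_demo, y_demo, cv=folds, scoring="neg_mean_squared_error")

print("Mean R^2: %.2f (+/- %.2f)" % (r2_scores.mean(), r2_scores.std()))
print("Mean MSE: %.2f" % (-neg_mse.mean()))
```

Reporting the standard deviation alongside the mean gives a rough idea of how much the score varies between folds.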
Predicting and accuracy check
Now, we can predict the test data by using the trained model. After the
prediction, we'll measure the prediction error with the MSE (mean squared error) and RMSE (root mean squared error) metrics.
ypred = etr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: %.2f" % mse)
print("RMSE: %.2f" % mse**(0.5))

MSE: 8.25
RMSE: 2.87
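To make the two metrics concrete, here is a small worked example on four made-up values (the numbers are illustrative, not taken from the Boston data). It computes the MSE by hand with NumPy and checks it against mean_squared_error.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Four made-up targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 11.0])

# Squared errors are 1, 0, 1 and 4, so the mean is 6 / 4 = 1.5.
mse_manual = np.mean((y_true - y_hat) ** 2)
rmse_manual = np.sqrt(mse_manual)

# The library function gives the same MSE.
mse_sklearn = mean_squared_error(y_true, y_hat)

print("MSE: %.2f" % mse_manual)    # MSE: 1.50
print("RMSE: %.2f" % rmse_manual)  # RMSE: 1.22
```

Because the RMSE is expressed in the same units as the target, it is usually the easier of the two to interpret.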
Finally, we'll plot the test and predicted values together to compare them visually.
x_ax = range(len(ytest))
plt.plot(x_ax, ytest, lw=0.6, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.title("Boston target test and predicted data")
plt.legend()
plt.show()
In this tutorial, we've briefly learned how to fit and predict regression data by using
Scikit-learn's ExtraTreesRegressor class in Python. The full
source code is listed below.
Source code listing
# Note: load_boston was removed in scikit-learn 1.2, so this code needs an earlier release.
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load the data and hold out 15 percent for testing
boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)

# Define and fit the model
etr = ExtraTreesRegressor()
print(etr)

etr.fit(xtrain, ytrain)
score = etr.score(xtrain, ytrain)
print("Score: ", score)

# 10-fold cross-validation on the train data
cv_scores = cross_val_score(etr, xtrain, ytrain, cv=10)
print("Mean cross-validation score: %.2f" % cv_scores.mean())

# Predict the test data and measure the error
ypred = etr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: %.2f" % mse)
print("RMSE: %.2f" % mse**(0.5))

# Plot the original and predicted values
x_ax = range(len(ytest))
plt.plot(x_ax, ytest, lw=0.6, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.title("Boston target test and predicted data")
plt.legend()
plt.show()
