- Preparing the data
- Defining the model
- Predicting and visualizing the result
- Source code listing
import math
from numpy import array, hstack
from numpy.random import uniform
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
Preparing the data
First, we'll create a multi-output dataset for this tutorial. The data is randomly generated according to the rules below, with three inputs and two outputs. We'll plot the generated data to check it visually.
def create_data(n):
    x1 = array([math.sin(i) * (i / 10) + uniform(-5, 5) for i in range(n)]).reshape(n, 1)
    x2 = array([math.cos(i) * (i / 10) + uniform(-9, 5) for i in range(n)]).reshape(n, 1)
    x3 = array([(i / 50) + uniform(-10, 10) for i in range(n)]).reshape(n, 1)
    y1 = [x1[i] + x2[i] + x3[i] + uniform(-1, 4) + 15 for i in range(n)]
    y2 = [x1[i] - x2[i] - x3[i] - uniform(-4, 2) - 10 for i in range(n)]
    X = hstack((x1, x2, x3))
    Y = hstack((y1, y2))
    return X, Y

n = 300
X, Y = create_data(n)
f = plt.figure()
f.add_subplot(1, 2, 1)
plt.title("Xs input data")
plt.plot(X)
plt.xlabel("Samples")
f.add_subplot(1, 2, 2)
plt.title("Ys output data")
plt.plot(Y)
plt.xlabel("Samples")
plt.show()
Next, we'll split the dataset into train and test parts and check the data shapes.
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)
xtrain: (255, 3) ytrain: (255, 2)
print("xtest:", xtest.shape, "ytest:", ytest.shape)
xtest: (45, 3) ytest: (45, 2)
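Note that train_test_split shuffles the data randomly, so the exact split (and the scores below) will vary between runs. A hypothetical tweak, fixing the seed for a reproducible split:

xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15, random_state=42)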
Defining the model
We'll define the model with sklearn's MultiOutputRegressor class. As the base estimator, we'll use GradientBoostingRegressor with its default parameters and wrap it in MultiOutputRegressor. You can check the parameters of the model with the print command.
gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
print(model)
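With recent scikit-learn versions, which print only non-default parameters, the output looks something like this (older versions list every parameter of the wrapped estimator):

MultiOutputRegressor(estimator=GradientBoostingRegressor())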
Now, we can fit the model on the train data and check the training score, which for regressors is the R² coefficient of determination (averaged uniformly over the outputs).
model.fit(xtrain, ytrain)
score = model.score(xtrain, ytrain)
print("Training score:", score)
Training score: 0.9952671502749106
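Under the hood, MultiOutputRegressor clones the base estimator and fits one copy per target column; after fitting, the copies are available in the estimators_ attribute. A quick check on the model trained above:

# One fitted GradientBoostingRegressor per output column
print(len(model.estimators_))   # prints 2 for our two targets
# Each entry is a standalone regressor for a single target
print(model.estimators_[0])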
Predicting and visualizing the result
We'll predict the test data with the trained model and check the MSE for both the y1 and y2 outputs.
ypred = model.predict(xtest)
print("y1 MSE:%.4f" % mean_squared_error(ytest[:,0], ypred[:,0]))
y1 MSE:10.9138
print("y2 MSE:%.4f" % mean_squared_error(ytest[:,1], ypred[:,1]))
y2 MSE:10.8929
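As a side note, both values can also be computed in a single call by passing multioutput='raw_values' to mean_squared_error:

# Returns one MSE per output column: [mse_y1, mse_y2]
print(mean_squared_error(ytest, ypred, multioutput='raw_values'))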
Finally, we'll plot the predicted and the original test values to compare them visually.
x_ax = range(len(xtest))
plt.plot(x_ax, ytest[:,0], label="y1-test", color='c')
plt.plot(x_ax, ypred[:,0], label="y1-pred", color='b')
plt.plot(x_ax, ytest[:,1], label="y2-test", color='m')
plt.plot(x_ax, ypred[:,1], label="y2-pred", color='r')
plt.legend()
plt.show()
In this tutorial, we've briefly learned how to use the MultiOutputRegressor class in Python. We've trained the model on a multi-output dataset and predicted the test data.
Source code listing
import math
from numpy import array, hstack
from numpy.random import uniform
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
def create_data(n):
    x1 = array([math.sin(i) * (i / 10) + uniform(-5, 5) for i in range(n)]).reshape(n, 1)
    x2 = array([math.cos(i) * (i / 10) + uniform(-9, 5) for i in range(n)]).reshape(n, 1)
    x3 = array([(i / 50) + uniform(-10, 10) for i in range(n)]).reshape(n, 1)
    y1 = [x1[i] + x2[i] + x3[i] + uniform(-1, 4) + 15 for i in range(n)]
    y2 = [x1[i] - x2[i] - x3[i] - uniform(-4, 2) - 10 for i in range(n)]
    X = hstack((x1, x2, x3))
    Y = hstack((y1, y2))
    return X, Y

n = 300
X, Y = create_data(n)
f = plt.figure()
f.add_subplot(1, 2, 1)
plt.title("Xs input data")
plt.plot(X)
plt.xlabel("Samples")
f.add_subplot(1, 2, 2)
plt.title("Ys output data")
plt.plot(Y)
plt.xlabel("Samples")
plt.show()
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)
print("xtest:", xtest.shape, "ytest:", ytest.shape)

gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
print(model)

model.fit(xtrain, ytrain)
score = model.score(xtrain, ytrain)
print("Training score:", score)

ypred = model.predict(xtest)
print("y1 MSE:%.4f" % mean_squared_error(ytest[:,0], ypred[:,0]))
print("y2 MSE:%.4f" % mean_squared_error(ytest[:,1], ypred[:,1]))

x_ax = range(len(xtest))
plt.plot(x_ax, ytest[:,0], label="y1-test", color='c')
plt.plot(x_ax, ypred[:,0], label="y1-pred", color='b')
plt.plot(x_ax, ytest[:,1], label="y2-test", color='m')
plt.plot(x_ax, ypred[:,1], label="y2-pred", color='r')
plt.legend()
plt.show()
Comments

Hey, it's a very good read. However, a more detailed explanation of the topic would have been great.
The multi-target regression covered here takes all targets together while fitting the model and during evaluation. Do you think taking one target at a time would fetch better results? I wonder why this idea is not taken into account. Appreciate your comments. Thanks again.
You are welcome! Yes, you can do that, but it then becomes a simple regression model that fits and predicts each target in multiple steps. Here I wanted to show the multi-output prediction case with a single training and prediction.
I think sklearn's MultiOutputRegressor works in the same way as Sudheer mentioned.
Thank you! Is it possible to get feature importances as well? Multiple feature importances, or does it need to be done separately?
You are welcome! Yes, you need to extract the important features in your data preparation step before training your model on it.
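For readers interested in this: since the wrapped estimator here is a gradient boosting model, per-target importances can also be read from the fitted copies after training. A minimal sketch, assuming the model object trained in the tutorial above:

# Each fitted GradientBoostingRegressor exposes feature_importances_
for i, est in enumerate(model.estimators_):
    print("importances for y%d:" % (i + 1), est.feature_importances_)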
Loved it. Thanks, buddy!
Hi, thanks for sharing this interesting topic. I wonder what the mathematics behind MultiOutputRegressor is? Essentially, you can plug it into any regression model, right?
I am also keen to know the math behind the multi-output regressor. It is true that you can plug in any model. It seems that it fits one model to the set of independent variables and one target variable at a time.
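A minimal sketch to check this, reusing the training arrays from the tutorial and fixing random_state so the two fits are comparable:

from numpy import allclose

# Fit one GradientBoostingRegressor by hand on the first target only
single = GradientBoostingRegressor(random_state=0).fit(xtrain, ytrain[:, 0])
# Wrap an identical estimator; MultiOutputRegressor clones it per target
wrapped = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
wrapped.fit(xtrain, ytrain)
# The first-column predictions should match the manual single-target fit
print(allclose(single.predict(xtest), wrapped.predict(xtest)[:, 0]))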