- Preparing the data
- Defining the model
- Predicting and visualizing the result
- Source code listing
import math
from numpy import array, hstack
from numpy.random import uniform
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
Preparing the data
First, we'll create a multi-output dataset for this tutorial. The data is randomly generated according to the rules below, with three inputs and two outputs. We'll plot the generated data to check it visually.
def create_data(n):
    x1 = array([math.sin(i) * (i / 10) + uniform(-5, 5) for i in range(n)]).reshape(n, 1)
    x2 = array([math.cos(i) * (i / 10) + uniform(-9, 5) for i in range(n)]).reshape(n, 1)
    x3 = array([(i / 50) + uniform(-10, 10) for i in range(n)]).reshape(n, 1)
    y1 = [x1[i] + x2[i] + x3[i] + uniform(-1, 4) + 15 for i in range(n)]
    y2 = [x1[i] - x2[i] - x3[i] - uniform(-4, 2) - 10 for i in range(n)]
    X = hstack((x1, x2, x3))
    Y = hstack((y1, y2))
    return X, Y

n = 300
X, Y = create_data(n)
f = plt.figure()
f.add_subplot(1, 2, 1)
plt.title("Xs input data")
plt.plot(X)
plt.xlabel("Samples")
f.add_subplot(1, 2, 2)
plt.title("Ys output data")
plt.plot(Y)
plt.xlabel("Samples")
plt.show()
Next, we'll split the dataset into train and test parts and check the data shapes.
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)
xtrain: (255, 3) ytrain: (255, 2)
print("xtest:", xtest.shape, "ytest:", ytest.shape)
xtest: (45, 3) ytest: (45, 2)
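Note that train_test_split shuffles the data randomly, so the exact split (and the scores below) will vary between runs. A hypothetical tweak, fixing the seed for a reproducible split:

xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15, random_state=42)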
Defining the model
We'll define the model with sklearn's MultiOutputRegressor class. As the base estimator, we'll use GradientBoostingRegressor with its default parameters and wrap it in MultiOutputRegressor. You can check the parameters of the model with the print command.
gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
print(model)
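With recent scikit-learn versions, which print only non-default parameters, the output looks something like this (older versions list every parameter of the wrapped estimator):

MultiOutputRegressor(estimator=GradientBoostingRegressor())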
Now, we can fit the model on the train data and check the training score, which for regressors is the R² coefficient of determination (averaged uniformly over the outputs).
model.fit(xtrain, ytrain)
score = model.score(xtrain, ytrain)
print("Training score:", score)
Training score: 0.9952671502749106
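Under the hood, MultiOutputRegressor clones the base estimator and fits one copy per target column; after fitting, the copies are available in the estimators_ attribute. A quick check on the model trained above:

# One fitted GradientBoostingRegressor per output column
print(len(model.estimators_))   # prints 2 for our two targets
# Each entry is a standalone regressor for a single target
print(model.estimators_[0])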
Predicting and visualizing the result
We'll predict the test data with the trained model and check the MSE for both the y1 and y2 outputs.
ypred = model.predict(xtest)
print("y1 MSE:%.4f" % mean_squared_error(ytest[:,0], ypred[:,0]))
y1 MSE:10.9138
print("y2 MSE:%.4f" % mean_squared_error(ytest[:,1], ypred[:,1]))
y2 MSE:10.8929
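As a side note, both values can also be computed in a single call by passing multioutput='raw_values' to mean_squared_error:

# Returns one MSE per output column: [mse_y1, mse_y2]
print(mean_squared_error(ytest, ypred, multioutput='raw_values'))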
Finally, we'll plot the predicted and the original test values to compare them visually.
x_ax = range(len(xtest))
plt.plot(x_ax, ytest[:,0], label="y1-test", color='c')
plt.plot(x_ax, ypred[:,0], label="y1-pred", color='b')
plt.plot(x_ax, ytest[:,1], label="y2-test", color='m')
plt.plot(x_ax, ypred[:,1], label="y2-pred", color='r')
plt.legend()
plt.show()
In this tutorial, we've briefly learned how to use the MultiOutputRegressor class in Python. We've trained the model on a multi-output dataset and predicted the test data.
Source code listing
import math
from numpy import array, hstack
from numpy.random import uniform
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
def create_data(n):
    x1 = array([math.sin(i) * (i / 10) + uniform(-5, 5) for i in range(n)]).reshape(n, 1)
    x2 = array([math.cos(i) * (i / 10) + uniform(-9, 5) for i in range(n)]).reshape(n, 1)
    x3 = array([(i / 50) + uniform(-10, 10) for i in range(n)]).reshape(n, 1)
    y1 = [x1[i] + x2[i] + x3[i] + uniform(-1, 4) + 15 for i in range(n)]
    y2 = [x1[i] - x2[i] - x3[i] - uniform(-4, 2) - 10 for i in range(n)]
    X = hstack((x1, x2, x3))
    Y = hstack((y1, y2))
    return X, Y

n = 300
X, Y = create_data(n)
f = plt.figure()
f.add_subplot(1, 2, 1)
plt.title("Xs input data")
plt.plot(X)
plt.xlabel("Samples")
f.add_subplot(1, 2, 2)
plt.title("Ys output data")
plt.plot(Y)
plt.xlabel("Samples")
plt.show()
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)
print("xtest:", xtest.shape, "ytest:", ytest.shape)

gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
print(model)

model.fit(xtrain, ytrain)
score = model.score(xtrain, ytrain)
print("Training score:", score)

ypred = model.predict(xtest)
print("y1 MSE:%.4f" % mean_squared_error(ytest[:,0], ypred[:,0]))
print("y2 MSE:%.4f" % mean_squared_error(ytest[:,1], ypred[:,1]))

x_ax = range(len(xtest))
plt.plot(x_ax, ytest[:,0], label="y1-test", color='c')
plt.plot(x_ax, ypred[:,0], label="y1-pred", color='b')
plt.plot(x_ax, ytest[:,1], label="y2-test", color='m')
plt.plot(x_ax, ypred[:,1], label="y2-pred", color='r')
plt.legend()
plt.show()
Comments

Hey, it's a very good read. However, a more detailed explanation of the topic would have been great.
The multi-target regression covered here takes all targets together while fitting the model and during evaluation. Do you think taking one target at a time would fetch better results? I wonder why this idea is not taken into account. Appreciate your comments. Thanks again.
You are welcome! Yes, you can do that, but it then becomes a simple regression model that fits and predicts each target in multiple steps. Here I wanted to show the multi-output prediction case with a single training and prediction.
I think sklearn's MultiOutputRegressor works in the same way as Sudheer mentioned.
Thank you! Is it possible to get feature importances as well? Multiple feature importances, or does it need to be done separately?
You are welcome! Yes, you need to extract the important features in your data preparation step before training your model on it.
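For readers interested in this: since the wrapped estimator here is a gradient boosting model, per-target importances can also be read from the fitted copies after training. A minimal sketch, assuming the model object trained in the tutorial above:

# Each fitted GradientBoostingRegressor exposes feature_importances_
for i, est in enumerate(model.estimators_):
    print("importances for y%d:" % (i + 1), est.feature_importances_)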
Loved it. Thanks, buddy!
Hi, thanks for sharing this interesting topic. I wonder what the mathematics behind MultiOutputRegressor is? Essentially, you can plug it into any regression model, right?
I am also keen to know the math behind the multi-output regressor. It is true that you can plug in any model. It seems that it fits one model to the set of independent variables and one target variable at a time.
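A minimal sketch to check this, reusing the training arrays from the tutorial and fixing random_state so the two fits are comparable:

from numpy import allclose

# Fit one GradientBoostingRegressor by hand on the first target only
single = GradientBoostingRegressor(random_state=0).fit(xtrain, ytrain[:, 0])
# Wrap an identical estimator; MultiOutputRegressor clones it per target
wrapped = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
wrapped.fit(xtrain, ytrain)
# The first-column predictions should match the manual single-target fit
print(allclose(single.predict(xtest), wrapped.predict(xtest)[:, 0]))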