LightGBM is an open-source gradient boosting framework that is based on tree learning algorithms and designed for fast training, low memory usage, and good accuracy. It can handle large datasets and supports distributed learning. The official LightGBM documentation covers the full API.
LightGBM can be used for regression, classification, ranking, and other machine learning tasks. In this tutorial, you'll briefly learn how to fit and predict regression data by using LightGBM in Python. The tutorial covers:
- Preparing the data
- Building the model
- Prediction and accuracy check
- Visualizing the results
- Source code listing
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from pandas import DataFrame
import matplotlib.pyplot as plt
If you haven't installed LightGBM yet, you can install it with pip.
pip install lightgbm
Preparing the data
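We use the Boston Housing Price dataset as the target regression data, which we can easily load from the sklearn.datasets module. To keep the feature column names, I'll use a pandas DataFrame for the feature data. Then we'll split the data into train and test parts. Note that load_boston was removed in scikit-learn 1.2, so the code as written requires an older scikit-learn version.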
boston = load_boston()
x, y = boston.data, boston.target
x_df = DataFrame(x, columns= boston.feature_names)
x_train, x_test, y_train, y_test = train_test_split(x_df, y, test_size=0.15)
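If load_boston is not available in your scikit-learn version, the same preparation steps can be tried on generated data. The following is only a sketch under that assumption: make_regression produces synthetic data, and the column names f0 to f12 and the random_state values are illustrative choices, not part of the original tutorial.

```python
from pandas import DataFrame
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the Boston data (506 rows, 13 features).
x, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# Hypothetical column names, so the DataFrame keeps named features.
feature_names = ["f%d" % i for i in range(x.shape[1])]
x_df = DataFrame(x, columns=feature_names)

# Fixing random_state makes the split reproducible between runs.
x_train, x_test, y_train, y_test = train_test_split(
    x_df, y, test_size=0.15, random_state=0)

print(x_train.shape, x_test.shape)
```

The rest of the tutorial works unchanged with these variables, although the metric values will differ from those obtained on the housing data.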
Building the model
First, we'll define regression model parameters as shown below. You can change values according to your evaluation targets.
# defining parameters
params = {
    'task': 'train',
    'boosting': 'gbdt',
    'objective': 'regression',
    'num_leaves': 10,
    'learning_rate': 0.05,
    'metric': {'l2', 'l1'},
    'verbose': -1
}
Next, we'll load the train and test data into LightGBM Dataset objects. The code below shows how to load the training and evaluation data.
# loading data
lgb_train = lgb.Dataset(x_train, y_train)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)
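Now we can train the model with the parameters and datasets defined above.
# fitting the model
# early stopping is passed as a callback; the early_stopping_rounds
# argument of lgb.train was removed in LightGBM 4.0
model = lgb.train(params,
                  train_set=lgb_train,
                  valid_sets=[lgb_eval],
                  callbacks=[lgb.early_stopping(stopping_rounds=30)])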
Prediction and accuracy check
After training the model, we can predict the test data and check the prediction accuracy. We'll compute the MSE and RMSE of the trained model.
# prediction
y_pred = model.predict(x_test)
# accuracy check
mse = mean_squared_error(y_test, y_pred)
rmse = mse**(0.5)
print("MSE: %.2f" % mse)
print("RMSE: %.2f" % rmse)
MSE: 7.66
RMSE: 2.77
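To make the metrics concrete, here is a small worked example with made-up numbers (the four values below are illustrative, not taken from the housing data). It also computes the mean absolute error, which corresponds to the 'l1' metric in the parameters, while 'l2' corresponds to the MSE.

```python
import numpy as np

# Made-up true and predicted values for illustration.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_hat          # [1, 0, -2, -1]
mse = np.mean(errors ** 2)       # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = np.sqrt(mse)              # sqrt(1.5), about 1.22
mae = np.mean(np.abs(errors))    # (1 + 0 + 2 + 1) / 4 = 1.0

print("MSE: %.2f" % mse)
print("RMSE: %.2f" % rmse)
print("MAE: %.2f" % mae)
```

Because RMSE is the square root of MSE, it is expressed in the same units as the target, which usually makes it easier to interpret.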
Visualizing the results
To visualize the original and predicted data, we can use the matplotlib library. The code below shows how to plot the original and predicted data in a graph.
# visualizing in a plot
x_ax = range(len(y_test))
plt.figure(figsize=(12, 6))
plt.plot(x_ax, y_test, label="original")
plt.plot(x_ax, y_pred, label="predicted")
plt.title("Boston dataset test and predicted data")
plt.xlabel('X')
plt.ylabel('Price')
plt.legend(loc='best',fancybox=True, shadow=True)
plt.grid(True)
plt.show()
LightGBM provides the plot_importance() function to plot feature importance. The call is included at the end of the source code listing.
In this tutorial, we've briefly learned how to fit and predict regression data by using the LightGBM regression method in Python. The full source code is listed below.
Source code listing
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from pandas import DataFrame
boston = load_boston()
x, y = boston.data, boston.target
x_df = DataFrame(x, columns= boston.feature_names)
x_train, x_test, y_train, y_test = train_test_split(x_df, y, test_size=0.15)
# defining parameters
params = {
    'task': 'train',
    'boosting': 'gbdt',
    'objective': 'regression',
    'num_leaves': 10,
    'learning_rate': 0.05,
    'metric': {'l2', 'l1'},
    'verbose': -1
}
# loading data
lgb_train = lgb.Dataset(x_train, y_train)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)
# fitting the model
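# early stopping is passed as a callback; the early_stopping_rounds
# argument of lgb.train was removed in LightGBM 4.0
model = lgb.train(params,
                  train_set=lgb_train,
                  valid_sets=[lgb_eval],
                  callbacks=[lgb.early_stopping(stopping_rounds=30)])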
# prediction
y_pred = model.predict(x_test)
# accuracy check
mse = mean_squared_error(y_test, y_pred)
rmse = mse**(0.5)
print("MSE: %.2f" % mse)
print("RMSE: %.2f" % rmse)
# visualizing in a plot
x_ax = range(len(y_test))
plt.figure(figsize=(12, 6))
plt.plot(x_ax, y_test, label="original")
plt.plot(x_ax, y_pred, label="predicted")
plt.title("Boston dataset test and predicted data")
plt.xlabel('X')
plt.ylabel('Price')
plt.legend(loc='best',fancybox=True, shadow=True)
plt.grid(True)
plt.show()
# plotting feature importance
lgb.plot_importance(model, height=.5)
plt.show()