- Preparing data
- Training the Adaboost Classifier model
- Predicting test data and checking the accuracy
- Testing iris dataset with different base classifiers
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
Preparing data
In this tutorial, we'll first train the classifier on a dataset generated from random numbers with a few simple labeling rules, and then try it on the iris dataset. The function below creates the generated dataset.
def CreateDataFrame(N):
    columns = ['a', 'b', 'c', 'y']
    df = pd.DataFrame(columns=columns)
    for i in range(N):
        a = np.random.randint(10)
        b = np.random.randint(20)
        c = np.random.randint(5)
        # Label each row by the sum of its three features
        y = "normal"
        if (a + b + c) > 25:
            y = "high"
        elif (a + b + c) < 12:
            y = "low"
        df.loc[i] = [a, b, c, y]
    return df

df = CreateDataFrame(200)
print(df.head())
   a   b  c       y
0  4  11  3  normal
1  0   9  1     low
2  2  18  0  normal
3  9  11  1  normal
4  4   7  1  normal
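The row-by-row loop above is easy to follow but slow for large N. As a rough alternative, the same rules can be applied to whole arrays at once. This is only a sketch: the function name `create_dataframe` and its `seed` parameter are assumptions added here for reproducibility, not part of the original code.

```python
import numpy as np
import pandas as pd


def create_dataframe(n, seed=None):
    """Vectorized variant of CreateDataFrame (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Same value ranges as np.random.randint(10), randint(20), randint(5)
    a = rng.integers(0, 10, size=n)
    b = rng.integers(0, 20, size=n)
    c = rng.integers(0, 5, size=n)
    total = a + b + c
    # Same labeling rules as the loop: >25 is "high", <12 is "low", else "normal"
    y = np.where(total > 25, "high", np.where(total < 12, "low", "normal"))
    return pd.DataFrame({"a": a, "b": b, "c": c, "y": y})


df = create_dataframe(200, seed=0)
print(df.head())
print(df["y"].value_counts())
```

Printing the class counts is a quick way to see that "high" is the rarest label under these rules, which is worth keeping in mind when reading the accuracy later.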
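Here, y is the output (target) column, and it is categorical, so we need to convert it to numeric form. We first separate the feature columns X from the label column Y, and then encode Y with LabelEncoder().
X = df[["a", "b", "c"]]
Y = df[["y"]]
le = LabelEncoder()
# LabelEncoder expects a 1-D array, so flatten the single-column frame
y = le.fit_transform(Y.values.ravel())
print(Y.head())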
        y
0    high
1  normal
2  normal
3     low
4  normal
print(y[0:5])
[0 2 2 1 2]
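The integer codes above follow the sorted order of the class names, which is why "high" becomes 0, "low" becomes 1, and "normal" becomes 2. The pure-Python sketch below mimics that behavior on the five labels shown; it is an illustration of the idea, not scikit-learn's actual implementation.

```python
# The five labels printed by Y.head() above
labels = ["high", "normal", "normal", "low", "normal"]

# Sort the distinct class names and number them in that order
classes = sorted(set(labels))
mapping = {label: index for index, label in enumerate(classes)}

encoded = [mapping[label] for label in labels]
decoded = [classes[index] for index in encoded]

print(mapping)  # {'high': 0, 'low': 1, 'normal': 2}
print(encoded)  # [0, 2, 2, 1, 2]
print(decoded)  # ['high', 'normal', 'normal', 'low', 'normal']
```

With a fitted LabelEncoder, the same information is available through its `classes_` attribute and its `inverse_transform()` method.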
Next, we'll split X and y data into train and test parts with train_test_split().
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)
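When no test_size is given, train_test_split() holds out 25% of the rows, so 200 samples become 150 for training and 50 for testing. The sketch below checks this on placeholder arrays (the data here is made up purely to show the shapes) and also passes `stratify=y`, an optional argument not used in this tutorial that keeps class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 200 rows, 2 features, 3 classes
X = np.arange(400).reshape(200, 2)
y = np.arange(200) % 3

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)  # (150, 2) (50, 2)
print(np.bincount(y_train), np.bincount(y_test))
```

Stratifying is mainly useful when one class is rare, as "high" is in the generated dataset.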
Training the Adaboost Classifier model
We use the AdaBoostClassifier class from the sklearn.ensemble module to build the model. As the base classifier, we use DecisionTreeClassifier, and we train the model on the training data.
dtc = DecisionTreeClassifier(criterion="entropy", max_depth=3)
# Note: scikit-learn 1.2 renamed base_estimator to estimator, and 1.4 removed the old name
ada_model = AdaBoostClassifier(base_estimator=dtc, n_estimators=100)
ada_model = ada_model.fit(Xtrain, ytrain)
print(ada_model)

AdaBoostClassifier(algorithm='SAMME.R',
          base_estimator=DecisionTreeClassifier(class_weight=None,
            criterion='entropy', max_depth=3, max_features=None,
            max_leaf_nodes=None, min_impurity_decrease=0.0,
            min_impurity_split=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best'),
          learning_rate=1.0, n_estimators=100, random_state=None)
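To get a feel for how much the n_estimators=100 setting matters, the accuracy can be tracked after each boosting round with staged_score(). The sketch below is self-contained: it rebuilds a dataset with the same labeling rules directly as integer arrays (0=high, 1=low, 2=normal), which is an assumption made to keep the example short. The base classifier is passed as the first positional argument, which to my knowledge works whether the installed scikit-learn version calls that parameter base_estimator or estimator.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Rebuild data with the tutorial's rules (seed chosen arbitrarily)
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 10, size=200),
    rng.integers(0, 20, size=200),
    rng.integers(0, 5, size=200),
])
total = X.sum(axis=1)
y = np.where(total > 25, 0, np.where(total < 12, 1, 2))  # 0=high, 1=low, 2=normal

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = DecisionTreeClassifier(criterion="entropy", max_depth=3)
model = AdaBoostClassifier(base, n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# One test accuracy per boosting round; boosting may stop before 100 rounds
staged = list(model.staged_score(X_test, y_test))
for n in (1, 10, 50, len(staged)):
    if n <= len(staged):
        print(f"after {n:3d} estimators: accuracy = {staged[n - 1]:.3f}")
```

If the accuracy flattens out early, a smaller n_estimators would likely give a similar result with less training time.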
Predicting test data and checking the accuracy
After the training, we can classify test data and check the accuracy of the model.
ytest_pred = ada_model.predict(Xtest)
print(ada_model.score(Xtest, ytest))

0.94
print(confusion_matrix(ytest, ytest_pred))
[[ 3 0 2]
[ 0 16 0]
[ 0 1 28]]
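The confusion matrix carries more detail than the single accuracy number. In scikit-learn's convention, rows are the true classes and columns are the predicted classes, and with the encoding used here the class order is high, low, normal. The short sketch below recomputes the accuracy and the per-class recall from the matrix printed above.

```python
import numpy as np

# Confusion matrix printed above (rows: true class, columns: predicted class)
cm = np.array([
    [3, 0, 2],
    [0, 16, 0],
    [0, 1, 28],
])
class_names = ["high", "low", "normal"]

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
# Recall: share of each true class that was predicted correctly
recall = np.diag(cm) / cm.sum(axis=1)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.94
for name, value in zip(class_names, recall):
    print(f"{name}: recall = {value:.3f}")
# high: recall = 0.600
# low: recall = 1.000
# normal: recall = 0.966
```

So the 0.94 accuracy hides the fact that two of the five "high" samples were classified as "normal".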
Testing iris dataset with different base classifiers
Next, we apply AdaBoost classification to the iris dataset. We prepare the data in the same way as above.
iris = datasets.load_iris()
X = iris.data
Y = iris.target
le = LabelEncoder()
y = le.fit_transform(Y)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)
We check the performance of the model with different base classifiers: Random Forest, Naive Bayes, and the decision tree defined earlier.
gnb = GaussianNB()
rf = RandomForestClassifier(n_estimators=10)

base_methods = [rf, gnb, dtc]
for bm in base_methods:
    print("Method: ", bm)
    ada_model = AdaBoostClassifier(base_estimator=bm)
    ada_model = ada_model.fit(Xtrain, ytrain)
    ytest_pred = ada_model.predict(Xtest)
    print(ada_model.score(Xtest, ytest))
    print(confusion_matrix(ytest, ytest_pred))
Method: RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0, warm_start=False)
0.9736842105263158
[[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]
Method: GaussianNB(priors=None)
0.9736842105263158
[[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]
Method: DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=3,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
0.9736842105263158
[[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]
Here, all three base classifiers reached the same accuracy (about 0.974, or 37 of 38 test samples correct) and produced identical confusion matrices.
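A single split with only 38 test samples makes it hard to tell the base classifiers apart, since one misclassified flower changes the score by more than two percentage points. One possible follow-up, sketched below under the assumption that 5-fold cross-validation is an acceptable substitute for the single split, is to average the accuracy over several folds. The dictionary keys and the fixed random_state values are choices made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

base_methods = {
    "random_forest": RandomForestClassifier(n_estimators=10, random_state=0),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(
        criterion="entropy", max_depth=3, random_state=0
    ),
}

results = {}
for name, base in base_methods.items():
    # Base classifier passed positionally to avoid the
    # base_estimator / estimator keyword difference between versions
    model = AdaBoostClassifier(base, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The exact numbers depend on the installed scikit-learn version, so they are not listed here.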
In this post, we have briefly learned how to use the Adaboost Classifier to classify data in Python.
Thank you for reading. The full source code is listed below.
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

def CreateDataFrame(N):
    columns = ['a', 'b', 'c', 'y']
    df = pd.DataFrame(columns=columns)
    for i in range(N):
        a = np.random.randint(10)
        b = np.random.randint(20)
        c = np.random.randint(5)
        # Label each row by the sum of its three features
        y = "normal"
        if (a + b + c) > 25:
            y = "high"
        elif (a + b + c) < 12:
            y = "low"
        df.loc[i] = [a, b, c, y]
    return df

# Generated dataset
df = CreateDataFrame(200)
print(df.head())

X = df[["a", "b", "c"]]
Y = df[["y"]]

le = LabelEncoder()
# LabelEncoder expects a 1-D array, so flatten the single-column frame
y = le.fit_transform(Y.values.ravel())
print(Y.head())
print(y[0:5])

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

dtc = DecisionTreeClassifier(criterion="entropy", max_depth=3)
# Note: scikit-learn 1.2 renamed base_estimator to estimator, and 1.4 removed the old name
ada_model = AdaBoostClassifier(base_estimator=dtc, n_estimators=100)
ada_model = ada_model.fit(Xtrain, ytrain)

ytest_pred = ada_model.predict(Xtest)
print(ada_model.score(Xtest, ytest))
print(confusion_matrix(ytest, ytest_pred))

# Iris dataset with different base classifiers
iris = datasets.load_iris()
X = iris.data
Y = iris.target
le = LabelEncoder()
y = le.fit_transform(Y)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

gnb = GaussianNB()
rf = RandomForestClassifier(n_estimators=10)

base_methods = [rf, gnb, dtc]
for bm in base_methods:
    print("Method: ", bm)
    ada_model = AdaBoostClassifier(base_estimator=bm)
    ada_model = ada_model.fit(Xtrain, ytrain)
    ytest_pred = ada_model.predict(Xtest)
    print(ada_model.score(Xtest, ytest))
    print(confusion_matrix(ytest, ytest_pred))