- Preparing the data
- Vectorizing text
- Building keras model
- Predicting test data and the accuracy check
- Source code listing
We'll start by loading the required libraries.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from keras.models import Sequential
from keras import layers
Preparing the data
For this tutorial, I put together a small sentiment dataset of made-up opinions, where positive opinions are labeled '1' and negative opinions are labeled '0'. A sample of the training data is shown below.
1,"I like it "
1,"like it a lot "
1,"It's really good "
1,"Recommend! I really enjoyed! "
1,"It's really good "
1,"recommend too "
1,"outstanding performance "
...
0,"it's mediocre! not recommend "
0,"Not good at all! "
0,"It is rude "
0,"I don't like this type "
0,"poor performance "
0,"Boring, not good at all! "
0,"not liked "
0,"I hate this type of things "
...
You can find the full sentiment data at the end of this post. Copy the text and save it as sentiments.csv in a datasets folder next to your script, since the code reads it from 'datasets/sentiments.csv'.
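Next, we'll load the sentiments.csv data and separate it into x and y parts. Note that the file has no header row, so read_csv() with its default settings takes the first line as the header and that row does not end up in the data. Passing header=None and names=["label","text"] to read_csv() would keep it; the outputs shown in this tutorial come from the code as written.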
df = pd.read_csv('datasets/sentiments.csv')
df.columns = ["label", "text"]
x = df['text'].values
y = df['label'].values
To train the model and to predict new data, we'll split the data into train and test parts.
x_train, x_test, y_train, y_test = \
    train_test_split(x, y, test_size=0.12, random_state=123)
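To see how a fractional test_size turns into row counts, here is a small sketch of the arithmetic. It assumes that the test count is rounded up and the remainder goes to training, which is how scikit-learn behaves as far as I know. The helper name split_sizes is made up for illustration, and the sample count of 56 is an assumption chosen to match the shapes printed in the next section.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the rest is used for training.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

# 56 samples is an assumed count, not read from the file.
n_train, n_test = split_sizes(56, 0.12)
print(n_train, n_test)
```

With such a small test set, each misclassified sentence shifts the test accuracy by a large step, so the score should be read as a rough indication only.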
Vectorizing text
The CountVectorizer class converts text into vectors of token counts. We'll fit it on the training text to build the vocabulary, then transform both the train and test text into count matrices.
vectorizer = CountVectorizer()
vectorizer.fit(x_train)
Xtrain = vectorizer.transform(x_train)
Xtest = vectorizer.transform(x_test)
print(Xtrain.shape)
(49, 77)
print(Xtest.shape)
(7, 77)
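To make the count matrix less abstract, the following is a minimal pure-Python sketch of the bag-of-words idea. It is not CountVectorizer itself: the functions tokenize, fit_vocabulary, and transform are hypothetical helpers, and the tokenizer only approximates the default behavior (lowercasing, keeping tokens of two or more word characters).

```python
import re

def tokenize(text):
    # Lowercase and keep tokens of two or more word characters,
    # which approximates CountVectorizer's default settings.
    return re.findall(r"\b\w\w+\b", text.lower())

def fit_vocabulary(docs):
    # Map each distinct token to a column index, in alphabetical order.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def transform(docs, vocabulary):
    # One row per document, one column per vocabulary token;
    # tokens not seen during fitting are ignored.
    rows = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for tok in tokenize(doc):
            if tok in vocabulary:
                row[vocabulary[tok]] += 1
        rows.append(row)
    return rows

train_docs = ["I like it ", "like it a lot ", "Not good at all! "]
vocab = fit_vocabulary(train_docs)
train_matrix = transform(train_docs, vocab)
test_matrix = transform(["not liked "], vocab)

print(sorted(vocab))
for row in train_matrix:
    print(row)
print(test_matrix)
```

Because the vocabulary is fixed when fitting on the training text, the train and test matrices always have the same number of columns, which is why both shapes above end in 77. Words that appear only in the test text, such as "liked" in this sketch, are simply dropped.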
Building keras model
Next, we'll build a Keras sequential model. It has a 32-unit hidden layer with 'relu' activation, which takes the vectorized text as input, and a single-unit output layer with 'sigmoid' activation.
model = Sequential()
model.add(layers.Dense(32, input_dim=Xtrain.shape[1], activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 32)                2496
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 33
=================================================================
Total params: 2,529
Trainable params: 2,529
Non-trainable params: 0
_________________________________________________________________
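The parameter counts in the summary can be checked by hand: a dense layer has one weight per input for each unit, plus one bias per unit. The short sketch below reproduces the numbers; dense_params is a hypothetical helper, and 77 is the number of features taken from the shape printed earlier.

```python
def dense_params(n_inputs, n_units):
    # Each unit has one weight per input plus one bias term.
    return n_inputs * n_units + n_units

hidden = dense_params(77, 32)   # 77 input features -> 32 hidden units
output = dense_params(32, 1)    # 32 hidden units -> 1 output unit
total = hidden + output
print(hidden, output, total)
```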
Now we can train the model on the training data.
model.fit(Xtrain, y_train, epochs=50, batch_size=32,verbose=False)
Then, we'll check the training accuracy.
loss, accTrain = model.evaluate(Xtrain, y_train, verbose=False)
print("Train accuracy:", round(accTrain, 2), " loss: ", round(loss, 2))
Train accuracy: 0.96 loss: 0.42
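The loss reported here is binary cross-entropy, which averages -[y·log(p) + (1 - y)·log(1 - p)] over the samples. As a rough sketch of why a high accuracy can sit next to a noticeable loss, the example below computes both for a handful of made-up labels and probabilities. The values are not taken from the model, and the function name and the clipping constant eps are assumptions for illustration.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities away from 0 and 1 so log() stays finite.
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Made-up labels and predicted probabilities, for illustration only.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4]

loss = binary_crossentropy(labels, probs)
# A prediction counts as correct when it falls on the right side of 0.5.
accuracy = sum((p > 0.5) == bool(y) for y, p in zip(labels, probs)) / len(labels)
print(round(loss, 4), accuracy)
```

Every prediction in this sketch is on the correct side of 0.5, so the accuracy is perfect, yet the loss is well above zero because several probabilities are only weakly confident.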
Predicting test data and the accuracy check
Finally, we'll predict test data and check the prediction accuracy.
ypred = model.predict(Xtest)
ypred[ypred > 0.5] = 1
ypred[ypred <= 0.5] = 0
cm = confusion_matrix(y_test, ypred)
print(cm)
acc = accuracy_score(y_test, ypred)
print("Test accuracy:", acc)

[[2 1]
 [0 4]]
Test accuracy: 0.8571428571428571
We can also check the original and predicted outputs in test data.
result = zip(x_test, y_test, ypred)
for i in result:
    print(i)
('I am excited a lot ', 1, array([1.], dtype=float32))
('exciting, liked. ', 1, array([1.], dtype=float32))
('terrible! I did not expect. ', 0, array([0.], dtype=float32))
('What a nice restaurant.', 1, array([1.], dtype=float32))
('not recommend, not satisfied ', 0, array([0.], dtype=float32))
('What a nice show.', 1, array([1.], dtype=float32))
('Offensive, it is a crap! ', 0, array([1.], dtype=float32))
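The confusion matrix and accuracy reported earlier follow directly from these seven rows. As a sketch, the snippet below rebuilds both in plain Python from the labels and predictions copied from the output above; the confusion helper is hypothetical and uses the same layout as scikit-learn, with true classes as rows and predicted classes as columns.

```python
# Labels and thresholded predictions copied from the output above.
y_true = [1, 1, 0, 1, 0, 1, 0]
y_hat = [1, 1, 0, 1, 0, 1, 1]

def confusion(y_true, y_hat):
    # Rows are true classes (0, 1); columns are predicted classes (0, 1).
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_hat):
        cm[t][p] += 1
    return cm

cm = confusion(y_true, y_hat)
# Correct predictions lie on the diagonal of the matrix.
accuracy = (cm[0][0] + cm[1][1]) / len(y_true)
print(cm)
print(accuracy)
```

The single off-diagonal entry is the last sentence, a negative opinion predicted as positive, which gives six correct predictions out of seven.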
In this tutorial, we've briefly covered sentiment classification with a Keras deep learning model in Python. The dataset used here is very small, so to improve training and prediction accuracy we need a larger dataset to train the model.
The full source code is listed below.
Source code listing
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from keras.models import Sequential
from keras import layers

# Load the data and separate the text from the labels
df = pd.read_csv('datasets/sentiments.csv')
df.columns = ["label", "text"]
x = df['text'].values
y = df['label'].values

x_train, x_test, y_train, y_test = \
    train_test_split(x, y, test_size=0.12, random_state=123)

# Vectorize the text
vectorizer = CountVectorizer()
vectorizer.fit(x_train)
Xtrain = vectorizer.transform(x_train)
Xtest = vectorizer.transform(x_test)
print(Xtrain.shape)
print(Xtest.shape)

# Build and train the model
model = Sequential()
model.add(layers.Dense(32, input_dim=Xtrain.shape[1], activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

model.fit(Xtrain, y_train, epochs=50, batch_size=32, verbose=False)

loss, accTrain = model.evaluate(Xtrain, y_train, verbose=False)
print("Train accuracy:", round(accTrain, 2), " loss: ", round(loss, 2))

# Predict the test data and check the accuracy
ypred = model.predict(Xtest)
ypred[ypred > 0.5] = 1
ypred[ypred <= 0.5] = 0
cm = confusion_matrix(y_test, ypred)
print(cm)
acc = accuracy_score(y_test, ypred)
print("Test accuracy:", acc)

result = zip(x_test, y_test, ypred)
for i in result:
    print(i)
sentiments.csv data
1,"I like it "
1,"like it a lot "
1,"It's really good "
1,"Recommend! I really enjoyed! "
1,"It's really good "
1,"recommend too "
1,"outstanding performance "
1,"it's good! recommend! "
1,"Great! "
1,"really good. Definitely, recommend! "
1,"It is fun "
1,"Exceptional! liked a lot! "
1,"highly recommend this "
1,"fantastic show "
1,"exciting, liked. "
1,"it's ok "
1,"exciting show "
1,"amazing performance "
1,"it is great! "
1,"I am excited a lot "
1,"it is terrific "
1,"Definitely good one "
1,"Excellent, very satisfied "
1,"Glad we went "
1,"Once again outstanding! "
1,"awesome! excellent show "
1,"This is truly a good one! "
1,"What a nice restaurant."
1,"What a nice show."
1,"what a great place!"
1,"Great atmosphere"
1,"Definitely you should go"
1,"This is a great!"
1,"I really love it"
0,"it's mediocre! not recommend "
0,"Not good at all! "
0,"It is rude "
0,"I don't like this type "
0,"poor performance "
0,"Boring, not good at all! "
0,"not liked "
0,"I hate this type of things "
0,"not recommend, not satisfied "
0,"not enjoyed, I don't recommend this. "
0,"disgusting movie "
0,"waste of time, poor show "
0,"feel tired after watching this "
0,"horrible performance "
0,"not so good "
0,"so boring I fell asleep "
0,"a bit strange "
0,"terrible! I did not expect. "
0,"This is an awful "
0,"Nasty and horrible! "
0,"Offensive, it is a crap! "
0,"Disappointing! not liked. "
0,"The service is a nightmare"