In this post, we'll learn how to create a one hot encoding map of class labels in Python. The post covers:
- One hot encoding with sklearn
- One hot encoding with Keras
- Iris dataset one hot encoding example
- Source code listing
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from keras.utils import to_categorical
from sklearn import datasets
One hot encoding with sklearn
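To represent labels as a one hot encoding map, we first need to create an integer vector in which each label class is assigned a unique integer value, such as 'cat':0, 'dog':1, 'mouse':2. sklearn's LabelEncoder assigns these ids in sorted (alphabetical) order of the class names. Let's see an example.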
labels=['dog','cat','cat','mouse','dog','dog']
label_encoder=LabelEncoder()
label_ids=label_encoder.fit_transform(labels)
print(labels)
['dog', 'cat', 'cat', 'mouse', 'dog', 'dog']
print(label_ids)
[1 0 0 2 1 1]
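LabelEncoder numbers the classes in sorted order of their names. As a minimal sketch of that mapping (a NumPy-only illustration, not a claim about how sklearn implements it internally), np.unique with return_inverse=True produces the same ids and also lets us map them back to the original strings:

```python
import numpy as np

labels = ['dog', 'cat', 'cat', 'mouse', 'dog', 'dog']

# np.unique returns the sorted unique classes and, with return_inverse=True,
# the position of each label in that sorted array: the same ids as above.
classes, label_ids = np.unique(labels, return_inverse=True)

print(classes)    # ['cat' 'dog' 'mouse']
print(label_ids)  # [1 0 0 2 1 1]

# Indexing the classes array with the ids maps them back to the strings.
print(classes[label_ids])  # ['dog' 'cat' 'cat' 'mouse' 'dog' 'dog']
```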
Then we can create a one hot encoded matrix in which each row identifies its label with a single 1. The columns follow the positions of the unique label names in alphabetical order, {cat, dog, mouse}, and a label is encoded by setting a 1 in its own column and 0 everywhere else:
{ (0, 0, 1),
(0, 1, 0),
(1, 0, 0) }
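Here, (0, 0, 1) represents 'mouse', (0, 1, 0) represents 'dog', and (1, 0, 0) represents 'cat'. We can create the matrix with sklearn's OneHotEncoder as shown below. OneHotEncoder expects a 2-D input, so the id vector is first reshaped into a single column.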
onehot_encoder=OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2; older versions use sparse=False
reshaped=label_ids.reshape(len(label_ids), 1)
onehot=onehot_encoder.fit_transform(reshaped)
print(onehot)
[[0. 1. 0.]
[1. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]
[0. 1. 0.]
[0. 1. 0.]]
One hot encoding with Keras
We can also create the one hot encoding map with the to_categorical() function of Keras. Here, we'll use the label_ids vector from above.
print(label_ids)
[1 0 0 2 1 1]
to_cat=to_categorical(label_ids)
print(to_cat)
[[0. 1. 0.]
[1. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]
[0. 1. 0.]
[0. 1. 0.]]
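The matrix returned by to_categorical() is made of identity-matrix rows selected by the class ids. A NumPy sketch of the same idea, assuming the ids are consecutive integers starting at 0, also shows how argmax decodes one hot rows back into ids:

```python
import numpy as np

label_ids = np.array([1, 0, 0, 2, 1, 1])

# Number of classes: the largest id plus one. When num_classes is not
# passed, to_categorical infers the class count the same way.
num_classes = label_ids.max() + 1

# Row i of the identity matrix is the one hot vector for class i, so
# indexing it with the ids builds the whole one hot matrix at once.
onehot = np.eye(num_classes)[label_ids]
print(onehot)

# argmax along each row returns the position of the 1, i.e. the original id.
decoded = onehot.argmax(axis=1)
print(decoded)  # [1 0 0 2 1 1]
```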
Iris dataset one hot encoding example
Next, we'll create a one hot encoding map for the category values of the iris dataset. The iris data contains 3 species: setosa, versicolor, and virginica, which are encoded as 0, 1, and 2 in the target vector. So we can reshape the target and transform it with OneHotEncoder().
iris = datasets.load_iris()
X = iris.data
Y = iris.target
onehot_encoder=OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2; older versions use sparse=False
reshaped=Y.reshape(len(Y), 1)
y_onehot=onehot_encoder.fit_transform(reshaped)
print(Y.shape)
(150,)
print(y_onehot.shape)
(150, 3)
print(Y[0:10])
[0 0 0 0 0 0 0 0 0 0]
print(y_onehot[0:10])
[[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]]
In this post, we've briefly learned how to create a one hot encoding map for the labels of classification data. The full source code is listed below.
Source code listing
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from keras.utils import to_categorical
from sklearn import datasets

labels=['dog','cat','cat','mouse','dog','dog']
label_encoder=LabelEncoder()
label_ids=label_encoder.fit_transform(labels)
print(labels)
print(label_ids)

# scikit-learn >= 1.2; older versions use sparse=False
onehot_encoder=OneHotEncoder(sparse_output=False)
reshaped=label_ids.reshape(len(label_ids), 1)
onehot=onehot_encoder.fit_transform(reshaped)
print(onehot)

to_cat=to_categorical(label_ids)
print(to_cat)

iris = datasets.load_iris()
X = iris.data
Y = iris.target
onehot_encoder=OneHotEncoder(sparse_output=False)
reshaped=Y.reshape(len(Y), 1)
y_onehot=onehot_encoder.fit_transform(reshaped)
print(Y.shape)
print(y_onehot.shape)
print(Y[0:10])
print(y_onehot[0:10])