Sparse Principal Component Analysis (Sparse PCA) is an extension of PCA that imposes a sparsity structure on the components. Various estimation methods achieve this sparsity through sparse loadings or sparse weights.
The scikit-learn API provides the SparsePCA class to apply the Sparse PCA method in Python. In this tutorial, we'll briefly learn how to project data by using SparsePCA and visualize the projected data in a graph. The tutorial covers:
- Iris dataset SparsePCA projection and visualizing
- MNIST dataset SparsePCA projection and visualizing
- Source code listing
We'll start by loading the required libraries and functions.
from sklearn.decomposition import SparsePCA
from keras.datasets import mnist
from sklearn.datasets import load_iris
from numpy import reshape
import seaborn as sns
import pandas as pd
Iris dataset SparsePCA projection and visualizing
After loading the Iris dataset, we'll extract the data and label parts of the dataset.
iris = load_iris()
x = iris.data
y = iris.target
We'll define the model by using the SparsePCA class; here, the n_components
parameter defines the number of target dimensions.
spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x)
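Since Sparse PCA drives many loadings to exactly zero, it can be worth inspecting the fitted components. As a minimal sketch (using the fitted spca object from above and its components_ attribute), we could check which of the four iris features contribute to each component:
# each row of components_ is a sparse loading vector over the 4 iris features
print(spca.components_)
# fraction of loadings that are exactly zero
print((spca.components_ == 0).mean())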
To visualize the result in a graph, we'll collect the output
component data in a pandas DataFrame, then plot it with the 'seaborn' library's
scatterplot() function. In the scatter plot's color palette, we
set 3 colors, matching the number of categories in the label data.
df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]
sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
palette=sns.color_palette("hls", 3),
data=df).set(title="Iris data SparsePCA projection")
MNIST dataset SparsePCA projection and visualizing
We'll apply the same method to a larger dataset. The MNIST handwritten
digit dataset works well for this purpose, and we can load it through the Keras API.
Extracting only the training part of the dataset is enough for this example.
(x_train, y_train), (_ , _) = mnist.load_data()
print(x_train.shape)
(60000, 28, 28)
MNIST is three-dimensional data, so we'll reshape it into a two-dimensional array.
x_mnist = reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
print(x_mnist.shape)
(60000, 784)
Here, we have 784 features and 60,000 samples. Now, we can project the data with SparsePCA and visualize it in a graph.
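Note that fitting SparsePCA on the full 60,000 x 784 matrix can take a long time. As an optional sketch (the 5,000-sample subset size is an assumption chosen only for illustration, not part of the original tutorial), one could fit the model on a random subset and then transform all samples:
import numpy as np
rng = np.random.default_rng(123)
# hypothetical subset size, used only to speed up fitting
idx = rng.choice(x_mnist.shape[0], size=5000, replace=False)
spca_subset = SparsePCA(n_components=2, random_state=123)
spca_subset.fit(x_mnist[idx])
z_subset = spca_subset.transform(x_mnist)
Below, we proceed with the full data, as in the rest of the tutorial.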
spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x_mnist)
df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]
sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
palette=sns.color_palette("hls", 10),
data=df).set(title="MNIST data SparsePCA projection")
The plot shows a two-dimensional visualization of the MNIST data. The colors indicate
the target digits and where their feature data fall in the 2D space.
In this tutorial, we've briefly learned how to project data with the Sparse PCA method and visualize the projected data in Python. The full source code is listed below.
Source code listing
from sklearn.decomposition import SparsePCA
from keras.datasets import mnist
from sklearn.datasets import load_iris
from numpy import reshape
import seaborn as sns
import pandas as pd
iris = load_iris()
x = iris.data
y = iris.target
spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x)
df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]
sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
palette=sns.color_palette("hls", 3),
data=df).set(title="Iris data SparsePCA projection")
(x_train, y_train), (_ , _) = mnist.load_data()
print(x_train.shape)
x_mnist = reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
print(x_mnist.shape)
spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x_mnist)
df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]
sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
palette=sns.color_palette("hls", 10),
data=df).set(title="MNIST data SparsePCA projection")