In this tutorial, we'll learn about the Inception model and how to use a pre-trained Inception-v3 model for image classification with PyTorch. We'll go through the steps of loading a pre-trained model, preprocessing an image, and using the model to predict the image's class label, as well as displaying the results. The tutorial covers:
- Introduction to the Inception model
- Loading a pre-trained Inception-v3 model
- Defining Image Preprocessing
- Loading ImageNet Class Labels
- Making a Prediction
- Conclusion
- Full code listing
Introduction to the Inception model
The Inception model is a deep convolutional neural network (CNN) architecture designed to efficiently handle image recognition tasks by capturing features at multiple scales. First introduced as Inception v1 (GoogLeNet) in the paper "Going Deeper with Convolutions," the model uses a novel Inception module that processes input data through multiple filter sizes in parallel. This approach balances computational efficiency with high performance.
Inception module
The Inception module extracts features at different spatial scales using parallel convolutions and pooling. 1x1 convolutions reduce dimensionality, minimizing computational cost while retaining important information. Outputs from these operations are concatenated to create a rich feature map.
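To make this concrete, here is a minimal, illustrative PyTorch sketch of a v1-style Inception module. The channel counts in the example follow GoogLeNet's first Inception block; note that Inception-v3 factorizes the larger convolutions further, so this is a simplified sketch rather than the exact v3 block.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Simplified Inception module: parallel 1x1, 3x3, and 5x5 convolutions
    plus max pooling, with 1x1 convolutions used for dimensionality reduction."""
    def __init__(self, in_ch, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super().__init__()
        # Branch 1: plain 1x1 convolution
        self.branch1 = nn.Conv2d(in_ch, ch1x1, kernel_size=1)
        # Branch 2: 1x1 reduction followed by 3x3 convolution
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, ch3x3red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),
        )
        # Branch 3: 1x1 reduction followed by 5x5 convolution
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, ch5x5red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),
        )
        # Branch 4: 3x3 max pooling followed by 1x1 projection
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate the branch outputs along the channel dimension
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

# Example: 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels
x = torch.randn(1, 192, 28, 28)
out = InceptionModule(192, 64, 96, 128, 16, 32, 32)(x)
print(out.shape)  # torch.Size([1, 256, 28, 28])
```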
Key Characteristics of the Inception model
- Multi-Scale Feature Extraction: The parallel filters (1x1, 3x3, 5x5) within the Inception module allow the model to learn features at different spatial scales, capturing both local and global patterns.
- Efficient Computation: By using dimensionality reduction with 1x1 convolutions, the model minimizes computational overhead without sacrificing performance.
- Deep Architecture: The original Inception v1 model is 22 layers deep. Subsequent versions (v2, v3, v4) introduce deeper architectures with optimizations like factorized convolutions and residual connections.
- Global Average Pooling (GAP): Similar to ResNet, Inception models often use GAP instead of fully connected layers to reduce the number of parameters and overfitting.
- Auxiliary Classifiers: Auxiliary classifiers are introduced at intermediate layers during training to improve gradient flow and combat the vanishing gradient problem in very deep networks.
Limitations
Despite its efficiency, the Inception model has some drawbacks:
- Complexity of Design: The architecture of the Inception module is intricate, requiring careful design and tuning to optimize performance.
- Computational Resources: While efficient compared to other deep models, training and deploying Inception networks still require significant computational resources, particularly for larger variants like Inception v4.
- Scalability: Extensions to the architecture (e.g., Inception-ResNet) add residual connections to improve training, but this increases complexity and resource demands.
Loading a Pre-trained Inception-v3 Model
Before starting, make sure you have the following Python libraries installed:
- torch (PyTorch)
- torchvision (for pre-trained models and transformations)
- PIL (Python Imaging Library to handle image files)
- matplotlib (for displaying images)
- requests (for downloading class labels)
You can install these libraries using pip.
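For example, to install them all at once (note that PIL is provided by the pillow package):

```python
pip install torch torchvision pillow matplotlib requests
```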
PyTorch provides a variety of pre-trained models via the torchvision library. In this tutorial, we use the Inception_v3 model, which has been pre-trained on the ImageNet dataset. We’ll load the model and set it to evaluation mode (which disables certain layers like dropout that are used only during training).
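A minimal sketch of loading the model (the weights enum requires torchvision 0.13 or newer; older versions use the pretrained flag instead):

```python
import torch
from torchvision import models

# Load Inception-v3 pre-trained on ImageNet.
# On older torchvision versions, use models.inception_v3(pretrained=True).
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)

# Evaluation mode disables training-only behavior such as dropout
# and the auxiliary classifier outputs.
model.eval()
```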
Defining Image Preprocessing
To use the Inception model, the input image needs to be preprocessed in the same way the model was trained. For Inception-v3, this includes resizing, center-cropping, and normalizing the image. We'll use torchvision.transforms to define the following transformations (the code sketch follows the list):
- Resize the image so its shorter side is 342 pixels.
- Center-crop the image to 299x299 pixels (Inception-v3's expected input size).
- Convert the image to a tensor.
- Normalize the image with the same mean and standard deviation used in ImageNet training.
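A sketch of this pipeline; the resize and crop sizes match torchvision's published Inception-v3 transforms, and the mean and standard deviation are the standard ImageNet channel statistics:

```python
from torchvision import transforms

# Preprocessing pipeline matching the model's ImageNet training setup
preprocess = transforms.Compose([
    transforms.Resize(342),        # resize the shorter side to 342 pixels
    transforms.CenterCrop(299),    # crop to Inception-v3's 299x299 input
    transforms.ToTensor(),         # convert to a [0, 1] float tensor
    transforms.Normalize(          # ImageNet channel statistics
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
```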
Loading ImageNet Class Labels
The model outputs a tensor of raw scores corresponding to ImageNet class labels. We need to download these labels to interpret the output. We'll fetch the class labels from PyTorch's GitHub repository using the requests library and convert them into a Python list.
Once you download the class label data, you can save it to a file and use it locally.
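A minimal sketch, assuming the commonly used imagenet_classes.txt file from the pytorch/hub GitHub repository:

```python
import requests

# ImageNet class labels published in PyTorch's GitHub (hub) repository
LABELS_URL = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"

response = requests.get(LABELS_URL)
class_labels = response.text.strip().split("\n")

# Optionally save the labels to a local file for offline use
with open("imagenet_classes.txt", "w") as f:
    f.write(response.text)

print(len(class_labels))  # 1000
```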
The class_labels list contains one entry per ImageNet class, 1,000 in total, beginning with labels such as 'tench', 'goldfish', and 'great white shark'.
Loading and Preprocessing the Image
Next, we'll load a sample image, apply the transformations, and prepare it for the model. The image is loaded using the PIL library.
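A sketch of this step, reusing the preprocess pipeline defined above; the file name "sample.jpg" is a placeholder for your own image:

```python
from PIL import Image

# Path to a local sample image (replace with your own file)
image_path = "sample.jpg"

image = Image.open(image_path).convert("RGB")  # ensure 3 channels
input_tensor = preprocess(image)               # apply the transforms defined above
input_batch = input_tensor.unsqueeze(0)        # add a batch dimension: [1, 3, 299, 299]
```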
Making a Prediction
Now that the image is ready, we can pass it through the Inception model to get predictions. The output will be a tensor of raw scores (logits) for each class. We'll use the following steps (see the sketch after this list):
- Perform a forward pass through the network.
- Convert the raw scores to probabilities using softmax.
- Get the predicted class index and probability using torch.max().
- Map the predicted index to the corresponding class label.
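A minimal inference sketch, reusing the model, input_batch, and class_labels objects from the previous steps:

```python
import torch

with torch.no_grad():                   # no gradients needed for inference
    output = model(input_batch)         # forward pass: raw scores (logits)

probabilities = torch.softmax(output[0], dim=0)    # logits -> probabilities
top_prob, top_idx = torch.max(probabilities, dim=0)

predicted_label = class_labels[top_idx.item()]
print(f"{predicted_label}: {top_prob.item():.4f}")
```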
Finally, we’ll display the input image alongside its predicted class label and probability using matplotlib.
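A short sketch of the display step, reusing image, predicted_label, and top_prob from above:

```python
import matplotlib.pyplot as plt

plt.imshow(image)   # show the original (unnormalized) image
plt.title(f"Prediction: {predicted_label} ({top_prob.item() * 100:.1f}%)")
plt.axis("off")
plt.show()
```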
Conclusion
This
tutorial provided an explanation of Inception model and how to use a
pre-trained Inception-v3 model in PyTorch to classify an image. Here, we
learned:
- The architecture of the Inception model.
- Loading the Inception-v3 model.
- Preprocessing an image with the correct transformations.
- Making predictions and interpreting the results using class labels.
Complete code for this tutorial is listed below.
Full code listing
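A consolidated version of the snippets above, under the same assumptions noted earlier (the label URL and the "sample.jpg" placeholder):

```python
import requests
import torch
from PIL import Image
from torchvision import models, transforms
import matplotlib.pyplot as plt

# 1. Load the pre-trained Inception-v3 model and set evaluation mode
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.eval()

# 2. Define the preprocessing pipeline
preprocess = transforms.Compose([
    transforms.Resize(342),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# 3. Download the ImageNet class labels
LABELS_URL = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
class_labels = requests.get(LABELS_URL).text.strip().split("\n")

# 4. Load and preprocess the image (replace "sample.jpg" with your own file)
image = Image.open("sample.jpg").convert("RGB")
input_batch = preprocess(image).unsqueeze(0)

# 5. Run inference and convert logits to probabilities
with torch.no_grad():
    output = model(input_batch)
probabilities = torch.softmax(output[0], dim=0)
top_prob, top_idx = torch.max(probabilities, dim=0)
predicted_label = class_labels[top_idx.item()]

# 6. Display the image with the predicted label and probability
plt.imshow(image)
plt.title(f"Prediction: {predicted_label} ({top_prob.item() * 100:.1f}%)")
plt.axis("off")
plt.show()
```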