Hyperparameter Tuning of a PyTorch Model with Optuna

Hyperparameter tuning can significantly improve the performance of machine learning models. In this tutorial, we'll use the Optuna library to optimize the hyperparameters of a simple PyTorch neural network model.

    For demonstration and simplicity, we'll use the Iris dataset for classification and optimize the model's hyperparameters. This tutorial will cover:

  1. Introduction to Optuna
  2. Preparing the data
  3. Defining the model
  4. Defining the objective function
  5. Creating the study object and running the optimization
  6. Conclusion

     Let's get started.

 

Introduction to Optuna

Optuna is an open-source hyperparameter optimization framework designed to automate the process of finding optimal hyperparameters for machine learning models. It offers advanced features for efficiently tuning hyperparameters and supports a wide range of machine learning frameworks. You can install it with pip:

 
pip install optuna
 

   Optuna uses several advanced optimization methods to efficiently explore the hyperparameter space and find optimal configurations. Here’s an overview of the primary optimization methods used by Optuna:

Tree-structured Parzen Estimator (TPE) is a Bayesian optimization method that models the distributions of good and bad hyperparameters with probabilistic models. A study using this sampler can be created as shown below.

 
study = optuna.create_study(direction='maximize', sampler=optuna.samplers.TPESampler())

     Random Search is a straightforward method that samples hyperparameters randomly from the defined search space. 

 
study = optuna.create_study(direction='maximize', sampler=optuna.samplers.RandomSampler())
 

    CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is an evolutionary strategy optimization algorithm that adapts the covariance matrix of the search distribution to guide the search. It’s useful for optimization problems with continuous parameters.


study = optuna.create_study(direction='maximize', sampler=optuna.samplers.CmaEsSampler())
 

Multi-objective Optimization involves optimizing multiple objective functions simultaneously. Optuna supports this through the NSGA-II algorithm, a popular evolutionary algorithm for multi-objective optimization. When creating a multi-objective study, a direction is specified for each objective.

 
study = optuna.create_study(directions=['maximize', 'minimize'], sampler=optuna.samplers.NSGAIISampler())
 

    Optuna uses the Tree-structured Parzen Estimator (TPE) as its default sampling algorithm. In this tutorial, we will use the default TPE sampler method.

 
study = optuna.create_study(direction='maximize')


Preparing the Data  

    We'll start by importing all the necessary libraries for this tutorial.

 
import optuna
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Next, we load the Iris dataset and separate it into features (X) and labels (y). We standardize the features, split the dataset into training and validation sets, and convert everything to PyTorch tensors.

 
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Standardize the features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_val = torch.tensor(X_val, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_val = torch.tensor(y_val, dtype=torch.long)


Defining the model

    The SimpleNN model below is designed for classifying data. It consists of three fully connected layers. The input size of the first layer is 4 (since the dataset has 4 features), with two hidden layers whose sizes we can tune, and the final output size is 3 (corresponding to 3 classes).

 
class SimpleNN(nn.Module):
    def __init__(self, hidden_size1, hidden_size2):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(4, hidden_size1)
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.fc3 = nn.Linear(hidden_size2, 3)  # Output size 3 for 3 classes

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

 

Defining the objective function

In the objective function, we define the hyperparameters to tune and the range of possible values for each. In this case, we tune the sizes of the two hidden layers, the batch size, and the learning rate. We prepare the training and validation datasets and create data loaders, then build the SimpleNN model with the suggested hidden layer sizes and train it for a fixed number of epochs. After training, we evaluate the model on the validation data and return the accuracy, which Optuna will try to maximize.


 
def objective(trial):
    # Hyperparameters to tune
    hidden_size1 = trial.suggest_int('hidden_size1', 16, 128)
    hidden_size2 = trial.suggest_int('hidden_size2', 16, 64)
    batch_size = trial.suggest_int('batch_size', 16, 64)
    learning_rate = trial.suggest_float('learning_rate', 0.0001, 0.1, log=True)
    num_epochs = 20  # Increase the number of epochs for better training

    train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
    val_dataset = torch.utils.data.TensorDataset(X_val, y_val)

    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    valloader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    model = SimpleNN(hidden_size1, hidden_size2)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    # Training the model
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for inputs, labels in trainloader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

    # Validation
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in valloader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = correct / total
    return accuracy
 


Creating the study object and running the optimization

To perform the tuning, we first create the Optuna study object with direction='maximize', since we want to maximize validation accuracy. Then, we run the optimization by passing the objective function and the number of trials to perform. Finally, we print the best hyperparameters and the best score.

 
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=30)

print('Best hyperparameters:', study.best_params)
print('Best accuracy:', study.best_value)

 The output is shown below:

 
[I 2024-08-12 09:12:37,560] A new study created in memory with name: 
no-name-eaee3760-1417-4378-9542-d09ad099f962
[I 2024-08-12 09:12:39,647] Trial 0 finished with value: 1.0 and parameters: {'hidden_size1': 128, 
'hidden_size2': 18, 'batch_size': 26, 'learning_rate': 0.0011607108454925912}. Best is trial 0 with value: 1.0.
[I 2024-08-12 09:12:39,723] Trial 1 finished with value: 0.9 and parameters: {'hidden_size1': 79, 
'hidden_size2': 47, 'batch_size': 63, 'learning_rate': 0.000711124097875464}. Best is trial 0 with value: 1.0.
[I 2024-08-12 09:12:39,837] Trial 2 finished with value: 1.0 and parameters: {'hidden_size1': 32, 
'hidden_size2': 49, 'batch_size': 37, 'learning_rate': 0.0037246512129149845}. Best is trial 0 with value: 1.0.
[I 2024-08-12 09:12:39,911] Trial 3 finished with value: 1.0 and parameters: {'hidden_size1': 41, 
'hidden_size2': 50, 'batch_size': 62, 'learning_rate': 0.027811411012177233}. Best is trial 0 with value: 1.0. 
...
 
[I 2024-08-12 09:12:42,889] Trial 26 finished with value: 0.9 and parameters: {'hidden_size1': 61, 
'hidden_size2': 47, 'batch_size': 64, 'learning_rate': 0.0008055747549512517}. Best is trial 0 with value: 1.0.
[I 2024-08-12 09:12:43,012] Trial 27 finished with value: 1.0 and parameters: {'hidden_size1': 46, 
'hidden_size2': 40, 'batch_size': 30, 'learning_rate': 0.011150161727273534}. Best is trial 0 with value: 1.0.
[I 2024-08-12 09:12:43,141] Trial 28 finished with value: 1.0 and parameters: {'hidden_size1': 128, 
'hidden_size2': 60, 'batch_size': 36, 'learning_rate': 0.005556706062926228}. Best is trial 0 with value: 1.0.
[I 2024-08-12 09:12:43,312] Trial 29 finished with value: 1.0 and parameters: {'hidden_size1': 82, 
'hidden_size2': 47, 'batch_size': 22, 'learning_rate': 0.001930246150641657}. Best is trial 0 with value: 1.0.
 
Best hyperparameters: {'hidden_size1': 128, 'hidden_size2': 18, 'batch_size': 26, 
'learning_rate': 0.0011607108454925912}
Best accuracy: 1.0
 

The result shows that the best hyperparameters for our model are 'hidden_size1': 128, 'hidden_size2': 18, 'batch_size': 26, and 'learning_rate': 0.0011607108454925912, which achieved a validation accuracy of 1.0. Once tuning is finished, these values can be plugged back in to train a final model, as sketched below.
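The snippet below is a minimal sketch (not part of the original code) of how the best hyperparameters might be used to retrain a final model. It reuses the SimpleNN class, the tensors, and the imports defined earlier in the tutorial; the variable names best and final_model, and the fixed 20 epochs, are illustrative assumptions.


# Minimal sketch: retrain a final model with the best hyperparameters found by the study.
# Assumes SimpleNN, X_train, y_train, and the imports from earlier in this tutorial.
best = study.best_params  # e.g. {'hidden_size1': 128, 'hidden_size2': 18, ...}

final_model = SimpleNN(best['hidden_size1'], best['hidden_size2'])
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(final_model.parameters(), lr=best['learning_rate'])

train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=best['batch_size'], shuffle=True)

for epoch in range(20):  # same number of epochs as in the objective function
    final_model.train()
    for inputs, labels in trainloader:
        optimizer.zero_grad()
        loss = criterion(final_model(inputs), labels)
        loss.backward()
        optimizer.step()
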

   

Conclusion
 
In this tutorial, we demonstrated how to use the Optuna library for hyperparameter tuning of a simple PyTorch model. We defined a basic neural network, created an objective function, and used Optuna to find the best hyperparameters. This approach can be applied to more complex models and datasets to improve performance.
 



