Implementing Learning Rate Schedulers in PyTorch

     In deep learning, choosing an appropriate learning rate is important for training neural networks effectively. Learning rate schedulers in PyTorch adjust the learning rate during training to improve convergence and performance. This tutorial will guide you through implementing and using various learning rate schedulers in PyTorch. The tutorial covers:

  1. Introduction to learning rate
  2. Setting Up the Environment
  3. Initializing the Model, Loss Function, and Optimizer
  4. Learning Rate Schedulers 
  5. Using schedulers in training
  6. Implementation and performance check
  7. Conclusion

     Let's get started.

 

Introduction to learning rate

     The learning rate is a critical hyperparameter in the training of machine learning models, particularly in neural networks and other iterative optimization algorithms. It determines the step size at each iteration while moving towards a minimum of the loss function.
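
     To make this concrete, the minimal sketch below performs a single hand-written gradient-descent step on a toy one-parameter loss; the parameter, the loss, and the value 0.01 are arbitrary choices for illustration.

import torch

# One hand-written gradient-descent step: the learning rate scales the gradient
# before it is subtracted from the parameter.
w = torch.tensor(1.0, requires_grad=True)   # a single trainable parameter
loss = (3.0 * w - 6.0) ** 2                 # toy loss with its minimum at w = 2
loss.backward()                             # compute d(loss)/dw

lr = 0.01                                   # the step size
with torch.no_grad():
    w -= lr * w.grad                        # move a step of size lr * gradient
print(w)                                    # w has moved from 1.0 towards 2.0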


 

Setting Up the Environment

    Before you start, ensure you have the torch library installed:


pip install torch

    This command will download and install the necessary dependencies in your Python environment.
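
    You can quickly confirm that the installation succeeded by printing the installed version:

import torch
print(torch.__version__)  # e.g. 2.3.1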

    Next, we import the necessary libraries for this tutorial and create a simple neural network for demonstration. It consists of a single fully connected layer with input size 10 and output size 2. For demonstration purposes, we create simple synthetic data to train the model. The trainloader is a list of 1000 tuples, each containing a tensor of inputs and the corresponding labels.

 
import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lrs

# Define a simple neural network model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        # Define a fully connected layer with input size 10 and output size 2
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        # Forward pass: compute the output of the network
        return self.fc(x)


# Generate the trainloader set (replace with actual data loader)
# Here, trainloader is a list of 1000 tuples, each containing a tensor 
# of inputs and corresponding labels
trainloader = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(1000)]
 


 

Initializing the Model, Loss Function, and Optimizer

    We create an instance of the neural network, define a loss function, and set up the optimizer.

 

Loss Function
    A loss function, also known as a cost function or objective function, quantifies the difference between the predicted output of the model and the actual target values. It measures how well or poorly the model is performing. The goal of training a neural network is to minimize this loss function.

    In this example we use the nn.CrossEntropyLoss function as the loss function. This loss function is commonly used for classification tasks and calculates the cross-entropy loss between the predicted probabilities and the actual class labels.
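
    As a quick illustration, the snippet below computes the cross-entropy loss for a small hypothetical batch of raw model outputs (logits) and integer class labels; the numbers are made up for demonstration.

import torch
import torch.nn as nn

# Hypothetical logits for 4 samples and 2 classes, plus the target class labels
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.5],
                       [1.2, 1.1],
                       [0.3, 2.2]])
labels = torch.tensor([0, 1, 0, 1])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)  # averages the per-sample cross-entropy
print(loss.item())                # a single scalar; lower means better predictions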

 

Optimizer
    An optimizer is an algorithm or method used to update the weights and biases of the neural network to minimize the loss function. It adjusts the model parameters based on the computed gradients during backpropagation. The choice of optimizer and its hyperparameters can significantly impact the model's convergence speed and performance. 

    The optim.SGD function is used as the optimizer. SGD stands for Stochastic Gradient Descent, which updates the model parameters using the gradients of the loss function. The learning rate is a crucial hyperparameter that controls the size of the steps the optimizer takes to reach the minimum of the loss function.

 
model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
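
    The current learning rate is stored in the optimizer's parameter groups, and this is exactly the value the schedulers below modify. You can inspect it at any point:

# The schedulers adjust the 'lr' entry of each parameter group in place
print(optimizer.param_groups[0]['lr'])  # 0.1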

 

Learning Rate Schedulers 

    Learning rate schedulers are used to adjust the learning rate during training. Properly adjusting the learning rate can significantly improve training performance and convergence speed. PyTorch provides several learning rate schedulers that can be easily integrated into your training loop. Below are explanations and examples of commonly used learning rate schedulers.

 

StepLR

The StepLR scheduler decreases the learning rate by a factor of gamma every step_size epochs; gamma is the factor by which the learning rate is multiplied.

 
# StepLR: Decrease the learning rate by a factor of 0.1 every 30 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
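
To see how step_size and gamma interact before using the scheduler in a real training loop, you can step a throwaway optimizer/scheduler pair and print the resulting learning rate. This is only a sketch; in real training, optimizer.step() is driven by actual gradients.

import torch.nn as nn
import torch.optim as optim

dummy = nn.Linear(10, 2)                       # throwaway model just to observe the schedule
opt = optim.SGD(dummy.parameters(), lr=0.1)
sched = optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)

for epoch in range(90):
    opt.step()                                 # placeholder step (no gradients needed here)
    sched.step()
    if (epoch + 1) % 30 == 0:
        print(epoch + 1, sched.get_last_lr()[0])
# prints roughly: 30 0.01, 60 0.001, 90 0.0001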

 

MultiStepLR

The MultiStepLR decreases the learning rate by a factor of gamma at the specified epochs (milestones).

 
# MultiStepLR: Decrease the learning rate by a factor of 0.1 at epochs 30 and 80
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

 

ExponentialLR 

The ExponentialLR decreases the learning rate by a factor of gamma every epoch.

 
# ExponentialLR: Decrease the learning rate by a factor of 0.95 every epoch
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)


CosineAnnealingLR 

CosineAnnealingLR adjusts the learning rate according to the cosine annealing schedule.

 
# Adjusts the learning rate following a cosine curve with a period of 50 epochs
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
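
The schedule follows a cosine curve from the initial learning rate down to eta_min (0 by default) over T_max epochs. As a rough sketch, assuming one scheduler step per epoch, the value at epoch t can be computed in closed form:

import math

# Closed-form cosine annealing value at epoch t (assumes one scheduler step per
# epoch and the default eta_min = 0)
def cosine_lr(t, base_lr=0.1, T_max=50, eta_min=0.0):
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2

print(cosine_lr(0))    # 0.1   (start of the cycle)
print(cosine_lr(25))   # ~0.05 (halfway through T_max)
print(cosine_lr(50))   # ~0.0  (end of the cycle)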


ReduceLROnPlateau 

The ReduceLROnPlateau reduces the learning rate when a metric (e.g., validation loss) has stopped improving.

 
# Reduces the learning rate by a factor of 0.1 when a monitored metric 
# (e.g., validation loss) has stopped improving for 10 epochs, with verbose output
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1,
                                                 patience=10, verbose=True)
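
Unlike the other schedulers, ReduceLROnPlateau needs the monitored metric when it is stepped, so its step() call takes a value such as the validation loss:

# After each validation pass, pass the monitored metric to step()
val_loss = 0.42            # placeholder value; use your real validation loss here
scheduler.step(val_loss)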


CyclicLR 

The CyclicLR cycles the learning rate between two boundaries with a constant frequency.

 
# Cycles the learning rate between 0.001 and 0.1 over 2000 iterations in a triangular2 mode
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1,
                                        step_size_up=2000, mode='triangular2')

 

Using schedulers in training

    The code below shows a training loop for a model with a learning rate scheduler. The outer loop iterates over the epoch values 1, 11, 21, ..., 101 (a range from 1 to 110 in steps of 10). The inner loop iterates through the training data (trainloader), where inputs are the input features and labels are the target labels. In the case of the ReduceLROnPlateau scheduler, the training loss is used as the validation loss and the learning rate is adjusted based on it; the variable name holds the scheduler's name, as defined in the full example in the next section.

 
# Training loop (name holds the name of the scheduler in use,
# as defined in the full example in the next section)
for epoch in range(1, 110, 10):
    for inputs, labels in trainloader:
        optimizer.zero_grad()              # Clear the gradients
        outputs = model(inputs)            # Forward pass
        loss = criterion(outputs, labels)  # Compute the loss
        loss.backward()                    # Backward pass
        optimizer.step()                   # Update the weights

    if name == "ReduceLROnPlateau":
        # Simulate validation loss calculation
        val_loss = loss.item()  # For demonstration, we use training loss as validation loss

        # Step the scheduler based on validation loss
        scheduler.step(val_loss)
    else:
        # Step the scheduler at the end of each epoch
        scheduler.step()

    # Print the learning rate for the current epoch
    print(f'Epoch {epoch}, Learning Rate {scheduler.get_last_lr()[0]:.6f}')


Implementation and performance check

    In the code below, we implement various learning rate schedulers and train a model using them. 

    Note that the purpose of this code is to demonstrate the implementation of different schedulers and observe changes in the learning rate. We do not consider any concerns regarding the model, training data, parameters, epoch numbers, or other details. The parameters were chosen to observe changes during training. When you apply these schedulers in your model training, you need to carefully set parameters according to the characteristics of your dataset.

    Let's run the code and check the performance.

 
import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lrs

# Define a simple neural network model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        # Define a fully connected layer with input size 10 and output size 2
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        # Forward pass: compute the output of the network
        return self.fc(x)

# Generate the trainloader set (replace with actual data loader)
# Here, trainloader is a list of 1000 tuples, each containing a
# tensor of inputs and corresponding labels
trainloader = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(1000)]

# Define different learning rate schedulers
schedulers = [
    ["StepLR", lrs.StepLR],
    ["MultiStepLR", lrs.MultiStepLR],
    ["ExponentialLR", lrs.ExponentialLR],
    ["CosineAnnealingLR", lrs.CosineAnnealingLR],
    ["ReduceLROnPlateau", lrs.ReduceLROnPlateau],
    ["CyclicLR", lrs.CyclicLR]
]

# Loop over each scheduler
for name, Scheduler in schedulers:
    print(f"{name}")

    # Reinitialize model, optimizer, and scheduler for each iteration
    model = SimpleNN()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.1)

    # Initialize the scheduler with appropriate arguments
    if name == "StepLR":
        scheduler = Scheduler(optimizer, step_size=2, gamma=0.1)
    elif name == "MultiStepLR":
        scheduler = Scheduler(optimizer, milestones=[5, 80], gamma=0.1)
    elif name == "ExponentialLR":
        scheduler = Scheduler(optimizer, gamma=0.9)
    elif name == "CosineAnnealingLR":
        scheduler = Scheduler(optimizer, T_max=50)
    elif name == "ReduceLROnPlateau":
        scheduler = Scheduler(optimizer, mode='min', factor=0.1, patience=2, verbose=True)
    elif name == "CyclicLR":
        scheduler = Scheduler(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=2000,
                              mode='triangular2')

    # Training loop
    for epoch in range(1, 110, 10):
        for inputs, labels in trainloader:
            optimizer.zero_grad()              # Clear the gradients
            outputs = model(inputs)            # Forward pass
            loss = criterion(outputs, labels)  # Compute the loss
            loss.backward()                    # Backward pass
            optimizer.step()                   # Update the weights

        if name == "ReduceLROnPlateau":
            # Simulate validation loss calculation
            val_loss = loss.item()  # We use training loss as validation loss

            # Step the scheduler based on validation loss
            scheduler.step(val_loss)
        else:
            # Step the scheduler at the end of each epoch
            scheduler.step()

        # Print the learning rate for the current epoch
        print(f'Epoch {epoch}, Learning Rate {scheduler.get_last_lr()[0]:.6f}')

The output is shown below:

 
StepLR
Epoch 1, Learning Rate 0.100000
Epoch 11, Learning Rate 0.010000
Epoch 21, Learning Rate 0.010000
Epoch 31, Learning Rate 0.001000
Epoch 41, Learning Rate 0.001000
Epoch 51, Learning Rate 0.000100
Epoch 61, Learning Rate 0.000100
Epoch 71, Learning Rate 0.000010
Epoch 81, Learning Rate 0.000010
Epoch 91, Learning Rate 0.000001
Epoch 101, Learning Rate 0.000001
MultiStepLR
Epoch 1, Learning Rate 0.100000
Epoch 11, Learning Rate 0.100000
Epoch 21, Learning Rate 0.100000
Epoch 31, Learning Rate 0.100000
Epoch 41, Learning Rate 0.010000
Epoch 51, Learning Rate 0.010000
Epoch 61, Learning Rate 0.010000
Epoch 71, Learning Rate 0.010000
Epoch 81, Learning Rate 0.010000
Epoch 91, Learning Rate 0.010000
Epoch 101, Learning Rate 0.010000
ExponentialLR
Epoch 1, Learning Rate 0.090000
Epoch 11, Learning Rate 0.081000
Epoch 21, Learning Rate 0.072900
Epoch 31, Learning Rate 0.065610
Epoch 41, Learning Rate 0.059049
Epoch 51, Learning Rate 0.053144
Epoch 61, Learning Rate 0.047830
Epoch 71, Learning Rate 0.043047
Epoch 81, Learning Rate 0.038742
Epoch 91, Learning Rate 0.034868
Epoch 101, Learning Rate 0.031381
CosineAnnealingLR
Epoch 1, Learning Rate 0.099901
Epoch 11, Learning Rate 0.099606
Epoch 21, Learning Rate 0.099114
Epoch 31, Learning Rate 0.098429
Epoch 41, Learning Rate 0.097553
Epoch 51, Learning Rate 0.096489
Epoch 61, Learning Rate 0.095241
Epoch 71, Learning Rate 0.093815
Epoch 81, Learning Rate 0.092216
Epoch 91, Learning Rate 0.090451
Epoch 101, Learning Rate 0.088526
ReduceLROnPlateau
Epoch 1, Learning Rate 0.100000
Epoch 11, Learning Rate 0.100000
Epoch 21, Learning Rate 0.100000
Epoch 31, Learning Rate 0.010000
Epoch 41, Learning Rate 0.010000
Epoch 51, Learning Rate 0.010000
Epoch 61, Learning Rate 0.010000
Epoch 71, Learning Rate 0.001000
Epoch 81, Learning Rate 0.001000
Epoch 91, Learning Rate 0.001000
Epoch 101, Learning Rate 0.001000
CyclicLR
Epoch 1, Learning Rate 0.001050
Epoch 11, Learning Rate 0.001099
Epoch 21, Learning Rate 0.001149
Epoch 31, Learning Rate 0.001198
Epoch 41, Learning Rate 0.001247
Epoch 51, Learning Rate 0.001297
Epoch 61, Learning Rate 0.001346
Epoch 71, Learning Rate 0.001396
Epoch 81, Learning Rate 0.001446
Epoch 91, Learning Rate 0.001495
Epoch 101, Learning Rate 0.001545

 

Conclusion
 
    In this tutorial, we explored the implementation of learning rate schedulers in PyTorch model training. We reviewed several PyTorch schedulers and learned how to use them effectively in model training. 
    Learning rate schedulers in PyTorch dynamically adjust the learning rate during training to improve the model's performance and convergence. 
 
 

