Implementing a Multilayer Perceptron from Scratch
This section details the implementation of a Multilayer Perceptron (MLP) from the ground up. We begin by importing necessary libraries:
import torch
import numpy as np
import sys
sys.path.append("../..") # Adjust path as necessary for your project structure
import d2lzh_pytorch as d2l
Data Loading
We will utilize the Fashion-MNIST dataset for image classification tasks. The following code snippet loads the data in batches:
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
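As an optional sanity check (the snippet below is purely illustrative), we can inspect the shape of one mini-batch returned by the data iterators:

# Each batch of images arrives as a (batch_size, 1, 28, 28) tensor,
# and the labels as a vector of class indices of length batch_size.
X, y = next(iter(train_iter))
print(X.shape, y.shape)  # expected: torch.Size([256, 1, 28, 28]) torch.Size([256])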
Model Parameters
The Fashion-MNIST dataset consists of images with dimensions $28 \times 28$ pixels and 10 distinct classes. Each image is flattened into a vector of length $28 \times 28 = 784$. Consequently, the input layer has 784 features, and the output layer has 10 classes. We configure a hidden layer with 256 units.
num_inputs, num_outputs, num_hiddens = 784, 10, 256
# Initialize weights and biases for the hidden layer
weight_hidden = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)
bias_hidden = torch.zeros(num_hiddens, dtype=torch.float)
# Initialize weights and biases for the output layer
weight_output = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
bias_output = torch.zeros(num_outputs, dtype=torch.float)
# Store parameters and enable gradient computation
parameters = [weight_hidden, bias_hidden, weight_output, bias_output]
for param in parameters:
    param.requires_grad_(requires_grad=True)
Activation Function
We implement the Rectified Linear Unit (ReLU) activation function ourselves using torch.max, rather than calling PyTorch's built-in ReLU.
def relu_activation(x):
    return torch.max(input=x, other=torch.tensor(0.0))
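A quick check on a small tensor (purely illustrative) confirms that negative entries are clipped to zero while positive entries pass through unchanged:

# Negative values are replaced by 0, positive values are kept as-is
sample = torch.tensor([[-1.0, 2.0], [3.0, -4.0]])
print(relu_activation(sample))  # tensor([[0., 2.], [3., 0.]])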
Model Definition
The MLP model takes input images, flattens them into vectors, and passes them through a hidden layer with ReLU activation, followed by an output layer.
def mlp_network(x):
    # Flatten the input image to a vector
    x = x.view((-1, num_inputs))
    # Hidden layer computation with ReLU activation
    hidden_layer = relu_activation(torch.matmul(x, weight_hidden) + bias_hidden)
    # Output layer computation
    return torch.matmul(hidden_layer, weight_output) + bias_output
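Before training, it can be useful to verify that the network produces one score per class for each example. The check below is an optional sketch using a randomly generated dummy batch:

# Forward a dummy batch of two 28x28 "images"; the output should have
# shape (2, 10): one raw score (logit) per class for each example.
dummy = torch.rand(2, 1, 28, 28)
print(mlp_network(dummy).shape)  # expected: torch.Size([2, 10])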
Loss Function
For numerical stability and convenience, we employ PyTorch's built-in CrossEntropyLoss, which combines the softmax operation and the cross-entropy loss calculation.
loss_criterion = torch.nn.CrossEntropyLoss()
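Note that CrossEntropyLoss expects the raw (unnormalized) outputs of mlp_network together with integer class labels; the softmax is applied internally. A minimal illustration with made-up numbers:

# The criterion applies softmax internally, so we pass raw logits directly
logits = torch.tensor([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])  # 2 examples, 3 classes
labels = torch.tensor([0, 1])                               # true class indices
print(loss_criterion(logits, labels))  # a scalar, averaged over the batch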
Model Training
The training process for the MLP is analogous to that of the Softmax Regression model. We leverage the train_ch3 function from the d2lzh_pytorch library. The following hyperparameters are set: 5 epochs and a learning rate of 100.0.
Note: The learning rate looks unusually large. The original MXNet implementation of SoftmaxCrossEntropyLoss sums the per-example losses over the batch, whereas PyTorch's CrossEntropyLoss averages them by default, so the loss and its gradients are smaller by roughly a factor of the batch size. In addition, the sgd function in d2lzh_pytorch divides each gradient by the batch size, even though PyTorch's averaging has already accounted for this. To compensate and obtain comparable learning behavior, the learning rate is raised from the original 0.5 to 100.0.
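To make the scaling concrete, the sketch below illustrates the update rule described above. It is not the library's exact code; it assumes d2lzh_pytorch's sgd follows the book's definition of mini-batch SGD:

# Hypothetical illustration (assumed to match d2lzh_pytorch's sgd):
def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size

# With CrossEntropyLoss (reduction='mean'), param.grad already holds the
# gradient of the *average* loss over the batch. Dividing by batch_size
# again makes the effective step lr / batch_size times the mean gradient,
# so lr must be on the order of batch_size times larger than the 0.5 used
# with a summed loss to produce updates of a similar magnitude.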
num_epochs, learning_rate = 5, 100.0
d2l.train_ch3(mlp_network, train_iter, test_iter, loss_criterion, num_epochs, batch_size, parameters, learning_rate)
Output:
epoch 1, loss 0.0030, train acc 0.714, test acc 0.753
epoch 2, loss 0.0019, train acc 0.821, test acc 0.777
epoch 3, loss 0.0017, train acc 0.842, test acc 0.834
epoch 4, loss 0.0015, train acc 0.857, test acc 0.839
epoch 5, loss 0.0014, train acc 0.865, test acc 0.845
Summary
- Simple MLPs can be implemented by manually defining the model architecture and its parameters.
- This manual approach becomes cumbersome for deeper networks, particularly during parameter initialization.