Fading Coder

One Final Commit for the Last Sprint


Implementing a Multilayer Perceptron from Scratch

Tech · May 15

This section details the implementation of a Multilayer Perceptron (MLP) from the ground up. We begin by importing necessary libraries:

import torch
import numpy as np
import sys
sys.path.append("../..")  # Adjust this path to your project structure; forward slashes work on all platforms
import d2lzh_pytorch as d2l

Data Loading

We use the Fashion-MNIST dataset for this image-classification task. The following snippet loads the data in minibatches:

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

Model Parameters

The Fashion-MNIST dataset consists of images with dimensions $28 \times 28$ pixels and 10 distinct classes. Each image is flattened into a vector of length $28 \times 28 = 784$. Consequently, the input layer has 784 features, and the output layer has 10 classes. We configure a hidden layer with 256 units.

num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Initialize weights and biases for the hidden layer
weight_hidden = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)
bias_hidden = torch.zeros(num_hiddens, dtype=torch.float)

# Initialize weights and biases for the output layer
weight_output = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
bias_output = torch.zeros(num_outputs, dtype=torch.float)

# Store parameters and enable gradient computation
parameters = [weight_hidden, bias_hidden, weight_output, bias_output]
for param in parameters:
    param.requires_grad_(requires_grad=True)
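As a side note, the NumPy detour above is not required: the same initialization can be written with PyTorch's own random-number routines. The sketch below is a hypothetical alternative (not part of the original code) that scales a standard-normal sample to get standard deviation 0.01 and enables gradients in one step.

```python
import torch

num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Sample weights from N(0, 0.01^2) by scaling standard-normal draws,
# and mark every parameter as a leaf tensor that tracks gradients.
weight_hidden = (torch.randn(num_inputs, num_hiddens) * 0.01).requires_grad_()
bias_hidden = torch.zeros(num_hiddens, requires_grad=True)
weight_output = (torch.randn(num_hiddens, num_outputs) * 0.01).requires_grad_()
bias_output = torch.zeros(num_outputs, requires_grad=True)

parameters = [weight_hidden, bias_hidden, weight_output, bias_output]
```

Either version produces the same parameter shapes; this one merely avoids converting between NumPy arrays and tensors.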

Activation Function

We implement the Rectified Linear Unit (ReLU) activation function manually using torch.max, rather than calling PyTorch's built-in ReLU.

def relu_activation(x):
    return torch.max(input=x, other=torch.tensor(0.0))
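To see how this works, note that the scalar tensor 0.0 broadcasts across the input, so torch.max clamps each element from below. A quick standalone check:

```python
import torch

def relu_activation(x):
    # Element-wise max(x, 0): the 0-d tensor broadcasts over x's shape.
    return torch.max(input=x, other=torch.tensor(0.0))

x = torch.tensor([[-2.0, 0.5], [3.0, -0.1]])
print(relu_activation(x))
# Negative entries become 0.0; positive entries pass through unchanged.
```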

Model Definition

The MLP model takes input images, flattens them into vectors, and passes them through a hidden layer with ReLU activation, followed by an output layer.

def mlp_network(x):
    # Flatten the input image to a vector
    x = x.view((-1, num_inputs))
    # Hidden layer computation with ReLU activation
    hidden_layer = relu_activation(torch.matmul(x, weight_hidden) + bias_hidden)
    # Output layer computation
    return torch.matmul(hidden_layer, weight_output) + bias_output
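Before training, it is worth sanity-checking that the forward pass maps a batch of images to one logit per class. The snippet below is a self-contained sketch that re-creates the parameters and functions above and feeds a fake batch through the network:

```python
import torch

num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Untrained parameters, initialized as in the section above
weight_hidden = torch.randn(num_inputs, num_hiddens) * 0.01
bias_hidden = torch.zeros(num_hiddens)
weight_output = torch.randn(num_hiddens, num_outputs) * 0.01
bias_output = torch.zeros(num_outputs)

def relu_activation(x):
    return torch.max(input=x, other=torch.tensor(0.0))

def mlp_network(x):
    x = x.view((-1, num_inputs))
    hidden_layer = relu_activation(torch.matmul(x, weight_hidden) + bias_hidden)
    return torch.matmul(hidden_layer, weight_output) + bias_output

batch = torch.randn(4, 1, 28, 28)   # four fake 28x28 grayscale images
logits = mlp_network(batch)
print(logits.shape)  # torch.Size([4, 10]): one score per class per image
```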

Loss Function

For numerical stability and convenience, we employ PyTorch's built-in CrossEntropyLoss, which combines the softmax operation and the cross-entropy loss calculation.

loss_criterion = torch.nn.CrossEntropyLoss()
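The claim that CrossEntropyLoss fuses the softmax and the cross-entropy can be verified directly: applying log_softmax and then NLLLoss by hand yields the same value. A small check with made-up logits and labels:

```python
import torch

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.2, 0.3]])
labels = torch.tensor([0, 1])

# Fused computation (numerically stable log-sum-exp internally)
combined = torch.nn.CrossEntropyLoss()(logits, labels)

# The same result in two explicit steps
log_probs = torch.log_softmax(logits, dim=1)
manual = torch.nn.NLLLoss()(log_probs, labels)

print(torch.allclose(combined, manual))  # True
```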

Model Training

The training process for the MLP is analogous to that of the Softmax Regression model. We leverage the train_ch3 function from the d2lzh_pytorch library. The following hyperparameters are set: 5 epochs and a learning rate of 100.0.

Note: The original MXNet implementation uses SoftmaxCrossEntropyLoss, which sums the loss over the batch, whereas PyTorch's CrossEntropyLoss averages it by default, so the loss and gradients are smaller by a factor of the batch size. In addition, the sgd function in d2lzh_pytorch divides the gradient by the batch size, a division that PyTorch's averaging has effectively already performed. To compensate for both effects and achieve comparable learning outcomes, the learning rate is scaled up from the original 0.5 to 100.0.
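The mean-versus-sum discrepancy described above can be seen directly by comparing the two reduction modes of CrossEntropyLoss on the same batch:

```python
import torch

batch_size = 4
logits = torch.randn(batch_size, 10)
labels = torch.randint(0, 10, (batch_size,))

# PyTorch default: average the per-example losses over the batch
loss_mean = torch.nn.CrossEntropyLoss()(logits, labels)
# MXNet-style behavior: sum the per-example losses
loss_sum = torch.nn.CrossEntropyLoss(reduction='sum')(logits, labels)

# The summed loss is larger by exactly the batch size, which is why the
# learning rate must be scaled up to reproduce the original updates.
print(torch.allclose(loss_sum, loss_mean * batch_size))  # True
```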

num_epochs, learning_rate = 5, 100.0
d2l.train_ch3(mlp_network, train_iter, test_iter, loss_criterion, num_epochs, batch_size, parameters, learning_rate)

Output:

epoch 1, loss 0.0030, train acc 0.714, test acc 0.753
epoch 2, loss 0.0019, train acc 0.821, test acc 0.777
epoch 3, loss 0.0017, train acc 0.842, test acc 0.834
epoch 4, loss 0.0015, train acc 0.857, test acc 0.839
epoch 5, loss 0.0014, train acc 0.865, test acc 0.845

Summary

  • Simple MLPs can be implemented by manually defining the model architecture and its parameters.
  • This manual approach becomes cumbersome for deeper networks, particularly during parameter initialization.

