Implementing a Multilayer Perceptron from Scratch
This section details the implementation of a Multilayer Perceptron (MLP) from the ground up. We begin by importing necessary libraries:
import torch
import numpy as np
import sys
sys.path.append("../..") # Adjust path as necessary for your project structure
import d2lzh_pytorch as d2l
Data Loading
We will utilize the Fashion-MNIST dataset for image classification tasks. The following code snippet loads the data in batches:
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
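As an optional sanity check (the snippet below is purely illustrative), we can inspect the shape of one mini-batch returned by the data iterators:

# Each batch of images arrives as a (batch_size, 1, 28, 28) tensor,
# and the labels as a vector of class indices of length batch_size.
X, y = next(iter(train_iter))
print(X.shape, y.shape)  # expected: torch.Size([256, 1, 28, 28]) torch.Size([256])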
Model Parameters
The Fashion-MNIST dataset consists of images with dimensions $28 \times 28$ pixels and 10 distinct classes. Each image is flattened into a vector of length $28 \times 28 = 784$. Consequently, the input layer has 784 features, and the output layer has 10 classes. We configure a hidden layer with 256 units.
num_inputs, num_outputs, num_hiddens = 784, 10, 256
# Initialize weights and biases for the hidden layer
weight_hidden = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)
bias_hidden = torch.zeros(num_hiddens, dtype=torch.float)
# Initialize weights and biases for the output layer
weight_output = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
bias_output = torch.zeros(num_outputs, dtype=torch.float)
# Store parameters and enable gradient computation
parameters = [weight_hidden, bias_hidden, weight_output, bias_output]
for param in parameters:
    param.requires_grad_(requires_grad=True)
Activation Function
We implement the Rectified Linear Unit (ReLU) activation function ourselves using torch.max, rather than calling PyTorch's built-in ReLU.
def relu_activation(x):
    return torch.max(input=x, other=torch.tensor(0.0))
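A quick check on a small tensor (purely illustrative) confirms that negative entries are clipped to zero while positive entries pass through unchanged:

# Negative values are replaced by 0, positive values are kept as-is
sample = torch.tensor([[-1.0, 2.0], [3.0, -4.0]])
print(relu_activation(sample))  # tensor([[0., 2.], [3., 0.]])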
Model Definition
The MLP model takes input images, flattens them into vectors, and passes them through a hidden layer with ReLU activation, followed by an output layer.
def mlp_network(x):
    # Flatten the input image to a vector
    x = x.view((-1, num_inputs))
    # Hidden layer computation with ReLU activation
    hidden_layer = relu_activation(torch.matmul(x, weight_hidden) + bias_hidden)
    # Output layer computation
    return torch.matmul(hidden_layer, weight_output) + bias_output
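Before training, it can be useful to verify that the network produces one score per class for each example. The check below is an optional sketch using a randomly generated dummy batch:

# Forward a dummy batch of two 28x28 "images"; the output should have
# shape (2, 10): one raw score (logit) per class for each example.
dummy = torch.rand(2, 1, 28, 28)
print(mlp_network(dummy).shape)  # expected: torch.Size([2, 10])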
Loss Function
For numerical stability and convenience, we employ PyTorch's built-in CrossEntropyLoss, which combines the softmax operation and the cross-entropy loss calculation.
loss_criterion = torch.nn.CrossEntropyLoss()
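Note that CrossEntropyLoss expects the raw (unnormalized) outputs of mlp_network together with integer class labels; the softmax is applied internally. A minimal illustration with made-up numbers:

# The criterion applies softmax internally, so we pass raw logits directly
logits = torch.tensor([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])  # 2 examples, 3 classes
labels = torch.tensor([0, 1])                               # true class indices
print(loss_criterion(logits, labels))  # a scalar, averaged over the batch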
Model Training
The training process for the MLP is analogous to that of the Softmax Regression model. We leverage the train_ch3 function from the d2lzh_pytorch library. The following hyperparameters are set: 5 epochs and a learning rate of 100.0.
Note: The learning rate looks unusually large. The original MXNet implementation of SoftmaxCrossEntropyLoss sums the per-example losses over the batch, whereas PyTorch's CrossEntropyLoss averages them by default, so the loss and its gradients are smaller by roughly a factor of the batch size. In addition, the sgd function in d2lzh_pytorch divides each gradient by the batch size, even though PyTorch's averaging has already accounted for this. To compensate and obtain comparable learning behavior, the learning rate is raised from the original 0.5 to 100.0.
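To make the scaling concrete, the sketch below illustrates the update rule described above. It is not the library's exact code; it assumes d2lzh_pytorch's sgd follows the book's definition of mini-batch SGD:

# Hypothetical illustration (assumed to match d2lzh_pytorch's sgd):
def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size

# With CrossEntropyLoss (reduction='mean'), param.grad already holds the
# gradient of the *average* loss over the batch. Dividing by batch_size
# again makes the effective step lr / batch_size times the mean gradient,
# so lr must be on the order of batch_size times larger than the 0.5 used
# with a summed loss to produce updates of a similar magnitude.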
num_epochs, learning_rate = 5, 100.0
d2l.train_ch3(mlp_network, train_iter, test_iter, loss_criterion, num_epochs, batch_size, parameters, learning_rate)
Output:
epoch 1, loss 0.0030, train acc 0.714, test acc 0.753
epoch 2, loss 0.0019, train acc 0.821, test acc 0.777
epoch 3, loss 0.0017, train acc 0.842, test acc 0.834
epoch 4, loss 0.0015, train acc 0.857, test acc 0.839
epoch 5, loss 0.0014, train acc 0.865, test acc 0.845
Summary
- Simple MLPs can be implemented by manually defining the model architecture and its parameters.
- This manual approach becomes cumbersome for deeper networks, particularly during parameter initialization.