Fading Coder

Implementing LeNet-5 for Image Classification with PyTorch

Tech May 19 2

Unlike fully-connected networks, which flatten images into vectors and lose spatial relationships, convolutional architectures preserve the 2-D structure and dramatically reduce the parameter count. LeNet was introduced by Yann LeCun and colleagues at AT&T Bell Labs, with the earliest version dating to 1989 and the LeNet-5 design published in 1998. It was among the first CNNs trained successfully with back-propagation and became the backbone of early cheque-digit recognition in ATMs; some of those machines reportedly still run the original 1990s code.
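
To make the parameter savings concrete, here is a back-of-the-envelope sketch comparing a dense layer with a convolution that both turn a 28 × 28 grayscale image into six 28 × 28 output maps. The sizes are chosen for illustration (they match the first stage of the network described below), and the helper names are hypothetical.

```python
# Illustrative arithmetic only; helper names are hypothetical.

def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out

def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution (kernels are shared across positions)."""
    return c_in * c_out * k * k + c_out

n_in = 28 * 28        # flattened grayscale image
n_out = 6 * 28 * 28   # six 28 x 28 output maps

dense = dense_params(n_in, n_out)
conv = conv_params(c_in=1, c_out=6, k=5)

print(f"dense: {dense:,} parameters")  # dense: 3,692,640 parameters
print(f"conv:  {conv:,} parameters")   # conv:  156 parameters
```

The gap comes from weight sharing: the convolution reuses the same six 5 × 5 kernels at every image position instead of learning a separate weight per input-output pair.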

Network Architecture

LeNet-5 is split into two conceptual blocks:

  • Convolutional encoder: two convolution stages.
  • Dense classifier: three fully-connected layers.

Each convolution stage contains a 5 × 5 convolution, sigmoid activation, and 2 × 2 average-pooling with stride 2. The first convolution outputs 6 feature maps; the second outputs 16. After the second pooling layer, the 3-D tensor is flattened into a vector and fed into three linear layers with 120, 84, and 10 units respectively. The final 10-unit layer corresponds to the ten Fashion-MNIST classes.
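
The spatial sizes implied by this description can be checked with the standard output-size formula floor((n + 2p - k) / s) + 1. The sketch below assumes a 28 × 28 input and a padding of 2 on the first convolution, as in the implementation that follows; the helper name `out_size` is hypothetical.

```python
def out_size(n, k, stride=1, padding=0):
    """Output width/height of a convolution or pooling window."""
    return (n + 2 * padding - k) // stride + 1

# (name, window settings) for the two convolution stages
stages = [
    ("conv1", dict(k=5, stride=1, padding=2)),
    ("pool1", dict(k=2, stride=2, padding=0)),
    ("conv2", dict(k=5, stride=1, padding=0)),
    ("pool2", dict(k=2, stride=2, padding=0)),
]

n = 28                # assumed input height/width
trace = [n]
for name, cfg in stages:
    n = out_size(n, **cfg)
    trace.append(n)
    print(f"{name}: {n} x {n}")

flat_features = 16 * n * n   # 16 feature maps after the second stage
print("flattened features:", flat_features)
```

The final value is the input width of the first linear layer, which is why the classifier starts from 400 features.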

PyTorch Implementation

import torch
from torch import nn

lenet = nn.Sequential(
    # stage 1
    nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28×28 → 28×28
    nn.Sigmoid(),
    nn.AvgPool2d(2, stride=2),                  # 28×28 → 14×14

    # stage 2
    nn.Conv2d(6, 16, kernel_size=5),            # 14×14 → 10×10
    nn.Sigmoid(),
    nn.AvgPool2d(2, stride=2),                  # 10×10 → 5×5

    # classifier
    nn.Flatten(),                               # 16×5×5 = 400
    nn.Linear(400, 120),
    nn.Sigmoid(),
    nn.Linear(120, 84),
    nn.Sigmoid(),
    nn.Linear(84, 10)
)

Compared with the original design, this implementation replaces the Gaussian RBF output layer with a plain linear layer for simplicity.
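
As a quick sanity check on model size, the following sketch counts the learnable parameters per layer. It rebuilds the same layer stack under the name `net` so that it runs on its own; the `counts` dictionary and its key format are illustrative choices, not part of the original article.

```python
from torch import nn

# Same layer stack as above, rebuilt so the snippet is self-contained.
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(2, stride=2),
    nn.Flatten(),
    nn.Linear(400, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)

counts = {}
for idx, layer in enumerate(net):
    n_params = sum(p.numel() for p in layer.parameters())
    if n_params:  # skip activations, pooling, and flatten
        counts[f"{idx}:{layer.__class__.__name__}"] = n_params
        print(f"{idx:2d} {layer.__class__.__name__:8} {n_params:6d}")

total = sum(counts.values())
print("total:", total)  # total: 61706
```

Most of the roughly 62k parameters sit in the first linear layer (400 × 120 weights plus 120 biases), not in the convolutions.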

Shape Propagation Check

x = torch.randn(1, 1, 28, 28)
for layer in lenet:
    x = layer(x)
    print(f"{layer.__class__.__name__:13} -> {x.shape}")

Expected trace:

Conv2d        -> torch.Size([1, 6, 28, 28])
Sigmoid       -> torch.Size([1, 6, 28, 28])
AvgPool2d     -> torch.Size([1, 6, 14, 14])
Conv2d        -> torch.Size([1, 16, 10, 10])
Sigmoid       -> torch.Size([1, 16, 10, 10])
AvgPool2d     -> torch.Size([1, 16, 5, 5])
Flatten       -> torch.Size([1, 400])
Linear        -> torch.Size([1, 120])
Sigmoid       -> torch.Size([1, 120])
Linear        -> torch.Size([1, 84])
Sigmoid       -> torch.Size([1, 84])
Linear        -> torch.Size([1, 10])

Training Loop on Fashion-MNIST

from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

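def accuracy_gpu(net, data_iter, device=None):
    """Compute the classification accuracy of `net` on `data_iter`."""
    if isinstance(net, nn.Module):
        net.eval()  # switch to evaluation mode
        # Default to whichever device holds the model's parameters.
        device = next(iter(net.parameters())).device if device is None else device
    metric = d2l.Accumulator(2)  # (correct predictions, total predictions)
    with torch.no_grad():
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)
            metric.add(d2l.accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]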

def train(net, train_iter, test_iter, epochs, lr, device):
    def init_weights(m):
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    timer, batches = d2l.Timer(), len(train_iter)
    for epoch in range(epochs):
        metric = d2l.Accumulator(3)
        net.train()
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss_fn(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                # Accumulate summed loss, correct predictions, and example count.
                metric.add(l * X.shape[0], d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i + 1) % (batches // 5) == 0 or i == batches - 1:
                animator.add(epoch + (i + 1) / batches,
                             (train_l, train_acc, None))
        test_acc = accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
    print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, test acc {test_acc:.3f}')
    print(f'{metric[2] * epochs / timer.sum():.1f} examples/sec on {device}')

lr, epochs = 0.9, 10
train(lenet, train_iter, test_iter, epochs, lr, d2l.try_gpu())

Typical results on a GPU:

loss 0.473, train acc 0.822, test acc 0.795
51744.9 examples/sec on cuda:0
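
For readers without the d2l package, the evaluation step can be approximated in plain PyTorch. This is a minimal sketch, not the d2l helper itself: `plain_accuracy`, `toy_net`, and `toy_batches` are hypothetical names, and the data is random, so the printed accuracy is meaningless beyond showing that the function runs.

```python
import torch
from torch import nn

def plain_accuracy(net, data_iter, device):
    """Fraction of examples whose arg-max logit matches the label."""
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)
            preds = net(X).argmax(dim=1)      # predicted class per example
            correct += (preds == y).sum().item()
            total += y.numel()
    return correct / total

# Synthetic stand-ins for a model and a data loader (assumptions for the demo).
torch.manual_seed(0)
toy_net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_batches = [
    (torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,)))
    for _ in range(2)
]

toy_acc = plain_accuracy(toy_net, toy_batches, torch.device("cpu"))
print(f"toy accuracy: {toy_acc:.3f}")
```

Any iterable of `(X, y)` batches works here, so a `torch.utils.data.DataLoader` over Fashion-MNIST could be passed in place of `toy_batches`.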