Convolutional Neural Networks with PyTorch
6.1 From Fully Connected to Convolutional
Multilayer perceptrons are suitable for tabular data, but they scale poorly to high-dimensional perceptual data such as images.
6.1.1 Invariance
6.1.2 Limitations of Multilayer Perceptrons
6.1.3 Convolution
Convolution measures the overlap between two functions f and g when one of them is flipped and shifted by x: (f * g)(x) = ∫ f(z) g(x - z) dz. For discrete objects, the integral becomes a sum: (f * g)(i) = Σ_a f(a) g(i - a).
6.2 Image Convolution
6.2.1 Cross-Correlation Operation
Convolutional layers are misnamed because the operation they perform is actually cross-correlation, not convolution.
First, ignore the channel (third dimension) and see how to handle 2D image data and hidden representations. The shape of the convolution kernel window is determined by the kernel's height and width.
The output size is slightly smaller than the input size because the kernel's height and width are greater than 1. The output size is (input size - kernel size + 1) in both dimensions.
Next, implement this process in the corr2d function, which takes an input tensor X and a kernel tensor K and returns the output tensor Y.
import torch
from torch import nn
from d2l import torch as d2l
def corr2d(X, K):  #@save
    """Compute 2D cross-correlation."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y
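As a sanity check, corr2d can be run on a small example (input and kernel values chosen here purely for illustration; corr2d is repeated so the snippet is self-contained):

```python
import torch

def corr2d(X, K):
    """Compute 2D cross-correlation."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
print(corr2d(X, K))
# tensor([[19., 25.],
#         [37., 43.]])
```

The 3×3 input and 2×2 kernel produce a 2×2 output, matching the (input size - kernel size + 1) formula.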
6.2.2 Convolutional Layer
Implement a 2D convolutional layer based on the corr2d function defined above. In the __init__ constructor, declare weight and bias as model parameters. The forward propagation function calls corr2d and adds the bias.
class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias
6.2.3 Edge Detection in Images
Here's a simple application of convolutional layers: detecting edges between different colors in an image by finding positions where pixel values change. First, construct a 6×8 pixel black-and-white image. The middle four columns are black (0), and the rest are white (1).
X = torch.ones((6, 8))
X[:, 2:6] = 0
"""
tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
[1., 1., 0., 0., 0., 0., 1., 1.],
[1., 1., 0., 0., 0., 0., 1., 1.],
[1., 1., 0., 0., 0., 0., 1., 1.],
[1., 1., 0., 0., 0., 0., 1., 1.],
[1., 1., 0., 0., 0., 0., 1., 1.]])
"""
Next, construct a convolution kernel K of height 1 and width 2. During cross-correlation, if two horizontally adjacent elements are the same, the output is zero; otherwise, it's non-zero.
K = torch.tensor([[1.0, -1.0]])
Now, perform cross-correlation on X (input) and K (kernel). The output Y shows 1 for edges from white to black, -1 for edges from black to white, and 0 otherwise.
Y = corr2d(X, K)
"""
tensor([[ 0., 1., 0., 0., 0., -1., 0.],
[ 0., 1., 0., 0., 0., -1., 0.],
[ 0., 1., 0., 0., 0., -1., 0.],
[ 0., 1., 0., 0., 0., -1., 0.],
[ 0., 1., 0., 0., 0., -1., 0.],
[ 0., 1., 0., 0., 0., -1., 0.]])
"""
6.2.4 Learning Convolution Kernels
Construct a convolutional layer with a randomly initialized kernel. Then, in each iteration, compare the output Y_hat with the target Y using squared error, compute gradients, and update the kernel. For simplicity, use PyTorch's built-in 2D convolutional layer and ignore the bias.
# Construct a 2D convolutional layer with 1 output channel and kernel size (1, 2), no bias
conv2d = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
# Use 4D input/output format (batch size, channels, height, width) with batch size and channels both 1
X = X.reshape((1, 1, 6, 8))
Y = Y.reshape((1, 1, 6, 7))
lr = 3e-2  # Learning rate
for i in range(10):
    Y_hat = conv2d(X)
    l = (Y_hat - Y) ** 2
    conv2d.zero_grad()
    l.sum().backward()
    # Update the kernel
    conv2d.weight.data[:] -= lr * conv2d.weight.grad
    if (i + 1) % 2 == 0:
        print(f'epoch {i + 1}, loss {l.sum():.3f}')
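After training, the learned kernel should approach the [1, -1] edge detector used above. A self-contained version of the loop (the exact values vary with the random initialization):

```python
import torch
from torch import nn

X = torch.ones((6, 8))
X[:, 2:6] = 0
K = torch.tensor([[1.0, -1.0]])
# Target produced by the known edge-detecting kernel
Y = torch.zeros((6, 7))
for j in range(7):
    Y[:, j] = (X[:, j:j + 2] * K).sum(dim=1)

conv2d = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
X4, Y4 = X.reshape((1, 1, 6, 8)), Y.reshape((1, 1, 6, 7))
lr = 3e-2
for i in range(10):
    l = ((conv2d(X4) - Y4) ** 2).sum()
    conv2d.zero_grad()
    l.backward()
    conv2d.weight.data[:] -= lr * conv2d.weight.grad
print(conv2d.weight.data.reshape((1, 2)))  # approaches tensor([[ 1., -1.]])
```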
6.2.5 Cross-Correlation vs. Convolution
Since convolution kernels are learned from data, the outputs of convolutional layers are unaffected by whether they perform strict convolution or cross-correlation: a kernel learned under one operation is simply the flipped version of the kernel that would be learned under the other.
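This relationship can be checked directly: strict convolution equals cross-correlation with the kernel flipped along both spatial axes, so for a flip-symmetric kernel the two operations coincide (an illustrative check, with kernel values chosen for symmetry):

```python
import torch
import torch.nn.functional as F

X = torch.rand(1, 1, 5, 5)
K = torch.tensor([[1.0, 2.0, 1.0],
                  [2.0, 4.0, 2.0],
                  [1.0, 2.0, 1.0]]).reshape(1, 1, 3, 3)

corr = F.conv2d(X, K)                           # cross-correlation
conv = F.conv2d(X, torch.flip(K, dims=[2, 3]))  # strict convolution
print(torch.allclose(corr, conv))  # True: K is unchanged by flipping
```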
6.2.6 Feature Maps and Receptive Fields
The output of a convolutional layer is sometimes called a feature map, because it can be regarded as the learned representations (features) in the spatial dimensions that are passed to the next layer. In a CNN, the receptive field of an element x in some layer refers to all elements, from all previous layers, that may affect the computation of x during forward propagation.
The receptive field may be larger than the actual size of the input, and stacking more layers enlarges it.
6.3 Padding and Stride
After applying many consecutive convolutional layers, the output may end up much smaller than the input, because every kernel larger than 1×1 shrinks the output.
6.3.1 Padding
Adding ph rows of padding in total (roughly half on top, half on bottom) and pw columns in total (half on the left, half on the right) changes the output shape to (H - kh + ph + 1) × (W - kw + pw + 1). Often, ph = kh - 1 and pw = kw - 1 are chosen so that the input and output have the same height and width.
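For example, with a 3×3 kernel, one row/column of padding on every side gives ph = pw = 2 in total, so the output keeps the input's height and width (note that nn.Conv2d's padding argument counts pixels added per side):

```python
import torch
from torch import nn

conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1)
X = torch.rand(1, 1, 8, 8)
print(conv2d(X).shape)  # torch.Size([1, 1, 8, 8]): 8 - 3 + 2 + 1 = 8
```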
6.3.2 Stride
When using a vertical stride of sh and a horizontal stride of sw, the output shape is floor((H - kh + ph + sh)/sh) × floor((W - kw + pw + sw)/sw), which reduces to the padding-only formula when sh = sw = 1.
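A stride of 2 roughly halves each spatial dimension. Continuing the padded example above:

```python
import torch
from torch import nn

conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)
X = torch.rand(1, 1, 8, 8)
# floor((8 - 3 + 2 + 2) / 2) = 4 in each dimension
print(conv2d(X).shape)  # torch.Size([1, 1, 4, 4])
```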
6.4 Multiple Input and Output Channels
Color images have standard RGB channels. Input and hidden representations become 3D tensors with shape (channels, height, width). For example, RGB images have shape (3, H, W).
6.4.1 Multiple Input Channels
For multi-channel inputs, the convolution kernel must have the same number of input channels. Each input channel has a (kh × kw) kernel, and all channels are concatenated into a (ci, kh, kw) kernel.
import torch
from d2l import torch as d2l
def corr2d_multi_in(X, K):
    # Iterate over the channel dimension of X and K, then sum the results
    return sum(d2l.corr2d(x, k) for x, k in zip(X, K))
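A worked example with a two-channel input (values chosen for illustration; corr2d is inlined here so the snippet runs without d2l):

```python
import torch

def corr2d(X, K):
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def corr2d_multi_in(X, K):
    # Cross-correlate each channel pair, then sum over channels
    return sum(corr2d(x, k) for x, k in zip(X, K))

X = torch.tensor([[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
                  [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]])
K = torch.tensor([[[0.0, 1.0], [2.0, 3.0]], [[1.0, 2.0], [3.0, 4.0]]])
print(corr2d_multi_in(X, K))
# tensor([[ 56.,  72.],
#         [104., 120.]])
```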
6.4.2 Multiple Output Channels
As the network deepens, the number of output channels is typically increased so that each layer can extract a more diverse set of features. A kernel with ci input channels and co output channels has shape (co, ci, kh, kw).
def corr2d_multi_in_out(X, K):
    # Iterate over the output channels of K, perform multi-input
    # cross-correlation with the full input, and stack the results
    return torch.stack([corr2d_multi_in(X, k) for k in K], 0)
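The result can be cross-checked against PyTorch's built-in F.conv2d, which also performs cross-correlation; a sketch with random data (helper functions inlined so the snippet is self-contained):

```python
import torch
import torch.nn.functional as F

def corr2d(X, K):
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def corr2d_multi_in(X, K):
    return sum(corr2d(x, k) for x, k in zip(X, K))

def corr2d_multi_in_out(X, K):
    return torch.stack([corr2d_multi_in(X, k) for k in K], 0)

X = torch.rand(2, 5, 5)
K = torch.rand(3, 2, 2, 2)  # (c_o, c_i, kh, kw)
Y = corr2d_multi_in_out(X, K)
print(Y.shape)  # torch.Size([3, 4, 4])

# Built-in reference: add/remove the batch dimension
Y_ref = F.conv2d(X.unsqueeze(0), K).squeeze(0)
print(torch.allclose(Y, Y_ref, atol=1e-5))  # True
```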
6.4.3 1×1 Convolution Layers
Because a 1×1 window cannot see neighboring pixels, 1×1 convolutions lose the ability to detect spatial interactions. Instead, a 1×1 convolutional layer acts as a fully connected layer applied at every pixel, linearly combining the ci input channels into co output channels.
def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.reshape((c_i, h * w))
    K = K.reshape((c_o, c_i))
    # Matrix multiplication over the channel dimension
    Y = torch.matmul(K, X)
    return Y.reshape((c_o, h, w))
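The matrix-multiplication view can be verified against a real 1×1 convolution (an illustrative check with random data):

```python
import torch
import torch.nn.functional as F

def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.reshape((c_i, h * w))
    K = K.reshape((c_o, c_i))
    Y = torch.matmul(K, X)  # fully connected layer applied per pixel
    return Y.reshape((c_o, h, w))

X = torch.rand(3, 4, 4)
K = torch.rand(2, 3, 1, 1)
Y1 = corr2d_multi_in_out_1x1(X, K)
Y2 = F.conv2d(X.unsqueeze(0), K).squeeze(0)  # genuine 1x1 convolution
print(torch.allclose(Y1, Y2, atol=1e-5))  # True
```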
6.5 Pooling Layers
Pooling layers downsample feature maps, which reduces the network's sensitivity to the exact location of features and lowers the spatial resolution (and hence the computation) passed on to subsequent layers.
6.5.1 Max and Average Pooling
Like convolutional layers, pooling operators slide a fixed-shape window over the input. Unlike convolutional layers, pooling layers have no learnable parameters: the window simply takes the maximum (max pooling) or the average (average pooling) of the elements it covers.
import torch
from torch import nn
from d2l import torch as d2l
def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = torch.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if mode == 'max':
                Y[i, j] = X[i: i + p_h, j: j + p_w].max()
            elif mode == 'avg':
                Y[i, j] = X[i: i + p_h, j: j + p_w].mean()
    return Y
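A quick check of both modes on a small input (values chosen for illustration; pool2d repeated so the snippet is self-contained):

```python
import torch

def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = torch.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if mode == 'max':
                Y[i, j] = X[i: i + p_h, j: j + p_w].max()
            elif mode == 'avg':
                Y[i, j] = X[i: i + p_h, j: j + p_w].mean()
    return Y

X = torch.arange(9, dtype=torch.float32).reshape(3, 3)
print(pool2d(X, (2, 2)))         # tensor([[4., 5.], [7., 8.]])
print(pool2d(X, (2, 2), 'avg'))  # tensor([[2., 3.], [5., 6.]])
```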
6.5.2 Padding and Stride
Use 4D input format (batch, channels, height, width).
X = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))
# Default stride equals pool size
pool2d = nn.MaxPool2d(3)
print(pool2d(X))
# Output: tensor([[[[10.]]]])
# Custom pool size, stride, and padding
pool2d = nn.MaxPool2d((2, 3), stride=(2, 3), padding=(0, 1))
print(pool2d(X))
6.5.3 Multiple Channels
Pooling operates independently on each input channel, so output channels equal input channels.
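This can be seen by pooling a two-channel input built by concatenation (an illustrative sketch):

```python
import torch
from torch import nn

X = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
X = torch.cat((X, X + 1), 1)  # two input channels
pool2d = nn.MaxPool2d(3, padding=1, stride=2)
print(pool2d(X).shape)  # torch.Size([1, 2, 2, 2]): channel count unchanged
```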
6.6 LeNet
Replacing fully connected layers with convolutional layers keeps the model simpler, requiring far fewer parameters while preserving the spatial structure of the image.
6.6.1 LeNet Architecture
import torch
from torch import nn
from d2l import torch as d2l
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10))
# Check the output shape of each layer
X = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape: ', X.shape)
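To make the parameter savings concrete, the whole network's parameter count can be tallied (a quick sketch; for comparison, a single fully connected layer mapping the 784-pixel input to the first 6×28×28 hidden representation would alone need roughly 3.7 million weights, versus 156 for the first convolutional layer):

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10))

# Sum parameter counts over all layers
total = sum(p.numel() for p in net.parameters())
print(total)  # 61706: only 156 of these come from the first conv layer
```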
6.6.2 Model Training
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)
def evaluate_accuracy_gpu(net, data_iter, device=None):
    """Compute model accuracy on a dataset using a GPU."""
    if isinstance(net, nn.Module):
        net.eval()  # Set to evaluation mode
        if not device:
            device = next(iter(net.parameters())).device
    # Number of correct predictions, number of predictions
    metric = d2l.Accumulator(2)
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(X, list):
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            metric.add(d2l.accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]
def train_ch6(net, train_iter, test_iter, num_epochs, lr, device):
    """Train a model using a GPU."""
    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    print('training on', device)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    timer, num_batches = d2l.Timer(), len(train_iter)
    for epoch in range(num_epochs):
        # Sum of training loss, sum of correct predictions, number of examples
        metric = d2l.Accumulator(3)
        net.train()
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (train_l, train_acc, None))
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
    print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, '
          f'test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec '
          f'on {str(device)}')
lr, num_epochs = 0.9, 10
train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())