Understanding 1D, 2D, and 3D Convolution Layers
Understanding dilation in convolution operations:
Reference: https://blog.csdn.net/weixin_42363544/article/details/123920699
Dilated convolution is also known as atrous ("hole") convolution.
In PyTorch, a dilation value of 1 corresponds to a standard convolution without dilation: the kernel elements are applied to adjacent input positions. With a dilation of 2, one input element is skipped between neighboring kernel elements, and in general a dilation of d skips d - 1 elements. Increasing the dilation enlarges the receptive field without adding parameters, which changes the range of features the convolution extracts and can affect the network's performance.
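To make the spacing concrete, the sketch below lists which input positions a single kernel application reads for a given dilation, and how many input elements the kernel spans. The helper names `kernel_tap_offsets` and `effective_kernel_size` are hypothetical and introduced only for illustration; they are not PyTorch functions.

```python
def kernel_tap_offsets(kernel_size, dilation=1):
    """Offsets (relative to the window start) read by one kernel application."""
    return [i * dilation for i in range(kernel_size)]


def effective_kernel_size(kernel_size, dilation=1):
    """Number of input elements spanned by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


# A kernel of size 3 with increasing dilation
for d in (1, 2, 3):
    print(d, kernel_tap_offsets(3, d), effective_kernel_size(3, d))
# 1 [0, 1, 2] 3
# 2 [0, 2, 4] 5
# 3 [0, 3, 6] 7
```

The kernel still has three weights in every case; only the distance between the positions it reads grows.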
1D Convolution Layer
PyTorch documentation for 1D convolution: https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html
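import torch.nn as nn
import torch
# 16 input channels, 33 output channels, kernel size 3, stride 2
m = nn.Conv1d(16, 33, 3, stride=2)
# Batch of 20 samples, 16 channels each, length 50
input = torch.randn(20, 16, 50)
print(input.shape)
output = m(input)
print(output.shape)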
Output:
torch.Size([20, 16, 50])
torch.Size([20, 33, 24])
Let's understand the parameters and input for this 1D convolution operation.
The parameters for the layer nn.Conv1d(16, 33, 3, stride=2) are as follows:
Input channels: 16
Output channels: 33
Kernel size: 3
Stride: 2
The input tensor has a shape of (20, 16, 50), meaning 20 samples, 16 channels per sample, and a length of 50 per channel.
After the convolution operation, the output tensor shape can be calculated as follows:
Output length = (input length - kernel size) // stride + 1
Here, the input length is 50, the kernel size is 3, and the stride is 2, so the output length is (50 - 3) // 2 + 1 = 24.
Therefore, the output tensor shape will be (number of samples, number of output channels, output length), i.e., (20, 33, 24).
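The formula above assumes the default values padding=0 and dilation=1. In general, PyTorch computes
Output length = (input length + 2 * padding - dilation * (kernel size - 1) - 1) // stride + 1
which reduces to the simpler formula under those defaults.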
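The output-length calculation can be sketched as a small pure-Python function. The name `conv1d_output_length` is hypothetical, chosen for this illustration; the arithmetic follows the shape formula given in the Conv1d documentation, with padding and dilation included.

```python
def conv1d_output_length(length_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution, following the Conv1d shape formula."""
    # A dilated kernel spans dilation * (kernel_size - 1) + 1 input elements.
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (length_in + 2 * padding - effective_kernel) // stride + 1


# The example above: length 50, kernel size 3, stride 2
print(conv1d_output_length(50, 3, stride=2))              # 24
# Same layer with dilation=2: the kernel now spans 5 elements
print(conv1d_output_length(50, 3, stride=2, dilation=2))  # 23
```

The first call reproduces the length 24 derived above, so the result can be compared directly with `output.shape[-1]` from the example.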
2D Convolution Layer
PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
import torch.nn as nn
import torch
# With square kernels and equal stride
m = nn.Conv2d(16, 33, 3, stride=2)
input = torch.randn(20, 16, 50, 100)
print("1:")
print(input.shape)
output = m(input)
print(output.shape)
# non-square kernels and unequal stride and with padding
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
input = torch.randn(20, 16, 50, 100)
print("2:")
print(input.shape)
output = m(input)
print(output.shape)
# non-square kernels and unequal stride and with padding and dilation
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
input = torch.randn(20, 16, 50, 100)
print("3:")
print(input.shape)
output = m(input)
print(output.shape)
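Output:
1:
torch.Size([20, 16, 50, 100])
torch.Size([20, 33, 24, 49])
2:
torch.Size([20, 16, 50, 100])
torch.Size([20, 33, 28, 100])
3:
torch.Size([20, 16, 50, 100])
torch.Size([20, 33, 26, 100])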
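Calculation process:
Example 1 (kernel 3, stride 2, no padding): height = (50 - 3) // 2 + 1 = 24 and width = (100 - 3) // 2 + 1 = 49, giving (20, 33, 24, 49).
Example 2 (kernel (3, 5), stride (2, 1), padding (4, 2)): padding is added on both sides, so height = (50 + 2 * 4 - 3) // 2 + 1 = 28 and width = (100 + 2 * 2 - 5) // 1 + 1 = 100, giving (20, 33, 28, 100).
Example 3 (same as example 2, plus dilation (3, 1)): the dilated kernel spans 3 * (3 - 1) + 1 = 7 rows, so height = (50 + 2 * 4 - 7) // 2 + 1 = 26; the width stays at 100, giving (20, 33, 26, 100).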
3D Convolution Layer
PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html
import torch.nn as nn
import torch
# With square kernels and equal stride
m = nn.Conv3d(16, 33, 3, stride=2)
input = torch.randn(20, 16, 10, 50, 100)
print("1:")
print(input.shape)
output = m(input)
print(output.shape)
# non-square kernels and unequal stride and with padding
m = nn.Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0))
input = torch.randn(20, 16, 10, 50, 100)
print("2:")
print(input.shape)
output = m(input)
print(output.shape)
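Output:
1:
torch.Size([20, 16, 10, 50, 100])
torch.Size([20, 33, 4, 24, 49])
2:
torch.Size([20, 16, 10, 50, 100])
torch.Size([20, 33, 8, 50, 99])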
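Calculation process:
Example 1 (kernel 3, stride 2, no padding): depth = (10 - 3) // 2 + 1 = 4, height = (50 - 3) // 2 + 1 = 24, and width = (100 - 3) // 2 + 1 = 49, giving (20, 33, 4, 24, 49).
Example 2 (kernel (3, 5, 2), stride (2, 1, 1), padding (4, 2, 0)): depth = (10 + 2 * 4 - 3) // 2 + 1 = 8, height = (50 + 2 * 2 - 5) // 1 + 1 = 50, and width = (100 + 0 - 2) // 1 + 1 = 99, giving (20, 33, 8, 50, 99).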
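The spatial output shapes of the 2D and 3D examples can also be reproduced without running PyTorch. The following is a minimal sketch, assuming each spatial dimension is computed independently with the shape formula from the Conv2d and Conv3d documentation; `conv_output_shape` is a hypothetical helper name, not part of PyTorch.

```python
def conv_output_shape(in_size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output shape of an N-d convolution.

    in_size is a tuple of spatial sizes, e.g. (H, W) or (D, H, W).
    Integer arguments are applied to every spatial dimension.
    """
    n = len(in_size)

    def expand(value):
        # Turn a single int into one value per spatial dimension.
        return (value,) * n if isinstance(value, int) else tuple(value)

    kernel_size, stride, padding, dilation = map(
        expand, (kernel_size, stride, padding, dilation)
    )
    return tuple(
        (size + 2 * p - d * (k - 1) - 1) // s + 1
        for size, k, s, p, d in zip(in_size, kernel_size, stride, padding, dilation)
    )


# 2D examples: input spatial size (50, 100)
print(conv_output_shape((50, 100), 3, stride=2))                             # (24, 49)
print(conv_output_shape((50, 100), (3, 5), stride=(2, 1), padding=(4, 2)))   # (28, 100)
print(conv_output_shape((50, 100), (3, 5), stride=(2, 1), padding=(4, 2),
                        dilation=(3, 1)))                                    # (26, 100)

# 3D examples: input spatial size (10, 50, 100)
print(conv_output_shape((10, 50, 100), 3, stride=2))                         # (4, 24, 49)
print(conv_output_shape((10, 50, 100), (3, 5, 2), stride=(2, 1, 1),
                        padding=(4, 2, 0)))                                  # (8, 50, 99)
```

Prepending the batch size (20) and the number of output channels (33) to each tuple gives the full output tensor shape, which can be compared with `output.shape` from the corresponding example.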