Understanding Convolutional Layers in PyTorch
To create a convolutional layer in PyTorch, you must first import the necessary modules:
import torch.nn as nn
import torch.nn.functional as F
Then, there is a two-part process to defining a convolutional layer and defining the feedforward behavior of a model (how an input moves through the layers of a network). You define a Model class and fill in two functions: __init__ and forward.
__init__
You can define a convolutional layer in the __init__ function by using the following format:
self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)
forward
Then, you refer to that layer in the forward function! Here, I am passing in an input image x and applying a ReLU function to the output of this layer (F refers to torch.nn.functional).
x = F.relu(self.conv1(x))
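Putting the two functions together, a minimal sketch of a complete model might look like the following. The class name Net, the channel counts, and the 32x32 input size are illustrative assumptions, not something fixed by PyTorch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):  # hypothetical class name, for illustration only
    def __init__(self):
        super().__init__()
        # 3 input channels (RGB), 16 filters, 3x3 kernel;
        # stride and padding are left at their defaults (1 and 0)
        self.conv1 = nn.Conv2d(3, 16, 3)

    def forward(self, x):
        # apply the convolution, then a ReLU activation
        x = F.relu(self.conv1(x))
        return x


model = Net()
batch = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)
out = model(batch)
print(out.shape)  # torch.Size([1, 16, 30, 30])
```

Calling model(batch) runs forward for you; the layer itself was created once, in __init__.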
Arguments
You must pass the following arguments:
- in_channels - The depth (number of channels) of the input, 3 for an RGB image, for example.
- out_channels - The number of output channels, i.e. the number of filtered “images” a convolutional layer produces, or the number of unique convolutional kernels that will be applied to an input.
- kernel_size - A number specifying both the height and width of a square convolutional kernel.
There are some additional, optional arguments that you might like to tune:
- stride - The stride of the convolution. If you don’t specify anything, stride is set to 1.
- padding - The border of 0’s around an input array. If you don’t specify anything, padding is set to 0.
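To see what the optional arguments do, the sketch below passes the same input through three layers that differ only in stride and padding. The 32x32 input size and the choice of 8 filters are assumptions made for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one RGB image, 32x32 (assumed size)

default_conv = nn.Conv2d(3, 8, 3)                       # stride=1, padding=0
padded_conv = nn.Conv2d(3, 8, 3, padding=1)             # border of zeros keeps 32x32
strided_conv = nn.Conv2d(3, 8, 3, stride=2, padding=1)  # stride 2 halves each side

default_shape = tuple(default_conv(x).shape)
padded_shape = tuple(padded_conv(x).shape)
strided_shape = tuple(strided_conv(x).shape)

print(default_shape)  # (1, 8, 30, 30)
print(padded_shape)   # (1, 8, 32, 32)
print(strided_shape)  # (1, 8, 16, 16)
```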
Pooling Layers
Pooling layers take in a kernel_size and a stride. Typically, both are set to the same value, which is the down-sampling factor. For example, the following code will down-sample an input’s x-y dimensions by a factor of 2:
self.pool = nn.MaxPool2d(2,2)
forward
Here, we see that pooling layer being applied in the forward function.
x = F.relu(self.conv1(x))
x = self.pool(x)
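As a small sketch of what max pooling does to the values themselves, the example below applies the same pooling layer to a hand-written 4x4 input (the numbers are made up for illustration). Each non-overlapping 2x2 window is replaced by its maximum.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)  # kernel_size=2, stride=2

# a single 4x4 "image" with one channel: shape (1, 1, 4, 4)
x = torch.tensor([[[[1., 2., 3., 4.],
                    [5., 6., 7., 8.],
                    [9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])

pooled = pool(x)
print(pooled.shape)  # torch.Size([1, 1, 2, 2])
print(pooled)        # values [[6, 8], [14, 16]]
```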
Convolutional Example #1
Say I’m constructing a CNN, and my input layer accepts grayscale images that are 200 by 200 pixels (corresponding to a 3D array with height 200, width 200, and depth 1). Then, say I’d like the next layer to be a convolutional layer with 16 filters, each filter having a width and height of 2. When performing the convolution, I’d like the filter to jump two pixels at a time. I also don’t want the filter to extend outside of the image boundaries; in other words, I don’t want to pad the image with zeros. Then, to construct this convolutional layer, I would use the following line of code:
self.conv1 = nn.Conv2d(1, 16, 2, stride=2)
Convolutional Example #2
Say I’d like the next layer in my CNN to be a convolutional layer that takes the layer constructed in Example 1 as input. Say I’d like my new layer to have 32 filters, each with a height and width of 3. When performing the convolution, I’d like the filter to jump 1 pixel at a time. I want this layer to have the same width and height as the input layer, and so I will pad accordingly. Then, to construct this convolutional layer, I would use the following line of code:
self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
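A quick way to check Examples 1 and 2 is to pass a dummy 200x200 grayscale image through both layers and print the shapes. This is only a sanity-check sketch; the random input stands in for a real image.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 16, 2, stride=2)    # Example 1
conv2 = nn.Conv2d(16, 32, 3, padding=1)  # Example 2

image = torch.randn(1, 1, 200, 200)  # one grayscale 200x200 image
out1 = conv1(image)
out2 = conv2(out1)

print(out1.shape)  # torch.Size([1, 16, 100, 100])
print(out2.shape)  # torch.Size([1, 32, 100, 100])
```

The stride of 2 in Example 1 halves the width and height, while the padding of 1 in Example 2 keeps them unchanged.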
Sequential Models
We can also create a CNN in PyTorch by using a Sequential wrapper in the __init__ function. Sequential allows us to stack different types of layers, specifying activation functions in between!
def __init__(self):
    super(ModelName, self).__init__()
    self.features = nn.Sequential(
        nn.Conv2d(1, 16, 2, stride=2),
        nn.MaxPool2d(2, 2),
        nn.ReLU(True),
        nn.Conv2d(16, 32, 3, padding=1),
        nn.MaxPool2d(2, 2),
        nn.ReLU(True)
    )
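For completeness, here is a sketch of how that Sequential block might sit inside a full class with a forward function. The class name SequentialNet and the 200x200 grayscale input are assumptions for illustration.

```python
import torch
import torch.nn as nn


class SequentialNet(nn.Module):  # hypothetical class name
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 2, stride=2),
            nn.MaxPool2d(2, 2),
            nn.ReLU(True),
            nn.Conv2d(16, 32, 3, padding=1),
            nn.MaxPool2d(2, 2),
            nn.ReLU(True),
        )

    def forward(self, x):
        # the Sequential container applies its layers in order
        return self.features(x)


net = SequentialNet()
seq_out = net(torch.randn(1, 1, 200, 200))
print(seq_out.shape)  # torch.Size([1, 32, 25, 25])
```

With Sequential, forward shrinks to a single call because the order of operations is already fixed by the order of the layers.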
Formula for number of parameters in a convolutional layer
The number of parameters in a convolutional layer depends on the supplied values of out_channels, kernel_size, and in_channels. Let’s define a few variables:
- K - the number of filters in the convolutional layer
- F - the height and width of the convolutional filters
- D_in - the depth of the previous layer
Notice that K = out_channels, and F = kernel_size. Likewise, D_in = in_channels, the depth of the layer’s input, typically 1 or 3 for an input image (grayscale and RGB, respectively).
Since there are F x F x D_in weights per filter, and the convolutional layer is composed of K filters, the total number of weights in the convolutional layer is K x F x F x D_in. Since there is one bias term per filter, the convolutional layer has K biases. Thus, the number of parameters in the convolutional layer is given by K x F x F x D_in + K.
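The parameter formula can be written as a small helper. The function name conv_param_count is hypothetical; the two calls plug in the values from Examples 1 and 2.

```python
def conv_param_count(k, f, d_in):
    """Number of parameters: K x F x F x D_in weights plus K biases."""
    weights = k * f * f * d_in
    biases = k
    return weights + biases


example1_params = conv_param_count(k=16, f=2, d_in=1)   # nn.Conv2d(1, 16, 2, stride=2)
example2_params = conv_param_count(k=32, f=3, d_in=16)  # nn.Conv2d(16, 32, 3, padding=1)

print(example1_params)  # 80
print(example2_params)  # 4640
```

If you have the layer itself, you can compare against sum(p.numel() for p in layer.parameters()), which counts the weights and biases PyTorch actually allocated.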
Formula: Shape of a Convolutional Layer
The shape of a convolutional layer depends on the supplied values of kernel_size, padding, and stride, and on the shape of its input. Let’s define a few variables:
- K - the number of filters in the convolutional layer
- F - the height and width of the convolutional filters
- S - the stride of the convolution
- P - the padding
- W_in - the width/height (square) of the previous layer
Notice that K = out_channels, F = kernel_size, S = stride, and P = padding. Likewise, W_in is the height and width of the input, which are the last two dimensions of a PyTorch image tensor shaped (batch, channels, height, width).
The depth of the convolutional layer will always equal the number of filters K.
The spatial dimensions of a convolutional layer can be calculated as: (W_in − F + 2P)/S + 1, rounding the division down when it is not exact.
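As a sketch, the shape formula translates directly into a helper function. The name conv_output_size is hypothetical, and the integer division reflects the assumption that a fractional result is rounded down, which is how nn.Conv2d behaves with its default dilation.

```python
def conv_output_size(w_in, f, s=1, p=0):
    """Spatial size (W_in - F + 2P) / S + 1, rounding the division down."""
    return (w_in - f + 2 * p) // s + 1


size_example1 = conv_output_size(200, 2, s=2)       # Example 1: 2x2 filter, stride 2
size_example2 = conv_output_size(100, 3, s=1, p=1)  # Example 2: 3x3 filter, padding 1

print(size_example1)  # 100
print(size_example2)  # 100
```

The same formula applies to a pooling layer if you treat its kernel_size as F and its stride as S.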
Flattening
Part of completing a CNN architecture is to flatten the eventual output of a series of convolutional and pooling layers, so that all of its values can be seen (as a vector) by a linear classification layer. At this step, it is imperative that you know exactly how many values are output by a layer.
As an exercise, consider an input image that is 130x130 (x, y) and 3 in depth (RGB). Say this image goes through the following layers in order:
nn.Conv2d(3, 10, 3)
nn.MaxPool2d(4, 4)
nn.Conv2d(10, 20, 5, padding=2)
nn.MaxPool2d(2, 2)
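One way to check your reasoning is to stack these four layers and let PyTorch report the shapes. The sketch below uses a random tensor in place of a real image, and torch.flatten with start_dim=1 so that the batch dimension is kept.

```python
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Conv2d(3, 10, 3),              # 130 -> 128 (no padding)
    nn.MaxPool2d(4, 4),               # 128 -> 32
    nn.Conv2d(10, 20, 5, padding=2),  # 32 -> 32 (padding preserves size)
    nn.MaxPool2d(2, 2),               # 32 -> 16
)

x = torch.randn(1, 3, 130, 130)  # one RGB image, 130x130
feature_maps = layers(x)
flat = torch.flatten(feature_maps, 1)  # flatten everything except the batch dimension

print(feature_maps.shape)  # torch.Size([1, 20, 16, 16])
print(flat.shape)          # torch.Size([1, 5120])
```

The flattened length, 20 x 16 x 16 = 5120, is the number of input features a following linear layer would need to accept.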