The KNet CNN architecture discussed here is built from the following elements: (1) an input layer, (2) a convolutional layer, (3) a nonlinearity function, (4) a pooling layer, (5) a fully connected layer, (6) a dropout layer, and (7) a softmax layer. The model has two convolutional layers; the elements listed above are described in turn below.
The input layer takes in a 28 x 28 image as a matrix, with pixel values normalized between 0 and 1.
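As an illustration, the input preparation might look as follows; this is a minimal sketch assuming a TensorFlow/Keras implementation, which the text does not specify.

```python
import tensorflow as tf

# Load MNIST: 28 x 28 grayscale digit images with integer labels 0-9.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1] and add a channel axis -> shape (N, 28, 28, 1).
x_train = (x_train.astype("float32") / 255.0)[..., None]
x_test = (x_test.astype("float32") / 255.0)[..., None]
```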
The convolution layer is abbreviated as Conv. In this layer, kernels applied to the input image are learned. Its description includes the number of channels, the kernel spatial extent (kernel size), the padding, and the stride. KNet uses a kernel size (receptive field) of 5x5 with a stride of 1 pixel and zero padding ('SAME' padding). The first convolutional layer computes 32 features for each 5x5 patch; the second computes 64 features for each 5x5 patch.
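A minimal sketch of these two convolution layers, assuming a TensorFlow/Keras implementation (the framework is not named in the text):

```python
import tensorflow as tf

# First convolution: 32 features per 5x5 patch, stride 1, zero ('SAME') padding.
conv1 = tf.keras.layers.Conv2D(filters=32, kernel_size=5, strides=1, padding="same")

# Second convolution: 64 features per 5x5 patch, same kernel size, stride, and padding.
conv2 = tf.keras.layers.Conv2D(filters=64, kernel_size=5, strides=1, padding="same")
```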
ReLU stands for rectified linear unit and introduces nonlinearity into the model. It is the activation function used between the layers of the CNN (applied after each convolution operation). The ReLU function is shown in Figure 3.0.
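The ReLU simply clips negative values to zero, element-wise:
\[
\mathrm{ReLU}(x) = \max(0, x).
\]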
The pooling layer is abbreviated as Pool. Pooling is a subsampling procedure that reduces the dimensionality of each feature map and, at the same time, the number of parameters required by the model. Although it discards a lot of information from the image, it helps against overfitting. KNet uses max pooling (taking the largest element in each window) with a 2x2 kernel and a 2x2 stride, so each pooling layer reduces the spatial size of its input by a factor of 2. Zero padding ('SAME') is used here as well.
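A sketch of the corresponding pooling operation, again assuming TensorFlow/Keras:

```python
import tensorflow as tf

# Max pooling with a 2x2 window, a 2x2 stride, and zero ('SAME') padding.
# Each pooling layer halves the spatial dimensions: 28x28 -> 14x14 -> 7x7.
pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding="same")
```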
The fully connected layer is abbreviated as FC. It is more or less a traditional multilayer perceptron and uses a softmax function at the output. Every neuron in the previous layer is connected to every neuron in the next layer. It uses the learned features to classify the input image into the various classes defined by the training dataset.
The dropout layer is abbreviated as Drop. Dropout is a technique for improving the generalization of deep learning models. During training, it randomly sets the activations of a certain percentage of nodes in the network to 0. In KNet, the dropout percentage is set to 50% in the dropout layer.
The softmax function is denoted \(\sigma(z)_j\).
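For a length-\(K\) logit vector \(z\), it is defined as
\[
\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K,
\]
with \(K = 10\) in KNet.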
Finally, KNet is trained on the MNIST dataset, a handwritten digit recognition problem with 10 classes. The last fully connected layer outputs a length-10 vector for every input image, and the softmax layer converts this vector into estimated posterior probabilities for the 10 classes (i.e. digits 0 through 9).
The softmax cross-entropy loss is then defined, and the Adam optimizer is used to perform training with a learning rate of 0.001, running for 5000 epochs with a batch size of 50. The kernel weights are initialized randomly from a truncated normal distribution with mean 0 and standard deviation 0.1.
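Putting the pieces together, the following is a minimal end-to-end training sketch under the assumption of a TensorFlow/Keras implementation; the framework, the 1024-unit hidden layer, and the second dense layer are assumptions, while the kernel sizes, channel counts, dropout rate, initializer, optimizer, learning rate, batch size, and epoch count come from the text. The softmax is applied inside the loss, matching the softmax cross-entropy formulation.

```python
import tensorflow as tf

# Kernel initializer stated in the text: truncated normal, mean 0, std 0.1.
init = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=0.1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu",
                           kernel_initializer=init),
    tf.keras.layers.MaxPooling2D(2, 2, padding="same"),
    tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu",
                           kernel_initializer=init),
    tf.keras.layers.MaxPooling2D(2, 2, padding="same"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation="relu",
                          kernel_initializer=init),      # hidden width: assumption
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, kernel_initializer=init),  # logits for 10 classes
])

# Softmax cross entropy (softmax applied inside the loss) and Adam with lr 0.001.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = (x_train.astype("float32") / 255.0)[..., None]

# Batch size of 50 and 5000 epochs, as reported in the text.
model.fit(x_train, y_train, batch_size=50, epochs=5000)
```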