Max pooling is used in KNet; the pooling kernel size is always [1, 2, 2, 1] and the stride is always [1, 2, 2, 1]. Each pooling layer therefore halves the height and width of its input. We make use of zero padding here ('SAME').
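The [1, 2, 2, 1] kernel/stride notation matches the TensorFlow 1.x tf.nn.max_pool API, so a minimal sketch might look as follows (the 28x28x32 input shape is an illustrative placeholder, not necessarily KNet's):

```python
import tensorflow as tf  # TensorFlow 1.x style, matching the [1,2,2,1] notation

# Hypothetical input: a batch of 28x28 feature maps with 32 channels.
x = tf.placeholder(tf.float32, shape=[None, 28, 28, 32])

# Max pooling with kernel [1,2,2,1], stride [1,2,2,1], and 'SAME' padding:
# each 2x2 spatial window is reduced to its maximum, halving height and width.
pool = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                      padding='SAME')  # output shape: [None, 14, 14, 32]
```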
The fully connected layer is abbreviated as "FC". Its size is shown in the format \(n_1 \times n_2\), where \(n_1\) is the size of the input tensor and \(n_2\) is the size of the output tensor. Although \(n_1\) can be a triplet (such as \(7 \times 7 \times 512\)), \(n_2\) is always an integer.
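As a sketch of how an FC layer with a triplet input size is built, assuming TensorFlow 1.x and borrowing the \(7 \times 7 \times 512\) triplet from the example above (the output size 4096 is a hypothetical choice):

```python
import tensorflow as tf

# Hypothetical FC layer of size (7*7*512) x 4096: the 7x7x512 input tensor is
# flattened into a vector of length 25088 before the matrix multiplication.
feature_map = tf.placeholder(tf.float32, shape=[None, 7, 7, 512])
flat = tf.reshape(feature_map, [-1, 7 * 7 * 512])           # n1 = 7*7*512
W = tf.Variable(tf.truncated_normal([7 * 7 * 512, 4096], stddev=0.1))
b = tf.Variable(tf.zeros([4096]))
fc = tf.nn.relu(tf.matmul(flat, W) + b)                     # n2 = 4096
```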
The dropout layer is abbreviated as "Drop". Dropout is a technique to improve the generalization of deep learning methods. During training, it sets the weights connected to a randomly selected percentage of nodes in the network to 0. In KNet, the percentage is set to 0.5 in the single dropout layer.
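A minimal TensorFlow 1.x sketch of a dropout layer with the 0.5 rate mentioned above (the 4096-wide input is again a hypothetical placeholder):

```python
import tensorflow as tf

# Hypothetical dropout after an FC layer; keep_prob = 0.5 means each node's
# output is zeroed with probability 0.5 during training (TF 1.x semantics).
fc = tf.placeholder(tf.float32, shape=[None, 4096])
keep_prob = tf.placeholder(tf.float32)  # feed 0.5 when training, 1.0 at test time
drop = tf.nn.dropout(fc, keep_prob)     # survivors are scaled by 1/keep_prob
```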
The softmax layer is abbreviated as \(\sigma\). For an input vector \(z\), its \(j\)-th output is \(\sigma(z)_j = e^{z_j} / \sum_k e^{z_k}\), so the outputs are positive and sum to 1.
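For concreteness, a direct NumPy implementation of this formula (a standalone sketch, not taken from KNet's code):

```python
import numpy as np

# Direct implementation of the softmax formula above; subtracting the max is a
# standard trick for numerical stability and does not change the result.
def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # -> [0.659, 0.242, 0.099]
```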
Finally, KNet is trained on the MNIST dataset, a handwritten digit recognition problem with 10 classes. The last fully connected layer (\(4096 \times 10\)) outputs a length-10 vector for every input image, and the softmax layer converts this length-10 vector into the estimated posterior probabilities for the 10 classes (i.e., digits 0 through 9).
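Putting the last two stages together, a sketch of the final \(4096 \times 10\) FC layer followed by softmax, again assuming a TensorFlow 1.x style (variable names are hypothetical):

```python
import tensorflow as tf

# Hypothetical final classification stage: the last FC layer (4096 x 10)
# followed by softmax, which maps the 10 logits to class probabilities.
fc7 = tf.placeholder(tf.float32, shape=[None, 4096])   # previous layer's output
W = tf.Variable(tf.truncated_normal([4096, 10], stddev=0.1))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(fc7, W) + b    # length-10 vector for every input image
probs = tf.nn.softmax(logits)     # estimated posteriors for digits 0 through 9
```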