(i) Convolutional layer:

The convolutional layers are characterized by the use of parameterized filters, usually of size 3x3 or 5x5, which slide across the image. For each region of the image covered by the filter, a value is calculated by the filter's convolution operation. The advantage of this process is that it reduces the number of parameters across the network while keeping the information of nearby voxels. After the filters have passed over the image, a feature map is generated for each filter.
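As a minimal sketch of this sliding-filter operation, the following pure-Python function computes a feature map by moving a 3x3 filter across a small 2D image (the image and filter values are illustrative, not taken from the text):

```python
def conv2d(image, kernel):
    """Slide a 2D kernel over a 2D image (valid cross-correlation):
    for each position, multiply overlapping values and sum them."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
edge = [[1, 0, -1],
        [1, 0, -1],
        [1, 0, -1]]  # a simple vertical-edge filter

feature_map = conv2d(image, edge)  # 2x2 feature map from a 4x4 input
```

Note that the same 9 filter weights are reused at every position, which is exactly why a convolutional layer needs far fewer parameters than a fully connected layer over the same image.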

(ii) Activation layer:

The feature maps from a convolutional layer are fed through nonlinear activation functions. The activation functions are generally very simple rectified linear units, or ReLUs, defined as ReLU(z) = max(0, z), or variants such as leaky ReLUs or parametric ReLUs. Feeding the feature maps through an activation function produces new feature maps.
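The ReLU definition above, and the leaky variant mentioned alongside it, can be sketched directly (the feature-map values and the leak slope 0.01 are illustrative assumptions):

```python
def relu(z):
    """ReLU(z) = max(0, z): negative responses are clipped to zero."""
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU keeps a small slope alpha for negative inputs
    instead of zeroing them out entirely."""
    return z if z > 0 else alpha * z

feature_map = [[-6.0, 12.0], [-4.0, 7.0]]
activated = [[relu(z) for z in row] for row in feature_map]
# the negative entries become 0.0; positive entries pass through unchanged
```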

(iii) Pooling layers:

The output feature maps can be sensitive to the location of the features in the input. To overcome this problem, pooling layers have been introduced. Pooling layers select specific values on the feature maps and pass them on to the subsequent layers. This has the effect of making the resulting down-sampled feature maps more robust to changes in the position of the feature in the image, referred to by the technical phrase "local translation invariance." Two common pooling methods are average pooling and max pooling, which summarize the average presence of a feature and the most activated presence of a feature, respectively. A different way of getting the downsampling effect of pooling is to use convolutions with increased stride lengths (the amount by which the filter shifts). Removing the pooling layers simplifies the network architecture without necessarily sacrificing performance.
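Both pooling variants can be sketched with one function that slides a window over a feature map and summarizes each window (the 2x2 window, stride 2, and the feature-map values are illustrative assumptions):

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Down-sample a 2D feature map: take the max (or average) of each
    size x size window, moving the window by `stride` each step."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [fmap[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 1],
        [4, 6, 5, 0],
        [2, 1, 9, 8],
        [0, 2, 7, 4]]
max_pooled = pool2d(fmap)             # strongest activation per window
avg_pooled = pool2d(fmap, mode="avg") # average presence per window
```

Shifting a feature by one pixel inside a window often leaves the pooled value unchanged, which is the "local translation invariance" described above.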

(iv) Dropout regularization:

Dropout is used in the network to help avoid overfitting. If the weights of the network are tuned too closely to the training examples they are given, the network does not perform well when given new examples.
In a dropout layer, a random subset of activations is "dropped out" by setting them to zero. By randomly removing neurons during training, one ends up using a slightly different network for each batch of training data, and the weights of the trained network are tuned based on the optimization of multiple variations of the network. This prevents the network from becoming too "fitted" to the training data and thus helps alleviate the overfitting problem. An important note is that this layer is only active during training, not at test time.
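A common way to implement this is "inverted" dropout, sketched below: during training each activation is zeroed with probability p and the survivors are scaled by 1/(1 - p) so that expected activations match test time, when the layer does nothing (the activation values and p = 0.5 are illustrative assumptions):

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability p and scale survivors by 1/(1 - p); at test time the
    layer is an identity, as described in the text."""
    if not training:
        return list(activations)
    rng = rng or random.Random()
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

acts = [0.5, 1.2, -0.3, 2.0]
train_out = dropout(acts, p=0.5, rng=random.Random(0))  # seeded for repeatability
test_out = dropout(acts, training=False)                # unchanged at test time
```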

(v) Batch normalization:

These layers are typically placed after activation layers, producing normalized activation maps by subtracting the mean and dividing by the standard deviation for each training batch. Forcing the activations back to zero mean and unit standard deviation each time a batch passes through these layers acts as a regularizer for the network, speeds up training, and makes it less dependent on careful parameter initialization.
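The subtract-the-mean, divide-by-the-standard-deviation step can be sketched for a batch of scalar activations; the small epsilon guards against division by zero, and gamma/beta stand in for batch normalization's learnable scale and shift (the batch values are illustrative assumptions):

```python
def batch_norm(batch, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize a batch of activations to zero mean and (near) unit
    standard deviation, then apply the learnable scale and shift."""
    m = sum(batch) / len(batch)
    var = sum((x - m) ** 2 for x in batch) / len(batch)
    return [gamma * (x - m) / (var + eps) ** 0.5 + beta for x in batch]

batch = [1.0, 2.0, 3.0, 4.0]
normed = batch_norm(batch)
mean = sum(normed) / len(normed)                 # approximately 0
var = sum(x ** 2 for x in normed) / len(normed)  # approximately 1
```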

CNN architectures

CNNs can be designed using different combinations of all these components; the choice depends on the task to be solved, the complexity of the data, and so on. Specific combinations have been studied and are known by name, for example LeNet, AlexNet, and ResNet [Table and article].
These neural networks are typically implemented in one of a small number of software frameworks that dominate machine learning research, all built on top of NVIDIA's CUDA platform and the cuDNN library. Today's deep learning methods are almost exclusively implemented in either TensorFlow, a framework originating from Google Research; Keras, a deep learning library originally built by François Chollet and recently incorporated into TensorFlow; or PyTorch, a framework associated with Facebook Research.