1 | INTRODUCTION
The convolutional neural network (CNN) has become the leading method for applying neural networks to computer vision tasks. The progression from the basic five-layer CNN, LeNet-5, to deeper and more complex models such as AlexNet (8 layers), VGG (11-19 layers), GoogLeNet (22 layers), and ResNet (152 layers) has achieved superior accuracy in real-life applications. Such models are called deep CNNs (DCNNs): combinations of a deep learning structure with CNN operations. Even though a DCNN is built from well-defined mathematical operations, its depth, structural complexity, and nonlinear activation functions mean that its inner workings remain a black box.
The fundamental constituents of a DCNN are filters, activation functions, and classifiers. The structure divides into two major sections: feature extraction and classification. Filters, often called weights, perform the actual learning, and it is critical to understand the exact end-to-end processing within them. In every layer of a DCNN, different filters convolve with that layer's input to extract features. Activation functions control the dynamics of information flow from one layer to the next. A standard machine-learning classifier then categorizes the classes based on the features learned through convolution. It is more accurate to say that the principal design attributes of filters are unknown, along with the choice of training algorithm and activation function. The interrelation of filter hyper-parameters, training algorithms, and activation functions lacks a concrete theoretical foundation for DCNNs.
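The two-stage structure described above can be sketched concretely: filters convolve with the input and an activation function gates the result (feature extraction), after which a classifier scores the learned features (classification). The following NumPy sketch is illustrative only; the image size, the hand-picked edge-detection filter, and the untrained linear classifier are assumptions for demonstration, not part of any specific DCNN.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation of a single-channel image with one filter."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value is the filter's weighted sum over one image patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Nonlinear activation controlling which features pass to the next stage."""
    return np.maximum(x, 0.0)

# Toy pipeline (all values assumed for illustration):
rng = np.random.default_rng(0)
image = rng.random((8, 8))                       # stand-in for an input image
kernel = np.array([[1.0, 0.0, -1.0],             # hand-picked vertical-edge filter;
                   [1.0, 0.0, -1.0],             # in a real DCNN these weights are learned
                   [1.0, 0.0, -1.0]])
features = relu(conv2d(image, kernel))           # feature-extraction section
weights = rng.random(features.size)              # untrained linear classifier weights
score = float(features.ravel() @ weights)        # classification section: one class score
```

In a trained DCNN the same computation repeats across many layers and many filters per layer, with the filter values and classifier weights set by the training algorithm rather than by hand.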