FIGURE 4 The 400 filters in the first layer of a sparse RBM trained on the van Hateren raw image dataset
The convolutional RBM (CRBM) was proposed to tackle the curse of dimensionality and to alleviate the computational cost. Another motivation for adopting the CRBM is that it preserves the 2D structure of images, which is lost in the traditional RBM-DBN. In a CRBM, the weights between the hidden and visible layers are shared across all locations of the input image. In one study, stacking multiple CRBMs with max-pooling between them yielded the convolutional DBN (CDBN). The filter size was chosen as 10x10 for both layers, with the first layer having 24 groups (bases) and the second layer 100. The filters of both layers are visualized in Figure 5. Features learned in the deeper layer were observed to be higher-level and more specific to particular object categories.

Building on the CRBM, a deconvolutional network was proposed that consists of only the decoder, unlike the traditional encoder-decoder architecture. For rich feature learning, sparse decomposition with a convex \(l_{1}\) sparsity term was applied over the whole input image rather than over small image patches. During learning, the cost function is minimized with respect to both the filters and the feature maps, proceeding layer by layer starting from the first layer. The numbers of filters were 9, 45, and 150 for the first three layers. Notably, the first-layer filters resemble the output of Gabor filters. The filter size was not specified; it was only noted to be small, to avoid a slow implementation of the proposed method. For larger filter sizes, FFT-based convolution was recommended over spatial convolution for speed.
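The deconvolutional-network cost minimized over filters and feature maps can be written, in a simplified single-image form consistent with the convex \(l_{1}\) term mentioned above (the notation here is a reconstruction for illustration, not the original paper's):

\[
C(y) \;=\; \frac{\lambda}{2}\,\Big\|\sum_{k=1}^{K} z_{k} \ast f_{k} \;-\; y\Big\|_{2}^{2} \;+\; \sum_{k=1}^{K} \|z_{k}\|_{1},
\]

where \(y\) is the whole input image, \(f_{k}\) are the \(K\) filters of a layer, \(z_{k}\) the corresponding feature maps, and \(\ast\) denotes 2D convolution. The quadratic term keeps the decoder's reconstruction faithful to the image, while the \(l_{1}\) term keeps the feature maps sparse; learning alternates between the \(z_{k}\) and the \(f_{k}\), layer by layer.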
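The weight sharing described above can be made concrete with a minimal sketch: a single CRBM filter group is one small filter correlated with every location of the visible image, followed by non-overlapping max-pooling as a deterministic stand-in for the probabilistic max-pooling used in the CDBN. All function names, the filter size, and the sigmoid activation below are illustrative assumptions, not the cited papers' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crbm_hidden_probs(v, w, b):
    """Hidden-unit activation probabilities for one CRBM filter group.

    v: (H, W) visible image; w: (fh, fw) shared filter; b: scalar bias.
    The same filter is applied at every image location ('valid' correlation),
    so the number of free weights is fh*fw regardless of the image size --
    this is the weight sharing that cuts the parameter count versus a
    fully connected RBM.
    """
    fh, fw = w.shape
    H, W = v.shape
    out = np.empty((H - fh + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(v[i:i + fh, j:j + fw] * w) + b
    return sigmoid(out)

def max_pool(h, p=2):
    """Non-overlapping p x p max-pooling over a hidden feature map
    (a deterministic simplification of CDBN probabilistic max-pooling)."""
    H, W = h.shape
    Hc, Wc = H - H % p, W - W % p
    h = h[:Hc, :Wc]
    return h.reshape(Hc // p, p, Wc // p, p).max(axis=(1, 3))
```

For example, a 12x12 image with a 5x5 filter gives an 8x8 hidden map, which 2x2 pooling reduces to 4x4; stacking a second CRBM on the pooled maps is what produces the CDBN.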
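The FFT recommendation rests on the convolution theorem: direct spatial convolution of an N x N image with a K x K kernel costs \(O(N^{2}K^{2})\), while the FFT route costs \(O(N^{2}\log N)\) independent of kernel size, so it pays off only for larger kernels. A minimal sketch of both routes (function names are ours, not from the source):

```python
import numpy as np

def conv2d_spatial(x, k):
    """Full 2D convolution by direct accumulation: O(N^2 * K^2).
    Each kernel tap k[i, j] contributes a shifted copy of x."""
    xh, xw = x.shape
    kh, kw = k.shape
    out = np.zeros((xh + kh - 1, xw + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + xh, j:j + xw] += k[i, j] * x
    return out

def conv2d_fft(x, k):
    """Same full convolution via the convolution theorem: pointwise
    multiply the zero-padded 2D FFTs, then invert. O(N^2 log N),
    independent of kernel size, so it wins for large kernels."""
    sh = (x.shape[0] + k.shape[0] - 1, x.shape[1] + k.shape[1] - 1)
    return np.real(np.fft.ifft2(np.fft.fft2(x, sh) * np.fft.fft2(k, sh)))
```

Both return the same result up to floating-point error; for the small filters used in the first layers, the spatial version's lower constant factor usually still wins.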