FIGURE 4 The 400 filters in the first layer of a sparse RBM trained on the van Hateren raw image dataset
The convolutional RBM (CRBM) was
proposed to tackle the curse of dimensionality and alleviate the
computational cost. Another reason for adopting the CRBM is that it
preserves the 2D structure of images, which is lost in the traditional
RBM-DBN. In the CRBM, the weights between the hidden and visible layers
are shared across the input image. In one study, multiple CRBMs stacked
one over another with max-pooling layers formed a convolutional DBN
(CDBN). The filter size was chosen as 10x10 for both layers, with the
first layer having 24 groups (bases) and the second layer 100. The
visualization of the filters for both layers is shown in Figure 5. It
was observed that the features learned in deeper layers were
higher-level and more specific to a particular object category. Based on
the CRBM, a deconvolutional network was proposed that consists of only
the decoder section, unlike the traditional encoder-decoder architecture.
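The key idea of weight sharing in the CRBM can be illustrated with a minimal sketch (an assumed setup for illustration, not the authors' implementation): each hidden group is obtained by convolving the whole input image with one shared filter, so the parameter count is independent of the image size, and max-pooling then shrinks each hidden group's map.

```python
# Minimal sketch of CRBM-style weight sharing (illustrative only):
# each hidden group k is produced by convolving the whole input image
# with one shared filter W_k, so the number of parameters depends on
# the filter size and group count, not on the image size.
import numpy as np
from scipy.signal import correlate2d

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crbm_hidden_probs(image, filters, biases):
    """P(h=1|v) for each hidden group; filters has shape (K, fh, fw)."""
    return np.stack([
        sigmoid(correlate2d(image, W, mode="valid") + b)
        for W, b in zip(filters, biases)
    ])

def max_pool(maps, size=2):
    """Non-overlapping max-pooling over each hidden group's map."""
    K, H, W = maps.shape
    H2, W2 = H - H % size, W - W % size
    m = maps[:, :H2, :W2].reshape(K, H2 // size, size, W2 // size, size)
    return m.max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
filters = rng.standard_normal((24, 10, 10)) * 0.01  # 24 groups, 10x10 filters
hidden = crbm_hidden_probs(image, filters, np.zeros(24))
pooled = max_pool(hidden)
print(hidden.shape, pooled.shape)  # (24, 23, 23) (24, 11, 11)
```

Because the same 10x10 filter slides over the entire image, the hidden maps retain the 2D layout of the input, which is the structural property the CRBM is designed to preserve.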
For rich feature learning, sparse
decomposition with a convex \(l_{1}\) sparsity term was applied over the
whole input image rather than over small image patches. During learning,
the cost function was minimized over both the filters and the feature
maps. The process proceeds layer-wise, starting from the first layer,
with 9, 45, and 150 filters in the first three layers, respectively.
Notably, the filters from the first layer resemble the output of Gabor
filters. The filter size was not discussed; however, it is mentioned
that a small size was chosen to avoid a slow implementation of the
proposed method. For larger filter sizes, FFT-based convolution was
recommended over spatial convolution for faster processing.
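The recommendation of FFT over spatial convolution for large filters can be sketched as follows (assumed example sizes, not from the paper): spatial convolution of an N x N image with an F x F filter costs on the order of N^2 F^2 operations, while FFT-based convolution costs roughly N^2 log N regardless of F, and both compute the same result.

```python
# Sketch of FFT-based vs. spatial convolution for a large filter
# (assumed sizes for illustration): the two methods agree up to
# floating-point error, but the FFT route's cost does not grow with
# the filter area, which is why it wins for large filters.
import numpy as np
from scipy.signal import convolve2d, fftconvolve

rng = np.random.default_rng(1)
image = rng.standard_normal((128, 128))
big_filter = rng.standard_normal((31, 31))  # large filter favors FFT

spatial = convolve2d(image, big_filter, mode="full")   # O(N^2 * F^2)
spectral = fftconvolve(image, big_filter, mode="full")  # O(N^2 log N)

# Both methods compute the same convolution up to floating-point error.
print(np.allclose(spatial, spectral, atol=1e-8))
```

For small filters the spatial route can still be faster in practice, since the FFT has fixed overhead (forward and inverse transforms plus padding), which is consistent with the paper's choice of small filters for its direct implementation.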