FIGURE 12 DSOM (left) and E-DSOM (right). SOMs act as filters
When a traditional SOM is used as a filter and convolved over the input
image, it was observed that the SOM tries to best fit the dataset as a
whole, which may lead to poor performance. To overcome this issue, a
Hebbian learning-based masking layer was multiplied with the input
patches before convolution with the SOM maps (filters). The two-layered
architecture used SOM maps of size 10x10x1 and 16x16x1 for the first and
second layers, respectively, while in the three-layered architecture the
layer-wise SOM map sizes were 12x12x3, 14x14x3, and 16x16x3. A
three-layer MLP classifier was applied on top of the trained SOM maps.
It was also observed that more than one filter is required to learn
sufficiently distinct features, and that multiple smaller maps perform
better than fewer large maps. Valued-SOM (VSOM) was proposed as an
improved version of DSOM, introducing a Lethe term at each output
neuron. The purpose of this mechanism is to add a supervision mechanism
for self-labeling the clusters created by DSOM; the filter design,
however, was borrowed from the original DSOM. Kosmas et al. proposed the
Dendritic-S method, which uses SOMs as feature-extraction filters
followed by a hit matrix for labeling. Cosine similarity was applied and
compared with the Euclidean distance for determining the BMU, and
accuracy improved by nearly 20% in the experiments.
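As a rough illustration of this filtering scheme (not taken from the cited papers), the sketch below slides a trained SOM map over an image, multiplies each patch by a Hebbian-style mask before matching, and selects the BMU either by Euclidean distance or by cosine similarity. The map size, patch size, mask, and random "trained" weights are illustrative assumptions.

```python
import numpy as np

def extract_patches(image, patch_size, stride=1):
    """Slide a window over a 2D image and return flattened patches."""
    h, w = image.shape
    patches = []
    for i in range(0, h - patch_size + 1, stride):
        for j in range(0, w - patch_size + 1, stride):
            patches.append(image[i:i + patch_size, j:j + patch_size].ravel())
    return np.array(patches)

def bmu_euclidean(patch, som_weights):
    """Index of the SOM neuron with the smallest Euclidean distance."""
    return np.argmin(np.linalg.norm(som_weights - patch, axis=1))

def bmu_cosine(patch, som_weights):
    """Index of the SOM neuron with the largest cosine similarity."""
    eps = 1e-12
    sims = (som_weights @ patch) / (
        np.linalg.norm(som_weights, axis=1) * np.linalg.norm(patch) + eps)
    return np.argmax(sims)

# --- illustrative setup (assumed sizes, not the reference configuration) ---
rng = np.random.default_rng(0)
image = rng.random((28, 28))                           # toy input image
patch_size = 5
som_weights = rng.random((10 * 10, patch_size ** 2))   # "trained" 10x10 SOM map
hebb_mask = rng.random(patch_size ** 2)                # Hebbian-style mask (assumed)

patches = extract_patches(image, patch_size)
masked = patches * hebb_mask                           # mask patches before matching

# One BMU index per patch location, under the two distance measures
bmu_map_euc = np.array([bmu_euclidean(p, som_weights) for p in masked])
bmu_map_cos = np.array([bmu_cosine(p, som_weights) for p in masked])
print(bmu_map_euc.shape, (bmu_map_euc == bmu_map_cos).mean())
```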
SOM maps are typically one- or two-dimensional, but a 3D SOM map used as
a filter was introduced in a model named deep convolutional
self-organizing map (DCSOM). A total of 256 nodes was arranged in
various dimensionalities ranging from 1D to 6D, and the 4D map
(4x4x4x4) was noted as the optimum size, balancing performance and
complexity; map dimensions higher than four resulted in overfitting.
Among input patch sizes from 3x3 to 15x15, a 5x5 patch was found
optimal. The other focus of the research was the neighborhood radius of
the SOM map and the batch learning technique. The two-layered
convolutional model was followed by a block-wise histogram for feature
representation.
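A minimal sketch of such a block-wise histogram representation is given below; the grid size, block size, and node count are assumptions for illustration, not the exact DCSOM configuration. The BMU index computed at every patch position is aggregated into one histogram per spatial block, and the histograms are concatenated into a feature vector.

```python
import numpy as np

def blockwise_histogram(bmu_indices, grid_shape, block, n_nodes):
    """Split the 2D map of BMU indices into blocks and histogram each block."""
    bmu_grid = bmu_indices.reshape(grid_shape)
    feats = []
    for i in range(0, grid_shape[0], block):
        for j in range(0, grid_shape[1], block):
            cell = bmu_grid[i:i + block, j:j + block].ravel()
            hist, _ = np.histogram(cell, bins=np.arange(n_nodes + 1))
            feats.append(hist)
    return np.concatenate(feats)

# Example: 24x24 grid of BMU indices from a 256-node map, 8x8 blocks (assumed)
rng = np.random.default_rng(1)
bmu_indices = rng.integers(0, 256, size=24 * 24)
features = blockwise_histogram(bmu_indices, (24, 24), block=8, n_nodes=256)
print(features.shape)   # 9 blocks x 256 bins -> (2304,)
```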
A computationally efficient Unsupervised-DSOM (UDSOM) method was
proposed in which the SOM maps are passed through a ReLU activation
after the learning phase. The aim is to remove neurons that are never
activated, that is, never become the BMU, which leads to fewer
connections. In the four-layered model, the layer-wise map sizes were
chosen as 10x10, 8x8, 6x6, and 4x4, and the smaller filters performed
better for higher-level features. A faster version of UDSOM was proposed
as G-UDSOM, which locates the BMUs for patches over different maps in
parallel. However, UDSOM was kept as the backbone architecture, and the
filter (map) sizes were not modified.
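One way to realize the pruning idea is sketched below; counting BMU hits over a set of patches is an assumption about how "never activated" could be detected and is not the exact UDSOM mechanism, which applies a ReLU to the learned maps.

```python
import numpy as np

def prune_inactive_neurons(som_weights, patches):
    """Drop SOM neurons that are never selected as BMU over the given patches."""
    hits = np.zeros(len(som_weights), dtype=int)
    for p in patches:
        bmu = np.argmin(np.linalg.norm(som_weights - p, axis=1))
        hits[bmu] += 1
    keep = hits > 0                      # neurons that fired at least once
    return som_weights[keep], keep

rng = np.random.default_rng(2)
som_weights = rng.random((100, 25))      # assumed 10x10 map, 5x5 patches
patches = rng.random((500, 25))
pruned, keep = prune_inactive_neurons(som_weights, patches)
print(len(pruned), "of", len(som_weights), "neurons kept")
```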
4.2.3 | Sub-space learning (SSL)
The application of sub-space learning to object identification has
gained much attention recently. The core of these proposed architectures
is principal component analysis (PCA), an unsupervised learning
technique mainly used for dimensionality reduction.
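For context, the snippet below is a bare-bones PCA reduction via the eigenvectors of the data covariance matrix, the same building block the SSL methods below reuse as convolutional filters; the data sizes are purely illustrative.

```python
import numpy as np

def pca_reduce(X, k):
    """Project zero-centred data onto the top-k eigenvectors of its covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # leading k components
    return Xc @ top, top

rng = np.random.default_rng(3)
X = rng.random((200, 64))            # 200 samples, 64-D features (assumed)
Z, components = pca_reduce(X, k=8)   # reduced to 8 dimensions
print(Z.shape, components.shape)     # (200, 8) (64, 8)
```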
Subspace approximation with augmented kernels (Saak) is an earlier
proposed SSL-based algorithm. Saak is a one-pass feed-forward method
that was proposed as a solution to the limitations of the older RECOS
(REctified-COrrelations on a Sphere) method, in which backpropagation
and the nonlinear ReLU activation function cause approximation and
rectification losses, respectively. The structure of the Saak method is
shown in Figure 13. Filters in Saak are generated by subspace
approximation using second-order statistics: they are the orthonormal
unit eigenvectors of the data covariance matrix, that is, a truncated
Karhunen-Loeve transform (KLT) or PCA. Such filters are generated
automatically from the dataset, and their size can be varied. Saak uses
non-overlapping convolution with filters and patches of size 2x2. ReLU
is a widely used activation function that truncates negative inputs to
zero, resulting in rectification loss. In Saak, every kernel is
therefore augmented with its negative counterpart; when the responses of
the original kernels and their augmented counterparts pass through ReLU,
the positive side survives in either the original or the augmented
channel, so no rectification loss occurs. Saak is an entirely new
methodology aimed at better interpretability of deep networks, but it
has some limitations because of the additional computation implied by
kernel augmentation.
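The following is a minimal sketch of the augmentation principle, assuming 2x2 non-overlapping patches and KLT/PCA kernels estimated from toy data; it illustrates why the signed response survives the ReLU, not the full multi-stage Saak transform.

```python
import numpy as np

def klt_kernels(patches):
    """Unit eigenvectors of the patch covariance matrix (KLT/PCA kernels)."""
    centred = patches - patches.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, np.argsort(eigvals)[::-1]].T    # rows are kernels

rng = np.random.default_rng(4)
patches = rng.random((1000, 4))              # flattened 2x2 patches (toy data)
kernels = klt_kernels(patches)               # shape (4, 4)

# Augment every kernel with its negative counterpart
augmented = np.vstack([kernels, -kernels])   # shape (8, 4)

responses = patches @ augmented.T            # correlate patches with all kernels
rectified = np.maximum(responses, 0.0)       # ReLU

# For each original kernel, either it or its negative survives the ReLU,
# so the signed response is recoverable: r = relu(r) - relu(-r).
recovered = rectified[:, :4] - rectified[:, 4:]
print(np.allclose(recovered, patches @ kernels.T))   # True -> no rectification loss
```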
Kuo et al. proposed an improved version of Saak named Saab (subspace
approximation with adjusted bias). Saab, a PCA variant, has
convolutional layers followed by an MLP. Typically, the bias term varies
across layers; in Saab, however, it is set for each layer to a constant
matching the most negative value of the input vector, so that all
responses become non-negative and the nonlinear activation function
becomes redundant. The convolutional filters were obtained from the
covariance matrix of bias-removed spatial-spectral cuboids and were
chosen to be of size 5x5. The leading convolutional filters generated by
PCA capture a large amount of the energy, and the captured energy
decreases for filters with higher indices. The higher the cross entropy,
the lower the discriminant power. After the convolutional layers, an MLP
was used for labeling, and its parameters were calculated using linear
least-squares regression (LSR), which was claimed as a novel approach to
self-labeling.
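The sketch below illustrates the two ingredients under the simplifying assumption of a single Saab-style stage on flattened patches with toy labels: the bias is a constant that shifts every filter response into the non-negative range (so a subsequent ReLU becomes a no-op), and a least-squares fit maps the resulting features to one-hot labels in place of the MLP whose parameters Saab derives in closed form.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((500, 25))            # flattened 5x5 patches (assumed)
labels = rng.integers(0, 10, size=500)        # toy class labels

# PCA filters from the covariance of the zero-mean input
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
filters = eigvecs[:, np.argsort(eigvals)[::-1]].T      # rows are filters

raw = Xc @ filters.T                          # filter responses, can be negative

# Constant bias: offset the most negative response so all outputs are >= 0,
# which makes a subsequent ReLU redundant.
bias = -raw.min()
features = raw + bias
assert (features >= 0).all()                  # ReLU(features) == features

# Linear least-squares regression (LSR) from features to one-hot labels
onehot = np.eye(10)[labels]
A = np.hstack([features, np.ones((len(features), 1))])   # add intercept column
W, *_ = np.linalg.lstsq(A, onehot, rcond=None)
pred = np.argmax(A @ W, axis=1)
print("training accuracy:", (pred == labels).mean())
```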