A
relatively recent study proposed a SOM-based multi-layer architecture
named the convolutional SOM (CSOM). The method places convolutional
layers between SOM layers. The novelty was convolution-based feature
learning performed by the SOM: the SOM map was used as a learned
filter, followed by a convolutional layer that applied that filter.
Figure 11 shows a snapshot of filters generated by the SOM. The structure
was tested with two types of pooling layers: traditional max-pooling and
SOM-based pooling. The latter performed better on feature maps produced
by the learned SOM filters. The winner neurons were chosen using
Euclidean distance. An 8x8 SOM map was chosen for input images of
size 256x256; however, no justification was given for this choice of
map size.
FIGURE 11 Filter learned by SOM in the convolutional SOM (CSOM) method
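The paper's implementation details are not reproduced here, but the core idea can be illustrated with a minimal NumPy sketch (the patch size, iteration count, and decay schedules below are assumed values, not the authors' reported settings): a small SOM is trained on flattened image patches, and each neuron's weight vector is then reshaped into a convolution filter.

```python
import numpy as np

def train_som_filters(patches, map_size=8, n_iters=1000,
                      lr=0.5, sigma=2.0, seed=0):
    """Train a SOM on flattened image patches; each neuron's weight
    vector becomes one convolution filter (CSOM-style, simplified)."""
    rng = np.random.default_rng(seed)
    n_units = map_size * map_size
    dim = patches.shape[1]                      # flattened patch length
    weights = rng.normal(0.0, 0.1, (n_units, dim))
    # 2-D grid coordinates, used by the neighbourhood function
    grid = np.array([(i, j) for i in range(map_size)
                            for j in range(map_size)], dtype=float)
    for t in range(n_iters):
        x = patches[rng.integers(len(patches))]
        # winner (BMU) chosen by Euclidean distance, as in the paper
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        frac = t / n_iters                      # linear decay (assumed)
        lr_t = lr * (1.0 - frac)
        sigma_t = sigma * (1.0 - frac) + 0.5
        d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma_t ** 2))  # Gaussian neighbourhood
        weights += lr_t * h[:, None] * (x - weights)
    return weights                              # one row per learned filter

def convolve_with_filters(image, filters, k):
    """Valid convolution of a 2-D image with each learned k x k filter."""
    H, W = image.shape
    out = np.empty((len(filters), H - k + 1, W - k + 1))
    for f, w in enumerate(filters):
        kern = w.reshape(k, k)
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[f, i, j] = np.sum(image[i:i + k, j:j + k] * kern)
    return out
```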
Alexander and Ayanava proposed a
biologically plausible architecture named Resilient Self-Organizing
Tissue (ReST) that can be executed as a typical CNN. A continuous SOM
energy function was the core of the study. It was noted that the
traditional SOM offers no principled way to assess its convergence
state or to select model parameter values, whereas an energy function
provides a simple quality measure for both. With a continuous energy
function, stochastic gradient descent (SGD) can be extended to SOM
learning in a deep learning setting. Unlike in the traditional SOM,
the learning rate was kept constant over time. The map size KxK was
treated as a hyperparameter for K ∈ {10, 15, 20, 30, 50}, and a 10x10
map was chosen for varying input batch sizes N, where
N ∈ {1, 5, 10, 20, 50, 100}. A larger map or batch size would
significantly increase the training time.
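The exact ReST energy is not reproduced in this survey; the sketch below assumes a Heskes-style continuous energy (the neighbourhood-weighted squared distance to all units, centred on each sample's BMU) to show how SGD applies to SOM learning. As in ReST, the learning rate is held constant, and the BMU assignment is treated as fixed while the gradient is taken.

```python
import numpy as np

def som_energy(weights, batch, grid, sigma):
    """Continuous SOM energy (Heskes-style, assumed form): for each
    sample, the neighbourhood-weighted squared distance to all units,
    centred on the sample's BMU."""
    energy = 0.0
    for x in batch:
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))    # Gaussian neighbourhood
        energy += np.sum(h * np.sum((x - weights) ** 2, axis=1))
    return energy / len(batch)

def sgd_step(weights, batch, grid, sigma, lr=0.1):
    """One mini-batch SGD step on the energy above. The learning rate
    is constant over time, as in ReST; each sample's BMU is treated as
    fixed while the gradient is taken."""
    grad = np.zeros_like(weights)
    for x in batch:
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        grad += -2.0 * h[:, None] * (x - weights)
    weights -= lr * grad / len(batch)
    return weights
```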
In an approach similar to CSOM,
two further architectures were proposed: Deep SOM (DSOM) and extended
DSOM (E-DSOM). The block diagrams of both architectures are shown in
Figure 12. In DSOM, each activation space yields a winner on a SOM map
(filter) during the SOM phase. The next layer is the sampling phase,
in which the feature map is formed: each node in the feature map is
the best-matching unit (BMU) of an activation space and is stored at
the position corresponding to that activation space. E-DSOM runs
multiple DSOM architectures in parallel and combines their feature
maps at the end. In the two-layered architecture, the output layer
produces a single SOM map, which is used as input to a classification
method. The map size varied from 4 to 24 for the first layer and from
14 to 16 for the second layer on the MNIST, GSAD, and SP-HAR datasets.
E-DSOM outperformed DSOM by up to 15% in classification accuracy while
saving up to 19% in training time. The downside is that the parallel
architecture requires more computational power.
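As a rough illustration of the sampling phase and of how E-DSOM combines parallel DSOM outputs, a simplified sketch follows (whether a BMU's index or its full weight vector is stored in the feature map is an implementation detail; the sketch stores the index):

```python
import numpy as np

def dsom_sampling(activation_spaces, som_weights):
    """DSOM sampling phase (simplified): each activation space is
    matched against the SOM map, and its BMU is written to the
    corresponding position of the feature map."""
    n_rows, n_cols, _ = activation_spaces.shape  # grid of flattened inputs
    feature_map = np.empty((n_rows, n_cols), dtype=np.int64)
    for i in range(n_rows):
        for j in range(n_cols):
            x = activation_spaces[i, j]
            # BMU chosen by Euclidean distance
            feature_map[i, j] = np.argmin(
                np.linalg.norm(som_weights - x, axis=1))
    return feature_map

def edsom_forward(activation_spaces, som_weight_list):
    """E-DSOM (simplified): several independent SOMs process the same
    activation spaces in parallel; their feature maps are stacked,
    one channel per parallel DSOM."""
    maps = [dsom_sampling(activation_spaces, w) for w in som_weight_list]
    return np.stack(maps, axis=0)
```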