FIGURE 10 Filters trained by K-means (left) and convolutional K-means (right) on the STL-10 dataset
Convolutional K-means was claimed to significantly reduce the redundancy among centroids compared with classical K-means. In convolutional K-means, windows twice the filter size are randomly selected from the input images. Each centroid (filter) is convolved over the entire window to obtain a similarity score at every location. The patch with the largest activation within the window is considered closest to that centroid, and the patch at that location is assigned to the matching centroid. The filter size and the number of filters were treated as hyperparameters for optimal performance. With the filter size fixed at 11x11, the number of filters was varied, and increasing the number of filters improved performance. Similarly, with the number of filters fixed at 96, the filter size was varied and the results compared. In both cases, the filters learned through convolutional K-means consistently outperformed those learned by classical K-means. However, the method is not fully unsupervised, since a backpropagation-trained weight matrix was used to learn the connections.
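The assignment step described above can be illustrated with a short NumPy sketch. The function and variable names (conv_kmeans_assign, update_centroids, windows, centroids) are assumptions for illustration only, not the authors' implementation; preprocessing steps such as whitening and the normalization schedule are omitted.

```python
import numpy as np

def conv_kmeans_assign(windows, centroids):
    """Assign, for each window, its best-matching patch to the closest centroid.

    windows   : (n_windows, w, w) randomly sampled image windows, w = 2 * f
                (window is twice the filter size).
    centroids : (k, f, f) current K-means filters.
    Returns a dict mapping centroid index -> list of (f, f) patches.
    """
    k, f, _ = centroids.shape
    assignments = {j: [] for j in range(k)}
    for win in windows:
        w = win.shape[0]
        best_score, best_patch, best_centroid = -np.inf, None, None
        # Slide each centroid over every location in the window (valid convolution)
        for j in range(k):
            for r in range(w - f + 1):
                for c in range(w - f + 1):
                    patch = win[r:r + f, c:c + f]
                    score = np.sum(patch * centroids[j])  # similarity (dot product)
                    if score > best_score:
                        best_score, best_patch, best_centroid = score, patch, j
        # The patch with the largest activation is assigned to the matching centroid
        assignments[best_centroid].append(best_patch)
    return assignments

def update_centroids(assignments, centroids):
    """Recompute each centroid as the normalized mean of its assigned patches."""
    new_centroids = centroids.copy()
    for j, patches in assignments.items():
        if patches:
            m = np.mean(patches, axis=0)
            new_centroids[j] = m / (np.linalg.norm(m) + 1e-8)
    return new_centroids
```

Alternating these two steps mimics the standard K-means loop, except that the assignment searches over all spatial offsets inside each window, which is what discourages near-duplicate, shifted copies of the same filter.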
4.2.2 | Self-organizing Map (SOM)
The self-organizing map (SOM) is another widely used clustering method. Like K-means, SOM is based on distance measurement; however, it can be interpreted as a constrained K-means method. A traditional SOM learns and clusters similar attributes on a 2D map. The Euclidean distance is measured between the input data and every neuron on the map, and the node with the minimum distance is the best matching unit (BMU). Unlike K-means, the neighboring neurons of the BMU are also updated, so that the whole map comes to represent the learned model. Although SOM is a simple and powerful clustering technique, its major drawback is its shallow structure, which may not be sufficient to learn information-rich features from large datasets, particularly images.
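A single SOM update can be sketched as follows. This is a minimal illustration, not a specific published implementation; the Gaussian neighborhood, the fixed learning rate, and the parameter names (lr, sigma) are assumptions, and the usual decay schedules for both are omitted.

```python
import numpy as np

def som_step(weights, x, lr=0.1, sigma=1.0):
    """One SOM update: find the BMU and pull it and its neighbors toward x.

    weights : (rows, cols, d) map of neuron weight vectors.
    x       : (d,) input sample.
    """
    rows, cols, _ = weights.shape
    # Euclidean distance from the input to every neuron on the map
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)  # best matching unit
    # Gaussian neighborhood around the BMU on the 2D grid
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid_dist2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))[..., None]
    # Unlike K-means, neighboring neurons are also moved toward the input
    weights += lr * h * (x - weights)
    return weights, bmu
```

The neighborhood term h is what distinguishes SOM from K-means: with sigma shrunk to zero only the BMU would move, recovering an online K-means update, whereas a wider neighborhood forces nearby map units to encode similar inputs.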