2.0 Literature Survey
The automatic recognition of handwritten digits can safely be regarded as a classical machine learning problem. Numerous methods, architectures, tricks, and techniques have been proposed, and good results have been achieved. Evidence of this can be found in the prevalence of different digit datasets used for benchmarking and analyzing such algorithms. Classical datasets commonly used in this area include the USPS (US Postal Service) dataset, the NIST (US National Institute of Standards and Technology) dataset, and its variants such as the MNIST (Modified NIST) and EMNIST (Extended MNIST) [1] datasets. The CIFAR-10 and CIFAR-100 [2] datasets, the STL-10 dataset [3], the Street View House Numbers (SVHN) dataset [4], and the CEDAR dataset are also widely used.
One of the early works utilizing a relatively modern technique on this subject was [5]. In [6], however, the authors studied the performance of different classifier algorithms on the MNIST database of handwritten digits. They discussed measures that affect algorithmic implementation, such as training time, run time, and memory requirements. In their work, they presented a baseline linear classifier, a nearest-neighbor classifier, a large fully connected multi-layer neural network, LeNet-1, LeNet-4, Boosted LeNet-4 (based on the idea of convolutional networks), a Tangent Distance Classifier (TDC), LeNet-4 with K-NN, local learning with LeNet-4, and an Optimal Margin Classifier (OMC). Of these, Boosted LeNet-4 achieved the best error performance, while LeNet-4 required the least memory. LeCun et al.'s paper proposed a ConvNet approach to the digit recognition problem, achieving 96% accuracy; the method involved considerable preprocessing, without which accuracy fell to 60%.
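To make the ConvNet idea concrete, the following is a minimal sketch of a LeNet-style convolutional network for 28x28 digit images such as MNIST, written in PyTorch; the layer sizes, activations, and pooling choices are illustrative assumptions and do not reproduce the exact LeNet-1 or LeNet-4 configurations compared in [6].

```python
# A minimal LeNet-style ConvNet sketch for 28x28 grayscale digits.
# Layer sizes are illustrative, not the exact LeNet-4 of [6].
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
            nn.ReLU(),
            nn.AvgPool2d(2),                            # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),            # 14x14 -> 10x10
            nn.ReLU(),
            nn.AvgPool2d(2),                            # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# A batch of 8 single-channel 28x28 images yields 8x10 class scores.
print(SmallConvNet()(torch.randn(8, 1, 28, 28)).shape)  # torch.Size([8, 10])
```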
Similarly, among the early works on handwritten digit recognition was [7]. The authors developed a backpropagation network constrained for digit recognition on zip code digits provided by the USPS. They aimed to show that a large backpropagation network can be applied to real image recognition tasks without extensive preprocessing. The method produced a 1% error rate and about a 9% reject rate. Work by Yawei Hou and Huailin Zhao utilized an improved backpropagation (BP) neural network for handwritten digit recognition; the authors claim that their results converged faster and that the classification results were more accurate than results reported at that time [8]. In [9], the authors showed that feedforward neural networks trained with the Extreme Learning Machine algorithm optimized their weights faster but required a larger number of hidden units to provide results comparable to a backpropagation-based algorithm. In [10], the authors presented a method for recognizing handwritten digits by fitting generative spline models, which are then tuned by an Expectation Maximization algorithm. While the method has its advantages, its main drawback is higher computational requirements compared to standard OCR techniques. Work by [11], on the other hand, involved the combination of classifiers for digit recognition. This work was based on the idea that the independent decisions of two high-performance nearest-neighbor hand-printed digit classifiers can be combined, whether by Bayesian combination, Dempster-Shafer evidential reasoning, or dynamic classifier selection, to obtain an improved digit classification system.
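As a rough illustration of the classifier-combination idea behind [11], the sketch below fuses the class posteriors of two nearest-neighbour classifiers with a simple sum rule on scikit-learn's small 8x8 digits; the classifiers, their settings, and the sum rule are our own assumptions and do not reproduce the Bayesian, Dempster-Shafer, or dynamic-selection schemes studied in [11].

```python
# Toy sketch: sum-rule fusion of two nearest-neighbour digit classifiers.
# Not the combination schemes of [11]; settings are illustrative only.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # 1797 8x8 digit images, flattened
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two k-NN classifiers differing in k and distance metric.
clf_a = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X_train, y_train)
clf_b = KNeighborsClassifier(n_neighbors=5, metric="manhattan").fit(X_train, y_train)

# Combine the posterior estimates by simple addition and take the argmax.
combined = clf_a.predict_proba(X_test) + clf_b.predict_proba(X_test)
accuracy = (combined.argmax(axis=1) == y_test).mean()
print(f"combined accuracy: {accuracy:.3f}")
```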
In [14], Loo-Nin Teow and Kia-Fock Loe presented a method based on biological vision for the automatic recognition of handwritten digits. They extracted linearly separable features from the MNIST dataset and used a linear discriminant system for recognition, with triowise linear support vector machines with soft voting yielding the best results. In 1999, [15] proposed contour information and Fourier descriptors for digit recognition. Models were built from contour features, and test digits were then analyzed by comparing their features with the built models. The recognition rate achieved was around 99.04%. In [16], a three-stage classifier was developed comprising two neural networks and one Support Vector Machine (SVM). The two neural networks in tandem provide a low misclassification rate, more complex features, and a well-balanced rejection criterion. The SVM was optimized to take the top classes ranked by the neural networks. The authors claimed that their work achieved competitive results at the time. In 2003, Cheng-Lin Liu et al. [12] summarized the performance of then state-of-the-art feature extraction and classifier techniques on three image databases: CEDAR, MNIST, and CENPARMI. In total, 10 feature vectors and 8 classifiers were combined to give 80 accuracy figures on the test sets used; the results can be found in [12]. Similar work by the same authors evaluated normalization methods and direction feature extraction techniques against existing methods useful in digit recognition [13].
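To illustrate the contour-based Fourier-descriptor idea used in [15], the sketch below computes simple descriptors from an ordered boundary contour by treating its points as complex numbers and keeping the low-frequency Fourier magnitudes; the normalization and number of coefficients are our own illustrative choices rather than the exact formulation in [15].

```python
# Sketch of contour Fourier descriptors; details are illustrative only.
import numpy as np

def fourier_descriptors(contour_xy: np.ndarray, n_coeffs: int = 10) -> np.ndarray:
    """contour_xy: (N, 2) boundary points in traversal order."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]  # encode (x, y) as complex numbers
    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                               # drop DC term -> translation invariance
    mags = np.abs(coeffs[1:n_coeffs + 1])
    return mags / (mags[0] + 1e-12)               # normalize -> rough scale invariance

# Example on a synthetic circular "contour" of 64 points.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
contour = np.stack([np.cos(t), np.sin(t)], axis=1)
print(fourier_descriptors(contour))
```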
Numerous other works, tricks, approaches, techniques, and systems can also be found on this subject, for instance the use of self-organizing maps [17] and shape matching [18]. In [19], a method was proposed that divides the image into grids and computes the Hu moments of each cell as features, with an artificial neural network as the classifier; the method yielded good processing times and accuracy. Methods such as Restricted Boltzmann Machines (RBMs) [28], SVM with inverse fringe features [29], echo state networks [30], Discrete Cosine S-Transform (DCST) features with an artificial neural network classifier [31], the Neural Dynamics Classification algorithm [32], and a Bat Algorithm-optimized SVM [33] have also been applied. Similarly, promising results from numerous algorithms have prompted extensions to other languages and scripts: Indian numerals were treated in [23], Persian digits in [24], Bangla digits in [25], Hindu and Arabic digits in [26], and Sindhi numerals in [27].
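As a sketch of the grid-based Hu-moment features described in [19], the code below divides a 28x28 digit image into a 4x4 grid and concatenates the log-scaled Hu moments of each cell into a 112-dimensional vector that could be fed to a neural-network classifier; the grid size, log scaling, and use of OpenCV are our own illustrative assumptions.

```python
# Sketch of grid-wise Hu-moment features; parameter choices are illustrative.
import cv2
import numpy as np

def grid_hu_features(image: np.ndarray, grid: int = 4) -> np.ndarray:
    """image: (28, 28) float32 array; returns a (grid*grid*7,) feature vector."""
    h, w = image.shape
    ch, cw = h // grid, w // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = image[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            hu = cv2.HuMoments(cv2.moments(cell)).ravel()
            # Log-scale the moments, which span many orders of magnitude.
            feats.append(-np.sign(hu) * np.log10(np.abs(hu) + 1e-30))
    return np.concatenate(feats)

print(grid_hu_features(np.random.rand(28, 28).astype(np.float32)).shape)  # (112,)
```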
Most recently, with the advent of powerful computational systems such as GPUs and TPUs, more solutions have been proposed, especially with deep learning. In [21], for instance, the authors made a case for online digit recognition using deep learning. They developed a software application to record a dataset which included user information such as age, sex, nationality, and handedness. They then presented 1D and 2D ConvNet models which obtained accuracies of 95.86% (using distance and angle features) and 98.50%, respectively. Unfortunately, while deep learning methods have yielded exceptional results, they have also empowered adversarial attacks. It was shown in [22] that changing a single pixel can lead to significant misclassification rates: the authors showed that 70.97% of natural images can be perturbed to at least one target class simply by modifying a single pixel, with 97.47% confidence on average.
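As a toy illustration of the single-pixel threat model, the sketch below brute-forces one-pixel changes against a simple linear classifier on scikit-learn's 8x8 digits and counts how many such changes flip the prediction; the actual attack in [22] uses differential evolution against deep networks, so this is only a conceptual demonstration under our own simplifying assumptions.

```python
# Toy one-pixel search against a linear classifier; not the attack of [22].
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)           # 8x8 digits, pixel values 0..16
clf = LogisticRegression(max_iter=2000).fit(X, y)

x = X[0].copy()
original = clf.predict([x])[0]
flips = []
for pixel in range(x.size):                   # try every pixel position
    for value in (0.0, 16.0):                 # set it to an extreme intensity
        x_adv = x.copy()
        x_adv[pixel] = value
        new_label = clf.predict([x_adv])[0]
        if new_label != original:
            flips.append((pixel, value, new_label))
print(f"original label: {original}, single-pixel flips found: {len(flips)}")
```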