Introduction
Modern warfare has seen many developments, but few are as important as the ability to detect an enemy's weaponry. Accurate knowledge of the enemy's artillery has led to fewer surprises in warfare than before, and to large improvements in one's own defense. One case where such detection is required is the protection of submarines.
A naval mine is a self-contained explosive device placed in water to damage or destroy ships and submarines. Mines are deposited and left to wait until they are triggered. Triggering happens when a mine is approached or contacted by a vessel, so a vessel does not need to bump into a mine to set it off.
Submarines continuously send out signals and receive the signals reflected from obstacles. This kind of detection is essential in such a dangerous setting. Any misclassification can lead to a huge loss of life, in addition to the resources that will be destroyed. The ocean bed generally holds so many rocks that a rock can easily be misclassified as a mine. Moreover, a mine can easily be concealed among the rocks to avoid detection.
Highly accurate detection of mines is therefore necessary for the safety of those on board the submarine. Even in the reverse case, i.e., a rock detected as a mine, the submarine will take evasive measures, wasting resources, which can prove fatal given the environment it is in.
Earlier, the detection of mines was done intuitively with the help of sonar signals: the returned signals were inspected manually to decide whether the surface that reflected them was a mine or a rock. Now, with the advancement of technology, machine learning can be used for better detection of mines.
Detection of mines
The data used for the experiments were sonar returns collected from a metal cylinder and a cylindrically shaped rock positioned on a sandy ocean floor. Both targets were approximately 5 ft in length, and the impinging pulse was a wide-band linear FM chirp (ka = 55.6). Returns were collected at a range of 10 meters and obtained from the cylinder at aspect angles spanning 90° and from the rock at aspect angles spanning 180°. A set of 208 returns (111 cylinder returns and 97 rock returns) was selected from a total set of 1200 returns on the basis of the strength of the specular return (4.0 to 15.0 dB signal-to-noise ratio).
These processed signals were then divided into 60 frequency bands, and the records were normalized. The normalized records depict the strength of the returned signal in each frequency band. This saved a lot of time, as the strengths were now expressed on a scale of 0 to 1.
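The source does not say which normalization was applied. As a minimal sketch, assuming a simple min-max rescaling of each record, the step could look like the following. The function name and the sample values are hypothetical and are not taken from the dataset.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant record has no contrast between bands; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw band energies for one sonar return (illustrative only).
raw_bands = [2.0e-4, 5.0e-4, 1.1e-3, 8.0e-4]
normalized = min_max_normalize(raw_bands)
print(normalized)
```

After this step the weakest band in the record sits at 0 and the strongest at 1, which is what makes the values easy to compare by eye.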
Normalizing the data also helped in the manual evaluation of the dataset. The raw data was recorded in scientific terms and was much more difficult to understand. With the data normalized, it is much easier to gauge the strength of the signal in each frequency band, even without modeling. Still, detection using modeling techniques is considered the more viable option given the technology available today.
Dataset Description
The data contains 60 samples of reflected frequencies per record. When the strength of the reflected signal in a frequency band is very low, the obstacle that reflected it can generally not be considered a mine. In the same way, when the strength of the reflected signal is high, the obstacle it bounced off can be predicted to be a mine. Taken together, the frequency bands form a pattern similar to a bell-shaped curve (normal distribution).
The dataset has 60 attributes, each representing a frequency band. The values of these attributes are the strength of the signal in each frequency band for each recorded return. The data collected was hence numeric. Since this is a classification problem, the target variable has two classes: rocks and mines. A total of 208 instances were recorded for the experiment. Since the data was recorded in a scientific environment, there were no missing values, and the outliers, if any, were experimental and could not be neglected.
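To make the record layout concrete, here is a small sketch that parses rows of 60 numeric band strengths followed by a class label. The comma-separated layout and the labels "R" and "M" are assumptions made for illustration, and the two rows are synthetic rather than real sonar returns.

```python
import csv
import io

N_BANDS = 60  # one attribute per frequency band


def parse_records(text):
    """Split CSV text into a list of feature rows and a list of labels."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != N_BANDS + 1:
            raise ValueError(f"expected {N_BANDS + 1} fields, got {len(row)}")
        features.append([float(x) for x in row[:N_BANDS]])
        labels.append(row[N_BANDS].strip())
    return features, labels


# Two synthetic rows: a weak return labeled rock, a strong one labeled mine.
sample = "\n".join([
    ",".join(["0.10"] * N_BANDS + ["R"]),
    ",".join(["0.90"] * N_BANDS + ["M"]),
])
X, y = parse_records(sample)
print(len(X), len(X[0]), y)
```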
However, the dataset contained many correlated variables. These were later dealt with using dimensionality reduction and by checking the importance of each variable and selecting only the important attributes.
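One way to see how correlated frequency bands might be spotted is to compute the Pearson correlation between pairs of columns. The sketch below is not the method used in the experiments (those relied on dimensionality reduction and variable importance). It is a simple greedy filter, and the 0.9 threshold and the toy columns are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.9):
    """Return indices of columns kept by a greedy correlation filter.

    A column is kept only if its absolute correlation with every
    previously kept column is at or below the threshold.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept


# Toy data: three "bands" measured over four returns (illustrative only).
columns = [
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.4, 0.6, 0.8],  # exactly twice band 0, so fully correlated
    [0.4, 0.1, 0.3, 0.2],
]
kept = drop_correlated(columns)
print(kept)
```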
Modeling
Several classification techniques were applied to classify the sonar returns into two undersea targets, a mine and a similarly shaped rock: logistic regression, decision trees, support vector machines, random forests, ensemble modeling using xgboost, and a multi-layer perceptron.
Logistic regression was used as a benchmark model, as it is a basic classification algorithm that classifies using the odds of class membership. It achieved a classification accuracy of 74.5%, with recall and precision at 76% and 77% respectively. A decision tree performed slightly better in terms of accuracy, at 76.4%, with recall and precision of 77% and 76% respectively.
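Accuracy, precision and recall are reported for every model in this section, so it helps to see how they are computed from a set of predictions. The sketch below treats the mine class as the positive class, which is an assumption. The labels and predictions are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive="M"):
    """Return (accuracy, precision, recall) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Precision: of everything flagged as a mine, how much really was one.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of all real mines, how many were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical test labels and model predictions (M = mine, R = rock).
y_true = ["M", "M", "M", "R", "R", "R", "M", "R"]
y_pred = ["M", "M", "R", "R", "M", "R", "M", "R"]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

In this setting a missed mine counts against recall, while a rock flagged as a mine counts against precision, which matches the two kinds of misclassification described in the introduction.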
To improve accuracy, dimensionality reduction was done using Principal Component Analysis. Twenty components were selected based on the cumulative variance ratio. Applying logistic regression to these components raised the accuracy to 80.5%, which was better than the previous model with all the features. Using the same components with decision trees led to a decline in accuracy, which dipped to slightly less than 76%.
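Selecting components by cumulative variance ratio can be sketched as follows: sort the per-component variances, accumulate their share of the total, and stop once a target share is reached. The target value and the variances below are assumptions chosen for illustration. The source does not state which threshold led to 20 components.

```python
def components_needed(variances, target=0.95):
    """Smallest number of components whose cumulative variance share
    reaches the target (components taken in decreasing variance order)."""
    total = sum(variances)
    cumulative = 0.0
    for k, var in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += var
        if cumulative / total >= target:
            return k
    return len(variances)


# Hypothetical per-component variances (illustrative only).
variances = [4.0, 3.0, 2.0, 0.5, 0.3, 0.2]
k = components_needed(variances, target=0.85)
print(k)
```

Here the first three components account for 9.0 of the total variance of 10.0, so three components are enough to pass an 85% target.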
An SVM was used because the data was entirely numeric. Since the underlying intuition of SVM is based on the distance of the support vectors from the separating hyperplane, it performed well on the dataset. It gave an accuracy of 87%, with recall and precision also reaching 87%.
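The distance-based intuition can be illustrated with a linear decision rule: a point is classified by the sign of its distance from the hyperplane w·x + b = 0. The weights, bias and points below are hypothetical, and the sketch shows only the decision geometry of a linear SVM, not how the hyperplane is trained or which kernel the experiments used.

```python
import math


def signed_distance(w, b, x):
    """Signed Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def classify(w, b, x):
    """Label a point by the side of the hyperplane it falls on."""
    return "M" if signed_distance(w, b, x) >= 0 else "R"


# Hypothetical two-feature hyperplane (illustrative only).
w, b = [3.0, 4.0], -5.0
print(signed_distance(w, b, [3.0, 4.0]), classify(w, b, [3.0, 4.0]))
print(signed_distance(w, b, [0.0, 0.0]), classify(w, b, [0.0, 0.0]))
```

Training an SVM amounts to choosing w and b so that the closest training points on each side, the support vectors, are as far from the hyperplane as possible.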
Gradient boosting was applied to drive the error metric toward a minimum by building the ensemble step by step. It also gave a good enough result, with 84% accuracy on the dataset.
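The idea behind gradient boosting can be sketched in a few lines: start from a constant prediction, then repeatedly fit a small model to the current residuals and add a damped copy of it to the ensemble. The sketch below is plain gradient boosting with one-split decision stumps and squared error on a single synthetic feature. It is not xgboost, and the data, number of rounds and learning rate are assumptions.

```python
def fit_stump(xs, residuals):
    """Least-squares decision stump on one feature.

    Returns (threshold, left_mean, right_mean), where left means x <= threshold.
    """
    best = None
    # Excluding the largest value guarantees the right side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Return final predictions and the training MSE after each round."""
    predictions = [sum(ys) / len(ys)] * len(ys)  # constant starting model
    history = []
    for _ in range(rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return predictions, history


# Synthetic one-feature data: 1 = mine, 0 = rock (illustrative only).
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
predictions, history = boost(xs, ys)
print(history[0], history[-1])
```

Each round lowers the training error a little. Whether that translates into better accuracy on unseen data depends on the number of rounds and the learning rate, so the procedure reduces the loss step by step without guaranteeing a global minimum.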
A neural network with one added hidden layer gave an accuracy of 82% on test data. After hyperparameter tuning, in which the network was trained with more neurons, more epochs and a larger batch size than the default, it reached a maximum accuracy of 87%. However, due to the small number of records, it was very easy to overfit the network.
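A quick parameter count shows why overfitting is a concern with only 208 records. The sketch below counts the weights and biases of a fully connected network with 60 inputs and one output. The hidden layer sizes are hypothetical, since the source does not state which sizes were tried.

```python
def mlp_parameter_count(n_inputs, hidden_sizes, n_outputs=1):
    """Number of weights and biases in a fully connected network."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    # Each layer has fan_in * fan_out weights plus one bias per output unit.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


N_BANDS = 60     # input attributes, one per frequency band
N_RECORDS = 208  # total instances in the dataset

for hidden in (8, 32, 128):  # hypothetical hidden layer widths
    params = mlp_parameter_count(N_BANDS, [hidden])
    print(hidden, params, round(params / N_RECORDS, 1))
```

Even the smallest of these networks has more trainable parameters than there are records, so it has enough capacity to memorize the training set instead of learning a pattern that generalizes.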
The model that gave the best results was the random forest. An initial random forest using 10 decision trees gave an accuracy of 82%, with recall and precision slightly above 80%. Restricting the model to the important features, chosen by checking the importance of each variable, increased the recall to 85%, although the accuracy dipped to 81%. Increasing the number of decision trees to 40 and the number of predicted samples to 50 then raised the accuracy to 88%, an improvement over all the other models built. The recall and precision also reached 89%.
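A random forest combines its trees by voting: each tree labels a sample, and the forest returns the most common label. The sketch below shows only this voting step, with made-up outputs from five hypothetical trees. It does not train any trees and is not the configuration used in the experiments.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most trees (ties go to the label seen first)."""
    return Counter(votes).most_common(1)[0][0]


def forest_predict(per_tree_predictions):
    """Combine tree outputs; per_tree_predictions[t][i] is tree t's label for sample i."""
    return [majority_vote(sample_votes)
            for sample_votes in zip(*per_tree_predictions)]


# Hypothetical outputs of five trees on three samples (M = mine, R = rock).
tree_outputs = [
    ["M", "R", "M"],
    ["M", "R", "R"],
    ["R", "R", "M"],
    ["M", "M", "M"],
    ["M", "R", "R"],
]
forest_labels = forest_predict(tree_outputs)
print(forest_labels)
```

Individual trees disagree on every sample here, yet the vote settles on a single label each time. This averaging effect is one reason adding more trees tends to stabilize the forest's predictions.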
Conclusion
Many techniques can be used for better detection of mines to avoid losses. Each technique has its own advantage, from SVM being well suited to numeric data to random forest being highly adaptive. Implementing these techniques can help reduce the dangers that mines pose in open oceans.