5.1 Error Data Prediction using Bayes Theorem Classifier Algorithm
Error data prediction can be categorized as a type of anomaly detection (or outlier detection), the identification of data or observations that differ significantly from the majority of the data. The most common way to perform anomaly detection is with a classification algorithm.
Naive Bayes is a classification algorithm for binary (two-class) and multi-class classification problems (Brownlee, 2016). It is a family of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable (Zhang, 2004). The theorem is named after the English mathematician Thomas Bayes (1701–1761). Bayes’ theorem is stated as:
\(P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)}\) … (22)
\(P(B\mid A)=\frac{P(B\cap A)}{P(A)}\) … (23)
where \(P(A\mid B)\) is the probability of class \(A\) given the provided data \(B\), \(P(B\mid A)\) is the probability of the data given the class, \(P(A)\) is the prior probability of the class, and \(P(B)\) is the probability of the data.
Bayes’ theorem allows users to figure out P(A|B) from P(B|A). Rather than attempting to calculate the joint probability of every combination of attribute values, the attributes are assumed to be conditionally independent given the class value. As a result, the Naïve Bayes classifier requires only a small amount of training data to predict and classify the outcome. In spite of the assumption that the attributes do not interact (which rarely holds in real data), the approach has worked quite well in many real-world situations.
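The calculation implied by equation (22) under the independence assumption can be illustrated with a minimal sketch; the probability values below are hypothetical and chosen only for demonstration, not taken from the experimental data.

```python
# Illustrative sketch: Bayes' theorem with the naive independence assumption
# for a two-class problem with two features. All probabilities are hypothetical.

priors = {"error": 0.2, "normal": 0.8}             # P(A) for each class
likelihoods = {                                     # P(b_i | A) for each feature
    "error":  {"feature1": 0.7, "feature2": 0.6},
    "normal": {"feature1": 0.1, "feature2": 0.3},
}

# Unnormalized posterior: P(A) times the product of P(b_i | A) over observed features
scores = {
    cls: priors[cls] * likelihoods[cls]["feature1"] * likelihoods[cls]["feature2"]
    for cls in priors
}

# Normalize by the evidence P(B) so that the posteriors sum to one
evidence = sum(scores.values())
posteriors = {cls: s / evidence for cls, s in scores.items()}
print(posteriors)  # e.g. {'error': ~0.78, 'normal': ~0.22}
```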
Naive Bayes can be extended to real-valued attributes by assuming a Gaussian distribution. This extension of naive Bayes is called Gaussian Naive Bayes (Brownlee, 2016). A collection of data points typically follows a certain distribution (e.g. a Gaussian distribution). To detect error data (anomalies), the probability distribution p(x) is first estimated from the data points. As a new datum x arrives, p(x) is compared with a threshold r. If p(x) < r, the datum is considered an error or anomaly, because normal examples tend to have a large p(x) while anomalous examples tend to have a small p(x) (Flovik, 2019). This is the simplest way to proceed, because users only need to estimate the mean and the standard deviation from the training data.
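A minimal sketch of this density check is shown below, assuming a one-dimensional Gaussian fitted to synthetic training values; the data, the threshold value, and the variable names are illustrative assumptions, not the paper's exact code.

```python
# Fit a Gaussian to training data, then flag new points whose density p(x)
# falls below a chosen threshold r.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
train = rng.normal(loc=1.0, scale=0.05, size=500)   # hypothetical training values of C

mu, sigma = train.mean(), train.std()               # estimated mean and standard deviation
r = 1e-3                                            # threshold r (a problem-dependent choice)

new_points = np.array([1.02, 0.70, 1.30])
p_x = norm.pdf(new_points, loc=mu, scale=sigma)     # p(x) under the fitted Gaussian
is_error = p_x < r                                  # True -> treated as error/anomaly
print(list(zip(new_points, p_x.round(4), is_error)))
```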
In this exercise, 62% of our experimental data were used as training data, while the remaining 38% were used as test data. The filtration task is defined as removing data with large errors (i.e. \(C\) ≤ 0.85 or \(C\) ≥ 1.15). Three types of Gaussian Naive Bayes classifier were used to predict the probability that a datum carries a large error; this probability is then used to classify whether the datum is error data to be removed or retained. The three variants implemented are (a) Gaussian NB without calibration, (b) Gaussian NB with non-parametric isotonic regression calibration, and (c) Gaussian NB with Platt’s sigmoid calibration. Probability calibration is reported to improve confidence in the prediction (Metzen, 2015).
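The three variants map directly onto scikit-learn's GaussianNB and CalibratedClassifierCV; the sketch below is an assumed illustration on synthetic data (features, decision cut-off, and split seed are not from the paper), not the authors' implementation.

```python
# Three Gaussian Naive Bayes variants: uncalibrated, isotonic-calibrated, and
# sigmoid (Platt)-calibrated, trained on a roughly 62% / 38% split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                     # hypothetical features
C = 1.0 + 0.1 * X[:, 0] + 0.05 * rng.normal(size=1000)
y = ((C <= 0.85) | (C >= 1.15)).astype(int)        # 1 -> large-error data to be filtered

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.38, random_state=0)

models = {
    "GaussianNB (no calibration)": GaussianNB(),
    "GaussianNB + isotonic": CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=3),
    "GaussianNB + sigmoid (Platt)": CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=3),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]      # predicted probability of being error data
    flagged = proba >= 0.5                         # classify as error data to be removed
    print(f"{name}: flagged {flagged.sum()} of {len(y_test)} test points")
```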
Figure 12 (a) shows the background information of the test. The red-colored points represent data with acceptable error (0.85 < \(C\) < 1.15), while the dark-colored points represent the targeted points to be filtered because of excessive error (i.e. \(C\) ≤ 0.85 or \(C\) ≥ 1.15). After the Gaussian Naïve Bayes scheme is applied, the results are plotted in Figure 12 (b): the dark-colored points have been filtered out.