5.1 Error Data Prediction using Bayes Theorem Classifier Algorithm
Error data prediction can be categorized as a type of anomaly detection
(or outlier detection), which is the identification of data or
observations that differ significantly from the majority of the data.
The most common way to perform anomaly detection is to use a
classification algorithm.
Naive Bayes is a classification algorithm for binary (two-class) and
multi-class classification problems (Brownlee, 2016). This method is a
set of supervised learning algorithms based on applying Bayes’ theorem
with the “naive” assumption of conditional independence between every
pair of features given the value of the class variable (Zhang, 2004).
The Theorem was named after English mathematician Thomas Bayes
(1701-1761). Bayes’ Theorem is stated as:
\(P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}\) … (22)
\(P(B \mid A) = \frac{P(B \cap A)}{P(A)}\) … (23)
where \(P(A \mid B)\) is the probability of class (A) given the provided
data (B).
Bayes’ theorem allows users to compute P(A|B) from P(B|A). Rather than
attempting to model the dependencies between attribute values, the
attributes are assumed to be conditionally independent given the class
value. As a result, the Naïve Bayes classifier requires only a small
amount of training data to predict and classify the outcome. In spite of
the assumption that the attributes do not interact (which is unlikely in
real data), the approach has worked quite well in many real-world
situations.
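For illustration, with features \(x_{1},\dots,x_{n}\), the naive
assumption reduces the posterior for a class \(A\) to a product of
per-feature likelihoods, \(P(A \mid x_{1},\dots,x_{n}) \propto
P(A)\prod_{i=1}^{n} P(x_{i} \mid A)\), so only the prior \(P(A)\) and the
individual likelihoods \(P(x_{i} \mid A)\) need to be estimated from the
training data.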
Naive Bayes can be extended to real-valued attributes by assuming a
Gaussian distribution. This extension of naive Bayes is called Gaussian
Naive Bayes (Brownlee, 2016). A collection of data points typically
follows a certain distribution (e.g. a Gaussian distribution). To detect
error data (anomalies), the probability distribution p(x) of the data
points is first estimated. When a new datum x arrives, p(x) is compared
with a threshold r; if p(x) < r, the datum is considered an error or
anomaly. This works because normal examples tend to have a large p(x),
while anomalous examples tend to have a small p(x) (Flovik, 2019). This
is the simplest way to proceed because only the mean and the standard
deviation need to be estimated from the training data.
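A minimal sketch of this thresholding step, assuming a single feature and
an illustrative threshold r (placeholder values, not the settings used in
this work), could look like the following:

```python
import numpy as np
from scipy.stats import norm

def fit_gaussian(train):
    """Estimate the mean and standard deviation from the training data."""
    return np.mean(train), np.std(train, ddof=1)

def is_anomaly(x, mu, sigma, r):
    """Flag x as an error/anomaly when its Gaussian density p(x) falls below r."""
    return norm.pdf(x, loc=mu, scale=sigma) < r

# Illustrative usage with synthetic values; r = 0.01 is an assumed threshold.
train = np.random.normal(loc=1.0, scale=0.05, size=500)
mu, sigma = fit_gaussian(train)
print(is_anomaly(1.3, mu, sigma, r=0.01))   # far from the mean: flagged
print(is_anomaly(1.0, mu, sigma, r=0.01))   # near the mean: retained
```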
In this exercise, 62% of our experimental data were used as the training
data, while the remaining 38% were used as the test data. The filtration
task is defined as removing data that contain large errors (i.e. \(C\) ≤
0.85 or \(C\) ≥ 1.15). Three types of Gaussian Naive Bayes classifier
were used to predict the probability that a datum contains a large error,
and these probabilities were then used to classify whether each datum is
error data to be removed or data to be retained. The three variants are
(a) Gaussian NB without calibration, (b) Gaussian NB with a
non-parametric isotonic regression calibration, and (c) Gaussian NB with
Platt’s sigmoid model calibration. Probability calibration is reported to
improve the confidence of the prediction (Metzen, 2015).
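As a hedged sketch, the three variants could be set up with
scikit-learn’s GaussianNB and CalibratedClassifierCV; the feature matrix
X, the error labels y, and the random data below are placeholders rather
than the actual experimental data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

# Placeholder data: X holds the measured features, y = 1 marks points with
# excessive error (C <= 0.85 or C >= 1.15), y = 0 marks acceptable points.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (rng.random(500) < 0.2).astype(int)

# 62% training / 38% test split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.38, random_state=0)

models = {
    "GaussianNB (no calibration)": GaussianNB(),
    "GaussianNB + isotonic calibration": CalibratedClassifierCV(
        GaussianNB(), method="isotonic", cv=3),
    "GaussianNB + sigmoid (Platt) calibration": CalibratedClassifierCV(
        GaussianNB(), method="sigmoid", cv=3),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability that each test point is error data; points above 0.5
    # would be filtered out.
    p_error = model.predict_proba(X_test)[:, 1]
    flagged = np.sum(p_error >= 0.5)
    print(f"{name}: {flagged} of {len(y_test)} points flagged")
```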
Figure 12 (a) shows the background information of the test. The
red-colored points represent data with an acceptable error (where 0.85
< \(C\) < 1.15), while the dark-colored points represent the
targeted points to be filtered because of excessive error (i.e. \(C\) ≤
0.85 or \(C\) ≥ 1.15). After the Gaussian Naïve Bayes scheme is
implemented, the results are plotted in Figure 12 (b), in which the
dark-colored points have been filtered out.