INTRODUCTION AND PROBLEM DESCRIPTION

A Gaussian Mixture Model (GMM) is a probability density function that can be expressed as a weighted sum of multivariate Gaussian components. GMMs, which can effectively model complex probability distributions, are widely used in applications such as data clustering, data classification, speaker recognition, and image segmentation and classification. The standard method for estimating GMM parameters is the expectation-maximization (EM) algorithm. A finite mixture model p(x|Θ) can be expressed as a weighted sum of K>1 components:
\(p({\bf x}\vert \Theta)=\sum\limits_{m=1}^{K}\alpha_{m}p_{m}({\bf x}\vert \theta_{m}), \qquad (1)\)
where αm is the mth mixing proportion and pm is the probability density function of the mth component. In (1), θm is the set of parameters defining the mth component and Θ={θ1,θ2,…,θK,α1,α2,…,αK} is the complete set of parameters needed to define the mixture. The functional form of pm is assumed to be known; Θ is unknown and has to be estimated. The mixing proportions must satisfy the following conditions: αm>0 for m=1,…,K and Σαm=1. The number of components K is either known a priori or has to be determined during the mixture learning process. In our problem the value of K is known.
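As a concrete illustration of (1), the sketch below evaluates a mixture density as a weighted sum of component densities. It assumes the components are multivariate Gaussians (as defined in the next paragraph) and uses SciPy's multivariate_normal for the component densities; the example parameters are arbitrary and not taken from the data.

    import numpy as np
    from scipy.stats import multivariate_normal

    def mixture_pdf(x, alphas, means, covs):
        # p(x | Theta) = sum_m alpha_m * p_m(x | theta_m), as in (1)
        return sum(a * multivariate_normal.pdf(x, mean=mu, cov=S)
                   for a, mu, S in zip(alphas, means, covs))

    # Illustrative two-component mixture in two dimensions (arbitrary parameters)
    alphas = [0.6, 0.4]                       # mixing proportions, sum to 1
    means  = [np.zeros(2), 3.0 * np.ones(2)]  # component mean vectors
    covs   = [np.eye(2), 2.0 * np.eye(2)]     # component covariance matrices
    print(mixture_pdf(np.array([1.0, 1.0]), alphas, means, covs))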
In GMMs the probability density function of the mth component is given by:
\(p_{m}({\bf x}\vert \theta_{m})={1\over (2\pi)^{d/2}\vert \Sigma_{m}\vert ^{1/2}}\exp(-{1\over 2}({\bf x}-\mu_{m})^{T}\Sigma_{m}^{-1}({\bf x}-\mu_{m})), (2)\)
where μm and Σm denote the mean vector and covariance matrix, respectively, |·| denotes the determinant of a matrix, T denotes matrix transposition, and d is the dimension of the feature space. The set of parameters of the mth component is θm={μm,Σm}. Thus, for the GMM, Θ is defined by Θ={μ1,Σ1,…,μK,ΣK,α1,…,αK}.
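As a reference point, (2) can be transcribed into code directly; the sketch below is a minimal NumPy implementation (in practice a library routine such as scipy.stats.multivariate_normal handles the numerical details more robustly).

    import numpy as np

    def gaussian_pdf(x, mu, Sigma):
        # Direct evaluation of (2): the d-variate normal density with mean
        # vector mu and covariance matrix Sigma at the point x.
        d = len(mu)
        diff = x - mu
        norm_const = 1.0 / ((2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(Sigma)))
        quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
        return norm_const * np.exp(-0.5 * quad)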
Estimation of the parameters of a GMM can be performed using the maximum likelihood approach. Given a training set of independent and identically distributed feature vectors X={x1,x2,…,xN}, where \({\bf x}^{i}=[x_{1}^{i},\ldots,x_{d}^{i}]^{T}\in{\bf R}^{d}\), the log-likelihood corresponding to the K-component GMM is given by:
\(\log p(X\vert \Theta)=\log\prod\limits_{i=1}^{N}p({\bf x}^{i}\vert \Theta)=\sum\limits_{i=1}^{N}\log\sum\limits_{m=1}^{K}\alpha_{m}p_{m}({\bf x}^{i}\vert \theta_{m}).\)
The maximum likelihood estimate of the parameters is given by \(\Theta_{\mathrm{ML}}=\arg\max_{\Theta}\log p(X\vert \Theta)\).
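A minimal sketch of the log-likelihood computation is given below; it is vectorized over the N feature vectors (rows of X), and the function and variable names are illustrative rather than those of the appendix code.

    import numpy as np
    from scipy.stats import multivariate_normal

    def log_likelihood(X, alphas, means, covs):
        # log p(X | Theta) = sum_i log sum_m alpha_m p_m(x^i | theta_m),
        # where X is an (N, d) array whose rows are the feature vectors x^i.
        mixture = np.zeros(len(X))
        for a, mu, S in zip(alphas, means, covs):
            mixture += a * multivariate_normal.pdf(X, mean=mu, cov=S)  # vectorized over rows
        return np.sum(np.log(mixture))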
In our problem, we have to model the data using a mixture of two multivariate (6-variate) normal distributions. Therefore K=2 and each feature vector lies in R6. We have been given the Swiss Bank data set on original (0) and fake (1) currency, and after modeling the data using a Gaussian mixture model we have to answer whether this model can be used as a classification model in this case. I used the following method to reach my conclusions:
First, using the EM algorithm for the multivariate Gaussian mixture model, I calculated the maximum likelihood estimates of the parameters (the code is attached in the appendix). Then, using those estimated values Θ={μ1,Σ1,μ2,Σ2,α1,α2}, I calculate the component density value for each observation in the data set and match the resulting assignment with its given label (0 or 1). The proportion of mismatches between these two then gives the percentage error, which constitutes the data analysis part.
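The sketch below illustrates this evaluation step under the stated setup: the parameters are assumed to have already been estimated by EM, each observation is assigned to the component with the larger posterior weight, and the assignment is compared with the given 0/1 labels. Since EM does not fix which fitted component corresponds to which class, both possible mappings are checked. This is an illustrative reconstruction, not the appendix code itself.

    import numpy as np
    from scipy.stats import multivariate_normal

    def percentage_error(X, labels, alphas, means, covs):
        # Posterior weight of each component for every observation (rows of X).
        post = np.column_stack([a * multivariate_normal.pdf(X, mean=mu, cov=S)
                                for a, mu, S in zip(alphas, means, covs)])
        pred = np.argmax(post, axis=1)          # component with the larger posterior
        # EM does not know which component corresponds to original (0) or fake (1),
        # so evaluate both possible label mappings and keep the better one.
        err = min(np.mean(pred != labels), np.mean((1 - pred) != labels))
        return 100.0 * err                      # percentage of misclassified notes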

EM ALGORITHM FOR THE MULTIVARIATE MIXTURE MODEL

The most popular approach for maximizing the log-likelihood function is the EM algorithm. The EM algorithm is an iterative optimization technique that operates locally. Starting from an initial guess of the parameters Θ(0), it generates a sequence of estimates Θ(1),Θ(2),…,Θ(j),…, with increasing log-likelihood (i.e., logp(X|Θ(j))>logp(X|Θ(j-1))). Each iteration j of the algorithm consists of two steps, called the expectation step (E-step) and the maximization step (M-step), followed by a convergence check. For GMMs these steps are defined as follows:
E-step: Given the set of mixture parameters Θ(j-1) from the previous iteration, for each m=1,…,K and i=1,…,N, the posterior probability (responsibility) that the feature vector xi was generated by the mth component is computed as:
\(h_{m}^{(j)}({\bf x}^{i})={\alpha_{m}^{(j-1)}p_{m}({\bf x}^{i}\vert \theta_{m}^{(j-1)}) \over \sum\nolimits_{k=1}^{K}\alpha_{k}^{(j-1)}p_{k}({\bf x}^{i}\vert \theta_{k}^{(j-1)})}, \)