Mixture Models and the Expectation Maximization (EM) Algorithm

A Gaussian Mixture Model (GMM) is a generative model for data \mathbf{x} \in \mathbb{R}^{d}, defined by the following set of parameters:

  • K: Number of mixture components
  • A d-dimensional Gaussian \mathcal{N}\left(\mu^{(j)}, \sigma_{j}^{2}\right) for every j=1, \ldots, K
  • p_{1}, \ldots, p_{K} : Mixture weights
  • The parameters of a K-component GMM can be represented as:
    \circ \quad \theta=\left\{p_{1}, \ldots, p_{K}, \mu^{(1)}, \ldots, \mu^{(K)}, \sigma_{1}^{2}, \ldots, \sigma_{K}^{2}\right\}
  • The likelihood of a point \mathbf{x} under a GMM is given by

        \[p(\mathbf{x} \mid \theta)=\sum_{j=1}^{K} p_{j} \mathcal{N}\left(\mathbf{x} ; \mu^{(j)}, \sigma_{j}^{2}\right)\]
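
As a quick illustration of this density, here is a minimal NumPy sketch (the function names `gaussian_pdf` and `gmm_likelihood` are my own, not from the source) that evaluates p(\mathbf{x} \mid \theta) for a GMM with spherical covariances \sigma_{j}^{2} I:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Density of a d-dimensional spherical Gaussian N(x; mu, var * I)."""
    d = x.shape[-1]
    norm = (2 * np.pi * var) ** (-d / 2)
    return norm * np.exp(-np.sum((x - mu) ** 2, axis=-1) / (2 * var))

def gmm_likelihood(x, weights, means, variances):
    """p(x | theta) = sum_j p_j * N(x; mu^(j), sigma_j^2)."""
    return sum(w * gaussian_pdf(x, mu, var)
               for w, mu, var in zip(weights, means, variances))

# Example call with K = 2 components in R^2 (illustrative values only):
x = np.array([0.9, 1.1])
print(gmm_likelihood(x, [0.5, 0.5],
                     [np.array([-1.0, -1.0]), np.array([1.0, 1.0])],
                     [1.0, 1.0]))
```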

Observed Case

Let K=2 and let [-1.2, -0.8]^{T}, [-1, -1.2]^{T}, [-0.8, -1]^{T} be three observed points in cluster 1 and [1.2, 0.8]^{T}, [1, 1.2]^{T}, [0.8, 1]^{T} be three observed points in cluster 2. Since the cluster assignments are observed, the maximum-likelihood estimate of each cluster mean is simply the sample mean of the points in that cluster:

  • \mu^{(1)}=[-1,-1]^{T}, i.e. \mu_{1,1}=-1, \mu_{1,2}=-1
  • \mu^{(2)}=[1,1]^{T}, i.e. \mu_{2,1}=1, \mu_{2,2}=1
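
A short NumPy sketch (variable names are mine, not from the source) verifies these sample means:

```python
import numpy as np

# Observed points with known cluster assignments.
cluster_1 = np.array([[-1.2, -0.8], [-1.0, -1.2], [-0.8, -1.0]])
cluster_2 = np.array([[ 1.2,  0.8], [ 1.0,  1.2], [ 0.8,  1.0]])

# With fully observed assignments, the ML estimate of each cluster
# mean is the sample mean of its points.
mu_1 = cluster_1.mean(axis=0)   # array([-1., -1.])
mu_2 = cluster_2.mean(axis=0)   # array([ 1.,  1.])
print(mu_1, mu_2)
```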

Unobserved Case: EM Algorithm

Estimating the Parameters of a GMM: The Expectation Maximization (EM) Algorithm

We observe n data points \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(n)} in \mathbb{R}^{d}, and we would like to maximize the GMM likelihood with respect to the parameter set:

    \[\theta=\left\{p_{1}, \ldots, p_{K}, \mu^{(1)}, \ldots, \mu^{(K)}, \sigma_{1}^{2}, \ldots, \sigma_{K}^{2}\right\}\]


Maximizing the log-likelihood \log \left(\prod_{i=1}^{n} p\left(\mathbf{x}^{(i)} \mid \theta\right)\right) directly is not tractable for GMMs, because the cluster assignments are unobserved.
The EM algorithm is an iterative algorithm that finds a locally optimal solution \hat{\theta} to the GMM likelihood maximization problem.
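
Expanding the objective using the mixture density defined above makes the difficulty explicit: the logarithm sits outside a sum over components, so the objective does not decompose into per-component terms.

    \[\log \left(\prod_{i=1}^{n} p\left(\mathbf{x}^{(i)} \mid \theta\right)\right)=\sum_{i=1}^{n} \log \left(\sum_{j=1}^{K} p_{j} \mathcal{N}\left(\mathbf{x}^{(i)} ; \mu^{(j)}, \sigma_{j}^{2}\right)\right)\]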

E-Step

Assume a one-dimensional GMM with two clusters, whose initial means and variances are as follows:

    \[\mu^{(1)}=-3, \mu^{(2)}=2, \sigma_{1}^{2}=\sigma_{2}^{2}=4\]


Let p_{1}=p_{2}=0.5
Let x^{(1)}=0.2, x^{(2)}=-0.9, x^{(3)}=-1, x^{(4)}=1.2, x^{(5)}=1.8
Using the E-step formula:

    \[p(j \mid i)=\frac{p_{j} \mathcal{N}\left(x^{(i)} ; \mu^{(j)}, \sigma_{j}^{2}\right)}{p\left(x^{(i)} \mid \theta\right)}\]

p(1 \mid 1)=0.29421
p(1 \mid 2)=0.62246
p(1 \mid 3)=0.65135
p(1 \mid 4)=0.10669
p(1 \mid 5)=0.053403
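
(The posteriors for component 2 follow from p(2 \mid i)=1-p(1 \mid i).) A minimal NumPy sketch of this E-step, with variable names of my own choosing, reproduces these values:

```python
import numpy as np

def normal_pdf(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = np.array([0.2, -0.9, -1.0, 1.2, 1.8])
p = np.array([0.5, 0.5])          # mixture weights p_1, p_2
mu = np.array([-3.0, 2.0])        # initial means mu^(1), mu^(2)
var = np.array([4.0, 4.0])        # initial variances sigma_1^2, sigma_2^2

# E-step: posterior responsibilities p(j | i) for each component j and point i.
unnormalized = p[:, None] * normal_pdf(x[None, :], mu[:, None], var[:, None])
posteriors = unnormalized / unnormalized.sum(axis=0)
print(posteriors[0])  # p(1 | i) ~ [0.29421, 0.62246, 0.65135, 0.10669, 0.05340]
```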

M-Step

Using the formulae corresponding to the M-step,

\hat{n}_{1}=\sum_{i=1}^{5} p(1 \mid i)=1.7281
\hat{p}_{1}=\frac{\hat{n}_{1}}{n}=\frac{\hat{n}_{1}}{5}=0.34562
\hat{\mu}^{(1)}=\frac{1}{\hat{n}_{1}} \sum_{i=1}^{5} p(1 \mid i)\, x^{(i)}=-0.53733
\hat{\sigma}_{1}^{2}=\frac{1}{\hat{n}_{1}} \sum_{i=1}^{5} p(1 \mid i)\left(x^{(i)}-\hat{\mu}^{(1)}\right)^{2}=0.57579
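
Continuing the E-step sketch above (same hypothetical variable names), the corresponding M-step updates for component 1 are:

```python
# M-step updates for component j = 1, using the E-step posteriors.
r1 = posteriors[0]                                    # p(1 | i) for i = 1..5
n1_hat = r1.sum()                                     # ~ 1.7281
p1_hat = n1_hat / len(x)                              # ~ 0.34562
mu1_hat = (r1 * x).sum() / n1_hat                     # ~ -0.53733
var1_hat = (r1 * (x - mu1_hat) ** 2).sum() / n1_hat   # ~ 0.57579
```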

A Gaussian mixture model can thus provide, through the posteriors p(j \mid \mathbf{x}), information about how likely it is that a given point belongs to each cluster.
