- Generative models learn the probability distribution of each class
- Discriminative models learn the decision boundary between the classes
Simple Multinomial Generative Model
$\theta_w$ denotes the probability of model $M$ generating a word $w \in W$, where $W$ is the vocabulary. Its value must lie between 0 and 1, and the parameters must satisfy $\sum_{w \in W} \theta_w = 1$.
Likelihood Function
For simplicity, let's consider $W = \{0, 1\}$. We want to estimate a multinomial model to generate a document $D = \text{"0101"}$.
For this task, we consider two multinomial models, $M_1$ with parameters $\theta^{(1)}$ and $M_2$ with parameters $\theta^{(2)}$. Since $D$ contains two 0s and two 1s, the likelihood of $D$ under parameters $\theta$ is

$P(D \mid \theta) = \theta_0^2 \, \theta_1^2$

so $P(D \mid M_1) = \big(\theta_0^{(1)}\big)^2 \big(\theta_1^{(1)}\big)^2$ denotes the probability of Model 1 generating $D$, and similarly for Model 2. Again, if we find that

$P(D \mid M_1) > P(D \mid M_2)$

then Model 1 is better than Model 2 at explaining $D$.
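A quick numeric sketch of this comparison, assuming two illustrative parameter settings for Model 1 and Model 2 (the specific values below are not from the text):

```python
from collections import Counter

def likelihood(document, theta):
    # P(D | theta) = product over words w of theta_w^count(w)
    counts = Counter(document)
    p = 1.0
    for w, c in counts.items():
        p *= theta[w] ** c
    return p

D = "0101"
model_1 = {"0": 0.5, "1": 0.5}  # assumed parameters for Model 1
model_2 = {"0": 0.9, "1": 0.1}  # assumed parameters for Model 2

print(likelihood(D, model_1))   # 0.5^2 * 0.5^2 = 0.0625
print(likelihood(D, model_2))   # 0.9^2 * 0.1^2 = 0.0081
# Model 1 assigns the higher likelihood, so it explains D better.
```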
Maximum Likelihood Estimate
Consider the vocabulary $W = \{a, b, \dots, z\}$ of 26 letters. Our model $M$ has one parameter $\theta_w$ to express the probability of each letter; since the parameters must sum to 1, only 25 of them are free. Let $\theta^*$ be the parameters of the maximum likelihood model $M^*$; then:

$\theta^* = \arg\max_{\theta} P(D \mid \theta)$
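As a sanity check, one can scan candidate values of $\theta_0$ (with $\theta_1 = 1 - \theta_0$) for $D = \text{"0101"}$ and observe that the likelihood peaks at the maximum likelihood estimate; a minimal sketch with an arbitrary grid resolution:

```python
# Likelihood of D = "0101" as a function of theta_0 is theta_0^2 * (1 - theta_0)^2.
candidates = [i / 100 for i in range(1, 100)]
theta_star = max(candidates, key=lambda t0: t0**2 * (1 - t0)**2)
print(theta_star)  # 0.5, i.e. count("0") / len(D) = 2/4
```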
MLE for Multinomial Distribution
Let $P(D \mid \theta) = \prod_{w \in W} \theta_w^{\,\mathrm{count}(w)}$ be the probability of a document $D$ being generated by the simple multinomial model described above, where $\mathrm{count}(w)$ is the number of times word $w$ appears in $D$.
Stationary Points of the Lagrange Function
Maximizing $P(D \mid \theta)$ is equivalent to maximizing $\log P(D \mid \theta)$. We know that:

$\log P(D \mid \theta) = \sum_{w \in W} \mathrm{count}(w) \log \theta_w$

Define the Lagrange function, which enforces the constraint $\sum_{w \in W} \theta_w = 1$:

$L(\theta, \lambda) = \sum_{w \in W} \mathrm{count}(w) \log \theta_w - \lambda \left( \sum_{w \in W} \theta_w - 1 \right)$

Then, find the stationary points of $L$ by solving the equation

$\frac{\partial L}{\partial \theta_w} = \frac{\mathrm{count}(w)}{\theta_w} - \lambda = 0$

for all $w \in W$. This gives $\theta_w = \mathrm{count}(w) / \lambda$; substituting into the constraint yields $\lambda = \sum_{w \in W} \mathrm{count}(w)$, so the maximum likelihood estimate is

$\theta_w^* = \frac{\mathrm{count}(w)}{\sum_{w' \in W} \mathrm{count}(w')}$
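In code, the closed-form solution is just normalized word counts; a minimal sketch:

```python
from collections import Counter

def multinomial_mle(document):
    # theta_w* = count(w) / total number of words in D
    counts = Counter(document)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(multinomial_mle("0101"))  # {'0': 0.5, '1': 0.5}
```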
Predictions of a Generative Multinomial Model
Suppose that we have estimated parameters $\theta^+$ for the positive class and $\theta^-$ for the negative class, and that we classify a new document $D$ to belong to the positive class iff:

$P(D \mid \theta^+) \geq P(D \mid \theta^-)$

Taking logarithms, the document is classified as positive iff

$\log \frac{P(D \mid \theta^+)}{P(D \mid \theta^-)} = \sum_{w \in W} \mathrm{count}(w) \log \frac{\theta_w^+}{\theta_w^-} \geq 0$

The generative classifier $M$ can therefore be shown to be equivalent to a linear classifier with weights

$\hat{\theta}_w = \log \frac{\theta_w^+}{\theta_w^-}$

acting on the count features $\mathrm{count}(w)$ of $D$.
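A sketch of that equivalence, assuming per-class parameters have already been estimated (the toy vocabulary and parameter values below are illustrative assumptions):

```python
import math
from collections import Counter

theta_pos = {"good": 0.7, "bad": 0.3}  # assumed positive-class parameters
theta_neg = {"good": 0.2, "bad": 0.8}  # assumed negative-class parameters

# Linear-classifier weights: theta_hat_w = log(theta_w^+ / theta_w^-)
weights = {w: math.log(theta_pos[w] / theta_neg[w]) for w in theta_pos}

def classify(words):
    counts = Counter(words)
    score = sum(c * weights.get(w, 0.0) for w, c in counts.items())
    return "+" if score >= 0 else "-"

print(classify(["good", "good", "bad"]))  # "+"
```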
Prior, Posterior and Likelihood
Consider a binary classification task with two labels '+' (positive) and '-' (negative).
Let $y$ denote the classification label assigned to a document $D$ by a multinomial generative model $M$ with parameters $\theta^+$ for the positive class and $\theta^-$ for the negative class. Then:

- $P(y = + \mid D)$ is the posterior distribution
- $P(y = +)$ is the prior distribution
- $P(D \mid y = +)$ is the likelihood

These are related by Bayes' rule: $P(y = + \mid D) = \frac{P(D \mid y = +)\, P(y = +)}{P(D)}$.
Example
By Bayes' rule,

$P(y = + \mid D) = \frac{P(D \mid \theta^+)\, P(y = +)}{P(D \mid \theta^+)\, P(y = +) + P(D \mid \theta^-)\, P(y = -)}$

and

$P(y = - \mid D) = \frac{P(D \mid \theta^-)\, P(y = -)}{P(D \mid \theta^+)\, P(y = +) + P(D \mid \theta^-)\, P(y = -)}$

Comparing the two posteriors, $D$ is classified as positive iff

$\sum_{w \in W} \mathrm{count}(w) \log \frac{\theta_w^+}{\theta_w^-} + \log \frac{P(y = +)}{P(y = -)} \geq 0$

that is, the class priors add a bias term to the linear classifier derived above.
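A numeric sketch of the posterior computation (the prior and likelihood values are assumptions chosen for illustration):

```python
# Assumed inputs: class prior P(y=+) and class-conditional likelihoods.
prior_pos = 0.3
lik_pos = 0.0625  # P(D | theta^+)
lik_neg = 0.0081  # P(D | theta^-)

# Bayes' rule: P(y=+|D) = P(D|+)P(+) / [P(D|+)P(+) + P(D|-)P(-)]
posterior_pos = (lik_pos * prior_pos) / (
    lik_pos * prior_pos + lik_neg * (1 - prior_pos)
)
print(round(posterior_pos, 4))  # 0.7678
```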
Gaussian Generative Models
MLE for the Gaussian Distribution
The probability density function for a Gaussian random variable $X \sim \mathcal{N}(\mu, \sigma^2)$ is given as follows:

$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$
Let $X_1, \dots, X_n$ be i.i.d. Gaussian random variables with mean $\mu$ and variance $\sigma^2$. Then their joint probability density function is given by:

$P(x_1, \dots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2} \right)$

Taking the logarithm of the above function, we get:

$\log P(x_1, \dots, x_n \mid \mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$
MLE for the Mean and Variance: setting the partial derivatives of the log-likelihood with respect to $\mu$ and $\sigma^2$ to zero gives

$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$
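A minimal sketch of these estimators; note the variance uses the MLE normalization $1/n$, not the unbiased $1/(n-1)$ (the sample data is an arbitrary assumption):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # assumed i.i.d. Gaussian samples

mu_hat = x.mean()                        # (1/n) * sum(x_i)
sigma2_hat = ((x - mu_hat) ** 2).mean()  # (1/n) * sum((x_i - mu_hat)^2)

print(mu_hat, sigma2_hat)  # 5.0 5.0
```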