Machine Learning

# Nonlinear classification

## Higher Order Feature Vectors

Linear classifiers can be used to make non-linear predictions by first mapping the input $x$ to a higher-order feature vector $\phi(x)$ and then classifying linearly in the feature space.

For example, the feature map $\phi(x) = [x, x^2]^T$ lifts a one-dimensional input into two dimensions, where a linear classifier on $\phi(x)$ yields a non-linear decision boundary in $x$. Another example, for $x = [x_1, x_2]^T$:

$$\phi(x) = [x_1,\ x_2,\ x_1 x_2,\ x_1^2,\ x_2^2]^T$$

Since a possible boundary is an ellipse, these quadratic features are enough to make the classes linearly separable in feature space.

## Non-linear Classification

The order-3 polynomial feature vector for $x \in \mathbb{R}^n$ collects all monomials of degree up to 3:

$$\phi(x) = [\,x_i;\ x_i x_j;\ x_i x_j x_k\,], \quad i \le j \le k.$$

For each of the feature transformations (power 1, power 2, power 3), there are $n$-multichoose-$p$ $= \binom{n+p-1}{p}$ combinations. Thus the total number of features is:

$$\binom{n}{1} + \binom{n+1}{2} + \binom{n+2}{3}$$
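The multichoose count above can be checked with a short sketch (the function name is my own):

```python
from math import comb

def poly_feature_count(n, order=3):
    # Degree-p monomials in n variables: n-multichoose-p = comb(n + p - 1, p).
    # Sum over powers 1..order to get the full feature-vector length.
    return sum(comb(n + p - 1, p) for p in range(1, order + 1))

# For n = 2, order 3: x1, x2; x1^2, x1x2, x2^2; x1^3, x1^2x2, x1x2^2, x2^3
print(poly_feature_count(2))  # 9
```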

## Regression Using Higher-Order Polynomial Features

Assume we have $n$ data points in the training set $S_n = \{(x^{(i)}, y^{(i)}),\ i = 1, \dots, n\}$, where $(x^{(i)}, y^{(i)})$ is the $i$-th training example.

The relationship between $y$ and $x$ can be roughly described by a cubic function, so a feature vector of order at least 3 is needed to minimize structural error.
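A minimal sketch of this idea, with synthetic stand-in data (the cubic ground truth and noise level are my own choices, not the data from the notes): map $x$ to polynomial features up to order 3, then run ordinary least squares on the transformed inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical training data: y is roughly cubic in x, plus noise.
x = np.linspace(-2, 2, 30)
y = x**3 - 2 * x + rng.normal(scale=0.3, size=x.shape)

# Order-3 polynomial feature map: phi(x) = [1, x, x^2, x^3].
Phi = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Ordinary least squares on the transformed features.
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta)  # coefficients close to [0, -2, 0, 1]
```

Because the feature map matches the true order of the relationship, the estimated coefficients recover the underlying cubic with little structural error.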

## Effect of Regularization on Higher Order Regression

The three figures below show the fitting results of a 9th-order polynomial regression with different regularization parameters $\lambda$ on the same training data.

The smallest regularization parameter $\lambda$ corresponds to figure A.

The largest regularization parameter $\lambda$ corresponds to figure B.

The effect of regularization is to restrict the parameters of a model from freely taking on large values. This makes the model function smoother, leveling the 'hills' and filling the 'valleys'. It also makes the model more stable: a small perturbation of $x$ will not change $y$ significantly when $\theta$ is smaller.
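A sketch of this effect using closed-form ridge regression on 9th-order polynomial features (the data and the two $\lambda$ values are illustrative choices of mine): a larger $\lambda$ shrinks the parameter norm, which is what produces the smoother fit.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    # Closed-form ridge regression: theta = (Phi^T Phi + lam * I)^{-1} Phi^T y
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 15)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=x.shape)

# 9th-order polynomial features: columns 1, x, x^2, ..., x^9.
Phi = np.vander(x, N=10, increasing=True)

small = ridge_fit(Phi, y, lam=1e-6)  # barely regularized: wiggly fit
large = ridge_fit(Phi, y, lam=10.0)  # heavily regularized: smooth fit
# Larger lambda restricts the parameters to smaller values.
print(np.linalg.norm(small), np.linalg.norm(large))
```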

## Kernels as Dot Products 1

Let's assume the feature map $\phi$ admits a kernel function $K(x, x') = \phi(x) \cdot \phi(x')$ that can be evaluated directly from the original inputs, without ever computing $\phi(x)$ explicitly. For example, the quadratic kernel $K(x, x') = (x \cdot x')^2$ equals the dot product of explicit quadratic feature vectors.

## The Kernel Perceptron Algorithm
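This identity can be checked numerically. A sketch with an illustrative quadratic feature map (the specific $\phi$ below is my own choice of one map matching $(x \cdot x')^2$ in two dimensions):

```python
import numpy as np

def phi(x):
    # Explicit quadratic feature map for 2-D input:
    # phi(x) = [x1^2, sqrt(2) * x1 * x2, x2^2]
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def K(x, xp):
    # Kernel computed directly from the inputs, without forming phi.
    return (x @ xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(xp), K(x, xp))  # both equal 1.0
```

The kernel evaluation costs a single dot product in the original space, regardless of how large the feature space is.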

The original perceptron algorithm is as follows.

Given a training set $\{(x^{(i)}, y^{(i)}),\ i = 1, \dots, n\}$:

- Initialize $\theta = 0$.
- For $t = 1, \dots, T$: for each $i = 1, \dots, n$, if $y^{(i)} (\theta \cdot x^{(i)}) \le 0$, update $\theta$ appropriately.

In the kernelized version we keep mistake counters $\alpha_1, \dots, \alpha_n$ instead of $\theta$. The equivalent way to initialize, if we want the same result as initializing $\theta = 0$, is $\alpha_i = 0$ for all $i$.
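A minimal sketch of the original (linear) perceptron on hypothetical separable data:

```python
import numpy as np

def perceptron(X, y, T=10):
    # Original perceptron: theta starts at 0.
    theta = np.zeros(X.shape[1])
    for _ in range(T):
        for i in range(len(y)):
            if y[i] * (theta @ X[i]) <= 0:   # misclassified (or on the boundary)
                theta = theta + y[i] * X[i]  # "update theta appropriately"
    return theta

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
theta = perceptron(X, y)
print(np.sign(X @ theta))  # matches y
```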

Now look at the line "update $\theta$ appropriately" in the above algorithm.

Assuming there was a mistake in classifying the $i$-th data point, i.e. $y^{(i)} (\theta \cdot \phi(x^{(i)})) \le 0$, the update $\theta \leftarrow \theta + y^{(i)} \phi(x^{(i)})$ is equivalent to the update $\alpha_i \leftarrow \alpha_i + 1$, since the counters represent $\theta$ through $\theta = \sum_j \alpha_j y^{(j)} \phi(x^{(j)})$.

### The Mistake Condition

Substituting $\theta = \sum_j \alpha_j y^{(j)} \phi(x^{(j)})$, the mistake condition $y^{(i)} (\theta \cdot \phi(x^{(i)})) \le 0$ is equivalent to

$$y^{(i)} \sum_j \alpha_j y^{(j)} K(x^{(j)}, x^{(i)}) \le 0,$$

so the algorithm never needs $\phi$ explicitly, only the kernel $K$.

### Kernel Composition Rules
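The kernel perceptron with this update and mistake condition can be sketched as follows. The XOR-style toy data and the quadratic kernel are my own illustrative choices; note the data are not linearly separable in the original space but are separable under this kernel.

```python
import numpy as np

def kernel_perceptron(X, y, K, T=10):
    # Kernelized perceptron: track mistake counters alpha instead of theta.
    n = len(y)
    alpha = np.zeros(n)  # alpha_i = 0 is equivalent to theta = 0
    gram = np.array([[K(X[j], X[i]) for i in range(n)] for j in range(n)])
    for _ in range(T):
        for i in range(n):
            # Mistake condition: y_i * sum_j alpha_j y_j K(x_j, x_i) <= 0
            if y[i] * np.sum(alpha * y * gram[:, i]) <= 0:
                alpha[i] += 1  # equivalent to theta += y_i * phi(x_i)
    return alpha

# Hypothetical XOR-style data: labels follow the sign of x1 * x2.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])
quad = lambda a, b: (1 + a @ b) ** 2  # quadratic kernel
alpha = kernel_perceptron(X, y, quad)

def predict(x):
    # Classify via the kernel expansion; phi is never computed.
    return np.sign(sum(alpha[j] * y[j] * quad(X[j], x) for j in range(len(y))))

print([predict(x) for x in X])  # predictions match the labels
```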

If $K(x, x')$ is a kernel, so is $\tilde K(x, x') = f(x)\, K(x, x')\, f(x')$ for any real-valued function $f$. If $K_1$ and $K_2$ are kernels, then so are their sum $K_1 + K_2$ and their product $K_1 \cdot K_2$. The radial basis kernel is given by:

$$K(x, x') = \exp\!\left(-\frac{1}{2}\|x - x'\|^2\right)$$

Using the composition rules and the Taylor expansion of the exponential, the radial basis kernel can be shown to be a valid kernel.
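One consequence of being a valid kernel is that every Gram matrix it produces is positive semidefinite. A sketch checking this numerically for the radial basis kernel on random points (the sample size and dimension are arbitrary choices):

```python
import numpy as np

def rbf(x, xp):
    # Radial basis kernel: K(x, x') = exp(-||x - x'||^2 / 2)
    return np.exp(-0.5 * np.sum((x - xp) ** 2))

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))

# A valid kernel yields a positive semidefinite Gram matrix.
G = np.array([[rbf(a, b) for b in X] for a in X])
eigs = np.linalg.eigvalsh(G)
print(eigs.min() >= -1e-10)  # all eigenvalues are (numerically) nonnegative
```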