**Higher Order Feature Vectors**

Linear classifiers can be used to make non-linear predictions by first mapping the input into a higher-dimensional feature space.

For example, the feature map $\phi(x) = [x, x^2]^T$ sends a one-dimensional input to two dimensions, where a linear decision boundary corresponds to a quadratic boundary in the original space.

Another example: for $x \in \mathbb{R}^2$, the feature map $\phi(x) = [x_1, x_2, x_1^2, x_1 x_2, x_2^2]^T$ makes every quadratic decision boundary linear in the feature space.

Since a possible boundary is an ellipse, e.g. $x_1^2/a^2 + x_2^2/b^2 = 1$, which is linear in the features $x_1^2$ and $x_2^2$, data separated by an elliptical boundary becomes linearly separable after the transformation.
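As a minimal sketch of this idea (the feature map and the ellipse $x_1^2/4 + x_2^2 = 1$ are illustrative choices, not from the original text), the elliptical decision rule becomes an exact linear rule in $\phi$-space:

```python
# Sketch: an elliptical boundary x1^2/4 + x2^2 = 1 becomes linear after the
# feature map phi(x) = [x1, x2, x1^2, x1*x2, x2^2] (assumed example values).
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1, x2, x1**2, x1 * x2, x2**2])

# Linear classifier in feature space: theta . phi(x) + theta0 > 0
# with theta = [0, 0, -1/4, 0, -1] and theta0 = 1 reproduces the ellipse rule.
theta = np.array([0.0, 0.0, -0.25, 0.0, -1.0])
theta0 = 1.0

rng = np.random.default_rng(0)
points = rng.uniform(-3, 3, size=(200, 2))
inside = points[:, 0]**2 / 4 + points[:, 1]**2 < 1  # true elliptical labels

pred = np.array([theta @ phi(p) + theta0 > 0 for p in points])
print((pred == inside).all())  # the linear rule in phi-space matches exactly
```

The classifier is linear in $\phi(x)$, yet its decision boundary in the original coordinates is the ellipse.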

## Non-linear Classification

**The order 3** polynomial feature vector for $x \in \mathbb{R}^n$ collects all monomials of degree 1, 2, and 3:

$$\phi(x) = [\,x_1, \ldots, x_n,\; x_1^2, x_1 x_2, \ldots, x_n^2,\; x_1^3, x_1^2 x_2, \ldots, x_n^3\,]^T$$

For each of the feature transformations (power 1, power 2, power 3), there are $n$-multichoose-power combinations, i.e. $\binom{n+k-1}{k}$ for power $k$. Thus the dimension of $\phi(x)$ is $\binom{n}{1} + \binom{n+1}{2} + \binom{n+2}{3}$.
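The multichoose count can be checked by enumerating the monomials directly; a small sketch (function names are my own):

```python
# Sketch: count the monomials in an order-3 polynomial feature vector of an
# n-dimensional input using the multichoose formula C(n+k-1, k).
from itertools import combinations_with_replacement
from math import comb

def poly_feature_dim(n, order=3):
    # sum of "n multichoose k" for k = 1..order
    return sum(comb(n + k - 1, k) for k in range(1, order + 1))

def enumerate_monomials(n, order=3):
    # each monomial of degree k is a multiset of k variable indices
    return [m for k in range(1, order + 1)
            for m in combinations_with_replacement(range(n), k)]

print(poly_feature_dim(2))          # 2 + 3 + 4 = 9
print(len(enumerate_monomials(2)))  # 9: the enumeration agrees
```

For $n = 2$ the counts are $\binom{2}{1} = 2$, $\binom{3}{2} = 3$, and $\binom{4}{3} = 4$, for 9 features in total.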

## Regression Using Higher Order Polynomial Features

Assume we have $n$ data points in the training set, where $(x^{(t)}, y^{(t)})$ is the $t$-th training example.

The relationship between $y$ and $x$ can be roughly described by a cubic function, so a feature vector of order at least 3 is needed to minimize the structural error.
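A short sketch of this with synthetic data (the cubic target and noise level are illustrative assumptions): an order-1 fit leaves structural error behind, while an order-3 fit removes it.

```python
# Sketch: fitting a roughly cubic relationship with polynomial least squares.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = x**3 - 2 * x + rng.normal(0, 0.1, size=x.shape)  # assumed cubic data

def fit_poly(x, y, order):
    # design matrix with columns [1, x, x^2, ..., x^order]
    X = np.vander(x, order + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ theta

mse1 = np.mean((y - fit_poly(x, y, 1))**2)  # structural error remains
mse3 = np.mean((y - fit_poly(x, y, 3))**2)  # error drops to the noise level
print(mse3 < mse1)  # True
```

With an order-3 feature vector the remaining training error is essentially the noise variance; the linear fit cannot do better than approximating the cubic trend.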

## Effect of Regularization on Higher Order Regression

The three figures below show the fitting results of a 9th order polynomial regression with different regularization parameters $\lambda$ on the same training data.

The smallest regularization parameter $\lambda$ corresponds to figure A.

The largest regularization parameter $\lambda$ corresponds to figure B.

The effect of regularization is to restrict the parameters of a model from freely taking on large values. This makes the model function smoother, leveling the ‘hills’ and filling the ‘valleys’. It also makes the model more stable: with smaller parameters, a small perturbation of $x$ will not change $y$ significantly.
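The shrinking effect can be seen directly with ridge regression in closed form (a sketch; the sine target, the noise level, and the $\lambda$ values are my own illustrative choices, not the figures' data):

```python
# Sketch: ridge regression on a 9th order polynomial; a larger lambda
# shrinks the parameter vector, which smooths the fitted curve.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)

X = np.vander(x, 10, increasing=True)  # features 1, x, ..., x^9

def ridge(X, y, lam):
    d = X.shape[1]
    # closed form: theta = (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (1e-6, 1e-2, 10.0)]
print(norms[0] > norms[1] > norms[2])  # larger lambda => smaller parameters
```

The parameter norm decreases monotonically in $\lambda$, which is exactly the "restricting large values" effect described above.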

## Kernels as Dot Products 1

Let’s assume the feature map $\phi(x) = [x_1, x_2, x_1^2, \sqrt{2}\, x_1 x_2, x_2^2]^T$. Then the dot product of two feature vectors can be computed directly from the original inputs:

$$\phi(x) \cdot \phi(x') = x \cdot x' + (x \cdot x')^2 = K(x, x')$$

A function $K(x, x')$ that equals a dot product of feature vectors in this way is called a kernel; it lets us work in the high-dimensional feature space without ever constructing $\phi(x)$ explicitly.
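The identity can be verified numerically, assuming the feature map $\phi(x) = [x_1, x_2, x_1^2, \sqrt{2}\,x_1 x_2, x_2^2]$:

```python
# Sketch: verifying that phi(x) = [x1, x2, x1^2, sqrt(2)*x1*x2, x2^2]
# induces the kernel K(x, x') = x.x' + (x.x')^2 as a plain dot product.
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1, x2, x1**2, np.sqrt(2) * x1 * x2, x2**2])

def K(x, xp):
    return x @ xp + (x @ xp)**2

rng = np.random.default_rng(0)
x, xp = rng.normal(size=2), rng.normal(size=2)
print(np.isclose(phi(x) @ phi(xp), K(x, xp)))  # True for any pair of inputs
```

Evaluating $K$ costs one dot product in the original two-dimensional space, regardless of the dimension of $\phi$.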

## The Kernel Perceptron Algorithm

The original Perceptron Algorithm is given as the following:

Given: training examples $(x^{(i)}, y^{(i)})$ for $i = 1, \ldots, n$

1. Initialize $\theta = 0$
2. For $t = 1, \ldots, T$: for each $i$, if $y^{(i)}\, \theta \cdot \phi(x^{(i)}) \le 0$, update $\theta$ appropriately.

In the kernel version we instead keep, for each training example, a count $\alpha_i$ of the mistakes made on it. The equivalent way to initialize $\alpha$, if we want the same result as initializing $\theta = 0$, is $\alpha_i = 0$ for all $i$.

Now look at the line “Update $\theta$ appropriately” in the above algorithm.

Assuming that there was a mistake in classifying the $i$-th data point, i.e. $y^{(i)}\, \theta \cdot \phi(x^{(i)}) \le 0$, the update

$$\theta \leftarrow \theta + y^{(i)} \phi(x^{(i)})$$

is equivalent to

$$\alpha_i \leftarrow \alpha_i + 1,$$

since $\theta$ can always be written as $\theta = \sum_{j=1}^{n} \alpha_j\, y^{(j)} \phi(x^{(j)})$.

### The Mistake Condition

Using the same expansion of $\theta$, the mistake condition $y^{(i)}\, \theta \cdot \phi(x^{(i)}) \le 0$ is equivalent to

$$y^{(i)} \sum_{j=1}^{n} \alpha_j\, y^{(j)} K(x^{(j)}, x^{(i)}) \le 0,$$

which depends on the data only through the kernel $K$.
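Putting the $\alpha$ update and the kernelized mistake condition together gives the full algorithm. A minimal sketch, assuming a quadratic kernel and an XOR-style dataset (both my own illustrative choices):

```python
# Sketch of the kernel perceptron: keep a mistake count alpha_j per training
# point instead of an explicit theta.
import numpy as np

def K(x, xp):
    return (x @ xp + 1)**2  # assumed quadratic kernel

def kernel_perceptron(X, y, epochs=50):
    n = len(y)
    alpha = np.zeros(n)  # alpha_j = 0 matches initializing theta = 0
    gram = np.array([[K(X[i], X[j]) for j in range(n)] for i in range(n)])
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # mistake condition: y_i * sum_j alpha_j y_j K(x_j, x_i) <= 0
            if y[i] * np.sum(alpha * y * gram[:, i]) <= 0:
                alpha[i] += 1  # equivalent to theta += y_i * phi(x_i)
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

# XOR-like data: not linearly separable, but separable with this kernel
X = np.array([[1.0, 1], [-1, -1], [1, -1], [-1, 1]])
y = np.array([1, 1, -1, -1])
alpha = kernel_perceptron(X, y)
pred = np.sign([np.sum(alpha * y * np.array([K(x, z) for x in X])) for z in X])
print((pred == y).all())  # True: all four points classified correctly
```

Note that $\phi$ never appears: training and prediction touch the data only through kernel evaluations.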

### Kernel Composition Rules

If $K(x, x')$ is a kernel, so is $\tilde K(x, x') = f(x)\, K(x, x')\, f(x')$ for any real-valued function $f$.

If $K_1(x, x')$ and $K_2(x, x')$ are kernels, then so are their sum $K_1(x, x') + K_2(x, x')$ and their product $K_1(x, x')\, K_2(x, x')$.
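The sum and product rules can be spot-checked numerically: the Gram matrix of a kernel must be positive semidefinite, and that property survives both operations (a sketch with assumed linear and quadratic base kernels):

```python
# Sketch: the sum and the elementwise (Schur) product of two kernel Gram
# matrices remain positive semidefinite, as the composition rules require.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))

K1 = X @ X.T             # linear kernel Gram matrix
K2 = (X @ X.T + 1)**2    # quadratic kernel Gram matrix

def is_psd(K, tol=1e-9):
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

print(is_psd(K1 + K2))   # sum rule
print(is_psd(K1 * K2))   # product rule (elementwise product of Gram entries)
```

A numeric check is not a proof, but it is a quick sanity test when composing kernels in practice.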

### The Radial Basis Kernel

The radial basis kernel is given by:

$$K(x, x') = \exp\!\left(-\tfrac{1}{2}\,\|x - x'\|^2\right)$$

If the training examples are distinct, a kernel perceptron with the radial basis kernel can separate them: the corresponding feature vectors live in an infinite-dimensional space, so the decision boundary can take essentially any shape.
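A direct sketch of the formula: the kernel equals 1 when the two points coincide and decays toward 0 as they move apart.

```python
# Sketch of the radial basis kernel K(x, x') = exp(-||x - x'||^2 / 2).
import numpy as np

def rbf(x, xp):
    return np.exp(-0.5 * np.sum((x - xp)**2))

x = np.array([1.0, 2.0])
xp = np.array([1.5, 1.0])  # an arbitrary second point

print(rbf(x, x))            # 1.0 at zero distance
print(0 < rbf(x, xp) < 1)   # True: strictly between 0 and 1 otherwise
```

Because the kernel depends only on the distance $\|x - x'\|$, it is radially symmetric, which is where the name comes from.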