Higher Order Feature Vectors
Linear classifiers can be used to make non-linear predictions: we first map the input $x$ to a higher-dimensional feature vector $\phi(x)$, and then train a linear classifier on the new feature vectors.

For example, the feature map $\phi(x) = [x, x^2]^T$ sends a one-dimensional input $x$ to a two-dimensional feature vector; a linear decision boundary in the feature space then corresponds to a non-linear (quadratic) boundary in the original space.

Another example: for $x \in \mathbb{R}^2$, since a possible decision boundary is an ellipse, we can use the quadratic feature map

$\phi(x) = [x_1,\ x_2,\ x_1^2,\ x_1 x_2,\ x_2^2]^T.$

A linear classifier $\theta \cdot \phi(x) + \theta_0 = 0$ in the feature space then corresponds to a quadratic (e.g. elliptical) decision boundary in the original space.
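As a quick illustration, the following minimal sketch shows a linear classifier on the quadratic features realizing an elliptical boundary; the particular ellipse $x_1^2/4 + x_2^2 = 1$, the parameter values, and the sample points are assumptions made for this example.

```python
# Minimal sketch: with the quadratic feature map phi(x) = [x1, x2, x1^2, x1*x2, x2^2],
# a linear classifier sign(theta . phi(x) + theta_0) realizes an elliptical boundary.
# The ellipse x1^2/4 + x2^2 = 1 and the sample points are illustrative.
import numpy as np

def phi(x):
    return np.array([x[0], x[1], x[0] ** 2, x[0] * x[1], x[1] ** 2])

# Choose theta, theta_0 so that theta . phi(x) + theta_0 = 1 - x1^2/4 - x2^2
theta = np.array([0.0, 0.0, -0.25, 0.0, -1.0])
theta_0 = 1.0

for point in ([0.0, 0.0], [1.0, 0.5], [3.0, 0.0], [0.0, 2.0]):
    x = np.array(point)
    label = np.sign(theta @ phi(x) + theta_0)
    print(point, "->", int(label))  # +1 inside the ellipse, -1 outside
```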
Non-linear Classification
The order-3 polynomial feature vector for $x \in \mathbb{R}^2$ collects all monomials of degree one, two, and three:

$\phi(x) = [x_1,\ x_2,\ x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2,\ x_1^3,\ \sqrt{3}\,x_1^2 x_2,\ \sqrt{3}\,x_1 x_2^2,\ x_2^3]^T,$

where the coefficients are chosen so that $\phi(x) \cdot \phi(x') = (x \cdot x') + (x \cdot x')^2 + (x \cdot x')^3$.
For each of the feature transformations (power 1, power 2, power 3), there are $n$-multichoose-power $= \binom{n + \text{power} - 1}{\text{power}}$ distinct monomials, where $n$ is the dimension of $x$. Thus the order-3 polynomial feature vector has dimension

$n + \binom{n+1}{2} + \binom{n+2}{3} = n + \frac{n(n+1)}{2} + \frac{n(n+1)(n+2)}{6}.$
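To make the counting concrete, here is a minimal sketch that enumerates the monomials of degree 1 to 3 with `itertools` and checks the multichoose counts; the helper name `polynomial_features` is illustrative, not from the notes.

```python
# Minimal sketch: enumerate all monomials of degrees 1-3 for an n-dimensional
# input and confirm the n-multichoose-power counts.
from itertools import combinations_with_replacement
from math import comb

import numpy as np

def polynomial_features(x, max_power=3):
    """Map x in R^n to the vector of all monomials of degree 1..max_power.
    (The sqrt coefficients that make the map kernel-consistent are omitted
    here for simplicity.)"""
    feats = []
    for power in range(1, max_power + 1):
        # combinations_with_replacement yields n-multichoose-power index tuples
        for idx in combinations_with_replacement(range(len(x)), power):
            feats.append(np.prod([x[i] for i in idx]))
    return np.array(feats)

x = np.array([2.0, 3.0])  # n = 2
phi = polynomial_features(x)
n = len(x)
expected = sum(comb(n + p - 1, p) for p in range(1, 4))  # multichoose counts
print(len(phi), expected)  # both 9 for n = 2, max_power = 3
```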

Regression Using Higher Order Polynomial Features
Assume we have $n$ data points in the training set, $\{(x^{(t)}, y^{(t)}),\ t = 1, \ldots, n\}$, where $(x^{(t)}, y^{(t)})$ is the $t$-th training example.

When the relationship between $y$ and $x$ can be roughly described by a cubic function, a polynomial feature vector of order at least 3 is needed to minimize the structural error.
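A minimal sketch of this idea, assuming synthetic data generated from a noisy cubic, is to run ordinary least squares on the features $[1, x, x^2, x^3]$:

```python
# Minimal sketch: least-squares regression on polynomial features of order 3,
# assuming a noisy cubic relationship between x and y (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.5 * x**3 - 2.0 * x + 0.5 + rng.normal(scale=0.5, size=x.shape)

# Feature matrix with columns [1, x, x^2, x^3]
Phi = np.vander(x, N=4, increasing=True)

# Ordinary least squares: theta = argmin ||Phi theta - y||^2
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta)  # approximately [0.5, -2.0, 0.0, 1.5]
```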
Effect of Regularization on Higher Order Regression
The three figures below show the fitting results of a 9th-order polynomial regression with different values of the regularization parameter $\lambda$ on the same training data. The smallest regularization parameter $\lambda$ corresponds to figure A, and the largest regularization parameter $\lambda$ corresponds to figure B.
The effect of regularization is to restrict the parameters of a model from freely taking on large values. This makes the model function smoother, leveling the ‘hills’ and filling the ‘valleys’. It also makes the model more stable: with a smaller $\|\theta\|$, a small perturbation of $x$ will not change $y$ significantly.
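The following sketch illustrates this with ridge regression (L2-regularized least squares) on 9th-order polynomial features; the data and the $\lambda$ values are made up for illustration.

```python
# Minimal sketch: ridge regression on a 9th-order polynomial fit, showing
# that a larger lambda shrinks ||theta|| and therefore smooths the fit.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 15)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

Phi = np.vander(x, N=10, increasing=True)  # features 1, x, ..., x^9

for lam in (1e-6, 1e-2, 1.0):
    # Closed-form ridge solution: theta = (Phi^T Phi + lam I)^{-1} Phi^T y
    theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(10), Phi.T @ y)
    print(f"lambda={lam:g}  ||theta|| = {np.linalg.norm(theta):.2f}")
# The norm of theta decreases as lambda increases, giving a smoother fit.
```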
Kernels as Dot Products 1
Let’s assume the quadratic feature map

$\phi(x) = [x_1,\ x_2,\ x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2]^T.$

Then the kernel function

$K(x, x') = \phi(x) \cdot \phi(x') = (x \cdot x') + (x \cdot x')^2$

can be computed directly from $x$ and $x'$ without ever forming the feature vectors explicitly.
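A quick numerical check of this identity (the point values below are arbitrary):

```python
# Minimal sketch: numerically check that phi(x).phi(x') equals
# (x.x') + (x.x')^2 for the quadratic feature map above.
import numpy as np

def phi(x):
    # Explicit quadratic feature map for x in R^2
    return np.array([x[0], x[1], x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def kernel(x, xp):
    # Kernel computed directly in the original space
    dot = x @ xp
    return dot + dot**2

x = np.array([1.0, 2.0])
xp = np.array([-0.5, 3.0])
print(phi(x) @ phi(xp), kernel(x, xp))  # the two values agree (35.75)
```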
The Kernel Perceptron Algorithm
The original Perceptron Algorithm is given as the following:

Given: the training set $\{(x^{(i)}, y^{(i)}),\ i = 1, \ldots, n\}$, a feature map $\phi$, and the number of passes $T$:

initialize $\theta = 0$ (vector)
for $t = 1, \ldots, T$:
    for $i = 1, \ldots, n$:
        if $y^{(i)}\,(\theta \cdot \phi(x^{(i)})) \le 0$: update $\theta$ appropriately

In the kernel perceptron, $\theta$ is never stored explicitly. Instead it is represented as a linear combination of the feature vectors of the training points,

$\theta = \sum_{j=1}^{n} \alpha_j\, y^{(j)} \phi(x^{(j)}),$

where $\alpha_j$ counts the number of mistakes made on the $j$-th example. The equivalent way to initialize $\alpha_1, \ldots, \alpha_n$, if we want the same result as initializing $\theta = 0$, is to set $\alpha_j = 0$ for all $j$.
Now look at the line “update $\theta$ appropriately” in the above algorithm. Assuming that there was a mistake in classifying the $j$-th data point, i.e. $y^{(j)}\,(\theta \cdot \phi(x^{(j)})) \le 0$, the perceptron update

$\theta \leftarrow \theta + y^{(j)} \phi(x^{(j)})$

is equivalent to

$\alpha_j \leftarrow \alpha_j + 1.$
The Mistake Condition
The mistake condition $y^{(i)}\,(\theta \cdot \phi(x^{(i)})) \le 0$ is equivalent to

$y^{(i)} \sum_{j=1}^{n} \alpha_j\, y^{(j)} K(x^{(j)}, x^{(i)}) \le 0,$

where $K(x^{(j)}, x^{(i)}) = \phi(x^{(j)}) \cdot \phi(x^{(i)})$ is the kernel, so the algorithm can be run using only kernel evaluations.
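Putting the pieces together, here is a minimal sketch of the kernel perceptron using the quadratic kernel from the earlier section; the toy XOR-style data and the number of passes are assumptions for illustration.

```python
# Minimal sketch of the kernel perceptron with the update rule above.
import numpy as np

def kernel(x, xp):
    # Quadratic kernel from the earlier section: K(x, x') = x.x' + (x.x')^2
    dot = x @ xp
    return dot + dot ** 2

def kernel_perceptron(X, y, T=10):
    n = len(X)
    alpha = np.zeros(n)  # alpha_j = 0 for all j corresponds to theta = 0
    K = np.array([[kernel(X[j], X[i]) for i in range(n)] for j in range(n)])
    for _ in range(T):
        for i in range(n):
            # Mistake condition: y_i * sum_j alpha_j y_j K(x_j, x_i) <= 0
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:
                alpha[i] += 1  # equivalent to theta <- theta + y_i phi(x_i)
    return alpha

# XOR-style data: not linearly separable in the original space,
# but separable with the quadratic kernel (via the x1*x2 feature).
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])
print(kernel_perceptron(X, y))  # mistake counts alpha_j for each training point
```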
Kernel Composition Rules
If $K(x, x')$ is a kernel, then so is $\tilde{K}(x, x') = f(x)\,K(x, x')\,f(x')$ for any real-valued function $f$.
If $K_1(x, x')$ and $K_2(x, x')$ are kernels, then so are $K_1(x, x') + K_2(x, x')$ and $K_1(x, x')\,K_2(x, x')$.
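These rules can be sanity-checked numerically by verifying that the composed Gram matrices remain positive semidefinite on a random set of points; this is only a spot check, not a proof, and the kernels and the function $f$ below are arbitrary choices.

```python
# Minimal sketch: check positive semidefiniteness of composed Gram matrices.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 2))

def gram(k):
    return np.array([[k(a, b) for b in X] for a in X])

k1 = lambda x, xp: x @ xp                # linear kernel
k2 = lambda x, xp: (x @ xp + 1) ** 2     # polynomial kernel
f = lambda x: np.exp(x[0])               # arbitrary real-valued function

for name, k in [("f*K1*f", lambda x, xp: f(x) * k1(x, xp) * f(xp)),
                ("K1+K2", lambda x, xp: k1(x, xp) + k2(x, xp)),
                ("K1*K2", lambda x, xp: k1(x, xp) * k2(x, xp))]:
    eigs = np.linalg.eigvalsh(gram(k))
    print(name, "min eigenvalue:", eigs.min())  # >= 0 (up to numerical error)
```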
The Radial Basis Kernel
The radial basis kernel is given by:
$K(x, x') = \exp\!\left(-\tfrac{1}{2}\,\|x - x'\|^2\right).$
If $x = x'$, then $K(x, x') = 1$, its maximum value; as $\|x - x'\|$ grows, $K(x, x')$ decays toward 0, so the kernel can be read as a similarity measure between $x$ and $x'$.
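A minimal sketch of evaluating the radial basis kernel and observing its decay with distance (the points are arbitrary):

```python
# Minimal sketch: the radial basis kernel K(x, x') = exp(-||x - x'||^2 / 2)
# and its behavior as the distance between points grows.
import numpy as np

def rbf_kernel(x, xp):
    diff = x - xp
    return np.exp(-0.5 * diff @ diff)

x = np.array([0.0, 0.0])
for d in (0.0, 1.0, 2.0, 4.0):
    xp = np.array([d, 0.0])
    print(f"distance {d}: K = {rbf_kernel(x, xp):.4f}")
# K equals 1 when x == x' and decays rapidly toward 0 as the distance grows.
```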