Higher Order Feature Vectors
Linear classifiers can be used to make non-linear predictions.
For example, the feature map
Since a possible boundary is an ellipse,
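The order 3 polynomial feature vector of $x \in \mathbb{R}^n$ collects every monomial of degree 1, 2, and 3 in the coordinates of $x$:
$$\phi(x) = [\,x_1, \ldots, x_n,\; x_1^2, x_1 x_2, \ldots, x_i x_j, \ldots,\; x_1^3, x_1^2 x_2, \ldots, x_i x_j x_k, \ldots\,]^T, \qquad i \le j \le k$$
For each of the feature transformations (power $p$ = 1, 2, 3), there are n-multichoose-$p$ combinations, that is, $\binom{n+p-1}{p}$ distinct monomials. Thus the dimension of $\phi(x)$ is:
$$\sum_{p=1}^{3} \binom{n+p-1}{p} = n + \binom{n+1}{2} + \binom{n+2}{3}$$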
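The count can be checked by enumeration. The following is a minimal Python sketch, not part of the original notes: the helper names `poly_features` and `poly_dim` and the sample vector are illustrative assumptions. It builds every monomial of degree 1 to 3 and compares the length of the result with the multichoose formula.

```python
from itertools import combinations_with_replacement
from math import comb, prod

def poly_features(x, order=3):
    """Return all monomials of degree 1..order in the coordinates of x."""
    feats = []
    for p in range(1, order + 1):
        # Each multiset of p coordinate indices yields one monomial of degree p.
        for idx in combinations_with_replacement(range(len(x)), p):
            feats.append(prod(x[i] for i in idx))
    return feats

def poly_dim(n, order=3):
    """Closed-form count: sum over p of n-multichoose-p = C(n + p - 1, p)."""
    return sum(comb(n + p - 1, p) for p in range(1, order + 1))

x = [2.0, 3.0]                      # made-up input with n = 2
phi = poly_features(x)
print(len(phi), poly_dim(len(x)))   # prints: 9 9
```

For n = 2 the nine features are x1, x2, x1^2, x1 x2, x2^2, x1^3, x1^2 x2, x1 x2^2, x2^3. The count grows quickly with n, which is one motivation for working with kernels instead of explicit feature vectors.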
Regression Using Higher Order Polynomial Features
Assume we have n data points in the training set $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$, where $(x^{(i)}, y^{(i)})$ is the $i$-th training example.
The relationship between y and x can be roughly described by a cubic function, so a polynomial feature vector of order at least 3 is needed to minimize structural error.
Effect of Regularization on Higher Order Regression
The three figures below show the fit of a 9th order polynomial regression to the same training data for three different values of the regularization parameter lambda.
The smallest regularization parameter lambda corresponds to figure A.
The largest regularization parameter lambda corresponds to figure B.
The effect of regularization is to keep the parameters of a model from freely taking on large values. This makes the model function smoother, leveling the ‘hills’ and filling the ‘valleys’. It also makes the model more stable: with a smaller $\|\theta\|$, a small perturbation of x will not change y significantly.
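The shrinking effect can be seen numerically. The sketch below is illustrative only. It assumes the ridge objective sum_i (y_i - theta . phi(x_i))^2 + lambda * ||theta||^2, which the notes do not state explicitly, and it also regularizes the constant term. The data are made up, and `solve`, `ridge_poly_fit`, and `norm` are hypothetical helper names.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def ridge_poly_fit(xs, ys, degree, lam):
    """Solve (Phi^T Phi + lam * I) theta = Phi^T y for phi(x) = [1, x, ..., x^degree]."""
    Phi = [[x ** p for p in range(degree + 1)] for x in xs]
    d = degree + 1
    A = [[sum(row[i] * row[j] for row in Phi) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(d)]
    return solve(A, b)

def norm(theta):
    return sum(t * t for t in theta) ** 0.5

# Made-up data: a cubic trend plus a small alternating disturbance.
xs = [-1 + 0.2 * i for i in range(11)]
ys = [x ** 3 - x + 0.1 * (-1) ** i for i, x in enumerate(xs)]

lambdas = [1e-3, 1e-1, 10.0]
norms = [norm(ridge_poly_fit(xs, ys, degree=9, lam=lam)) for lam in lambdas]
for lam, nrm in zip(lambdas, norms):
    print(f"lambda={lam:g}  ||theta||={nrm:.4f}")
```

Under this objective the parameter norm shrinks as lambda grows, which corresponds to the flatter, smoother fits described above.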
Kernels as Dot Products 1
The Kernel Perceptron Algorithm
The original Perceptron Algorithm is given as the following:
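A minimal Python sketch of the standard perceptron through the origin, run on feature vectors phi(x) with labels in {+1, -1}. The function name, the number of passes `T`, and the toy data are assumptions made for illustration; the update inside the mistake branch is the step that the kernel version later rewrites.

```python
def perceptron(features, labels, T=10):
    """Perceptron through the origin; features[i] plays the role of phi(x_i)."""
    theta = [0.0] * len(features[0])            # theta starts as the zero vector
    for _ in range(T):
        for phi, y in zip(features, labels):
            score = sum(t * f for t, f in zip(theta, phi))
            if y * score <= 0:                  # mistake on this example
                # update step: theta <- theta + y * phi(x)
                theta = [t + y * f for t, f in zip(theta, phi)]
    return theta

# Made-up, linearly separable toy data.
features = [[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0]]
labels = [1, 1, -1, -1]
theta = perceptron(features, labels)
print(theta)   # prints: [1.0, 1.0]
```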
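Writing $\theta = \sum_{j=1}^{n} \alpha_j y^{(j)} \phi(x^{(j)})$, the equivalent way to initialize $\alpha$ if we want the same result as initializing $\theta = 0$ is $\alpha_1 = \cdots = \alpha_n = 0$.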
Now look at the line “Update appropriately” in the above algorithm:
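Assuming that there was a mistake in classifying the data point $(x^{(i)}, y^{(i)})$, i.e. $y^{(i)}\, \theta \cdot \phi(x^{(i)}) \le 0$, the update $\theta \leftarrow \theta + y^{(i)} \phi(x^{(i)})$
is equivalent to $\alpha_i \leftarrow \alpha_i + 1$.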
The Mistake Condition
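$y^{(i)}\, \theta \cdot \phi(x^{(i)}) \le 0$ is equivalent to $y^{(i)} \sum_{j=1}^{n} \alpha_j y^{(j)} K(x^{(j)}, x^{(i)}) \le 0$, where $K(x, x') = \phi(x) \cdot \phi(x')$.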
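Putting the pieces together, here is a hedged Python sketch of a kernel perceptron. It assumes the usual parameterization theta = sum_j alpha_j y_j phi(x_j), so that a mistake on example i increments alpha_i and every score is computed through the kernel alone. The function names and toy data are illustrative; the linear kernel is used only so that theta can be recovered explicitly and checked.

```python
def kernel_perceptron(xs, labels, kernel, T=10):
    """Return mistake counts alpha, one per training example."""
    alpha = [0] * len(xs)                       # same effect as theta = 0
    for _ in range(T):
        for i, (x, y) in enumerate(zip(xs, labels)):
            # theta . phi(x) written with kernel evaluations only
            score = sum(a * yj * kernel(xj, x)
                        for a, yj, xj in zip(alpha, labels, xs))
            if y * score <= 0:                  # mistake condition
                alpha[i] += 1                   # replaces theta <- theta + y * phi(x)
    return alpha

def linear_kernel(u, v):
    """K(u, v) = u . v, i.e. phi is the identity map."""
    return sum(a * b for a, b in zip(u, v))

# Made-up, linearly separable toy data.
xs = [[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0]]
labels = [1, 1, -1, -1]
alpha = kernel_perceptron(xs, labels, linear_kernel)

# Recover theta = sum_j alpha_j * y_j * phi(x_j) to compare with the primal form.
theta = [sum(a * y * x[k] for a, y, x in zip(alpha, labels, xs)) for k in range(2)]
print(alpha, theta)   # prints: [1, 0, 0, 0] [1.0, 1.0]
```

Swapping `linear_kernel` for any other valid kernel changes the implicit feature map without touching the training loop.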
Kernel Composition Rules
If $K(x, x')$ is a kernel, so is $\tilde{K}(x, x') = f(x)\, K(x, x')\, f(x')$ for any real-valued function $f$.
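One standard composition rule can be checked numerically: multiplying a kernel by a real-valued function on both sides, f(x) K(x, x') f(x'), again gives a kernel, because it is the dot product of the rescaled feature map f(x) phi(x). In the Python sketch below, `phi`, `f`, and the test points are arbitrary choices made for illustration, not taken from the notes.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Hypothetical feature map defining the base kernel."""
    return [x[0], x[1], x[0] * x[1]]

def K(x, xp):
    """Base kernel: K(x, x') = phi(x) . phi(x')."""
    return dot(phi(x), phi(xp))

def f(x):
    """Arbitrary real-valued function of x."""
    return 1.0 + x[0] ** 2

def K_tilde(x, xp):
    """Composed kernel f(x) K(x, x') f(x')."""
    return f(x) * K(x, xp) * f(xp)

def phi_tilde(x):
    """Feature map whose dot product reproduces K_tilde."""
    return [f(x) * c for c in phi(x)]

x, xp = [1.0, 2.0], [0.5, -1.0]
lhs = K_tilde(x, xp)
rhs = dot(phi_tilde(x), phi_tilde(xp))
print(abs(lhs - rhs) < 1e-9)   # prints: True
```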
The Radial Basis Kernel
The radial basis kernel is given by:
$$K(x, x') = \exp\left(-\tfrac{1}{2}\|x - x'\|^2\right)$$
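A short Python sketch of this kernel, assuming the common form exp(-gamma * ||x - x'||^2) with gamma = 1/2 as the default. The `gamma` parameter and the function name are assumptions added for illustration.

```python
from math import exp

def rbf_kernel(x, xp, gamma=0.5):
    """Radial basis kernel exp(-gamma * ||x - x'||^2); gamma = 1/2 by assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return exp(-gamma * sq_dist)

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))   # prints: 1.0
same_as_e_inverse = abs(rbf_kernel([0.0, 0.0], [1.0, 1.0]) - exp(-1.0)) < 1e-12
print(same_as_e_inverse)                     # prints: True
```

The value depends only on the distance between the two points: it equals 1 when they coincide and decays toward 0 as they move apart.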