Inner Product and Orthogonal Vectors
Linear classifier (through the origin): $h(x; \theta) = \mathrm{sign}(\theta \cdot x)$, where $\mathrm{sign}(\cdot)$ returns $-1$ or $+1$ and $\theta \cdot x$ is the inner product of $\theta$ and $x$.
Linear classifier with an offset (general case): $h(x; \theta, \theta_0) = \mathrm{sign}(\theta \cdot x + \theta_0)$.
For the $i$-th training example, $x^{(i)}$ is a vector, $\theta_0$ is a scalar, $y^{(i)} \in \{-1, +1\}$ is the label, and $h(x^{(i)})$ is the classifier output.
If $y^{(i)}(\theta \cdot x^{(i)} + \theta_0) > 0$, the classification matches the label; if it is negative, the example is misclassified.
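A minimal sketch of this classifier in NumPy (the vectors, parameters, and label below are made-up values for illustration):

```python
import numpy as np

def predict(x, theta, theta_0):
    # Linear classifier with offset: sign(theta . x + theta_0) in {-1, +1}.
    return 1 if np.dot(theta, x) + theta_0 > 0 else -1

# Hypothetical example: check whether the prediction matches the label.
x_i = np.array([2.0, -1.0])
y_i = +1
theta, theta_0 = np.array([0.5, 1.0]), -0.2
matches = y_i * (np.dot(theta, x_i) + theta_0) > 0  # True iff classification matches the label
print(predict(x_i, theta, theta_0), matches)
```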
Perceptron Algorithm
The perceptron algorithm takes $T$ (the number of passes over the data) and the training set $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ as input, and aims to learn parameters $\theta$ (and $\theta_0$, in the case of a perceptron with offset) that classify the training set correctly. Whenever an example is misclassified, i.e. $y^{(i)}(\theta \cdot x^{(i)} + \theta_0) \le 0$, it updates $\theta \leftarrow \theta + y^{(i)} x^{(i)}$ and $\theta_0 \leftarrow \theta_0 + y^{(i)}$.
Comparing the agreement after and before an update, $y^{(i)}(\theta + y^{(i)} x^{(i)}) \cdot x^{(i)} = y^{(i)}\,\theta \cdot x^{(i)} + \lVert x^{(i)} \rVert^2$, so the first is always greater than the latter. Considering that our goal is to minimize the training error, each update moves the classifier toward correctly classifying the example it just misclassified, which is desirable.
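A sketch of the algorithm, assuming labels in $\{-1, +1\}$ and the data as NumPy arrays (function and variable names are my own):

```python
import numpy as np

def perceptron(X, y, T):
    # X: (n, d) array of training vectors; y: (n,) labels in {-1, +1};
    # T: number of passes over the training set.
    n, d = X.shape
    theta, theta_0 = np.zeros(d), 0.0
    for _ in range(T):
        for i in range(n):
            # Update only on a mistake (agreement <= 0).
            if y[i] * (np.dot(theta, X[i]) + theta_0) <= 0:
                theta += y[i] * X[i]
                theta_0 += y[i]
    return theta, theta_0
```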
Distance from a Line to a Point
Consider a line $L$ in $\mathbb{R}^2$ given by $\theta \cdot x + \theta_0 = 0$.
The distance from a point $P$ (the end point of the vector $x_0$) to $L$ is $d = \frac{|\theta \cdot x_0 + \theta_0|}{\lVert \theta \rVert}$.
Proof: pick any point $x_1$ on $L$, so $\theta \cdot x_1 + \theta_0 = 0$. The distance from $P$ to $L$ is the length of the projection of $x_0 - x_1$ onto the unit normal $\theta / \lVert \theta \rVert$: $d = \frac{|\theta \cdot (x_0 - x_1)|}{\lVert \theta \rVert} = \frac{|\theta \cdot x_0 + \theta_0|}{\lVert \theta \rVert}$.
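The same formula as a small NumPy helper (a sketch; names are illustrative):

```python
import numpy as np

def distance_to_line(x_0, theta, theta_0):
    # Distance from the point x_0 to the line theta . x + theta_0 = 0.
    return abs(np.dot(theta, x_0) + theta_0) / np.linalg.norm(theta)

# e.g. distance from (1, 1) to the line x + y - 1 = 0 is 1/sqrt(2):
print(distance_to_line(np.array([1.0, 1.0]), np.array([1.0, 1.0]), -1.0))
```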
Decision Boundary vs. Margin Boundary
The decision boundary is the set of points $x$ which satisfy $\theta \cdot x + \theta_0 = 0$.
The margin boundary is the set of points $x$ which satisfy $\theta \cdot x + \theta_0 = \pm 1$.
The distance between the decision boundary and the margin boundary is $\frac{1}{\lVert \theta \rVert}$.
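This follows directly from the distance formula above: take any point $x$ on the margin boundary and measure its distance to the decision boundary.

```latex
d = \frac{|\theta \cdot x + \theta_0|}{\lVert \theta \rVert}
  = \frac{|\pm 1|}{\lVert \theta \rVert}
  = \frac{1}{\lVert \theta \rVert}
```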
Hinge Loss and Objective Function
Hinge loss is a loss function used to train classifiers. It tells us how undesirable a training example is, with regard to the margin and the correctness of its classification: $\mathrm{Loss}_h(z) = \max\{0,\, 1 - z\}$, where $z = y(\theta \cdot x + \theta_0)$ is the agreement.
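A sketch of the hinge loss in NumPy, written in terms of the agreement $z = y(\theta \cdot x + \theta_0)$:

```python
import numpy as np

def hinge_loss(z):
    # z = y * (theta . x + theta_0): zero loss when z >= 1
    # (correct and outside the margin), linear penalty 1 - z otherwise.
    return np.maximum(0.0, 1.0 - z)

# z >= 1: no loss; 0 <= z < 1: correct but inside the margin; z < 0: misclassified.
print(hinge_loss(np.array([2.0, 0.5, -1.0])))  # [0.  0.5  2.]
```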
Linear Classification and Generalization
SVMs
Support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression tasks.
The objective function is the average loss (average hinge loss) plus a regularization term: $J(\theta, \theta_0) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Loss}_h\!\left(y^{(i)}(\theta \cdot x^{(i)} + \theta_0)\right) + \frac{\lambda}{2} \lVert \theta \rVert^2$. The regularization term biases the solution toward a small $\lVert \theta \rVert$, i.e. a large margin (see the sketch after this list).
Support vectors refer to points that are exactly on the margin boundary
If we remove all points that are support vectors, we will get a different solution $\theta$ (the decision boundary changes)
If we remove one point that is not a support vector, we will get the same solution $\theta$
As we increase $\lambda$, the regularization term penalizes $\lVert \theta \rVert$ more heavily, so $\lVert \theta \rVert$ shrinks and the margin $d = \frac{1}{\lVert \theta \rVert}$ increases.
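A sketch of this objective in NumPy (`lam` stands for the regularization parameter $\lambda$; names are my own):

```python
import numpy as np

def svm_objective(theta, theta_0, X, y, lam):
    # Average hinge loss over the training set plus L2 regularization:
    # J = (1/n) sum_i max(0, 1 - y_i (theta . x_i + theta_0)) + (lam/2) ||theta||^2
    agreement = y * (X @ theta + theta_0)
    avg_loss = np.mean(np.maximum(0.0, 1.0 - agreement))
    return avg_loss + 0.5 * lam * np.dot(theta, theta)
```

Larger `lam` trades training accuracy for a wider margin, consistent with the note above.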
If the training loss is low and the validation loss is high, the model might be overfitting. If both the training and validation losses are high, the model might be underfitting.