By definition, in regression the observed value $y$ is a real number (continuous), whereas in classification $y$ is discrete. The predictor $f$, which tries to predict $y$, is defined as
$$f(x;\theta,\theta_0) = \theta\cdot x + \theta_0$$
Compute Hinge Loss
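The empirical risk (here for a predictor without offset, $\theta_0 = 0$) is defined as:
$$R_n(\theta) = \frac{1}{n}\sum_{t=1}^{n}\mathrm{Loss}\left(y^{(t)} - \theta\cdot x^{(t)}\right)$$
where $\left(x^{(t)}, y^{(t)}\right)$ is the $t$-th training example (and there are $n$ in total), and Loss is some loss function, such as hinge loss. The hinge loss is defined as:
$$\mathrm{Loss}_h(z) = \begin{cases} 0 & \text{if } z \ge 1 \\ 1 - z & \text{otherwise} \end{cases} = \max(0,\, 1 - z)$$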
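A minimal sketch of the empirical risk with hinge loss, assuming a predictor through the origin ($\theta_0 = 0$) as in the formula above. The helper names `hinge_loss`, `dot` and `empirical_risk` are hypothetical, and the data is made up for illustration; it is not the data of the example below.

```python
def hinge_loss(z: float) -> float:
    """Loss_h(z) = 0 if z >= 1, else 1 - z."""
    return max(0.0, 1.0 - z)


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def empirical_risk(theta, xs, ys, loss=hinge_loss):
    """R_n(theta) = (1/n) * sum_t Loss(y_t - theta . x_t)."""
    n = len(xs)
    return sum(loss(y - dot(theta, x)) for x, y in zip(xs, ys)) / n


# Hypothetical data: residuals y - theta.x are 2, 1.5 and -0.5,
# so the hinge losses are 0, 0 and 1.5.
theta = [1.0, -1.0]
xs = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
ys = [3.0, 0.5, 0.5]
print(empirical_risk(theta, xs, ys))  # prints 0.5
```

Only examples whose residual falls below 1 contribute to the risk; passing a different `loss` function reuses the same averaging code.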
Example:
Given
Compute:
We can calculate :
Hence:
Geometrically Identifying Error
Here, the structural error arises because the true underlying relationship is non-linear while the regression function is linear.
The larger the training set, the smaller the estimation error tends to be. Structural error, by contrast, comes from the mismatch between a highly non-linear true relationship and the linear model class, so increasing $n$ does not reduce it.
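The distinction can be sketched with a small simulation, assuming a hypothetical noise-free ground truth $y = x^2$ with $x$ uniform on $[-1, 1]$ and an ordinary least-squares line $y \approx a x + b$. As $n$ grows, the fitted line approaches the best possible line ($a = 0$, $b = 1/3$), so the estimation error shrinks, but the held-out error levels off at $E[x^4] - (1/3)^2 = 4/45$, the structural error. The helper names are illustrative.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a*x + b in one dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mse(a, b, xs, ys):
    """Mean squared error of the line a*x + b on (xs, ys)."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_f(x):
    return x * x  # hypothetical non-linear ground truth


rng = random.Random(0)
# A large held-out sample approximates the expected squared error.
x_test = [rng.uniform(-1, 1) for _ in range(200_000)]
y_test = [true_f(x) for x in x_test]

results = {}
for n in (5, 50, 5000):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [true_f(x) for x in xs]
    a, b = fit_line(xs, ys)
    results[n] = (a, b, mse(a, b, x_test, y_test))
    print(f"n={n:5d}  a={a:+.3f}  b={b:.3f}  test MSE={results[n][2]:.4f}")

# Best linear fit to x^2 on uniform[-1, 1]: a = 0, b = 1/3,
# with MSE = 1/5 - 1/9 = 4/45 (the structural error floor).
print(f"structural error floor ≈ {4 / 45:.4f}")
```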
Obtaining zero empirical risk on a large training set suggests that the model may be overfitting.
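A small illustration, assuming hypothetical data $y = x + \varepsilon$ with Gaussian noise ($\sigma = 0.3$) on 12 equally spaced points. A degree-11 polynomial through every training point (Lagrange interpolation) drives the squared-error empirical risk to zero by fitting the noise, while its error on fresh samples stays positive and is typically well above the noise variance $0.09$. The function names are illustrative only.

```python
import random


def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for k, xk in enumerate(xs):
            if k != i:
                basis *= (x - xk) / (xi - xk)
        total += yi * basis
    return total


def squared_risk(predict, xs, ys):
    """Average squared error of a predictor on (xs, ys)."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def interpolant(x):
    return lagrange_eval(x_train, y_train, x)


rng = random.Random(1)
n = 12
x_train = [-1 + 2 * i / (n - 1) for i in range(n)]      # equally spaced inputs
y_train = [x + rng.gauss(0, 0.3) for x in x_train]       # hypothetical: y = x + noise
x_test = [rng.uniform(-1, 1) for _ in range(1000)]
y_test = [x + rng.gauss(0, 0.3) for x in x_test]

train_risk = squared_risk(interpolant, x_train, y_train)
test_risk = squared_risk(interpolant, x_test, y_test)
print(f"train risk = {train_risk:.2e}, test risk = {test_risk:.3f}")
```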
Necessary and Sufficient Condition for a Solution
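Computing the gradient of the squared-loss empirical risk
$$R_n(\theta) = \frac{1}{n}\sum_{t=1}^{n}\frac{\left(y^{(t)} - \theta\cdot x^{(t)}\right)^2}{2}$$
we get:
$$\nabla_\theta R_n(\theta) = -\frac{1}{n}\sum_{t=1}^{n}\left(y^{(t)} - \theta\cdot x^{(t)}\right)x^{(t)} = A\theta - b,$$
where $A = \frac{1}{n}\sum_{t=1}^{n} x^{(t)}\left(x^{(t)}\right)^{\top}$ and $b = \frac{1}{n}\sum_{t=1}^{n} y^{(t)}x^{(t)}$. Setting the gradient to zero gives the condition $A\theta = b$.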
For any square matrix $A$, the system $A\theta = b$ has a unique solution if and only if $A$ is invertible, in which case $\hat{\theta} = A^{-1}b$.
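A minimal sketch of solving $A\theta = b$ for least squares, assuming $A = \frac{1}{n}\sum_t x^{(t)}(x^{(t)})^\top$ and $b = \frac{1}{n}\sum_t y^{(t)}x^{(t)}$. It uses plain Gaussian elimination with partial pivoting, and the fixed singularity tolerance `1e-12` is an assumption of this sketch. When $A$ is not invertible (for example, collinear features or $n < d$) it raises instead of returning one of infinitely many minimizers. The function name is hypothetical.

```python
def solve_normal_equations(xs, ys, tol=1e-12):
    """Return theta solving A theta = b, where
    A = (1/n) sum_t x_t x_t^T and b = (1/n) sum_t y_t x_t.
    Raises ValueError if A is (numerically) singular."""
    n, d = len(xs), len(xs[0])
    A = [[sum(x[i] * x[j] for x in xs) / n for j in range(d)] for i in range(d)]
    b = [sum(y * x[i] for x, y in zip(xs, ys)) / n for i in range(d)]

    # Forward elimination with partial pivoting on the augmented matrix [A | b].
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:
            raise ValueError("A is singular: no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= factor * M[col][c]

    # Back substitution.
    theta = [0.0] * d
    for i in reversed(range(d)):
        s = sum(M[i][j] * theta[j] for j in range(i + 1, d))
        theta[i] = (M[i][d] - s) / M[i][i]
    return theta


# Hypothetical data generated exactly by theta = [2, -1].
print(solve_normal_equations([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, -1.0, 1.0]))
```

With collinear inputs such as `[[1.0, 2.0], [2.0, 4.0]]`, $A$ has rank 1 and the function raises `ValueError`.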
Regularization: extreme case 1
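If we define the loss function:
$$J_{n,\lambda}(\theta,\theta_0) = \frac{1}{n}\sum_{t=1}^{n}\frac{\left(y^{(t)} - \theta\cdot x^{(t)} - \theta_0\right)^2}{2} + \frac{\lambda}{2}\lVert\theta\rVert^2$$
where $\lambda \ge 0$ is the regularization factor.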
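If we increase $\lambda$ to infinity, the penalty term dominates, so minimizing $J$ is equivalent to minimizing $\lVert\theta\rVert^2$. Thus $\theta$ has to be the zero vector, and $f(x) = \theta\cdot x + \theta_0$ becomes $f(x) = \theta_0$, a horizontal line. Thus $f$ converges to line 1.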
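A one-dimensional sketch of this limit, assuming scalar inputs and the $J_{n,\lambda}$ above (ridge regression with an unpenalized offset). Setting both partial derivatives to zero gives $\theta = S_{xy}/(S_{xx} + \lambda)$ and $\theta_0 = \bar{y} - \theta\bar{x}$, where $S_{xx}$ and $S_{xy}$ are the (1/n-normalized) centered sums. As $\lambda \to \infty$, $\theta \to 0$ and $\theta_0 \to \bar{y}$, so the horizontal line sits at the mean of the training labels. The data and function name are hypothetical.

```python
def ridge_1d(xs, ys, lam):
    """Minimize (1/n) sum_t (y_t - theta*x_t - theta0)^2 / 2 + (lam/2) * theta^2.
    The offset theta0 is not penalized."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    theta = sxy / (sxx + lam)   # slope shrinks toward 0 as lam grows
    theta0 = my - theta * mx    # offset tends to the mean of ys
    return theta, theta0


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # hypothetical data on the line y = 2x + 1 (mean 4)
for lam in (0.0, 1.0, 100.0, 1e6):
    theta, theta0 = ridge_1d(xs, ys, lam)
    print(f"lambda={lam:>9g}  theta={theta:.6f}  theta0={theta0:.6f}")
```

With $\lambda = 0$ this recovers the least-squares line ($\theta = 2$, $\theta_0 = 1$); for very large $\lambda$ the fit flattens toward $f(x) = 4$.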