By definition, in regression, the observed value y is a real number(continuous), unlike y is discrete in classification. The predictor f, which tries to emulate/predict y is defined as ![]()
Compute Hinge Loss
The empirical risk is defined as:
![]()
Where
is the tth training example (and there are n in total), and Loss is some loss function, such as hinge loss. the definition of hinge loss:
![]()
Example:

Given ![]()
Compute: ![]()
We can calculate :

Hence: ![]()
Geometrically Identifying Error

Here, the structural error occurs because the true underlying relationship is non-linear but the regression function is linear.
The larger the training set is, the smaller the estimation error will be. The structural error occurs when the true underlying relationship is highly non-linear, so it is not relevant to increasing n.
Obtaining 0 empirical risks for a large amount of data means that it is possible that the model is overfitted.
Necessary and Sufficient Condition for a Solution
Computing the gradient of:
![]()
We get:![]()
For any square matrix
has a unique solution
if and only if A is invertible.
Regularization: extreme case 1
If we define the loss function:
![]()
where
is the regularization factor.

If we increase
to infinity, minimizing J is equivalent to minimizing
. Thus
will have to be a zero vector. Thus
becomes
, a horizontal line. Thus f converges to line 1.