By definition, in regression the observed value y is a real (continuous) number, whereas in classification y is discrete. The predictor f, which tries to predict y, is defined as
$$f(x) = \theta \cdot x + \theta_0$$
Compute Hinge Loss
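The empirical risk is defined as:
$$R_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} \text{Loss}\left(y^{(t)} - \theta \cdot x^{(t)}\right)$$
where $(x^{(t)}, y^{(t)})$ is the t-th training example (there are n in total) and Loss is some loss function, such as hinge loss. The hinge loss is defined as:
$$\text{Loss}_h(z) = \begin{cases} 0 & \text{if } z \geq 1 \\ 1 - z & \text{otherwise} \end{cases}$$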
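As a rough illustration of how such a risk can be evaluated, the sketch below assumes that the loss is applied to the residual z = y - θ·x and that the hinge loss equals max(0, 1 - z). The data points and the parameter vector `theta` are hypothetical values chosen only for illustration; they are not the values of the example in these notes.

```python
def hinge_loss(z):
    """Hinge loss: 0 when z >= 1, otherwise 1 - z."""
    return max(0.0, 1.0 - z)


def empirical_risk(theta, xs, ys, loss=hinge_loss):
    """Average loss over the residuals z = y - theta . x."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(t * xi for t, xi in zip(theta, x))
        total += loss(y - prediction)
    return total / len(xs)


# Hypothetical data, not taken from the notes.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
ys = [3.0, 1.0, 2.0, 0.0]
theta = [1.0, 1.0]

# Residuals are 2, 0, 0, -1, so the losses are 0, 1, 1, 2 and the risk is 1.0.
risk = empirical_risk(theta, xs, ys)
print(risk)
```

Passing a different function as `loss` (for example a squared-error loss) reuses the same averaging step.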
Example:

Given
Compute:
We can calculate:

Hence:
Geometrically Identifying Error

Here, the structural error occurs because the true underlying relationship is non-linear but the regression function is linear.
The larger the training set, the smaller the estimation error. Structural error arises when the true underlying relationship is highly non-linear and the model cannot represent it, so increasing n does not reduce it.
Obtaining zero empirical risk on a large amount of data suggests that the model may be overfitted.
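The claim that structural error does not shrink with n can be checked numerically. The sketch below is an illustration under assumed conditions: the target y = x^2 on an evenly spaced grid over [-1, 1] is a hypothetical choice, and it is noise-free so that any remaining error is structural rather than estimation error. A least-squares line is fitted in closed form for several values of n.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def training_mse(n):
    """Mean squared error of the best line on n grid points of y = x^2."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    ys = [x * x for x in xs]  # hypothetical non-linear target, no noise
    a, b = fit_line(xs, ys)
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n


errors = {n: training_mse(n) for n in (10, 100, 1000, 10000)}
for n, err in errors.items():
    print(n, round(err, 4))
```

Under these assumptions the error levels off at a positive value instead of approaching zero, because no straight line can represent the quadratic target regardless of how many points are available.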
Necessary and Sufficient Condition for a Solution
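Computing the gradient of:
$$R_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} \frac{\left(y^{(t)} - \theta \cdot x^{(t)}\right)^2}{2}$$
We get:
$$\nabla_\theta R_n(\theta) = A\theta - b, \quad \text{where } A = \frac{1}{n} \sum_{t=1}^{n} x^{(t)} \left(x^{(t)}\right)^T, \quad b = \frac{1}{n} \sum_{t=1}^{n} y^{(t)} x^{(t)}$$
Setting the gradient to zero gives the linear system $A\theta = b$.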
For any square matrix A, the linear system $A\theta = b$ has a unique solution if and only if A is invertible.
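The invertibility condition can be illustrated with a small sketch. It assumes a squared-error risk, so that the system to solve is Aθ = b with A = (1/n) Σ x xᵀ and b = (1/n) Σ y x, and it is restricted to two-dimensional inputs so the determinant can be written out by hand. The helper names and both datasets are hypothetical.

```python
def normal_equation_terms(xs, ys):
    """Return A = (1/n) sum x x^T and b = (1/n) sum y x for 2-D inputs."""
    n = len(xs)
    A = [[sum(x[i] * x[j] for x in xs) / n for j in range(2)] for i in range(2)]
    b = [sum(y * x[i] for x, y in zip(xs, ys)) / n for i in range(2)]
    return A, b


def solve_2x2(A, b, tol=1e-12):
    """Solve A theta = b by Cramer's rule; return None if A is singular."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < tol:
        return None  # A is not invertible: no unique solution
    t1 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    t2 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [t1, t2]


# Hypothetical data whose inputs span the plane: A is invertible.
A, b = normal_equation_terms([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0])
theta_unique = solve_2x2(A, b)

# Hypothetical data whose inputs all lie on one line: A is singular.
A_bad, b_bad = normal_equation_terms([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]], [1.0, 2.0, 3.0])
theta_none = solve_2x2(A_bad, b_bad)

print(theta_unique, theta_none)
```

In the second dataset every input is a multiple of (1, 2), so A has rank 1 and infinitely many parameter vectors give the same risk.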
Regularization: extreme case 1
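If we define the loss function:
$$J(\theta, \theta_0) = \frac{1}{n} \sum_{t=1}^{n} \frac{\left(y^{(t)} - \theta \cdot x^{(t)} - \theta_0\right)^2}{2} + \frac{\lambda}{2} \|\theta\|^2$$
where $\lambda$ is the regularization factor.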

If we increase $\lambda$ to infinity, minimizing J is equivalent to minimizing $\|\theta\|$. Thus $\theta$ must be the zero vector, and $f(x) = \theta \cdot x + \theta_0$ becomes $f(x) = \theta_0$, a horizontal line. Thus f converges to line 1.
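The limiting behaviour can be sketched numerically in one dimension. The code below assumes a squared-error data term and a penalty of (λ/2)θ² that applies to the slope θ only and leaves the offset θ_0 unpenalized. Under those assumptions the minimizer has the closed form θ = cov(x, y) / (var(x) + λ) and θ_0 = mean(y) - θ·mean(x). The function name and the data are hypothetical.

```python
def ridge_fit_1d(xs, ys, lam):
    """Minimize (1/n) sum (y - theta*x - theta_0)^2 / 2 + (lam/2) * theta^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    theta = cov_xy / (var_x + lam)     # slope shrinks toward 0 as lam grows
    theta_0 = mean_y - theta * mean_x  # offset is not penalized
    return theta, theta_0


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

fits = {lam: ridge_fit_1d(xs, ys, lam) for lam in (0.0, 1.0, 100.0, 1e9)}
for lam, (theta, theta_0) in fits.items():
    print(lam, theta, theta_0)
```

With λ = 0 the fit recovers the slope 2 and offset 1; as λ grows the slope goes to zero and the offset approaches the mean of y, which is the horizontal line described above.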