Our goal is to find the parameter vector $\theta$ that minimizes an objective function $J(\theta)$.

With gradient descent we start at an arbitrary location $\theta \leftarrow \theta_{\text{start}}$, then repeatedly update

$$\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta)$$

until the change in $\theta$ becomes insignificant.

With each update $\theta$ moves towards the minimizer; for a simple objective such as $J(\theta) = \frac{1}{2}\|\theta\|^2$, the update becomes $\theta \leftarrow (1 - \eta)\,\theta$, so $\theta$ moves towards the origin. If we increase the step size $\eta$, the magnitude of the change in each update gets larger.
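The procedure above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the quadratic objective, starting point, and stopping tolerance are assumptions chosen for the example.

```python
import numpy as np

def gradient_descent(grad, theta_start, eta=0.1, tol=1e-6, max_iters=10_000):
    """Run gradient descent until the update becomes insignificant."""
    theta = np.asarray(theta_start, dtype=float)
    for _ in range(max_iters):
        step = eta * grad(theta)       # step size eta scales the change
        theta = theta - step
        if np.linalg.norm(step) < tol:  # change is insignificant: stop
            break
    return theta

# Example: J(theta) = 0.5 * ||theta||^2, so grad J(theta) = theta.
# The minimizer is the origin, and theta shrinks towards it each step.
theta_hat = gradient_descent(lambda th: th, theta_start=[3.0, -2.0], eta=0.1)
```

Increasing `eta` makes each step larger, so convergence is faster here; with a harder objective, too large a step size can overshoot the minimum.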
Stochastic Gradient Descent
SGD and the hinge loss:

$$\text{Loss}_h\big(y^{(i)}\,\theta \cdot x^{(i)}\big) = \max\big\{0,\; 1 - y^{(i)}\,\theta \cdot x^{(i)}\big\}$$

With SGD, we choose an index $i \in \{1, \ldots, n\}$ uniformly at random and update $\theta$ using only that example:

$$\theta \leftarrow \theta - \eta \, \nabla_\theta \, \text{Loss}_h\big(y^{(i)}\,\theta \cdot x^{(i)}\big)$$

If $\text{Loss}_h\big(y^{(i)}\,\theta \cdot x^{(i)}\big) > 0$ (equivalently, $y^{(i)}\,\theta \cdot x^{(i)} < 1$), then $\nabla_\theta \text{Loss}_h = -\,y^{(i)} x^{(i)}$, so the update becomes

$$\theta \leftarrow \theta + \eta \, y^{(i)} x^{(i)}$$

Otherwise the loss is zero, the gradient is zero, and $\theta$ is left unchanged.
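The SGD update on the hinge loss can be sketched as follows. The synthetic data, learning rate, and epoch count are assumptions for illustration, and regularization is omitted to match the plain hinge-loss update.

```python
import numpy as np

def sgd_hinge(X, y, eta=0.1, epochs=100, seed=0):
    """SGD on the hinge loss: pick a random example i and update theta
    by eta * y_i * x_i whenever the example is inside the margin
    (y_i * theta . x_i < 1); otherwise the gradient is zero."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs * n):
        i = rng.integers(n)                    # choose i uniformly at random
        if y[i] * (theta @ X[i]) < 1:          # hinge loss is positive
            theta = theta + eta * y[i] * X[i]  # grad of the loss is -y_i x_i
    return theta

# Toy linearly separable data (an assumption for illustration).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
theta = sgd_hinge(X, y)
preds = np.sign(X @ theta)
```

Because each step looks at a single randomly chosen example, the updates are noisy, but on average they follow the gradient of the full objective.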