Our goal is to find the parameter vector $\theta$ that minimizes the average hinge loss with $\ell_2$ regularization:

$$J(\theta) = \frac{1}{n}\sum_{i=1}^{n} \text{Loss}_h\big(y^{(i)}\,\theta \cdot x^{(i)}\big) + \frac{\lambda}{2}\lVert\theta\rVert^2$$

Through gradient descent we will:
Start at an arbitrary location $\theta = \theta^{(0)}$ and repeatedly apply the update

$$\theta \leftarrow \theta - \eta\, \nabla_\theta J(\theta)$$

until the change in $\theta$ becomes insignificant.
With each update, $\theta$ moves towards the minimizer (here towards the origin, since the regularization term shrinks $\theta$). If we increase the step size $\eta$, the magnitude of the change in each update gets larger.
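As a concrete illustration, here is a minimal sketch of this gradient descent loop in Python (NumPy). The particular objective in the example, the starting point, the step size `eta`, and the tolerance `tol` are illustrative assumptions, not values from the notes above.

```python
import numpy as np

def gradient_descent(grad, theta0, eta=0.1, tol=1e-6, max_iter=10_000):
    """Generic gradient descent: repeat theta <- theta - eta * grad(theta)
    until the change in theta becomes insignificant."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = eta * grad(theta)
        theta = theta - step
        if np.linalg.norm(step) < tol:   # change is insignificant -> stop
            break
    return theta

# Assumed toy objective for illustration: J(theta) = lam/2 * ||theta||^2,
# whose gradient is lam * theta; its minimizer is the origin, so theta
# moves towards the origin, faster for larger eta.
lam = 0.5
theta_hat = gradient_descent(lambda th: lam * th, theta0=[3.0, -2.0], eta=0.1)
print(theta_hat)  # close to [0, 0]
```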
Stochastic Gradient Descent
SGD and Hinge loss:
With SGD, we choose a training example $i \in \{1, \dots, n\}$ uniformly at random and update $\theta$ such that:

$$\theta \leftarrow \theta - \eta\, \nabla_\theta \Big[\text{Loss}_h\big(y^{(i)}\,\theta \cdot x^{(i)}\big) + \frac{\lambda}{2}\lVert\theta\rVert^2\Big]$$
If $y^{(i)}\,\theta \cdot x^{(i)} \le 1$ (the example is misclassified or lies within the margin), then the hinge loss contributes a non-zero gradient and the update becomes:

$$\theta \leftarrow \theta + \eta\big(y^{(i)} x^{(i)} - \lambda\theta\big)$$

otherwise only the regularization term contributes, and the update is $\theta \leftarrow (1 - \eta\lambda)\,\theta$.
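A minimal sketch of this SGD update with the hinge loss follows. The tiny synthetic dataset and the values of the step size `eta` and regularization strength `lam` are assumptions for illustration; they are not specified in the notes above.

```python
import numpy as np

def sgd_hinge(X, y, eta=0.01, lam=0.1, epochs=100, seed=0):
    """SGD for the L2-regularized hinge loss.
    At each step, pick an example i at random and apply:
      if y_i * (theta . x_i) <= 1:  theta <- theta + eta * (y_i * x_i - lam * theta)
      else:                         theta <- theta - eta * lam * theta
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs * n):
        i = rng.integers(n)                  # choose i uniformly at random
        if y[i] * (theta @ X[i]) <= 1:       # misclassified or within the margin
            theta += eta * (y[i] * X[i] - lam * theta)
        else:                                # only the regularizer contributes
            theta -= eta * lam * theta
    return theta

# Tiny linearly separable dataset (assumed for illustration)
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
theta = sgd_hinge(X, y)
print(theta, np.sign(X @ theta))  # predicted signs should match y
```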