Back-propagation Algorithm

The goal is to minimize the following loss function using the stochastic gradient descent (SGD) algorithm:

\mathcal{L}\left(y, f_{L}\right)=\left(y-f_{L}\right)^{2}

Let \delta_{i}=\frac{\partial \mathcal{L}}{\partial z_{i}} denote the error signal at layer i, where z_{i}=w_{i} f_{i-1} is the pre-activation of a chain of scalar tanh units f_{i}=\tanh \left(z_{i}\right) with f_{0}=x. Applying the chain rule layer by layer, the gradient with respect to the first weight is

\frac{\partial \mathcal{L}}{\partial w_{1}}=x\left(1-f_{1}^{2}\right)\left(1-f_{2}^{2}\right) \cdots\left(1-f_{L}^{2}\right) w_{2} w_{3} \cdots w_{L}\left(2\left(f_{L}-y\right)\right)
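
As a concrete illustration (not part of the original notes), the following Python sketch implements this backward pass for the chain of tanh units and checks the analytic gradient for w_1 against a finite-difference estimate; the function names and numerical values are illustrative.

import numpy as np

def forward(x, w):
    """Forward pass through the chain f_i = tanh(w_i * f_{i-1}), with f_0 = x."""
    f, a = [], x
    for wi in w:
        a = np.tanh(wi * a)
        f.append(a)
    return f  # activations f_1, ..., f_L

def grad_w1(x, y, w):
    """dL/dw_1 for L = (y - f_L)^2, using delta_i = dL/dz_i as defined above."""
    f = forward(x, w)
    delta = 2.0 * (f[-1] - y) * (1.0 - f[-1] ** 2)   # delta_L
    for i in range(len(w) - 1, 0, -1):               # delta_{i-1} = delta_i * w_i * (1 - f_{i-1}^2)
        delta = delta * w[i] * (1.0 - f[i - 1] ** 2)
    return delta * x                                 # dL/dw_1 = delta_1 * x

# Finite-difference check with illustrative values
x, y = 0.7, 0.3
w = [0.5, -1.2, 0.8]
eps = 1e-6
loss = lambda ws: (y - forward(x, ws)[-1]) ** 2
numeric = (loss([w[0] + eps] + w[1:]) - loss([w[0] - eps] + w[1:])) / (2 * eps)
print(grad_w1(x, y, w), numeric)  # the two values should agree closely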

SGD Convergence Guarantees

  • For multi-layer neural networks, stochastic gradient descent (SGD) is not guaranteed to reach a global optimum of the loss (the per-weight update in question is sketched after this list)
  • Larger models tend to be easier to train because their units only need to be adjusted so that they are, collectively, sufficient to solve the task
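
For reference, the per-weight update these statements concern is the standard single-sample SGD step; the learning rate \eta is introduced here only for illustration and does not appear elsewhere in these notes. The second identity follows from \delta_{i}=\frac{\partial \mathcal{L}}{\partial z_{i}} and z_{i}=w_{i} f_{i-1} under the chain structure assumed above:

w_{i} \leftarrow w_{i}-\eta \frac{\partial \mathcal{L}}{\partial w_{i}}, \qquad \frac{\partial \mathcal{L}}{\partial w_{i}}=\delta_{i} f_{i-1} \quad\left(f_{0}=x\right)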
