Gradient descent (GD) is a fundamental optimization technique for minimizing a differentiable objective function by iteratively stepping in the direction of steepest descent. The classical approach, batch GD, involves computing the gradient of the objective over the entire training set at every iteration.
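A minimal sketch of batch gradient descent, assuming a least-squares objective f(w) = (1/2n)‖Xw − y‖²; the function name, learning rate, and step count are illustrative choices, not prescribed by the text:

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, n_steps=500):
    """Batch GD: each step uses the gradient over the ENTIRE dataset."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        grad = X.T @ (X @ w - y) / n  # full-batch gradient of (1/2n)||Xw - y||^2
        w -= lr * grad                # step in the direction of steepest descent
    return w

# Usage: recover known weights from noiseless synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_hat = batch_gradient_descent(X, y)
```

Because the objective is convex and the step size is small relative to the curvature here, the iterates approach the true minimizer; on large datasets, the full pass over the data per step is exactly what motivates the stochastic variants discussed later.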
Why does gradient descent work? Specifically, what can we guarantee about the point it converges to? Beyond gradient descent, we explore a variety of other optimization methods. What are the ...