where \(\mathbf{x}_{t+1}\) is (the position of) a solution at time
step \(t+1\), \(\alpha\) is the step size, and \(\mathbf{d}_{t}\) is
the direction of the movement.
Since there is no general a priori clue about which direction to
take to progress towards the optimal solution, various mathematical
properties are used to define a suitable direction and step size
that better lead to optimal points.
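The generic update \(\mathbf{x}_{t+1} = \mathbf{x}_{t} + \alpha\,\mathbf{d}_{t}\) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names and the choice of the negative gradient as the direction \(\mathbf{d}_{t}\) are assumptions for the example.

```python
import numpy as np

def iterative_search(grad_f, x0, alpha=0.1, steps=50):
    """Apply the generic update x_{t+1} = x_t + alpha * d_t.

    Here d_t is chosen as the negative gradient, one common
    choice of search direction (illustrative assumption).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        d = -grad_f(x)       # direction of movement d_t
        x = x + alpha * d    # move with step size alpha
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3);
# the iterates approach the minimizer x = 3.
x_star = iterative_search(lambda x: 2 * (x - 3), x0=[0.0])
```

With a well-chosen \(\alpha\) the iterates contract towards the minimizer; too large a step size produces the oscillatory behaviour discussed below.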
First-order derivative methods use the opposite direction of the
gradient vector, with a certain step size, to find minimal points.
Although first-order methods have fast convergence, depending on the
step size an oscillatory process can sometimes start, and the optimal
point is never reached. The classical, well-known gradient method, a
first-order method, is shown in Eq. (2):