Gradient Vector Field Behavior
The gradient has always been an essential tool in optimization, because many algorithms rely on it to measure "the force" that leads to the minimum. Gradient descent is a powerful, widely applicable optimization method, especially with the development of machine learning, powerful computers, and huge datasets.
In this post, I will explain the idea behind the behavior of the gradient. I'd like to revisit the derivation of the necessary condition for a point to be a minimum or a maximum, and my purpose is to explain this idea in a geometrical way. In mathematical terms, consider a function

$$f : \mathbb{R}^n \to \mathbb{R}.$$
In order to achieve this goal, I will restrict myself to the following two problems.
Unconstrained problem: we are free to move in all directions of the space.
Constrained problem: we are only allowed to move along certain given directions.
In this case, the directions we may follow are those tangent to the constraint.
When it is possible to draw a graph, it shows the shape of a function, especially its maximum and minimum values. But in practice, the objective functions we deal with involve several variables, so we cannot plot them.
Gradient vector field
The gradient vector field, or gradient for short, of a function is the vector field whose components are the partial derivatives of the function:

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right).$$

It attaches a vector to each point of the function's domain.
Think of the objective function as a potential energy, and of the points in its domain as particles undergoing a force opposite to the gradient. The force generated by this potential energy is

$$F(x) = -\nabla f(x). \qquad (1)$$
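To make this concrete, here is a minimal numerical sketch of equation (1); the function `f` and the finite-difference gradient are illustrative choices, not anything from the original derivation:

```python
import numpy as np

def f(x):
    """Objective function playing the role of a potential energy (illustrative)."""
    return x[0]**2 + 2.0 * x[1]**2

def grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

x = np.array([1.0, -0.5])
force = -grad(f, x)   # equation (1): F(x) = -grad f(x)
print(force)          # roughly [-2., 2.], pointing towards the minimum at the origin
```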
With that in mind, an equilibrium point is a point where the gradient is zero. If we give it a small push and it returns to its location (stable point), then it is a minimum: it realizes a locally minimal value of the potential energy. On the other hand, if it flees under the force opposite to the gradient, then it is a maximum (unstable point): it realizes a locally maximal value of the potential energy.
As you can see in the figure below, if a point sits at the top, at the maximum (maximal potential energy), and we push it a little, it will run away downhill. However, if it sits at the bottom, in a minimum (minimal potential energy), and we give it a little push, it will come back.
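The push experiment is easy to simulate. In this sketch (the cosine potential is an illustrative choice), $x = 0$ is a maximum and $x = \pi$ is a minimum of $f(x) = \cos x$; a particle nudged off each equilibrium then drifts under the force $-f'(x)$:

```python
import numpy as np

def f(x):
    return np.cos(x)   # x = 0 is a maximum, x = pi is a minimum

def grad(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def evolve(x, steps=2000, eta=0.01):
    """Let the point drift under the force F = -f'(x) after a small push."""
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

print(evolve(0.0 + 0.01))    # flees the maximum and ends up near pi
print(evolve(np.pi + 0.01))  # pushed off the minimum, it comes back to pi
```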
The gradient and the level curves live in the domain of the objective function. As we can see from the figure above, the gradient points away from the minimum and towards the maximum (see equation 1). The level curves are closed in the neighborhoods of the equilibrium points.
There are three things to keep in mind when talking about the gradient of a function:
i) A level curve: a subset of the whole space where the objective function is constant:

$$L_c = \{\, x \in \mathbb{R}^n : f(x) = c \,\}.$$
ii) The orthogonality of the gradient to the level curves
If we move along a level curve, we do not change the value of the level (the potential energy remains constant). This means that the gradient has no effect as a force in the direction tangential to a level curve; therefore it is normal to the level curves.
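A quick numerical check of this orthogonality, on the illustrative function $f(x, y) = x^2 + 2y^2$ whose level curves are ellipses:

```python
import numpy as np

# Level curve of f(x, y) = x**2 + 2*y**2 at level c, parametrized as
# (sqrt(c)*cos(t), sqrt(c/2)*sin(t)); its tangent is the derivative in t.
c, t = 3.0, 0.7
point   = np.array([np.sqrt(c) * np.cos(t), np.sqrt(c / 2) * np.sin(t)])
tangent = np.array([-np.sqrt(c) * np.sin(t), np.sqrt(c / 2) * np.cos(t)])

gradient = np.array([2.0 * point[0], 4.0 * point[1]])  # grad f = (2x, 4y)

print(np.dot(gradient, tangent))  # ~0: the gradient is normal to the level curve
```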
iii) The gradient is directed towards the highest levels.
Proof
The notion of minimum is local; therefore, the above property is checked locally, in a small neighborhood of a point.
Consider the gradient at a point as a force applied to that point (the only force). If you plan to move along a direction $d$ in space using this force, the component of the force acting along the movement is (figure below)

$$\langle \nabla f(x), d \rangle = \|\nabla f(x)\|\,\|d\| \cos\theta.$$

If the direction is perpendicular to the gradient, you will be stuck and you will not be able to move forward. In fact, according to physics, a force perpendicular to the direction of movement does not change the speed of the point at which it is applied. If this quantity is negative, so the angle $\theta$ is greater than 90°, then the direction $d$ is what is called a descent direction (it is directed towards the lower levels). If the angle is 90°, the point is stationary. However, if the angle is less than 90°, the direction $d$ is directed towards the higher levels.
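In code, the three cases reduce to the sign of the inner product $\langle \nabla f(x), d \rangle$; the helper below is an illustrative sketch:

```python
import numpy as np

def classify(gradient, d):
    """Classify a direction d by the sign of <grad f, d>."""
    s = np.dot(gradient, d)
    if s < 0:
        return "descent direction (angle > 90 degrees)"
    if s == 0:
        return "level direction (perpendicular to the gradient)"
    return "ascent direction (angle < 90 degrees)"

g = np.array([2.0, 1.0])                    # gradient at the current point
print(classify(g, np.array([-1.0, 0.0])))   # descent
print(classify(g, np.array([1.0, -2.0])))   # perpendicular: stuck
print(classify(g, np.array([1.0, 1.0])))    # ascent
```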
Derivation of the necessary conditions for a minimum
The minimum of a function subject to a constraint must remain stationary somewhere on the constraint under the effect of the gradient of the function. Consequently, the gradient of the constraint must step in to balance the gradient of the criterion to be minimized.
The gradient of the objective function splits into its component normal to the constraint and its component tangential to it.
If you use the gradient as the driving force while staying on the constraint, then the only component that does any work is the component tangent to the constraint. As we can see in the figure, the tangential part moves the point until it reaches the location where the gradient is orthogonal to the constraint: this point is the minimum of the objective function restricted to the constraint.
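The decomposition is easy to compute. In this illustrative sketch we take $f(x, y) = x^2 + y^2$ on the line $g(x, y) = x + y - 1 = 0$ and split the gradient at two points of the constraint:

```python
import numpy as np

grad_f = lambda p: np.array([2.0 * p[0], 2.0 * p[1]])  # grad of f(x,y) = x^2 + y^2
grad_g = lambda p: np.array([1.0, 1.0])                # grad of g(x,y) = x + y - 1

def split(p):
    """Split grad f at p into its normal and tangential parts on the constraint."""
    n = grad_g(p) / np.linalg.norm(grad_g(p))          # unit normal to the constraint
    normal = np.dot(grad_f(p), n) * n
    return normal, grad_f(p) - normal

print(split(np.array([1.0, 0.0])))   # tangential part nonzero: keep sliding
print(split(np.array([0.5, 0.5])))   # tangential part zero: constrained minimum
```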
Thus the resultant of the gradient of the objective function and some multiple of the gradient of the constraint is zero at this point:

$$\nabla f(\hat{x}) + \lambda\, \nabla g(\hat{x}) = 0,$$

where the constraint is written as $g(x) = 0$ and the parameter $\lambda$ is chosen so that these two gradients fulfill the above equation. This parameter is the so-called Lagrange multiplier.
In the literature, the Lagrangian is the sum of the objective function and $\lambda$ times the constraint function,

$$\mathcal{L}(x, \lambda) = f(x) + \lambda\, g(x).$$

If we consider them as two potential energies, then the constraint function acts as a potential energy that balances that of the objective function.
The well-known necessary condition for a minimum then reads

$$\nabla f(\hat{x}) + \lambda\, \nabla g(\hat{x}) = 0, \qquad g(\hat{x}) = 0.$$

The first equation is a vector equation and may have several solutions. The second one singles out the solutions that belong to the constraint.
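Here is the same condition solved symbolically on a small illustrative problem (minimize $x^2 + y^2$ subject to $x + y = 1$):

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
f = x**2 + y**2          # objective
g = x + y - 1            # constraint g(x, y) = 0

# Necessary conditions: grad f + lam * grad g = 0, together with g = 0.
equations = [sp.diff(f, x) + lam * sp.diff(g, x),
             sp.diff(f, y) + lam * sp.diff(g, y),
             g]
print(sp.solve(equations, [x, y, lam]))   # x = y = 1/2, lam = -1
```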
Equivalent formulations of the above equation
i) It is equivalent to say that the tangential component of the gradient at the equilibrium point is zero:

$$\nabla f(\hat{x})_{\mathrm{tan}} = 0.$$
ii) At the equilibrium $\hat{x}$ (minimum or maximum), the gradient of the objective function is orthogonal to all the directions in which we are allowed to move on the constraint (in our figure there are two opposite directions: left and right):

$$\langle \nabla f(\hat{x}), d \rangle = 0 \quad \text{for every admissible direction } d.$$
iii) If we are allowed to move in all directions, i.e. there is no constraint, then a point is a minimum only if the gradient is zero. Indeed, the gradient must be orthogonal to all the directions of the space at this point; therefore it must be the zero vector there, and the above equation becomes

$$\nabla f(\hat{x}) = 0.$$

In this case, the whole space is the constraint space and the tangential part is the same as the whole gradient.
Dynamics point of view. Next, we look at the behavior of the negative gradient as a vector field that generates a dynamical system. Denoting by $\dot{x}$ the time derivative of $x(t)$:

$$\dot{x}(t) = -\nabla f\big(x(t)\big).$$
Let us spell out the quadratic case: for $f(x) = \frac{1}{2}\, x^\top A x$ with $A$ symmetric positive definite, the dynamics read

$$\dot{x}(t) = -A\, x(t),$$

and the trajectory converges to the minimizer, the origin.
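Discretizing this flow with an explicit Euler step of size $\eta$ gives exactly the gradient descent update of the next section, $x_{k+1} = x_k - \eta\, \nabla f(x_k)$. A minimal sketch on the quadratic case (the matrix $A$ and the step size are illustrative choices):

```python
import numpy as np

# Quadratic case: f(x) = 0.5 * x^T A x, so grad f(x) = A x and the flow is x' = -A x.
A = np.array([[2.0, 0.0],
              [0.0, 5.0]])

x = np.array([1.0, 1.0])   # initial point
eta = 0.05                 # Euler step size

for _ in range(200):
    x = x - eta * (A @ x)  # x_{k+1} = x_k - eta * grad f(x_k)

print(x)                   # close to the minimizer, the origin
```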
The steepest descent method: the gradient descent algorithm