Gradient Vector Field Behavior

Driss Boutat
7 min read · Jan 12, 2021


The gradient has always been essential in optimization, because many algorithms rely on it to measure "the force" that leads toward the minimum. Gradient descent in particular has become a powerful, general-purpose optimization tool, especially with the development of machine learning, powerful computers and huge datasets.

In this post, I will show the idea behind the behavior of the gradient, and revisit the derivation of the necessary condition for a point to be a minimum or a maximum. My purpose is to explain this idea geometrically. In mathematical terms, consider a function

f : ℝⁿ → ℝ (an objective function)

In order to achieve this goal, I will restrict myself to the following two problems.

Unconstrained problem: we are free to move in every direction of the workspace.

Unconstrained problem

Constrained problem: we are only allowed to move along certain given directions.

Constrained problem

In this case, the directions we may follow are those tangent to the constraint.

When it is possible to draw a graph, it shows the shape of a function, in particular its maximum and minimum values. But in practice, the objective functions we deal with involve several variables, so we cannot plot them.

Figure 1 : Graph of an objective function with two variables.

Gradient vector field

The gradient vector field, or gradient for short, of a function is the vector field whose components are the partial derivatives of the function:

∇f = (∂f/∂x₁, …, ∂f/∂xₙ)

It attaches a vector to each point of the function's domain.

The gradient vector field vanishes at the red point, moves away from it, and then heads towards the blue point, where it vanishes too.
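As a concrete illustration (mine, not from the post), the gradient at a point can be approximated with central finite differences; the objective f(x, y) = x² + y² below is an assumed example:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h  # perturb only the i-th coordinate
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Assumed example objective: f(x, y) = x**2 + y**2, whose gradient is (2x, 2y)
f = lambda x: x[0]**2 + x[1]**2
print(numerical_gradient(f, [1.0, 2.0]))  # close to [2., 4.]
```

Evaluating this at many points of the domain produces exactly the vector field described above: one vector attached to each point.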

Think of the objective function as a potential energy, and of the points in its domain as particles undergoing a force opposite to the gradient. The force generated by this potential energy is:

F(x) = −∇f(x) (Equation 1)

With that in mind, an equilibrium point is a point where the gradient is zero. If we give it a small push and it returns to its location (stable point), then it is a minimum: it realizes a local minimum of the potential energy. On the other hand, if it flees under the force opposite to the gradient (unstable point), then it is a maximum: it realizes a local maximum of the potential energy.

As you can see in the figure below, if a point is at the top, at a maximum (maximal potential energy), and we push it a little, it will run away downhill. However, if it is at the bottom, at a minimum (minimal potential energy), and we give it a little push, it will come back.

Figure 2

The gradient and the level curves live in the domain of the objective function. As we can see from the figure above, the gradient moves away from the minimum and heads towards the maximum (see Equation 1). The level curves are closed in the neighborhoods of the equilibrium points.

There are three things to keep in mind when talking about the gradient of a function:

i) A level curve: the subset of the whole space where the objective function is constant:

{x : f(x) = c} (a level curve at level c)
Figure 3: Level curves of the graph in Figure 1. They are closed in the neighborhoods of minimums and maximums.

ii) The orthogonality of the gradient to the level curves

The curve γ in blue.

If we move along a level curve, we do not change the value of the level (the potential energy remains constant). This means that the gradient does no work as a force in the tangential direction to a level curve; therefore it is normal to the level curves.

iii) The gradient is directed towards the highest levels.

The gradient moves away from the minimums and towards the maximums, pointing to the highest levels. Even the maximum, a level containing only one point, is the greatest level in this figure.
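Point ii) can be checked numerically. A minimal sketch, assuming the objective f(x, y) = x² + y², whose level curves are circles parametrized by γ(t) = (cos t, sin t):

```python
import numpy as np

# For f(x, y) = x**2 + y**2 the level curves are circles. Parametrize one as
# gamma(t) = (cos t, sin t); its tangent vector is gamma'(t) = (-sin t, cos t).
grad_f = lambda x, y: np.array([2 * x, 2 * y])

for t in np.linspace(0.0, 2 * np.pi, 8):
    tangent = np.array([-np.sin(t), np.cos(t)])
    dot = grad_f(np.cos(t), np.sin(t)) @ tangent
    assert abs(dot) < 1e-12  # the gradient is normal to the level curve

print("gradient is orthogonal to the level curve at every sampled point")
```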

Proof

The notion of minimum is local; therefore, the above property is checked locally, in a small neighborhood of a point.

Consider the gradient at a point as the only force applied to that point. If you plan to move along a direction d in space using this force, the amount of force acting on the point in that direction is (figure below)

⟨∇f(x), d⟩ = ‖∇f(x)‖ ‖d‖ cos θ (the amount of the gradient acting in the direction d)

If the direction is perpendicular to the gradient, you will be stuck and unable to move forward. In fact, according to physics, a force perpendicular to the direction of movement does not change the speed of the point to which it is applied. If this quantity is negative, so the angle is greater than 90°, then d is what is called a descent direction (it is directed towards the lower levels). If the angle is exactly 90°, the point is stationary. However, if the angle is less than 90°, the direction d points towards the higher levels.
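This classification by the sign of ⟨∇f, d⟩ can be sketched in code; the gradient value below is an illustrative assumption:

```python
import numpy as np

grad = np.array([2.0, 4.0])  # gradient at the current point (illustrative values)

def direction_kind(d):
    """Classify a direction d by the sign of the directional derivative <grad, d>."""
    slope = grad @ d
    if slope < 0:
        return "descent"     # angle with the gradient greater than 90 degrees
    if slope == 0:
        return "stationary"  # angle exactly 90 degrees
    return "ascent"          # angle less than 90 degrees

print(direction_kind(np.array([-1.0, -2.0])))  # descent
print(direction_kind(np.array([2.0, -1.0])))   # stationary
print(direction_kind(np.array([1.0, 2.0])))    # ascent
```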

Derivation of the necessary conditions for a minimum

The minimum of a function subject to a constraint must remain stationary somewhere on the constraint under the effect of the gradient of the function. Consequently, the gradient of the constraint must step in to balance the gradient of the criterion to be minimized.

x hat is a stationary point because the gradient is orthogonal to the constraint.

The gradient of the objective function splits into its orthogonal component and its tangential component with respect to the constraint.

If you use the gradient as the displacement force while staying on the constraint, then the only component that does work is the component tangent to the constraint. As we can see in the figure, the tangential part moves away from the point at which the gradient is orthogonal to the constraint: this point is the minimum of the objective function restricted to the constraint.

Thus the resultant of the gradient of the objective function and an amount of the gradient of the constraint is zero at this point:

∇f(x̂) + λ ∇g(x̂) = 0 (resultant of the two gradients)

where the parameter λ is chosen so that the two gradients fulfill the above equation. This parameter is the so-called Lagrange multiplier.

In the literature, we speak of the Lagrangian: the sum of the objective function and λ times the constraint function. If we consider them as two potential energies, then the constraint function acts as a potential energy that balances that of the objective function.

L(x, λ) = f(x) + λ g(x) (Lagrangian)

The well-known necessary condition for a minimum leads to

∇ₓL(x̂, λ) = ∇f(x̂) + λ ∇g(x̂) = 0
∂L/∂λ = g(x̂) = 0

The first equation is a vector equation and has several solutions. The second allows us to single out the solution that belongs to the constraint.
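A small worked example (my own, not from the post): minimize f(x, y) = x² + y² subject to g(x, y) = x + y − 1 = 0. The stationarity equation ∇f + λ∇g = 0 together with the constraint gives a linear system in (x, y, λ):

```python
import numpy as np

# Stationarity: 2x + lambda = 0 and 2y + lambda = 0; constraint: x + y = 1.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])

x, y, lam = np.linalg.solve(A, b)
print(x, y, lam)  # 0.5 0.5 -1.0
```

At the solution (1/2, 1/2), both gradients are parallel and λ = −1 is exactly the factor that makes them cancel.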

Equivalences of the above equation

i) It’s equivalent to say that the tangent component of the gradient at the equilbrium point is zero:

The tangent part is zero.

ii) At the equilibrium x hat (minimum or maximum), the gradient of the objective function is orthogonal to all the directions in which we are allowed to move on the constraint (in our figure, two opposite directions: left and right).

iii) If we are allowed to move in all directions (there is no constraint), then a point is a minimum only if the gradient is zero. Indeed, at this point the gradient must be orthogonal to all the directions of space, therefore it must be the zero vector, and the above equation becomes

∇f(x̂) = 0 (at a minimum point without constraint)

In this case, the whole space is the constraint space and the tangential part is the same as the whole gradient.

Dynamics point of view. We will now look at the behavior of minus the gradient as a vector field that generates a dynamical system. Denoting by ẋ the time derivative of x(t):

ẋ(t) = −∇f(x(t))

Let’s give the quadratic case

The steepest descent method: the gradient descent algorithm
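A minimal sketch of gradient descent on an assumed quadratic f(x) = ½ xᵀA x − bᵀx: each iteration moves against the gradient A x − b (Equation 1), and the iterate approaches the exact minimizer, which solves A x = b.

```python
import numpy as np

# Assumed quadratic: A symmetric positive definite, minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x = np.zeros(2)
step = 0.1                       # fixed step size, small enough for this A
for _ in range(500):
    x = x - step * (A @ x - b)   # move against the gradient (Equation 1)

print(x)                         # close to the exact solution of A x = b
print(np.linalg.solve(A, b))
```

The fixed step is a discretization of the dynamics ẋ = −(A x − b) above: the stable equilibrium of the flow is precisely the minimum of f.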
