Jim Lambers MAT 419/ Summer Session 2011- Lecture 10 Notes
These notes correspond to Section 3.2 in the text.
When it is not possible to find the minimum of a function analytically, and one must therefore use an iterative method to obtain an approximate solution, Newton's Method can be an effective method, but it can also be unreliable. Therefore, we now consider another approach. Given a function f : R^n → R that is differentiable at x0, the direction of steepest descent is the vector −∇f(x0). To see this, consider the function
ϕ(t) = f(x0 + tu),

where u is a unit vector; that is, ‖u‖ = 1. Then, by the Chain Rule,

ϕ′(t) = (∂f/∂x1)(∂x1/∂t) + · · · + (∂f/∂xn)(∂xn/∂t)
      = (∂f/∂x1)u1 + · · · + (∂f/∂xn)un
      = ∇f(x0 + tu) · u,
and therefore

ϕ′(0) = ∇f(x0) · u = ‖∇f(x0)‖ cos θ,

where θ is the angle between ∇f(x0) and u. It follows that ϕ′(0) is minimized when θ = π, which yields

u = −∇f(x0)/‖∇f(x0)‖,   ϕ′(0) = −‖∇f(x0)‖.
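As a quick numerical illustration (not part of the notes; the quadratic f and the point x0 below are arbitrary choices), one can sample unit vectors u and confirm that the slope ϕ′(0) = ∇f(x0) · u is smallest, with value −‖∇f(x0)‖, when u points opposite the gradient:

```python
import numpy as np

# Arbitrary test function and point (not from the notes).
f = lambda x: x[0]**2 + 3*x[1]**2
grad_f = lambda x: np.array([2*x[0], 6*x[1]])

x0 = np.array([1.0, 2.0])
g = grad_f(x0)

# Sample unit vectors u(theta) and evaluate phi'(0) = grad f(x0) . u for each.
angles = np.linspace(0.0, 2*np.pi, 10000)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
slopes = dirs @ g

print(dirs[np.argmin(slopes)], -g / np.linalg.norm(g))  # nearly identical vectors
print(slopes.min(), -np.linalg.norm(g))                 # both close to -||grad f(x0)||
```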
We can therefore reduce the problem of minimizing a function of several variables to a single-variable minimization problem, by finding the minimum of ϕ(t) for this choice of u. That is, we find the value of t, for t > 0, that minimizes
ϕ0(t) = f(x0 − t∇f(x0)).
After finding the minimizer t0, we can set

x1 = x0 − t0∇f(x0)
and continue the process, by searching from x1 in the direction of −∇f(x1) to obtain x2 by minimizing ϕ1(t) = f(x1 − t∇f(x1)), and so on. This is the Method of Steepest Descent: given an initial guess x0, the method computes a sequence of iterates {xk}, where
xk+1 = xk − tk∇f(xk), k = 0, 1, 2, . . . ,
where tk > 0 minimizes the function
ϕk(t) = f(xk − t∇f(xk)).
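The notes give no implementation, but the iteration above is straightforward to code. Below is a minimal Python sketch; the helper name steepest_descent, the stopping tolerance, and the use of scipy.optimize.minimize_scalar as a bounded line search (standing in for the exact minimization of ϕk) are assumptions of this sketch, not part of the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=1000):
    """Method of Steepest Descent: x_{k+1} = x_k - t_k * grad f(x_k), where
    t_k > 0 (approximately) minimizes phi_k(t) = f(x_k - t * grad f(x_k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # gradient nearly zero: stop at a critical point
            break
        # One-dimensional line search in the steepest descent direction.
        phi = lambda t, x=x, g=g: f(x - t * g)
        t_k = minimize_scalar(phi, bounds=(0.0, 1e3), method='bounded').x
        x = x - t_k * g
    return x
```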
Example We apply the Method of Steepest Descent to the function
f(x, y) = 4x^2 − 4xy + 2y^2

with initial point x0 = (2, 3). We first compute the steepest descent direction from

∇f(x, y) = (8x − 4y, 4y − 4x)

to obtain ∇f(x0) = ∇f(2, 3) = (4, 4).
We then minimize the function
ϕ(t) = f((2, 3) − t(4, 4)) = f(2 − 4t, 3 − 4t)
by computing
ϕ′(t) = −∇f(2 − 4t, 3 − 4t) · (4, 4)
      = −(8(2 − 4t) − 4(3 − 4t), 4(3 − 4t) − 4(2 − 4t)) · (4, 4)
      = −(16 − 32t − 12 + 16t, 12 − 16t − 8 + 16t) · (4, 4)
      = −(−16t + 4, 4) · (4, 4)
      = 64t − 32.
This strictly convex function has a strict global minimum when ϕ′(t) = 64t − 32 = 0, that is, at t = 1/2, as can be seen by noting that ϕ′′(t) = 64 > 0. We therefore set
x1 = x0 − (1/2)∇f(x0) = (2, 3) − (1/2)(4, 4) = (0, 1).
Continuing the process, we compute

∇f(x1) = ∇f(0, 1) = (−4, 4),

and minimize ϕ1(t) = f((0, 1) − t(−4, 4)) = f(4t, 1 − 4t) = 160t^2 − 32t + 2 to obtain t1 = 1/10 and x2 = (2/5, 3/5), and so on; the iterates converge to the unique global minimizer (0, 0) of f.
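Running the steepest_descent sketch given earlier on this example (the sketch and its defaults are assumptions, not part of the notes) reproduces these iterates:

```python
import numpy as np

# The example function and its gradient.
f = lambda v: 4*v[0]**2 - 4*v[0]*v[1] + 2*v[1]**2
grad_f = lambda v: np.array([8*v[0] - 4*v[1], 4*v[1] - 4*v[0]])

# steepest_descent is the sketch defined after the method's description above.
x_star = steepest_descent(f, grad_f, x0=[2.0, 3.0])
print(x_star)   # approximately [0, 0], the unique global minimizer
```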
We have seen that Newton’s Method can fail to converge to a solution if the initial iterate is not chosen wisely. For certain functions, however, the Method of Steepest Descent can be shown to be much more reliable.
Theorem Let f : R^n → R be a coercive function with continuous first partial derivatives on R^n. Then, for any initial guess x0, the sequence of iterates produced by the Method of Steepest Descent from x0 contains a subsequence that converges to a critical point of f.
This result can be proved by applying the Bolzano-Weierstrass Theorem, which states that any bounded sequence contains a convergent subsequence. The sequence {f(xk)}_{k=0}^∞ is a decreasing sequence, as indicated by a previous theorem, and it is a bounded sequence, because f(x) is continuous and coercive and therefore has a global minimum f(x*). It follows that the sequence {xk} is also bounded, for a coercive function cannot be bounded on an unbounded set. By the Bolzano-Weierstrass Theorem, {xk} has a convergent subsequence {x_{k_p}}, which can be shown to converge to a critical point of f(x). Intuitively, as xk+1 = xk − t*_k ∇f(xk) for some t*_k > 0, convergence of {x_{k_p}} implies that

0 = lim_{p→∞} (x_{k_{p+1}} − x_{k_p}) = − lim_{p→∞} Σ_{i=k_p}^{k_{p+1}−1} t*_i ∇f(xi),   t*_i > 0,

which suggests the convergence of ∇f(x_{k_p}) to zero. If f(x) is also strictly convex, we obtain the following stronger result about the reliability of the Method of Steepest Descent.
Theorem Let f : R^n → R be a coercive, strictly convex function with continuous first partial derivatives on R^n. Then, for any initial guess x0, the sequence of iterates produced by the Method of Steepest Descent from x0 converges to the unique global minimizer x* of f(x) on R^n.
This theorem can be proved by noting that if the sequence {xk} of steepest descent iterates does not converge to x*, then it has a subsequence whose terms remain bounded away from x*; by the previous theorem, that subsequence must in turn contain a subsequence that converges to a critical point of f. But f(x) has only one critical point, which is x*, yielding a contradiction.
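To illustrate the theorem numerically (again using the hypothetical steepest_descent sketch from earlier), note that the example function f(x, y) = 4x^2 − 4xy + 2y^2 is coercive and strictly convex, since its Hessian is positive definite, so the iterates should reach the unique minimizer (0, 0) from any starting point:

```python
import numpy as np

f = lambda v: 4*v[0]**2 - 4*v[0]*v[1] + 2*v[1]**2
grad_f = lambda v: np.array([8*v[0] - 4*v[1], 4*v[1] - 4*v[0]])

# Several widely scattered initial guesses; every run ends near (0, 0).
rng = np.random.default_rng(0)
for x0 in rng.uniform(-100.0, 100.0, size=(5, 2)):
    print(x0, '->', steepest_descent(f, grad_f, x0))
```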