Jim Lambers
MAT 419/519
Summer Session 2011-12
Lecture 10 Notes

These notes correspond to Section 3.2 in the text.

The Method of Steepest Descent

When it is not possible to find the minimum of a function analytically, and an iterative method must therefore be used to obtain an approximate solution, Newton’s Method can be effective, but it can also be unreliable. We therefore now consider another approach. Given a function f : R^n → R that is differentiable at x0, the direction of steepest descent is the vector −∇f(x0). To see this, consider the function

ϕ(t) = f(x0 + tu),

where u is a unit vector; that is, ‖u‖ = 1. Then, by the Chain Rule,

$$\phi'(t) = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial t} = \frac{\partial f}{\partial x_1}\,u_1 + \cdots + \frac{\partial f}{\partial x_n}\,u_n = \nabla f(x_0 + tu)\cdot u,$$

and therefore ϕ′(0) = ∇f(x0) · u = ‖∇f(x0)‖ cos θ,

where θ is the angle between ∇f(x0) and u, and the second equality uses ‖u‖ = 1. It follows that ϕ′(0) is minimized when θ = π, which yields

$$u = -\frac{\nabla f(x_0)}{\|\nabla f(x_0)\|}, \qquad \phi'(0) = -\|\nabla f(x_0)\|.$$
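This can be checked numerically: among all unit vectors u, the slope ϕ′(0) = ∇f(x0) · u is smallest for u = −∇f(x0)/‖∇f(x0)‖, where it equals −‖∇f(x0)‖. The sketch below is our own illustration, not part of the text; the function f, its gradient, and the point x0 are arbitrary choices.

```python
import numpy as np

# Arbitrary smooth function and its hand-computed gradient; any
# differentiable f would serve equally well for this check.
def f(x):
    return x[0]**2 + 3.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x0 = np.array([1.0, -2.0])
g = grad_f(x0)

# Sample unit vectors u in R^2 and evaluate phi'(0) = grad f(x0) . u.
thetas = np.linspace(0.0, 2.0 * np.pi, 100000)
dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
slopes = dirs @ g

# The smallest sampled slope occurs at u ~ -grad f(x0)/||grad f(x0)||
# and equals ~ -||grad f(x0)||, as the derivation above predicts.
print(dirs[np.argmin(slopes)], -g / np.linalg.norm(g))
print(slopes.min(), -np.linalg.norm(g))
```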

We can therefore reduce the problem of minimizing a function of several variables to a single-variable minimization problem, by finding the minimum of ϕ(t) for this choice of u. That is, we find the value of t, for t > 0, that minimizes

ϕ0(t) = f(x0 − t∇f(x0)).

After finding the minimizer t0, we can set

x1 = x0 − t0∇f(x0)

and continue the process, by searching from x1 in the direction of −∇f(x1) to obtain x2 by minimizing ϕ1(t) = f(x1 − t∇f(x1)), and so on. This is the Method of Steepest Descent: given an initial guess x0, the method computes a sequence of iterates {xk}, where

xk+1 = xk − tk∇f(xk), k = 0, 1, 2, . . . ,

where tk > 0 minimizes the function

ϕk(t) = f(xk − t∇f(xk)).
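The iteration above translates directly into a short program. What follows is a minimal sketch under our own assumptions, not an implementation from the text: the one-dimensional minimization of ϕk is delegated to SciPy's minimize_scalar, and the name steepest_descent, the stopping tolerance, the iteration cap, and the bracket [0, 10] for t are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad_f, x0, tol=1e-8, max_iter=500):
    """Sketch of the Method of Steepest Descent with an exact line search.

    At step k, the t_k > 0 minimizing phi_k(t) = f(x_k - t*grad f(x_k))
    is found by a bounded one-dimensional minimizer.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:     # near-zero gradient: x is
            break                       # (approximately) a critical point
        phi = lambda t: f(x - t * g)    # the function phi_k from the notes
        t_k = minimize_scalar(phi, bounds=(0.0, 10.0), method="bounded").x
        x = x - t_k * g
    return x
```

In practice the line search need not be exact; an inexact rule such as backtracking is also common, but the exact search matches the definition of tk given above.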

Example We apply the Method of Steepest Descent to the function

f(x, y) = 4x^2 − 4xy + 2y^2

with initial point x0 = (2, 3). We first compute the steepest descent direction from

∇f(x, y) = (8x − 4y, 4y − 4x)

to obtain ∇f(x0) = ∇f(2, 3) = (4, 4).

We then minimize the function

ϕ(t) = f((2, 3) − t(4, 4)) = f(2 − 4t, 3 − 4t)

by computing

$$\begin{aligned}
\phi'(t) &= -\nabla f(2 - 4t,\, 3 - 4t) \cdot (4, 4) \\
         &= -\bigl(8(2 - 4t) - 4(3 - 4t),\; 4(3 - 4t) - 4(2 - 4t)\bigr) \cdot (4, 4) \\
         &= -(16 - 32t - 12 + 16t,\; 12 - 16t - 8 + 16t) \cdot (4, 4) \\
         &= -(4 - 16t,\; 4) \cdot (4, 4) \\
         &= 64t - 32.
\end{aligned}$$

This strictly convex function has a strict global minimum when ϕ′(t) = 64t − 32 = 0, that is, at t = 1/2, as can be seen by noting that ϕ′′(t) = 64 > 0. We therefore set

$$x_1 = x_0 - \tfrac{1}{2}\nabla f(x_0) = (2, 3) - \tfrac{1}{2}(4, 4) = (0, 1).$$

Continuing the process, we have

∇f(x1) = ∇f(0, 1) = (−4, 4), and we search from x1 in the direction of −∇f(x1) by minimizing ϕ1(t) = f(x1 − t∇f(x1)), and so on.
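The arithmetic in this example can be verified in a few lines. The check below is our own addition (NumPy assumed); a centered difference confirms that ϕ′(1/2) ≈ 0.

```python
import numpy as np

f = lambda p: 4*p[0]**2 - 4*p[0]*p[1] + 2*p[1]**2
grad = lambda p: np.array([8*p[0] - 4*p[1], 4*p[1] - 4*p[0]])

x0 = np.array([2.0, 3.0])
g0 = grad(x0)
print(g0)                   # [4. 4.], matching grad f(2, 3) above

# phi(t) = f((2,3) - t(4,4)) has derivative 64t - 32, so t0 = 1/2.
phi = lambda t: f(x0 - t * g0)
h = 1e-6
print((phi(0.5 + h) - phi(0.5 - h)) / (2 * h))   # ~ 0

x1 = x0 - 0.5 * g0
print(x1, grad(x1))         # [0. 1.] and [-4. 4.], as computed above
```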

We have seen that Newton’s Method can fail to converge to a solution if the initial iterate is not chosen wisely. For certain functions, however, the Method of Steepest Descent can be shown to be much more reliable.

Theorem Let f : R^n → R be a coercive function with continuous first partial derivatives on R^n. Then, for any initial guess x0, the sequence of iterates produced by the Method of Steepest Descent from x0 contains a subsequence that converges to a critical point of f.

This result can be proved by applying the Bolzano-Weierstrass Theorem, which states that any bounded sequence contains a convergent subsequence. The sequence {f(xk)} is a decreasing sequence, as indicated by a previous theorem, and it is a bounded sequence, because f(x) is continuous and coercive and therefore has a global minimum f(x∗). It follows that the sequence {xk} is also bounded, for a coercive function cannot be bounded on an unbounded set. By the Bolzano-Weierstrass Theorem, {xk} has a convergent subsequence {xkp}, which can be shown to converge to a critical point of f(x). Intuitively, as xk+1 = xk − t∗k∇f(xk) for some t∗k > 0, convergence of {xkp} implies that

$$0 = \lim_{p\to\infty}\left(x_{k_{p+1}} - x_{k_p}\right) = -\lim_{p\to\infty}\sum_{i=k_p}^{k_{p+1}-1} t_i^{*}\,\nabla f(x_i), \qquad t_i^{*} > 0,$$

which suggests the convergence of ∇f(xkp) to zero. If f(x) is also strictly convex, we obtain the following stronger result about the reliability of the Method of Steepest Descent.

Theorem Let f : R^n → R be a coercive, strictly convex function with continuous first partial derivatives on R^n. Then, for any initial guess x0, the sequence of iterates produced by the Method of Steepest Descent from x0 converges to the unique global minimizer x∗ of f(x) on R^n.

This theorem can be proved by noting that if the sequence {xk} of steepest descent iterates did not converge to x∗, it would contain a subsequence that stays bounded away from x∗; by the previous theorem, that subsequence would in turn contain a subsequence converging to a critical point of f(x), distinct from x∗. But f(x) has only one critical point, which is x∗, and this yields a contradiction.
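Both theorems can be illustrated on the example function f(x, y) = 4x^2 − 4xy + 2y^2, which is coercive and strictly convex (its Hessian is constant and positive definite), so the iterates should approach the unique global minimizer (0, 0) while f(xk) decreases monotonically. A minimal sketch, again assuming SciPy for the line search:

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda p: 4*p[0]**2 - 4*p[0]*p[1] + 2*p[1]**2   # coercive, strictly convex
grad = lambda p: np.array([8*p[0] - 4*p[1], 4*p[1] - 4*p[0]])

x = np.array([2.0, 3.0])
values = [f(x)]
for _ in range(50):
    g = grad(x)
    t = minimize_scalar(lambda s: f(x - s * g),
                        bounds=(0.0, 10.0), method="bounded").x
    x = x - t * g
    values.append(f(x))

# f(x_k) is monotonically decreasing, and x_k approaches (0, 0).
print(all(a >= b for a, b in zip(values, values[1:])))   # True
print(x)                                                 # ~ [0. 0.]
```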

Exercises

  1. Chapter 3, Exercise 8
  2. Chapter 3, Exercise 11
  3. Chapter 3, Exercise 12