

MA225 Differentiation

Thomas Reddington

May 6, 2013

Contents

A Continuous functions from R^n → R^m .......... 2
A1 R^n as a normed vector space, convergence .......... 2
A2 Topology of R^n .......... 6
A3 Continuous functions .......... 9

B Differentiable functions from R^n → R^m .......... 12
B1 Definition of the derivative .......... 12
B2 Compositions of differentiable functions .......... 15
B3 Intermediate value theorem .......... 19
B4 Higher derivatives and Taylor's formula .......... 22
B5 Banach Fixed Point Theorem .......... 27
B6 Implicit Function Theorem .......... 29

Preface

These lecture notes are a projection of the MA225 Differentiation course 2012/2013, delivered by Dr Markus Kirkilionis at the University of Warwick. The up-to-date version of these notes should be found here:

http://www.tomred.org/lecture-notes.html

Markus’ original handwritten script should be found here:

http://lora.maths.warwick.ac.uk/groups/differentiation/wiki/39327/MA225_Lecture_ Manuscript.html

Students taking this course should also take a look at Alex Wendland’s Dropbox notes:

https://www.dropbox.com/sh/5m63moxv6csy8tn/HCmB8rY7va/Year%202/Differentiation

These notes are, to my knowledge, complete (except from diagrams), but the tedious treasure hunt of errors will always be an open game. If you spot an error, or you want the source code to fiddle with the notes in your way, e-mail me at me@tomred.org. Writing these up has been a benefit to me (there aren’t many foolproof ways to avoid proper work), but most of all I hope they’re helpful, and good luck! Tom ♥

The lecture will be split into two parts:

A Continuous functions from R^n → R^m.

B Differentiable functions from R^n → R^m.

Let us start with A :

A Continuous functions from R^n → R^m

A1 R^n as a normed vector space, convergence

The elements of R^n are ordered n-tuples of real numbers:

x := (x1, ..., xn)

We have componentwise addition:

x + y := (x1 + y1, ..., xn + yn)

and componentwise multiplication with scalars:

cx := (cx1, ..., cxn)

This makes R^n a real vector space of dimension n. The vectors:

e1 := (1, 0, ..., 0), ..., en := (0, ..., 0, 1)

form the standard basis of R^n.

In one dimension concepts like convergence and continuity had to be discussed with the help of the absolute value, | · |. In R^n we replace this with a norm, interpreted as a mapping from R^n to R:

Definition A1.1 A function N : R^n → R with:

  1. N(x) > 0 for x ≠ 0,
  2. N(cx) = |c| · N(x),
  3. N(x + y) ≤ N(x) + N(y)

is called a norm on R^n.

For n = 1 we therefore must have N (x) = a|x|, with a some positive number. For n > 1 there are more possibilities.

Maximum norm:

‖x‖ = max{|x1|, ..., |xn|}

Properties (1) and (2) are clear. To prove (3) we use the triangle inequality for the absolute value. For every i we have: |xi| ≤ ‖x‖ and |yi| ≤ ‖y‖

and therefore: |xi + yi| ≤ |xi| + |yi| ≤ ‖x‖ + ‖y‖

As there is at least one i for which |xi + yi| = ‖x + y‖ we have:

‖x + y‖ ≤ ‖x‖ + ‖y‖

The sequence (xk) is convergent if and only if it is a Cauchy sequence: for every ε > 0 there exists k0 such that for all k ≥ k0 and all p > 0 we have ‖xk+p − xk‖ < ε.

Also the theorem of Bolzano-Weierstrass remains valid:

Theorem A1.2 Every bounded sequence (xk) (i.e. ‖xk‖ ≤ K for all k, for some K) has a convergent subsequence.

Proof. We look at all n components of (xk). They are all bounded. Therefore, by Bolzano-Weierstrass in dimension 1, each component has a convergent subsequence. We look at component 1 and choose a subsequence of (xk) whose first components converge. From this subsequence we extract a further subsequence whose second components converge, and so on, until we reach component n.

We next need to make sure that instead of the maximum norm we could have chosen any norm N in all previous definitions. If we can show that:

N(x) ≤ a‖x‖ and ‖x‖ ≤ bN(x)

then:

lim_{k→∞} ‖xk − x‖ = 0 ⇐⇒ lim_{k→∞} N(xk − x) = 0

i.e. we are done and know independence of any specific norm.

Theorem A1.3 For every norm N there are positive numbers a and b such that:

N (x) ≤ a‖x‖ and ‖x‖ ≤ bN (x)

for all x ∈ Rn.

Proof. The left estimate follows from (1), (2), (3) and |xi| ≤ ‖x‖:

N(x) = N(x1e1 + ... + xnen) ≤ N(e1)|x1| + ... + N(en)|xn| ≤ (N(e1) + ... + N(en))‖x‖

We can choose a := N(e1) + ... + N(en). The right estimate is proven indirectly. Assume there is no such b > 0. Then for each k = 1, 2, 3, ... one can find a vector xk such that:

‖xk‖ > kN(xk)

holds.

With (2) we therefore have for k = 1, 2, 3, ...:

N(xk/‖xk‖) < 1/k

For yk := xk/‖xk‖ we then have N(yk) < 1/k. This will create a contradiction. Because ‖yk‖ = 1 the sequence (yk) is bounded. Therefore there is a subsequence (zk) with limit z, i.e.:

lim_{k→∞} ‖zk − z‖ = 0

Because N(z − zk) ≤ a‖z − zk‖ we have:

N(z) = N(z − zk + zk) ≤ N(z − zk) + N(zk) ≤ a‖z − zk‖ + N(zk)

Because N(yk) < 1/k we have lim_{k→∞} N(zk) = 0. Therefore N(z) = 0, i.e. we would conclude z = 0. On the other hand we have zk = z + (zk − z) and ‖zk‖ = 1, therefore:

1 ≤ ‖z‖ + ‖zk − z‖

Letting k → ∞ gives 1 ≤ ‖z‖, so z ≠ 0: a contradiction. This proves:

‖x‖ ≤ bN (x)

for some positive b and for all x ∈ Rn.

This theorem can be generalised:

Definition A1.3 A norm N′ is called equivalent to a norm N if there are positive numbers a, b such that for all x ∈ R^n we have:

aN(x) ≤ N′(x) ≤ bN(x)

We have:

Theorem A1.4 Every pair of norms is equivalent on Rn.

Proof. We have already shown that every norm N is equivalent to the maximum norm. It remains to check that equivalence of norms is indeed an equivalence relation:

  1. Every norm is equivalent to itself. [Reflexivity]
  2. From aN(x) ≤ N′(x) ≤ bN(x) it follows that (1/b)N′(x) ≤ N(x) ≤ (1/a)N′(x). [Symmetry]
  3. Given aN(x) ≤ N′(x) ≤ bN(x) and a′N′(x) ≤ N′′(x) ≤ b′N′(x), we get aa′N(x) ≤ N′′(x) ≤ bb′N(x). [Transitivity]

For example the maximum norm and the Euclidean norm are equivalent. It holds:

‖x‖ ≤ |x| ≤ √n ‖x‖ (check!)
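The two-sided estimate above can be checked numerically. The following Python sketch (my illustration, not part of the original notes) samples random vectors in R^5 and verifies ‖x‖ ≤ |x| ≤ √n ‖x‖:

```python
import math
import random

def max_norm(x):
    """Maximum norm ||x|| = max |x_i|."""
    return max(abs(c) for c in x)

def euclidean_norm(x):
    """Euclidean norm |x| = sqrt(x_1^2 + ... + x_n^2)."""
    return math.sqrt(sum(c * c for c in x))

# Sample random vectors in R^5 and verify ||x|| <= |x| <= sqrt(5) ||x||.
random.seed(0)
n = 5
ok = True
for _ in range(1000):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    ok = ok and max_norm(x) <= euclidean_norm(x) <= math.sqrt(n) * max_norm(x) + 1e-12
print(ok)  # True
```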

Infinite sums:

Like in R an infinite sum (series) is defined as a sequence (sk) of partial sums:

sk := ∑_{j=1}^{k} xj

If lim_{k→∞} sk exists, the limit is called the sum s:

s := ∑_{k=1}^{∞} xk

The infinite sum is called absolutely convergent if the real series:

∑_{k=1}^{∞} ‖xk‖

converges.
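As an illustration (mine, not from the notes), the following Python sketch computes the partial sums of the vector series with terms xk = (2^−k, 3^−k) in R^2. Componentwise the series converges to (1, 1/2), and the series of maximum norms converges as well, so the sum is absolutely convergent:

```python
def x(k):
    """k-th term of a sample vector series in R^2: x_k = (2^-k, 3^-k)."""
    return (0.5 ** k, (1.0 / 3.0) ** k)

def partial_sum(K):
    """s_K = sum_{k=1}^{K} x_k, computed componentwise."""
    s = [0.0, 0.0]
    for k in range(1, K + 1):
        xk = x(k)
        s[0] += xk[0]
        s[1] += xk[1]
    return tuple(s)

s = partial_sum(60)
print(s)  # close to the limit (1, 1/2)

# Absolute convergence: the real series sum ||x_k|| (max norm) converges.
abs_sum = sum(max(abs(c) for c in x(k)) for k in range(1, 61))
print(abs_sum)  # close to 1
```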

  • Cut out from the centre of the unit cube Q the open cube with edge length 1/3 (see figure).
  • Do the same construction again in all remaining cubes not yet removed, on the respective local scale.
  • Proceed to infinity.

Let us denote the part of Q removed during the construction by M. Then Q \ M = C, with C closed. C is called the Cantor set.

  • C is the set of boundary points of M.
  • Every point in Q is a limit point of M (check!).

Compact sets:

Theorem A2.3 Let M ⊂ Rn. The following statements are equivalent:

  1. M is bounded and closed.
  2. Every cover of M with open sets allows the choice of finitely many of them, such that M is covered (Heine-Borel property).
  3. Every infinite subset of M has a limit point in M.

Proof. We show (1) =⇒ (2) =⇒ (3) =⇒ (1).

To show (1) =⇒ (2) we use an iteration. Because M is bounded there exists a closed cube W which contains M. Assume there is an open cover C of M which does not have the Heine-Borel property. We can divide W into 2^n closed cubes whose edge length is half that of the previous iteration (see figure for dim = 2). By our assumption there is at least one such cube whose intersection with M cannot be covered with finitely many sets from the cover C. We choose such a cube and call it W^(1). Following the construction we derive a sequence (W^(k)) of closed cubes such that:

(i) W ⊃ W^(1) ⊃ ... ⊃ W^(k) ⊃ ...,
(ii) lim_{k→∞} δ(W^(k)) = 0 (with δ(·) being the diameter of the cube),
(iii) M ∩ W^(k) cannot be covered with finitely many sets from C.

The remainder of the proof will be given after the following theorem.

The sequence M ∩ W^(k) satisfies the conditions of the theorem of Cantor:

Theorem A2.4 (Cantor) Let (Ak) be a decreasing sequence of bounded closed sets whose diameters δ(Ak) converge to zero, i.e.:

A1 ⊃ A2 ⊃ ... ⊃ Ak ⊃ ... and lim_{k→∞} δ(Ak) = 0

Then there is exactly one point x which belongs to all sets Ak:

∩_{k=1}^{∞} Ak = {x}

Proof (Cantor). Choose an arbitrary point xk from each set Ak. One obtains a Cauchy sequence because Ak+p ⊂ Ak for every p ≥ 0 and:

‖xk+p − xk‖ ≤ δ(Ak)

Therefore (xk) converges. Let us call the limit x. Every other possibly chosen sequence (yk) also converges to x because:

‖xk − yk‖ ≤ δ(Ak)

x is therefore unique, but it is also an element of all Ak. If not, there would exist k0 such that x ∉ Ak0, and hence x ∉ Ak0+p for all p ≥ 0. Because Ak0 is closed, there would be a neighbourhood of x disjoint from all Ak0+p, p ≥ 0. This contradicts the fact that (xk) converges to x.

Proof of A2.3 (continued). We know now there is exactly one point x belonging to all sets M ∩ W^(k), and therefore x ∈ M as well. Because C covers M there is an open set O ∈ C with x ∈ O. For all sufficiently large k the cubes satisfy W^(k) ⊂ O, as x is an interior point of O and the diameters δ(W^(k)) tend to zero. To cover M ∩ W^(k) we then need only the single open set O ∈ C, whereas by (iii) no finite subfamily of C suffices. This is a contradiction, so (1) =⇒ (2) is proven.

We next prove (2) =⇒ (3), again indirectly. Assume M has an infinite subset (a set with infinitely many points) A which has no limit point in M ("¬(3)"). Every point in A possesses an open ball around it containing no further elements of A. Also, every point in M \ A possesses an open ball around it containing no elements of A. The set of all these open balls forms a cover of M. Because M has the Heine-Borel property, finitely many of these open balls suffice to cover M. Therefore they also form a cover of A. Because each of these balls contains at most one element of A, A must be finite (contain only finitely many points). This contradicts the assumption that A is infinite, so (2) =⇒ (3) is proven.

To prove (3) =⇒ (1) we first assume M is not bounded. Then for every k ∈ N there exists xk ∈ M with:

‖xk‖ ≥ k

Assemble all xk, k = 1, 2, ... in a set A. This infinite subset of M has no limit point y, otherwise infinitely many elements of A would lie in a 1-neighbourhood (ε = 1!) of y, in contradiction to the construction. This contradicts (3), therefore M is bounded. Now assume M is not closed. Then there would be a limit point x of M which does not belong to M. For this x and every natural number k ≥ 1 we find a point xk ∈ M such that:

‖xk − x‖ < 1/k

The infinite but countable set A of these xk has exactly one limit point in R^n, and this is x. By assumption (3) this limit point of A ⊂ M belongs to M: contradiction! Therefore (3) =⇒ (1) is proven.

Definition A2.1 A set M ⊂ R^n is called compact if it satisfies one (and hence all) of the equivalent properties (1), (2), (3).

Remark: In infinite-dimensional vector spaces (1) is not equivalent to (2) or (3)!

Another property of M equivalent to (1) is:

  4. Let M ⊂ R^n. Every sequence with values in M possesses a convergent subsequence with a limit (point) in M.

We show equivalence with (3):

Proof. Assume (3) holds and consider first a sequence with finitely many different values in M. Then at least one value occurs infinitely often, which gives a constant (hence convergent) subsequence. If the sequence has infinitely many different values in M, then by (3) this set of values has a limit point in M, and we can choose a subsequence converging to it. Now let (4) be satisfied and let B be an arbitrary infinite subset of M. Choose a countably infinite subset A of B and arrange its values as a sequence. As these values belong to M, by property (4) this sequence has a convergent subsequence with limit in M. This limit is a limit point of A, therefore of B, and it lies in M.

As a simple consequence we get a generalised version of the theorem of Bolzano-Weierstrass.

Theorem A2.5 (Bolzano-Weierstrass) Every infinite bounded subset M of R^n possesses at least one limit point in R^n.

Proof. M is contained in a compact cube. Because of (3) M has a limit point in this cube.

Let now m = 1 and n = 2. Then for example:

y = √(1 − x1^2 − x2^2) (x1^2 + x2^2 ≤ 1 defines D!)

maps the unit disk D ⊂ R^2 to R^1; the graph is the upper half of the unit sphere in R^3.

So D = {x ∈ R^2 : x1^2 + x2^2 ≤ 1} and I, the image, is:

I = [0, 1], f : D → I

The set:

Lc := {x ∈ R^n : f(x) = c}

with f : R^n → R is called the level set (for value c). In our example above the level set L1 consists of a single point, the origin {0 = (0, 0)}. For 0 < c < 1 the level sets are circles. An important case of mappings f : R^n → R^m with m = n are "new coordinates". For example:

x = f(r, ϕ) = r · cos ϕ
y = g(r, ϕ) = r · sin ϕ

describes the transition to polar coordinates in R^2. Even more important are vector fields: for every point x ∈ D ⊂ R^n, f(x) ∈ R^n defines a vector of length ‖f(x)‖ located at x:

Continuity:

Definition A3.1 Let D ⊂ R^n and f : D → R^m. Then f is called continuous in a ∈ D if for every ε > 0 there is a δ > 0 such that for all x ∈ D with ‖x − a‖ < δ we have:

‖f(x) − f(a)‖ < ε

Remark: It is not important which norms we use; of course the two norms appearing in the definition may be different.

Alternative definition of continuity:

Definition A3.2 f is called continuous in a ∈ D if for every neighbourhood V of f(a) there is a neighbourhood U of a such that:

f(U ∩ D) ⊂ V

If f is continuous it maps every sequence (xk) in D with xk → a to a sequence (f(xk)) convergent to f(a) (check!).

Let a now be a limit point of D. We write:

lim_{x→a} f(x) = b

if for every ε > 0 there is a δ > 0 such that:

x ∈ D, x ≠ a, ‖x − a‖ < δ =⇒ ‖f(x) − b‖ < ε

Remark: a need not be an element of D. If a ∈ D then f(a) plays no role in this definition.

We have therefore another characterisation of continuity at a limit point a ∈ D:

f continuous at a ⇐⇒ lim_{x→a} f(x) = f(a)

This is often written as:

lim_{x→a} f(x) = f(lim_{x→a} x)

All the previous examples of functions f have been continuous. Example of f being not continuous: divide R^n into rational and non-rational points. If x1, ..., xn ∈ Q then define f(x) = 0, otherwise (i.e. if some xi ∉ Q) f(x) = 1.

Consider:

f(x, y) = 2xy/(x^2 + y^2) for x^2 + y^2 > 0, f(x, y) = 0 for x = 0, y = 0

Then f is continuous on R^2 with the exception of (0, 0). Consider first the x and y axes: here f is 0 (so f could be continuous...), but in every neighbourhood of (0, 0) we find:

f(x, x) = 2x^2/(x^2 + x^2) = 1 (x ≠ 0)

This example shows we can have "partial continuity".
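The "partial continuity" can be seen numerically. A small Python sketch (an illustration of mine, not in the original notes) evaluates f along the axes and along the diagonal while approaching the origin:

```python
def f(x, y):
    """f(x, y) = 2xy / (x^2 + y^2) away from the origin, f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return 2.0 * x * y / (x * x + y * y)

# Along the axes f is identically 0 ...
on_axis = [f(t, 0.0) for t in (1.0, 0.1, 1e-8)]
# ... but along the diagonal f is identically 1, however close to (0, 0):
on_diag = [f(t, t) for t in (1.0, 0.1, 1e-8)]
print(on_axis, on_diag)  # [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
```

So the limit of f at (0, 0) depends on the direction of approach, which is exactly why f cannot be continuous there.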

the increment f(p + h) − f(p) for sufficiently small h is well approximated by the scalar product 〈∇f(p), h〉. We obtain the following interpretation:

∇f(p) points in the direction of the largest increment of f at location p. The scalar product 〈∇f(p), h〉 (as a function of h of fixed length) has its largest value if h is a positive multiple of ∇f(p).

The function values of f can be illustrated if one looks at the level sets:

Lc = {x ∈ R^n : f(x) = c}, c ∈ R

Vectors h orthogonal to ∇f(p) lead to points in the neighbourhood of p whose function values differ only minimally from f(p).

We next look at the graph of a function f : D → R, D ⊂ R^n. The graph of f builds a hypersurface in R^{n+1}: at each x = (x1, ..., xn) ∈ D there is a y = f(x1, ..., xn). In a neighbourhood of p ∈ D this hypersurface (the graph of f) can be approximated by the hyperplane:

y = f(p) + 〈∇f(p), x − p〉

We will call this hyperplane the tangential hyperplane of f at location p. The vector:

(−∇f(p), 1) ∈ R^{n+1}

is a normal vector on the tangential hyperplane (and therefore on the hypersurface) at location (p, f(p)).

Partial derivatives

How can we determine f′(p) of f differentiable at point p ∈ D? The linear mapping f′(p) is determined if we fix a basis v1, ..., vn of R^n. Let us consider v ∈ R^n, v ≠ 0. We would like to compute f′(p, v), and set x = p + tv, t ∈ R. So we get:

f(p + tv) = f(p) + f′(p, tv) + R(p + tv)|t| · ‖v‖

For t ≠ 0 we get:

f′(p, v) = (f(p + tv) − f(p))/t − R(p + tv) (|t|/t) ‖v‖

and:

f′(p, v) = lim_{t→0} (f(p + tv) − f(p))/t

The limit of the difference quotient on the right is called the directional derivative Dv f(p) of f at location p with respect to v. If we choose for v one of the standard basis vectors e1, ..., en of R^n, then the respective directional derivatives are called partial derivatives. There are different notations for partial derivatives in the literature. The i-th partial derivative can be written:

Di f(x), ∂f/∂xi (x), or fxi(x)

although the last one should be abandoned. Instead one sometimes uses just fi(x), but this can lead to misunderstandings if f is not real-valued anymore. The task to compute partial derivatives can be solved by means of differential calculus in one variable. All arguments besides the i-th argument remain fixed. We have to compute:

lim_{xi→pi} (f(p1, ..., xi, ..., pn) − f(p1, ..., pi, ..., pn))/(xi − pi)

This limit can exist even if our function is not differentiable at p according to our definition. If some function f : D → R^m, D ⊂ R^n, is differentiable at p then all partial derivatives exist. We have for h = ∑_{i=1}^{n} hi ei:

f′(p, h) = ∑_{i=1}^{n} f′(p, ei) · hi = ∑_{i=1}^{n} Di f(p) · hi
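The formula f′(p, h) = ∑ Di f(p) hi can be tested numerically against a difference quotient. The Python sketch below uses a sample function f(x, y) = x²y + sin y chosen by me for illustration (it is not from the notes):

```python
import math

def f(x, y):
    """A sample differentiable function f : R^2 -> R (assumed for illustration)."""
    return x * x * y + math.sin(y)

def partials(x, y):
    """Exact partial derivatives: D1 f = 2xy, D2 f = x^2 + cos y."""
    return (2.0 * x * y, x * x + math.cos(y))

def directional(x, y, h, t=1e-6):
    """Symmetric difference quotient (f(p + t h) - f(p - t h)) / (2 t)."""
    return (f(x + t * h[0], y + t * h[1]) - f(x - t * h[0], y - t * h[1])) / (2.0 * t)

p = (1.0, 2.0)
h = (0.3, -0.7)
d1, d2 = partials(*p)
linear = d1 * h[0] + d2 * h[1]       # f'(p, h) = sum_i D_i f(p) h_i
numeric = directional(*p, h)
print(abs(linear - numeric) < 1e-6)  # True
```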

If the vectors Di f(p) are determined with the help of the standard basis of R^m, i.e. one dissects f into the components f1, ..., fm, then f′(p) can be written as the m × n matrix:

Jf(p) := f′(p) =
[ D1f1(p)  D2f1(p)  · · ·  Dnf1(p) ]
[ ...                              ]
[ D1fm(p)  D2fm(p)  · · ·  Dnfm(p) ]

This matrix is also called the Jacobian matrix. Now differentiability can be checked with partial derivatives. First the existence of all partial derivatives ∂fj/∂xi (p) = Di fj(p) must be assured. Then one checks that the matrix Jf(p) satisfies:

lim_{h→0} (f(p + h) − f(p) − Jf(p)h)/‖h‖ = 0

Important: The existence of the partial derivatives of f at p alone does not imply differentiability of f at p!

Example: Let f : R^2 → R^2, with:

f1(x1, x2) = x1^2 − x2^2
f2(x1, x2) = 2x1x2

Then Jf(p) becomes:

Jf(p) =
[ 2p1  −2p2 ]
[ 2p2   2p1 ]

We compute R(h) related to Jf(p) component-wise:

R1(h) = ((p1 + h1)^2 − (p2 + h2)^2 − p1^2 + p2^2 − 2p1h1 + 2p2h2)/‖h‖ = (h1^2 − h2^2)/‖h‖

R2(h) = (2(p1 + h1)(p2 + h2) − 2p1p2 − 2p2h1 − 2p1h2)/‖h‖ = 2h1h2/‖h‖

Using the maximum norm we get:

|R1(h)| ≤ 2‖h‖, |R2(h)| ≤ 2‖h‖

i.e.:

lim_{h→0} R1(h) = lim_{h→0} R2(h) = 0

It follows f is differentiable at p. The derivative is:

f′(p) =
[ 2p1  −2p2 ]
[ 2p2   2p1 ]
= Jf(p)
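The computation above can be replayed numerically: the following Python sketch (my illustration, not part of the notes) evaluates the remainder R(h) of the example map at p = (1, 2) and watches it tend to 0 as h → 0:

```python
def f(x1, x2):
    """The example map f(x1, x2) = (x1^2 - x2^2, 2 x1 x2)."""
    return (x1 * x1 - x2 * x2, 2.0 * x1 * x2)

def jacobian(p1, p2):
    """Jf(p) = [[2p1, -2p2], [2p2, 2p1]]."""
    return [[2.0 * p1, -2.0 * p2], [2.0 * p2, 2.0 * p1]]

def remainder(p1, p2, h1, h2):
    """R(h) = (f(p + h) - f(p) - Jf(p) h) / ||h||, using the maximum norm."""
    hn = max(abs(h1), abs(h2))
    fp = f(p1, p2)
    fph = f(p1 + h1, p2 + h2)
    J = jacobian(p1, p2)
    r1 = fph[0] - fp[0] - (J[0][0] * h1 + J[0][1] * h2)
    r2 = fph[1] - fp[1] - (J[1][0] * h1 + J[1][1] * h2)
    return (r1 / hn, r2 / hn)

# Both components of R(h) shrink to 0 as h -> 0:
for t in (1e-1, 1e-3, 1e-5):
    print(remainder(1.0, 2.0, t, 2.0 * t))
```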

Definition B1.3 If f : D → R^m, D ⊂ R^n, is differentiable for all p ∈ D, D open, then f is called differentiable in D. If the derivative f′(p) is continuous for all p ∈ D then f is called continuously differentiable in D. We write:

f ∈ C^1(D, R^m)

i.e. C^1(D, R^m) is the set of all functions f : D → R^m which are continuously differentiable. C^1(D, R^m) is a function space (of infinite dimension!).

By adding both equation families (brackets) we obtain the desired result. Using the first bracket and multiplying with c proves the second claim.

Theorem B2.2 Let f : D → R and g : D → R be differentiable at p ∈ D, D ⊂ R^n. Then f · g is differentiable at p ∈ D with:

(f · g)′(p, h) = f(p) · g′(p, h) + g(p) · f′(p, h)

Proof. After multiplication and re-arrangement we get:

(fg)(p + h) = (fg)(p) + f(p) · g′(p, h) + g(p) · f′(p, h) + f′(p, h)g′(p, h) + (g(p) + g′(p, h))R1(p + h)‖h‖ + (f(p) + f′(p, h))R2(p + h)‖h‖

We can estimate |f′(p, h) · g′(p, h)| from above by c‖h‖^2. It follows that the last terms of the sum (establishing the error term) can be written in the form R(h)‖h‖ with lim_{h→0} R(h) = 0.

Remarks: The product rule can be written in the form:

∇(fg)(p) = f(p)∇g(p) + g(p)∇f(p)

If f(p) ≠ 0, f : D → R, D ⊂ R^n, f differentiable in p, then:

∇(1/f)(p) = −(1/(f(p))^2) ∇f(p)

If f : D → R, D ⊂ Rn, g : D → Rm, f and g differentiable in p, then:

(f g)′(p, h) = f (p)g′(p, h) + f ′(p, h)g(p)

Let [·, ·] be a bilinear product (not necessarily symmetric), with:

[·, ·] : R^p × R^q → R^m

Let f : D → R^p, g : D → R^q, D ⊂ R^n, f and g differentiable in p. Then:

[f, g]′(p, h) = [f(p), g′(p, h)] + [f′(p, h), g(p)]

The order of terms is important as long as [·, ·] is not symmetric. The proof of this can be made directly or with Theorem B2.2 component-wise. Check!

We are now turning to the chain rule:

Theorem B2.3 Let f : D → R^m, D ⊂ R^n, f differentiable in p ∈ D, and g : E → R^k, E ⊂ R^m, g differentiable in q ∈ E, with f(D) ⊂ E and q = f(p). Then g ◦ f is differentiable at p ∈ D and it holds:

(g ◦ f)′(p) = g′(f(p)) ◦ f′(p)

Proof. By assumption, and using the compact notation introduced earlier in B2, we have:

f(p + h) = f(p) + F(h)h, lim_{h→0} F(h) = f′(p)
g(q + l) = g(q) + G(l)l, lim_{l→0} G(l) = g′(q)

with h ∈ R^n, l ∈ R^m. WLOG let p = 0, f(p) = 0, g(q) = 0. In this case:

f(h) = F(h)h, lim_{h→0} F(h) = f′(0)
g(l) = G(l)l, lim_{l→0} G(l) = g′(0)

We obtain for l = f (h): g(f (h)) = (G(F (h)h) ◦ F (h))h.

This shows the chain rule, as we have:

lim_{h→0} G(F(h)h) ◦ F(h) = g′(0) ◦ f′(0).

This means g ◦ f is differentiable at 0 , and the derivative of the chain is equal to the chained derivatives. If we go back to an arbitrary location p and q = f (p) then indeed:

(g ◦ f )′(p) = g′(f (p)) ◦ f ′(p)

Remark: Using the notation f ′(p, h) and g′(q, l) the chain rule becomes:

(g ◦ f )′(p, h) = g′(f (p), f ′(p, h)).

The chain rule implies the following structure of the Jacobian matrices Jg, Jf and J(g ◦ f) =: JH, i.e. H := g ◦ f:

[ D1H1 · · · DnH1 ]
[ ...             ]  (k × n)-matrix
[ D1Hk · · · DnHk ]

=

[ D1g1 · · · Dmg1 ]
[ ...             ]  (k × m)-matrix
[ D1gk · · · Dmgk ]

·

[ D1f1 · · · Dnf1 ]
[ ...             ]  (m × n)-matrix
[ D1fm · · · Dnfm ]

We have JH = JH(p) and Jf = Jf(p), but Jg = Jg(f(p)). For the partial derivatives of H1, ..., Hk we get:

DiHj = ∑_{l=1}^{m} Dlgj · Difl (1 ≤ i ≤ n, 1 ≤ j ≤ k)

For k = 1 the vector ∇H at p ∈ D ⊂ R^n is the image of ∇g ∈ R^m at q = f(p) under the transpose of the linear mapping f′(p), i.e. ∇H(p) = Jf(p)^T ∇g(f(p)).
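The matrix form of the chain rule can be illustrated numerically. The Python sketch below (my example, not from the notes) takes f to be the polar coordinate map and g(x, y) = x² + y², so H(r, ϕ) = r² and the product Jg(f(p)) · Jf(p) should come out as (2r, 0):

```python
import math

def f(r, phi):
    """Polar coordinates: f(r, phi) = (r cos phi, r sin phi)."""
    return (r * math.cos(phi), r * math.sin(phi))

def Jf(r, phi):
    """Jacobian of the polar coordinate map."""
    return [[math.cos(phi), -r * math.sin(phi)],
            [math.sin(phi),  r * math.cos(phi)]]

def Jg(x, y):
    """g(x, y) = x^2 + y^2 (a sample outer function), so Jg = (2x, 2y)."""
    return [[2.0 * x, 2.0 * y]]

def matmul(A, B):
    """Plain matrix product, (k x m) times (m x n)."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

r, phi = 2.0, 0.7
JH = matmul(Jg(*f(r, phi)), Jf(r, phi))   # chain rule: JH = Jg(f(p)) Jf(p)
print(JH)  # H(r, phi) = r^2, so JH should be [[2r, 0]] = [[4.0, 0.0]]
```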

Diffeomorphisms

Consider f : D → R^n, D ⊂ R^n, which possesses the inverse mapping g : E → R^n, E ⊂ R^n open. Moreover f is differentiable at p ∈ D and g is differentiable at q = f(p) ∈ E. Then using the chain rule it is possible to express g′(q) by f′(p). We have g ◦ f = ID, (g ◦ f)′(p) = IR^n (IX : identity mapping on X), so:

g′(q) ◦ f′(p) = IR^n, g′(f(p)) = (f′(p))^{−1}.

We even have the following theorem with slightly weaker assumptions:

Theorem B2.4 Let f : D → R^n (D ⊂ R^n) be invertible (i.e. bijective) and differentiable at p ∈ D with det Jf(p) ≠ 0, and let the inverse map g : f(D) → R^n be continuous at q = f(p). Then g is differentiable at q and we have:

g′(q) = (f′(p))^{−1}

If f : D → R^n, D ⊂ R^n, is continuously differentiable and invertible (for all x ∈ D), then g := f^{−1} is not necessarily differentiable. But if g is also continuously differentiable, then f becomes a diffeomorphism:

Definition B2.1 Let f : D → R^n, D ⊂ R^n, be continuously differentiable and invertible on D, with If := f(D) ⊂ R^n open. If the inverse mapping:

f^{−1} : If → R^n

is again continuously differentiable, then f and f^{−1} are called diffeomorphisms. For example the polar coordinate transformation is a diffeomorphism (on a suitable domain, e.g. r > 0, 0 < ϕ < 2π).

B3 Intermediate value theorem

Differentiable curves and lines in Rn

Consider a mapping:

γ : D → R^n, with D ⊂ R

i.e. γ = γ(t) ∈ R^n, t ∈ D ⊂ R. Then γ is called a curve in R^n with parameterisation t. We assume now γ is differentiable on D. Then at t0 ∈ D we have:

γ′(t0) = lim_{t→t0} (γ(t) − γ(t0))/(t − t0)

and each component γ1(t), ..., γn(t) is differentiable at t0 in the one-dimensional sense. The derivative γ′(t0) has the following interpretation. Define g(t) by:

g(t) := γ(t0) + γ′(t0)(t − t0).

Then g(t) defines a line in R^n parameterised by t. This line approximates the curve γ(t) at t0 very well, i.e.:

lim_{t→t0} ‖γ(t) − g(t)‖/|t − t0| = 0

We also have a kinematic interpretation. The vector γ′(t0) is the velocity vector at t0, and |γ′(t0)| the speed at t0, with | · | being the Euclidean norm. Also:

γ′(t0)/|γ′(t0)|

is called the unit tangent vector of γ at t0. If γ′ : [a, b] =: D → R^n is continuous, then γ is called smooth. γ is called piecewise smooth if it is composed of finitely many(!) smooth curves.

Definition B3.1 Let γ : [a, b] → R^n be a piecewise smooth curve. Then:

L(γ) := ∫_a^b |γ′(t)| dt

is called the length of the curve γ.

Examples: Let:

γ(t) = (γ1(t), γ2(t)) = (cos t, sin t) (t ∈ [0, 2π])

be the unit circle (in R^2). The velocity vector:

γ′(t) = (γ1′(t), γ2′(t)) = (− sin t, cos t)

has Euclidean norm 1, so L(γ) = ∫_0^{2π} dt = 2π. Let now a, b ∈ R^n and:

γ(t) = (1 − t)a + tb (t ∈ [0, 1])

Then:

γ′(t) = b − a, |γ′(t)| = |b − a| and L(γ) = ∫_0^1 |b − a| dt = |b − a|.

γ(t) is the line segment between a and b.
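The circle computation can be confirmed with a polygonal approximation to the length integral. The following Python sketch (an illustration of mine, not from the notes) approximates L(γ) for the unit circle by the length of an inscribed polygon:

```python
import math

def gamma(t):
    """Unit circle gamma(t) = (cos t, sin t), t in [0, 2 pi]."""
    return (math.cos(t), math.sin(t))

def length(curve, a, b, steps=100000):
    """Approximate L(gamma) by the length of an inscribed polygon."""
    total = 0.0
    prev = curve(a)
    for i in range(1, steps + 1):
        cur = curve(a + (b - a) * i / steps)
        total += math.dist(prev, cur)
        prev = cur
    return total

L = length(gamma, 0.0, 2.0 * math.pi)
print(L)  # close to 2 pi ~ 6.2832
```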

Theorem B3.1 Let f : D → R, D ⊂ Rn, be differentiable, and let the line between points a, b ∈ D be contained in D. Then there is a point c on the line between a and b such that:

f (b) − f (a) = f ′(c, b − a)

Proof. We can describe the line between a and b by:

γ(t) = a + t(b − a) (0 ≤ t ≤ 1)

Then γ is clearly differentiable, and we have:

γ′(t) = b − a

which means γ is constant. Let F : [0, 1] → R be defined by:

F (t) = f (γ(t)).

We can apply the one-dimensional mean value theorem. Therefore there is a ν ∈ ]0, 1[ such that:

F(1) − F(0) = F′(ν)

Using the chain rule we get: F ′(t) = f ′(γ(t), γ′(t))

Because F(0) = f(a) and F(1) = f(b), and γ(ν) =: c is a point on the line between a and b with γ′(ν) = b − a, the theorem is proven.

Remark: It is possible to replace the line between a and b by a differentiable curve γ connecting a and b (γ(0) = a, γ(1) = b). In this case the "increment" f(b) − f(a) can be expressed as:

f(b) − f(a) = f′(γ(ν), γ′(ν)) (0 < ν < 1)

Using x + h instead of a and b one obtains another formulation of the intermediate value theorem:

f (x + h) − f (x) = f ′(x + νh, h) (0 < ν < 1)
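For a concrete function one can locate the intermediate point ν numerically. The Python sketch below (my illustration; the quadratic f(x, y) = x² + y² is an assumption, not an example from the notes) finds ν by bisection; for quadratics ν = 1/2:

```python
def f(x, y):
    """Sample function f(x, y) = x^2 + y^2 (assumed for illustration)."""
    return x * x + y * y

def grad(x, y):
    """Gradient of f: (2x, 2y)."""
    return (2.0 * x, 2.0 * y)

def mvt_nu(a, b, tol=1e-12):
    """Find nu in (0, 1) with f(b) - f(a) = <grad f(a + nu (b - a)), b - a> by bisection."""
    h = (b[0] - a[0], b[1] - a[1])
    target = f(*b) - f(*a)
    def g(nu):
        gx, gy = grad(a[0] + nu * h[0], a[1] + nu * h[1])
        return gx * h[0] + gy * h[1] - target
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

nu = mvt_nu((0.0, 0.0), (1.0, 1.0))
print(nu)  # ~0.5 for this quadratic
```

Bisection works here because the directional derivative along the segment is monotone for this f; in general the theorem only guarantees that some ν exists.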

With the help of the intermediate value theorem we can characterise constant functions:

Theorem B3.2 Let D ⊂ Rn^ be open, and let there exist a differentiable curve connecting any two points of D such that the curve is entirely contained in D (D is called path connected ). Let f : D → R be differentiable. Then:

f is constant on D (a) ⇐⇒ f′(x, h) = 0 for all x ∈ D and all h ∈ R^n (b)

Proof. (a) =⇒ (b) is trivial, even in case D is not path connected. Now consider (b) =⇒ (a). Choose two points in D, say a and b. Connect a and b by finitely many line segments, such that the sequence of vertices connected by lines becomes a = a 0 , a 1 , ..., ak = b. Apply the mean-value theorem to the line segment joining aj , aj+1. Because f ′(cj , aj+1 − aj ) = 0 for some cj on the line segment we have:

f (aj+1) = f (aj ) for j = 0, 1 , ..., k − 1

This implies f (b) = f (a).