


























These notes cover differentiability and the Hesse form in mathematical analysis: continuous functions, differentiability, partial derivatives, and Taylor's formula, including the relationship between differentiability and continuity.
Contents

A Continuous functions from R^n → R^m
   A1 R^n as a normed vector space, convergence
   A2 Topology of R^n
   A3 Continuous functions

B Differentiable functions from R^n → R^m
   B1 Definition of the derivative
   B2 Compositions of differentiable functions
   B3 Intermediate value theorem
   B4 Higher derivatives and Taylor's formula
   B5 Banach Fixed Point Theorem
   B6 Implicit Function Theorem
These lecture notes are a projection of the MA225 Differentiation course 2012/2013, delivered by Dr Markus Kirkilionis at the University of Warwick. The up-to-date version of these notes should be found here:
http://www.tomred.org/lecture-notes.html
Markus’ original handwritten script should be found here:
http://lora.maths.warwick.ac.uk/groups/differentiation/wiki/39327/MA225_Lecture_Manuscript.html
Students taking this course should also take a look at Alex Wendland’s Dropbox notes:
https://www.dropbox.com/sh/5m63moxv6csy8tn/HCmB8rY7va/Year%202/Differentiation
These notes are, to my knowledge, complete (except for diagrams), but the tedious treasure hunt for errors will always be an open game. If you spot an error, or you want the source code to fiddle with the notes in your own way, e-mail me at me@tomred.org. Writing these up has been a benefit to me (there aren't many foolproof ways to avoid proper work), but most of all I hope they're helpful, and good luck! Tom ♥
The lecture will be split into two parts:
A Continuous functions from R^n → R^m.

B Differentiable functions from R^n → R^m.

Let us start with A:
The elements of R^n are ordered n-tuples of real numbers:

x := (x_1, ..., x_n)

We have componentwise addition:

x + y := (x_1 + y_1, ..., x_n + y_n)

and componentwise multiplication with scalars:

cx := (cx_1, ..., cx_n)

This makes R^n a real vector space of dimension n. The vectors:

e_1 := (1, 0, ..., 0), ..., e_n := (0, ..., 0, 1)

form the standard basis of R^n.
In one dimension concepts like convergence and continuity had to be discussed with the help of the absolute value, | · |. In R^n we replace this with a norm, interpreted as a mapping from R^n to R:

Definition A1.1 A function N : R^n → R with:

(1) N(x) ≥ 0, and N(x) = 0 if and only if x = 0,
(2) N(cx) = |c| N(x) for all c ∈ R,
(3) N(x + y) ≤ N(x) + N(y),

is called a norm on R^n.

For n = 1 we therefore must have N(x) = a|x|, with a some positive number. For n > 1 there are more possibilities.
Maximum norm:

‖x‖ = max{|x_1|, ..., |x_n|}

Properties (1) and (2) are clear. To prove (3) we use the triangle inequality for the absolute value. For every i we have:

|x_i| ≤ ‖x‖ and |y_i| ≤ ‖y‖

and therefore:

|x_i + y_i| ≤ |x_i| + |y_i| ≤ ‖x‖ + ‖y‖

As there is at least one i for which |x_i + y_i| = ‖x + y‖, we have:

‖x + y‖ ≤ ‖x‖ + ‖y‖
The sequence (x_k) is convergent iff for every ε > 0 there exists k_0 such that for all k ≥ k_0 and all p > 0 we have ‖x_{k+p} − x_k‖ < ε (the Cauchy criterion; R^n is complete, so Cauchy sequences converge).
Also the theorem of Bolzano-Weierstrass remains valid:
Theorem A1.2 Every bounded sequence (x_k) (i.e. ‖x_k‖ ≤ K for some K) has a convergent subsequence.

Proof. We look at all n components of (x_k). They are all bounded. Therefore, by Bolzano-Weierstrass in dimension 1, each component has a convergent subsequence. We look at component 1 and choose a subsequence of (x_k) whose first components converge. Next we look at component 2, etc., until we reach n.
We next need to make sure that instead of the maximum norm we could have chosen any norm N in all previous definitions. If we can show that:

N(x) ≤ a‖x‖ and ‖x‖ ≤ bN(x)

then:

lim_{k→∞} ‖x_k − a‖ = 0 ⇐⇒ lim_{k→∞} N(x_k − a) = 0

i.e. we are done and know convergence is independent of any specific norm.
Theorem A1.3 For every norm N there are positive numbers a and b such that:

N(x) ≤ a‖x‖ and ‖x‖ ≤ bN(x)

for all x ∈ R^n.
Proof. The left estimate follows from (1), (2), (3) and |x_i| ≤ ‖x‖:

N(x) = N(x_1 e_1 + ... + x_n e_n) ≤ N(e_1)|x_1| + ... + N(e_n)|x_n| ≤ (N(e_1) + ... + N(e_n))‖x‖

We can choose a := N(e_1) + ... + N(e_n). The right estimate is proven indirectly. Assume there is no such b > 0. Then one can find for each k = 1, 2, 3, ... a vector x_k such that:

‖x_k‖ > kN(x_k)

holds. With (2) we then have for k = 1, 2, 3, ...:

N(x_k / ‖x_k‖) < 1/k

For y_k := x_k / ‖x_k‖ we then have N(y_k) < 1/k. This will create a contradiction. Because ‖y_k‖ = 1 the sequence (y_k) is bounded. Therefore there is a subsequence (z_k) with limit z, i.e.:

lim_{k→∞} ‖z_k − z‖ = 0

Because N(z − z_k) ≤ a‖z − z_k‖ we have:

N(z) = N(z − z_k + z_k) ≤ N(z − z_k) + N(z_k) ≤ a‖z − z_k‖ + N(z_k)

Because N(y_k) < 1/k we have for the subsequence (z_k) that lim_{k→∞} N(z_k) = 0. Therefore N(z) = 0, i.e. we would conclude z = 0. On the other hand we have:

z_k = z + (z_k − z) and ‖z_k‖ = 1

Therefore:

1 ≤ ‖z‖ + ‖z_k − z‖

So ‖z‖ > 0, and therefore z ≠ 0: a contradiction. This proves:

‖x‖ ≤ bN(x)

for some positive b and for all x ∈ R^n.
This theorem can be generalised:

Definition A1.3 A norm N′ is called equivalent to a norm N if there are positive numbers a, b such that for all x ∈ R^n we have:

aN(x) ≤ N′(x) ≤ bN(x)
We have:
Theorem A1.4 Every pair of norms on R^n is equivalent.

Proof. We have already shown that every norm N is equivalent to the maximum norm. We have to check that equivalence of norms is indeed an equivalence relation:

aN(x) ≤ N′(x) ≤ bN(x) =⇒ (1/b)N′(x) ≤ N(x) ≤ (1/a)N′(x)   [Symmetry]

aN(x) ≤ N′(x) ≤ bN(x), a′N′(x) ≤ N″(x) ≤ b′N′(x) =⇒ aa′N(x) ≤ N″(x) ≤ bb′N(x)   [Transitivity]
For example the maximum norm and Euclidean norm are equivalent. It holds:

‖x‖ ≤ |x| ≤ √n ‖x‖ (check!)
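The "(check!)" can also be explored numerically. Below is a minimal sketch (Python with NumPy; not part of the original notes) that samples random vectors and verifies both inequalities:

import numpy as np

# Numerical check (not a proof) of the equivalence of the maximum norm
# and the Euclidean norm: ||x||_max <= |x| <= sqrt(n) * ||x||_max.
rng = np.random.default_rng(0)
n = 5
for _ in range(10_000):
    x = rng.normal(size=n)
    max_norm = np.max(np.abs(x))   # maximum norm ||x||
    euclid = np.linalg.norm(x)     # Euclidean norm |x|
    assert max_norm <= euclid <= np.sqrt(n) * max_norm + 1e-12
print("both inequalities held on all samples")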
Infinite sums:

Like in R an infinite sum (series) is defined as a sequence (s_k) with:

s_k := Σ_{j=1}^{k} x_j

If lim_{k→∞} s_k exists, the limit is called the sum s:

s := Σ_{k=1}^{∞} x_k

The infinite sum (s_k) is called absolutely convergent if:

Σ_{k=1}^{∞} ‖x_k‖

converges.
Let us denote the part of Q not covered during the construction by M. Then Q \ M = C, with C closed. C is called the Cantor set.
Compact sets:
Theorem A2.3 Let M ⊂ R^n. The following statements are equivalent:

(1) M is bounded and closed.
(2) Every open cover of M contains finitely many sets which already cover M (the Heine-Borel property).
(3) Every infinite subset of M has a limit point in M.
Proof. We show (1) =⇒ (2) =⇒ (3) =⇒ (1).
To show (1) =⇒ (2) we use an iteration. Because M is bounded there exists a closed cube W which contains M. Assume there is an open cover C of M which does not have the Heine-Borel property. Next we can divide W into 2^n closed cubes whose edges have half the edge length of the previous iteration on W (see figure for dim = 2). With our assumption there is at least one such cube whose intersection with M cannot be covered with finitely many sets from the cover C. We choose such a cube and call it W^(1). Following the construction we can derive a sequence (W^(k)) of closed cubes such that:

(i) W ⊃ W^(1) ⊃ ... ⊃ W^(k) ⊃ ...,
(ii) lim_{k→∞} δ(W^(k)) = 0 (with δ(·) being the diameter of the cube),
(iii) M ∩ W^(k) cannot be covered with finitely many sets from C.
The remainder of the proof will be given after the following theorem.
The sequence M ∩ W^(k) satisfies the condition of the theorem of Cantor:

Theorem A2.4 (Cantor) Let (A_k) be a decreasing sequence of bounded closed sets whose diameters δ(A_k) converge to zero, i.e. A_1 ⊃ A_2 ⊃ ... ⊃ A_k ⊃ ... and lim_{k→∞} δ(A_k) = 0. Then there is exactly one point x which belongs to all sets A_k:

∩_{k=1}^{∞} A_k = {x}
Proof (Cantor). Choose an arbitrary point x_k from each set A_k. One obtains a Cauchy sequence because A_{k+p} ⊂ A_k for every p ≥ 0 and:

‖x_{k+p} − x_k‖ ≤ δ(A_k)

Therefore (x_k) converges. Let us call the limit x. Every other possibly chosen sequence (y_k) also converges to x because:

‖x_k − y_k‖ ≤ δ(A_k)

x is therefore unique, but it is also an element of all A_k. If not, a k_0 would exist such that x ∉ A_{k_0}, and hence x ∉ A_{k_0+p} for all p ≥ 0. Because A_{k_0} is closed there would be a neighbourhood of x containing no points of A_{k_0+p}, p ≥ 0. This contradicts the fact that (x_k) converges to x.
Proof of A2.3 (continued). We know now there is exactly one point x belonging to all sets M ∩ W^(k), and therefore x ∈ M as well. By assumption (C covers M) there is an open set O ∈ C with x ∈ O. For all sufficiently large k the cubes satisfy W^(k) ⊂ O, as x is an interior point of O. To cover M ∩ W^(k) we therefore need only a single open set O ∈ C, whereas by (iii) finitely many sets from C should not suffice. This is a contradiction, so (1) =⇒ (2) is proven.

We next prove (2) =⇒ (3), again indirectly. Assume M has an infinite subset (a set with infinitely many points) A which has no limit point in M ("¬(3)"). Every point in A possesses an open ball around it containing no further elements of A. Also, every point in M \ A possesses an open ball around it with no elements in A. The set of all these open balls forms a cover of M. Because M has the Heine-Borel property, finitely many open balls are sufficient to cover M. Therefore they also form a cover of A. Because each of these balls contains at most one element of A, A must be finite (only contains finitely many points). This contradicts the assumption that A is infinite, so (2) =⇒ (3) is proven.
To prove (3) =⇒ (1) we assume first that M is not bounded. Then for every k ∈ N there exists x_k ∈ M with:

‖x_k‖ ≥ k

Assemble all x_k, k = 1, 2, ..., in a set A. This set contains no limit point y, otherwise infinitely many elements of A would be in a 1-neighbourhood (ε = 1!) of y, in contradiction to the construction. This contradicts (3), therefore M is bounded. Assume now M is not closed. Then there would be a limit point x of M which does not belong to M. For this x and every natural number k ≥ 1 we would have a point x_k ∈ M such that:

‖x_k − x‖ < 1/k

The infinite but countable set A of these x_k has exactly one limit point in R^n, and this is x. By assumption (3) this limit point of A ⊂ M belongs to M: contradiction! Therefore (3) =⇒ (1) is proven.
Definition A2.1 A set M ⊂ R^n is called compact if it satisfies one (and therefore all) of (1), (2), (3).
Remark: In infinite-dimensional vector spaces (1) is not equivalent to (2) or (3)!
Another property of M equivalent to (1) is:

(4) Every sequence in M possesses a convergent subsequence with limit in M.

We show equivalence with (3):

Proof. If (3) holds, consider first a sequence with finitely many different values in M. Then at least one value occurs infinitely often, which gives a constant, i.e. convergent, subsequence. If the sequence has infinitely many different values in M then by (3) these values have a limit point in M, and hence a subsequence converging to it. Now let (4) be satisfied and let B be an arbitrary infinite subset of M. First choose a countably infinite subset A of B. Let the values of A be assembled as a sequence. As these values belong to M, this sequence by property (4) has a convergent subsequence with limit in M. This limit is a limit point of A, and therefore of B, and it belongs to M.
As a simple consequence we get a generalised version of the theorem of Bolzano-Weierstrass.
Theorem A2.5 (Bolzano-Weierstrass) Every infinite bounded subset M of R^n possesses at least one limit point in R^n.

Proof. M is contained in a compact cube. Because of (3), M has a limit point in this cube.
Let now m = 1 and n = 2. Then for example:

y = f(x_1, x_2) = √(1 − x_1^2 − x_2^2)   (x_1^2 + x_2^2 ≤ 1 defines D!)

maps the unit disk D ⊂ R^2 to R^1; the graph of f is the upper half of the unit sphere in R^3.

So D = {x ∈ R^2 : x_1^2 + x_2^2 ≤ 1} and I, the image, is:

I = [0, 1],   f : D → I

The set:

L_c := {x ∈ R^n : f(x) = c},

with f : R^n → R, is called the level set (for value c). In our upper-sphere example the set L_1 consists of a single point, the origin {0 = (0, 0)}. For 0 < c < 1 the level sets are circles. An important case of mappings f : R^n → R^m with m = n are "new coordinates". For example:

x = f(r, ϕ) = r · cos ϕ
y = g(r, ϕ) = r · sin ϕ

describes the transition to polar coordinates in R^2. Even more important are vector fields: for every point x ∈ D ⊂ R^n, f(x) ∈ R^n defines a vector of length ‖f(x)‖ located at x.
Continuity:
Definition A3.1 Let D ⊂ R^n and f : D → R^m. Then f is called continuous in a ∈ D if for every ε > 0 there is a δ > 0 such that for all x ∈ D with ‖x − a‖ < δ we have:

‖f(x) − f(a)‖ < ε
Remark: It is not important which norms we use; of course the two norms appearing in the definition could be different.
Alternative definition of continuity:
Definition A3.2 f is called continuous in a ∈ D if for every neighbourhood V of f(a) there is a neighbourhood U of a such that:

f(U ∩ D) ⊂ V

If f is continuous it maps a sequence (x_k) in D (with (x_k) convergent to a) to a sequence (f(x_k)) convergent to f(a) (check!).
Let a now be a limit point of D. We write:

lim_{x→a} f(x) = b

if for every ε > 0 there is a δ > 0 such that:

x ∈ D, x ≠ a, ‖x − a‖ < δ =⇒ ‖f(x) − b‖ < ε

Remark: a need not be an element of D. If a ∈ D then f(a) plays no role in the definition.

We have therefore another characterisation of continuity:

f continuous in a ⇐⇒ lim_{x→a} f(x) = f(a)

This is often written as:

lim_{x→a} f(x) = f(lim_{x→a} x)
All the previous examples of functions f have been continuous. An example of f being not continuous: divide R^n into rational and non-rational points. If x_1, ..., x_n ∈ Q then define f(x) = 0, otherwise f(x) = 1 (i.e. whenever some x_i ∉ Q).
Consider:

f(x, y) = 2xy / (x^2 + y^2) for x^2 + y^2 > 0,   f(x, y) = 0 for x = 0, y = 0

Then f is continuous on R^2 with the exception of (0, 0). Consider first the x and y axes; here f is 0 (so f could be continuous...), but in every neighbourhood of (0, 0) we find:

f(x, x) = 2x^2 / (x^2 + x^2) = 1   (x ≠ 0)
This example shows we can have “partial continuity”.
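A quick numerical illustration (a Python sketch, not part of the original notes): approaching (0, 0) along an axis gives the value 0, while approaching along the diagonal gives 1, so f has no limit at the origin.

def f(x, y):
    # f(x, y) = 2xy / (x^2 + y^2) away from the origin, 0 at the origin
    return 0.0 if x == 0 and y == 0 else 2 * x * y / (x**2 + y**2)

for t in [1e-1, 1e-3, 1e-6]:
    print(f"t={t:g}: f(t,0)={f(t, 0):g}  f(t,t)={f(t, t):g}")
# along the x-axis all values are 0; along the diagonal all values are 1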
The increment f(p + h) − f(p) for sufficiently small h is well approximated by the scalar product 〈∇f(p), h〉. We obtain the following interpretation:

∇f(p) points in the direction of the largest increment of f at location p. The scalar product 〈∇f(p), h〉 (as a function of h of fixed length) has its largest value if h is a positive multiple of ∇f(p).
The function values of f can be illustrated if one looks at the level sets:

L_c = {x ∈ R^n : f(x) = c}   (c ∈ R)

Vectors h orthogonal to ∇f(p) lead to points in the neighbourhood of p whose function values differ only minimally from f(p).

We next look at the graph of a function f : D → R, D ⊂ R^n. The graph of f builds a hypersurface in R^{n+1}: at each x = (x_1, ..., x_n) ∈ D there is a y = f(x_1, ..., x_n). In a neighbourhood of p ∈ D this hypersurface (graph of f) can be approximated by the hyperplane:

y = f(p) + 〈∇f(p), x − p〉

We will call this hyperplane the tangential hyperplane of f at location p. The vector:

(−∇f(p), 1) ∈ R^{n+1}

is a normal vector on the tangential hyperplane (and therefore at the hypersurface) at location (p, f(p))
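As a small illustration (a sketch with a hypothetical example function, not from the notes), take f(x_1, x_2) = x_1^2 + x_2^2: the tangential hyperplane approximates f near p with an error that decays faster than ‖x − p‖, and (−∇f(p), 1) is the corresponding normal vector.

import numpy as np

def f(x):
    # hypothetical example: f(x1, x2) = x1^2 + x2^2
    return x[0]**2 + x[1]**2

def grad_f(x):
    # its gradient, computed by hand: (2 x1, 2 x2)
    return np.array([2 * x[0], 2 * x[1]])

p = np.array([1.0, 0.5])
print("normal vector:", np.append(-grad_f(p), 1.0))  # (-grad f(p), 1)

for eps in [1e-1, 1e-2, 1e-3]:
    x = p + eps * np.array([1.0, -2.0])
    tangent = f(p) + grad_f(p) @ (x - p)   # tangential hyperplane at p
    print(f"eps={eps:g}: error = {abs(f(x) - tangent):.2e}")
# the error decays like eps^2, i.e. faster than ||x - p||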
Partial derivatives
How can we determine f′(p) of f differentiable at point p ∈ D? The linear mapping f′(p) is determined if we fix a basis v_1, ..., v_n of R^n. Let us consider v ∈ R^n, v ≠ 0. We would like to compute f′(p, v), and set x = p + tv, t ∈ R. So we get:

f(p + tv) = f(p) + f′(p, tv) + R(p + tv) |t| · ‖v‖

For t ≠ 0 we get:

f′(p, v) = (f(p + tv) − f(p)) / t − R(p + tv) (|t| / t) ‖v‖

and:

f′(p, v) = lim_{t→0} (f(p + tv) − f(p)) / t

This limit is called the directional derivative D_v f(p) of f at location p with respect to v. If we choose for v one of the standard basis vectors e_1, ..., e_n of R^n, then the respective directional derivatives are called partial derivatives. There are different notations for partial derivatives in the literature. The i-th partial derivative can be written:
D_i f(x),   ∂f/∂x_i (x),   or f_{x_i}(x)
although the last one should be abandoned. Instead one sometimes uses just f_i(x), but this can lead to misunderstandings if f is not real-valued anymore. The task of computing partial derivatives can be solved by means of differential calculus in one variable. All arguments besides the i-th argument remain fixed. We have to compute:

lim_{x_i→p_i} (f(p_1, ..., x_i, ..., p_n) − f(p_1, ..., p_i, ..., p_n)) / (x_i − p_i)
This limit can exist even if our function is not differentiable in p according to our definition. If some function f : D → R^m, D ⊂ R^n, is differentiable at p then all partial derivatives exist, and for h = Σ_{i=1}^{n} h_i e_i we have:

f′(p, h) = Σ_{i=1}^{n} f′(p, e_i) · h_i = Σ_{i=1}^{n} D_i f(p) · h_i
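The formula can be checked numerically; the following sketch (hypothetical example function, not from the notes) compares a difference quotient for D_v f(p) with the sum Σ D_i f(p) v_i:

import numpy as np

def f(x):
    # hypothetical smooth function on R^2
    return np.sin(x[0]) * x[1]**2

def diff_quotient(f, p, v, t=1e-6):
    # central difference approximation of the directional derivative D_v f(p)
    return (f(p + t * v) - f(p - t * v)) / (2 * t)

p = np.array([0.7, -1.2])
v = np.array([2.0, 3.0])
partials = [diff_quotient(f, p, e) for e in np.eye(2)]    # D_1 f(p), D_2 f(p)
print(diff_quotient(f, p, v))                             # directional derivative
print(sum(D * vi for D, vi in zip(partials, v)))          # sum D_i f(p) v_i, agrees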
If the vectors D_i f(p) are determined with the help of the standard basis of R^m, i.e. one dissects f into the components f_1, ..., f_m, then f′(p) can be written as the m × n matrix:

J_f(p) := f′(p) =
( D_1 f_1(p)  D_2 f_1(p)  ···  D_n f_1(p) )
(     ⋮           ⋮                ⋮      )
( D_1 f_m(p)  D_2 f_m(p)  ···  D_n f_m(p) )

This matrix is also called the Jacobian matrix. Now differentiability can be checked with partial derivatives. First the existence of all partial derivatives ∂f_j/∂x_i (p) = D_i f_j(p) must be assured. Then one checks that the matrix J_f(p) satisfies:

lim_{h→0} (f(p + h) − f(p) − J_f(p)h) / ‖h‖ = 0
Important: The existence of the partial derivatives of f at p alone does not imply differentiability of f at p!
Example: Let f : R^2 → R^2, with:

f_1(x_1, x_2) = x_1^2 − x_2^2
f_2(x_1, x_2) = 2 x_1 x_2

Then J_f(p) becomes:

J_f(p) = ( 2p_1  −2p_2 )
         ( 2p_2   2p_1 )

We compute R(h) related to J_f(p) component-wise:

R_1(h) = ((p_1 + h_1)^2 − (p_2 + h_2)^2 − p_1^2 + p_2^2 − 2p_1 h_1 + 2p_2 h_2) / ‖h‖ = (h_1^2 − h_2^2) / ‖h‖

R_2(h) = (2(p_1 + h_1)(p_2 + h_2) − 2p_1 p_2 − 2p_2 h_1 − 2p_1 h_2) / ‖h‖ = 2 h_1 h_2 / ‖h‖

Using the maximum norm we get:

|R_1(h)| ≤ 2‖h‖,   |R_2(h)| ≤ 2‖h‖

i.e.

lim_{h→0} R_1(h) = lim_{h→0} R_2(h) = 0

It follows f is differentiable at p. The derivative is:

f′(p) = ( 2p_1  −2p_2 ) = J_f(p)
        ( 2p_2   2p_1 )
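The same conclusion can be sanity-checked numerically. This sketch (not in the original notes) compares J_f(p) with a finite-difference Jacobian for the example above:

import numpy as np

def f(x):
    # the example from the text: f(x1, x2) = (x1^2 - x2^2, 2 x1 x2)
    return np.array([x[0]**2 - x[1]**2, 2 * x[0] * x[1]])

def jacobian_fd(f, p, t=1e-6):
    # finite-difference approximation of the Jacobian, column by column
    cols = [(f(p + t * e) - f(p - t * e)) / (2 * t) for e in np.eye(len(p))]
    return np.column_stack(cols)

p = np.array([1.5, -0.5])
J_exact = np.array([[2 * p[0], -2 * p[1]],
                    [2 * p[1],  2 * p[0]]])
print(np.max(np.abs(J_exact - jacobian_fd(f, p))))   # ~1e-10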
Definition B1.3 If f : D → R^m, D ⊂ R^n, is differentiable for all p ∈ D, D open, then f is called differentiable in D. If the derivative f′(p) is continuous for all p ∈ D then f is called continuously differentiable in D. We write:

f ∈ C^1(D, R^m)

i.e. C^1(D, R^m) is the set of all functions f : D → R^m which are continuously differentiable. C^1(D, R^m) is a function space (of infinite dimension!).
By adding both equation families (brackets) we obtain the desired result. Using the first bracket and multiplying with c proves the second claim.
Theorem B2.2 Let f : D → R and g : D → R be differentiable at p ∈ D, D ⊂ R^n. Then f · g is differentiable at p ∈ D with:

(f · g)′(p, h) = f(p) · g′(p, h) + g(p) · f′(p, h)

Proof. After multiplication and re-arrangement we get:

(fg)(p + h) = (fg)(p) + f(p) · g′(p, h) + g(p) · f′(p, h) + (error terms)

We can estimate |f′(p, h) · g′(p, h)| from above by c‖h‖^2. It follows that the last four terms of the sum (establishing the error term) can be written in the form R(h)‖h‖ with lim_{h→0} R(h) = 0.
Remarks: The product rule can be written in the form:

∇(fg)(p) = f(p)∇g(p) + g(p)∇f(p)

If f(p) ≠ 0, f : D → R, D ⊂ R^n, f differentiable in p, then:

∇(1/f)(p) = −(1 / f(p)^2) ∇f(p)

If f : D → R, D ⊂ R^n, g : D → R^m, f and g differentiable in p, then:

(fg)′(p, h) = f(p)g′(p, h) + f′(p, h)g(p)
Let [·, ·] be a bilinear product (not necessarily symmetric), with:

[·, ·] : R^p × R^q → R^m

Let f : D → R^p, g : D → R^q, D ⊂ R^n, f and g differentiable in p. Then:

[f, g]′(p, h) = [f(p), g′(p, h)] + [f′(p, h), g(p)]

The order of terms is important as long as [·, ·] is not symmetric. The proof of this can be made directly or with Theorem B2.2 component-wise. Check!

We are now turning to the chain rule:

Theorem B2.3 Let f : D → R^m, D ⊂ R^n, f differentiable in p ∈ D, and g : E → R^k, E ⊂ R^m, g differentiable in q ∈ E, with f(D) ⊂ E and q = f(p). Then g ◦ f is differentiable at p ∈ D and it holds:
(g ◦ f )′(p) = g′(f (p)) ◦ f ′(p)
Proof. By assumption, and using the compact notation introduced earlier in B2, we have:

f(p + h) = f(p) + F(h)h,   lim_{h→0} F(h) = f′(p)
g(q + l) = g(q) + G(l)l,   lim_{l→0} G(l) = g′(q)

with h ∈ R^n, l ∈ R^m. WLOG let p = 0, f(p) = 0, g(q) = 0. In this case:

f(h) = F(h)h,   lim_{h→0} F(h) = f′(0)
g(l) = G(l)l,   lim_{l→0} G(l) = g′(0)

We obtain for l = f(h):

g(f(h)) = (G(F(h)h) ◦ F(h))h
This shows the chain rule, as we have:

lim_{h→0} G(F(h)h) ◦ F(h) = g′(0) ◦ f′(0)
This means g ◦ f is differentiable at 0, and the derivative of the chain is equal to the chained derivatives. If we go back to an arbitrary location p and q = f(p) then indeed:
(g ◦ f )′(p) = g′(f (p)) ◦ f ′(p)
Remark: Using the notation f ′(p, h) and g′(q, l) the chain rule becomes:
(g ◦ f )′(p, h) = g′(f (p), f ′(p, h)).
The chain rule implies the following structure of the Jacobian matrices J_g, J_f and J(g ◦ f) =: J_H, with H := g ◦ f:

( D_1 H_1  ···  D_n H_1 )     ( D_1 g_1  ···  D_m g_1 )   ( D_1 f_1  ···  D_n f_1 )
(    ⋮             ⋮    )  =  (    ⋮             ⋮    ) · (    ⋮             ⋮    )
( D_1 H_k  ···  D_n H_k )     ( D_1 g_k  ···  D_m g_k )   ( D_1 f_m  ···  D_n f_m )
     (k × n)-matrix                (k × m)-matrix              (m × n)-matrix
We have J_H = J_H(p) and J_f = J_f(p), but J_g = J_g(f(p)). For the partial derivatives of H_1, ..., H_k we get:

D_i H_j = Σ_{l=1}^{m} D_l g_j · D_i f_l   (1 ≤ i ≤ n, 1 ≤ j ≤ k)

For k = 1 the vector ∇H at p ∈ D ⊂ R^n is the image of ∇g ∈ R^m at q = f(p) under the transpose of the linear mapping f′(p) = J_f(p).
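A numerical sanity check of J_H(p) = J_g(f(p)) · J_f(p), again only a sketch with hypothetical f and g (not from the notes):

import numpy as np

def f(x):
    # hypothetical f : R^2 -> R^3
    return np.array([x[0] * x[1], np.sin(x[0]), x[1]**2])

def g(y):
    # hypothetical g : R^3 -> R^2
    return np.array([y[0] + y[1] * y[2], np.exp(y[0])])

def jacobian_fd(fun, p, t=1e-6):
    # finite-difference Jacobian of fun at p
    cols = [(fun(p + t * e) - fun(p - t * e)) / (2 * t) for e in np.eye(len(p))]
    return np.column_stack(cols)

p = np.array([0.3, 1.1])
lhs = jacobian_fd(lambda x: g(f(x)), p)          # J_H(p), a 2 x 2 matrix
rhs = jacobian_fd(g, f(p)) @ jacobian_fd(f, p)   # J_g(f(p)) . J_f(p)
print(np.max(np.abs(lhs - rhs)))                 # agrees up to ~1e-8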
Diffeomorphisms
Consider f : D → R^n, D ⊂ R^n, which possesses the inverse mapping g : E → R^n, E ⊂ R^n open. Moreover f is differentiable at p ∈ D and g is differentiable at q = f(p) ∈ E. Then using the chain rule it is possible to express g′(q) by f′(p). We have g ◦ f = I_D, (g ◦ f)′(p) = I_{R^n} (I_X: identity mapping on X), so:

g′(q) ◦ f′(p) = I_{R^n},   g′(f(p)) = (f′(p))^{−1}

We even have the following theorem with slightly weaker assumptions:

Theorem B2.4 Let f : D → R^n (D ⊂ R^n) be invertible (i.e. bijective) and differentiable at p ∈ D with det J_f(p) ≠ 0, and let the inverse map g : f(D) → R^n be continuous at q = f(p). Then g is differentiable at q and we have:

g′(q) = (f′(p))^{−1}
If f : D → R^n, D ⊂ R^n, is continuously differentiable and invertible (for all x ∈ D), then g := f^{−1} is not necessarily differentiable. But if g is even continuously differentiable, then f becomes a diffeomorphism:

Definition B2.1 Let f : D → R^n, D ⊂ R^n, be continuously differentiable and invertible on D, and I_f = f(D) ⊂ R^n open. If the inverse mapping:

f^{−1} : I_f → R^n

is again continuously differentiable, then f and f^{−1} are called diffeomorphisms. For example the polar coordinate transformation is a diffeomorphism (restricted to r > 0 and a suitable ϕ-interval).
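For the polar coordinate example, a sketch (not from the notes) applying the transformation and its inverse on r > 0, −π < ϕ < π, where the Jacobian determinant is det J_f(r, ϕ) = r:

import numpy as np

def polar_to_cart(r, phi):
    # f(r, phi) = (r cos phi, r sin phi)
    return r * np.cos(phi), r * np.sin(phi)

def cart_to_polar(x, y):
    # the inverse mapping on r > 0, -pi < phi < pi
    return np.hypot(x, y), np.arctan2(y, x)

r, phi = 2.0, 0.75
x, y = polar_to_cart(r, phi)
print(cart_to_polar(x, y))   # recovers (2.0, 0.75)

# Jacobian of f at (r, phi); its determinant equals r, nonzero for r > 0
J = np.array([[np.cos(phi), -r * np.sin(phi)],
              [np.sin(phi),  r * np.cos(phi)]])
print(np.linalg.det(J), r)   # both print 2.0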
Differentiable curves and lines in R^n

Consider a mapping:

γ : D → R^n, with D ⊂ R

i.e. γ = γ(t) ∈ R^n, t ∈ D ⊂ R. Then γ is called a curve in R^n with parameterisation t. We assume now γ is differentiable on D. Then at t_0 ∈ D we have:

γ′(t_0) = lim_{t→t_0} (γ(t) − γ(t_0)) / (t − t_0)

and each component γ_1(t), ..., γ_n(t) is differentiable at t_0 in the one-dimensional sense. The derivative γ′(t_0) has the following interpretation. Define g(t) by:

g(t) := γ(t_0) + γ′(t_0)(t − t_0)

Then g(t) defines a line in R^n parameterised by t. This line approximates the curve γ(t) at t_0 very well, i.e.:

lim_{t→t_0} ‖γ(t) − g(t)‖ / |t − t_0| = 0
We also have a kinematic interpretation. The limit vector γ′(t_0) is the instantaneous velocity vector at t_0, and |γ′(t_0)| the instantaneous speed at t_0, with | · | being the Euclidean norm. Also:

γ′(t_0) / |γ′(t_0)|

is called the unit tangent vector (the normalised velocity vector) of γ at t_0. If γ′ : [a, b] =: D → R^n is continuous, then γ is called smooth. γ is called piecewise smooth if it is composed of finitely many(!) smooth curves.
Definition B3.1 Let γ : [a, b] → R^n be a piecewise smooth curve. Then:

L(γ) := ∫_a^b |γ′(t)| dt

is called the length of the curve γ.
Examples: Let:

γ(t) = (γ_1(t), γ_2(t)) = (cos t, sin t)   (t ∈ [0, 2π])

be the unit circle (in R^2). The velocity vector:

γ′(t) = (γ′_1(t), γ′_2(t)) = (−sin t, cos t)

has Euclidean norm 1, so L(γ) = ∫_0^{2π} dt = 2π. Let now a, b ∈ R^n and:

γ(t) = (1 − t)a + tb   (t ∈ [0, 1])

Then:

γ′(t) = b − a,   |γ′(t)| = |b − a|   and   L(γ) = ∫_0^1 |b − a| dt = |b − a|.
γ(t) is the line segment between a and b.
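Both lengths can be reproduced by numerical quadrature of L(γ) = ∫_a^b |γ′(t)| dt; a sketch (not from the notes):

import numpy as np

def curve_length(dgamma, a, b, n=100_000):
    # midpoint-rule approximation of the integral of |gamma'(t)| over [a, b]
    t = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    return (b - a) / n * np.sum(np.linalg.norm(dgamma(t), axis=0))

# unit circle: gamma'(t) = (-sin t, cos t), length 2 pi
print(curve_length(lambda t: np.array([-np.sin(t), np.cos(t)]), 0, 2 * np.pi))

# line segment from a to b: gamma'(t) = b - a, length |b - a| = 5
a_pt, b_pt = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(curve_length(lambda t: (b_pt - a_pt)[:, None] * np.ones_like(t), 0, 1))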
Theorem B3.1 Let f : D → R, D ⊂ R^n, be differentiable, and let the line between points a, b ∈ D be contained in D. Then there is a point c on the line between a and b such that:

f(b) − f(a) = f′(c, b − a)
Proof. We can describe the line between a and b by:
γ(t) = a + t(b − a) (0 ≤ t ≤ 1)
Then γ is clearly differentiable, and we have:

γ′(t) = b − a

which means γ′ is constant. Let F : [0, 1] → R be defined by:

F(t) = f(γ(t))

We can apply the one-dimensional mean value theorem. Therefore there is a ν ∈ ]0, 1[ such that:

F(1) − F(0) = F′(ν)

Using the chain rule we get:

F′(t) = f′(γ(t), γ′(t))

Because F(0) = f(a), F(1) = f(b), γ(ν) =: c is a point on the line between a and b, and moreover γ′(ν) = b − a, the theorem is proven.
Remark: It is possible to replace the line between a and b by a differentiable curve γ connecting a and b. In this case the “increment” f (b) − f (a) can be expressed as:
f (b) − f (a) = f ′(γ(ν), γ′(ν)) (0 < ν < 1)
Using x + h instead of a and b one obtains another formulation of the mean value theorem:
f (x + h) − f (x) = f ′(x + νh, h) (0 < ν < 1)
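Such a ν can be located numerically. A sketch with a hypothetical f (not from the notes): bisection on g(ν) := f′(x + νh, h) − (f(x + h) − f(x)), which changes sign on (0, 1) by the mean value theorem.

import numpy as np

def f(x):
    # hypothetical differentiable function on R^2
    return x[0]**2 + 3 * x[0] * x[1]

def grad_f(x):
    # gradient computed by hand: (2 x1 + 3 x2, 3 x1)
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x = np.array([1.0, 2.0])
h = np.array([0.5, -1.0])
increment = f(x + h) - f(x)
g = lambda nu: grad_f(x + nu * h) @ h - increment   # f'(x + nu h, h) - increment

lo, hi = 0.0, 1.0       # g(0) > 0 > g(1) for this example
for _ in range(60):     # plain bisection
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if g(lo) * g(mid) > 0 else (lo, mid)
print((lo + hi) / 2)    # nu = 0.5 here, so f(x+h) - f(x) = f'(x + nu h, h)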
With the help of the intermediate value theorem we can characterise constant functions:
Theorem B3.2 Let D ⊂ R^n be open, and let there exist a differentiable curve connecting any two points of D such that the curve is entirely contained in D (D is called path connected). Let f : D → R be differentiable. Then:

f is constant (a)   ⇐⇒   f′(x, h) = 0 for all x ∈ D and all h ∈ R^n (b)

Proof. (a) =⇒ (b) is trivial, even in case D is not path connected. Now consider (b) =⇒ (a). Choose two points in D, say a and b. Connect a and b by finitely many line segments, such that the sequence of vertices connected by lines becomes a = a_0, a_1, ..., a_k = b. Apply the mean value theorem to the line segment joining a_j, a_{j+1}. Because f′(c_j, a_{j+1} − a_j) = 0 for some c_j on the line segment we have:

f(a_{j+1}) = f(a_j) for j = 0, 1, ..., k − 1
This implies f (b) = f (a).