G00TE1204: Convex Optimization and Its Applications in Signal Processing

Handout A: Linear Algebra Cheat Sheet

Instructor: Anthony Man–Cho So Updated: May 10, 2015

The purpose of this handout is to give a brief review of some of the basic concepts and results in linear algebra. If you are not familiar with the material and/or would like to do some further reading, you may consult, e.g., the books [1, 2, 3].

1 Basic Notations, Definitions and Results

1.1 Vectors and Matrices

We denote the set of real numbers (also referred to as scalars) by R. For positive integers m, n ≥ 1, we use R^{m×n} to denote the set of m × n arrays whose components are from R. In other words, R^{m×n} is the set of m × n real matrices, and an element A ∈ R^{m×n} can be written as

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} ,    (1)

where a_{ij} ∈ R for i = 1, …, m and j = 1, …, n. A row vector is a matrix with m = 1, and a column vector is a matrix with n = 1. The word vector will always mean a column vector unless otherwise stated. The set of all n–dimensional real vectors is denoted by R^n, and an element x ∈ R^n can be written as x = (x_1, …, x_n). Note that we still view x = (x_1, …, x_n) as a column vector, even though typographically it does not appear so. The reason for such a notation is simply to save space. Now, given an m × n matrix A of the form (1), its transpose A^T is defined as the following n × m matrix:

A^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix} .

An m × m real matrix A is said to be symmetric if A = A^T. The set of m × m real symmetric matrices is denoted by S^m.

We use x ≥ 0 to indicate that all the components of x are non–negative, and x ≥ y to mean that x − y ≥ 0. The notations x > 0, x ≤ 0, x < 0, x > y, x ≤ y, and x < y are to be interpreted accordingly.

We say that a finite collection C = {x^1, x^2, …, x^m} of vectors in R^n is

  • linearly dependent if there exist scalars α_1, …, α_m ∈ R, not all of them zero, such that ∑_{i=1}^m α_i x^i = 0;
  • affinely dependent if the collection C′ = {x^2 − x^1, x^3 − x^1, …, x^m − x^1} is linearly dependent.

The collection C (resp. C′) is said to be linearly independent (resp. affinely independent) if it is not linearly dependent (resp. affinely dependent).
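As a quick numerical companion to these definitions, here is a minimal NumPy sketch (my own illustration, not part of the handout; the matrices and vectors are arbitrary choices) that builds a small matrix, checks symmetry via A = A^T, and tests linear and affine dependence by comparing matrix ranks.

```python
import numpy as np

# An m x n real matrix and its transpose (here m = 2, n = 3).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(A.T.shape)                               # (3, 2): the transpose is n x m

# A square matrix is symmetric iff it equals its transpose.
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.allclose(S, S.T))                     # True

# Linear dependence: x^1, ..., x^m are linearly dependent iff the matrix with
# these vectors as its columns has rank < m.
x1 = np.array([1.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = x1 + x2                                   # deliberately a linear combination
C = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(C) < 3)            # True: linearly dependent

# Affine dependence: test linear dependence of the differences x^i - x^1.
D = np.column_stack([x2 - x1, x3 - x1])
print(np.linalg.matrix_rank(D) < 2)            # False: this collection is affinely independent
```

Note that the last check illustrates that a linearly dependent collection need not be affinely dependent.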

1.2 Inner Product and Vector Norms

Given two vectors x, y ∈ R^n, their inner product is defined as

x^T y \equiv \sum_{i=1}^{n} x_i y_i .

We say that x and y are orthogonal if x^T y = 0. The Euclidean norm of x ∈ R^n is defined as

\|x\|_2 \equiv \sqrt{x^T x} = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2} .

A fundamental inequality that relates the inner product of two vectors and their respective Euclidean norms is the Cauchy–Schwarz inequality:

\left| x^T y \right| \le \|x\|_2 \cdot \|y\|_2 .

Equality holds iff the vectors x and y are linearly dependent; i.e., x = αy for some α ∈ R. Note that the Euclidean norm is not the only norm one can define on R^n. In general, a function ‖ · ‖ : R^n → R is called a vector norm on R^n if for all x, y ∈ R^n, we have

(a) (Non–Negativity) ‖x‖ ≥ 0;

(b) (Positivity) ‖x‖ = 0 iff x = 0;

(c) (Homogeneity) ‖αx‖ = |α| · ‖x‖ for all α ∈ R;

(d) (Triangle Inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

For instance, for p ≥ 1, the ℓ_p–norm on R^n, which is given by

\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} ,

is a vector norm on R^n. It is well known that

\|x\|_\infty = \lim_{p \to \infty} \|x\|_p = \max_{1 \le i \le n} |x_i| .
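The norm facts above are easy to check numerically. Below is a minimal NumPy sketch (illustrative, not from the handout; the two vectors are arbitrary) that computes an inner product, verifies the Cauchy–Schwarz inequality on a particular pair, and compares several ℓ_p–norms with the ℓ_∞ limit.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0, 0.0, -1.0])

# Inner product x^T y and Euclidean norm ||x||_2 = sqrt(x^T x).
inner = x @ y
print(inner, np.linalg.norm(x))

# Cauchy-Schwarz: |x^T y| <= ||x||_2 * ||y||_2 (equality iff x and y are
# linearly dependent, which is not the case here).
print(abs(inner) <= np.linalg.norm(x) * np.linalg.norm(y))       # True

# l_p norms for several p >= 1; the p -> infinity limit is max_i |x_i|.
for p in (1, 2, 7, np.inf):
    print(p, np.linalg.norm(x, ord=p))
print(np.max(np.abs(x)))                                         # equals the ord=inf value
```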

1.3 Matrix Norms

We say that a function ‖ · ‖ : R^{n×n} → R is a matrix norm on the set of n × n matrices if for any A, B ∈ R^{n×n}, we have

(a) (Non–Negativity) ‖A‖ ≥ 0;

(b) (Positivity) ‖A‖ = 0 iff A = 0;

Moreover, we have rank(A) ≤ min{m, n}, and if equality holds, then we say that A has full rank. The nullspace of A is the set null(A) ≡ {x ∈ R^n : Ax = 0}. It is a subspace of R^n and has dimension n − rank(A). The following summarizes the relationships among the subspaces range(A), range(A^T), null(A), and null(A^T):

\left( \text{range}(A) \right)^{\perp} = \text{null}(A^T), \qquad \left( \text{range}(A^T) \right)^{\perp} = \text{null}(A).

The above implies that given an m × n real matrix A of rank r ≤ min{m, n}, we have rank(AA^T) = rank(A^T A) = r. This fact will be frequently used in the course.
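These rank and nullspace relationships can be sanity-checked as follows. This is a rough NumPy sketch of my own (not from the handout); the helper null_space, which reads an orthonormal nullspace basis off the SVD, and the rank-2 matrix A are illustrative choices.

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis of null(A), read off from the SVD (illustrative helper)."""
    _, s, vt = np.linalg.svd(A)
    r = int(np.sum(s > tol))
    return vt[r:].T

# A 3 x 4 matrix of rank 2 (its third row is the sum of the first two).
A = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 3.0, 2.0]])
r = np.linalg.matrix_rank(A)
print(r)                                              # 2 <= min(3, 4), so A is not of full rank

# dim null(A) = n - rank(A).
N = null_space(A)
print(N.shape[1] == A.shape[1] - r)                   # True

# rank(A A^T) = rank(A^T A) = rank(A).
print(np.linalg.matrix_rank(A @ A.T),
      np.linalg.matrix_rank(A.T @ A))                 # 2 2

# (range(A))^perp = null(A^T): every basis vector of null(A^T) is orthogonal
# to every column of A.
M = null_space(A.T)
print(np.allclose(A.T @ M, 0.0))                      # True
```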

1.5 Affine Subspaces

Let S_0 be a subspace of R^n and x^0 ∈ R^n be an arbitrary vector. Then, the set S = x^0 + S_0 = {x + x^0 : x ∈ S_0} is called an affine subspace of R^n, and its dimension is equal to the dimension of the underlying subspace S_0. Now, let C = {x^1, …, x^m} be a finite collection of vectors in R^n, and let x^0 ∈ R^n be arbitrary. By definition, the set S = x^0 + span(C) is an affine subspace of R^n. Moreover, it is easy to verify that every vector y ∈ S can be written in the form

y = \sum_{i=1}^{m} \left[ \alpha_i (x^0 + x^i) + \beta_i (x^0 - x^i) \right]

for some α_1, …, α_m, β_1, …, β_m ∈ R such that ∑_{i=1}^m (α_i + β_i) = 1; i.e., the vector y ∈ R^n is an affine combination of the vectors x^0 ± x^1, …, x^0 ± x^m ∈ R^n. Conversely, let C = {x^1, …, x^m} be a finite collection of vectors in R^n, and define

S = \left\{ \sum_{i=1}^{m} \alpha_i x^i : \alpha_1, \ldots, \alpha_m \in R, \ \sum_{i=1}^{m} \alpha_i = 1 \right\}

to be the set of affine combinations of the vectors in C. We claim that S is an affine subspace of R^n. Indeed, it can be readily verified that

S = x^1 + \text{span}\left\{ x^2 - x^1, \ldots, x^m - x^1 \right\}.

This establishes the claim. Given an arbitrary (i.e., not necessarily finite) collection C of vectors in R^n, the affine hull of C, denoted by aff(C), is the set of all finite affine combinations of the vectors in C. Equivalently, we can define aff(C) as the intersection of all affine subspaces containing C.
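To make the claim concrete, the following NumPy sketch (my own example; the three points and the affine coefficients are arbitrary choices) forms an affine combination of three points and verifies that it lies in x^1 + span{x^2 − x^1, x^3 − x^1}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three points in R^4 and an affine combination of them (coefficients sum to 1).
x1, x2, x3 = rng.standard_normal(4), rng.standard_normal(4), rng.standard_normal(4)
alpha = np.array([0.5, -0.2, 0.7])                     # 0.5 - 0.2 + 0.7 = 1
y = alpha[0] * x1 + alpha[1] * x2 + alpha[2] * x3

# Claim: y lies in the affine subspace x^1 + span{x^2 - x^1, x^3 - x^1}.
B = np.column_stack([x2 - x1, x3 - x1])                # spans the underlying subspace S_0
coef, *_ = np.linalg.lstsq(B, y - x1, rcond=None)
print(np.allclose(B @ coef, y - x1))                   # True: y - x^1 is in the span of B's columns

# The dimension of the affine subspace equals dim S_0 (here 2, since randomly
# drawn points are affinely independent with probability one).
print(np.linalg.matrix_rank(B))                        # 2
```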

1.6 Some Special Classes of Matrices

The following classes of matrices will be frequently encountered in this course.

  • Invertible Matrix. An n × n real matrix A is said to be invertible if there exists an n × n real matrix A^{-1} (called the inverse of A) such that A^{-1}A = I, or equivalently, AA^{-1} = I. Note that the inverse of A is unique whenever it exists. Moreover, recall that A ∈ R^{n×n} is invertible iff rank(A) = n.

Now, let A be a non–singular n × n real matrix. Suppose that A is partitioned as

A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} ,

where A_{ii} ∈ R^{n_i×n_i} for i = 1, 2, with n_1 + n_2 = n. Then, provided that the relevant inverses exist, the inverse of A has the following form:

A^{-1} = \begin{bmatrix} \left( A_{11} - A_{12} A_{22}^{-1} A_{21} \right)^{-1} & A_{11}^{-1} A_{12} \left( A_{21} A_{11}^{-1} A_{12} - A_{22} \right)^{-1} \\ \left( A_{21} A_{11}^{-1} A_{12} - A_{22} \right)^{-1} A_{21} A_{11}^{-1} & \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right)^{-1} \end{bmatrix} .

  • Submatrix of a Matrix. Let A be an m × n real matrix. For index sets α ⊂ {1, 2, …, m} and β ⊂ {1, 2, …, n}, we denote the submatrix that lies in the rows of A indexed by α and the columns indexed by β by A(α, β). If m = n and α = β, the matrix A(α, α) is called a principal submatrix of A and is denoted by A(α). The determinant of A(α) is called a principal minor of A. Now, let A be an n × n matrix, and let α ⊂ {1, 2, …, n} be an index set such that A(α) is non–singular. We set α′ = {1, 2, …, n} \ α. The following is known as the Schur determinantal formula (it is sanity-checked numerically in the sketch after this list):

\det(A) = \det(A(\alpha)) \, \det\left[ A(\alpha') - A(\alpha', \alpha) A(\alpha)^{-1} A(\alpha, \alpha') \right].

  • Orthogonal Matrix. An n × n real matrix A is called an orthogonal matrix if AA^T = A^T A = I. Note that if A ∈ R^{n×n} is an orthogonal matrix, then for any u, v ∈ R^n, we have u^T v = (Au)^T(Av); i.e., orthogonal transformations preserve inner products.
  • Positive Semidefinite/Definite Matrix. An n × n real matrix A is positive semidefinite (resp. positive definite) if A is symmetric and for any x ∈ R^n \ {0}, we have x^T Ax ≥ 0 (resp. x^T Ax > 0). We use A ⪰ 0 (resp. A ≻ 0) to denote the fact that A is positive semidefinite (resp. positive definite). We remark that although one can define a notion of positive semidefiniteness for real matrices that are not necessarily symmetric, we shall not pursue that option in this course.
  • Projection Matrix. An n × n real matrix A is called a projection matrix if A^2 = A. Given a projection matrix A ∈ R^{n×n} and a vector x ∈ R^n, the vector Ax ∈ R^n is called the projection of x ∈ R^n onto the subspace range(A). Note that a projection matrix need not be symmetric; a concrete instance appears in the numerical sketch after this list.

We say that A defines an orthogonal projection onto the subspace S ⊂ R^n if for every x = x^1 + x^2 ∈ R^n, where x^1 ∈ S and x^2 ∈ S^⊥, we have Ax = x^1. Note that if A defines an orthogonal projection onto S, then I − A defines an orthogonal projection onto S^⊥. Furthermore, it can be shown that A is an orthogonal projection onto S iff A is a symmetric projection matrix with range(A) = S. As an illustration, consider an m × n real matrix A, with m ≤ n and rank(A) = m. Then, the projection matrix corresponding to the orthogonal projection onto the nullspace of A is given by P_{null(A)} = I − A^T(AA^T)^{-1}A.
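Several facts from this list are easy to verify numerically. The sketch below is my own illustration with NumPy, not part of the handout: the handout's worked example of a non-symmetric projection matrix is not visible in this preview, so the 2 × 2 matrix here is a substitute of my own. It also checks the Schur determinantal formula on a random matrix and the orthogonal projection P_{null(B)} = I − B^T(BB^T)^{-1}B.

```python
import numpy as np

rng = np.random.default_rng(1)

# A projection matrix need not be symmetric: here A^2 = A but A != A^T.
A = np.array([[1.0, 1.0],
              [0.0, 0.0]])
print(np.allclose(A @ A, A), np.allclose(A, A.T))      # True False

# Schur determinantal formula with alpha = {0} (A(alpha) is then a 1 x 1 block;
# it is non-singular with probability one for a Gaussian random matrix).
M = rng.standard_normal((4, 4))
a, b = [0], [1, 2, 3]                                  # alpha and its complement alpha'
rhs = np.linalg.det(M[np.ix_(a, a)]) * np.linalg.det(
    M[np.ix_(b, b)] - M[np.ix_(b, a)] @ np.linalg.inv(M[np.ix_(a, a)]) @ M[np.ix_(a, b)])
print(np.allclose(np.linalg.det(M), rhs))              # True

# Orthogonal projection onto null(B) for a full-row-rank 2 x 4 matrix B.
B = rng.standard_normal((2, 4))                        # rank 2 with probability one
P = np.eye(4) - B.T @ np.linalg.inv(B @ B.T) @ B
print(np.allclose(P @ P, P), np.allclose(P, P.T))      # True True: a symmetric projection
x = rng.standard_normal(4)
print(np.allclose(B @ (P @ x), 0.0))                   # True: P x lies in null(B)
```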

gives rise to a set of k eigenvectors of A whose associated eigenvalue is λ̄. It is worth noting that if {v^1, …, v^k} is an orthonormal basis of L̄, then we can find an orthogonal matrix P_{1k} ∈ R^{k×k} such that V_{1k} = U_{1k} P_{1k}, where U_{1k} (resp. V_{1k}) is the n × k matrix whose i–th column is u^i (resp. v^i), for i = 1, …, k. In particular, if A = UΛU^T = VΛV^T are two spectral decompositions of A with

\Lambda = \begin{bmatrix} \lambda_{i_1} I_{n_1} & & & \\ & \lambda_{i_2} I_{n_2} & & \\ & & \ddots & \\ & & & \lambda_{i_l} I_{n_l} \end{bmatrix} ,

where λ_{i_1}, …, λ_{i_l} are the distinct eigenvalues of A, I_k denotes a k × k identity matrix, and n_1 + n_2 + ··· + n_l = n, then there exists an orthogonal matrix P with the block diagonal structure

P = \begin{bmatrix} P_{n_1} & & & \\ & P_{n_2} & & \\ & & \ddots & \\ & & & P_{n_l} \end{bmatrix} ,

where P_{n_j} is an n_j × n_j orthogonal matrix for j = 1, …, l, such that V = UP. Now, suppose that we order the eigenvalues of A as λ_1 ≥ λ_2 ≥ ··· ≥ λ_n. Then, the Courant–Fischer theorem states that the k–th largest eigenvalue λ_k, where k = 1, …, n, can be found by solving the following optimization problems:

\lambda_k = \min_{w^1, \ldots, w^{k-1} \in R^n} \ \max_{\substack{x \ne 0,\, x \in R^n \\ x \perp w^1, \ldots, w^{k-1}}} \frac{x^T A x}{x^T x} = \max_{w^1, \ldots, w^{n-k} \in R^n} \ \min_{\substack{x \ne 0,\, x \in R^n \\ x \perp w^1, \ldots, w^{n-k}}} \frac{x^T A x}{x^T x} .    (3)
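The extreme cases of (3) are easy to probe numerically. The following is my own NumPy sketch (not from the handout) with a randomly generated symmetric matrix: numpy.linalg.eigh returns eigenvalues in ascending order, the Rayleigh quotient never leaves [λ_n, λ_1], and over vectors orthogonal to the top eigenvector it is bounded by (and attained at) λ_2.

```python
import numpy as np

rng = np.random.default_rng(2)

# A random 5 x 5 real symmetric matrix.
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2.0

# numpy.linalg.eigh returns eigenvalues in ascending order; flip to get
# lambda_1 >= lambda_2 >= ... >= lambda_n.
w, U = np.linalg.eigh(A)
lam = w[::-1]
u1, u2 = U[:, -1], U[:, -2]          # eigenvectors of the two largest eigenvalues

# k = 1 and k = n: the Rayleigh quotient always lies in [lambda_n, lambda_1].
for _ in range(1000):
    x = rng.standard_normal(5)
    q = x @ A @ x / (x @ x)
    assert lam[-1] - 1e-12 <= q <= lam[0] + 1e-12

# k = 2 with w^1 = u^1: over x orthogonal to u^1 the Rayleigh quotient is at
# most lambda_2, and the bound is attained at the second eigenvector.
P = np.eye(5) - np.outer(u1, u1)     # orthogonal projection onto span{u^1}^perp
best = max(((P @ z) @ A @ (P @ z)) / ((P @ z) @ (P @ z))
           for z in rng.standard_normal((2000, 5)))
print(best <= lam[1] + 1e-9)                               # True
print(np.isclose(u2 @ A @ u2 / (u2 @ u2), lam[1]))         # True
```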

2.2 Properties of Positive Semidefinite Matrices

By definition, a real positive semidefinite matrix is symmetric, and hence it has the properties listed above. However, much more can be said about such matrices. For instance, the following statements are equivalent for an n × n real symmetric matrix A:

(a) A is positive semidefinite.

(b) All the eigenvalues of A are non–negative.

(c) There exists a unique n × n positive semidefinite matrix A^{1/2} such that A = A^{1/2}A^{1/2}.

(d) There exists a k × n matrix B, where k = rank(A), such that A = B^T B.

Similarly, the following statements are equivalent for an n × n real symmetric matrix A:

(a) A is positive definite.

(b) A^{-1} exists and is positive definite.

(c) All the eigenvalues of A are positive.

(d) There exists a unique n × n positive definite matrix A^{1/2} such that A = A^{1/2}A^{1/2}.
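These equivalences can be illustrated numerically. The following sketch is my own (assuming NumPy; the rank and dimensions are arbitrary): it constructs a rank-k positive semidefinite matrix as B^T B, checks that its eigenvalues are non-negative, builds the square root A^{1/2} from the spectral decomposition, and shifts by the identity to obtain a positive definite example.

```python
import numpy as np

rng = np.random.default_rng(3)

# A random n x n positive semidefinite matrix of rank k, built as A = B^T B.
n, k = 5, 3
B = rng.standard_normal((k, n))
A = B.T @ B                                    # symmetric and PSD by construction

# (b) All eigenvalues of A are non-negative (up to roundoff), and rank(A) = k.
w, U = np.linalg.eigh(A)
print(np.all(w >= -1e-10), np.linalg.matrix_rank(A) == k)        # True True

# (c) The PSD square root A^{1/2}, built from the spectral decomposition.
sqrtA = U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T
print(np.allclose(sqrtA @ sqrtA, A))                             # True

# Positive definite case: A + I has strictly positive eigenvalues, and its
# inverse is again positive definite.
Apd = A + np.eye(n)
print(np.all(np.linalg.eigvalsh(Apd) > 0.0))                     # True
print(np.all(np.linalg.eigvalsh(np.linalg.inv(Apd)) > 0.0))      # True
```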

Sometimes it would be useful to have a criterion for determining the positive semidefiniteness of a matrix from a block partitioning of the matrix. Here is one such criterion. Let

A = \begin{bmatrix} X & Y \\ Y^T & Z \end{bmatrix}

be an n × n real symmetric matrix, where both X and Z are square. Suppose that Z is invertible. Then, the Schur complement of the matrix A is defined as the matrix S_A = X − Y Z^{-1}Y^T. If Z ≻ 0, then it can be shown that A ⪰ 0 iff X ⪰ 0 and S_A ⪰ 0. There is of course nothing special about the block Z. If X is invertible, then we can similarly define the Schur complement of A as S′_A = Z − Y^T X^{-1}Y. If X ≻ 0, then we have A ⪰ 0 iff Z ⪰ 0 and S′_A ⪰ 0.
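Here is a minimal numerical illustration of the Schur complement criterion (my own sketch; the blocks, the helper is_psd, and the shift by 100·I are arbitrary choices): with Z ≻ 0, the positive semidefiniteness of A and of S_A = X − Y Z^{-1}Y^T are decided together.

```python
import numpy as np

rng = np.random.default_rng(4)

def is_psd(M, tol=1e-10):
    """A symmetric matrix is PSD iff its smallest eigenvalue is non-negative."""
    return bool(np.min(np.linalg.eigvalsh(M)) >= -tol)

# A random symmetric block matrix A = [[X, Y], [Y^T, Z]] with Z positive definite.
X = rng.standard_normal((2, 2))
X = X @ X.T                                      # PSD block
Y = rng.standard_normal((2, 3))
Z = rng.standard_normal((3, 3))
Z = Z @ Z.T + np.eye(3)                          # positive definite block
A = np.block([[X, Y], [Y.T, Z]])

# With Z positive definite: A is PSD iff S_A = X - Y Z^{-1} Y^T is PSD.
S = X - Y @ np.linalg.inv(Z) @ Y.T
print(is_psd(A) == is_psd(S))                    # True (the two tests agree)

# Enlarging X makes S_A (and hence A) positive semidefinite.
X2 = X + 100.0 * np.eye(2)
A2 = np.block([[X2, Y], [Y.T, Z]])
S2 = X2 - Y @ np.linalg.inv(Z) @ Y.T
print(is_psd(S2), is_psd(A2))                    # True True
```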

3 Singular Values and Singular Vectors

Let A be an m × n real matrix of rank r ≥ 1. Then, there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} such that

A = U \Lambda V^T ,    (4)

where Λ ∈ R^{m×n} has Λ_{ij} = 0 for i ≠ j and Λ_{11} ≥ Λ_{22} ≥ ··· ≥ Λ_{rr} > Λ_{r+1,r+1} = ··· = Λ_{qq} = 0 with q = min{m, n}. The representation (4) is called the Singular Value Decomposition (SVD) of A; cf. (2). The entries Λ_{11}, …, Λ_{qq} are called the singular values of A, and the columns of U (resp. V) are called the left (resp. right) singular vectors of A. For notational convenience, we write σ_i ≡ Λ_{ii} for i = 1, …, q. Note that (4) can be equivalently written as

A = \sum_{i=1}^{r} \sigma_i \, u^i (v^i)^T ,

where u^i (resp. v^i) is the i–th column of the matrix U (resp. V), for i = 1, …, r. The rank of A is equal to the number of non–zero singular values. Now, suppose that we order the singular values of A as σ_1 ≥ σ_2 ≥ ··· ≥ σ_q, where q = min{m, n}. Then, the Courant–Fischer theorem states that the k–th largest singular value σ_k, where k = 1, …, q, can be found by solving the following optimization problems:

\sigma_k = \min_{w^1, \ldots, w^{k-1} \in R^n} \ \max_{\substack{x \ne 0,\, x \in R^n \\ x \perp w^1, \ldots, w^{k-1}}} \frac{\|Ax\|_2}{\|x\|_2} = \max_{w^1, \ldots, w^{n-k} \in R^n} \ \min_{\substack{x \ne 0,\, x \in R^n \\ x \perp w^1, \ldots, w^{n-k}}} \frac{\|Ax\|_2}{\|x\|_2} .    (5)

The optimization problems (3) and (5) suggest that singular value and eigenvalue are closely related notions. Indeed, if A is an m × n real matrix, then

\lambda_k(A^T A) = \lambda_k(A A^T) = \sigma_k^2(A) \quad \text{for } k = 1, \ldots, q,

where q = min{m, n}. Moreover, the columns of U and V are the eigenvectors of AA^T and A^T A, respectively. In particular, our discussion in Section 2.1 implies that the set of singular values of A is unique, but the sets of left and right singular vectors are not. Finally, we note that the largest singular value function induces a matrix norm, which is known as the spectral norm and is sometimes denoted by ‖A‖_2 = σ_1(A).
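As a closing illustration (my own sketch with a random matrix, not part of the handout), the following verifies the rank-r expansion of the SVD, the relation σ_k^2(A) = λ_k(A^T A) = λ_k(AA^T), and the identity ‖A‖_2 = σ_1(A) using numpy.linalg.svd.

```python
import numpy as np

rng = np.random.default_rng(5)

# A 4 x 6 real matrix; for Gaussian entries its rank is q = min(m, n) = 4
# with probability one.
A = rng.standard_normal((4, 6))
U, s, Vt = np.linalg.svd(A)          # s holds sigma_1 >= ... >= sigma_q
q = min(A.shape)

# Rank-r expansion A = sum_i sigma_i u^i (v^i)^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(q))
print(np.allclose(A_rebuilt, A))                                  # True

# sigma_k(A)^2 = lambda_k(A^T A) = lambda_k(A A^T) for k = 1, ..., q.
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1][:q]
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1][:q]
print(np.allclose(s**2, eig_AtA), np.allclose(s**2, eig_AAt))     # True True

# The spectral norm is the largest singular value: ||A||_2 = sigma_1(A).
print(np.isclose(np.linalg.norm(A, ord=2), s[0]))                 # True
```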