G00TE1204: Convex Optimization and Its Applications in Signal Processing
Instructor: Anthony Man–Cho So
Updated: May 10, 2015
The purpose of this handout is to give a brief review of some of the basic concepts and results in linear algebra. If you are not familiar with the material and/or would like to do some further reading, you may consult, e.g., the books [1, 2, 3].
We denote the set of real numbers (also referred to as scalars) by R. For positive integers m, n ≥ 1, we use R^{m×n} to denote the set of m × n arrays whose components are from R. In other words, R^{m×n} is the set of m × n real matrices, and an element A ∈ R^{m×n} can be written as

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},    (1)
where a_{ij} ∈ R for i = 1, ..., m and j = 1, ..., n. A row vector is a matrix with m = 1, and a column vector is a matrix with n = 1. The word vector will always mean a column vector unless otherwise stated. The set of all n–dimensional real vectors is denoted by R^n, and an element x ∈ R^n can be written as x = (x_1, ..., x_n). Note that we still view x = (x_1, ..., x_n) as a column vector, even though typographically it does not appear so. The reason for such a notation is simply to save space. Now, given an m × n matrix A of the form (1), its transpose A^T is defined as the following n × m matrix:

A^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}.
An m × m real matrix A is said to be symmetric if A = A^T. The set of m × m real symmetric matrices is denoted by S^m. We use x ≥ 0 to indicate that all the components of x are non–negative, and x ≥ y to mean that x − y ≥ 0. The notations x > 0, x ≤ 0, x < 0, x > y, x ≤ y, and x < y are to be interpreted accordingly. We say that a finite collection C = {x^1, x^2, ..., x^m} of vectors in R^n is linearly dependent if there exist scalars α_1, ..., α_m ∈ R, not all zero, such that \sum_{i=1}^m α_i x^i = 0. Similarly, a finite collection C′ = {x^0, x^1, ..., x^m} of vectors in R^n is affinely dependent if there exist scalars α_0, α_1, ..., α_m ∈ R, not all zero, such that \sum_{i=0}^m α_i x^i = 0 and \sum_{i=0}^m α_i = 0. The collection C (resp. C′) is said to be linearly independent (resp. affinely independent) if it is not linearly dependent (resp. affinely dependent).
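As a quick numerical illustration (added here; the example vectors and the use of NumPy are assumptions, not part of the original handout), linear independence can be checked by stacking the vectors as columns of a matrix and computing its rank: the collection is linearly independent exactly when the rank equals the number of vectors.

import numpy as np

# Three vectors in R^3 (hypothetical example data).
x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = np.array([2.0, 1.0, 5.0])   # x3 = 2*x1 + x2, so the collection is linearly dependent

M = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(M) == 3)   # False: the rank is 2

# Affine (in)dependence of {x0, x1, ..., xm} can be checked in the same way by
# testing linear (in)dependence of the differences {x1 - x0, ..., xm - x0}.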
Given two vectors x, y ∈ R^n, their inner product is defined as

x^T y ≡ \sum_{i=1}^n x_i y_i.
We say that x and y are orthogonal if x^T y = 0. The Euclidean norm of x ∈ R^n is defined as

‖x‖_2 ≡ \sqrt{x^T x} = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}.

A fundamental inequality that relates the inner product of two vectors and their respective Euclidean norms is the Cauchy–Schwarz inequality:

|x^T y| ≤ ‖x‖_2 · ‖y‖_2.
Equality holds iff the vectors x and y are linearly dependent; i.e., x = αy for some α ∈ R. Note that the Euclidean norm is not the only norm one can define on R^n. In general, a function ‖ · ‖ : R^n → R is called a vector norm on R^n if for all x, y ∈ R^n, we have
(a) (Non–Negativity) ‖x‖ ≥ 0;
(b) (Positivity) ‖x‖ = 0 iff x = 0 ;
(c) (Homogeneity) ‖αx‖ = |α| · ‖x‖ for all α ∈ R;
(d) (Triangle Inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖.
For instance, for p ≥ 1, the ℓ_p–norm on R^n, which is given by

‖x‖_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p},

is a vector norm on R^n. It is well known that

‖x‖_∞ = \lim_{p→∞} ‖x‖_p = \max_{1≤i≤n} |x_i|.
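To make these definitions concrete, the following short sketch (added for illustration; the example vectors and the use of NumPy are assumptions) computes the ℓ_1, ℓ_2, and ℓ_∞ norms of a vector and numerically checks the Cauchy–Schwarz inequality.

import numpy as np

x = np.array([3.0, -4.0, 1.0])
y = np.array([1.0, 2.0, -2.0])

# l_p norms for p = 1, 2 and the limiting case p = infinity.
print(np.linalg.norm(x, 1))        # |3| + |-4| + |1| = 8
print(np.linalg.norm(x, 2))        # sqrt(9 + 16 + 1) = sqrt(26)
print(np.linalg.norm(x, np.inf))   # max(|3|, |-4|, |1|) = 4

# Cauchy-Schwarz: |x^T y| <= ||x||_2 * ||y||_2.
lhs = abs(x @ y)
rhs = np.linalg.norm(x) * np.linalg.norm(y)
print(lhs <= rhs)                  # True (7 <= 3 * sqrt(26))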
We say that a function ‖ · ‖ : R^{n×n} → R is a matrix norm on the set of n × n matrices if for any A, B ∈ R^{n×n}, we have

(a) (Non–Negativity) ‖A‖ ≥ 0;

(b) (Positivity) ‖A‖ = 0 iff A = 0;

(c) (Homogeneity) ‖αA‖ = |α| · ‖A‖ for all α ∈ R;

(d) (Triangle Inequality) ‖A + B‖ ≤ ‖A‖ + ‖B‖;

(e) (Submultiplicativity) ‖AB‖ ≤ ‖A‖ · ‖B‖.
Moreover, we have rank(A) ≤ min{m, n}, and if equality holds, then we say that A has full rank. The nullspace of A is the set null(A) ≡ {x ∈ R^n : Ax = 0}. It is a subspace of R^n and has dimension n − rank(A). The following summarizes the relationships among the subspaces range(A), range(A^T), null(A), and null(A^T):

(range(A))^⊥ = null(A^T),   (range(A^T))^⊥ = null(A).
The above implies that given an m × n real matrix A of rank r ≤ min{m, n}, we have rank(AA^T) = rank(A^T A) = r. This fact will be frequently used in the course.
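The rank identity and the orthogonality relations above are easy to verify numerically. The sketch below is an added illustration (the randomly generated matrix and the use of NumPy are assumptions, not part of the original text).

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # a 4 x 6 matrix of rank 2

r = np.linalg.matrix_rank(A)
print(r, np.linalg.matrix_rank(A @ A.T), np.linalg.matrix_rank(A.T @ A))   # 2 2 2

# Orthogonality relations: null(A^T) = (range(A))^perp and null(A) = (range(A^T))^perp.
U, s, Vt = np.linalg.svd(A)
N = Vt[r:].T    # columns form an orthonormal basis of null(A)
M = U[:, r:]    # columns form an orthonormal basis of null(A^T)
print(np.allclose(A @ N, 0), np.allclose(A.T @ M, 0))   # True True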
Let S_0 be a subspace of R^n and x^0 ∈ R^n be an arbitrary vector. Then, the set S = x^0 + S_0 ≡ {x + x^0 : x ∈ S_0} is called an affine subspace of R^n, and its dimension is equal to the dimension of the underlying subspace S_0. Now, let C = {x^1, ..., x^m} be a finite collection of vectors in R^n, and let x^0 ∈ R^n be arbitrary. By definition, the set S = x^0 + span(C) is an affine subspace of R^n, and every y ∈ S can be written as

y = \sum_{i=1}^m \left[ α_i (x^0 + x^i) + β_i (x^0 − x^i) \right]

for some α_1, ..., α_m, β_1, ..., β_m ∈ R such that \sum_{i=1}^m (α_i + β_i) = 1; i.e., the vector y ∈ R^n is an affine combination of the vectors x^0 ± x^1, ..., x^0 ± x^m ∈ R^n. Conversely, let C = {x^1, ..., x^m} be a finite collection of vectors in R^n, and define
S ≡ \left\{ \sum_{i=1}^m α_i x^i : α_1, ..., α_m ∈ R, \; \sum_{i=1}^m α_i = 1 \right\}

to be the set of affine combinations of the vectors in C. We claim that S is an affine subspace of R^n. Indeed, it can be readily verified that

S = x^1 + span\{x^2 − x^1, ..., x^m − x^1\}.
This establishes the claim. Given an arbitrary (i.e., not necessarily finite) collection C of vectors in Rn, the affine hull of C, denoted by aff(C), is the set of all finite affine combinations of the vectors in C. Equivalently, we can define aff(C) as the intersection of all affine subspaces containing C.
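The identity above can also be checked numerically. The sketch below is an added illustration (random vectors and NumPy are assumptions): it forms an affine combination of x^1, ..., x^m and confirms that it lies in x^1 + span{x^2 − x^1, ..., x^m − x^1}.

import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 5
X = rng.standard_normal((n, m))          # columns are x^1, ..., x^m in R^5

alpha = rng.random(m)
alpha /= alpha.sum()                     # coefficients summing to 1: an affine combination
y = X @ alpha

# Spanning set of the underlying subspace: the differences x^i - x^1.
D = X[:, 1:] - X[:, [0]]

# y is in the affine hull iff y - x^1 lies in span{x^2 - x^1, ..., x^m - x^1},
# which we test by solving a least-squares problem and checking the residual.
c, *_ = np.linalg.lstsq(D, y - X[:, 0], rcond=None)
print(np.allclose(D @ c, y - X[:, 0]))   # True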
The following classes of matrices will be frequently encountered in this course.
Now, let A be a non–singular n × n real matrix. Suppose that A is partitioned as

A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},

where A_{ii} ∈ R^{n_i × n_i} for i = 1, 2, with n_1 + n_2 = n. Then, provided that the relevant inverses exist, the inverse of A has the following form:

A^{−1} = \begin{bmatrix} A_{11}^{−1} + A_{11}^{−1} A_{12} S^{−1} A_{21} A_{11}^{−1} & −A_{11}^{−1} A_{12} S^{−1} \\ −S^{−1} A_{21} A_{11}^{−1} & S^{−1} \end{bmatrix},

where S = A_{22} − A_{21} A_{11}^{−1} A_{12}. In index–set notation, with α ⊆ {1, ..., n} indexing the first diagonal block and α′ its complement, the matrix S can be written as

A(α′) − A(α′, α) A(α)^{−1} A(α, α′).
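A quick numerical sanity check of the block–inverse formula (an added illustration; the random matrix and the use of NumPy are assumptions) is the following.

import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 3, 2
A = rng.standard_normal((n1 + n2, n1 + n2))
A11, A12 = A[:n1, :n1], A[:n1, n1:]
A21, A22 = A[n1:, :n1], A[n1:, n1:]

# Schur complement of A11 (the relevant inverses exist for a generic random A).
A11inv = np.linalg.inv(A11)
S = A22 - A21 @ A11inv @ A12
Sinv = np.linalg.inv(S)

block_inverse = np.block([
    [A11inv + A11inv @ A12 @ Sinv @ A21 @ A11inv, -A11inv @ A12 @ Sinv],
    [-Sinv @ A21 @ A11inv,                         Sinv],
])
print(np.allclose(block_inverse, np.linalg.inv(A)))   # True (up to rounding)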
We say that A defines an orthogonal projection onto the subspace S ⊂ R^n if for every x = x^1 + x^2 ∈ R^n, where x^1 ∈ S and x^2 ∈ S^⊥, we have Ax = x^1. Note that if A defines an orthogonal projection onto S, then I − A defines an orthogonal projection onto S^⊥. Furthermore, it can be shown that A is an orthogonal projection onto S iff A is a symmetric projection matrix with range(A) = S. As an illustration, consider an m × n real matrix A, with m ≤ n and rank(A) = m. Then, the projection matrix corresponding to the orthogonal projection onto the nullspace of A is given by P_{null(A)} = I − A^T (AA^T)^{−1} A.
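The formula for P_{null(A)} can be verified numerically. The sketch below is an added illustration (a random full–row–rank matrix and NumPy are assumptions): it checks that P is symmetric, idempotent, and maps every vector into null(A).

import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 6
A = rng.standard_normal((m, n))        # full row rank with probability 1

P = np.eye(n) - A.T @ np.linalg.inv(A @ A.T) @ A

print(np.allclose(P, P.T))             # symmetric
print(np.allclose(P @ P, P))           # idempotent, i.e. a projection matrix
print(np.allclose(A @ P, 0))           # every column of P lies in null(A)

x = rng.standard_normal(n)
print(np.allclose(A @ (P @ x), 0))     # the projection of any x lies in null(A)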
gives rise to a set of k eigenvectors of A whose associated eigenvalue is λ̄. It is worth noting that if {v^1, ..., v^k} is an orthonormal basis of L̄, then we can find an orthogonal matrix P_{1k} ∈ R^{k×k} such that V_{1k} = U_{1k} P_{1k}, where U_{1k} (resp. V_{1k}) is the n × k matrix whose i–th column is u^i (resp. v^i), for i = 1, ..., k. In particular, if A = UΛU^T = VΛV^T are two spectral decompositions of A with

Λ = \begin{bmatrix} λ_{i_1} I_{n_1} & & & \\ & λ_{i_2} I_{n_2} & & \\ & & \ddots & \\ & & & λ_{i_l} I_{n_l} \end{bmatrix},

where λ_{i_1}, ..., λ_{i_l} are the distinct eigenvalues of A, I_k denotes a k × k identity matrix, and n_1 + n_2 + · · · + n_l = n, then there exists an orthogonal matrix P with the block diagonal structure

P = \begin{bmatrix} P_{n_1} & & & \\ & P_{n_2} & & \\ & & \ddots & \\ & & & P_{n_l} \end{bmatrix},
where P_{n_j} is an n_j × n_j orthogonal matrix for j = 1, ..., l, such that V = UP. Now, suppose that we order the eigenvalues of A as λ_1 ≥ λ_2 ≥ · · · ≥ λ_n. Then, the Courant–Fischer theorem states that the k–th largest eigenvalue λ_k, where k = 1, ..., n, can be found by solving the following optimization problems:

λ_k = \min_{w^1, ..., w^{k−1} ∈ R^n} \; \max_{x ≠ 0, \, x ∈ R^n, \, x ⊥ w^1, ..., w^{k−1}} \frac{x^T A x}{x^T x} = \max_{w^1, ..., w^{n−k} ∈ R^n} \; \min_{x ≠ 0, \, x ∈ R^n, \, x ⊥ w^1, ..., w^{n−k}} \frac{x^T A x}{x^T x}.   (3)
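As an added numerical illustration of this variational characterization (the random symmetric matrix and NumPy are assumptions), the sketch below checks that the largest eigenvalue is the maximum of the Rayleigh quotient and that λ_k is obtained when x is restricted to be orthogonal to the top k − 1 eigenvectors, which is one particular choice of w^1, ..., w^{k−1}.

import numpy as np

rng = np.random.default_rng(4)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                        # a random symmetric matrix

vals, vecs = np.linalg.eigh(A)           # eigenvalues in ascending order
vals, vecs = vals[::-1], vecs[:, ::-1]   # reorder so that lambda_1 >= ... >= lambda_n

def rayleigh(x):
    return (x @ A @ x) / (x @ x)

print(np.isclose(rayleigh(vecs[:, 0]), vals[0]))   # lambda_1 is attained at the top eigenvector

k = 3
W = vecs[:, :k - 1]                      # w^1, ..., w^{k-1}: the top k-1 eigenvectors
samples = rng.standard_normal((n, 2000))
samples -= W @ (W.T @ samples)           # force the samples to be orthogonal to w^1, ..., w^{k-1}
quotients = np.einsum('ij,ij->j', samples, A @ samples) / np.einsum('ij,ij->j', samples, samples)

print(quotients.max() <= vals[k - 1] + 1e-9)              # lambda_k bounds the quotient from above
print(np.isclose(rayleigh(vecs[:, k - 1]), vals[k - 1]))  # and the bound is attained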
By definition, a real positive semidefinite matrix is symmetric, and hence it has the properties listed above. However, much more can be said about such matrices. For instance, the following statements are equivalent for an n × n real symmetric matrix A:
(a) A is positive semidefinite.
(b) All the eigenvalues of A are non–negative.
(c) There exists a unique n × n positive semidefinite matrix A^{1/2} such that A = A^{1/2} A^{1/2}.

(d) There exists a k × n matrix B, where k = rank(A), such that A = B^T B.
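To illustrate equivalences (b)–(d) (an added sketch; the randomly generated positive semidefinite matrix and NumPy are assumptions), A^{1/2} and a factor B can be built directly from a spectral decomposition of A.

import numpy as np

rng = np.random.default_rng(5)
n, k = 5, 3
C = rng.standard_normal((k, n))
A = C.T @ C                                  # positive semidefinite of rank k by construction

vals, U = np.linalg.eigh(A)
print((vals >= -1e-10).all())                # (b) all eigenvalues are non-negative (up to rounding)

vals = np.clip(vals, 0.0, None)              # remove tiny negative rounding errors
A_half = U @ np.diag(np.sqrt(vals)) @ U.T    # (c) the positive semidefinite square root
print(np.allclose(A_half @ A_half, A))       # True

idx = vals > 1e-10                           # indices of the nonzero eigenvalues
B = np.sqrt(vals[idx])[:, None] * U[:, idx].T
print(B.shape, np.allclose(B.T @ B, A))      # (d) a rank(A) x n factor: (3, 5) True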
Similarly, the following statements are equivalent for an n × n real symmetric matrix A:
(a) A is positive definite.
(b) A^{−1} exists and is positive definite.
(c) All the eigenvalues of A are positive.
(d) There exists a unique n × n positive definite matrix A^{1/2} such that A = A^{1/2} A^{1/2}.
Sometimes it would be useful to have a criterion for determining the positive semidefiniteness of a matrix from a block partitioning of the matrix. Here is one such criterion. Let

A = \begin{bmatrix} X & Y \\ Y^T & Z \end{bmatrix}

be an n × n real symmetric matrix, where both X and Z are square. Suppose that Z is invertible. Then, the Schur complement of the matrix A is defined as the matrix S_A = X − Y Z^{−1} Y^T. If Z ≻ 0, then it can be shown that A ⪰ 0 iff X ⪰ 0 and S_A ⪰ 0. There is of course nothing special about the block Z. If X is invertible, then we can similarly define the Schur complement of A as S′_A = Z − Y^T X^{−1} Y. If X ≻ 0, then we have A ⪰ 0 iff Z ⪰ 0 and S′_A ⪰ 0.
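A small numerical check of this criterion (an added sketch; the random blocks, which make Z positive definite, and the use of NumPy are assumptions) can be written as follows.

import numpy as np

rng = np.random.default_rng(6)
p, q = 3, 4

X = rng.standard_normal((p, p))
X = X @ X.T                                     # positive semidefinite
Y = rng.standard_normal((p, q))
Z = rng.standard_normal((q, q))
Z = Z @ Z.T + np.eye(q)                         # positive definite

A = np.block([[X, Y], [Y.T, Z]])
S = X - Y @ np.linalg.inv(Z) @ Y.T              # Schur complement S_A

def is_psd(M, tol=1e-9):
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

# With Z > 0, the criterion says: A is PSD iff X is PSD and S_A is PSD.
print(is_psd(A) == (is_psd(X) and is_psd(S)))   # True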
Let A be an m × n real matrix of rank r ≥ 1. Then, there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} such that

A = U Λ V^T,   (4)

where Λ ∈ R^{m×n} has Λ_{ij} = 0 for i ≠ j and Λ_{11} ≥ Λ_{22} ≥ · · · ≥ Λ_{rr} > Λ_{r+1,r+1} = · · · = Λ_{qq} = 0 with q = min{m, n}. The representation (4) is called the Singular Value Decomposition (SVD) of A; cf. (2). The entries Λ_{11}, ..., Λ_{qq} are called the singular values of A, and the columns of U (resp. V) are called the left (resp. right) singular vectors of A. For notational convenience, we write σ_i ≡ Λ_{ii} for i = 1, ..., q. Note that (4) can be equivalently written as

A = \sum_{i=1}^r σ_i u^i (v^i)^T,
where u^i (resp. v^i) is the i–th column of the matrix U (resp. V), for i = 1, ..., r. The rank of A is equal to the number of non–zero singular values. Now, suppose that we order the singular values of A as σ_1 ≥ σ_2 ≥ · · · ≥ σ_q, where q = min{m, n}. Then, the Courant–Fischer theorem states that the k–th largest singular value σ_k, where k = 1, ..., q, can be found by solving the following optimization problems:

σ_k = \min_{w^1, ..., w^{k−1} ∈ R^n} \; \max_{x ≠ 0, \, x ∈ R^n, \, x ⊥ w^1, ..., w^{k−1}} \frac{‖Ax‖_2}{‖x‖_2} = \max_{w^1, ..., w^{n−k} ∈ R^n} \; \min_{x ≠ 0, \, x ∈ R^n, \, x ⊥ w^1, ..., w^{n−k}} \frac{‖Ax‖_2}{‖x‖_2}.   (5)
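The rank–one expansion of the SVD is easy to reproduce numerically. The sketch below is an added illustration (the random matrix and NumPy are assumptions): it computes the SVD and rebuilds A from its r rank–one terms σ_i u^i (v^i)^T.

import numpy as np

rng = np.random.default_rng(7)
m, n = 4, 6
A = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))   # rank 2 by construction

U, s, Vt = np.linalg.svd(A)            # s holds the singular values in decreasing order
r = int(np.sum(s > 1e-10))
print(r)                               # 2: the rank equals the number of nonzero singular values

# Rebuild A as the sum of r rank-one terms sigma_i * u^i (v^i)^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
print(np.allclose(A_rebuilt, A))       # True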
The optimization problems (3) and (5) suggest that singular values and eigenvalues are closely related notions. Indeed, if A is an m × n real matrix, then

λ_k(A^T A) = λ_k(AA^T) = σ_k^2(A)   for k = 1, ..., q,

where q = min{m, n}. Moreover, the columns of U and V are the eigenvectors of AA^T and A^T A, respectively. In particular, our discussion in Section 2.1 implies that the set of singular values of A is unique, but the sets of left and right singular vectors are not. Finally, we note that the largest singular value function induces a matrix norm, which is known as the spectral norm and is sometimes denoted by ‖A‖_2 = σ_1(A).
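These relations can be confirmed numerically as well. The sketch below is an added illustration (the random matrix and NumPy are assumptions): it compares the squared singular values of A with the leading eigenvalues of A^T A and AA^T, and checks that the spectral norm equals σ_1(A).

import numpy as np

rng = np.random.default_rng(8)
m, n = 4, 6
A = rng.standard_normal((m, n))
q = min(m, n)

sigma = np.linalg.svd(A, compute_uv=False)             # sigma_1 >= ... >= sigma_q
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # eigenvalues of A A^T, decreasing
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # eigenvalues of A^T A, decreasing

print(np.allclose(sigma**2, eig_AAt[:q]))              # True
print(np.allclose(sigma**2, eig_AtA[:q]))              # True

# The spectral norm ||A||_2 equals the largest singular value sigma_1(A).
print(np.isclose(np.linalg.norm(A, 2), sigma[0]))      # True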