

































































Abstract These are lecture notes for an advanced undergraduate (and beginning graduate) course in Coding Theory in the Computer Science Department at Bar-Ilan University. These notes contain the technical material covered but do not include much of the motivation and discussion that is given in the lectures. It is therefore not intended for self study, and is not a replacement for what we cover in class. This is a first draft of the notes and they may therefore contain errors.
∗These lecture notes are based on notes taken by Alon Levy in 2008. We thank Alon for his work.
1 Introduction
The basic problem of coding theory is that of communication over an unreliable channel that introduces errors into the transmitted message. It is worth noting that all communication channels have errors, and thus codes are widely used. In fact, they are used not only for network communication, USB channels, satellite communication and so on, but also in disks and other physical media, which are likewise prone to errors. In addition to its practical applications, coding theory has many applications in the theory of computer science. As such, it is a topic that is of interest to both practitioners and theoreticians.
Examples:
The repetition code demonstrates that the coding problem can be solved in principle. However, the problem with this code is that it is extremely wasteful.
The main questions of coding theory:
We now proceed to the basic definitions of codes.
Definition 1.1 Let A = {a_1, ..., a_q} be an alphabet; we call the a_i values symbols. A block code C of length n over A is a subset of A^n. A vector c ∈ C is called a codeword. The number of elements in C, denoted |C|, is called the size of the code. A code of length n and size M is called an (n, M)-code. A code over A = {0, 1} is called a binary code and a code over A = {0, 1, 2} is called a ternary code.
Remark 1.2 We will almost exclusively talk about “sending a codeword c” and then finding the codeword c that was originally sent given a vector x obtained by introducing errors into c. This may seem strange at first since we ignore the problem of mapping a message m into a codeword c and then finding m again from c. As we will see later, this is typically not a problem (especially for linear codes) and thus the mapping of original messages to codewords and back is not a concern.
The rate of a code is a measure of its efficiency. Formally:
Definition 1.3 Let C be an (n, M)-code over an alphabet of size q. Then, the rate of C is defined by

rate(C) = (log_q M) / n
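For instance, the binary repetition code of length n has only M = 2 codewords, so its rate is log_2(2)/n = 1/n. A quick illustrative computation (not part of the original notes):

    from math import log

    def rate(n, M, q):
        # rate(C) = log_q(M) / n for an (n, M)-code over an alphabet of size q.
        return log(M, q) / n

    print(rate(5, 2, 2))  # binary repetition code of length 5 -> 0.2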
Restating what we have discussed above, the aim of coding theory is to construct a code with a short n, and large M and d; equivalently, the aim is to construct a code with a rate that is as close to 1 as possible and with d as large as possible. We now show a connection between the distance of a code and the possibility of detecting and correcting errors.
Definition 1.8 Let C be a code of length n over alphabet A.
Theorem 1.
The model that we have presented until now is a "worst-case model". Specifically, what interests us is the number of errors that we can correct, and we are not interested in how or where these errors occur. This is the model that we will refer to in most of this course, and it is the model introduced by Hamming. However, there is another model (introduced by Shannon) which considers probabilistic errors. We will use this model later on (in particular in order to prove Shannon's bounds). In any case, as we will see here, there is a close connection between the two models.
Definition 1.10 A communication channel is comprised of an alphabet A = {a_1, ..., a_q} and a set of forward channel probabilities of the form Pr[a_j received | a_i was sent] such that for every i:

∑_{j=1}^q Pr[a_j received | a_i was sent] = 1
A communication channel is memoryless if for all vectors x = x_1 ... x_n and c = c_1 ... c_n it holds that

Pr[x received | c was sent] = ∏_{i=1}^n Pr[x_i received | c_i was sent]
Note that in a memoryless channel, all errors are independent of each other. This is not a realistic model but is a useful abstraction. We now consider additional simplifications:
Definition 1.11 A symmetric channel is a memoryless communication channel for which there exists a p < 1/2 such that for every i ∈ {1, ..., q}:

∑_{j=1, j≠i}^q Pr[a_j received | a_i was sent] = p
Note that in a symmetric channel, every symbol has the same probability of error. In addition, if a symbol is received with error, then the probability that it is changed to any given symbol is the same as it being changed to any other symbol. A binary symmetric channel has two probabilities:

Pr[1 received | 0 was sent] = Pr[0 received | 1 was sent] = p
Pr[1 received | 1 was sent] = Pr[0 received | 0 was sent] = 1 − p
The probability p is called the crossover probability.
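To make the binary symmetric channel concrete, the following small sketch (illustrative only, not from the notes) simulates a transmission: each bit of the codeword is flipped independently with crossover probability p.

    import random

    def transmit_bsc(codeword, p):
        # Binary symmetric channel: flip each bit independently with probability p.
        return [bit ^ 1 if random.random() < p else bit for bit in codeword]

    # Example: send the all-zero word of length 10 through a BSC with p = 0.1.
    received = transmit_bsc([0] * 10, 0.1)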
Maximum likelihood decoding. In this probabilistic model, the decoding rule is also a probabilistic one:
Definition 1.12 Let C be a code of length n over an alphabet A. The maximum likelihood decoding rule states that every x ∈ A^n is decoded to c_x ∈ C when

Pr[x received | c_x was sent] = max_{c∈C} Pr[x received | c was sent]

If there exists more than one c with this maximum probability, then ⊥ is returned.
We now show a close connection between maximum likelihood decoding in this probabilistic model, and nearest neighbor decoding.
Theorem 1.13 In a binary symmetric channel with p < 1/2, maximum likelihood decoding is equivalent to nearest neighbor decoding.
Proof: Let C be a code and x the received word. Then for every c ∈ C and for every i we have that d(x, c) = i if and only if

Pr[x received | c was sent] = p^i (1 − p)^{n−i}

Since p < 1/2 we have that (1 − p)/p > 1. Thus

p^i (1 − p)^{n−i} = p^{i+1} (1 − p)^{n−i−1} · (1 − p)/p > p^{i+1} (1 − p)^{n−i−1}.

This implies that

p^0 (1 − p)^n > p (1 − p)^{n−1} > ... > p^n (1 − p)^0

and so the nearest neighbor yields the codeword that maximizes the required probability.
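As an illustration of nearest neighbor decoding (equivalently, by Theorem 1.13, maximum likelihood decoding over a binary symmetric channel with p < 1/2), the following sketch decodes a received word by exhaustive search over the code and returns None, playing the role of ⊥, when the closest codeword is not unique. The code and received word in the example are hypothetical.

    def hamming_distance(x, y):
        # Number of coordinates in which x and y differ.
        return sum(1 for a, b in zip(x, y) if a != b)

    def nearest_neighbor_decode(code, received):
        # Return the unique closest codeword to `received`, or None on a tie.
        ranked = sorted(code, key=lambda c: hamming_distance(c, received))
        if len(ranked) > 1 and hamming_distance(ranked[0], received) == hamming_distance(ranked[1], received):
            return None  # more than one codeword attains the minimum distance
        return ranked[0]

    # Example with the length-3 binary repetition code.
    C = [(0, 0, 0), (1, 1, 1)]
    print(nearest_neighbor_decode(C, (0, 1, 0)))  # -> (0, 0, 0)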
The Hamming weight of a word x, denoted wt(x), is the number of coordinates that are not zero. That is, wt(x) := d(x, 0).
Notation. For x = (x_1, ..., x_n) and y = (y_1, ..., y_n), define x ∗ y = (x_1 y_1, ..., x_n y_n).
Proof: Looking at each coordinate separately we have:
wt(x + y)   wt(x)   wt(y)   wt(x ∗ y)
    0         0       0         0
    1         0       1         0
    1         1       0         0
    0         1       1         1
The lemma is obtained by summing over all coordinates.
wt(x) + wt(y) ≥ wt(x + y) ≥ wt(x) − wt(y)
We leave the proof of this lemma as an exercise.
Definition 2.10 Let C be a code (not necessarily linear). The weight of C, denoted wt(C), is defined by
wt(C) = min_{c∈C, c≠0} {wt(c)}
The following theorem only holds for linear codes.

Theorem 2.11 Let C be a linear code. Then wt(C) = d(C).
Proof: Let d = d(C). By the definition of the distance of a code, there exist x′, y′ ∈ C such that d(x′, y′) = d. Then by linearity we have that x′ − y′ ∈ C. Now, the weight of the codeword x′ − y′ is d and so we have found a codeword with weight d, implying that wt(C) ≤ d = d(C). Now, let w = wt(C). By the definition of weight, there exists a codeword c ∈ C such that d(c, 0) = wt(C) = w. Now, since 0 ∈ C it follows that there exist two codewords in C with distance w from each other. Thus, d(C) ≤ w = wt(C). We have shown that wt(C) ≤ d(C) and d(C) ≤ wt(C). Thus, d(C) = wt(C), as required.
The above theorem is interesting. In particular, it gives us the first step forward for determining the distance of a code. Previously, in order to calculate the distance of a code, we would have to look at all pairs of codewords and measure their distance (this is quadratic in the size of the code). Using Theorem 2.11 it suffices to look at each codeword in isolation and measure its weight (this is thus linear in the size of the code).
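The following sketch (an illustration, not from the notes) applies Theorem 2.11 to a binary linear code given by a generator matrix G: it enumerates all 2^k codewords as linear combinations of the rows and returns the minimum weight of a nonzero codeword, which equals d(C).

    from itertools import product

    def min_distance_binary(G):
        # d(C) = minimum weight over all nonzero codewords spanned by the rows of G (over F_2).
        k, n = len(G), len(G[0])
        best = n + 1
        for coeffs in product([0, 1], repeat=k):
            if not any(coeffs):
                continue  # skip the zero codeword
            codeword = [sum(coeffs[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
            best = min(best, sum(codeword))
        return best

    # Example: generator matrix of the length-3 repetition code.
    print(min_distance_binary([[1, 1, 1]]))  # -> 3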
Definition 2.
Remarks:
Definition 2.
Note that the dimensions of X above are k-by-(n − k), and the dimensions of Y are (n − k)-by-k.
if its rows are linearly independent and H · G^T = 0.
c = ∑_{i=1}^k λ_i r_i

(where each r_i ∈ C). This implies that v · G^T = 0, as required. For the other direction, if v · G^T = 0 then for every i it holds that v · r_i = 0. Let c be any codeword and write c = ∑_{i=1}^k λ_i · r_i. It follows that

v · c = ∑_{i=1}^k v · (λ_i · r_i) = ∑_{i=1}^k λ_i · (v · r_i) = ∑_{i=1}^k λ_i · 0 = 0.
This holds for every c ∈ C and thus v ∈ C⊥.
Now let H ∈ F_q^{(n−k)×n}. If H is a parity-check matrix then its rows are linearly independent and in C⊥. Thus, by what we have proven it holds that H · G^T = 0. For the other direction, if H · G^T = 0 then every row v of H satisfies v · G^T = 0 and so every row is in C⊥ (by the first part of the proof). Since the rows of the matrix are linearly independent and since the matrix is of the correct dimension, we conclude that H is a parity check matrix for C, as required.
An equivalent formulation: Lemma 2.14 can be equivalently worded as follows.
Let C be a linear [n, k]-code with a parity-check matrix H. Then v ∈ C if and only if v · H^T = 0.
This equivalent formulation immediately yields an efficient algorithm for error detection.
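A minimal sketch of this error-detection procedure over F_2 (illustrative only; the matrix in the example is the parity-check matrix of the length-3 repetition code): compute v · H^T and flag an error whenever the result is nonzero.

    def syndrome(v, H):
        # Compute v * H^T over F_2; v is a codeword iff the syndrome is all-zero.
        return tuple(sum(v[j] * row[j] for j in range(len(v))) % 2 for row in H)

    def detect_error(v, H):
        # Return True iff v is NOT a codeword (a nonzero syndrome was found).
        return any(syndrome(v, H))

    H = [[1, 1, 0],
         [1, 0, 1]]
    print(detect_error([1, 1, 1], H))  # -> False (a codeword)
    print(detect_error([1, 0, 1], H))  # -> True  (error detected)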
Definition 2.18 Two (n, M )-codes are equivalent if one can be derived from the other by a permutation of the coordinates and multiplication of any specific coordinate by a non-zero scalar.
Note that permuting the coordinates or multiplying by a non-zero scalar makes no difference to the parameters of the code. In this sense, the codes are therefore equivalent.
Theorem 2.19 Every linear code C is equivalent to a linear code C′ with a generator matrix in standard form.
Proof: Let G be a generator matrix for C. Then, using Gaussian elimination, find the reduced row echelon form of G. (In this form, the first nonzero entry of every row is a one, and there are zeroes below and above it.) Given this reduced matrix, we apply a permutation to the columns so that the identity matrix appears in the first k columns. The code generated by the resulting matrix is equivalent to the original one.
First, we remark that it is always possible to work with standard-form matrices. In particular, given a generator matrix G for a linear code, we can efficiently compute a standard-form G′ using Gaussian elimination. We can then compute H′ as we saw previously. Thus, given any generator matrix it is possible to efficiently find its standard-form parity-check matrix (or, more exactly, the standard-form parity-check matrix of an equivalent code). Note that given this parity-check matrix, it is then possible to compute d by looking for the smallest d for which there exist d linearly dependent columns. Unfortunately, we do not have any efficient algorithm for this last task.
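As an illustrative sketch (not from the notes), if G is already in standard form G = (I_k | X), then a parity-check matrix can be read off directly as H = (−X^T | I_{n−k}); over F_2 the minus sign has no effect.

    def parity_check_from_standard_form(G, q):
        # Given G = (I_k | X) over F_q, return H = (-X^T | I_{n-k}).
        k, n = len(G), len(G[0])
        X = [row[k:] for row in G]                                   # the k x (n-k) block X
        H = []
        for r in range(n - k):
            minus_Xt = [(-X[i][r]) % q for i in range(k)]            # row r of -X^T
            identity = [1 if c == r else 0 for c in range(n - k)]    # row r of I_{n-k}
            H.append(minus_Xt + identity)
        return H

    # Example: binary [3, 1] repetition code, G = (1 | 1 1).
    print(parity_check_from_standard_form([[1, 1, 1]], 2))  # -> [[1, 1, 0], [1, 0, 1]]

One can check that H · G^T = 0 holds for the output, as required by the characterization above.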
It follows that ∑_{i=1}^k λ_i v_i ∈ C. Therefore, the mapping

(λ_1, ..., λ_k) → ∑_{i=1}^k λ_i v_i

defines an encoding of messages into codewords. Taking v_1, ..., v_k to be the rows of the generator matrix G, the encoding map can be written as

E_C(λ) = λ · G
Observe that if G is in standard form, then E_C(λ) = λ · (I_k | X) = (λ, λ · X). Thus, it is trivial to map a codeword E_C(λ) back to its original message λ (just take its first k coordinates). Specifically, if we are only interested in error detection, then it is possible to first compute x · H^T; if the result equals 0 then just output the first k coordinates. The above justifies why we are only interested in the following decoding problem:
The problems of encoding an original message into a codeword and retrieving it back from the codeword are trivial (at least for linear codes).
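A minimal sketch of the encoding map E_C(λ) = λ · G and of message recovery for a standard-form generator matrix (the [4, 2] code in the example is hypothetical):

    def encode(message, G, q):
        # E_C(lambda) = lambda * G over F_q.
        n = len(G[0])
        return [sum(message[i] * G[i][j] for i in range(len(G))) % q for j in range(n)]

    def recover_message(codeword, k):
        # If G = (I_k | X) is in standard form, the message is the first k coordinates.
        return codeword[:k]

    # Example: a [4, 2] binary code with G in standard form.
    G = [[1, 0, 1, 1],
         [0, 1, 0, 1]]
    c = encode([1, 1], G, 2)      # -> [1, 1, 1, 0]
    print(recover_message(c, 2))  # -> [1, 1]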
Cosets – background. We recall the concept of cosets from algebra.
The coset of C determined by u is defined to be the set C + u = {c + u | c ∈ C}
Example. Let C = {000, 101, 010, 111} be a binary linear code. Then,

C + 000 = C,  C + 010 = {010, 111, 000, 101} = C,  and  C + 001 = {001, 100, 011, 110}
Proof:
cosets are disjoint. Thus there are q^n / q^k = q^{n−k} different cosets.
Remark. The above theorem shows that the cosets of C constitute a partition of the vector space
to the codeword u − v. The question remaining is which v should be taken?
Definition 2.22 The leader of a coset is defined to be the word with the smallest Hamming weight in the coset.
The above yields a simple algorithm. Let C be a linear code. Assume that the codeword v was sent and the word w was received. The error word is e = w − v ∈ C + w. Therefore, given the vector w, we search for the word of smallest weight in C + w. Stated differently, given w we find the leader e of the coset C + w and output v = w − e ∈ C. The problem with this method is that it requires building and storing an array of all the cosets, which is very expensive.
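A brute-force sketch of this decoding rule for a binary code (illustrative only, and feasible only for tiny codes): compute the coset C + w, pick a word of smallest weight as the leader e, and output w − e (over F_2, subtraction equals addition).

    def coset_leader_decode(code, w):
        # Decode w by finding a minimum-weight word in the coset C + w (binary case).
        coset = [tuple((ci + wi) % 2 for ci, wi in zip(c, w)) for c in code]
        leader = min(coset, key=sum)                               # the coset leader e
        return tuple((wi - ei) % 2 for wi, ei in zip(w, leader))   # v = w - e

    # Example with the length-3 binary repetition code.
    C = [(0, 0, 0), (1, 1, 1)]
    print(coset_leader_decode(C, (0, 1, 0)))  # -> (0, 0, 0)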
Constructing an SDA. Naively, an SDA can be built in time q^n by traversing over all the cosets and computing the leader and its syndrome. A faster procedure for small d is as follows:
For every word e of weight at most ⌊(d − 1)/2⌋, define e to be the leader of a coset (it doesn't matter what the coset is) and store it together with its syndrome.
The complexity of this algorithm is linear in the number of words of weight at most (d − 1)/2. Thus, it takes time and memory

∑_{i=0}^{⌈(d−1)/2⌉} (n choose i) · (q − 1)^i.

This can also be upper bounded by

(n choose ⌈(d−1)/2⌉) · q^{⌈(d−1)/2⌉}
because one writes all possible combinations of symbols in all possible ⌈(d−1)/2⌉ places. Importantly, if d is a constant, then this procedure runs in polynomial time.^1 In order to justify that this suffices, we remark that every coset has at most one leader with weight less than or equal to ⌊(d − 1)/2⌋. (Otherwise we could take the difference of the two vectors, which must be in C since they are in the same coset. However, by their assumed weights, the weight of the difference would be smaller than d, in contradiction to the assumption regarding the distance of C.) Thus, no coset leader could have been missed. On the other hand, if there is a coset with a leader of weight larger than ⌊(d − 1)/2⌋, then that number of errors cannot be corrected anyway, and so there is no reason to store it in the table.
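The construction can be sketched as follows for a binary code (an illustration under the above assumptions; the helper names are not from the notes): enumerate all error patterns e of weight at most ⌊(d − 1)/2⌋, store the pair (syndrome of e, e), and decode a received word w by subtracting the leader stored under the syndrome of w.

    from itertools import combinations

    def syndrome(v, H):
        # v * H^T over F_2.
        return tuple(sum(v[j] * row[j] for j in range(len(v))) % 2 for row in H)

    def build_sda(H, t):
        # Syndrome decoding table: map syndrome(e) -> e for every binary e of weight <= t.
        n = len(H[0])
        table = {}
        for weight in range(t + 1):
            for positions in combinations(range(n), weight):
                e = tuple(1 if j in positions else 0 for j in range(n))
                table.setdefault(syndrome(e, H), e)
        return table

    def decode(w, H, table):
        # Subtract the stored coset leader from w; None means more errors than the table covers.
        e = table.get(syndrome(w, H))
        return None if e is None else tuple((wi - ei) % 2 for wi, ei in zip(w, e))

    # Example: the [3, 1] repetition code (d = 3, so t = 1).
    H = [[1, 1, 0], [1, 0, 1]]
    table = build_sda(H, 1)
    print(decode((0, 1, 0), H, table))  # -> (0, 0, 0)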
A linear code is a vector subspace. Each such code has a generator and parity-check matrix which can be used for encoding messages, computing error detection, and computing the distance of the code. There also exists an algorithm for decoding that is, unfortunately, not polynomial-time in general. However, for d that is constant, it does run in polynomial time. As we proceed in the course, we will see specific linear codes that have efficient decoding procedures.
^1 We remark that this SDA is smaller than the previous one. This is possible because not every coset is necessarily relevant when we consider only (d − 1)/2 errors.
3 Bounds
For an (n, M, d)-code, the larger the value of M the more efficient the code. We now turn to study bounds on the size of M. In this context, a lower bound on M states that it is possible to construct codes that are at least as good as given in the bound, whereas an upper bound states that no code with M this large exists (for some given n and d). Observe that lower bounds and upper bounds here have the reverse meaning from their use in the analysis of algorithms (here a lower bound is "good news" whereas an upper bound is "bad news"). Our aim is to find an optimal balance between parameters. We note that sometimes there are different goals that yield different optimality criteria.
Recall that the rate of the code is defined to be R(C) = (log_q M) / n; for a linear [n, k]-code we can equivalently write R(C) = k/n. We now define a similar notion that combines the distance and length:
The relative distance of C is defined by

δ(C) = (d − 1) / n
We remark that relative distance is often defined as d/n; however taking (d − 1)/n makes some of the calculations simpler.
Examples:
Definition 3.2 Let A be an alphabet of size q > 1 and fix n, d. We define
Aq (n, d) = max{M | there exists an (n, M, d)-code over A}
An (n, M, d)-code for which M = Aq (n, d) is called an optimal code.
Definition 3.3 Let q > 1 be a prime power and fix n, d. We define

B_q(n, d) = max{q^k | there exists a linear [n, k, d]-code over F_q}

A linear [n, k, d]-code for which q^k = B_q(n, d) is called an optimal linear code.
We remark that Aq (n, d) and Bq(n, d) depend only on the size q of the alphabet, and not on the alphabet itself.
Theorem 3.4 Let q ≥ 2 be a prime power. Then, for every n,
Proof:
Motivation for the bound. Let C be a code and draw a sphere of radius d − 1 around every codeword. In an optimal code, it must be the case that the spheres cover all of A^n; otherwise, there exists a word that could be added to the code that would be of distance at least d from all existing codewords. This yields a larger code, in contradiction to the assumed optimality. Thus, the number of codewords in an optimal code is at least the number of spheres of radius d − 1 needed to cover the entire space A^n. Formally:
For every 1 ≤ d ≤ n it holds that

A_q(n, d) ≥ q^n / V_q^n(d − 1)
Proof: Let C = {c_1, ..., c_M} be an optimal (n, M, d)-code over an alphabet of size q. That is, M = A_q(n, d). Since C is optimal, there does not exist any word in A^n of distance at least d from every c_i ∈ C (otherwise, we could add this word to the code without reducing the distance, in contradiction to the optimality of the code). Thus, for every x ∈ A^n there exists at least one c_i ∈ C such that x ∈ S_A(c_i, d − 1). This implies that
A^n ⊆ ⋃_{i=1}^M S_A(c_i, d − 1)

and so

q^n ≤ ∑_{i=1}^M |S_A(c_i, d − 1)| = M · V_q^n(d − 1)

Since C is optimal we have M = A_q(n, d) and hence q^n ≤ A_q(n, d) · V_q^n(d − 1), implying that

A_q(n, d) ≥ q^n / V_q^n(d − 1)
We now prove an upper bound, limiting the maximum possible size of any code. The idea behind the upper bound is that if we place spheres of radius ⌊(d − 1)/2⌋ around every codeword, then the spheres must be disjoint (otherwise there exists a word that is at distance at most ⌊(d − 1)/2⌋ from two codewords and by the triangle inequality there are two codewords at distance at most d − 1 from each other). The bound is thus derived by computing how many disjoint spheres of this size can be "packed" into the space.
it holds that

A_q(n, d) ≤ q^n / V_q^n(⌊(d − 1)/2⌋)
Proof: Let C = {c_1, ..., c_M} be an optimal code with |A| = q, and let e = ⌊(d − 1)/2⌋. Since d(C) = d, the spheres S_A(c_i, e) are all disjoint. Therefore

⋃_{i=1}^M S_A(c_i, e) ⊆ A^n

where the union is a disjoint one. Therefore:

M · V_q^n(⌊(d − 1)/2⌋) ≤ q^n.

Using now the fact that M = A_q(n, d) we conclude that

A_q(n, d) ≤ q^n / V_q^n(⌊(d − 1)/2⌋)
We stress that it is impossible to prove the existence of a code in this way. This is due to the fact that a word that is not in any of the spheres (and so is at distance greater than (d − 1)/2 from all codewords) cannot necessarily be added to the code.
q^n / V_q^n(d − 1) ≤ A_q(n, d) ≤ q^n / V_q^n(⌊(d − 1)/2⌋)
Note that there is a huge gap between these two bounds.
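For concreteness, here is an illustrative numeric sketch (not from the notes). It evaluates V_q^n(r) = ∑_{i=0}^r (n choose i)(q − 1)^i, the sphere-covering lower bound, and the Hamming (sphere-packing) upper bound.

    from math import comb

    def sphere_volume(q, n, r):
        # V_q^n(r): number of words within Hamming distance r of a fixed word in A^n.
        return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

    def sphere_covering_lower_bound(q, n, d):
        # A_q(n, d) >= q^n / V_q^n(d - 1), rounded up since A_q(n, d) is an integer.
        return -(-q ** n // sphere_volume(q, n, d - 1))

    def hamming_upper_bound(q, n, d):
        # A_q(n, d) <= q^n / V_q^n(floor((d - 1) / 2)), rounded down.
        return q ** n // sphere_volume(q, n, (d - 1) // 2)

    # Example: binary codes of length 7 with distance 3.
    print(sphere_covering_lower_bound(2, 7, 3), hamming_upper_bound(2, 7, 3))  # -> 5 16

For n = 7, d = 3, q = 2 the upper bound of 16 codewords is in fact attained by the binary Hamming code Ham(3, 2) discussed below, which is a perfect code.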
We now show that there exist codes that achieve the Hamming (sphere-packing) upper bound. Unfortunately, the codes that we show do not exist for all parameters.
Definition 3.11 A code C over an alphabet of size q with parameters (n, M, d) is called a perfect code if
M = q^n / V_q^n(⌊(d − 1)/2⌋)
We remark that every perfect code is an optimal code, but not necessarily the other way around.
3.4.1 The Binary Hamming Code
Definition 3.12 Let r ≥ 2 and let C be a binary linear code with n = 2^r − 1 whose parity-check matrix H is the r × (2^r − 1) matrix whose columns are all of the nonzero vectors in {0, 1}^r. Then C is called a binary Hamming code of length 2^r − 1, denoted Ham(r, 2).
The above definition does not specify the order of the columns and thus there are many Hamming codes. Before proceeding we remark that the matrix H specified in the definition is "legal" because it contains all of the r vectors of weight 1. Thus, H contains I_r as a submatrix and so its rows are linearly independent (since the columns of I_r are linearly independent, H has rank r).
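A short illustrative sketch (not from the notes) that builds one such parity-check matrix by listing all nonzero binary column vectors of length r, here ordered by their binary representation:

    def hamming_parity_check(r):
        # Parity-check matrix of Ham(r, 2): the columns are all nonzero vectors in {0,1}^r.
        n = 2 ** r - 1
        columns = [[(j >> i) & 1 for i in range(r)] for j in range(1, n + 1)]
        # Transpose the list of columns into an r x n matrix.
        return [[col[i] for col in columns] for i in range(r)]

    # Example: r = 3 gives the parity-check matrix of the [7, 4, 3] Hamming code.
    for row in hamming_parity_check(3):
        print(row)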
Example. We write the parity-check matrix for Ham(r, 2).
Proposition 3.