MANJUNATH KRISHNAPUR
Sometimes statistics is described as the art or science of decision making in the face of uncertainty. Here are some examples to illustrate what it means.
Example 1. Recall the apocryphal story of two women who go to King Solomon with a child, each claiming that it is her own daughter. The solution according to the story uses human psychology and is not relevant to recall here. But is this a reasonable question that the king can decide? Daughters resemble mothers to varying degrees, and one cannot be absolutely sure of guessing correctly. On the other hand, by comparing various features of the child with those of the two women, there is certainly a decent chance to guess correctly. If we could always get the right answer, or if we could never get it right, the question would not have been interesting. However, here we have uncertainty, but there is a decent chance of getting the right answer. That makes it interesting - for example, we can have a debate between eyeists and nosists as to whether it is better to compare the eyes or the noses in arriving at a decision.
Example 2. The IISc cricket team meets the Basavanagudi cricket club for a match. Unfortunately, the Basavanagudi team forgot to bring a coin to toss. The IISc captain helpfully offers his coin, but can he be trusted? What if he spent the previous night doctoring the coin so that it falls on one side with probability 3/4 (or some other number)? Instead of cricket, they could spend their time on the more interesting question of checking whether the coin is fair or biased. Here is one way. If the coin is fair, common sense suggests that in a large number of tosses we should get about equal numbers of heads and tails. So they toss the coin 100 times. If the number of heads is exactly 50, perhaps they will agree that it is fair. If the number of heads is 90, perhaps they will agree that it is biased. What if the number of heads is 60? Or 35? Where, and on what basis, do we draw the line between fair and biased? Again we are faced with the question of making a decision in the face of uncertainty.
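Where to draw the line can be guided by computing how much the head count of a genuinely fair coin fluctuates. Here is a minimal Python sketch along these lines (the 100,000 repetitions are an arbitrary choice, and "as extreme as 60 heads" is interpreted two-sidedly, i.e. a deviation of at least 10 from 50):

```python
import random

def head_count(n=100, p=0.5, rng=random):
    # number of heads in n tosses of a coin with P(heads) = p
    return sum(rng.random() < p for _ in range(n))

# Under the fair-coin hypothesis, how often is the head count at least
# as far from 50 as the observed value of 60?
trials = 100_000
extreme = sum(abs(head_count() - 50) >= 10 for _ in range(trials))
print(extreme / trials)
```

The exact binomial computation gives about 0.057 for this event, so 60 heads alone is only mild evidence of bias; 90 heads, by contrast, has vanishingly small probability under fairness.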
Example 3. A psychic claims to have divine visions unavailable to most of us. You are assigned the task of testing her claims. You take a standard deck of cards, shuffle it well and keep it face down on the table. The psychic writes down the list of cards in some order - whatever her vision tells her about how the deck is ordered. Then you count the number of correct guesses. If the number is 1 or 2, perhaps you can dismiss her claims. If it is 45, perhaps you ought to take her seriously. Again, where to draw the line? The logic is this. Roughly one may say that surprise is just the name for our reaction to an event that we a priori thought had low probability. Thus, we approach the experiment with the belief that the psychic is just guessing at random, and if the results are such that under that random-guess hypothesis they have very small probability, then we are willing to discard our preconception and accept that she is a psychic. How low a probability is surprising? In the context of psychics, let us say, 1/10000. Once we fix that, we must find a number m ≤ 52 such that by pure guessing, the probability of getting more than m correct guesses is less than 1/10000. Then we tell the psychic that if she gets more than m correct guesses, we accept her claim, and otherwise, reject her claim. This raises the following simple question (and you can answer it yourself).
Question 4. For a deck of 52 cards, find the number m such that
P(by random guessing we get more than m correct guesses) < 1/10000.
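If the guesses amount to comparing the true order with an independent, uniformly random permutation, the number of correct guesses is the number of fixed points of a uniform random permutation of 52 cards, for which P(exactly k fixed points) = (1/k!) Σ_{j=0}^{52−k} (−1)^j / j!. Under that assumption, a short Python sketch finds the threshold m:

```python
from math import factorial

def p_exactly(k, n=52):
    # P(a uniform random permutation of n cards has exactly k fixed points)
    # = (1/k!) * sum_{j=0}^{n-k} (-1)^j / j!   (k positions match, the rest deranged)
    return sum((-1) ** j / factorial(j) for j in range(n - k + 1)) / factorial(k)

def threshold(alpha=1e-4, n=52):
    # smallest m with P(more than m correct guesses) < alpha
    tail = 1.0
    for m in range(n + 1):
        tail -= p_exactly(m, n)   # tail is now P(X > m)
        if tail < alpha:
            return m

print(threshold())
```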
Summary: There are many situations in real life where one is required to make decisions under uncertainty. A general template for the answer could be to fix a small number that we allow as the probability of error, and deduce thresholds based on it. This brings us to the question of computing probabilities in various situations.
Probability: Probability theory is a branch of pure mathematics, and forms the theoretical basis of statistics. In itself, probability theory has some basic objects and their relations (like real numbers, addition, etc. for analysis) and it makes no pretense of saying anything about the real world. Axioms are given and theorems are then deduced about these objects, just as in any other part of mathematics. But a very important aspect of probability is that it is applicable: there are many situations in which it is reasonable to model the real world by a probability space. In the example above, to compute the probability one must make the assumption that the deck of cards was completely shuffled. In other words, all possible 52! orders of the 52 cards are assumed to be equally likely. Whether this assumption is reasonable or not depends on how well the deck was shuffled, whether the psychic was able to get a peek at the cards, whether some insider is informing the psychic of the cards, etc. All these are non-mathematical questions, and must be decided on other grounds.
However...: Probability and statistics are very relevant in many situations that do not involve any uncertainty on the face of it. Here are some examples.
Example 5. Compression of data. Large files in a computer can be compressed to a .zip format and uncompressed when necessary. How is it possible to compress data like this? To give a very simple analogy, consider a long English word like invertebrate. If we take a novel and replace every occurrence of this word with "zqz", then it is certainly possible to recover the original novel (since "zqz" does not occur anywhere else). But the reduction in size by replacing the 12-letter word by the 3-letter word is not much, since the word invertebrate does not occur often. Instead, if we replace the 4-letter word "then" by "zqz", then the total reduction obtained may be much higher, as the word "then" occurs quite often. This suggests the following optimal way to represent words in English. The 26 most frequent words will be represented by single letters. The next 26 × 26 most frequent words will be represented by two-letter words, the next 26 × 26 × 26 most frequent words by three-letter words, etc.
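As a toy illustration of this scheme (a sketch only; it ignores how such codewords would be parsed unambiguously, which a real compressor must handle), one can rank words by frequency and hand out the shortest codewords first:

```python
from collections import Counter
from itertools import product
import string

def codes_by_frequency(words):
    # give the 26 most frequent words 1-letter codes, the next 26*26
    # two-letter codes, and so on, as in the scheme sketched above
    ranked = [w for w, _ in Counter(words).most_common()]
    codewords = ("".join(letters)
                 for length in range(1, 5)
                 for letters in product(string.ascii_lowercase, repeat=length))
    return dict(zip(ranked, codewords))

text = "then the cat saw the dog then the dog ran then".split()
print(codes_by_frequency(text))  # frequent words get the shortest codes
```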
Example 10. Fix a positive integer n. Let

Ω = {0, 1}^n = {ω : ω = (ω_1, ..., ω_n) with ω_i = 0 or 1 for each i ≤ n}.

Let p_ω = 2^{-n} for each ω ∈ Ω. Since Ω has 2^n elements, it follows that this is a valid assignment of elementary probabilities. There are 2^{#Ω} = 2^{2^n} events. One example is A_k = {ω : ω ∈ Ω and ω_1 + ... + ω_n = k} where k is some fixed integer. In words, A_k consists of those n-tuples of zeros and ones that have a total of k ones. Since there are C(n, k) ways to choose where to place these ones (here C(n, k) denotes the binomial coefficient "n choose k"), we see that #A_k = C(n, k). Consequently,

P{A_k} = Σ_{ω ∈ A_k} p_ω = #A_k / 2^n = C(n, k) 2^{-n} if 0 ≤ k ≤ n, and 0 otherwise.

It will be convenient to adopt the convention that C(a, b) = 0 if b > a or b < 0. Then we can simply write P{A_k} = C(n, k) 2^{-n} without having to split the values of k into cases.
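The formula and the convention are easy to check numerically; in the Python sketch below, math.comb already returns 0 when k > n, so only negative k needs the extra case:

```python
from math import comb

def p_A(k, n):
    # P(A_k) = C(n, k) * 2**(-n); math.comb returns 0 when k > n,
    # and we extend the convention to negative k by hand
    return comb(n, k) / 2 ** n if k >= 0 else 0.0

n = 10
print([p_A(k, n) for k in range(n + 1)])
print(sum(p_A(k, n) for k in range(-2, n + 3)))  # the probabilities sum to 1
```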
Example 11. Fix two positive integers r and m. Let

Ω = {ω : ω = (ω_1, ..., ω_r) with 1 ≤ ω_i ≤ m for each i ≤ r}.

The cardinality of Ω is m^r (since each co-ordinate ω_i can take one of m values). Hence, if we set p_ω = m^{-r} for each ω ∈ Ω, we get a valid probability space. Of course, there are 2^{m^r} events, which is quite large even for small numbers like m = 3 and r = 4. Some interesting events are A = {ω : ω_r = 1}, B = {ω : ω_i ≠ 1 for all i}, C = {ω : ω_i ≠ ω_j if i ≠ j}. The reason why these are interesting will be explained later. Because of equal elementary probabilities, the probability of an event S is just #S/m^r. For example, #B = (m − 1)^r, so that P(B) = (m − 1)^r / m^r = (1 − 1/m)^r.
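For small m and r, such formulas can be verified by brute-force enumeration of Ω; a quick Python sketch with the (arbitrary) choice m = 4, r = 3:

```python
from itertools import product

m, r = 4, 3
omega = list(product(range(1, m + 1), repeat=r))   # all m**r sample points
B = [w for w in omega if all(x != 1 for x in w)]   # no coordinate equals 1
C = [w for w in omega if len(set(w)) == r]         # all coordinates distinct
print(len(B) / len(omega), (1 - 1 / m) ** r)       # both give P(B)
print(len(C) / len(omega))                         # P(C) by direct counting
```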
2.1. Probability in the real world. In real life, there are often situations where there are several possible outcomes but which one will occur is unpredictable in some way. For example, when we toss a coin, we may get heads or tails. In such cases we use words such as probability or chance, event or happening, randomness, etc. What is the relationship between the intuitive and mathematical meanings of these words? In a given physical situation, we choose, out of all possible probability spaces, the one that we think best captures the chance happenings in the situation. The chosen probability space is then called a model or a probability model for the given situation. Once the model has been chosen, calculation of probabilities of events therein is a mathematical problem. Whether the model really captures the given situation, or whether it is inadequate and over-simplified, is a non-mathematical question. Nevertheless it is an important question, and can be answered by observing the real-life situation and comparing the outcomes with predictions made using the model.^2 Now we describe several "random experiments" (a non-mathematical term to indicate a "real-life" phenomenon that is supposed to involve chance happenings) in which the previously given examples of probability spaces arise. Describing the probability space is the first step in any probability problem.
Example 12. Physical situation: Toss a coin. Randomness enters because we believe that the coin may turn up head or tail and that it is inherently unpredictable.
The corresponding probability model: Since there are two outcomes, the sample space Ω = {0, 1} (where we use 1 for heads and 0 for tails) is a clear choice. What about elementary probabilities? Under the equal chance hypothesis, we may take p_0 = p_1 = 1/2. Then we have a probability model for the coin toss. If the coin is not fair, we change the model by keeping Ω = {0, 1} as before but letting p_1 = p and p_0 = 1 − p, where the parameter p ∈ [0, 1] is fixed. Which model is correct? If the coin looks very symmetrical, then the two sides are equally likely to turn up, so the first model where p_1 = p_0 = 1/2 is reasonable. However, if the coin looks irregular, then theoretical considerations are usually inadequate to arrive at the value of p. Experimenting with the coin (by tossing it a large number of times) is the only way. There is always an approximation in going from the real world to a mathematical model. For example, the model above ignores the possibility that the coin can land on its side. If the coin is very thick, then it is closer to a cylinder which can land in three ways, and then we would have to modify the model...
(^2) Roughly speaking we may divide the course into two parts according to these two issues. In the probability part of the course, we shall take many such models for granted and learn how to calculate or approximately calculate probabilities. In the statistics part of the course we shall see some methods by which we can arrive at such models, or test the validity of a proposed model.
A possible probability model: Let there be n genes in each human, and let each gene take two possible values (Mendel's "factors"), which we denote as 0 or 1. Then, let Ω = {0, 1}^n = {x = (x_1, ..., x_n) : x_i = 0 or 1}. In this sense, each human being can be encoded as a vector in {0, 1}^n. To assign probabilities, one must know the parents. Let the two parents have gene sequences a = (a_1, ..., a_n) and b = (b_1, ..., b_n). Then the possible offspring gene sequences are in the set Ω_0 := {x ∈ {0, 1}^n : x_i = a_i or b_i, for each i ≤ n}. Let L := #{i : a_i ≠ b_i}. One possible assignment of probabilities is that each of these offspring is equally likely. In that case we can capture the situation in the following probability models.
(1) Let Ω_0 be the sample space and let p_x = 2^{-L} for each x ∈ Ω_0.
(2) Let Ω be the sample space and let p_x = 2^{-L} if x ∈ Ω_0, and p_x = 0 if x ∉ Ω_0.
The second one has the advantage that if we change the parent pair, we don't have to change the sample space, only the elementary probabilities. What are some interesting events? Hypothetically, the susceptibility to a disease X could be determined by the first ten genes; say the person is likely to get the disease if there are at most four 1s among the first ten. This would correspond to the event A = {x ∈ Ω_0 : x_1 + ... + x_10 ≤ 4}. (Caution: As far as I know, reading the genetic sequence to infer the phenotype is still an impractical task in general.)
Reasonable model? There are many simplifications involved here. Firstly, genes are somewhat ill-defined concepts; better defined are the nucleotides in the DNA (and even then there are two copies of each gene). Secondly, there are many "errors" in real DNA: even the total number of genes can change, there can be big chunks missing, a whole extra chromosome, etc. Thirdly, the assumption that all possible gene sequences in Ω_0 are equally likely is incorrect - if two genes are physically close to each other in a chromosome, then they are likely to both come from the father or both from the mother. Fourthly, if our interest originally was to guess the eventual height of the child or its intelligence, then it is not clear that these are determined by the genes alone (environmental factors such as availability of food also matter). Finally, in the case of the problem that Solomon faced, the information about the genes of the parents was not available, so the model as written would be of no use.
Remark 14. We have discussed at length the reasonability of the model in this example to indicate the enormous effort needed to find a sufficiently accurate but also reasonably simple probability model for a real-world situation. Henceforth, we shall omit such caveats and simply switch back-and-forth between a real-world situation and a reasonable-looking probability model as if there were no difference between the two. However, thinking about the appropriateness of the chosen models is much encouraged.
Example 15. Toss n coins. We saw this before, but assumed that the coins are fair. Now we do not. The sample space is
Ω = {0, 1}^n = {ω = (ω_1, ..., ω_n) : ω_i = 0 or 1 for each i ≤ n}.
Further we assign p_ω = α^(1)_{ω_1} ⋯ α^(n)_{ω_n}. Here α^(j)_0 and α^(j)_1 are supposed to indicate the probabilities that the j-th coin falls tails up or heads up, respectively. Why did we take the product of the α^(j)'s and not some other combination? This is a non-mathematical question about what model is suited for the given real-life example. For now, the only justification is that empirically the above model seems to capture the real-life situation accurately.

In particular, if the n coins are identical, we may write p = α^(j)_1 (for any j) and the elementary probabilities become p_ω = p^{Σ_i ω_i} q^{n − Σ_i ω_i}, where q = 1 − p. Fix 0 ≤ k ≤ n and let B_k = {ω : Σ_{i=1}^n ω_i = k} be the event that we see exactly k heads out of n tosses. Then P(B_k) = C(n, k) p^k q^{n−k}. If A_k is the event that there are at least k heads, then

P(A_k) = Σ_{ℓ=k}^n C(n, ℓ) p^ℓ q^{n−ℓ}.
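The tail sum P(A_k) can be evaluated directly; a small Python sketch, with the numbers of Example 2 plugged in for illustration:

```python
from math import comb

def p_at_least(k, n, p):
    # P(A_k) = sum over l = k..n of C(n, l) p**l (1-p)**(n-l)
    q = 1 - p
    return sum(comb(n, l) * p ** l * q ** (n - l) for l in range(k, n + 1))

print(p_at_least(60, 100, 0.5))   # at least 60 heads from a fair coin
print(p_at_least(60, 100, 0.75))  # the same event for a coin with p = 3/4
```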
Example 16. Toss a coin n times. Again

Ω = {0, 1}^n = {ω = (ω_1, ..., ω_n) : ω_i = 0 or 1 for each i ≤ n},  p_ω = p^{Σ_i ω_i} q^{n − Σ_i ω_i}.
This is the same probability space that we got for the tossing of n identical looking coins. Implicit is the assumption that once a coin is tossed, for the next toss it is as good as a different coin but with the same p. It is possible to imagine a world where coins retain the memory of what happened before (or as explained before, we can make a “coin” that remembers previous tosses!), in which case this would not be a good model for the given situation. We don’t believe that this is the case for coins in our world, and this can be verified empirically.
Example 17. Shuffle a deck of 52 cards. Ω = S_52, the set of all permutations^3 of [52], and p_π = 1/52! for each π ∈ S_52.
Example 18. "Psychic" guesses a deck of cards. The sample space is Ω = S_52 × S_52 and p_{(π,σ)} = 1/(52!)^2 for each pair (π, σ) of permutations. In a pair (π, σ), the permutation π denotes the actual order of the cards in the deck, and σ denotes the order guessed by the psychic.
(^3) We use the notation [n] to denote the set {1, 2, ..., n}. A permutation of [n] is a vector (i_1, i_2, ..., i_n) where i_1, ..., i_n are distinct elements of [n]; in other words, they are 1, 2, ..., n but in some order. Mathematically, we may define a permutation as a bijection π : [n] → [n]. Indeed, for a bijection π, the numbers π(1), ..., π(n) are just 1, 2, ..., n in some order.
probabilities p^MB. Or, to put it another way, pick the balls one by one and assign them randomly to one of the urns. This suggests that p^MB is the "right one". This leaves open the question of whether there is a natural mechanism of assigning balls to urns so that the probabilities p^BE show up. No such mechanism has been found. But this probability space does occur in the physical world. If r photons ("indistinguishable balls") are to occupy m energy levels ("urns"), then empirically it has been verified that the correct probability space is the second one!^4
Example 23. Sampling with replacement from a population. Define Ω = {ω ∈ [N]^k : ω_i ∈ [N] for 1 ≤ i ≤ k} with p_ω = 1/N^k for each ω ∈ Ω. Here [N] is the population (so the size of the population is N) and the size of the sample is k. Often the language used is of a box with N coupons from which k are drawn with replacement.
Example 24. Sampling without replacement from a population. Now we take Ω = {ω ∈ [N]^k : the ω_i are distinct elements of [N]} with p_ω = 1/(N(N − 1)⋯(N − k + 1)) for each ω ∈ Ω.
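In Python, the two sampling schemes of Examples 23 and 24 correspond to two standard library calls; a minimal sketch with the arbitrary choices N = 100 and k = 10:

```python
import random

N, k = 100, 10
population = range(1, N + 1)                     # the population [N]
sample_with = random.choices(population, k=k)    # Example 23: repeats allowed
sample_without = random.sample(population, k=k)  # Example 24: all entries distinct
print(sample_with)
print(sample_without)
```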
Fix m < N and define the random variable X(ω) = Σ_{i=1}^k 1_{ω_i ≤ m}. If the population [N] contains a subset, say [m] (it could be the subset of people having a certain disease), then X(ω) counts the number of people in the sample who have the disease. Using X one can define events such as A_ℓ = {ω : X(ω) = ℓ} for some ℓ ≤ m. If ω ∈ A_ℓ, then ℓ of the ω_i must be in [m] and the rest in [N] \ [m]. Hence

#A_ℓ = C(k, ℓ) m(m − 1)⋯(m − ℓ + 1) (N − m)(N − m − 1)⋯(N − m − (k − ℓ) + 1).

As the probabilities are equal for all sample points, we get

P(A_ℓ) = C(k, ℓ) m(m − 1)⋯(m − ℓ + 1)(N − m)(N − m − 1)⋯(N − m − (k − ℓ) + 1) / (N(N − 1)⋯(N − k + 1)) = C(m, ℓ) C(N − m, k − ℓ) / C(N, k).

This expression arises whenever the population is subdivided into two parts and we count the number of samples that fall in one of the sub-populations.
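The final expression is the hypergeometric probability, and it is straightforward to compute; a small Python sketch with hypothetical parameters:

```python
from math import comb

def p_A(ell, N, m, k):
    # P(A_ell) = C(m, ell) * C(N - m, k - ell) / C(N, k)
    return comb(m, ell) * comb(N - m, k - ell) / comb(N, k)

N, m, k = 1000, 50, 20   # population size, diseased sub-population, sample size
print([p_A(ell, N, m, k) for ell in range(4)])
```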
(^4) The probabilities p^MB and p^BE are called Maxwell-Boltzmann statistics and Bose-Einstein statistics. There is a third kind, called Fermi-Dirac statistics, which is obeyed by electrons. For general m ≥ r, the sample space is Ω^FD = {(ℓ_1, ..., ℓ_m) : ℓ_i = 0 or 1 and ℓ_1 + ... + ℓ_m = r} with equal probabilities for each element. In words, all distinguishable configurations are equally likely, with the added constraint that at most one electron can occupy each energy level.
Example 25. Gibbs measures. Let Ω be a finite set and let H : Ω → R be a function. Fix β ≥ 0. Define Z_β = Σ_ω e^{−βH(ω)} and then set p_ω = (1/Z_β) e^{−βH(ω)}. This is clearly a valid assignment of probabilities. This is a class of examples from statistical physics. In that context, Ω is the set of all possible states of a system and H(ω) is the energy of the state ω. In mechanics a system settles down to the state with the lowest possible energy, but if there are thermal fluctuations (meaning the ambient temperature is not absolute zero), then the system may also be found in other states, with higher energies being less and less likely. In the above assignment, for two states ω and ω′, we see that p_ω / p_{ω′} = e^{β(H(ω′) − H(ω))}, showing that higher energy states are less probable. When β = 0, we get p_ω = 1/|Ω|, the uniform distribution on Ω. In statistical physics, β is equated to 1/(κT) where T is the temperature and κ is Boltzmann's constant. Different physical systems are defined by choosing Ω and H differently. Hence this provides a rich class of examples which are of great importance in probability.
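For a finite Ω the definition translates directly into code; a minimal Python sketch with a made-up three-state system:

```python
import math

def gibbs(H, beta):
    # H maps each state omega to its energy H(omega); returns {state: p_omega}
    Z = sum(math.exp(-beta * h) for h in H.values())  # partition function Z_beta
    return {w: math.exp(-beta * h) / Z for w, h in H.items()}

energies = {"ground": 0.0, "excited": 1.0, "doubly excited": 2.0}
print(gibbs(energies, beta=0.0))  # beta = 0 gives the uniform distribution
print(gibbs(energies, beta=2.0))  # low-energy states dominate
```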
It may seem that probability is trivial, since the only problem is to find the sum of p_ω over the ω belonging to the event of interest. This is far from the case. The following example is an illustration.
Example 26. Percolation. Fix m, n and consider a rectangle in Z^2, R = {(i, j) ∈ Z^2 : 0 ≤ i ≤ n, 0 ≤ j ≤ m}. Draw this on the plane along with the grid lines. We see (m + 1)n horizontal edges and (n + 1)m vertical edges. Let E be the set of N = (m + 1)n + (n + 1)m edges and let Ω be the set of all subsets of E. Then |Ω| = 2^N. Let p_ω = 2^{−N} for each ω ∈ Ω. An interesting event is
A = {ω ∈ Ω : the subset of edges in ω connect the top side of R to the bottom side of R}.
This may be thought of as follows. Imagine that each edge is a pipe through which water can flow. However, each pipe may be blocked or open, and ω is the subset of pipes that are open. Now pour water at the top of the rectangle R. Will water trickle down to the bottom? The answer is yes if and only if ω belongs to A. Finding P(A) is a very difficult problem. When n is large and m = 2n, it is expected that P(A) converges to a specific number, but proving it is an open problem as of today!^5
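Although P(A) is hard to compute exactly, it is easy to estimate by simulation: picking a uniformly random subset ω of E is the same as declaring each edge open independently with probability 1/2. A Monte Carlo sketch (grid size and number of trials are arbitrary choices):

```python
import random
from collections import deque

def percolates(n, m, p=0.5):
    # vertices (i, j) with 0 <= i <= n, 0 <= j <= m; every grid edge is open
    # independently with probability p (p = 1/2 matches uniform subsets of E)
    open_edge = {}
    def is_open(u, v):
        e = (u, v) if u < v else (v, u)     # canonical form of the edge
        if e not in open_edge:
            open_edge[e] = random.random() < p
        return open_edge[e]
    # breadth-first search from the top side (j = m) through open edges
    queue = deque((i, m) for i in range(n + 1))
    seen = set(queue)
    while queue:
        i, j = queue.popleft()
        if j == 0:
            return True                     # water reached the bottom side
        for v in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= v[0] <= n and 0 <= v[1] <= m and v not in seen and is_open((i, j), v):
                seen.add(v)
                queue.append(v)
    return False

trials = 2000
hits = sum(percolates(20, 40) for _ in range(trials))
print(hits / trials)   # Monte Carlo estimate of P(A) for n = 20, m = 2n
```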
We now give two non-examples.
Example 27. Pick a natural number uniformly at random. The sample space is clearly Ω = N = {1, 2, 3, ...}. The phrase "uniformly at random" suggests that the elementary probabilities should be the same for all elements, that is, p_i = p for all i ∈ N, for some p. If p = 0, then Σ_{i∈N} p_i = 0, while if p > 0, then Σ_{i∈N} p_i = +∞. In either case the elementary probabilities do not add up to 1, so no such assignment exists.
(^5) In a very similar problem on a triangular lattice, the analogous statement was proved by Stanislav Smirnov (2001), for which he won a Fields Medal. Proof that computing probabilities is not always trivial!
Definition 30. A set Ω is said to be finite if there is an n ∈ N and a bijection from Ω onto [n]. An infinite set Ω is said to be countable if there is a bijection from N onto Ω.
Generally, the word countable also includes finite sets. If Ω is an infinite countable set, then using any bijection f : N → Ω, we can list the elements of Ω as a sequence
f (1), f (2), f (3)...
so that each element of Ω occurs exactly once in the sequence. Conversely, if you can write the elements of Ω as a sequence in which each element occurs exactly once, this defines a bijection from N onto Ω (send 1 to the first element of the sequence, 2 to the second element, etc.).
Example 31. The set of integers Z is countable. Define f : N → Z by

f(n) = n/2 if n is even, and f(n) = −(n − 1)/2 if n is odd.

It is clear that f maps N into Z. Check that it is one-one and onto. Thus, we have found a bijection from N onto Z, which shows that Z is countable. This function is a formal way of saying that we can list the elements of Z as
0 , +1, − 1 , +2, − 2 , +3, − 3 ,....
It is obvious, but good to realize, that there are wrong ways to try writing such a list. For example, if you list all the negative integers first, as −1, −2, −3, ..., then you will never arrive at 0 or 1, and hence the list is incomplete!
Example 32. The set N × N is countable. Rather than give a formula, we list the elements of N × N as follows.
(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), (1, 4), (2, 3), (3, 2), (4, 1),...
The pattern should be clear. Use this list to define a bijection from N onto N × N and hence show that N × N is countable.
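As a hint: the list walks along the anti-diagonals {(a, b) : a + b = s} for s = 2, 3, 4, ...; the following Python generator produces exactly this listing:

```python
from itertools import islice

def diagonal_pairs():
    # lists N x N along anti-diagonals: (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ...
    s = 2   # s = a + b, the sum of the two coordinates
    while True:
        for a in range(1, s):
            yield (a, s - a)
        s += 1

print(list(islice(diagonal_pairs(), 10)))
```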
Example 33. The set Z × Z is countable. This follows from the first two examples. Indeed, we have a bijection f : N → Z and a bijection g : N → N × N. Define F : N × N → Z × Z by F(n, m) = (f(n), f(m)); then F is one-one and onto. Composing, F ∘ g is a bijection from N onto Z × Z. This shows that Z × Z is indeed countable.
Example 34. The set of rational numbers Q is countable. Recall that rational numbers other than 0 can be written uniquely in the form p/q where p is a non-zero integer and q is a strictly positive
integer, and there are no common factors of p and q (this is called the lowest form of the rational number r). Consider the map f : Q → Z × Z defined by
f(r) = (0, 1) if r = 0, and f(r) = (p, q) if r = p/q in the lowest form.
Clearly, f is injective and hence, it appears that Z × Z is a "bigger set" than Q. Next define the function g : Z → Q by setting g(n) = n. This is also injective, and hence we may say that "Q is a bigger set than Z". But we have already seen that N, Z and Z × Z are in bijection with each other; in that sense, they are of equal size. Since Q is sandwiched between the two, it ought to be true that Q has the same size as N, and is thus countable. This reasoning is not incorrect, but an argument is needed to make it an honest proof. This is indicated in the Schröder-Bernstein theorem stated later. Use that to fill the gap in the above argument, or alternatively, try to directly find a bijection between Q and N.
Example 35. The set of real numbers R is not countable. The extraordinary proof of this fact is due to Cantor, and its core idea, called the diagonalization trick, can be used in many other contexts. Consider any function f : N → [0, 1]. We show that it is not onto, and hence not a bijection. Indeed, use the decimal expansion to write a number x ∈ [0, 1] as 0.x_1 x_2 x_3 ... where x_i ∈ {0, 1, ..., 9}. Write the decimal expansion for each of the numbers f(1), f(2), f(3), ... as follows.
f(1) = 0.X_{1,1} X_{1,2} X_{1,3} ...
f(2) = 0.X_{2,1} X_{2,2} X_{2,3} ...
f(3) = 0.X_{3,1} X_{3,2} X_{3,3} ...
...
Let Y_1, Y_2, Y_3, ... be any numbers in {0, 1, ..., 9} with the only condition that Y_i ≠ X_{i,i}. Clearly it is possible to choose Y_i like this (to avoid difficulties caused by non-uniqueness of decimal expansions, such as 0.1000... = 0.0999..., one may additionally require Y_i ∉ {0, 9}). Now consider the number y = 0.Y_1 Y_2 Y_3 ..., which is a number in [0, 1]. However, it does not occur in the above list. Indeed, y disagrees with f(1) in the first decimal place, disagrees with f(2) in the second decimal place, etc. Thus, y ≠ f(i) for any i ∈ N, which means that f is not onto [0, 1]. Thus, no function f : N → [0, 1] is onto, and hence there is no bijection from N onto [0, 1], so [0, 1] is not countable. Obviously, if there is no function from N onto [0, 1], there cannot be a function from N onto R. Thus, R is also uncountable.
Is it necessarily true that L = L′?
Case I - Non-negative f: We claim that for any two bijections ϕ and ψ as above, the limits are the same (this means that the limits are +∞ in both cases, or the same finite number in both cases). Indeed, fix any n and recall that x_n = f(ω_1) + ... + f(ω_n). Now, ψ is surjective, hence there is some m (possibly very large) such that {ω_1, ..., ω_n} ⊆ {ω′_1, ..., ω′_m}. Now, we use the non-negativity of f to observe that

f(ω_1) + ... + f(ω_n) ≤ f(ω′_1) + ... + f(ω′_m).

This is the same as x_n ≤ y_m. Since the y_k are non-decreasing, it follows that x_n ≤ y_m ≤ y_{m+1} ≤ y_{m+2} ≤ ..., which implies that x_n ≤ L′. Now let n → ∞ and conclude that L ≤ L′. Repeat the argument with the roles of ϕ and ψ reversed to conclude that L′ ≤ L. Hence L = L′, as desired. In conclusion, for non-negative functions f, we can assign an unambiguous meaning to Σ_ω f(ω) by setting it equal to lim_{n→∞} (f(ϕ(1)) + ... + f(ϕ(n))), where ϕ : N → Ω is any bijection (the point being that the limit does not depend on the bijection chosen), and the limit here may be allowed to be +∞ (in which case we say that the sum does not converge).
Case II - General f : Ω → R: The above argument fails if f is allowed to take both positive and negative values (why?). In fact, the answers L and L′ from different bijections may be completely different. An example is given later to illustrate this point. For now, here is how we deal with the problem. For a real number x we introduce the notations x_+ = x ∨ 0 and x_− = (−x) ∨ 0. Then x = x_+ − x_− while |x| = x_+ + x_−. Define the non-negative functions f_+, f_− : Ω → R_+ by f_+(ω) = (f(ω))_+ and f_−(ω) = (f(ω))_−. Observe that f_+(ω) − f_−(ω) = f(ω) while f_+(ω) + f_−(ω) = |f(ω)|, for all ω ∈ Ω.
Example 38. Let Ω = {a, b, c, d} and let f(a) = 1, f(b) = −1, f(c) = −3, f(d) = −0.3. Then f_+(a) = 1 and f_+(b) = f_+(c) = f_+(d) = 0, while f_−(a) = 0, f_−(b) = 1, f_−(c) = 3, f_−(d) = 0.3.
Since f_+ and f_− are non-negative functions, we know how to define their sums. Let S_+ = Σ_ω f_+(ω) and S_− = Σ_ω f_−(ω). Recall that one or both of S_+, S_− could be equal to +∞, in which case we say that Σ_ω f(ω) does not converge absolutely and do not assign it any value. If both S_+ and S_− are finite, then we define Σ_ω f(ω) = S_+ − S_−, and in this case we say that Σ f converges absolutely. This completes our definition of absolutely convergent sums. A few exercises show that when working with absolutely convergent sums, the usual rules of addition remain valid. For example, we can add the numbers in any order.
Exercise 39. Show that Σ_ω f(ω) converges absolutely if and only if Σ_ω |f(ω)| is finite (since |f| is a non-negative function, this latter sum is always defined, and may equal +∞).
For non-negative f , we can find the sum by using any particular bijection and then taking limits of partial sums. What about general f?
Exercise 40. Let f : Ω → R. Suppose Σ_{ω∈Ω} f(ω) converges absolutely, and let the sum be S. Then, for any bijection ϕ : N → Ω, we have lim_{n→∞} (f(ϕ(1)) + ... + f(ϕ(n))) = S. Conversely, if lim_{n→∞} (f(ϕ(1)) + ... + f(ϕ(n))) exists and is the same finite number for every bijection ϕ : N → Ω, then f must be absolutely summable and Σ_{ω∈Ω} f(ω) is equal to this common limit.
The usual properties of summation, without which life would not be worth living, are still valid.
Exercise 41. Let f, g : Ω → R and a, b ∈ R. If Σ f and Σ g converge absolutely, then Σ (af + bg) converges absolutely and Σ (af + bg) = a Σ f + b Σ g. Further, if f(ω) ≤ g(ω) for all ω ∈ Ω, then Σ f ≤ Σ g.
Example 42. This example will illustrate why we refuse to assign a value to Σ_ω f(ω) in some cases. Let Ω = Z and define f(0) = 0 and f(n) = 1/n for n ≠ 0. At first one may like to say that Σ_{n∈Z} f(n) = 0, since we can cancel f(n) and f(−n) for each n. However, following our definitions,

f_+(n) = 1/n if n ≥ 1, and f_+(n) = 0 if n ≤ 0,
f_−(n) = −1/n if n ≤ −1, and f_−(n) = 0 if n ≥ 0.

Hence S_+ and S_− are both +∞, which means our definition does not assign any value to the sum Σ_ω f(ω). Indeed, by ordering the numbers appropriately, we can get any value we like! For example, here is how to get 10. We know that 1 + 1/2 + ... + 1/n grows without bound. Just keep adding these positive numbers till the sum exceeds 10 for the first time. Then start adding the negative numbers −1 − 1/2 − ... − 1/m till the sum comes below 10. Then add the positive numbers 1/(n+1) + 1/(n+2) + ... + 1/n′ till the sum exceeds 10 again, and then negative numbers till the sum falls below 10 again, etc. Using the fact that the individual terms of the series go to zero, it is easy to see that the partial sums converge to 10. There is nothing special about 10; we can get any number we want!
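The greedy rearrangement described above is easy to carry out numerically; a short Python sketch (the 100,000 steps are an arbitrary choice):

```python
def rearranged_partial_sum(target=10.0, steps=100_000):
    # add unused terms 1, 1/2, 1/3, ... while the sum is below `target`,
    # and unused terms -1, -1/2, -1/3, ... while it is above, as in Example 42
    pos = neg = 1   # index of the next unused positive / negative term
    s = 0.0
    for _ in range(steps):
        if s <= target:
            s += 1.0 / pos
            pos += 1
        else:
            s -= 1.0 / neg
            neg += 1
    return s

print(rearranged_partial_sum())   # the partial sums settle near 10
```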
One last remark on why we assumed Ω to be countable.
Remark 43. What if Ω is uncountable? Take any f : Ω → R_+. Define the sets A_n = {ω : f(ω) ≥ 1/n}. If A_n has infinitely many elements for some n, then clearly the only reasonable value that we can assign to Σ f(ω) is +∞ (since the sum over elements of A_n alone is larger than any finite number). Therefore, for Σ f(ω) to be a finite number, it is essential that A_n be a finite set for each n.