Basic Probability Cheat Sheet

September 20, 2018

1 Probability and Expectation

1.1 Bayes Rule

Bayes' rule:

$$p(\theta \mid X) = \frac{p(X \mid \theta)\,p(\theta)}{\int p(X \mid \theta)\,p(\theta)\,d\theta} = \frac{p(X \mid \theta)\,p(\theta)}{p(X)}$$

If X represents data and θ is an unknown quantity of interest, Bayes' rule can be interpreted as making inference about θ based on the data X (Bayesian inference), in the form of the posterior distribution p(θ|X).
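As a concrete illustration (a sketch that is not part of the original sheet; the Bernoulli coin model, the uniform prior, and the data below are assumptions chosen purely for the example), the posterior can be approximated on a grid by evaluating the numerator and normalizing by a numerical integral:

```python
import numpy as np

# Hypothetical toy example: infer a coin's heads-probability theta
# from flips X, with an assumed uniform prior on (0, 1).
theta = np.linspace(1e-3, 1.0 - 1e-3, 1000)   # grid over the parameter
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)                   # p(theta): uniform

X = np.array([1, 0, 1, 1, 0, 1, 1, 1])        # assumed data: 6 heads, 2 tails
heads, n = int(X.sum()), len(X)
likelihood = theta**heads * (1.0 - theta)**(n - heads)   # p(X | theta)

unnorm = likelihood * prior                   # numerator p(X | theta) p(theta)
evidence = unnorm.sum() * dtheta              # ~ integral p(X|theta) p(theta) dtheta
posterior = unnorm / evidence                 # p(theta | X)

post_mean = (theta * posterior).sum() * dtheta
print(f"posterior mean of theta: {post_mean:.3f}")   # ~ 0.7 for this data
```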

Remark. In the machine learning course, you will encounter the words 'learning' and 'inference'. From a Bayesian point of view, there is no difference between the two (because everything is expressed by posteriors). But machine learning people tend to use 'learning' to mean tuning the parameters of a model using data, and 'inference' to mean computing some quantity with the model (which sometimes includes evaluating a posterior distribution). This distinction is not exhaustive but may be good to know to avoid confusion.

1.2 Some Useful Formulas of Conditional Expectations

• $\mathbb{E}[X] = \mathbb{E}_Y\big[\,\mathbb{E}_{X|Y}[X \mid Y]\,\big]$

• $\mathrm{Var}[X] = \mathrm{Var}_Y\big[\,\mathbb{E}_{X|Y}[X \mid Y]\,\big] + \mathbb{E}_Y\big[\,\mathrm{Var}_{X|Y}[X \mid Y]\,\big]$
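These are the laws of total expectation and total variance. As a quick sanity check (a Monte Carlo sketch, not from the original sheet, assuming a toy model Y ~ Poisson(5) with X | Y ~ N(Y, 2²) chosen purely for illustration):

```python
import numpy as np

# Toy model: Y ~ Poisson(5), and X given Y is Normal(mean=Y, std=2).
rng = np.random.default_rng(0)
Y = rng.poisson(lam=5.0, size=1_000_000)
X = rng.normal(loc=Y, scale=2.0)

# Direct estimates of E[X] and Var[X].
print(X.mean(), X.var())

# Via conditioning: E[X|Y] = Y and Var[X|Y] = 4 for this model, so
# E[X] = E[Y] and Var[X] = Var[Y] + E[4] = Var[Y] + 4.
print(Y.mean(), Y.var() + 4.0)
```

Both printed pairs should agree to within Monte Carlo error (here E[X] = 5 and Var[X] = 5 + 4 = 9).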

2 Asymptotic Theory

Theorem (The Law of Large Numbers). Let $X_1, X_2, \ldots$ be independent identically distributed (i.i.d.) real random variables. Let $S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\mu = \mathbb{E}X_1$. If $\mathbb{E}|X_1| < \infty$, then $S_n \to \mu$ as $n \to \infty$.¹

¹ To be precise, we need to define convergence of random variables.
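A minimal simulation sketch of the theorem (not from the original sheet), assuming Exponential(1) samples, so $\mu = 1$, purely for illustration:

```python
import numpy as np

# Draw i.i.d. Exponential(1) samples (mu = 1) and watch the running
# mean S_n approach mu as n grows.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)  # S_n for each n

for n in (10, 100, 10_000, 100_000):
    print(f"n={n:>7}  S_n={running_mean[n - 1]:.4f}  (mu = 1)")
```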

Theorem (The Central Limit Theorem). Let $X_1, X_2, \ldots$ be as above. Let $\sigma^2 = \mathrm{Var}[X_1]$. Under the (stronger) assumption $\mathbb{E}X_1^2 < \infty$, the probability distribution of $\sqrt{n}\,(S_n - \mu)/\sigma$ converges to the standard normal distribution $\mathcal{N}(0, 1)$.²
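A quick simulation sketch (again an illustration, with Uniform(0, 1) samples assumed, so $\mu = 1/2$ and $\sigma^2 = 1/12$); the standardized error should look standard normal:

```python
import numpy as np

# X_i ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12. For each of many trials,
# form S_n and the standardized error sqrt(n) * (S_n - mu) / sigma.
rng = np.random.default_rng(2)
n, trials = 1_000, 10_000
S_n = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)

mu, sigma = 0.5, np.sqrt(1.0 / 12.0)
Z = np.sqrt(n) * (S_n - mu) / sigma

# Mean ~ 0, variance ~ 1, and ~95% of mass within 1.96 standard deviations.
print(Z.mean(), Z.var(), np.mean(np.abs(Z) < 1.96))
```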

3 Miscellaneous

• Linearity. Let $X$ obey a multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$. Then $AX \sim \mathcal{N}(A\mu,\, A\Sigma A^\top)$, where $A$ is a matrix of appropriate shape.
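A short empirical check of this property (a sketch; the values of $\mu$, $\Sigma$, and $A$ below are arbitrary choices for the example):

```python
import numpy as np

# Empirically check AX ~ N(A mu, A Sigma A^T) for assumed example values.
rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [2.0, -1.0],
              [0.0, 3.0]])          # any matrix of appropriate shape

X = rng.multivariate_normal(mu, Sigma, size=200_000)
AX = X @ A.T                        # each row is A applied to one sample

print(AX.mean(axis=0))              # should approach A @ mu
print(A @ mu)
print(np.cov(AX, rowvar=False))     # should approach A @ Sigma @ A.T
print(A @ Sigma @ A.T)
```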

• Product of normal densities. Let $\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\big(-(x - \mu)^2/(2\sigma^2)\big)$; then

$$\mathcal{N}(x; \mu_1, \sigma_1^2)\,\mathcal{N}(x; \mu_2, \sigma_2^2) = \mathcal{N}(\mu_1; \mu_2,\, \sigma_1^2 + \sigma_2^2)\;\mathcal{N}\!\left(x;\; \frac{\mu_1/\sigma_1^2 + \mu_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2},\; \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right)$$
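The identity can be checked numerically pointwise in $x$; in this sketch the means and variances are arbitrary example values:

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Density N(x; mu, sigma^2) as defined above (var = sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Assumed example parameters; check the identity at several points x.
mu1, v1 = 0.5, 2.0
mu2, v2 = -1.0, 0.5

x = np.linspace(-3.0, 3.0, 7)
lhs = normal_pdf(x, mu1, v1) * normal_pdf(x, mu2, v2)

post_mean = (mu1 / v1 + mu2 / v2) / (1.0 / v1 + 1.0 / v2)
post_var = v1 * v2 / (v1 + v2)
rhs = normal_pdf(mu1, mu2, v1 + v2) * normal_pdf(x, post_mean, post_var)

print(np.allclose(lhs, rhs))  # True: the identity holds at every x
```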

² Assume $\sigma = 1$ for simplicity. In contrast with the law of large numbers, what the central limit theorem says is that if you multiply the error of the estimate of the mean, $S_n - \mu$, by $\sqrt{n}$, the distribution of the amplified error $\sqrt{n}\,(S_n - \mu)$ is approximately Gaussian $\mathcal{N}(0, 1)$ for sufficiently large $n$. If you do not amplify, the error converges to a point (zero) as its variance tends to 0, which agrees with the law of large numbers.