
Summary of basic probability theory, part 2

D. Joyce, Clark University

Math 218, Mathematical Statistics, Jan 2008

Expectation. The expected value E(X), also called the expectation or mean μX , of a random variable X is defined differently for the discrete and continuous cases. For a discrete random variable, it is a weighted average defined in terms of the probability mass function f as

E(X) = μ_X = ∑_x x f(x).

For a continuous random variable, it is defined in terms of the probability density function f as

E(X) = μ_X = ∫_{−∞}^{∞} x f(x) dx.
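For concreteness, here is a short Python sketch of both definitions; the particular pmf and the exponential density are arbitrary choices used only for illustration.

import numpy as np

# Discrete case: E(X) = sum_x x f(x) for a small made-up pmf.
values = np.array([0, 1, 2, 3])
pmf = np.array([0.1, 0.4, 0.3, 0.2])      # probabilities sum to 1
print(np.sum(values * pmf))               # 0(0.1) + 1(0.4) + 2(0.3) + 3(0.2) = 1.6

# Continuous case: E(X) = integral of x f(x) dx, here for an exponential
# density f(x) = lam e^(-lam x) on [0, infinity), whose mean is 1/lam.
lam = 2.0
x = np.linspace(0, 50, 200001)            # grid wide enough to cover the tail
fx = lam * np.exp(-lam * x)
print(np.trapz(x * fx, x))                # approximately 1/lam = 0.5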

There is a physical interpretation of this mean as a center of gravity.

Expectation is a linear operator. That means that the expectation of a sum or difference is the sum or difference of the expectations,

E(X + Y) = E(X) + E(Y),

and that’s true whether or not X and Y are independent, and also

E(cX) = c E(X)

where c is any constant. From these two properties it follows that

E(X − Y) = E(X) − E(Y),

and, more generally, expectation preserves linear combinations

E(∑_{i=1}^{n} c_i X_i) = ∑_{i=1}^{n} c_i E(X_i).
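A quick simulation makes the linearity claims concrete; the following sketch uses an arbitrary pair of deliberately dependent variables, so the agreement is not an artifact of independence.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X and Y are dependent: Y is built from X plus extra noise.
x = rng.normal(loc=1.0, scale=2.0, size=n)
y = 3.0 * x + rng.normal(loc=0.5, scale=1.0, size=n)

# E(X + Y) = E(X) + E(Y), even though X and Y are dependent.
print(np.mean(x + y), np.mean(x) + np.mean(y))

# E(2X - 5Y) = 2 E(X) - 5 E(Y): expectation preserves linear combinations.
print(np.mean(2 * x - 5 * y), 2 * np.mean(x) - 5 * np.mean(y))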

Furthermore, when X and Y are independent, then E(XY) = E(X) E(Y), but that equation doesn’t usually hold when X and Y are not independent.

Variance and standard deviation. The variance of a random variable X is defined as

Var(X) = σ_X^2 = E((X − μ_X)^2) = E(X^2) − μ_X^2

where the last equality follows by expanding the square and applying linearity of expectation. Standard deviation, σ, is defined as the square root of the variance.

Here are a couple of properties of variance. First, if you multiply a random variable X by a constant c to get cX, the variance changes by a factor of the square of c, that is

Var(cX) = c^2 Var(X).

That’s the main reason why we take the square root of variance to normalize it: the standard deviation of cX is |c| times the standard deviation of X. Also, variance is translation invariant, that is, if you add a constant to a random variable, the variance doesn’t change:

Var(X + c) = Var(X).
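Numerically, both properties are easy to check on any sample; the gamma sample and the constants below are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=500_000)   # any non-trivial sample works
c = 4.0

print(np.var(c * x), c**2 * np.var(x))   # Var(cX) = c^2 Var(X)
print(np.var(x + 7.5), np.var(x))        # Var(X + c) = Var(X), translation invariance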

In general, the variance of the sum of two random variables is not the sum of the variances of the two random variables. But it is when the two random variables are independent.

Moments, central moments, skewness, and kurtosis. The kth moment of a random variable X is defined as μ_k = E(X^k). Thus, the mean is the first moment, μ = μ_1, and the variance can

be found from the first and second moments, σ^2 = μ_2 − μ_1^2. The kth central moment is defined as E((X − μ)^k). Thus, the variance is the second central moment. The third central moment of the standardized random variable X^* = (X − μ)/σ,

β_3 = E((X^*)^3) = E((X − μ)^3) / σ^3

is called the skewness of X. A distribution that’s symmetric about its mean has 0 skewness. (In fact, all the odd central moments are 0 for a symmetric distribution.) But if it has a long tail to the right and a short one to the left, then it has positive skewness, and negative skewness in the opposite situation. The fourth central moment of X^*,

β_4 = E((X^*)^4) = E((X − μ)^4) / σ^4

is called kurtosis. A fairly flat distribution with long tails has a high kurtosis, while a short-tailed distribution has a low kurtosis. A bimodal distribution has a very low kurtosis. A normal distribution has a kurtosis of 3. (The word kurtosis was coined in the early 20th century from the Greek word for curvature.)
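Sample skewness and kurtosis can be computed directly from these definitions; in the sketch below, the normal and exponential samples are arbitrary test cases chosen because their skewness and kurtosis are known (0 and 3 for the normal, 2 and 9 for the exponential).

import numpy as np

def skewness(sample):
    # beta_3 = E((X*)^3), the third moment of the standardized variable
    z = (sample - sample.mean()) / sample.std()
    return np.mean(z ** 3)

def kurtosis(sample):
    # beta_4 = E((X*)^4), the fourth moment of the standardized variable
    z = (sample - sample.mean()) / sample.std()
    return np.mean(z ** 4)

rng = np.random.default_rng(2)
normal_sample = rng.normal(size=1_000_000)
expo_sample = rng.exponential(size=1_000_000)

print(skewness(normal_sample), kurtosis(normal_sample))   # roughly 0 and 3
print(skewness(expo_sample), kurtosis(expo_sample))       # roughly 2 and 9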

The moment generating function. There is a clever way of organizing all the moments into one mathematical object, and that object is called the moment generating function. It’s a function m(t) of a new variable t defined by

m(t) = E(e^{tX}).

Since the exponential function e^t has the power series

e^t = ∑_{k=0}^{∞} t^k/k! = 1 + t + t^2/2! + ⋯ + t^k/k! + ⋯,

we can rewrite m(t) as follows

m(t) = E(e^{tX}) = 1 + μ_1 t + (μ_2/2!) t^2 + ⋯ + (μ_k/k!) t^k + ⋯.

That implies that m^{(k)}(0), the kth derivative of m(t) evaluated at t = 0, equals the kth moment μ_k of X. In other words, the moment generating function generates the moments of X by differentiation.

For discrete distributions, we can also compute the moment generating function directly in terms of the probability mass function f(x) = P(X = x) as

m(t) = E(e^{tX}) = ∑_x e^{tx} f(x).
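Both facts can be checked symbolically for a simple discrete distribution; the sketch below uses the sympy library and a fair six-sided die as an arbitrary example, building m(t) from the sum above and recovering the first two moments by differentiating at t = 0.

import sympy as sp

t = sp.symbols('t')

# Fair six-sided die: f(x) = 1/6 for x = 1, ..., 6.
m = sum(sp.Rational(1, 6) * sp.exp(t * x) for x in range(1, 7))

mu1 = sp.diff(m, t, 1).subs(t, 0)        # first moment (mean) = 7/2
mu2 = sp.diff(m, t, 2).subs(t, 0)        # second moment = 91/6
variance = sp.simplify(mu2 - mu1 ** 2)   # mu_2 - mu_1^2 = 35/12

print(mu1, mu2, variance)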

For continuous distributions, the moment generating function can be expressed in terms of the probability density function f as

m(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx.
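For a continuous check, the integral can be evaluated numerically; the exponential density below is an arbitrary example whose moment generating function is known in closed form to be lam/(lam − t) for t < lam. The sketch uses scipy for the integration.

import numpy as np
from scipy.integrate import quad

lam = 3.0   # rate of an exponential density f(x) = lam e^(-lam x), x >= 0

def mgf(t):
    # m(t) = integral over the support of e^(tx) f(x) dx
    integrand = lambda x: np.exp(t * x) * lam * np.exp(-lam * x)
    value, _ = quad(integrand, 0, np.inf)
    return value

for t in [0.0, 0.5, 1.0, 2.0]:
    print(t, mgf(t), lam / (lam - t))   # numerical integral vs. closed form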

The moment generating function enjoys the following properties.

Translation. If Y = X + a, then

m_Y(t) = e^{at} m_X(t).

Scaling. If Y = bX, then

m_Y(t) = m_X(bt).

Standardizing. From the last two properties, if X^* = (X − μ)/σ is the standardized random variable for X, then

m_{X^*}(t) = e^{−μt/σ} m_X(t/σ).

Convolution. If X and Y are independent variables, and Z = X + Y, then

m_Z(t) = m_X(t) m_Y(t).
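The convolution property can be verified for a small example; here, continuing with sympy, the MGF of the sum of two independent fair dice (computed from the 36 equally likely pairs) is compared against the product of the individual MGFs at a few values of t.

import sympy as sp

t = sp.symbols('t')
faces = range(1, 7)

# MGFs of two independent fair dice (identically distributed).
m_x = sum(sp.Rational(1, 6) * sp.exp(t * x) for x in faces)
m_y = m_x

# MGF of Z = X + Y computed directly from the joint distribution:
# each of the 36 pairs (x, y) has probability 1/36.
m_z = sum(sp.Rational(1, 36) * sp.exp(t * (x + y)) for x in faces for y in faces)

# For independent X and Y, m_Z(t) = m_X(t) m_Y(t).
for val in [0.0, 0.25, 1.0]:
    print(float(m_z.subs(t, val)), float((m_x * m_y).subs(t, val)))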

The primary use of moment generating functions is to develop the theory of probability. For instance, the easiest way to prove the central limit theorem is to use moment generating functions.

The median, quartiles, quantiles, and percentiles. The median of a distribution X, sometimes denoted μ̃, is the value such that P(X ≤ μ̃) = 1/2.
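The sample versions of these quantities are straightforward to compute; the sketch below uses numpy's built-in median, quantile, and percentile functions on an arbitrary exponential sample.

import numpy as np

rng = np.random.default_rng(3)
sample = rng.exponential(scale=2.0, size=100_000)

print(np.median(sample))                        # sample median, near 2 ln 2 for this distribution
print(np.quantile(sample, [0.25, 0.5, 0.75]))   # quartiles
print(np.percentile(sample, 90))                # 90th percentile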