

Expectation. The expected value E(X), also called the expectation or mean μ_X, of a random variable X is defined differently for the discrete and continuous cases. For a discrete random variable, it is a weighted average defined in terms of the probability mass function f as
E(X) = μ_X = ∑_x x f(x).
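For example, if X is the number showing on a fair six-sided die, then f(x) = 1/6 for x = 1, 2, ..., 6, and

E(X) = (1 + 2 + 3 + 4 + 5 + 6) · 1/6 = 7/2 = 3.5.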
For a continuous random variable, it is defined in terms of the probability density function f as
E(X) = μ_X = ∫_{−∞}^{∞} x f(x) dx.
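For instance, if X is uniformly distributed on the interval [0, 1], so that f(x) = 1 there and 0 elsewhere, then

E(X) = ∫_0^1 x dx = 1/2,

the midpoint of the interval.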
There is a physical interpretation in which this mean is interpreted as a center of gravity. Expectation is a linear operator. That means that the expectation of a sum is the sum of the expectations,

E(X + Y) = E(X) + E(Y),

and that's true whether or not X and Y are independent, and also
E(cX) = c E(X)
where c is any constant. From these two properties it follows that
E(X − Y ) = E(X) − E(Y ),
and, more generally, expectation preserves linear combinations
E(∑_{i=1}^{n} c_i X_i) = ∑_{i=1}^{n} c_i E(X_i).
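For example, if X and Y are the numbers showing on two fair dice (independent or not), then E(2X + 3Y) = 2E(X) + 3E(Y) = 2 · 7/2 + 3 · 7/2 = 35/2.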
Furthermore, when X and Y are independent, then E(XY) = E(X) E(Y), but that equation doesn't usually hold when X and Y are not independent.

Variance and standard deviation. The variance of a random variable X is defined as
Var(X) = σ_X^2 = E((X − μ_X)^2) = E(X^2) − μ_X^2
where the last equality follows by expanding the square and using the linearity of expectation. The standard deviation, σ, is defined as the square root of the variance. Here are a couple of properties of variance. First, if you multiply a random variable X by a constant c to get cX, the variance changes by a factor of the square of c, that is
Var(cX) = c^2 Var(X).
That's the main reason why we take the square root of variance to normalize it: the standard deviation of cX is |c| times the standard deviation of X. Also, variance is translation invariant, that is, if you add a constant to a random variable, the variance doesn't change:
Var(X + c) = Var(X).
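Putting these two rules together, Var(aX + b) = a^2 Var(X) for any constants a and b; for instance, Var(2X + 3) = 4 Var(X).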
In general, the variance of the sum of two random variables is not the sum of the variances of the two random variables. But it is when the two random variables are independent.

Moments, central moments, skewness, and kurtosis. The kth moment of a random variable X is defined as μ_k = E(X^k). Thus, the mean is the first moment, μ = μ_1, and the variance can be found from the first and second moments, σ^2 = μ_2 − μ_1^2. The kth central moment is defined as E((X − μ)^k). Thus, the variance is the second central moment. The third central moment of the standardized random variable X* = (X − μ)/σ,
β_3 = E((X*)^3) = E((X − μ)^3) / σ^3
is called the skewness of X. A distribution that's symmetric about its mean has 0 skewness. (In fact, all the odd central moments are 0 for a symmetric distribution.) But if it has a long tail to the right and a short one to the left, then it has a positive skewness, and a negative skewness in the opposite situation. The fourth central moment of X*,
β_4 = E((X*)^4) = E((X − μ)^4) / σ^4
is called kurtosis. A distribution with long, heavy tails has a high kurtosis, while a short-tailed distribution has a low kurtosis; a bimodal distribution, which puts most of its weight near two modes rather than in the tails, has a very low kurtosis. A normal distribution has a kurtosis of 3. (The word kurtosis was coined in the early 20th century from the Greek word for curvature.)
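As a quick numerical sanity check of these definitions (a sketch only; the sample size and seed are arbitrary choices, and the NumPy library is assumed to be available), one can estimate β_3 and β_4 for a standard normal sample directly:

    import numpy as np

    # Draw a large sample from a standard normal distribution.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000)

    # Standardize the sample: subtract the mean, divide by the standard deviation.
    z = (x - x.mean()) / x.std()

    # Sample versions of the third and fourth standardized moments.
    skewness = np.mean(z**3)   # should be close to 0 for a symmetric distribution
    kurtosis = np.mean(z**4)   # should be close to 3 for a normal distribution

    print(skewness, kurtosis)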
The moment generating function. There is a clever way of organizing all the moments into one mathematical object, and that object is called the moment generating function. It’s a function m(t) of a new variable t defined by
m(t) = E(e^{tX}).
Since the exponential function e^t has the power series

e^t = ∑_{k=0}^{∞} t^k / k! = 1 + t + t^2/2! + · · · + t^k/k! + · · ·,
we can rewrite m(t) as follows
m(t) = E(e^{tX}) = 1 + μ_1 t + (μ_2/2!) t^2 + · · · + (μ_k/k!) t^k + · · ·.
That implies that m^{(k)}(0), the kth derivative of m(t) evaluated at t = 0, equals the kth moment μ_k of X. In other words, the moment generating function generates the moments of X by differentiation. For discrete distributions, we can also compute the moment generating function directly in terms of the probability mass function f(x) = P(X = x) as

m(t) = E(e^{tX}) = ∑_x e^{tx} f(x).
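For example, if X is a Bernoulli random variable with P(X = 1) = p and P(X = 0) = 1 − p, then

m(t) = (1 − p) e^{t·0} + p e^{t·1} = 1 − p + p e^t,

so m'(0) = p = μ_1 and m''(0) = p = μ_2, giving σ^2 = μ_2 − μ_1^2 = p(1 − p).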
For continuous distributions, the moment generating function can be expressed in terms of the probability density function f as
m(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx.
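As an illustration of how differentiation recovers the moments, here is a small symbolic computation (a sketch, assuming the SymPy library is available), using the fact that the standard normal distribution has moment generating function m(t) = e^{t^2/2}:

    import sympy as sp

    t = sp.symbols('t')
    m = sp.exp(t**2 / 2)          # MGF of the standard normal distribution

    # The kth derivative evaluated at t = 0 is the kth moment mu_k.
    moments = [sp.diff(m, t, k).subs(t, 0) for k in range(5)]
    print(moments)                # [1, 0, 1, 0, 3]

The fourth moment being 3 agrees with the kurtosis of 3 for a normal distribution quoted above.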
The moment generating function enjoys the following properties.

Translation. If Y = X + a, then
m_Y(t) = e^{at} m_X(t).
Scaling. If Y = bX, then
m_Y(t) = m_X(bt).
Standardizing. From the last two properties, if X* = (X − μ)/σ is the standardized random variable for X, then

m_{X*}(t) = e^{−μt/σ} m_X(t/σ).
Convolution. If X and Y are independent variables, and Z = X + Y, then
m_Z(t) = m_X(t) m_Y(t).
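For example, if X_1, ..., X_n are independent Bernoulli random variables, each with the moment generating function 1 − p + p e^t found above, then their sum S, which is a binomial random variable, has

m_S(t) = (1 − p + p e^t)^n.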
The primary use of moment generating functions is to develop the theory of probability. For instance, the easiest way to prove the central limit theorem is to use moment generating functions.

The median, quartiles, quantiles, and percentiles. The median of a distribution X, sometimes denoted μ̃, is the value such that P(X ≤ μ̃) = 1/2.