



How to compute the standard deviation of sample means using two methods: with known and with unknown standard deviation of the individual values within a sample. Includes examples and formulas for each method, as well as applications to real-life scenarios.
Typology: Lecture notes
Quality control charts are based on sample means, not on individual values within a sample. A sample is a group of items that are considered together in the analysis. Items within a sample lose their individual characteristics in the analysis; instead, a summary statistic, e.g. the sample mean, represents the information in the sample. See the examples of samples below:
The number of samples and the sample size can potentially be confusing. Sample size is the number of items within a group. Number of samples is the number of groups.
Example 1: After a midterm exam is given to five sections of a course, the average exam grade $\bar{x}_j$ in section $j$ is computed and reported below.
Section         Sec 1   Sec 2   Sec 3   Sec 4   Sec 5
Average grade   68      72      74      82      71
Suppose that there are 50 students in each section and use $x_{i,j}$ to denote the $i$th student's grade in Section $j$. Then the average grades are computed by

$$
\bar{x}_j = \frac{\sum_{i=1}^{50} x_{i,j}}{50} \quad \text{for } j \in \{1, 2, 3, 4, 5\}.
$$
Since all 50 grades within a section are reduced to a single summary statistic (the sample mean), the students within a section are represented only by that statistic. Individual student grades are immaterial for an analysis that checks whether a certain section is performing better than the others. Clearly, the sample size is 50 and the number of samples is 5.
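The distinction between sample size and number of samples in Example 1 can be sketched in a few lines of Python. The individual grades below are randomly generated placeholders (the notes report only the section averages), and the names `grades` and `section_means` are illustrative assumptions, not part of the original notes.

```python
import random

random.seed(0)  # fixed seed so the made-up data is reproducible

NUM_SAMPLES = 5   # number of sections (groups)
SAMPLE_SIZE = 50  # number of students within each section

# Hypothetical individual grades: grades[j][i] is student i in section j.
# These are NOT the actual exam grades, which the notes do not list.
grades = [
    [random.randint(40, 100) for _ in range(SAMPLE_SIZE)]
    for _ in range(NUM_SAMPLES)
]

# Each section is reduced to one summary statistic, its sample mean.
section_means = [sum(section) / len(section) for section in grades]

print("number of samples:", len(section_means))
print("sample size:", len(grades[0]))
print("section means:", [round(m, 1) for m in section_means])
```

After this reduction, any comparison between sections works with the five values in `section_means` only; the 250 individual grades play no further role.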
There are two ways to compute the standard deviation $\sigma_{\bar{x}}$ of sample means. The first way requires knowledge of the standard deviation $\sigma_x$ of the individual values within a sample; the second way does not require $\sigma_x$.
In order to understand what we have and what we want, first recall that
$$
\mathrm{Var}(X) = \sigma_x^2 \quad \text{and} \quad \mathrm{Var}(\bar{X}) = \sigma_{\bar{x}}^2 .
$$

Note that $\mathrm{Var}(X)$ is known and we want to compute $\mathrm{Var}(\bar{X})$. In order to perform this computation, we need to recall the following proposition from statistics:
Proposition 1.
i) If $X$ is a random variable and $c$ is a constant, then $\mathrm{Var}(c \cdot X) = c^2 \cdot \mathrm{Var}(X)$.
ii) If $X_1$ and $X_2$ are two independent random variables, then $\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2)$.
Proof: i) First convince yourself that the mean of $cX$ is $c\bar{x}$, where $\bar{x}$ is the mean of $X$. We start with $\mathrm{Var}(c \cdot X)$ and use the definition of variance:

$$
\mathrm{Var}(cX) = \frac{1}{n} \sum_{i=1}^{n} (c x_i - c\bar{x})^2 = c^2 \, \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = c^2 \, \mathrm{Var}(X).
$$
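ii) Again by using the definition,

$$
\begin{aligned}
\mathrm{Var}(X_1 + X_2) &= \frac{1}{n} \sum_{i=1}^{n} (x_{1,i} + x_{2,i} - \bar{x}_1 - \bar{x}_2)^2 \\
&= \frac{1}{n} \sum_{i=1}^{n} \left\{ (x_{1,i} - \bar{x}_1)^2 + (x_{2,i} - \bar{x}_2)^2 + 2 (x_{1,i} - \bar{x}_1)(x_{2,i} - \bar{x}_2) \right\} \\
&= \frac{1}{n} \sum_{i=1}^{n} (x_{1,i} - \bar{x}_1)^2 + \frac{1}{n} \sum_{i=1}^{n} (x_{2,i} - \bar{x}_2)^2 + 2 \, \frac{1}{n} \sum_{i=1}^{n} (x_{1,i} - \bar{x}_1)(x_{2,i} - \bar{x}_2) \\
&= \frac{1}{n} \sum_{i=1}^{n} (x_{1,i} - \bar{x}_1)^2 + \frac{1}{n} \sum_{i=1}^{n} (x_{2,i} - \bar{x}_2)^2 + 0 \\
&= \mathrm{Var}(X_1) + \mathrm{Var}(X_2).
\end{aligned}
$$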
The fourth equality holds because $X_1$ and $X_2$ are independent, so the averaged sum of the cross products is zero. That averaged sum is the covariance of $X_1$ and $X_2$; it would generally be nonzero if $X_1$ and $X_2$ were not independent.
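Both parts of Proposition 1 can be checked numerically. The sketch below is only an illustration under assumed inputs: the distributions, the constant `c`, and the number of draws are arbitrary choices that do not come from the notes. `statistics.pvariance` divides by n, matching the definition of variance used in the proof. Part i) holds up to floating-point rounding, while part ii) holds only approximately for finite simulated data, because the averaged cross product of two independent series is close to zero but not exactly zero.

```python
import math
import random
from statistics import pvariance

random.seed(1)  # fixed seed for reproducibility

n = 20_000  # number of simulated observations (arbitrary choice)
c = 4.0     # arbitrary constant for part i)

# Two independently generated series (hypothetical distributions).
x1 = [random.gauss(0, 2) for _ in range(n)]
x2 = [random.gauss(5, 3) for _ in range(n)]

var_x1 = pvariance(x1)
var_x2 = pvariance(x2)

# Part i): Var(c * X) = c^2 * Var(X)
var_cx = pvariance([c * v for v in x1])
print(var_cx, c**2 * var_x1)

# Part ii): Var(X1 + X2) = Var(X1) + Var(X2) for independent X1, X2
var_sum = pvariance([a + b for a, b in zip(x1, x2)])
print(var_sum, var_x1 + var_x2)

part_i_ok = math.isclose(var_cx, c**2 * var_x1, rel_tol=1e-9)
part_ii_ok = math.isclose(var_sum, var_x1 + var_x2, rel_tol=0.05)
print(part_i_ok, part_ii_ok)
```

Replacing `x2` with a series that depends on `x1` (for example `x2 = x1`) makes the second comparison fail, which is the covariance term from the proof showing up.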
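Now Proposition 1 can be used to relate the variance of the sample mean to the variance of the observations within the samples. We start with the definition of the sample mean and proceed as follows, applying part i) and then part ii) of the proposition (the observations $X_1, \dots, X_n$ are taken to be independent):

$$
\mathrm{Var}(\bar{X}) = \mathrm{Var}\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \left( \frac{1}{n} \right)^2 \mathrm{Var}\left( \sum_{i=1}^{n} X_i \right) = \left( \frac{1}{n} \right)^2 \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{n}{n^2} \, \mathrm{Var}(X) = \frac{1}{n} \, \mathrm{Var}(X) \qquad (1)
$$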
where we use the fact that every individual observation has the same variance: $\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = \dots = \mathrm{Var}(X_n) = \mathrm{Var}(X)$, where $X$ stands for a generic observation and represents any one of $X_1, X_2, \dots, X_n$. This fact is assumed when constructing samples; otherwise, we would be grouping “apples” with “oranges”. Given (1), which relates the variances, relating the standard deviations is easy. Just take the square root of both sides of (1) to arrive at
$$
\sigma_{\bar{x}} = \frac{1}{\sqrt{n}} \, \sigma_x . \qquad (2)
$$
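Formula (2) is easy to apply and to sanity-check by simulation. The sketch below is a minimal illustration: the helper name `std_of_sample_mean`, the values `sigma_x = 10` and `n = 50`, the mean of 70, and the Normal distribution are all assumptions made for the example and are not taken from the notes.

```python
import math
import random
from statistics import pstdev


def std_of_sample_mean(sigma_x, n):
    """Standard deviation of the sample mean via formula (2): sigma_x / sqrt(n)."""
    return sigma_x / math.sqrt(n)


sigma_x = 10.0  # hypothetical standard deviation of individual values
n = 50          # sample size (items per sample)

theoretical = std_of_sample_mean(sigma_x, n)
print(round(theoretical, 3))  # 10 / sqrt(50), about 1.414

# Simulation check: draw many samples of size n, keep only each sample's mean,
# and measure how much those means vary.
random.seed(2)
num_samples = 4000  # number of samples (groups), arbitrary choice
sample_means = []
for _ in range(num_samples):
    sample = [random.gauss(70, sigma_x) for _ in range(n)]
    sample_means.append(sum(sample) / n)

empirical = pstdev(sample_means)
print(round(empirical, 3))
```

The empirical value should land near the theoretical one, and it gets closer as `num_samples` grows. Note that `n` (the sample size) enters formula (2), while `num_samples` only affects how precisely the simulation estimates it.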
a) If UTD decides to accept all applicants whose GMAT score is above 620, on average how many people will be accepted per year? b) If UTD decides to accept the 50 students with the highest GMAT scores every year, what should the cutoff GMAT score be (the lowest score among the 50 accepted students)?
Month                                                Jan     Feb     Mar     Apr     May     Jun     Jul     Aug
Average # of passengers/day ($\bar{y}_j$)            15000   14000   12600   13300   14700   14100   16800   17500
Average # of searched passengers/day ($\bar{z}_j$)   47      53      61      41      42      44      51      43
The average number of passengers per day is computed as follows. Let $y_{i,j}$ be the number of passengers on the $i$th day of month $j$. The average number of passengers per day for month $j$ is $\bar{y}_j$, defined as

$$
\bar{y}_j = \frac{\sum_{i=1}^{30} y_{i,j}}{30} \quad \text{for } j \in \{\text{Jan}, \text{Feb}, \text{Mar}, \text{Apr}, \text{May}, \text{Jun}, \text{Jul}, \text{Aug}\}.
$$
The average number of passengers searched per day is computed similarly. Let $z_{i,j}$ be the number of passengers searched on the $i$th day of month $j$. The average number of passengers searched per day for month $j$ is $\bar{z}_j$, defined as

$$
\bar{z}_j = \frac{\sum_{i=1}^{30} z_{i,j}}{30} \quad \text{for } j \in \{\text{Jan}, \text{Feb}, \text{Mar}, \text{Apr}, \text{May}, \text{Jun}, \text{Jul}, \text{Aug}\}.
$$
a) What is the sample size $n$ for computing the averages in the table? b) Suppose that the standard deviation of the number of passengers ($y_{i,j}$) flying out of DFW every day is 3000. What is the standard deviation of the average number of passengers ($\bar{y}_j$) flying out of DFW per day? c) Assuming a Normal distribution for the number of passengers, how many sigmas ($\sigma$) will give you a Type I error of 20% for an $\bar{x}$-chart on the average number of passengers flying out of DFW per day?
per day. b) Is the process in control during the first eight months? Explain.
Month                                       Sep    Oct
Average number of passengers/day            9100   6200
Average number of searched passengers/day   57     63
Using the c- and p-control charts obtained in questions 6 and 7 and the recent numbers above, determine: a) Is the number of passengers searched per day in control? b) Is the proportion of passengers searched per day in control? c) How can you reconcile your answers if you say “yes” to either a) or b) above, and “no” to the other?