










An introduction to sampling distributions, focusing on the distinction between parameters and statistics, defining sampling distributions, determining unbiased estimators, and the relationship between sample size and variability. It includes examples and activities to help understand these concepts.
The Practice of Statistics, 4th edition – For AP* (Starnes, Yates, Moore)
Chapter 7
Sampling Distributions
7.1 What is a Sampling Distribution?
7.2 Sample Proportions
7.3 Sample Means
What Is a Sampling Distribution?
Introduction
The process of statistical inference involves using information from a sample to draw conclusions about a wider population.
Different random samples yield different statistics. We need to be able to describe the sampling distribution of possible statistic values in order to perform statistical inference.
We can think of a statistic as a random variable because it takes numerical values that describe the outcomes of the random sampling process. Therefore, we can examine its probability distribution using what we learned in Chapter 6.
Parameters and Statistics
A parameter is a number that describes some characteristic of the population. In statistical practice, the value of a parameter is usually not known because we cannot examine the entire population.
A statistic is a number that describes some characteristic of a sample. The value of a statistic can be computed directly from the sample data. We often use a statistic to estimate an unknown parameter.
Remember s and p: statistics come from samples and parameters come from populations.
We write μ (the Greek letter mu) for the population mean and x̄ ("x-bar") for the sample mean. We use p to represent a population proportion. The sample proportion p̂ ("p-hat") is used to estimate the unknown parameter p.
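To make the distinction concrete, here is a minimal Python sketch. The population of heights, the 72-inch cutoff, and the sample size of 50 are made-up values for illustration only, not data from the text; the names `mu`, `x_bar`, `p`, and `p_hat` simply mirror the notation above.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights in inches (illustrative values only).
population = [random.gauss(70, 3) for _ in range(10_000)]

# Parameters describe the whole population. In practice they are usually unknown.
mu = statistics.mean(population)                        # population mean
p = sum(h > 72 for h in population) / len(population)   # proportion taller than 72 in.

# Statistics are computed from a sample and used to estimate the parameters.
sample = random.sample(population, 50)                  # SRS of size 50
x_bar = statistics.mean(sample)                         # sample mean, estimates mu
p_hat = sum(h > 72 for h in sample) / len(sample)       # sample proportion, estimates p

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
print(f"p  = {p:.3f}, p_hat = {p_hat:.3f}")
```

Changing the seed changes `x_bar` and `p_hat` but leaves `mu` and `p` fixed for a given population, which is the point of the mnemonic.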
Sampling Variability
Activity: Reaching for Chips
Population Distributions vs. Sampling Distributions
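There are actually three distinct distributions involved when we sample repeatedly and measure a variable of interest.
1) The population distribution gives the values of the variable for all the individuals in the population.
2) The distribution of sample data shows the values of the variable for all the individuals in the sample.
3) The sampling distribution shows the values of the statistic from all the possible samples of the same size from the population.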
Describing Sampling Distributions
A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
Center: Biased and unbiased estimators
In the chips example, we collected many samples of size 20 and calculated the sample proportion of red chips. How well does the sample proportion estimate the true proportion of red chips, p = 0.5? Note that the center of the approximate sampling distribution is close to 0.5. In fact, if we took ALL possible samples of size 20 and found the mean of those sample proportions, we’d get exactly 0.5.
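The chips activity can be imitated with a short simulation. This is only a sketch under stated assumptions: the text gives p = 0.5 and samples of size 20, but the bag of 200 chips, the 2000 repetitions, and the function name `simulate_sample_proportions` are hypothetical choices made here for illustration.

```python
import random
import statistics

def simulate_sample_proportions(population, n, reps, rng):
    """Return the sample proportion of red chips from `reps` SRSs of size n."""
    props = []
    for _ in range(reps):
        sample = rng.sample(population, n)    # SRS drawn without replacement
        props.append(sample.count("red") / n)
    return props

# Assumption: a bag of 200 chips, half of them red, so p = 0.5.
bag = ["red"] * 100 + ["other"] * 100
rng = random.Random(7)
p_hats = simulate_sample_proportions(bag, n=20, reps=2000, rng=rng)

print(f"mean of sample proportions: {statistics.mean(p_hats):.3f}")
print(f"SD of sample proportions:   {statistics.stdev(p_hats):.3f}")
```

Because only some of the possible samples are taken, the mean of the simulated proportions should be close to 0.5 rather than exactly equal to it.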
Alternate Example – Sampling Heights
Suppose that the heights of adult males are approximately Normally distributed with a mean of 70 inches and a standard deviation of 3 inches. To see why sample size matters, we took 1000 SRSs of size 100 and calculated the sample mean height of each, and then took 1000 SRSs of size 1500 and did the same. Here are the results, graphed on the same scale for easy comparison:
As you can see, the spreads of the two approximate sampling distributions are very different. When the sample size was larger, the distribution of the sample mean was much less variable. In other words, when the sample size is larger, the sample mean will be closer to the true mean, on average.
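The comparison can be reproduced approximately in code. The sketch below assumes the population is large enough that each height can be treated as an independent draw from a Normal distribution with mean 70 and standard deviation 3; the function name `sample_means` and the seed are illustrative choices, not part of the original example.

```python
import random
import statistics

def sample_means(n, reps, mu=70.0, sigma=3.0, seed=0):
    """Simulate `reps` sample means, each from n independent Normal(mu, sigma) heights."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]

# Same setup as the example: 1000 samples at each of the two sample sizes.
for n in (100, 1500):
    means = sample_means(n, reps=1000)
    print(f"n = {n:4d}: mean of x-bars = {statistics.fmean(means):.2f}, "
          f"SD of x-bars = {statistics.stdev(means):.3f}")
```

Both sets of sample means should center near 70, while the standard deviation for n = 1500 should be several times smaller than for n = 100. For independent draws the theoretical values are σ/√n, about 0.30 and 0.077 inches respectively.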
Describing Sampling Distributions
Bias, variability, and shape
We can think of the true value of the population parameter as the bull’s-eye on a target and of the sample statistic as an arrow fired at the target. Both bias and variability describe what happens when we take many shots at the target.
Bias means that our aim is off and we consistently miss the bull’s-eye in the same direction. Our sample values do not center on the population value.
High variability means that repeated shots are widely scattered on the target. Repeated samples do not give very similar results.
The lesson about center and spread is clear: given a choice of statistics to estimate an unknown parameter, choose one with no or low bias and minimum variability.
Alternate Example – More Tanks
Here are 5 methods for estimating the total number of tanks:
(1) Partition = max × (5/4)
(2) Max = max
(3) MeanMedian = mean + median
(4) SumQuartiles = Q1 + Q3
(5) TwiceIQR = 2 × IQR
The graph below shows the approximate sampling distribution for each of these statistics when taking samples of size 4 from a population of 342 tanks.
(a) Which of these statistics appear to be biased estimators? Explain.
The statistics Max and TwiceIQR appear to be biased estimators because they are consistently too low. That is, the centers of their sampling distributions appear to be below the correct value of 342.
[Figure: dot plots of the approximate sampling distributions of Partition, Max, MeanMedian, SumQuartiles, and TwiceIQR on a common axis from 0 to 700, with the true value 342 marked.]
(b) Of the unbiased estimators, which is best? Explain.
Of the three unbiased statistics (Partition, MeanMedian, and SumQuartiles), Partition is best since it has the least variability.
(c) Explain why a biased estimator might be preferred to an unbiased estimator.
Even though Max is a biased estimator, it often produces estimates very close to the truth. MeanMedian, although unbiased, is quite variable and not close to the true value as often. For example, in 120 of the 250 SRSs, Max produced an estimate within 50 of the true value. However, MeanMedian was this close in only 79 of the 250 SRSs.
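The five estimators can also be compared by simulation. The following is a sketch, not the procedure used to make the original graph. It assumes the tanks carry serial numbers 1 through 342, that each sample is an SRS of 4 serial numbers, and that Q1 and Q3 are the medians of the lower and upper halves of the sorted sample. The helper names, the seed, and the 5000 repetitions are illustrative choices.

```python
import random
import statistics

N_TANKS = 342  # true number of tanks in the example

def quartiles_of_four(xs):
    """Q1 and Q3 for a sample of size 4: medians of the lower and upper halves."""
    a, b, c, d = sorted(xs)
    return (a + b) / 2, (c + d) / 2

def twice_iqr(xs):
    q1, q3 = quartiles_of_four(xs)
    return 2 * (q3 - q1)

ESTIMATORS = {
    "Partition":    lambda xs: max(xs) * 5 / 4,
    "Max":          lambda xs: max(xs),
    "MeanMedian":   lambda xs: statistics.mean(xs) + statistics.median(xs),
    "SumQuartiles": lambda xs: sum(quartiles_of_four(xs)),
    "TwiceIQR":     twice_iqr,
}

def simulate(reps=5000, n=4, seed=342):
    """Return {name: list of estimates} over `reps` SRSs of serial numbers 1..N_TANKS."""
    rng = random.Random(seed)
    serials = range(1, N_TANKS + 1)   # assumption: serial numbers run 1..342
    results = {name: [] for name in ESTIMATORS}
    for _ in range(reps):
        sample = rng.sample(serials, n)
        for name, estimator in ESTIMATORS.items():
            results[name].append(estimator(sample))
    return results

results = simulate()
for name, values in results.items():
    center = statistics.mean(values)   # compare with 342 to judge bias
    spread = statistics.stdev(values)  # variability of the estimator
    close = sum(abs(v - N_TANKS) <= 50 for v in values) / len(values)
    print(f"{name:12s} mean = {center:6.1f}  SD = {spread:5.1f}  within 50: {close:.0%}")
```

Under these assumptions, Max and TwiceIQR should center well below 342, while Partition should center near 342 with the smallest spread of the three roughly unbiased statistics. The "within 50" column gives a way to check the comparison in part (c).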
Section 7.
What Is a Sampling Distribution?
In this section, we learned that…
A parameter is a number that describes a population. To estimate an unknown parameter, use a statistic calculated from a sample.
The population distribution of a variable describes the values of the variable for all individuals in a population. The sampling distribution of a statistic describes the values of the statistic in all possible samples of the same size from the same population.
A statistic can be an unbiased estimator or a biased estimator of a parameter. Bias means that the center (mean) of the sampling distribution is not equal to the true value of the parameter.
The variability of a statistic is described by the spread of its sampling distribution. Larger samples give smaller spread.
When trying to estimate a parameter, choose a statistic with low or no bias and minimum variability. Don’t forget to consider the shape of the sampling distribution before doing inference.