
























































These notes introduce measures of central tendency and give examples of calculating the mean, median, and mode from data sets. They also discuss the importance of these measures in understanding statistical data and distinguish between population and sample means.
When thinking about questions such as “how many calories do I eat per day?” or “how much time do I spend talking per day?”, we quickly realize that the answer will vary from day to day, and we often modify our question to something like “how many calories do I consume on a typical day?” or “on average, how much time do I spend talking per day?”
In this section we will study three ways of measuring central tendency in data: the mean, the median and the mode. Each measure gives us a single value (the mode may give more than one) that might be considered typical. As we will see, however, any one of these values can give us a skewed picture if the data has certain characteristics.
A sample is a subset of the population. For example, we might collect data on the number of home runs hit in a random sample of 20 games played by Babe Ruth. If we calculate the mean, median and mode using the data from a sample, the results are called the sample mean, sample median and sample mode.
The Mean: The population mean of $m$ numbers $x_1, x_2, \dots, x_m$ (the data for every member of a population of size $m$) is denoted by $\mu$ and is computed as follows:

$$\mu = \frac{x_1 + x_2 + \cdots + x_m}{m}$$
The sample mean of the numbers $x_1, x_2, \dots, x_n$ (data for a sample of size $n$ from the population) is denoted by $\bar{x}$ and is computed similarly:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Example Consider the following set of data, showing the number of times a sample of 5 students check their e-mail per day: 1, 3, 5, 5, 3.
Here $n = 5$ and $x_1 = 1$, $x_2 = 3$, $x_3 = 5$, $x_4 = 5$ and $x_5 = 3$.
Calculate the sample mean $\bar{x}$.

$$\bar{x} = \frac{1 + 3 + 5 + 5 + 3}{5} = \frac{17}{5} = 3.4$$
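The definition of the sample mean can be checked with a few lines of Python. This is a minimal sketch for the e-mail data; the function name `sample_mean` is our own choice, and it assumes integer data so that the result can be kept as an exact fraction.

```python
from fractions import Fraction


def sample_mean(values):
    """Return the mean of a non-empty list of integers as an exact fraction."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    # Add every observation, then divide by the number of observations.
    return Fraction(sum(values), len(values))


emails = [1, 3, 5, 5, 3]      # e-mail checks per day for the 5 students
x_bar = sample_mean(emails)
print(x_bar, float(x_bar))    # 17/5 3.4
```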
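We can calculate the mean above more efficiently by using frequencies. We can see from the calculation above that each distinct outcome is multiplied by the number of times it occurs:

$$\bar{x} = \frac{1 \cdot 1 + 3 \cdot 2 + 5 \cdot 2}{5} = \frac{17}{5} = 3.4$$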
The frequency distribution for the data is:
Outcome   Frequency   Outcome × Frequency
0         1           0 × 1
1         2           1 × 2
2         8           2 × 8
3         4           3 × 4
4         5           4 × 5
Sum       20          50

$$\bar{x} = \frac{50}{20} = 2.5$$
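The same calculation can be sketched in Python by storing the frequency table as a dictionary. The name `mean_from_frequencies` is hypothetical; the outcomes and frequencies are the ones in the table above.

```python
def mean_from_frequencies(table):
    """Mean of a data set given as a mapping {outcome: frequency}."""
    n = sum(table.values())                                  # sample size
    total = sum(outcome * freq for outcome, freq in table.items())
    return total / n


table = {0: 1, 1: 2, 2: 8, 3: 4, 4: 5}
print(sum(table.values()))           # 20
print(mean_from_frequencies(table))  # 2.5
```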
The general case can be dealt with as follows. Suppose the frequency/relative frequency table for our sample of size $n$ looks like the one below, where the observations are denoted $O_i$, the corresponding frequencies $f_i$ and the relative frequencies $f_i/n$:

Observation   Frequency   Relative Frequency
$O_i$         $f_i$       $f_i/n$
$O_1$         $f_1$       $f_1/n$
$O_2$         $f_2$       $f_2/n$
$O_3$         $f_3$       $f_3/n$
...           ...         ...
$O_R$         $f_R$       $f_R/n$

Then

$$\bar{x} = \frac{O_1 f_1 + O_2 f_2 + \cdots + O_R f_R}{n}$$
Alternatively, we can use the relative frequencies instead of dividing by $n$ at the end.

Outcome   Frequency   Relative Frequency   Outcome × Relative Frequency
$O_i$     $f_i$       $f_i/n$              $O_i \times f_i/n$
$O_1$     $f_1$       $f_1/n$              $O_1 \times f_1/n$
$O_2$     $f_2$       $f_2/n$              $O_2 \times f_2/n$
$O_3$     $f_3$       $f_3/n$              $O_3 \times f_3/n$
...       ...         ...                  ...
$O_R$     $f_R$       $f_R/n$              $O_R \times f_R/n$
                                           SUM = $\bar{x}$

You can of course choose any of the three methods listed above for the calculation. The easiest method to use will depend on how the data is presented.
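To see that the three methods agree, here is a short Python sketch applied to the e-mail data 1, 3, 5, 5, 3 from the earlier example. The variable names are our own, and exact fractions are used so that the three results can be compared without rounding error.

```python
from collections import Counter
from fractions import Fraction

data = [1, 3, 5, 5, 3]
n = len(data)

# Method 1: add every observation and divide by n.
direct = Fraction(sum(data), n)

# Method 2: sum outcome x frequency, then divide by n.
freq = Counter(data)                      # {1: 1, 3: 2, 5: 2}
by_frequency = Fraction(sum(o * f for o, f in freq.items()), n)

# Method 3: sum outcome x relative frequency (no division at the end).
by_relative = sum(o * Fraction(f, n) for o, f in freq.items())

print(direct, by_frequency, by_relative)  # 17/5 17/5 17/5
```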
Example The number of goals scored by the 32 teams in the 2014 World Cup are shown below:
18, 15, 12, 11, 10, 8, 7, 7, 6, 6, 6, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1
Make a frequency table for the data and, taking the soccer teams who played in the World Cup as a population, calculate the population mean, $\mu$.
Outcome   Frequency
1         3
2         4
3         5
4         6
5         3
6         3
7         2
8         1
10        1
11        1
12        1
15        1
18        1

$$\mu = \frac{1 \cdot 3 + 2 \cdot 4 + 3 \cdot 5 + 4 \cdot 6 + 5 \cdot 3 + 6 \cdot 3 + 7 \cdot 2 + 8 \cdot 1 + 10 \cdot 1 + 11 \cdot 1 + 12 \cdot 1 + 15 \cdot 1 + 18 \cdot 1}{32}$$

$$= \frac{3 + 8 + 15 + 24 + 15 + 18 + 14 + 8 + 10 + 11 + 12 + 15 + 18}{32} = \frac{171}{32} = 5.34375$$
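The frequency table and the population mean for this example can be reproduced in Python. This is a sketch using `collections.Counter` from the standard library to tally the outcomes; the variable names are our own.

```python
from collections import Counter

goals = [18, 15, 12, 11, 10, 8, 7, 7, 6, 6, 6, 5, 5, 5,
         4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1]

# Build the frequency table: outcome -> number of teams with that many goals.
freq = Counter(goals)
for outcome in sorted(freq):
    print(outcome, freq[outcome])

m = sum(freq.values())                                    # population size
total = sum(outcome * count for outcome, count in freq.items())
mu = total / m
print(m, total, mu)                                       # 32 171 5.34375
```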
Example Approximate the mean for the set of data used to make the following histogram, showing the time (in seconds) spent waiting by a sample of customers at Gringotts Wizarding bank.
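[Histogram: frequency (vertical axis, marked from 2 to 12) against time spent waiting in seconds (horizontal axis), with class intervals 50 - 100, 100 - 150, 150 - 200, 200 - 250, 250 - 300 and 300 - 350.]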
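midpoints:

Outcome (midpoint)   Frequency
75                   12
125                  10
175                  4
225                  2
275                  1
325                  1
Sample size          30

approximation of sample mean:

$$\bar{x} \approx \frac{75 \cdot 12 + 125 \cdot 10 + 175 \cdot 4 + 225 \cdot 2 + 275 \cdot 1 + 325 \cdot 1}{30} = \frac{3900}{30} = 130$$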
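The midpoint approximation can be sketched in Python as follows. Each class interval is replaced by its midpoint and weighted by its frequency; the intervals and frequencies are those read off the histogram and table above, and the function name `approximate_mean` is hypothetical.

```python
# (lower bound, upper bound) of each class interval, paired with its frequency.
bins = [
    ((50, 100), 12),
    ((100, 150), 10),
    ((150, 200), 4),
    ((200, 250), 2),
    ((250, 300), 1),
    ((300, 350), 1),
]


def approximate_mean(bins):
    """Approximate the mean of grouped data using interval midpoints."""
    n = sum(freq for _, freq in bins)
    total = sum((low + high) / 2 * freq for (low, high), freq in bins)
    return total / n


print(approximate_mean(bins))  # 130.0
```

The result is only an approximation, since the individual waiting times inside each interval are not known.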