Docsity

Boxplots, Interquartile Range, and Outliers: Analyzing MLB Team Payrolls, Study Guides, Projects, Research of Statistics

How to calculate the five-number summary, interquartile range, and outliers from a data set, using the example of MLB team payrolls in millions. It provides the steps to determine the minimum, first quartile, median, third quartile, and maximum values, as well as the IQR and outlier boundaries.

Typology: Study Guides, Projects, Research

2021/2022

Uploaded on 09/12/2022

theeconomist1

Boxplots, Interquartile Range, and Outliers

Boxplots provide a visual representation of a data set that can be used to determine whether the data set is symmetric or skewed. Constructing a boxplot requires calculating the “5 number summary” and the interquartile range (IQR), and checking for the presence of any outliers.

5 Number Summary – The 5 number summary for a data set consists of five values, listed here in order from smallest to largest: the minimum, the first quartile ($Q_1$), the median, the third quartile ($Q_3$), and the maximum.

IQR – The interquartile range is a measure of spread used to calculate the lower and upper outlier boundaries. These boundaries are then used to determine whether a data set has any actual outliers.

Outliers – Outliers are data points that are considerably smaller or larger than most of the other values in a data set. Data values that are smaller than the lower outlier boundary or larger than the upper outlier boundary are outliers. Some data sets do not have any outliers. Outliers that are determined to be the result of an error should be removed from the data set.

Example – For the following data set (2012 data for MLB team payrolls in millions), find a) the 5 number summary, b) the IQR, c) the upper and lower outlier boundaries, and d) any outliers. Note – data should be sorted from lowest to highest if it is not provided that way. Sorting makes it easy to identify the min, max, median, and individual data positions within the set.

 #  Team       Payroll     #  Team       Payroll     #  Team       Payroll     #  Team       Payroll
 1  Padres          55     9  Rockies         78    17  Mets            93    25  Rangers        121
 2  Athletics       55    10  Indians         78    18  Twins           94    26  Tigers         132
 3  Astros          61    11  Nationals       81    19  Dodgers         95    27  Angels         154
 4  Royals          61    12  Orioles         81    20  W Sox           97    28  Red Sox        173
 5  Pirates         63    13  Mariners        82    21  Brewers         98    29  Phillies       175
 6  Rays            64    14  Reds            82    22  Cardinals      110    30  Yankees        198
 7  D Backs         74    15  Braves          83    23  Giants         118
 8  Blue Jays       75    16  Cubs            88    24  Marlins        118

The values of the 5 number summary are defined as follows:

1. Minimum – The smallest value in the data set.

2. First Quartile – Separates the lowest 25% of the data in a set from the highest 75%. It is typically denoted as $Q_1$, with $\frac{25}{100} = \frac{1}{4}$ of the data below it.

3. Median – The middle value in a sorted (smallest to largest) data set. If there is an even number of values, it is calculated by averaging the two middle values. The median is also referred to as the Second Quartile ($Q_2$) because it separates the lower 50% of data in a set from the upper 50%.

4. Third Quartile – Separates the lowest 75% of the data in a set from the highest 25%. It is typically denoted as $Q_3$, with $\frac{75}{100} = \frac{3}{4}$ of the data below it.

5. Maximum – The largest value in the data set.

$\text{Interquartile Range } (IQR) = Q_3 - Q_1$

$\text{Lower Outlier Boundary} = Q_1 - 1.5 \cdot IQR$

$\text{Upper Outlier Boundary} = Q_3 + 1.5 \cdot IQR$

a) 5 Number Summary – These values can be calculated by hand (shown below) or they can be found using the “1-Var Stats” button from the Stat Menu on a TI-83 or TI-84 calculator.
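For readers working in software rather than on a calculator, the sketch below computes a 5 number summary in Python. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, a convention that agrees with the hand calculation for this data set; other tools may use different quartile conventions and return slightly different values. The function name `five_number_summary` is illustrative, not from the handout.

```python
from statistics import median

# 2012 MLB team payrolls in millions, from the table above
payrolls = [55, 55, 61, 61, 63, 64, 74, 75, 78, 78, 81, 81, 82, 82, 83,
            88, 93, 94, 95, 97, 98, 110, 118, 118, 121, 132, 154, 173, 175, 198]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data (the middle value is left out when n is odd).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values before the median position
    upper = data[half + n % 2:]  # values after the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

summary = five_number_summary(payrolls)
print(summary)  # (55, 75, 85.5, 118, 198)
```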

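b) IQR: $IQR = Q_3 - Q_1 = 118 - 75 = 43$

c) Upper and Lower Outlier Boundaries

$\text{Lower Outlier Boundary} = Q_1 - 1.5 \cdot IQR = 75 - 1.5(43) = 10.5$

$\text{Upper Outlier Boundary} = Q_3 + 1.5 \cdot IQR = 118 + 1.5(43) = 182.5$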
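As a quick check on the arithmetic in parts b) and c), here is a small Python sketch that applies the IQR and outlier boundary formulas to the quartiles from this example ($Q_1 = 75$, $Q_3 = 118$). The function name `outlier_boundaries` is made up for illustration.

```python
def outlier_boundaries(q1, q3):
    """Return (IQR, lower boundary, upper boundary) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return iqr, lower, upper

# Quartiles from the MLB payroll example
iqr, lower, upper = outlier_boundaries(75, 118)
print(iqr, lower, upper)  # 43 10.5 182.5
```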

d) Outliers

Lower Outliers: None (There are no individual data points smaller than the lower boundary of 10.5.)

Upper Outliers: 198 (Yankees) (This data value is bigger than the upper boundary of 182.5.)
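The outlier check in part d) can be sketched in Python as a simple filter over the data. The boundaries 10.5 and 182.5 are taken from this example; the dictionary layout and variable names are only illustrative choices.

```python
# 2012 MLB team payrolls in millions, keyed by team name
payrolls = {
    "Padres": 55, "Athletics": 55, "Astros": 61, "Royals": 61, "Pirates": 63,
    "Rays": 64, "D Backs": 74, "Blue Jays": 75, "Rockies": 78, "Indians": 78,
    "Nationals": 81, "Orioles": 81, "Mariners": 82, "Reds": 82, "Braves": 83,
    "Cubs": 88, "Mets": 93, "Twins": 94, "Dodgers": 95, "W Sox": 97,
    "Brewers": 98, "Cardinals": 110, "Giants": 118, "Marlins": 118,
    "Rangers": 121, "Tigers": 132, "Angels": 154, "Red Sox": 173,
    "Phillies": 175, "Yankees": 198,
}

LOWER_BOUNDARY = 10.5   # Q1 - 1.5 * IQR from this example
UPPER_BOUNDARY = 182.5  # Q3 + 1.5 * IQR from this example

# A value is an outlier if it falls strictly outside the boundaries
outliers = {team: pay for team, pay in payrolls.items()
            if pay < LOWER_BOUNDARY or pay > UPPER_BOUNDARY}
print(outliers)  # {'Yankees': 198}
```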

Constructing a Box Plot – Construct a boxplot for the data set in the previous example. Determine whether the data set is symmetric or skewed.

[Figure: boxplot of MLB Team Payrolls (in millions) on a horizontal axis running from 55 to 195, with 85.5 and 118 also marked and the outlier marked with an “x”.]

Hand calculation of the 5 number summary:

Minimum = 55

Position of $Q_1 = \frac{25}{100}(30) = 7.5$ → 8th position, so $Q_1 = 75$ (represents the 25th percentile of data points in the set)

Median $= \frac{83 + 88}{2} = 85.5$ (average of the 2 middle data points in the set)

Position of $Q_3 = \frac{75}{100}(30) = 22.5$ → 23rd position, so $Q_3 = 118$ (represents the 75th percentile of data points in the set)

Maximum = 198

Boxplot annotations:

The box is labeled with $Q_1$, the median, and $Q_3$.

Draw the lower whisker out to the smallest data value that is larger than the lower boundary.

Draw the upper whisker out to the largest data value that is smaller than the upper boundary.

Mark outliers with an “x”.

This data set is skewed right.
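The whisker rule above can be sketched in Python to find where the whiskers end for this data set. The skewness check at the end is only a rough heuristic added for illustration (comparing how far the whiskers reach on each side of the median), not a method stated in the handout. Values lying exactly on a boundary are treated as non-outliers here, which is an assumption.

```python
# 2012 MLB team payrolls in millions, sorted
payrolls = [55, 55, 61, 61, 63, 64, 74, 75, 78, 78, 81, 81, 82, 82, 83,
            88, 93, 94, 95, 97, 98, 110, 118, 118, 121, 132, 154, 173, 175, 198]

median_value = 85.5                          # from the 5 number summary
lower_boundary, upper_boundary = 10.5, 182.5  # from the outlier boundaries

# Whiskers stop at the most extreme values that are not outliers
inside = [p for p in payrolls if lower_boundary <= p <= upper_boundary]
whisker_low, whisker_high = min(inside), max(inside)

# Rough heuristic (assumption): the longer side suggests the direction of skew
left_span = median_value - whisker_low
right_span = whisker_high - median_value
if right_span > left_span:
    shape = "skewed right"
elif left_span > right_span:
    shape = "skewed left"
else:
    shape = "roughly symmetric"

print(whisker_low, whisker_high, shape)  # 55 175 skewed right
```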

If the “position” calculation results in a decimal, round up to the next whole number to determine the position. If the calculation results in a whole number, average that position’s data value with the next data value.
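The position rule can be written out as a short Python sketch. It uses integer arithmetic to decide whether the position is a whole number, then either rounds up or averages two neighboring values, as described above. The function name `value_by_position_rule` is hypothetical, and the restriction to percentages strictly between 0 and 100 is an assumption made to keep the indexing valid.

```python
# 2012 MLB team payrolls in millions, sorted
payrolls = [55, 55, 61, 61, 63, 64, 74, 75, 78, 78, 81, 81, 82, 82, 83,
            88, 93, 94, 95, 97, 98, 110, 118, 118, 121, 132, 154, 173, 175, 198]

def value_by_position_rule(sorted_data, percent):
    """Apply the handout's position rule for a whole-number percentage.

    position = (percent / 100) * n, counted from 1.
    Decimal position      -> round up and take that value.
    Whole-number position -> average that value with the next one.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    whole, remainder = divmod(percent * len(sorted_data), 100)
    if remainder:                        # decimal position: round up
        return sorted_data[whole]        # 1-based position whole + 1
    # whole-number position: average positions whole and whole + 1
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

q1, med, q3 = (value_by_position_rule(payrolls, p) for p in (25, 50, 75))
print(q1, med, q3)  # 75 85.5 118
```

Note that with 30 data values the 50% position is exactly 15, so the rule averages the 15th and 16th values, which matches the usual definition of the median for an even number of values.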