Understanding Outliers and Inter-Quartile Range in Statistics, Lecture notes of Statistics

An introduction to the concepts of Inter-Quartile Range (IQR), outliers, and the five-number summary in statistics. It explains how to calculate the IQR and identifies outliers using the 1.5 x IQR rule. The document also discusses the significance of outliers and their potential impact on data analysis.

Typology: Lecture notes

2021/2022

Uploaded on 09/12/2022

tiuw

Today:

  • Inter-Quartile Range,
  • Outliers,
  • Boxplots.

Reading for today: Start Chapter 4.

Quartiles and the Five Number Summary

  • The five numbers are the Minimum (Q0), Lower Quartile (Q1), Median (Q2), Upper Quartile (Q3), and Maximum (Q4).
  • Q1 is the value that is bigger than one quarter of the data.
  • Q3 is the value that is bigger than three quarters of the data.

For the values {0, 1, 2, 4, 5, 5, 7, 10, 10, 12, 13, 17, 39},

the five-number summary is: 0, 3, 7, 12.5, 39.
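
The summary above can be reproduced with a short sketch. The notes do not say which quartile convention they use, so this assumes the "median of each half" convention (leaving the middle value out of both halves when the count is odd), which happens to match the numbers given. The function name five_number_summary is made up for illustration; other conventions, including those in common software, can give slightly different quartiles.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumes the median-of-halves convention: Q1 is the median of the
    lower half, Q3 is the median of the upper half, and the middle
    value is left out of both halves when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the middle position
    upper = data[half + n % 2:]    # skips the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])


values = [0, 1, 2, 4, 5, 5, 7, 10, 10, 12, 13, 17, 39]
print(five_number_summary(values))  # (0, 3.0, 7, 12.5, 39)
```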

Inter-Quartile Range

  • We also need measures of spread, like the Inter-Quartile Range. (Literally “the range between the quartiles”, called the IQR for short).
  • The Inter-Quartile Range is calculated as:

IQR = Q3 – Q1

  • The size of the IQR indicates how spread out the middle half of the data is.
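
As a quick check of the formula, here is a minimal sketch using the quartiles from the five-number summary in these notes (Q1 = 3, Q3 = 12.5). The comparison with the full range is an added illustration, not part of the original slides.

```python
# Quartiles taken from the five-number summary in the notes: 0, 3, 7, 12.5, 39.
q1, q3 = 3, 12.5
iqr = q3 - q1  # spread of the middle half of the data

values = [0, 1, 2, 4, 5, 5, 7, 10, 10, 12, 13, 17, 39]
full_range = max(values) - min(values)  # spread of all the data

print(iqr)         # 9.5
print(full_range)  # 39
```

The full range is stretched by the single large value 39, while the IQR only looks at the middle half.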

Outliers

  • Some data points sit far away from the rest of the data. We call these data points outliers. They (figuratively) lie outside the rest of the data.
  • Because an outlier stands out from the rest of the data, it
      o might not belong there, or
      o is worthy of extra attention.
  • One way to define an outlier is anything
      o below Q1 – 1.5 x IQR, or
      o above Q3 + 1.5 x IQR.
    This is called the 1.5 x IQR rule. (Important).
  • Example: {0, 1, 2, 4, 5, 5, 7, 10, 10, 12, 13, 17, 39}. Q1 = 3, Q3 = 12.5, IQR = 9.5.

Q3 + 1.5 x IQR = 12.5 + 1.5 x 9.5 = 12.5 + 14.25 = 26.75. Anything more than 26.75 is an outlier.

Q1 – 1.5 x IQR = 3 – 14.25 = –11.25. Anything less than –11.25 is an outlier, and no value is that low.

39 is the only outlier.
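
The 1.5 x IQR rule can be sketched in a few lines. The helper names iqr_fences and find_outliers are made up for illustration, and the quartiles are passed in directly (Q1 = 3, Q3 = 12.5 from the five-number summary) rather than computed, since quartile conventions vary.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the limits."""
    low, high = iqr_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]


values = [0, 1, 2, 4, 5, 5, 7, 10, 10, 12, 13, 17, 39]
print(iqr_fences(3, 12.5))             # (-11.25, 26.75)
print(find_outliers(values, 3, 12.5))  # [39]
```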

More on IQR and Outliers:

  • There are other ways to define outliers, but the 1.5 x IQR rule is one of the most straightforward.
  • If our range has a natural restriction (like it can’t possibly be negative), it’s okay for an outlier limit to be beyond that restriction.
  • If a value is more than Q3 + 3 x IQR or less than Q1 – 3 x IQR, it is sometimes called an extreme outlier.

Boxplots

  • A boxplot displays the five-number summary. For the example data:
  • The box from 3 to 12.5 is the region between Q1 and Q3.
  • The line going through the middle of the box at 7 is the median.
  • The lines going out the ends of the box are called the whiskers. They show the range of values that are not outliers.
  • The lower whisker goes to the lowest value, 0. The upper whisker goes to 17 because it’s the biggest value before the upper limit of 26.75 is hit.
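
To make the whisker rule concrete, here is a minimal sketch that finds where the whiskers end for the example data and also checks for extreme outliers using the 3 x IQR limits. The function names whisker_ends and extreme_outliers are made up for illustration, and the quartiles Q1 = 3 and Q3 = 12.5 are taken from the five-number summary rather than computed.

```python
def whisker_ends(values, q1, q3, k=1.5):
    """Return the smallest and largest values that are not outliers."""
    iqr = q3 - q1
    low_limit, high_limit = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_limit <= v <= high_limit]
    return min(inside), max(inside)


def extreme_outliers(values, q1, q3):
    """Return values beyond Q1 - 3*IQR or Q3 + 3*IQR."""
    iqr = q3 - q1
    return [v for v in values if v < q1 - 3 * iqr or v > q3 + 3 * iqr]


values = [0, 1, 2, 4, 5, 5, 7, 10, 10, 12, 13, 17, 39]
print(whisker_ends(values, 3, 12.5))      # (0, 17)
print(extreme_outliers(values, 3, 12.5))  # []
```

The upper extreme limit here is 12.5 + 3 x 9.5 = 41, so 39 is an outlier by the 1.5 x IQR rule but not an extreme outlier.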

Boxplots and Skew

  • Skewed distributions have more extreme values on one side, so a boxplot of a skewed distribution will have one whisker longer than the other.
  • There will also be more outliers on one side of the boxplot than the other.
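  • Boxplots drawn side by side also make it easy to compare groups. In the height example (figure not reproduced here):
  • There is some overlap between the groups.
  • In general, men are taller.
  • The variance is about the same.
  • Both distributions appear to be symmetric.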

What exactly IS an outlier?

  • It’s a value far from everything else, one that warrants special consideration apart from the rest of the data.
  • Often it’s a mistake in data entry. If we were recording a grade of 73%, mistyped it, and recorded 3% or 730%, either value would be far from the rest of the data and would indicate that the data is not being represented properly.