



These notes describe the IQR method for identifying outliers in a dataset. They discuss how to calculate the fences for potential and extreme outliers using the interquartile range (IQR) and the first (Q1) and third (Q3) quartiles, and they cover the significance of outliers and their potential impact on data analysis.
Typology: Study notes
We will look at two ways to identify outliers, the first of which we discuss now. This method, which we will call the IQR method, is based on the five-number summary; in particular, it uses the IQR together with the 1st and 3rd quartiles, Q1 and Q3.
To locate potential or suspected outliers, we need to calculate two values, sometimes called “fences.” These values are not necessarily data points; they simply define an interval, and values falling outside that interval are possible outliers. The two values are calculated by going beyond Q1 and Q3 by 1.5 times the IQR, where IQR = Q3 - Q1. In other words, we take Q1 minus 1.5 times the IQR and Q3 plus 1.5 times the IQR. Any observation falling outside those values (more toward the extremes) is a potential outlier.
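The fence calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names `iqr_fences` and `potential_outliers` and the sample data are made up for the example, and the quartile convention (the "inclusive" method of the standard library's `statistics.quantiles`) is an assumption. Textbooks and software packages compute quartiles in slightly different ways, so the fences may differ a little from a hand calculation.

```python
import statistics

def iqr_fences(data, k=1.5):
    """Return (lower_fence, upper_fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    # Assumption: quartiles use the "inclusive" convention; other
    # conventions can give slightly different Q1 and Q3.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr

def potential_outliers(data, k=1.5):
    """Return the observations that fall outside the fences."""
    lower, upper = iqr_fences(data, k)
    return [x for x in data if x < lower or x > upper]

# Hypothetical sample data, chosen only for illustration.
sample = [2, 4, 5, 6, 7, 8, 9, 10, 30]

print(iqr_fences(sample))          # (-1.0, 15.0)
print(potential_outliers(sample))  # [30]
```

Here Q1 = 5 and Q3 = 9, so the IQR is 4 and the fences sit at 5 - 6 = -1 and 9 + 6 = 15. Note that neither fence is a data point; only the observation 30 lies beyond them.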
If an outlier was produced by essentially the same process as the rest of the data, and if such extremes are expected to eventually occur again, then the outlier contains something important and interesting about the process, and it should be kept in the data.
If the outlier was produced under different conditions from the rest of the data (or by a different process), the outlier can be removed from the data if your goal is to investigate only the process that produced the rest of the data.
Identifying outliers is an important component of describing the distribution of one quantitative variable. Next, we will see that this process of identifying outliers is used when creating boxplots.