Procedure for Identifying and Handling Outliers in Data Sets

Procedure for Dealing With Outliers

Table of Critical Values
(1% significance level)

Number of Observations    Critical Value
 7                        2.10
 8                        2.22
 9                        2.32
10                        2.41
11                        2.48
12                        2.55
13                        2.61
14                        2.66

An outlier is defined as an observation or "data point" that does not appear to fall within the expected distribution for a particular data set. Outliers may be rejected outright if they are caused by a known or demonstrated physical reason, such as sample spillage, contamination, mechanical failure, or improper calibration. Data points that appear to deviate from the expected sample distribution for no known physical reason must be verified as outliers using statistical criteria.

Outliers can significantly alter the outcome of a method detection limit (MDL) calculation. Including outliers in an MDL calculation increases variability (a larger standard deviation), so an MDL calculated with outliers will be inaccurate and higher than the true detection limit. For this reason, it is important to recognize outliers and to reject them from the calculation. Because the procedure requires at least seven replicates, however, rejecting one of only seven sample results leaves too few data points to calculate an MDL.
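The effect of a single high result on the standard deviation can be sketched in Python. This is only an illustration: it borrows the eight replicate results from Example 1 of this document, and it assumes the usual MDL form of a one-sided 99% Student's t value multiplied by the sample standard deviation, which the text here does not spell out. The t values and the `mdl` helper are assumptions introduced for the sketch.

```python
from statistics import stdev

# Replicate results from Example 1 of this document; 11.9 is the suspect value.
replicates = [10.2, 9.5, 10.1, 10.3, 9.8, 9.9, 11.9, 10.0]
without_suspect = [x for x in replicates if x != 11.9]

# One-sided 99% Student's t values keyed by degrees of freedom (n - 1).
# Assumption: standard t-table values, not taken from this document.
T_99 = {6: 3.143, 7: 2.998}

def mdl(values):
    """Assumed MDL form: t(n - 1, 0.99) times the sample standard deviation."""
    return T_99[len(values) - 1] * stdev(values)

for label, values in (("with 11.9", replicates), ("without 11.9", without_suspect)):
    print(f"{label}: n = {len(values)}, s = {stdev(values):.3f}, MDL = {mdl(values):.2f}")
```

Under these assumptions the standard deviation drops from about 0.73 to about 0.27 once the suspect value is removed, and the computed MDL falls with it. Removing the value leaves exactly seven results, the minimum the procedure allows.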

For the MDL procedure, every data set is only a sample of the true population, and both the population mean (µ) and the population standard deviation (σ) are unknown. The expected distribution for MDL observations is most closely represented by a log-normal distribution, and only one-sided outliers should be expected. Due to the nature of the MDL procedure (low-level precision), most outliers will be high-sided, and the only test necessary will be a single-sided outlier test. A low-sided outlier could occur, but the result would be unusable because it would most often appear as a "no detect".

One method for determining single-sided outliers when both the population mean (µ) and the population standard deviation (σ) are unknown was described by Grubbs (F.E. Grubbs 1979) and is included in Standard Methods.

Tn = (Xn - Xave) / s    (high-sided outliers)

T1 = (Xave - X1) / s    (low-sided outliers)

Here Xn (or X1) is the data point in question, that is, the highest (or lowest) result; Xave is the sample mean; and s is the sample standard deviation. The value Tn (or T1) is then compared against a table of critical values. If it is greater than the critical value for the appropriate number of replicates at the 1% significance level, the questionable data point is an outlier, and it may be rejected. The critical values for various numbers of replicates at the 1% significance level are given in the Table of Critical Values above.
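The test described above can be sketched as a small Python function. This is a minimal sketch, not a definitive implementation: the function name `grubbs_one_sided`, its `side` parameter, and the dictionary `CRITICAL_1PCT` are names invented for illustration, and the critical values are copied from the Table of Critical Values in this document, so only 7 to 14 observations are supported.

```python
from statistics import mean, stdev

# One-sided critical values at the 1% significance level,
# copied from the Table of Critical Values in this document.
CRITICAL_1PCT = {7: 2.10, 8: 2.22, 9: 2.32, 10: 2.41,
                 11: 2.48, 12: 2.55, 13: 2.61, 14: 2.66}

def grubbs_one_sided(results, side="high"):
    """Return (T, critical value, is_outlier) for the most extreme result on one side."""
    n = len(results)
    if n not in CRITICAL_1PCT:
        raise ValueError("critical values are tabulated only for 7 to 14 observations")
    x_ave = mean(results)
    s = stdev(results)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("all results are identical; the statistic is undefined")
    if side == "high":
        t = (max(results) - x_ave) / s   # Tn = (Xn - Xave) / s
    elif side == "low":
        t = (x_ave - min(results)) / s   # T1 = (Xave - X1) / s
    else:
        raise ValueError("side must be 'high' or 'low'")
    critical = CRITICAL_1PCT[n]
    return t, critical, t > critical
```

The function always tests the single most extreme value on the chosen side, which matches the one-point-at-a-time wording of the procedure. Repeating the test after a rejection is a separate decision that the text does not address.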

Example 1: The following results were obtained for an MDL study: [10.2, 9.5, 10.1, 10.3, 9.8, 9.9, 11.9, 10.0], with Xave = 10.2 and s = 0.726. The analyst suspects 11.9 to be an outlier. Using the high-sided test:
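The calculation this example sets up can be checked with a few lines of Python. The sketch below recomputes Xave and s from the listed results and compares Tn with the tabulated critical value for eight observations (2.22). The variable names are chosen for illustration only.

```python
from statistics import mean, stdev

# Results listed in Example 1; 11.9 is the suspected high-sided outlier.
results = [10.2, 9.5, 10.1, 10.3, 9.8, 9.9, 11.9, 10.0]

x_ave = mean(results)             # sample mean, about 10.2
s = stdev(results)                # sample standard deviation, about 0.726
t_n = (max(results) - x_ave) / s  # high-sided statistic Tn = (Xn - Xave) / s
critical = 2.22                   # table value for 8 observations, 1% level

is_outlier = t_n > critical
print(f"Xave = {x_ave:.1f}, s = {s:.3f}, Tn = {t_n:.2f}, critical = {critical}")
print("11.9 is an outlier" if is_outlier else "11.9 is not an outlier")
```

With unrounded intermediate values, Tn comes out near 2.32. Working by hand from the rounded Xave = 10.2 and s = 0.726 gives about 2.34. Either way Tn exceeds 2.22, so by the stated criterion 11.9 qualifies as an outlier and may be rejected.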
