Anomaly Detection in R: Identifying Point and Collective Anomalies, Lecture notes of Statistics

This document, 'Anomaly Detection in R' by Alastair Rushworth on DataCamp, explains the concept of anomalies in data and provides methods for detecting point and collective anomalies using R. It covers the definition of anomalies, point anomalies, collective anomalies, visualizing anomalies with boxplots, Grubbs' test for detecting outliers, and the Seasonal-Hybrid Extreme Studentized Deviate (ESD) algorithm for detecting anomalies in seasonal time series data.

Typology: Lecture notes

2021/2022

Uploaded on 09/12/2022

kalia
DataCamp Anomaly Detection in R
What is an anomaly?
ANOMALY DETECTION IN R
Alastair Rushworth
Data Scientist

Defining the term anomaly

Anomaly: a data point or collection of data points that do not follow the same pattern or have the same structure as the rest of the data

Visualizing point anomalies with a boxplot

boxplot(temperature, ylab = "Celsius")
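The points a boxplot draws beyond its whiskers can also be pulled out numerically. The following is a minimal sketch using base R's boxplot.stats(); the temperature values are hypothetical and are not the course data.

```r
# Hypothetical daily temperatures in Celsius (illustrative values only)
temperature <- c(18, 19, 21, 20, 22, 19, 20, 21, 30, 20, 19, 18)

# boxplot.stats() returns the values lying beyond the whiskers,
# i.e. more than 1.5 times the box height away from the box
outlying_values <- boxplot.stats(temperature)$out

# Positions of those values in the original vector
outlying_index <- which(temperature %in% outlying_values)

print(outlying_values)
print(outlying_index)
```

The 1.5 multiplier is the default of boxplot() and boxplot.stats(); it can be changed through the coef argument.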

Collective anomaly

An anomalous collection of data instances

Unusual when considered together

Example: 10 consecutive high daily temperatures
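One possible way to look for such a run is with run-length encoding. This is only a sketch: the data, the threshold of 30 degrees, and the minimum run length of 10 days are assumptions chosen for illustration.

```r
# Hypothetical daily temperatures (illustrative values only)
temps <- c(20, 21, 19, 22, rep(31, 10), 20, 21, 32, 19)

threshold <- 30  # assumed cut-off for a "high" day
min_run   <- 10  # assumed minimum number of consecutive high days

# rle() compresses the TRUE/FALSE sequence into runs of equal values
runs      <- rle(temps > threshold)
run_end   <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only runs of high days that are long enough
is_collective <- runs$values & runs$lengths >= min_run
collective <- data.frame(start = run_start[is_collective],
                         end   = run_end[is_collective])
print(collective)
```

The single high day near the end of the series is not reported, because on its own it does not form a run of the required length.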

Testing the extremes with Grubbs' test

Visual assessment is not always reliable!

boxplot(temperature, ylab = "Celsius")

Checking normality with a histogram

Symmetrical & bell shaped?

hist(temperature, breaks = 6)

Running Grubbs' test

Use the grubbs.test() function from the outliers package:

grubbs.test(temperature)

	Grubbs test for one outlier

data:  temp
G = 3.07610, U = 0.41065, p-value = 0.
alternative hypothesis: highest value 30 is an outlier
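In the standard formulation of Grubbs' test, the G statistic is the largest absolute deviation from the sample mean, measured in units of the sample standard deviation. The sketch below computes it by hand in base R on hypothetical data, so the value will not match the output above.

```r
# Hypothetical daily temperatures in Celsius (illustrative values only)
temperature <- c(18, 19, 21, 20, 22, 19, 20, 21, 30, 20, 19, 18)

# Absolute distance of each point from the sample mean
deviations <- abs(temperature - mean(temperature))

# Grubbs' statistic: the largest deviation divided by the sample sd
G <- max(deviations) / sd(temperature)

# Position of the point that the test treats as the suspected outlier
suspect <- which.max(deviations)

print(G)
print(suspect)
```

Because G relies on the mean and standard deviation, the test assumes the data are approximately normal, which is why the histogram check comes first.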

Get the row index of an outlier

Location of the maximum

which.max(weights)

[1] 5

Location of the minimum

which.min(temperature)

[1] 12
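Once the position is known, it can be used to inspect or drop the suspected point. A short sketch on hypothetical data (the values are not from the course):

```r
# Hypothetical daily temperatures in Celsius (illustrative values only)
temperature <- c(18, 19, 21, 20, 22, 19, 20, 21, 30, 20, 19, 18)

# Position and value of the largest observation
idx <- which.max(temperature)
suspect_value <- temperature[idx]

# Negative indexing returns the vector without that observation
cleaned <- temperature[-idx]

print(idx)
print(suspect_value)
```

Note that which.max() and which.min() return only the first position when the extreme value occurs more than once.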

Monthly revenue data

Grubbs' test not appropriate here

Seasonality may be present

May be multiple anomalies

head(msales)

  sales month
1 6.068     1
2 5.966     2
3 6.133     3
4 6.230     4
5 6.407     5
6 6.433     6

Visualizing monthly revenue

plot(sales ~ month, data = msales, type = 'o')

Seasonal-Hybrid ESD algorithm output

sales_ad <- AnomalyDetectionVec(x = msales$sales, period = 12, direction = 'both')

sales_ad$anoms

  index anoms
1    14    1.
2   108    2.
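The general idea behind a seasonal, robust approach can be illustrated in base R: remove the seasonal pattern, then score what is left using the median and MAD rather than the mean and standard deviation. The sketch below is a simplified illustration and not the implementation used by AnomalyDetectionVec(); the simulated series, the injected spikes at positions 30 and 90, and the cut-off of 3 are all assumptions.

```r
# Simulated monthly series with a 12-month seasonal pattern (illustrative only)
set.seed(1)
n     <- 120
month <- seq_len(n)
sales <- 6 + 0.5 * sin(2 * pi * month / 12) + rnorm(n, sd = 0.05)
sales[c(30, 90)] <- sales[c(30, 90)] + 1  # inject two artificial anomalies

# Step 1: estimate the seasonal component with STL (period = 12)
sales_ts <- ts(sales, frequency = 12)
seasonal <- stl(sales_ts, s.window = "periodic")$time.series[, "seasonal"]

# Step 2: remove the seasonal component and the median level of the series
resid <- as.numeric(sales_ts) - as.numeric(seasonal) - median(sales)

# Step 3: robust scores based on the median and MAD of the residuals
robust_z <- abs(resid - median(resid)) / mad(resid)

# Flag points above an assumed cut-off of 3; the full algorithm instead
# applies an iterative generalized ESD test rather than a fixed cut-off
flagged <- which(robust_z > 3)
print(flagged)
```

The flagged positions should include the two injected spikes; a fixed cut-off may also flag a few ordinary points, which is one reason the real algorithm uses a formal test.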

Seasonal-Hybrid ESD algorithm plot

AnomalyDetectionVec(x = msales$sales, period = 12, direction = 'both', plot = TRUE)