Association Rules & Frequent Itemsets
All you ever wanted to know about diapers,
beers and their correlation!
The Market-Basket Problem
- Given a database of transactions, find rules that will
predict the occurrence of an item based on the
occurrences of other items in the transaction
Market-Basket transactions
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Example of Association Rules
{Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}
Implication here means co-occurrence, not causality!
The Market-Basket Problem
Given a database of transactions
where each transaction is a collection of items (purchased by a customer in a visit)
find all rules that correlate the presence of one set of
items with that of another set of items
Example: 30% of all transactions that contain diapers also contain
beer; 5% of all transactions contain both items
- 30%: confidence of the rule
- 5%: support of the rule
We are interested in finding all rules,
rather than verifying that a particular rule holds
Applications of Market-Basket Analysis
- Supermarkets
- Placement
- Advertising
- Sales
- Coupons
- Many applications outside market basket data analysis
- Prediction (telecom switch failure)
- Web usage mining
- Many different types of association rules
Definition: Frequent Itemset
- Itemset
- A collection of one or more items
- Example: {Milk, Bread, Diaper}
- k-itemset
- An itemset that contains k items
- Support count (σ)
- Frequency of occurrence of an itemset
- E.g. σ({Milk, Bread, Diaper}) = 2
- Support
- Fraction of transactions that contain an itemset
- E.g. s({Milk, Bread, Diaper}) = 2/5
- Frequent Itemset
- An itemset whose support is greater than or equal to a minsup threshold
Definition: Association Rule
- Association Rule
- An implication expression of the form X → Y, where X and Y are itemsets
- Example: {Milk, Diaper} → {Beer}
- Rule Evaluation Metrics
- Support (s)
- Fraction of transactions that contain both X and Y
- Confidence (c)
- Measures how often items in Y appear in transactions that contain X
Example: for {Milk, Diaper} → {Beer} over the transactions above,
s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
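To make the two metrics concrete, here is a minimal Python sketch (the transaction list and helper names are illustrative, not part of the lecture) that reproduces the numbers above:

# Support and confidence of a rule X -> Y over the example market baskets
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(X, Y, transactions):
    """Return (support, confidence) of the rule X -> Y."""
    s = support_count(X | Y, transactions) / len(transactions)
    c = support_count(X | Y, transactions) / support_count(X, transactions)
    return s, c

print(rule_metrics({"Milk", "Diaper"}, {"Beer"}, transactions))  # (0.4, 0.666...)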
Aspects of Association Rule Mining
- How do we generate rules fast?
- Performance measured in
- Number of database scans
- Number of itemsets that must be counted
- Which are the interesting rules?
Association Rule Mining Task
- Given a set of transactions T, the goal of
association rule mining is to find all rules
having
- support ≥ minsup threshold
- confidence ≥ minconf threshold
- Brute-force approach:
- List all possible association rules
- Compute the support and confidence for each rule
- Prune rules that fail the minsup and minconf
thresholds
⇒ Computationally prohibitive!
Mining Association Rules
Example of Rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)
Observations:
- All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
- Rules originating from the same itemset have identical support but can have different confidence
- Thus, we may decouple the support and confidence requirements
Finding Association Rules
Two-step approach:
1. Frequent Itemset Generation
- Generate all itemsets whose support ≥ minsup
2. Rule Generation
- Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
- Frequent itemset generation is still
computationally expensive
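Step 2 is comparatively cheap once the frequent itemsets and their support counts are known; the sketch below (illustrative Python, names not from the slides) shows how each rule follows from one binary partition of a frequent itemset:

from itertools import combinations

def generate_rules(frequent, minconf):
    """frequent: dict mapping frozenset(itemset) -> support count (output of step 1).
    Yields (antecedent, consequent, confidence) for every rule meeting minconf."""
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for X in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so frequent[X] exists
                conf = count / frequent[X]
                if conf >= minconf:
                    yield X, itemset - X, conf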
Frequent Itemset Generation
(Figure: the itemset lattice over the items A, B, C, D, E, from the null set at the top through all 1-, 2-, 3- and 4-itemsets down to ABCDE.)
Given d items, there are 2^d possible candidate itemsets
Frequent Itemset Generation
- Brute-force approach:
- Each itemset in the lattice is a candidate frequent itemset
- Count the support of each candidate by scanning the database
- Match each transaction against every candidate
- Complexity ~ O(NMw) => Expensive since M = 2^d !!!
(Figure: the N transactions of the database, each of width at most w, are matched against the list of M candidate itemsets.)
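A direct Python translation of the brute-force approach (illustrative, usable only on toy data) makes the cost visible: it enumerates all 2^d candidates and performs one database scan per candidate.

from itertools import combinations

def brute_force_frequent(transactions, minsup):
    """transactions: list of sets; minsup: fraction. Returns {itemset: support count}."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        for cand in map(frozenset, combinations(items, k)):      # M = 2^d candidates
            count = sum(1 for t in transactions if cand <= t)    # one full scan each
            if count / n >= minsup:
                frequent[cand] = count
    return frequent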
The Apriori Algorithm
- Join Step: Ck is generated by joining Lk-1 with itself
- Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
- Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with support ≥ min_support;
end
return ∪k Lk;
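The pseudo-code translates almost line for line into Python; the following is a minimal, unoptimized sketch (function and variable names are illustrative):

from itertools import combinations

def apriori(transactions, minsup_count):
    """transactions: iterable of item collections; returns {frequent itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= minsup_count}
    frequent = dict(Lk)
    k = 2
    while Lk:
        prev = set(Lk)
        # Join step, with Apriori pruning: every (k-1)-subset must be frequent
        Ck = {a | b for a in prev for b in prev
              if len(a | b) == k
              and all(frozenset(s) in prev for s in combinations(a | b, k - 1))}
        # One scan of the database counts all candidates contained in each transaction
        counts = {c: 0 for c in Ck}
        for t in transactions:
            for c in Ck:
                if c <= t:
                    counts[c] += 1
        Lk = {s: c for s, c in counts.items() if c >= minsup_count}
        frequent.update(Lk)
        k += 1
    return frequent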
Apriori Algorithm from Agrawal et al. (1993)
Apriori Algorithm Example (s = 50%)
Database D (minsup = 50%, i.e. support count ≥ 2):
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D → C1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
L1: {1}: 2, {2}: 3, {3}: 3, {5}: 3
Generate C2 from L1: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D → C2 counts: {1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2
L2: {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2
Generate C3 from L2: {2 3 5}
Scan D → C3 count: {2 3 5}: 2
L3: {2 3 5}: 2
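Running the Python sketch from the previous slide on this database (assuming the apriori function defined there) reproduces the trace:

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(D, minsup_count=2)
# L1: {1}: 2, {2}: 3, {3}: 3, {5}: 3
# L2: {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2
# L3: {2 3 5}: 2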
Algorithm to Guess Itemsets
- Naïve way:
- Extend all itemsets with all possible items
- More sophisticated:
- Join Lk-1 with itself, adding only a single, final item
e.g.: {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4} produces {1 2 3 4} and {1 3 4 5}
- Remove itemsets with an unsupported subset
e.g.: {1 3 4 5} has an unsupported subset, {1 4 5}, if minsup = 50%
- Use the database to further refine Ck
Apriori: How to Generate Candidates?
STEP 1: Self-join operation
STEP 2: Subset filtering
How to Count Supports of Candidates?
- Why is counting the supports of candidates a problem?
- The total number of candidates can be huge
- One transaction may contain many candidates
- Method:
- Candidate itemsets are stored in a hash-tree
- Leaf node of hash-tree contains a list of itemsets and counts
- Interior node contains a hash table
- Subset function: finds all the candidates contained in a transaction
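The sketch below illustrates the subset function with an ordinary Python dictionary standing in for the hash tree; it is a simplification of the idea, not the hash-tree data structure itself:

from itertools import combinations

def count_candidates(transactions, candidates, k):
    """candidates: collection of frozensets of size k. Returns {candidate: count}.
    Each transaction's k-subsets are generated and looked up by hashing; the real
    hash tree avoids enumerating subsets that cannot match any stored candidate."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        if len(t) < k:
            continue
        for subset in map(frozenset, combinations(sorted(t), k)):
            if subset in counts:
                counts[subset] += 1
    return counts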
Example of Generating Candidate Itemsets
- L3 = {abc, abd, acd, ace, bcd}
- Self-joining: L3 * L3
- abcd from abc and abd
- acde from acd and ace
- Pruning based on the Apriori principle:
- acde is removed because ade is not in L3
- C4 = {abcd}
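A small Python sketch of the two candidate-generation steps, join then prune, using the example above as a check (itemsets are represented as sorted tuples; names are illustrative):

from itertools import combinations

def apriori_gen(L_prev, k):
    """L_prev: set of frequent (k-1)-itemsets as sorted tuples. Returns Ck."""
    candidates = set()
    for a in L_prev:
        for b in L_prev:
            # Join step: merge two (k-1)-itemsets that agree on their first k-2 items
            if a[:k-2] == b[:k-2] and a[k-2] < b[k-2]:
                cand = a + (b[k-2],)
                # Prune step: every (k-1)-subset of the candidate must be frequent
                if all(s in L_prev for s in combinations(cand, k-1)):
                    candidates.add(cand)
    return candidates

L3 = {('a','b','c'), ('a','b','d'), ('a','c','d'), ('a','c','e'), ('b','c','d')}
print(apriori_gen(L3, 4))  # {('a','b','c','d')}; acde is pruned because ade is not in L3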
Run Time of Apriori
- k passes over data where k is the size of the
largest candidate itemset
- Memory chunking algorithm ⇒ 2 passes over data on disk, but multiple passes in memory
- Toivonen 1996 gives a statistical (sampling) technique which requires 1 + ε passes (but more memory)
- Brin 1997, Dynamic Itemset Counting ⇒ 1 + ε passes (less memory)
Methods to Improve Apriori’s Efficiency
- Hash-based itemset counting: A k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent
- Transaction reduction: A transaction that does not contain any frequent k-itemset is useless in subsequent scans
- Partitioning: Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB (see the sketch after this list)
- Sampling: mining on a subset of given data
- lower support threshold
- a method to determine the completeness
- Dynamic itemset counting: add new candidate itemsets only
when all of their subsets are estimated to be frequent
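As one illustration, here is a hedged Python sketch of the partitioning idea; the helper name and parameters are assumptions, and any frequent-itemset miner (e.g. the apriori sketch above) can serve as the local mining step:

def partition_mine(transactions, minsup_fraction, n_partitions, mine):
    """Two-scan partitioning sketch: an itemset frequent in the whole database must be
    locally frequent in at least one partition, so the union of the local results is a
    superset of the true answer, which a final scan then verifies."""
    n = len(transactions)
    size = -(-n // n_partitions)                      # ceiling division
    parts = [transactions[i:i + size] for i in range(0, n, size)]
    # Scan 1: mine each (in-memory) partition with a proportional local threshold
    candidates = set()
    for p in parts:
        local_min = max(1, int(minsup_fraction * len(p)))
        candidates |= set(mine(p, local_min))         # mine returns {itemset: count}
    # Scan 2: count the surviving candidates over the whole database
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c: cnt for c, cnt in counts.items() if cnt >= minsup_fraction * n}

# e.g. partition_mine([frozenset(t) for t in D], 0.5, 2, apriori)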
Is Apriori Fast Enough? — Performance Bottlenecks
- The core of the Apriori algorithm:
- Use frequent (k–1)-itemsets to generate candidate frequent k-itemsets
- Use database scan and pattern matching to collect counts for the candidate itemsets
- The bottleneck of Apriori: candidate generation
- Huge candidate sets:
- 10^4 frequent 1-itemsets will generate 10^7 candidate 2-itemsets
- To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates.
- Multiple scans of database:
- Needs (n + 1) scans, where n is the length of the longest pattern