











Detailed information about cluster analysis: what cluster analysis is, types of data in cluster analysis, partitioning methods, hierarchical methods, and density-based methods.
Typology: Study notes
November 25, 2014
Data Mining: Concepts and 1
10.Constraint-Based Clustering
11.Outlier Analysis
12.Summary
Data Matrix (object-by-variable structure)
n records, each with p attributes
n-by-p matrix structure (two mode)
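$x_{ab}$ is the entry for the a-th record and b-th attribute:

$$
\begin{bmatrix}
x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\
\vdots &        & \vdots &        & \vdots \\
x_{n1} & \cdots & x_{nf} & \cdots & x_{np}
\end{bmatrix}
$$

Rows correspond to record 1, ..., record i, ..., record n; columns correspond to the attributes.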
Type of data in clustering analysis
[Figure: height scale and weight scale. The same absolute measurement can be expressed on different scales: heights on a metre or foot scale, weights on a kilogram or pound scale (axis ticks from 20kg to 120kg).]
Similarity and Dissimilarity Between Objects
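The similarity or dissimilarity between two data
objects is measured by a distance. One such measure is the Minkowski distance:

$$
d(i, j) = \sqrt[q]{|x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \cdots + |x_{ip} - x_{jp}|^q}
$$

where $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$
are two p-dimensional data objects, and q is a
positive integer.

If q = 1, d is the Manhattan distance:

$$
d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|
$$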
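As a minimal sketch of the distance defined on this slide, the function below computes the distance of order q between two objects given as equal-length sequences. The function name and the two sample objects are hypothetical and chosen only for illustration; q = 1 gives the Manhattan distance and q = 2 the Euclidean distance.

```python
def minkowski_distance(x, y, q):
    """Distance of order q between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    if q < 1:
        raise ValueError("q must be a positive integer")
    # Sum the q-th powers of the absolute differences, then take the q-th root.
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


# Hypothetical 2-dimensional objects (not taken from the slides).
i = (1.0, 2.0)
j = (4.0, 6.0)

manhattan = minkowski_distance(i, j, 1)  # |1 - 4| + |2 - 6|
euclidean = minkowski_distance(i, j, 2)  # square root of (3^2 + 4^2)
print(manhattan, euclidean)
```

Keeping q as a parameter means one function covers both special cases instead of maintaining two separate implementations.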
Similarity and Dissimilarity Between Objects (Cont.)
If q = 2, d is the Euclidean distance:

$$
d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}
$$
Dissimilarity between Binary Variables
to 0
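$$
\begin{aligned}
d(\text{jack}, \text{mary}) &= \frac{0 + 1}{2 + 0 + 1} = 0.33 \\
d(\text{jack}, \text{jim}) &= \frac{1 + 1}{1 + 1 + 1} = 0.67 \\
d(\text{jim}, \text{mary}) &= \frac{1 + 2}{1 + 1 + 2} = 0.75
\end{aligned}
$$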
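The three values 0.33, 0.67 and 0.75 are consistent with a dissimilarity of the form (b + c) / (a + b + c), where a counts attributes that are 1 for both objects and b and c count attributes that are 1 for only one of them. The sketch below assumes that form; the formula itself and the three binary vectors are assumptions, since the underlying data table is not part of this text. The vectors were chosen only so that the counts match the computations above.

```python
def asymmetric_binary_dissimilarity(u, v):
    """(b + c) / (a + b + c) for two equal-length 0/1 vectors (assumed form)."""
    a = sum(1 for s, t in zip(u, v) if s == 1 and t == 1)  # both 1
    b = sum(1 for s, t in zip(u, v) if s == 1 and t == 0)  # only u is 1
    c = sum(1 for s, t in zip(u, v) if s == 0 and t == 1)  # only v is 1
    if a + b + c == 0:
        return 0.0  # no attribute is 1 in either object
    return (b + c) / (a + b + c)


# Hypothetical 0/1 encodings, picked to reproduce the counts used above.
jack = [1, 0, 1, 0, 0, 0]
mary = [1, 0, 1, 0, 1, 0]
jim = [1, 1, 0, 0, 0, 0]

d_jack_mary = asymmetric_binary_dissimilarity(jack, mary)
d_jack_jim = asymmetric_binary_dissimilarity(jack, jim)
d_jim_mary = asymmetric_binary_dissimilarity(jim, mary)
print(round(d_jack_mary, 2), round(d_jack_jim, 2), round(d_jim_mary, 2))
```

Attributes that are 0 for both objects are ignored, which is why the denominator is smaller than the number of attributes.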
Nominal / Categorical Variables
take more than 2 states, e.g., red, yellow, blue, green
nominal states
$$
d(i, j) = \frac{p - m}{p}
$$

where m is the number of matches and p is the total number of variables.
Nominal/ Categorical Example
attribute) test-1 are available, which is
categorical. Since here we have one categorical
variable, test-1, we set p = 1 in the equation for d(i, j), so that d(i, j)
evaluates to 0 if objects i and j match, and 1 if
the objects differ. Thus we get the following dissimilarity matrix:
0
1 0
1 1 0
0 1 1 0
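The matrix above can be reproduced with a short sketch of the simple matching rule d(i, j) = (p - m) / p, where m is the number of matching variables. The category labels "A", "B", "C" are hypothetical stand-ins for the test-1 values, which do not appear in this text; they were chosen only so that objects 1 and 4 match and all other pairs differ.

```python
def nominal_dissimilarity(obj_i, obj_j):
    """Simple matching: (p - m) / p over p categorical variables."""
    p = len(obj_i)
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)  # number of matches
    return (p - m) / p


def lower_triangular_matrix(objects, dist):
    """Row r holds dist(object r, object c) for c = 0..r."""
    return [[dist(objects[r], objects[c]) for c in range(r + 1)]
            for r in range(len(objects))]


# Hypothetical test-1 labels for objects 1 to 4 (p = 1 variable each).
objects = [("A",), ("B",), ("C",), ("A",)]

matrix = lower_triangular_matrix(objects, nominal_dissimilarity)
for row in matrix:
    print(" ".join(str(int(v)) for v in row))
```

With p = 1 every entry is either 0 (match) or 1 (mismatch); with more variables the same function returns fractions between 0 and 1.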
Ordinal Variables
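replacing the i-th object in the f-th variable by

$$
z_{if} = \frac{r_{if} - 1}{M_f - 1}, \qquad r_{if} \in \{1, \ldots, M_f\}
$$

where $r_{if}$ is the rank of the i-th object in the f-th variable and $M_f$ is the number of ordered states of that variable
interval-scaled variables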
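A minimal sketch of this rank normalization follows. It maps each ordinal value to its rank r in {1, ..., M_f} and then to z = (r - 1) / (M_f - 1), so the result lies in [0, 1] and can be handled like an interval-scaled variable. The state names and sample values are hypothetical.

```python
def normalize_ranks(values, ordered_states):
    """Map ordinal values onto [0, 1] via z = (r - 1) / (M_f - 1)."""
    m_f = len(ordered_states)
    if m_f < 2:
        raise ValueError("need at least two ordered states")
    # Rank 1 is the lowest state, rank M_f the highest.
    rank = {state: r for r, state in enumerate(ordered_states, start=1)}
    return [(rank[v] - 1) / (m_f - 1) for v in values]


# Hypothetical ordinal variable with M_f = 3 ordered states.
states = ["fair", "good", "excellent"]
values = ["excellent", "fair", "good", "excellent"]

z = normalize_ranks(values, states)
print(z)
```

After this step any of the distances for interval-scaled variables can be applied to z.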
Ratio-Scaled Variables
nonlinear scale, approximately at exponential
scale, such as $Ae^{Bt}$ or $Ae^{-Bt}$
good choice! (why?—the scale can be distorted)
$y_{if} = \log(x_{if})$
rank as interval-scaled
Ratio-Scaled Variables Example
available, which is a ratio-scaled variable. Let's try a logarithmic
transformation. Taking the log of test-3 results in the values 2.65,
1.34, 2.21 and 3.08 for the objects 1 to 4, respectively. Using
Euclidean distance on the transformed values, we obtain the
following dissimilarity matrix:
0
1.31 0
0.44 0.87 0
0.43 1.74 0.87 0
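The matrix can be checked with a few lines of code. The sketch below starts from the four log-transformed values quoted in the text; with a single variable, the Euclidean distance reduces to the absolute difference. The function name is hypothetical, and the entries are rounded to two decimals to match the presentation above.

```python
def lower_triangular_dissimilarity(values):
    """Pairwise |v_r - v_c| for c <= r, rounded to two decimals."""
    return [[round(abs(values[r] - values[c]), 2) for c in range(r + 1)]
            for r in range(len(values))]


# Log-transformed test-3 values for objects 1 to 4, as given in the text.
transformed = [2.65, 1.34, 2.21, 3.08]

matrix = lower_triangular_dissimilarity(transformed)
for row in matrix:
    print(" ".join(f"{v:g}" for v in row))
```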
Vector Objects
arrays, etc.
Using the above equation, the similarity between x and y is

$$
s(x, y) = \frac{0 + 1 + 0 + 0}{\sqrt{2}\,\sqrt{2}} = 0.5
$$
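The equation referred to is not reproduced in this text. The sketch below assumes it is the cosine measure, s(x, y) = (x · y) / (|x| |y|), which is consistent with a sum of products 0 + 1 + 0 + 0. The two vectors are hypothetical: they were chosen so that the element-wise products are 0, 1, 0, 0 and each vector has two non-zero entries.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical vectors whose element-wise products are 0, 1, 0, 0.
x = (1, 1, 0, 0)
y = (0, 1, 1, 0)

s = cosine_similarity(x, y)
print(round(s, 2))
```

The measure depends only on the angle between the vectors, so scaling either vector by a positive constant leaves the result unchanged.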