Chapter 6. Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Rule-based classification
- Classification by back propagation
- Support Vector Machines (SVM)
- Associative classification
- Lazy learners (or learning from your neighbors)
- Other classification methods
- Prediction
- Accuracy and error measures
- Ensemble methods
- Model selection
- Summary
Associative Classification
- Associative classification
- Association rules are generated and analyzed for use in classification
- Search for strong associations between frequent patterns (conjunctions of attribute-value pairs) and class labels
- Classification: based on evaluating a set of rules of the form p1 ∧ p2 ∧ … ∧ pl → "Aclass = C" (conf, sup) (see the sketch after this list)
- Why effective?
- It explores highly confident associations among multiple attributes and may overcome some constraints introduced by decision-tree induction, which considers only one attribute at a time
- In many studies, associative classification has been found to be more accurate than some traditional classification methods, such as C4.5
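To make the rule form above concrete, here is a brute-force Python sketch of the mine-then-classify idea (not CBA or CMAR themselves): enumerate short conjunctions of attribute-value pairs, keep those that pass support and confidence thresholds as class rules, and classify a tuple by the highest-ranked matching rule. All names, thresholds, and data are illustrative.

```python
from itertools import combinations
from collections import Counter

# Brute-force associative-classification sketch: enumerate short
# conjunctions of (attribute, value) pairs, keep those whose support
# and confidence clear the thresholds as class rules, and classify by
# the highest-ranked matching rule. Real systems use frequent-pattern
# mining instead of enumeration.

def mine_rules(records, labels, min_sup=0.2, min_conf=0.7, max_len=2):
    n = len(records)
    items = {(a, v) for r in records for a, v in r.items()}
    rules = []                                  # (antecedent, class, conf, sup)
    for length in range(1, max_len + 1):
        for combo in combinations(sorted(items), length):
            ante = frozenset(combo)
            covered = [i for i, r in enumerate(records)
                       if ante <= set(r.items())]
            sup = len(covered) / n
            if sup < min_sup:
                continue
            cls, hits = Counter(labels[i] for i in covered).most_common(1)[0]
            conf = hits / len(covered)
            if conf >= min_conf:
                rules.append((ante, cls, conf, sup))
    rules.sort(key=lambda r: (-r[2], -r[3]))    # rank by confidence, then support
    return rules

def classify(rules, record, default=None):
    for ante, cls, conf, sup in rules:
        if ante <= set(record.items()):
            return cls                          # first matching rule wins
    return default

# Example with made-up data:
records = [{"age": "young", "income": "high"},
           {"age": "young", "income": "low"},
           {"age": "old",   "income": "high"}]
labels = ["buys", "no", "buys"]
print(classify(mine_rules(records, labels),
               {"age": "young", "income": "high"}))   # -> "buys"
```

Production associative classifiers replace the enumeration with frequent-pattern mining (e.g., FP-growth) and apply far more aggressive rule pruning, as CMAR does below.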
A Closer Look at CMAR
- CMAR (Classification based on Multiple Association Rules; Li, Han, Pei, ICDM'01)
- Efficiency: uses an enhanced FP-tree that maintains the distribution of class labels among tuples satisfying each frequent itemset
- Rule pruning whenever a rule is inserted into the tree
  - Given two rules R1 and R2, if the antecedent of R1 is more general than that of R2 and conf(R1) ≥ conf(R2), then R2 is pruned
  - Prunes rules for which the rule antecedent and class are not positively correlated, based on a χ² test of statistical significance
- Classification based on generated/pruned rules
- If only one rule satisfies tuple X, assign X the class label of that rule
- If a rule set S satisfies X, CMAR
  - divides S into groups according to class labels
  - uses a weighted χ² measure to find the strongest group of rules, based on the statistical correlation of rules within a group
  - assigns X the class label of the strongest group (a simplified sketch follows)
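A simplified sketch of this final classification step, assuming each rule carries a χ² value precomputed during rule generation. The actual weighted χ² measure in the CMAR paper also normalizes each rule's χ² by its upper bound (max χ²); that normalization is omitted here for brevity.

```python
from collections import defaultdict

# Simplified sketch of CMAR's classification step: collect the rules
# that match tuple x, group them by class label, score each group, and
# assign the label of the strongest group. Rules are (antecedent,
# label, chi2) triples; the paper's weighted chi-square additionally
# normalizes each chi2 by its upper bound, which is omitted here.

def cmar_classify(rules, x):
    groups = defaultdict(list)
    for ante, label, chi2 in rules:
        if ante <= set(x.items()):             # rule antecedent matches x
            groups[label].append(chi2)
    if not groups:
        return None                            # no rule covers x
    score = {label: sum(c * c for c in chi2s)  # favor strongly correlated rules
             for label, chi2s in groups.items()}
    return max(score, key=score.get)
```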
Associative Classification May Achieve High Accuracy and Efficiency (Cong et al., SIGMOD'05)
Lazy vs. Eager Learning
- Lazy vs. eager learning
- Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple
- Eager learning (the methods discussed above): given a training set, constructs a classification model before receiving new (e.g., test) data to classify
- Lazy: less time in training but more time in predicting
- Accuracy
- A lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function
- Eager: must commit to a single hypothesis that covers the entire instance space
Lazy Learner: Instance-Based Methods
- Instance-based learning:
- Store training examples and delay the processing ("lazy evaluation") until a new instance must be classified
- Typical approaches
  - k-nearest neighbor approach: instances represented as points in a Euclidean space
  - Locally weighted regression: constructs a local approximation
  - Case-based reasoning: uses symbolic representations and knowledge-based inference
The k-Nearest Neighbor Classifiers
- For categorical attributes, we compare the corresponding value of the attribute in tuple X1 with that in tuple X2: if the two are identical, the difference between them is taken as 0; if they differ, the difference is taken as 1 (sketched in code below)
- Missing values are handled with the same notion of difference: a missing value is typically assumed to be maximally different from the value it is compared against
- A good value of k can be determined experimentally: use a test set to estimate the error rate of the classifier for different values of k, and select the k that gives the minimum error rate
- Nearest-neighbor classifiers use distance-based comparisons that intrinsically assign equal weight to each attribute
- They can therefore suffer from poor accuracy when given noisy or irrelevant attributes
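A small sketch of a k-NN classifier using the distance conventions just described; the attribute names, heap-based neighbor search, and the assumption that numeric attributes are pre-normalized to [0, 1] are all illustrative choices.

```python
import heapq
from collections import Counter

# k-NN sketch using the conventions above: categorical attributes
# differ by 0 (identical) or 1 (different), numeric attributes by
# their absolute difference (assumed pre-normalized to [0, 1]), and
# a missing value is treated as maximally different (difference 1).

def attr_diff(a, b):
    if a is None or b is None:
        return 1.0                        # missing value: assume max difference
    if isinstance(a, (int, float)):
        return abs(a - b)                 # numeric attribute
    return 0.0 if a == b else 1.0         # categorical attribute

def distance(x1, x2):
    # Euclidean distance over the attributes of x1 (dicts of attr -> value)
    return sum(attr_diff(v, x2.get(k)) ** 2 for k, v in x1.items()) ** 0.5

def knn_classify(train, x, k=3):
    # train: list of (attribute_dict, class_label) pairs
    nearest = heapq.nsmallest(k, train, key=lambda t: distance(x, t[0]))
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```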
Case-Based Reasoning (CBR)
- CBR: uses a database of problem solutions to solve new problems
- Stores symbolic descriptions (tuples or cases), not points in a Euclidean space
- Applications: customer service (product-related diagnosis), legal rulings
- Methodology
  - Instances represented by rich symbolic descriptions (e.g., function graphs)
  - Search for similar cases; multiple retrieved cases may be combined
  - Tight coupling between case retrieval, knowledge-based reasoning, and problem solving
- Challenges
  - Finding a good similarity metric
  - Indexing based on a syntactic similarity measure and, on failure, backtracking and adapting to additional cases
Genetic Algorithms (GA)
- Genetic Algorithm: based on an analogy to biological evolution
- An initial population is created, consisting of randomly generated rules
  - Each rule is represented by a string of bits
  - E.g., "IF A1 AND NOT A2 THEN C2" can be encoded as 100
  - If an attribute has k > 2 values, k bits can be used
- Based on the notion of survival of the fittest, a new population is formed consisting of the fittest rules and their offspring
  - The fitness of a rule is represented by its classification accuracy on a set of training examples
  - Offspring are generated by crossover and mutation
- The process continues until a population P evolves in which each rule in P satisfies a prespecified fitness threshold
- Slow, but easily parallelizable
Genetic Algorithms
- A Genetic Algorithm (GA) is a computational model consisting of five parts (sketched below):
  - A starting set of individuals, P
  - Crossover: technique to combine two parents to create offspring
  - Mutation: randomly change an individual
  - Fitness: determine the best individuals
  - An algorithm that applies the crossover and mutation techniques to P iteratively, using the fitness function to determine the best individuals in P to keep
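A minimal sketch wiring these five parts together for bit-string rules like the "100" encoding above (attribute bits followed by a class bit). The population size, mutation rate, and fitness threshold are illustrative choices, not values prescribed by the text.

```python
import random

# Minimal GA sketch: rules are fixed-length bit strings whose last bit
# is the predicted class; fitness is a rule's accuracy on the training
# samples it covers; each generation keeps the fittest half and refills
# the population with crossover + mutation offspring.

def fitness(rule, samples):
    # samples: list of (attribute_bits, class_bit) pairs, e.g. ("10", "0")
    covered = [cls for bits, cls in samples if bits == rule[:-1]]
    if not covered:
        return 0.0
    return sum(1 for cls in covered if cls == rule[-1]) / len(covered)

def crossover(p1, p2):
    point = random.randrange(1, len(p1))      # single-point crossover
    return p1[:point] + p2[point:]

def mutate(rule, rate=0.1):
    # flip each bit independently with probability `rate`
    return "".join(str(1 - int(b)) if random.random() < rate else b
                   for b in rule)

def evolve(samples, pop_size=20, rule_len=3, generations=50, threshold=0.9):
    pop = ["".join(random.choice("01") for _ in range(rule_len))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: fitness(r, samples), reverse=True)
        if all(fitness(r, samples) >= threshold for r in pop):
            break                             # every rule meets the threshold
        elite = pop[:pop_size // 2]           # survival of the fittest
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in range(pop_size - len(elite))]
    return pop
```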
Rough Set Approach
- Rough sets are used to approximately, or "roughly", define equivalence classes
- A rough set for a given class C is approximated by two sets: a lower approximation (tuples certain to belong to C) and an upper approximation (tuples that cannot be described as not belonging to C); see the sketch below
- Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix (which stores the differences between attribute values for each pair of data tuples) can be used to reduce the computational intensity
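A small sketch of the two approximations, assuming tuples are attribute dictionaries and the indiscernibility relation is induced by a chosen attribute subset; the function name and data layout are illustrative.

```python
from collections import defaultdict

# Rough-set sketch: group tuples into equivalence classes by their
# values on a chosen attribute subset (the indiscernibility relation).
# An equivalence class belongs to the lower approximation of C if it
# lies entirely inside C, and to the upper approximation if it merely
# overlaps C.

def approximations(records, labels, attrs, target_class):
    classes = defaultdict(list)                # equivalence classes
    for i, r in enumerate(records):
        classes[tuple(r[a] for a in attrs)].append(i)
    lower, upper = set(), set()
    for members in classes.values():
        hits = sum(1 for i in members if labels[i] == target_class)
        if hits:                               # overlaps C: possibly in C
            upper.update(members)
            if hits == len(members):           # entirely in C: certainly in C
                lower.update(members)
    return lower, upper
```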
Fuzzy Set Approaches
- Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (e.g., via a fuzzy membership graph)
- Attribute values are converted to fuzzy values
  - E.g., income is mapped into the discrete categories {low, medium, high}, with fuzzy membership values calculated for each (see the sketch below)
- For a given new sample, more than one fuzzy value may apply
- Each applicable rule contributes a vote for membership in the categories
- Typically, the truth values for each predicted category are summed, and these sums are combined
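A small sketch of the fuzzification step for income, using triangular membership functions; the category breakpoints (20k/40k/60k) are invented for illustration.

```python
# Fuzzification sketch: triangular membership functions map a numeric
# income to degrees of membership in {low, medium, high}. Note that
# more than one category can apply with nonzero degree.

def triangular(x, a, b, c):
    # membership rises linearly from a to a peak at b, falls to 0 at c
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def income_memberships(income):
    return {
        "low":    max(0.0, min(1.0, (30_000 - income) / 10_000)),
        "medium": triangular(income, 20_000, 40_000, 60_000),
        "high":   max(0.0, min(1.0, (income - 50_000) / 10_000)),
    }

# income_memberships(45_000) -> {'low': 0.0, 'medium': 0.75, 'high': 0.0}
```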