Material Type: Exam; Class: DATA MINING; Subject: MANAGERIAL SCIENCES; University: Georgia State University; Term: Fall 2008;
In the Data Mining class we examined three primary areas: data preparation, exploration, and forecasting.

Data preparation
Cluster Analysis
Goal: Group observations into clumps (groups).
Example: Offer different loan or financial packages based on the attributes of an applicant that match a cluster group.
Assumptions (data): Use a smaller, refined, more specific subset of variables that portrays the characteristics of the groups. There is no dependent variable.
Technique: Group the observations into clusters. The minimum number of clusters is 2, but you can have many more than 2. Analyze the cluster results for distinctness, then determine the optimal number of clusters.

Linear Discriminant Analysis
Goal: Predict whether something will occur, given known parameters and values.
Example: Using a few attributes of an applicant, determine the likelihood that they will not default on a loan.
Assumptions (data): The dependent values are known, and the data are cleaned and standardized. Account for missing and unusual values.
Technique: Section the independent variables into groups that each hold 10% or less of the observations. Then create dummy variables based on logical groupings within the variables, using the Xtab (cross-tabulation) jobs. Run a regression analysis on the dummy variables. Eliminate unnecessary dummies based on their p-values, then generate the model. If necessary, run the regression again and eliminate or add dummies.

Logistic Regression
Goal: Predict the likelihood that an event or entity will belong to a particular group.
Example: Whether or not a woman continues to use contraception after a year.
Assumptions (data): The dependent variable generally has only two categories; Y values are 0 or 1.
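The cluster-analysis steps (group the observations, compare several cluster counts, pick the count that gives the most distinct groups) can be sketched with k-means, one common clustering method; the source does not say which algorithm the class used. The function names, the farthest-point starting rule, and the toy applicant data below are illustrative assumptions. The within-cluster sum of squares (WCSS) printed for each k is one way to judge the optimal number of clusters: look for the point where adding another cluster stops reducing it much.

```python
import math


def init_farthest(points, k):
    """Deterministic start (an assumption): the first point, then repeatedly the point farthest from all chosen centres."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    centroids = init_farthest(points, k)
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centres stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, clusters


def wcss(centroids, clusters):
    """Within-cluster sum of squared distances; smaller means tighter groups."""
    return sum(math.dist(p, c) ** 2 for c, members in zip(centroids, clusters) for p in members)


# Hypothetical two-attribute applicant data with two obvious groups.
applicants = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
for k in range(1, 5):
    centres, groups = kmeans(applicants, k)
    print(k, round(wcss(centres, groups), 3))
```

Here k = 1 is printed only as a no-clustering baseline. WCSS drops sharply at k = 2 and only slightly after that, which points to two clusters for this toy data.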
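For two groups, the textbook form of linear discriminant analysis (Fisher's discriminant with a pooled within-group covariance matrix and equal group priors) can be sketched in plain Python as below. This is a swapped-in standard formulation, not necessarily the procedure taught in the class; the restriction to two attributes, the function names, and the loan data are hypothetical. The direction w = S^-1 (m1 - m0) separates the group means, and a new applicant is assigned to whichever side of the midpoint their projected score falls.

```python
def mean_vec(rows):
    """Column means of a list of equal-length tuples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov_2d(groups):
    """Pooled within-group 2x2 covariance (each group centred on its own mean)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows in groups:
        m = mean_vec(rows)
        for r in rows:
            dev = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += dev[i] * dev[j]
        dof += len(rows) - 1
    return [[v / dof for v in row] for row in s]


def fit_lda_2d(class0, class1):
    """Fisher direction w = S^-1 (m1 - m0); threshold halfway between the projected means (equal priors)."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    (a, b), (c, d) = pooled_cov_2d([class0, class1])
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # closed-form 2x2 inverse
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(m0[k] + m1[k]) / 2 for k in range(2)]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def classify(w, threshold, x):
    """Return 1 if the projected score lies on the class-1 side of the threshold, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


# Hypothetical two-attribute scores: 0 = defaulted, 1 = repaid.
defaulted = [(1, 2), (2, 1), (2, 3), (3, 2)]
repaid = [(6, 7), (7, 6), (7, 8), (8, 7)]
w, t = fit_lda_2d(defaulted, repaid)
print(classify(w, t, (5, 5)), classify(w, t, (4, 4)))
```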
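The step of sectioning an independent variable into groups of about 10% of the observations and turning those groups into dummy variables might look like this. The sketch assumes rank-based binning (deciles) and drops one group as the reference category so the dummies are not perfectly collinear with the intercept; in the class the groupings were refined by inspecting Xtab output, which this code does not do. The function names and the income figures are hypothetical. The regression and p-value elimination that follow would normally be run in a statistics package.

```python
def decile_groups(values, n_groups=10):
    """Rank-based binning: each group receives roughly 1/n_groups of the observations."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values)  # group label 0 .. n_groups-1
    return groups


def make_dummies(groups, reference=0):
    """One 0/1 indicator column per group label, omitting the reference group."""
    levels = sorted(set(groups))
    return {
        f"group_{lvl}": [1 if g == lvl else 0 for g in groups]
        for lvl in levels
        if lvl != reference
    }


# Hypothetical applicant incomes (thousands).
incomes = [23, 41, 35, 62, 58, 19, 77, 30, 45, 52, 66, 28, 39, 71, 48, 33, 55, 60, 25, 80]
groups = decile_groups(incomes)
columns = make_dummies(groups)
print(groups)
print(sorted(columns))
```

When the number of observations is not a multiple of 10, some groups receive one extra observation and can slightly exceed 10%; raising n_groups keeps every group at or below that size.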
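Logistic regression with a 0/1 dependent variable can be sketched by maximizing the log-likelihood with plain gradient ascent. The optimizer, learning rate, epoch count, function names, and the toy data are assumptions for illustration; a statistics package would typically fit the model with a Newton-type method and also report p-values for each coefficient.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient ascent on the mean log-likelihood. Returns [intercept, b1, ..., bp]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            row = [1.0] + list(xi)  # leading 1.0 multiplies the intercept
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, row)))
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w, xi):
    """Estimated probability that Y = 1 (membership in the group of interest)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


# Hypothetical data: one predictor and a 0/1 outcome
# (1 = still using contraception after a year).
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
print([round(v, 3) for v in w])
print(round(predict_proba(w, [4]), 3))
```

The output is a probability between 0 and 1 rather than a hard group label; choosing a cutoff (0.5 is a common default) turns it into a classification.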