Data Classification in WEKA - Data Mining | CSC 573, Assignments of Computer Science

Material Type: Assignment; Class: Data Mining; Subject: Computer Science; University: University of Illinois Springfield; Term: Unknown 1989;

Uploaded on 08/18/2009


CSC 573: Data Mining
Weka Assignment #2: Data Classification in WEKA
Instructor: Ratko Orlandic

In this assignment, you will explore data-classification facilities in WEKA using both Explorer and Experimenter. You will use the “contact-lenses”, “iris”, and “soybean” data sets, all of which are available in the required .arff format in the WEKA package. The “contact-lenses” data set has 24 instances with 5 nominal attributes, the last of which (“contact-lenses”) is the class dimension. The “iris” set has 150 instances with 4 continuous attributes and the nominal class, which is the last (5th) dimension. The “soybean” set has 683 instances with 36 nominal attributes, the last of which is the class dimension. Unlike the other two sets, “soybean” has missing values.

Data Classification in WEKA Explorer

For each experiment A-C below, use WEKA Explorer to perform data classification using the following classification methods with default parameters: 1) OneR; 2) NaiveBayesSimple; 3) Id3; and 4) J48. For each method on every data set, use the following evaluation methods (“Test options” in the “Classify” window of the WEKA Explorer): a) “Use training set”; b) “Cross-validation” with 10 folds; and c) “Percentage split” set to 66%. Record the results of each run in a CSV file “Results.csv” or an Excel file “Results.xls”, where you will indicate only: a) the experiment (A-C); b) the name of the input data file; c) the classification method; d) the evaluation strategy used; and e) the percentage of correctly classified instances.
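As a bookkeeping aid, the runs requested above can be enumerated before any of them is performed. The following is a minimal sketch, not part of the assignment: the column names and the evaluation labels are assumptions chosen for illustration, and the accuracy column is left blank because it has to come from your own WEKA runs.

```python
import csv
import io
from itertools import product

# Experiment letters and data files come from the assignment text.
EXPERIMENTS = {
    "A": "contact-lenses.arff",
    "B": "iris.arff",
    "C": "iris.arff",
}
METHODS = ["OneR", "NaiveBayesSimple", "Id3", "J48"]
EVALUATIONS = [
    "Use training set",
    "Cross-validation (10 folds)",
    "Percentage split (66%)",
]
# Hypothetical column names; the assignment only lists what to record.
HEADER = ["experiment", "data_file", "method", "evaluation", "percent_correct"]


def build_rows():
    """Return one row per (experiment, method, evaluation) combination."""
    rows = []
    for (exp, data_file), method, evaluation in product(
        EXPERIMENTS.items(), METHODS, EVALUATIONS
    ):
        rows.append([exp, data_file, method, evaluation, ""])
    return rows


def to_csv_text(rows):
    """Serialize the header and the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_rows()
csv_text = to_csv_text(rows)
print(len(rows))  # prints 36 (3 experiments x 4 methods x 3 evaluations)
```

Writing `csv_text` to a file gives a template with one line per run, which makes it easy to see whether any combination was skipped.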

A. Perform data classification on the “contact-lenses.arff” data set using the four classification methods and each of the evaluation strategies indicated above.
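To get a feel for how little test data each evaluation strategy leaves on a 24-instance set, the sizes involved can be worked out by hand. The sketch below is only an approximation under stated assumptions: it ignores the shuffling and stratification that WEKA applies, and the rounding used for the percentage split is an assumption rather than a documented WEKA rule.

```python
def training_set_sizes(n):
    """'Use training set': the model is built and tested on all n instances."""
    return n, n


def cross_validation_fold_sizes(n, folds=10):
    """Test-fold sizes when n instances are spread as evenly as possible."""
    base, extra = divmod(n, folds)
    return [base + 1 if i < extra else base for i in range(folds)]


def percentage_split_sizes(n, percent=66):
    """Train/test sizes for a percentage split (rounding rule is an assumption)."""
    train = round(n * percent / 100)
    return train, n - train


n = 24  # number of instances in contact-lenses, per the assignment text
print(training_set_sizes(n))           # prints (24, 24)
print(cross_validation_fold_sizes(n))  # prints [3, 3, 3, 3, 2, 2, 2, 2, 2, 2]
print(percentage_split_sizes(n))       # prints (16, 8)
```

With test folds of only two or three instances, a single misclassification moves the per-fold accuracy by a large amount, which is worth keeping in mind when comparing the three strategies.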

B. Perform discretization of all non-class attributes in the “iris.arff” data set into 10 equal-width bins as follows: under “Filter” in the “Preprocess” window of the Explorer, select ‘filters’ -> ‘unsupervised’ -> ‘attribute’ -> ‘Discretize’. Use default parameters for the ‘Discretize’ filter. After you make sure that all non-class attributes are nominal, perform classification on this set using the four classification methods and each of the evaluation strategies indicated above.
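The idea behind equal-width binning can be sketched in a few lines of Python. This illustrates the technique only and is not WEKA's implementation: the function name and the sample values are made up, and the handling of values that fall exactly on a bin boundary may differ from what the ‘Discretize’ filter does.

```python
def equal_width_bin(values, bins=10):
    """Map each value to a bin index in 0..bins-1 using equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bin
    indices = []
    for v in values:
        index = int((v - low) / width)
        indices.append(min(index, bins - 1))  # the maximum falls in the last bin
    return indices


# Hypothetical measurements, not taken from iris.arff.
sample = [4.3, 5.0, 5.1, 5.8, 6.4, 7.0, 7.9]
print(equal_width_bin(sample, bins=10))
```

Every interval has the same length, (max - min) / bins, so the number of instances per bin depends entirely on how the values are distributed, and some bins may stay empty.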

C. Perform discretization of all non-class attributes in the “iris.arff” data set into 5 bins of approximately equal height (that is, each bin holding roughly the same number of instances) by selecting the ‘Discretize’ filter and choosing appropriate parameters. After you make sure that all non-class attributes are nominal, perform classification on this set using the four classification methods and each of the evaluation strategies indicated above.
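For contrast with equal-width binning, equal-height (equal-frequency) binning can be sketched by ranking the values and cutting the ranking into equally sized groups. Again, this is an illustrative sketch with a made-up function name, not WEKA's algorithm. In particular, it may place tied values in different bins, whereas a real discretizer has to put cut points between distinct values, which is one reason the bins end up only close to equal in height.

```python
from collections import Counter


def equal_frequency_bin(values, bins=5):
    """Assign bin indices so that each bin holds roughly the same number of values."""
    # Positions of the values, ordered from the smallest value to the largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    indices = [0] * len(values)
    for rank, position in enumerate(order):
        indices[position] = rank * bins // len(values)
    return indices


# Hypothetical values, not taken from iris.arff.
sample = [10, 1, 9, 2, 8, 3, 7, 4, 6, 5]
assigned = equal_frequency_bin(sample, bins=5)
print(assigned)                    # prints [4, 0, 4, 0, 3, 1, 3, 1, 2, 2]
print(sorted(Counter(assigned).items()))  # two values in each of the 5 bins
```

Here the interval lengths vary while the counts per bin stay balanced, which is the opposite trade-off from experiment B.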

Data Classification in WEKA Experimenter

D. For this experiment, use WEKA Experimenter. Perform data classification on “contact-lenses.arff” and “soybean.arff” using the OneR, NaiveBayesSimple, and J48 classification methods with default parameters. For each method on both data sets, use 10 repetitions of 10-fold cross-validation as the evaluation method. Record the results in the “RawResults.csv” file. From these results, compute the average accuracy of each method for every data set (“contact-lenses” and “soybean”), and include in the “Results.csv” (or “Results.xls”) file (the same result file as for experiments A-C): a) the experiment D; b) the name of the data file; c) the classification method; d) the evaluation strategy; and e) the average percentage of correctly classified instances over the 10x10 runs.

NOTE: To perform this experiment, in the “Setup” window of the Experimenter:

  1. Select “New” as the “Experiment Configuration Mode”;
  2. In the “Results Destinations” view, select “CSV file” and type “RawResults” as the filename;
  3. In the “Experiment Type” view, select cross-validation with 10 as the number of folds and click on the “Classification” button;
  4. In the “Iteration Control” view, select 10 as number of repetitions and click on “Data sets first”;
  5. In the “Datasets” view, select “Add new” to add “contact-lenses.arff” and “soybean.arff”;
  6. In the “Algorithms” view, select “Add new” to add OneR, NaiveBayesSimple, and J48 methods with default parameters;
  7. Switch from “Setup” to “Run” window and click on “Start”.
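Once the run finishes, the averages can be computed in a spreadsheet or with a short script. The sketch below assumes that the raw results file has one row per run and fold, with columns named `Key_Dataset`, `Key_Scheme`, and `Percent_correct`. Those names are an assumption about the Experimenter's CSV output, so check the header of your own file and adjust the arguments if it differs. The sample rows are made up purely to show the mechanics.

```python
import csv
import io
from collections import defaultdict


def average_percent_correct(csv_file, dataset_col="Key_Dataset",
                            scheme_col="Key_Scheme",
                            value_col="Percent_correct"):
    """Return {(dataset, scheme): mean percent correct} over all rows."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in csv.DictReader(csv_file):
        key = (row[dataset_col], row[scheme_col])
        totals[key][0] += float(row[value_col])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up sample standing in for RawResults.csv; values are illustrative only.
sample = io.StringIO(
    "Key_Dataset,Key_Scheme,Percent_correct\n"
    "demo-data,SchemeX,50.0\n"
    "demo-data,SchemeX,100.0\n"
    "demo-data,SchemeY,75.0\n"
)
averages = average_percent_correct(sample)
for (dataset, scheme), mean in sorted(averages.items()):
    print(dataset, scheme, mean)
```

For the real file, open it with `open("RawResults.csv", newline="")` and pass the file object to the function. With 10 repetitions of 10 folds, each data set and method pair should contribute 100 rows to its average.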

Evaluation

Once you have performed the experiments, you should spend some time evaluating your results. In particular, try to answer at least the following questions: Which classification method typically gives the highest accuracy? Which method does not perform well, and why? Why did we use discretization of the “iris” data set? Do discretization and the choice of discretization method affect the results of classification, and if so, how? Which of the three evaluation methods overestimates the accuracy, and why? Which of the three evaluation methods underestimates the accuracy, and why? Record these and any other observations in a Word file called “Observations.doc”.
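One way to build intuition for the questions about evaluation methods is a toy experiment outside WEKA. The sketch below uses a deliberately naive, made-up classifier that memorizes its training instances, applied to synthetic data. It is not one of the assigned methods and says nothing about the numbers you will obtain in WEKA. It only shows that accuracy measured on the training data and accuracy measured on held-out data can differ sharply.

```python
from collections import Counter


class Memorizer:
    """Toy classifier: recalls labels of seen instances, else predicts the majority class."""

    def fit(self, instances, labels):
        self.table = dict(zip(instances, labels))
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instance):
        return self.table.get(instance, self.default)


def accuracy(model, instances, labels):
    """Fraction of instances whose predicted label matches the true label."""
    hits = sum(model.predict(x) == y for x, y in zip(instances, labels))
    return hits / len(labels)


# Synthetic data: each instance is a unique id, so nothing learned
# from one id transfers to an unseen id.
ids = list(range(30))
labels = ["yes" if i % 2 == 0 else "no" for i in ids]
train_x, train_y = ids[:20], labels[:20]
test_x, test_y = ids[20:], labels[20:]

model = Memorizer().fit(train_x, train_y)
train_accuracy = accuracy(model, train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
print(train_accuracy)  # prints 1.0 (evaluated on its own training data)
print(test_accuracy)   # prints 0.5 (evaluated on held-out data)
```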

Assignment Submission and Grading

On or before the due date, you should submit in a single zipped file through the Blackboard system: a) the “Results.csv” (or “Results.xls”) file with the summary results of your runs in all experiments A-D, b) the “RawResults.csv” of the experiment D, and c) the “Observations.doc” file. Please adhere to the following submission procedure:

  1. ZIP all files using WinZip;
  2. Name the zipped file as follows: LastnameFirstnameAssign3.zip;
  3. Submit the zipped file through the digital drop box in the Blackboard system.

Grading will be done based on the correctness of the results as well as the extensiveness, clarity, and correctness of your observations.

Good luck!