CSC 573: Data Mining
Weka Assignment #2: Data Classification in WEKA
Instructor: Ratko Orlandic
In this assignment, you will explore the data-classification facilities in WEKA using both the Explorer and the Experimenter. You will use the “contact-lenses”, “iris”, and “soybean” data sets, all of which are available in the required .arff format in the WEKA package. The “contact-lenses” data set has 24 instances with 5 nominal attributes, the last of which (“contact-lenses”) is the class dimension. The “iris” set has 150 instances with 4 continuous attributes and a nominal class, which is the last (5th) dimension. The “soybean” set has 683 instances with 36 nominal attributes, the last of which is the class dimension. Unlike the other two sets, “soybean” has missing values.
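For reference, the same data sets can also be loaded and inspected programmatically; the following is a minimal sketch against the WEKA Java API (it assumes weka.jar is on the classpath and the .arff file is in the working directory):

```java
// A minimal sketch (not required for the assignment): loading one of the .arff
// files with the WEKA Java API. Assumes weka.jar is on the classpath and the
// data file is in the working directory.
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InspectData {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("soybean.arff");
        // In all three data sets the class attribute is the last one.
        data.setClassIndex(data.numAttributes() - 1);

        System.out.println("Instances:  " + data.numInstances());
        System.out.println("Attributes: " + data.numAttributes());

        // Report missing values per attribute ("soybean" is the only set that has any).
        for (int i = 0; i < data.numAttributes(); i++) {
            int missing = data.attributeStats(i).missingCount;
            if (missing > 0) {
                System.out.println(data.attribute(i).name() + ": " + missing + " missing");
            }
        }
    }
}
```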
For each experiment A-C below, use the WEKA Explorer to perform data classification with the following classification methods, using default parameters: 1) OneR; 2) NaiveBayesSimple; 3) Id3; and 4) J48. For each method on every data set, use the following evaluation methods (“Test options” in the “Classify” window of the WEKA Explorer): a) “Use training set”; b) “Cross-validation” with 10 folds; and c) “Percentage split” set to 66%. Record the results of each run in a CSV file “Results.csv” or Excel file “Results.xls”, where you indicate only: a) the experiment (A-C); b) the name of the input data file; c) the classification method; d) the evaluation strategy used; and e) the percentage of correctly classified instances.
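The three “Test options” can also be reproduced outside the GUI. The sketch below uses the WEKA Evaluation API with J48 on “contact-lenses.arff”; the random seed and the 66/34 rounding are assumptions, so the figures may differ slightly from the Explorer’s output. The other three methods would be swapped in the same way.

```java
// A hedged sketch of the three "Test options" using the WEKA Evaluation API with
// J48 on "contact-lenses.arff". The seed and the 66/34 rounding are assumptions.
// The other methods (weka.classifiers.rules.OneR, weka.classifiers.bayes.NaiveBayesSimple,
// weka.classifiers.trees.Id3) are used the same way; note that the latter two ship
// in the core jar only in older WEKA 3.x releases.
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TestOptionsSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("contact-lenses.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // a) "Use training set": train and evaluate on the same instances.
        J48 j48 = new J48();
        j48.buildClassifier(data);
        Evaluation evalTrain = new Evaluation(data);
        evalTrain.evaluateModel(j48, data);
        System.out.printf("Training set: %.2f%% correct%n", evalTrain.pctCorrect());

        // b) "Cross-validation" with 10 folds.
        Evaluation evalCV = new Evaluation(data);
        evalCV.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.printf("10-fold CV:   %.2f%% correct%n", evalCV.pctCorrect());

        // c) "Percentage split" at 66%: randomize, train on 66%, test on the rest.
        Instances shuffled = new Instances(data);
        shuffled.randomize(new Random(1));
        int trainSize = (int) Math.round(shuffled.numInstances() * 0.66);
        Instances train = new Instances(shuffled, 0, trainSize);
        Instances test  = new Instances(shuffled, trainSize, shuffled.numInstances() - trainSize);
        J48 j48Split = new J48();
        j48Split.buildClassifier(train);
        Evaluation evalSplit = new Evaluation(train);
        evalSplit.evaluateModel(j48Split, test);
        System.out.printf("66%% split:    %.2f%% correct%n", evalSplit.pctCorrect());
    }
}
```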
A. Perform data classification on the “contact-lenses.arff” data set using the four classification methods and each of the evaluation strategies indicated above.
B. Perform discretization of all non-class attributes in the “iris.arff” data set into 10 equal-width bins as follows: under “Filter” in the “Preprocess” window of the Explorer, select ‘filters’ -> ‘unsupervised’ -> ‘attribute’ -> ‘Discretize’. Use default parameters for the ‘Discretize’ filter. After you make sure that all non-class attributes are nominal, perform classification on this set using the four classification methods and each of the evaluation strategies indicated above.
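The same discretization step can be applied through the WEKA Java API; the sketch below relies on the filter’s stock defaults (10 bins, equal width), which match what the Explorer uses:

```java
// A hedged sketch of the experiment B discretization step via the WEKA Java API,
// using the filter's stock defaults (10 bins, equal width). Only numeric attributes
// are discretized, so the nominal class attribute is left untouched.
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeIrisEqualWidth {
    public static void main(String[] args) throws Exception {
        Instances iris = DataSource.read("iris.arff");
        iris.setClassIndex(iris.numAttributes() - 1);

        Discretize disc = new Discretize();      // defaults: 10 equal-width bins
        disc.setInputFormat(iris);
        Instances irisDisc = Filter.useFilter(iris, disc);

        // Sanity check: every attribute should now be nominal.
        for (int i = 0; i < irisDisc.numAttributes(); i++) {
            System.out.println(irisDisc.attribute(i).name() + " nominal? "
                    + irisDisc.attribute(i).isNominal());
        }
        // irisDisc is then classified with the four methods, exactly as in experiment A.
    }
}
```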
C. Perform discretization of all non-class attributes in the “iris.arff” data set into 5 close-to-equal-height bins by selecting the ‘Discretize’ filter and choosing appropriate parameters. After you make sure that all non-class attributes are nominal, perform classification on this set using the four classification methods and each of the evaluation strategies indicated above.
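For this variant, only the filter configuration changes relative to the experiment B sketch above; “equal-height” binning corresponds to the unsupervised ‘Discretize’ filter’s equal-frequency option:

```java
// A hedged sketch of the filter configuration for experiment C; everything else is
// identical to the experiment B sketch. "Equal-height" maps to equal-frequency binning.
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeIrisEqualHeight {
    public static void main(String[] args) throws Exception {
        Instances iris = DataSource.read("iris.arff");
        iris.setClassIndex(iris.numAttributes() - 1);

        Discretize disc = new Discretize();
        disc.setBins(5);                  // 5 bins instead of the default 10
        disc.setUseEqualFrequency(true);  // close-to-equal-height (equal-frequency) bins
        disc.setInputFormat(iris);
        Instances irisDisc = Filter.useFilter(iris, disc);
        // irisDisc is then classified with the four methods, as before.
    }
}
```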
D. For this experiment, use the WEKA Experimenter. Perform data classification on “contact-lenses.arff” and “soybean.arff” using the OneR, NaiveBayesSimple, and J48 classification methods with default parameters. For each method on both data sets, use 10 repetitions of 10-fold cross-validation as the evaluation method. Record the results in the “RawResults.csv” file. From these results, compute the average accuracy of each method on every data set (“contact-lenses” and “soybean”), and include in the “Results.csv” (i.e., “Results.xls”) file (the same result file as for experiments A-C): a) the experiment (D); b) the name of the data file; c) the classification method; d) the evaluation strategy; and e) the average percentage of correctly classified instances over the 10×10 runs.
NOTE: To perform this experiment, configure the runs in the “Setup” window of the Experimenter: add both data files and the three classification methods, and set 10 repetitions of 10-fold cross-validation.
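The Experimenter carries out the 10×10 runs itself. Purely as a rough, hedged approximation of what that evaluation amounts to, ten seeded 10-fold cross-validations can be averaged with the Evaluation API; the classifier, file name, and seeds below are illustrative, and the Experimenter’s own per-fold bookkeeping may give slightly different averages:

```java
// A rough, hedged approximation of the 10 x 10-fold evaluation outside the
// Experimenter: average ten seeded 10-fold cross-validation runs.
// The classifier, file name, and seeds are illustrative assumptions.
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.OneR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TenByTenSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("soybean.arff");
        data.setClassIndex(data.numAttributes() - 1);

        double sum = 0.0;
        for (int seed = 1; seed <= 10; seed++) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new OneR(), data, 10, new Random(seed));
            sum += eval.pctCorrect();   // one 10-fold CV accuracy per repetition
        }
        System.out.printf("OneR on soybean.arff: %.2f%% correct (average over 10x10 runs)%n",
                sum / 10.0);
    }
}
```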
Once you have performed the experiments, you should spend some time evaluating your results. In particular, try to answer at least the following questions: Which classification method typically gives the highest accuracy? Which method does not perform well, and why? Why did we use discretization on the “iris” data set? Do discretization and the choice of discretization method affect the classification results, and how? Which of the three evaluation methods overestimates the accuracy, and why? Which of the three evaluation methods underestimates the accuracy, and why? Record these and any other observations in a Word file called “Observations.doc”.
On or before the due date, you should submit a single zipped file through the Blackboard system containing: a) the “Results.csv” (or “Results.xls”) file with the summary results of your runs in all experiments A-D; b) the “RawResults.csv” file from experiment D; and c) the “Observations.doc” file. Please adhere to the following submission procedure:
Grading will be done based on the correctness of the results as well as the extensiveness, clarity, and correctness of your observations.
Good luck!