Weka Tool: Weka provides implementations of learning algorithms that you can easily apply to a data set.
This post explains how the Weka tool can be used to apply a data mining algorithm to a sample data set in the “Explorer” interface.
1. The data set can be organized in an Excel sheet and saved in .CSV format.
The data set should be in either .CSV (comma-separated values) or .ARFF (Attribute-Relation File Format), since both of these formats are supported by Weka.
2. Then start the Weka tool and click on “Explorer”. Click on “Open file” to select the data set file.
3. A .csv file can be saved as an .arff file by clicking on “Save”. When you open a .csv file, it is automatically converted into ARFF format. The Weka workbench includes methods for regression, classification, clustering, association rule mining and attribute selection.
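To make the difference between the two formats concrete, here is a minimal sketch of a CSV-to-ARFF conversion in plain Python. It is not how Weka performs the conversion internally; the function name `csv_to_arff`, the relation name and the three sample rows are assumptions made for illustration. The sketch declares a column numeric if every value parses as a number and nominal otherwise, and it does not handle quoting or missing values.

```python
import csv
import io


def csv_to_arff(csv_text, relation="weather"):
    """Convert CSV text (header row plus data rows) into ARFF text.

    Simplification: a column is numeric if every value parses as a float;
    otherwise it is nominal, listing its sorted distinct values.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data]
        if all(is_number(v) for v in column):
            lines.append(f"@attribute {name} numeric")
        else:
            values = ",".join(sorted(set(column)))
            lines.append(f"@attribute {name} {{{values}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical three-row sample, not the full data set used in this post.
sample_csv = """outlook,temperature,play
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

arff_text = csv_to_arff(sample_csv)
print(arff_text)
```

The printed text starts with an `@relation` line, lists one `@attribute` line per column, and then repeats the data rows under `@data`, which is the general layout of an ARFF file.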
4. The screenshot below shows the screen once you have loaded the file. This screen shows information about the data set, i.e. its 5 attributes and their instances. The histogram shows how often each of the two values of the “play” class occurs for each value of the “outlook” attribute.
* “outlook” is a nominal attribute. If you select a numeric attribute, you can see its minimum and maximum values, mean and standard deviation.
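The numbers shown on this screen can be reproduced by hand. The sketch below counts the class values per “outlook” value (the information behind the histogram) and computes the four summary statistics for a numeric attribute. The six rows are hypothetical illustrative values, not the data set from this post, and the use of the sample standard deviation (`statistics.stdev`) is an assumption.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical rows: (outlook, temperature, play). Illustrative values only.
rows = [
    ("sunny", 85, "no"),
    ("sunny", 80, "no"),
    ("overcast", 83, "yes"),
    ("rainy", 70, "yes"),
    ("rainy", 65, "no"),
    ("overcast", 64, "yes"),
]

# Nominal attribute: count how often each class value occurs per outlook value.
class_counts = Counter((outlook, play) for outlook, _, play in rows)
for outlook in sorted({row[0] for row in rows}):
    print(outlook,
          "yes:", class_counts[(outlook, "yes")],
          "no:", class_counts[(outlook, "no")])

# Numeric attribute: minimum, maximum, mean and standard deviation.
temperatures = [temperature for _, temperature, _ in rows]
summary = {
    "min": min(temperatures),
    "max": max(temperatures),
    "mean": mean(temperatures),
    "stdev": stdev(temperatures),  # sample standard deviation (assumption)
}
print(summary)
```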
5. In this example we apply a decision tree algorithm to our data set (C4.5, developed by Ross Quinlan, is an algorithm used to generate a decision tree).
First select the “Classify” tab and click on the “Choose” button. This lists all the available algorithms; then select “J48”, which is Weka's implementation of C4.5.
The model is built from the full data set loaded in the “Preprocess” tab.
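C4.5 chooses the attribute to split on by its gain ratio. The sketch below shows only that criterion for a single nominal attribute; it is not J48 itself, which also handles numeric attributes, missing values and pruning. The function names and the six sample values are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the nominal attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical values, for illustration only.
outlook = ["sunny", "sunny", "overcast", "overcast", "rainy", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no"]

print("class entropy:", entropy(play))
print("gain ratio of outlook:", gain_ratio(outlook, play))
```

An attribute with a higher gain ratio separates the class values more cleanly, so it is preferred as the test at a tree node.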
At the bottom of the output we can see the confusion matrix. A confusion matrix, also known as a contingency table or an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix).
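As a rough illustration of how such a table is built, the sketch below counts (actual, predicted) pairs, with rows for the actual class and columns for the predicted class, which is the layout Weka prints. The label lists are hypothetical and do not come from the run described in this post.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Return a matrix with rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical labels, for illustration only.
actual = ["yes", "yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "yes"]
classes = ["yes", "no"]

matrix = confusion_matrix(actual, predicted, classes)
for class_name, row in zip(classes, matrix):
    print(class_name, row)

# Correct predictions lie on the diagonal of the matrix.
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(actual)
print("accuracy:", accuracy)
```

The off-diagonal cells show which class was confused with which, something a single accuracy figure cannot tell you.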
6. The result can be visualized by right-clicking on the selected result and choosing “Visualize tree”.
The attributes can be visualized by clicking on “Visualize”. This provides a plot matrix where the relationships among attributes can be analysed by clicking on “Select attributes”.
- “Data Mining: Practical Machine Learning Tools and Techniques” by Ian H. Witten, Eibe Frank, and Mark A. Hall.