Learning Classification using SMILE

After an introduction to SMILE, let's go through the steps needed to put it to use. You may refer to the earlier blog for an introduction to SMILE. What is the most important thing for implementing any machine learning algorithm? The answer is easy: data, because machine learning is about building models from data. So we need to play with the data first. The data we get to analyse might not be in the format the analysis requires, and it can be large and vague. So we need to classify the data first in order to apply any of the machine learning algorithms.

Smile (Statistical Machine Intelligence and Learning Engine) is built around machine learning and its algorithms, and one of the core concepts of machine learning is to generalize from experience.

How about a system that can determine the output for a given input, and also help you predict or analyse outputs for future inputs? Smile helps in finding the correct output for a given input instance using the experience gathered from the training data. The first step toward gaining this experience is to use the classification algorithms.

Smile’s classification algorithms are in the package smile.classification, and all of them implement the interface Classifier, which has a method called predict that returns a class label for an input instance. In other words, it assigns an input instance to one of a given set of categories. Classification is a form of supervised learning, i.e. a procedure that produces an inferred function from labelled training data. That function is called a classifier if the output is discrete, or a regression function if the output is continuous. An instance is described by a vector of features, i.e. a description of all its discriminating characteristics. Smile has many classification algorithms, such as:

  • k-Nearest Neighbor
  • Logistic Regression
  • Decision Trees
  • Gradient Boosting
  • AdaBoost and many more.
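To make the idea of a classifier with a predict method concrete, here is a minimal, self-contained sketch in plain Java. It does not use Smile at all: SimpleClassifier and OneNearestNeighbor are hypothetical names made up for this illustration, and the logic is the simplest form of nearest neighbor (k = 1), not Smile's own implementation.

```java
// Hypothetical stand-in for the idea of a classifier (NOT Smile's interface):
// it maps a feature vector to a discrete class label.
interface SimpleClassifier {
    int predict(double[] x);
}

// Illustrative 1-nearest-neighbor classifier: it stores the training data and
// labels a new instance with the label of the closest training instance.
final class OneNearestNeighbor implements SimpleClassifier {
    private final double[][] features;
    private final int[] labels;

    OneNearestNeighbor(double[][] features, int[] labels) {
        if (features.length == 0 || features.length != labels.length) {
            throw new IllegalArgumentException("need one label per training instance");
        }
        this.features = features;
        this.labels = labels;
    }

    @Override
    public int predict(double[] x) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < features.length; i++) {
            double d = squaredDistance(features[i], x);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return labels[best];
    }

    // Squared Euclidean distance; the square root is not needed for ranking.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }
}
```

With four training points, two near the origin labelled 0 and two near (5, 5) labelled 1, a call such as new OneNearestNeighbor(x, y).predict(new double[]{0.2, 0.1}) returns the label of the nearest stored point. The "experience" here is simply the stored training data.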


How well these algorithms perform depends on the kind of features used in the feature vector. Algorithms such as linear regression, logistic regression, k-nearest neighbor and neural networks require the input features to be numerical and scaled to similar ranges.
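One common way to bring numerical features to similar ranges is min-max scaling, which maps every feature column to the interval [0, 1]. The sketch below is plain Java and independent of Smile. FeatureScaling and minMaxScale are hypothetical names chosen for this example, and mapping constant columns to 0 is an assumption of the sketch, not a rule from any library.

```java
// Illustrative helper (not part of Smile): min-max scaling of feature columns.
final class FeatureScaling {
    private FeatureScaling() {}

    // Returns a new matrix in which every column is rescaled to [0, 1].
    // A constant column (max == min) is mapped to 0 to avoid dividing by zero.
    static double[][] minMaxScale(double[][] data) {
        if (data.length == 0) {
            return new double[0][];
        }
        int cols = data[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        java.util.Arrays.fill(min, Double.POSITIVE_INFINITY);
        java.util.Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // First pass: find the minimum and maximum of each column.
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }

        // Second pass: rescale each value relative to its column range.
        double[][] scaled = new double[data.length][cols];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < cols; j++) {
                double range = max[j] - min[j];
                scaled[i][j] = range == 0.0 ? 0.0 : (data[i][j] - min[j]) / range;
            }
        }
        return scaled;
    }
}
```

Without such a step, a feature measured in thousands would dominate a distance-based method like k-nearest neighbor over a feature measured in single digits, simply because of its units.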

Heterogeneous data can be handled using decision trees and boosting algorithms. If the features are highly correlated (redundant), then imposing some form of regularization may help. If each feature contributes independently to the output, then algorithms based on linear regression, logistic regression and naive Bayes perform well, but these algorithms perform poorly when the features are highly correlated. When there are complex relations between the features, algorithms like non-linear support vector machines, decision trees and neural networks perform well.
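A quick way to check whether two numerical features are redundant is to compute their Pearson correlation coefficient: values close to 1 or -1 indicate a strong linear relationship. The following is a small plain-Java sketch of that formula. FeatureCorrelation and pearson are hypothetical names for this illustration, and any threshold for "highly correlated" is a judgment call that the sketch does not decide.

```java
// Illustrative helper (not part of Smile): Pearson correlation of two features.
final class FeatureCorrelation {
    private FeatureCorrelation() {}

    // Returns the Pearson correlation coefficient of a and b, in [-1, 1].
    // If either feature is constant, its variance is zero and the result is NaN.
    static double pearson(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("features must be non-empty and of equal length");
        }
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;

        // Accumulate the covariance and both variances (unnormalized, since
        // the common factor 1/n cancels in the ratio below).
        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }
}
```

For example, a feature and an exact multiple of it (say, the same length in metres and in centimetres) are perfectly correlated, so one of the two adds no new information for a linear model.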

So the choice of classification algorithm depends on the feature vectors that describe the data.

That's it. Classification is the foremost step in studying data and building a model with machine learning algorithms. The rest of the steps will be covered in subsequent blogs, so keep reading and learning from our blogs. #mlforscalalovers


