Learning Classification using SMILE


After an introduction to SMILE, let's go through the steps needed to put it to use. You may refer to the earlier blog for an introduction to SMILE. What is the most important thing when implementing any Machine Learning algorithm? The answer is easy: data, because Machine Learning is about building models from data. So we have to work with the data first. The data we get to analyse might not be in the format the analysis requires, and it can be large and vague. So we need to classify the data first in order to apply any of the Machine Learning algorithms.

Smile (Statistical Machine Intelligence and Learning Engine) is built around Machine Learning and its algorithms, and one of the core concepts of Machine Learning is to generalize from experience.

How about a system that can determine the output for a given input, and that also helps you predict or analyse outputs for future inputs? Smile helps find the correct output for a given input instance using the experience gathered from the training data. The first step towards gaining this experience is to use the classification algorithms.

Smile’s classification algorithms are in the package smile.classification. All of them implement the interface Classifier, whose predict method returns a class label for an input instance. In other words, it assigns an input instance to one of a given number of categories. Classification is a supervised learning procedure. Supervised learning produces an inferred function, which is called a classifier if the output is discrete and a regression function if the output is continuous. An instance is described by a vector of features, i.e. a description of all its discriminating characteristics. Smile has many classification algorithms, such as:

  • k-Nearest Neighbor
  • Logistic Regression
  • Decision Trees
  • Gradient Boosting
  • AdaBoost and many more.
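To make the idea of a predict method concrete, here is a minimal sketch of a k-nearest-neighbor classifier in plain Java. It does not use Smile itself: the names LabelPredictor and TinyKnn are hypothetical and exist only for this illustration, and the code assumes every feature vector has the same length. Treat it as a rough picture of what a classifier does, not as how Smile implements it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a classifier contract (not Smile's own type):
// a single method that maps an input instance to a class label.
interface LabelPredictor<T> {
    int predict(T x);
}

// Minimal k-nearest-neighbor classifier over numeric feature vectors.
// Assumption: all vectors, including the query, have the same length.
final class TinyKnn implements LabelPredictor<double[]> {
    private final double[][] points; // training feature vectors
    private final int[] labels;      // class label of each training vector
    private final int k;             // number of neighbors that vote

    TinyKnn(double[][] points, int[] labels, int k) {
        if (points.length == 0 || points.length != labels.length
                || k < 1 || k > points.length) {
            throw new IllegalArgumentException(
                "need matching non-empty data and 1 <= k <= n");
        }
        this.points = points;
        this.labels = labels;
        this.k = k;
    }

    @Override
    public int predict(double[] query) {
        // Squared Euclidean distance from the query to every training vector.
        double[] dist = new double[points.length];
        Integer[] order = new Integer[points.length];
        for (int i = 0; i < points.length; i++) {
            order[i] = i;
            for (int j = 0; j < query.length; j++) {
                double diff = points[i][j] - query[j];
                dist[i] += diff * diff;
            }
        }
        // Sort training indices from nearest to farthest.
        Arrays.sort(order, (a, b) -> Double.compare(dist[a], dist[b]));

        // Majority vote among the k nearest neighbors. On a tie, the label
        // that reached the winning count first (the nearer one) is kept.
        Map<Integer, Integer> votes = new HashMap<>();
        int best = labels[order[0]];
        int bestVotes = 0;
        for (int n = 0; n < k; n++) {
            int label = labels[order[n]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = label;
            }
        }
        return best;
    }
}
```

With two well-separated groups of training points, calling predict on a point close to one group returns that group's label, which is exactly the "assign an instance to a category" behaviour described above.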


The performance of the above algorithms depends on the kind of features used in the feature vector. Algorithms such as linear regression, logistic regression, k-nearest neighbor and neural networks require the input features to be numerical and scaled to similar ranges.
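One common way to bring numerical features to similar ranges is standardization: subtract each column's mean and divide by its standard deviation. The sketch below shows the idea in plain Java. The class name ColumnStandardizer is hypothetical, it uses the population standard deviation, and it assumes a non-empty, rectangular data matrix. It is an illustration of the technique, not Smile's own scaling code.

```java
// Hypothetical helper: learns per-column mean and standard deviation from
// training data, then rescales any vector with those statistics.
final class ColumnStandardizer {
    private final double[] mean;
    private final double[] std;

    private ColumnStandardizer(double[] mean, double[] std) {
        this.mean = mean;
        this.std = std;
    }

    // Assumption: data is non-empty and every row has the same length.
    static ColumnStandardizer fit(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[] mean = new double[d];
        double[] std = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j] / n;
            }
        }
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                double diff = row[j] - mean[j];
                std[j] += diff * diff / n; // accumulates the population variance
            }
        }
        for (int j = 0; j < d; j++) {
            std[j] = Math.sqrt(std[j]);
            if (std[j] == 0.0) {
                std[j] = 1.0; // constant column: avoid dividing by zero
            }
        }
        return new ColumnStandardizer(mean, std);
    }

    // Returns a new vector with each feature centered and scaled.
    double[] transform(double[] x) {
        double[] out = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            out[j] = (x[j] - mean[j]) / std[j];
        }
        return out;
    }
}
```

If one feature is measured in units and another in hundreds, both end up on the same scale after transform, so a distance-based method such as k-nearest neighbor no longer lets the larger-valued feature dominate. The statistics should be fitted on the training data only and then reused for new instances.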

Heterogeneous data can be handled using decision trees and boosting algorithms. If the features are highly correlated (redundant), imposing some form of regularization may help. If each feature contributes independently, algorithms based on linear regression, logistic regression and naive Bayes perform well. These algorithms perform poorly when the features are highly correlated. When there are complex relations between the features, algorithms such as non-linear support vector machines, decision trees and neural networks perform well.
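A quick way to check whether two features are redundant is to compute their Pearson correlation coefficient: values close to 1 or -1 suggest that one feature carries almost the same information as the other. The following is a small sketch in plain Java. The class name FeatureCorrelation is hypothetical, and the method assumes two non-empty columns of equal length.

```java
// Hypothetical helper for spotting redundant (highly correlated) features.
final class FeatureCorrelation {
    private FeatureCorrelation() {}

    // Pearson correlation of two feature columns of equal length.
    // Returns NaN when either column is constant (zero variance).
    static double pearson(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException(
                "columns must be non-empty and of equal length");
        }
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < a.length; i++) {
            meanA += a[i] / a.length;
            meanB += b[i] / a.length;
        }
        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) {
            return Double.NaN;
        }
        return cov / Math.sqrt(varA * varB);
    }
}
```

For example, a column and an exact multiple of it give a coefficient of 1, which is a hint that one of the two could be dropped or that a regularized model may be the safer choice.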

So the choice of classification algorithm depends on the kind of features in the input vectors.

That's it for now. Classification is the foremost step in studying data and building a model with Machine Learning algorithms. The rest of the steps will be covered in subsequent blogs, so keep reading and learning from our blogs. #mlforscalalovers



Written by 

Rachel Jones is a Solutions Lead at Knoldus Inc. with more than 22 years of experience. Rachel likes to delve deeper into the field of AI (Artificial Intelligence) and deep learning. She loves challenges and motivating people, and also loves to read novels by Dan Brown. Rachel has problem-solving, management and leadership skills, and she is familiar with programming languages such as Java, Scala, C++ and HTML.