Understanding Support Vector Machines

[Contributed by Raghu from Knoldus, Canada]

Support Vector Machines (SVMs) are among the most important and popular classification techniques in Machine Learning. The technique is also called large margin classification. An SVM produces a hyperplane that separates samples into two distinct classes. It chooses a hyperplane that not only separates the samples but does so with the maximum possible separation, hence the name large margin classifier. A 2-dimensional depiction of this is shown in the picture below. This is the case of a linear SVM, where the decision boundary that separates the classes is linear.

[Figure: 2-dimensional depiction of a linear SVM separating two classes with maximum margin]
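
To make the large margin idea concrete, here is a minimal sketch using scikit-learn's SVC with a linear kernel. The toy data points are hypothetical and chosen only so that the two classes are linearly separable. The margin width is computed as 2 / ||w||, the standard expression for a linear SVM with weight vector w.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two linearly separable clusters in 2-D.
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel gives a straight-line decision boundary in 2-D.
clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The separating hyperplane is w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]

# Distance between the two margin lines w . x + b = -1 and w . x + b = +1.
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:")
print(clf.support_vectors_)
```

Only the samples closest to the boundary (the support vectors) determine where the hyperplane lies; moving the other samples further away would not change it.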

Support Vector Machines also support classification where the decision boundary is non-linear. In this case, the SVM uses a kernel. The most popular kernel for non-linear decision problems is the Radial Basis Function kernel (RBF kernel for short), also called a Gaussian kernel. The two images below depict an SVM with a Gaussian kernel performing classification with a non-linear decision boundary.

[Figure: two images of an SVM with a Gaussian kernel classifying with a non-linear decision boundary]
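
The RBF (Gaussian) kernel is commonly written as K(x, x') = exp(-gamma * ||x - x'||^2). It equals 1 for identical points and decays toward 0 as the points move apart. The sketch below computes it by hand and compares the result with scikit-learn's rbf_kernel helper. The function name gaussian_kernel and the value gamma = 0.5 are illustrative assumptions, not taken from the article.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_kernel(a, b, gamma):
    """RBF kernel value exp(-gamma * ||a - b||^2) for two points."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

gamma = 0.5  # illustrative value

# Identical points have similarity 1; distant points approach 0.
k_same = gaussian_kernel([0, 0], [0, 0], gamma)
k_far = gaussian_kernel([0, 0], [1, 1], gamma)

# scikit-learn's helper should agree with the hand-written version.
k_sklearn = rbf_kernel([[0, 0]], [[1, 1]], gamma=gamma)[0, 0]

print(k_same, k_far, k_sklearn)
```

A larger gamma makes the similarity fall off faster with distance, which tends to produce a more tightly curved decision boundary.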

One of the easiest ways to build an SVM is to use one of the implementations available in many of the popular ML libraries for various languages. LIBSVM, scikit-learn and Spark ML all provide SVM implementations that are ready to use. In this article, we will demonstrate a simple way to build an SVM, train it, and then use it with scikit-learn in Python.

The following listing shows a Python session
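
A minimal sketch of such a session, assuming scikit-learn is installed and using the two training examples and two query points described in this article. Passing gamma="auto" explicitly is an assumption made here to match the printed classifier settings; the default for gamma differs between scikit-learn versions.

```python
from sklearn import svm

# Two 2-dimensional training examples and their class labels.
X = [[0, 0], [1, 1]]
y = [0, 1]

# RBF kernel with C=1.0; gamma="auto" is set explicitly (assumption, see above).
clf = svm.SVC(kernel="rbf", gamma="auto", C=1.0)
clf.fit(X, y)

# Classify two new points, one nearer each training example.
predictions = clf.predict([[0.3, 0.3], [0.6, 0.6]])
print(predictions)
```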

In the above Python session, we created a classifier that uses an SVM. As we can see from the output below, the kernel used is RBF. The RBF kernel takes gamma as a parameter; in this case, gamma is set automatically. C is another hyperparameter that we can specify; by default it is set to 1.0.

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
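
The same settings can also be inspected programmatically. The sketch below uses the get_params method that scikit-learn estimators provide. Note that the exact set of parameters and some defaults (gamma in particular) depend on the installed scikit-learn version, so the values printed may differ from the output above.

```python
from sklearn import svm

# Create a classifier with default settings and read them back.
clf = svm.SVC()
params = clf.get_params()

# Print the hyperparameters discussed in this article.
for name in ("C", "kernel", "gamma"):
    print(name, "=", params[name])
```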

Our inputs have 2 dimensions. We have 2 training examples, [0,0] and [1,1], and the values of y for these inputs are 0 and 1. In this case, the SVM comes up with a decision boundary that is a line with [0,0] and [1,1] on either side. We can now use this SVM by giving it an X, which it classifies, and printing the output. It classifies [.3,.3] as 0 and [.6,.6] as 1.
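
One way to see which side of the boundary a point falls on is scikit-learn's decision_function, which returns a signed score: negative for class 0 and positive for class 1 in this two-class setup. The sketch below assumes the same two training examples; the midpoint [.5,.5] is an extra query added here for illustration, and gamma="auto" is again an assumption.

```python
from sklearn import svm

# Same two training examples as in the article.
clf = svm.SVC(kernel="rbf", gamma="auto", C=1.0)
clf.fit([[0, 0], [1, 1]], [0, 1])

# The midpoint [0.5, 0.5] is equidistant from both training examples.
queries = [[0.3, 0.3], [0.5, 0.5], [0.6, 0.6]]
scores = clf.decision_function(queries)

for query, score in zip(queries, scores):
    print(query, round(float(score), 4))
```

Because the two training examples are symmetric, the score at the midpoint should be close to zero, which is consistent with the boundary lying halfway between them.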

Enjoy!

 

Written by 

Vikas is the CEO and Co-Founder of Knoldus Inc. Knoldus does niche Reactive and Big Data product development on Scala, Spark, and Functional Java. Knoldus has a strong focus on software craftsmanship which ensures high-quality software development. It partners with the best in the industry like Lightbend (Scala Ecosystem), Databricks (Spark Ecosystem), Confluent (Kafka) and Datastax (Cassandra). Vikas has been working in the cutting edge tech industry for 20+ years. He was an ardent fan of Java with multiple high load enterprise systems to boast of till he met Scala. His current passions include utilizing the power of Scala, Akka and Play to make Reactive and Big Data systems for niche startups and enterprises who would like to change the way software is developed. To know more, send a mail to hello@knoldus.com or visit www.knoldus.com
