MachineX: Choosing Support Vector Machine over other classifiers


When one has to select the best classifier from all the good options available, it is easy to end up on the horns of a dilemma. Decision trees/Random Forests, ANNs, KNN, logistic regression, etc. are some of the options often considered for classification. Each of them has its pros and cons, and when choosing among them, probably the most important thing to consider is the type of data and the number of available examples. For example, when our data has categorical values, a decision tree is a natural choice, and to reduce the risk of overfitting we would probably select a Random Forest. With lots of points in a low-dimensional space, KNN is also often a good choice. An ANN is another good option, as long as one can find the global minimum without too much of a struggle.

But as the title suggests, we are going to see when to select a Support Vector Machine, and why. Undoubtedly KNN is often a good choice, but it is sensitive to wrong or irrelevant features and is quite dependent on the choice of "k". Similarly, an ANN can model almost anything, but finding a global minimum in an ANN is a tough job; sometimes you just won't find it, and ANNs often overfit. An SVM, however, has a very intelligent way of avoiding overfitting, and it can be trained for a high-dimensional space with only a few examples. In particular, if one has a few points in a high-dimensional space, a linear SVM is again preferable.

So far we have covered points that could come up in a brainstorming session among data scientists, but if you are new to Support Vector Machines, the rest of this post is for you; maximum effort goes into helping you picture how it works. Now consider the image below: it has two sets of data points, one blue and the other red.


By taking a look at it, you can easily classify the two sets. The problem is where to draw the line so that we can predict new, unseen data correctly. Well, we could draw it almost anywhere, just like in the image below.


Well, as you can already guess, separating the dots with just any line is not what we want. We need the best way to separate them: a line that sits at the widest possible distance from the boundary points, something like the one below.


And exactly as you might expect, the Support Vector Machine gives us that best way to divide them, just like the diagram above. The orange line in the middle is the hyperplane that best divides the classes, and the red and blue points touching the margin borders are called the support vectors. Basically, the hyperplane has the maximum margin towards both sides. But the question is: how do we calculate it? Since the points here are linearly separable (data is called linearly separable when it can be divided by a straight line), we can solve this with a Hard Margin. The hyperplane can be represented as w·x - b = 0, where w is the weight vector, x is the vector of feature values, and b is a constant. For points with label y = 1 we require w·x - b >= 1; this boundary is the straight line passing through the support vectors of one class. Similarly, for points with label y = -1 we require w·x - b <= -1, which is the line passing through the support vectors of the other class. These constraints state that each data point must lie on the correct side of the margin. Combining them, we can write yi(w·xi - b) >= 1 for all 1 <= i <= n.
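To make the hard-margin idea concrete, here is a minimal sketch in Python using scikit-learn (a library choice of mine; the post itself does not name one). A very large C value approximates the hard margin on linearly separable data, and we can read back w, b, and the support vectors to check that every point satisfies yi(w·xi - b) >= 1.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters, like the blue and red points in the figures
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard margin (no slack allowed)
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]           # weight vector w in w.x - b = 0
b = -clf.intercept_[0]     # scikit-learn's decision function is w.x + intercept
print("w:", w, "b:", b)
print("support vectors:\n", clf.support_vectors_)

# Every training point should satisfy y_i * (w.x_i - b) >= 1
# (up to a small numerical tolerance); support vectors sit right at 1.
margins = y * (X @ w - b)
print("all margins >= 1:", bool(np.all(margins >= 0.99)))
```

Note that scikit-learn writes the decision function as w·x + intercept, so b in the post's notation is the negated intercept; the margin check is the same either way.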

So far, everything we have gone through applies to linearly separable data, which at some level can be handled by almost any classifier. The real challenge comes when the data is not linearly separable. In that case, the Soft Margin, an extended version of the Hard Margin, comes into play. The tale of the Soft Margin is even more interesting, so stay tuned for the next blog on SVM with Soft Margin.

Out-of-scope note: There are many different reasons for selecting an ANN over an SVM or vice versa; basically, we need to try both on our example data first. In the worst case, the number of support vectors can be equal to the number of training examples, whereas in an ANN that is never the case: the hidden layers and the bias terms take care of such situations. The most direct way to create an n-ary classifier with Support Vector Machines is to create n Support Vector Machines and train each of them one by one, whereas an n-ary classifier with a neural network can be trained in one go.
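The "n SVMs for an n-ary classifier" idea above is usually called one-vs-rest. Here is a small sketch of it with scikit-learn's `OneVsRestClassifier` wrapper (again, the library and the toy data are my assumptions, not from the post): one linear SVM is trained per class, each separating that class from all the others.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Tiny 3-class toy set: three well-separated clusters
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
              [0.0, 5.0], [0.1, 5.2], [0.3, 4.8]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# The wrapper trains one binary SVM per class, exactly the "n SVMs" scheme
clf = OneVsRestClassifier(SVC(kernel="linear"))
clf.fit(X, y)

print("number of underlying SVMs:", len(clf.estimators_))  # one per class
print("predictions:", clf.predict(X))
```

A neural network with a 3-unit softmax output layer would instead learn all three decision boundaries in a single training run, which is the contrast the note above draws.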





Written by 

Pranjut Gogoi is an enthusiast of Machine Learning and AI with 8+ years of experience. He has been implementing different machine learning projects at Knoldus. He started an initiative called MachineX, through which the team shares knowledge with the world. As part of this initiative, he broadcasts free webinars, writes blogs, and contributes to open source communities on machine learning and AI.