In this blog post, we will look at a detailed explanation of how to use SVM for complex decision boundaries and build Non-Linear Classifiers using SVM. The primary method for doing this is by using Kernels.
In linear SVM we find margin maximizing hyperplane with features Xi’s . Similarly, in Logistic regression, we also try to find the hyperplane which minimizes logistic loss with features Xi’s. Most often when we use both these techniques the results are the same. But linear SVM or for the same reason a logistic regression would fail where there is a need to have complex or non-linear decision boundaries. These types of boundaries are then achieved by SVM using Kernels. So let us understand how SVM creates non-linear boundaries using Kernels
In this, we first try to pick a few points on the feature plane and call them landmarks and then we try to compute new features for an example(X) depending on the closeness of these features to the landmarks. What these features do is they measure how similar example X is to one of the landmarks. In this particular post, we will use the Gaussian Kernel also known as RBF kernel to compute the similarity measure.
Now let’s suppose the X1
≈ L1 then according to the formula the similarity score (f1) ≈ 1 whereas if X1 is far from the landmark L1 then f1 ≈ 0. In simple terms, this means that feature f1 is almost similar to the landmark L1. So we predict the value of feature f1 to be similar to L1. In same way we add more landmarks to our plane and try to come up with a feature vector by computing the similarity score of our example with these landmarks.
In the RBF kernel formula we have a feature known as gamma. Gamma controls how far the influence of the single training example reaches, which in turn effect how tightly the decision boundary ends up surrounding the input space. Small gamma means a larger similarity radius, which means the values father apart are grouped together which results in more points grouped together and smoother decision boundary. Whereas smaller gamma means that the points need to be closely packed which results in more complex tightly decision boundaries
How to choose the landmarks ?
A simple rule of the thumb is that we take all the examples in the training set and make then landmarks at the same location as that of the training example.
Given (x^1, y^1), (x^2, y^2), (x^3, y^3)........................ (x^n, y^n)
l^1 = x^1, l^2 = x^2,l^3 = x^3............................................ l^n = x^n
Therefore for every new training example (x^i, y^i) we will then find similarity scores according to each landmark
f^1 = sim(x^i, l^1)
f^2 = sim(x^i, l^2)
f^3 = sim(x^i, l^3)
f^n = sim(x^i, l^n)
which would give us a new feature vector F
So now if we want to predict an estimate for new example given we already have model coffecients we would then use a hypothesis where it says
Predict y=1 if
θ^T * F >= 0
To estimate the model cofficients we would then minimize the cost function:
We see that we have a parameter C that controls the tradeoff between satisfying the maximum margin criterion to find the simple decision boundary and avoiding misclassification errors on the training set. The C parameter is additionally a significant one for kernelized SVMs, and it communicates with the gamma parameter. If gamma is large, then C will have little to no effect. Whereas, if gamma is small, the model is much more constrained and C will be similar to how it would affect a linear classifier.
Hope you enjoyed the post.