MachineX: KNN algorithm using KSAI


Classification is a well-known area of machine learning. The K-nearest neighbors (KNN) algorithm is a simple algorithm that stores all available cases and classifies new cases based on their similarity to the existing ones. KNN has long been used in pattern recognition as a non-parametric technique. In this algorithm, a case is classified by a majority vote of its neighbors. If K=1, the case is simply assigned to the class of its nearest neighbor. Similarly, if K=2, the class of the new case is decided on the basis of its two nearest neighbors.
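
To make the voting idea concrete, here is a minimal sketch in plain Scala. It is not KSAI's implementation; the object name KnnVoteSketch and its helpers are made up for illustration, and it assumes Euclidean distance between the cases.

```scala
// Illustrative sketch only: these names are hypothetical and not part of KSAI.
object KnnVoteSketch {
  // Straight-line (Euclidean) distance between two equally sized feature vectors.
  def euclidean(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Classify `query` by majority vote among its k nearest training cases.
  def classify(train: Array[Array[Double]],
               labels: Array[Int],
               k: Int,
               query: Array[Double]): Int = {
    val nearestLabels = train.indices
      .sortBy(i => euclidean(train(i), query)) // closest cases first
      .take(k)
      .map(i => labels(i))
    // Count the votes per class and return the class with the most votes.
    nearestLabels.groupBy(identity).maxBy { case (_, votes) => votes.size }._1
  }
}
```

When two classes receive the same number of votes, this sketch picks one of them arbitrarily, which is one reason an odd K is often preferred for two-class problems.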


Now the question arises: how can we find the nearest neighbors? The answer is by calculating the distance between the data points (cases). There are many ways to do that. Here are some of the popular measures of the distance between two data points:

  1. Euclidean distance
  2. Manhattan distance
  3. Minkowski distance

We can choose any of these methods based on the use case. It is also important to know that all of the above distance measures apply only to continuous variables.
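
The three measures are closely related: Minkowski distance of order p reduces to Manhattan distance for p = 1 and to Euclidean distance for p = 2. The following sketch in plain Scala shows this relationship; the object and function names are illustrative and do not come from KSAI.

```scala
// Illustrative sketch only: these names are hypothetical and not part of KSAI.
object DistanceSketch {
  // Minkowski distance of order p: (sum of |a_i - b_i|^p) raised to 1/p.
  def minkowski(a: Array[Double], b: Array[Double], p: Double): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val sum = a.zip(b).map { case (x, y) => math.pow(math.abs(x - y), p) }.sum
    math.pow(sum, 1.0 / p)
  }

  // Manhattan distance is the Minkowski distance with p = 1.
  def manhattan(a: Array[Double], b: Array[Double]): Double = minkowski(a, b, 1.0)

  // Euclidean distance is the Minkowski distance with p = 2.
  def euclidean(a: Array[Double], b: Array[Double]): Double = minkowski(a, b, 2.0)
}
```

For the points (0, 0) and (3, 4), for example, the Manhattan distance is 7 and the Euclidean distance is 5.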

The next important point is to choose an optimal value for K, which is best done by analyzing the data. A larger value of K is generally more precise because it reduces the effect of noise, but there is no guarantee. Cross-validation is another way to determine a good value of K, by using an independent data set to validate it. Usually a value between 3 and 10 has been found optimal for most data sets. In this blog, we will be using KSAI, a machine learning library written in Scala, to train our model and make predictions based on that training.
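
As a rough illustration of choosing K by validation, the sketch below uses leave-one-out cross-validation, a variant in which every case is predicted from all the other cases instead of from a separate data set. It is written in plain Scala with a small hand-rolled KNN; the names ChooseKSketch, looErrors and bestK are assumptions made for this example and are not KSAI functions.

```scala
// Illustrative sketch only: these names are hypothetical and not part of KSAI.
object ChooseKSketch {
  private def euclidean(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Majority vote among the k training cases closest to `query`.
  private def classify(train: Seq[(Array[Double], Int)], k: Int, query: Array[Double]): Int =
    train
      .sortBy { case (features, _) => euclidean(features, query) }
      .take(k)
      .groupBy { case (_, label) => label }
      .maxBy { case (_, members) => members.size }
      ._1

  // Leave-one-out error count: each case is predicted from all the other cases.
  def looErrors(data: Array[Array[Double]], labels: Array[Int], k: Int): Int = {
    val cases = data.zip(labels).toSeq
    cases.indices.count { i =>
      val rest = cases.patch(i, Nil, 1) // every case except the i-th
      classify(rest, k, cases(i)._1) != cases(i)._2
    }
  }

  // Try each candidate K; the first candidate with the fewest errors is returned.
  def bestK(data: Array[Array[Double]], labels: Array[Int], candidates: Seq[Int]): Int =
    candidates.minBy(k => looErrors(data, labels, k))
}
```

A call such as ChooseKSketch.bestK(data, labels, 3 to 10) would then scan the range mentioned above. Leave-one-out is convenient for small data sets but becomes slow for large ones, where a held-out validation set is the more usual choice.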

How to use KSAI for KNN algorithm?

KSAI is an open source machine learning library that contains various algorithms for classification, regression, clustering and many other tasks. It is an attempt to build machine learning algorithms in Scala. The Breeze library, which is also built on Scala, is used for the mathematical functionality.

KSAI mainly uses Scala’s built-in case classes, Future and some other cool features. It also uses Akka in some places and tries to do things in an asynchronous fashion. To start exploring the library, the test cases might be a good starting point. Right now it might not be that easy to use the library, given the limited documentation and unclear API; however, the committers will update them in the near future.

Here is how we can use KSAI for KNN algorithm.

1. Adding library to project:

You can add this library to your sbt project using the following line:

libraryDependencies += "io.github.knolduslabs.ksai" %% "ksai" % "0.0.2"

Once the library is added to your project, compile and refresh the project. After that, you should be able to access the classes required for the KNN algorithm.

Here is a sample code block that uses the KNN algorithm and validates the results against the same data set it was trained on:

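// Locate the ARFF data file on the classpath and parse it
val arffFile: String = getClass.getResource("/sampledata.arff").getPath
val arff: ARFF[String] = ARFFParser.parse(arffFile)

// Feature vectors; the right-hand side of this assignment is missing in the original listing
val data: Array[Array[Double]] =
// Numeric class label for each row
val results: Array[Int] = arff.getNumericTargets.toArray

//KNN with K = 3
val knn3: KNN = KNN.learn(data, results, 3)
var error = 0
// Predict every training case again and count the mismatches
data.indices.foreach { i =>
  val result = knn3.predict(data(i))
  if (result != results(i)) {
    error = error + 1
  }
}
println("\n\nKNN with K = 3 ======>  ERROR: " + error)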

Here, “sampledata.arff” is the data file in ARFF format. Using “ARFFParser.parse(arffFile)”, you can parse the file and generate data that the algorithm understands. Once the data is in Array[Array[Double]] form, you can use it to train your algorithm.

Using the following line you can train your algorithm:

val knn3: KNN = KNN.learn(data, results, 3)

Here, 3 is the value of K; you can choose it according to your requirements by checking the number of errors after validation.

You can tweak the data and value of K to find the perfect settings according to requirements.
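
One way to organize that tweaking is to train once per candidate K and compare the error counts. The sketch below is deliberately library-independent: a trained model is represented as a plain function, and the names CompareKSketch, countErrors and errorsByK are hypothetical. With KSAI, the trainer passed in could presumably be built from the KNN.learn and predict calls shown above, assuming they behave as in the sample.

```scala
// Illustrative sketch only: these names are hypothetical and not part of KSAI.
object CompareKSketch {
  // A trained model is modelled here as a plain function from features to a class.
  type Predictor = Array[Double] => Int

  // Count how many cases the predictor labels differently from the expected class.
  def countErrors(predict: Predictor,
                  data: Array[Array[Double]],
                  expected: Array[Int]): Int =
    data.zip(expected).count { case (features, label) => predict(features) != label }

  // Train once per candidate K and pair each K with its error count.
  def errorsByK(train: Int => Predictor,
                ks: Seq[Int],
                data: Array[Array[Double]],
                expected: Array[Int]): Seq[(Int, Int)] =
    ks.map(k => k -> countErrors(train(k), data, expected))
}
```

Keep in mind that validating on the training data itself tends to favor small values of K, so a separate validation set usually gives a more honest comparison.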

To find more interesting algorithms in KSAI, please visit the following link:

KSAI: A Machine learning library

I hope you enjoyed the post. Maybe in our next post, we will be going deeper into the algorithm.



Written by 

Girish is a Software Consultant with more than 3.5 years of experience. He is a Scala developer and is passionate about the Scala ecosystem. He has also done many projects in other languages such as Java. He can work in both supervised and unsupervised environments and has a craze for computers; whether working or not, he is almost always in front of his laptop screen. He is self-motivated, dedicated and focused on his work, and he believes in developing quality products. He wants to work on different projects in different domains, is curious to learn about them, and tries to provide solutions that use resources well and improve performance. His personal interests include reading books, listening to music, video games, cricket and social networking. He holds a Masters in Computer Applications from Lal Bahadur Shastri Institute of Management, New Delhi.