K-Nearest Neighbors (KNN) regression works in much the same way as KNN for classification. The difference lies in the nature of the dependent variable: in classification KNN the dependent variable is categorical, while in regression KNN it is continuous. Both use neighboring examples to predict the class or value of new examples.
In this blog we will cover the basics and working of KNN for regression.
If you want to learn how KNN for classification works, you can read my previous blog: MachineX: k-Nearest Neighbors (KNN) for classification.
Table of contents
- A simple example to understand the intuition behind KNN
- How does the KNN algorithm work?
- Methods of calculating distance between points
- How to choose the k factor?
- Working on a dataset
- Additional resources
1. A simple example to understand the intuition behind KNN
Let us start with a simple example. Consider the following table – it consists of the height, age and weight (target) value for 10 people. As you can see, the weight value of ID11 is missing. We need to predict the weight of this person based on their height and age.
Note: The data in this table does not represent actual values. It is merely used as an example to explain this concept.
For a clearer understanding of this, below is the plot of height versus age from the above table:
In the above graph, the y-axis represents the height of a person (in feet) and the x-axis represents the age (in years). The points are numbered according to the ID values. The yellow point (ID 11) is our test point.
If I ask you to identify the weight of ID11 based on the plot, what would be your answer? You would likely say that since ID11 is closer to points 5 and 1, it must have a weight similar to those IDs, probably between 72 and 77 kg (the weights of ID1 and ID5 from the table). That makes intuitive sense, but how does the algorithm actually predict the value? We will find that out in this article.
2. How does the KNN algorithm work?
As we saw above, KNN can be used for both classification and regression problems. The algorithm uses ‘feature similarity’ to predict values of any new data points. This means that the new point is assigned a value based on how closely it resembles the points in the training set. From our example, we know that ID11 has height and age similar to ID1 and ID5, so the weight would also approximately be the same.
Had it been a classification problem, we would have taken the mode as the final prediction. In this case, we have two values of weight – 72 and 77. Any guesses how the final value will be calculated? The average of the values is taken to be the final prediction.
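The mode-versus-mean distinction can be shown with a quick sketch using Python's standard `statistics` module; the two neighbor weights, 72 and 77, are taken from the example above:

```python
import statistics

# Weights of the two nearest neighbors (ID5 and ID1) from the example.
neighbor_weights = [72, 77]

# Classification KNN would take the mode of the neighbors' labels;
# regression KNN takes the mean of the neighbors' target values.
prediction = statistics.mean(neighbor_weights)
print(prediction)  # 74.5
```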
Below is a stepwise explanation of the algorithm:
1. First, the distance between the new point and each training point is calculated.
2. The closest k data points are selected (based on the distance). In this example, points 1, 5 and 6 will be selected if the value of k is 3. We will explore the method for selecting the right value of k later in this article.
3. The average of the target values of these data points is the final prediction for the new point. Here, weight of ID11 = (77 + 72 + 60) / 3 ≈ 69.67 kg.
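The three steps above can be sketched in plain Python. Note that the (height, age) feature values below are illustrative placeholders, since the original table is not reproduced here; only the target weights mirror the example:

```python
import math

def knn_regress(train_X, train_y, query, k):
    """Predict a continuous target as the mean of the k nearest neighbors."""
    # Step 1: distance from the query to every training point (Euclidean).
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 2: keep the k closest points.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Step 3: average their target values.
    return sum(y for _, y in nearest) / k

# Illustrative (height, age) features; targets are the example weights.
train_X = [(5.0, 32), (5.1, 28), (5.2, 31), (5.9, 40), (5.8, 22)]
train_y = [77, 72, 60, 59, 58]  # ID1, ID5, ID6, ID4, ID10
query = (5.1, 30)               # ID11, the point to predict

print(round(knn_regress(train_X, train_y, query, 3), 2))  # 69.67
print(knn_regress(train_X, train_y, query, 5))            # 65.2
```

With k = 3 the three nearest neighbors carry weights 77, 72 and 60, reproducing the average computed above.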
In the next few sections we will discuss each of these three steps in detail.
3. Methods of calculating distance between points
The first step is to calculate the distance between the new point and each training point. There are various methods for calculating this distance; the most commonly used are Euclidean and Manhattan distance (for continuous variables) and Hamming distance (for categorical variables).
- Euclidean Distance: calculated as the square root of the sum of the squared differences between a new point (x) and an existing point (y).
- Manhattan Distance: the distance between real vectors, calculated as the sum of their absolute differences.
- Hamming Distance: used for categorical variables. If the value (x) and the value (y) are the same, the distance D is 0; otherwise D = 1.
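Each of these three distance measures is a one-liner in Python; a minimal sketch:

```python
import math

def euclidean(x, y):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def hamming(x, y):
    # Count of positions where the categorical values differ.
    return sum(a != b for a, b in zip(x, y))

print(euclidean([0, 0], [3, 4]))            # 5.0
print(manhattan([1, 2], [4, 6]))            # 7
print(hamming(["red", "S"], ["red", "M"]))  # 1
```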
Once the distance of a new observation from the points in our training set has been measured, the next step is to pick the closest points. The number of points to be considered is defined by the value of k.
4. How to choose the k factor?
The second step is to select the k value. This determines the number of neighbors we look at when we assign a value to any new observation.
In our example, for a value k = 3, the closest points are ID1, ID5 and ID6.
The prediction of weight for ID11 will be:
ID11 = (77 + 72 + 60) / 3 ≈ 69.67 kg
For the value of k = 5, the closest points will be ID1, ID4, ID5, ID6 and ID10.
The prediction for ID11 will be :
ID11 = (77 + 59 + 72 + 60 + 58) / 5 = 65.2 kg
We notice that the final result changes with the value of k. So how can we figure out the optimum value of k? Let us decide it based on the error for our training and validation sets (after all, minimizing the error is our final goal!).
Have a look at the below graphs for training error and validation error for different values of k.
For a very low value of k (say k = 1), the model overfits the training data, which leads to a high error rate on the validation set. On the other hand, for a high value of k, the model performs poorly on both the training and validation sets. If you observe closely, the validation error curve reaches a minimum at k = 9. This value of k is the optimum for the model (it will vary for different datasets). The curve is known as an 'elbow curve' (because it is shaped like an elbow) and is commonly used to determine the value of k.
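The k-selection loop itself is straightforward to sketch: score each candidate k on a held-out validation set and pick the one with the lowest error. The data below is synthetic (the target roughly follows y = 3x with a small perturbation), and the predictor is the simple mean-of-neighbors KNN described earlier:

```python
import math

def knn_regress(train_X, train_y, query, k):
    # Distance of the query to every training point, then mean of k nearest.
    neighbors = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    return sum(y for _, y in neighbors[:k]) / k

# Synthetic 1-D data: the target roughly follows y = 3x with a small offset.
train_X = [(float(x),) for x in range(20)]
train_y = [3 * x + (1 if x % 2 else -1) for x in range(20)]
val_X = [(2.5,), (7.5,), (14.5,)]
val_y = [7.5, 22.5, 43.5]

# Validation RMSE for each candidate k.
errors = {}
for k in (1, 3, 5, 9, 15):
    preds = [knn_regress(train_X, train_y, q, k) for q in val_X]
    errors[k] = math.sqrt(
        sum((p - t) ** 2 for p, t in zip(preds, val_y)) / len(val_y)
    )
    print(f"k={k:2d}  validation RMSE={errors[k]:.2f}")

best_k = min(errors, key=errors.get)
print("best k on this validation set:", best_k)
```

On real data, plotting the validation error against k makes the elbow shape visible and shows where increasing k stops helping.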
So these are some basic concepts of KNN for regression.
Happy learning!