MachineX: Evaluation Metrics for a Regression ML Model

In this blog post, we will quickly look at the various metrics to evaluate our regression models.

But first, let us briefly discuss one of the best-known approaches to evaluating a model: the train-test split, also known as the train-validation split.

Train-Test Split: In this approach, we split the data into two parts, known as the training set and the test set. The model is trained on the training set, and then the test set is passed to the model for prediction. These predictions are compared with the already known actual values using various metrics to determine how accurate our model is. This gives a much more realistic performance measure than evaluating on the training data, because the model has never seen the test data. However, the score depends on a single split, so a different choice of test rows could give a noticeably different result.

The natural extension of this technique is K-fold cross-validation. As the name suggests, in this method we divide the data into K folds and let each fold take a turn as the test set. For example, if K=4, in the first split we use the first 25% of the data as the test set and the remaining 75% as the training set, and evaluate the model using various metrics. In the next split, we repeat these steps using the second 25% of the data as the test set, and so on. This way we get four scores, and their average is taken as the overall score of the model.
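To make the K-fold procedure concrete, here is a minimal sketch in plain Python. It keeps the rows in their original order (no shuffling), matching the "first 25%, second 25%" description. The `fit` and `score` callables are hypothetical stand-ins for whatever model and metric you use; the toy "model" here simply predicts the mean of the training targets, and the data values are made up.

```python
from statistics import mean


def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds.

    When n_rows is not divisible by k, the earlier folds get one extra row.
    """
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(X, y, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [model(X[i]) for i in test_idx]
        scores.append(score([y[i] for i in test_idx], preds))
    return mean(scores), scores


def fit_mean(X_train, y_train):
    """Hypothetical baseline model: always predicts the mean training target."""
    m = mean(y_train)
    return lambda x: m


def mae(actual, predicted):
    """Mean absolute error, used here as the per-fold score."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


X = [[i] for i in range(8)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
avg, per_fold = cross_validate(X, y, k=4, fit=fit_mean, score=mae)
print(per_fold)  # one MAE per fold
print(avg)       # the averaged score
```

Each of the four folds serves as the test set exactly once, and `avg` is the single averaged score described above. In practice the rows are often shuffled first; scikit-learn's `KFold` class, for example, supports this through its `shuffle` parameter.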

Evaluation Metrics

MAE: Mean absolute error is the average of the absolute differences between the actual and predicted values.
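For n data points with actual values y_j and predictions ŷ_j, MAE = (1/n) * Σ |y_j - ŷ_j|. A minimal plain-Python sketch follows; the function name and sample values are illustrative only.

```python
def mean_absolute_error(actual, predicted):
    """MAE: average of |actual - predicted| over all data points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(actual, predicted))  # 0.5
```

If you already use scikit-learn, `sklearn.metrics` provides ready-made equivalents such as `mean_absolute_error`, `mean_squared_error` and `r2_score`.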

MSE: Mean squared error is similar to MAE, but it averages the squared differences instead of the absolute ones. It is often preferred over MAE when large errors should weigh more heavily, because squaring the error produces disproportionately large terms for large errors.
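Formally, MSE = (1/n) * Σ (y_j - ŷ_j)². The sketch below (with made-up numbers) compares two sets of predictions that have the same MAE of 1.0: one spreads the error evenly, the other concentrates it in a single large miss. MSE tells them apart.

```python
def mean_squared_error(actual, predicted):
    """MSE: average of (actual - predicted)**2 over all data points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 10.0, 10.0, 10.0]
# Both prediction sets have a total absolute error of 4.0 (MAE = 1.0).
spread_out = [11.0, 9.0, 11.0, 9.0]      # four errors of size 1
one_big_miss = [10.0, 10.0, 10.0, 14.0]  # one error of size 4

print(mean_squared_error(actual, spread_out))    # 1.0
print(mean_squared_error(actual, one_big_miss))  # 4.0
```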

RMSE: Root mean squared error is the square root of the mean squared error.
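Taking the square root brings the error back to the same units as the target variable, which makes RMSE easier to interpret than MSE. A minimal sketch, again with illustrative names and values:

```python
import math


def root_mean_squared_error(actual, predicted):
    """RMSE: square root of the MSE, expressed in the units of the target."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


actual = [10.0, 10.0, 10.0, 10.0]
predicted = [10.0, 10.0, 10.0, 14.0]
print(root_mean_squared_error(actual, predicted))  # 2.0 (MSE is 4.0)
```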

RAE: Relative absolute error takes the total absolute error and normalizes it by dividing it by the total absolute error of a simple predictor that always outputs ȳ ("y bar"), the mean value of y. An RAE below 1 means the model does better than that mean-only baseline.
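In symbols, RAE = Σ |y_j - ŷ_j| / Σ |y_j - ȳ|. The sketch below computes it directly from that definition; the function name and data are hypothetical.

```python
def relative_absolute_error(actual, predicted):
    """RAE: total absolute error divided by the total absolute error
    of a baseline that always predicts the mean of the actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    y_bar = sum(actual) / len(actual)
    model_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - y_bar) for a in actual)
    if baseline_error == 0:
        raise ValueError("RAE is undefined when all actual values are equal")
    return model_error / baseline_error


actual = [2.0, 4.0, 6.0, 8.0]     # mean y_bar = 5.0
predicted = [3.0, 4.0, 5.0, 8.0]
print(relative_absolute_error(actual, predicted))  # 0.25, i.e. 2.0 / 8.0
```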

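RSE: Relative squared error is very similar to RAE, except that it uses squared errors: it divides the total squared error by the total squared error of the same mean-only predictor. It is used to calculate R squared for a model, since R squared = 1 - RSE. R squared measures the proportion of the variance in the actual values that the model explains; informally, it represents how close the data values are to the fitted regression line. A higher R squared (closer to 1) means that the model fits the data better.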
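In symbols, RSE = Σ (y_j - ŷ_j)² / Σ (y_j - ȳ)² and R² = 1 - RSE. A minimal sketch using the same made-up data as the RAE example; note that R² can become negative when a model does worse than simply predicting ȳ.

```python
def relative_squared_error(actual, predicted):
    """RSE: sum of squared errors divided by the sum of squared
    deviations of the actual values from their mean."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    y_bar = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - y_bar) ** 2 for a in actual)
    if sst == 0:
        raise ValueError("RSE is undefined when all actual values are equal")
    return sse / sst


def r_squared(actual, predicted):
    """R squared = 1 - RSE."""
    return 1.0 - relative_squared_error(actual, predicted)


actual = [2.0, 4.0, 6.0, 8.0]     # mean y_bar = 5.0
predicted = [3.0, 4.0, 5.0, 8.0]
print(relative_squared_error(actual, predicted))  # 0.1, i.e. 2.0 / 20.0
print(r_squared(actual, predicted))               # 0.9
```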

These are the common metrics we use to evaluate regression models. Other metrics, such as accuracy, precision and recall, are used to evaluate classification models; we will cover them in upcoming blogs.

Written by 

Rahul Khanna is a software consultant with 1+ years of experience. In the past, Rahul worked with Python, where his main focus was handling and analyzing data using libraries such as pandas and numpy. Rahul is currently working on reactive technologies like Scala, Akka and Spark, along with machine learning algorithms.