When we think about Regression in Machine Learning, the two techniques that usually come to mind are Linear and Logistic Regression. These forms of regression are used most often, and that's why they have become the most popular.

But the truth is, there are many other regression techniques, and each has its own significant use in Machine Learning depending on the situation.

So in this blog we are going to discuss Regression Analysis, its most frequently used types, and their implementations in SMILE.

What is Regression Analysis?

Regression Analysis is a predictive modeling technique used for estimating the relationship between a dependent variable (called the target) and one or more independent variables (called the predictors). Unlike classification, the output variable in regression analysis takes continuous values.

The analysis involves understanding how the typical value of the target changes when one of the predictors is varied while the other predictors are held fixed.

So the two main benefits that Regression Analysis provides are:

- It provides the **relevant relationships** between the target and the predictors.
- It provides the **strength of impact** of multiple predictors on the target.

Types of Regression:

There are various types of regression techniques, but most of them are based on the following three metrics:

- Number of Predictors.
- Type of Target.
- Shape of Regression Line.

Based on these metrics, the following are the most frequently used techniques:

- **Linear Regression:** One of the most widely used techniques. The target is continuous, the predictors can be continuous or discrete, and the regression line is linear.
- **Logistic Regression:** Used to find the probability of an event (success or failure). Use this technique when the target is binary, i.e., 0/1, true/false, etc.
- **Polynomial Regression:** Used when the power of a predictor is more than 1, i.e., the regression line is a curve rather than a straight line.
- **Ridge Regression:** Used whenever there is a need to alleviate multicollinearity among the predictors. When highly correlated predictors are present, the regression coefficient of any one predictor depends on which other predictors are included in the model and which are excluded. Ridge regression adds a small bias factor to the coefficients in order to curb this problem.
- **Lasso Regression:** Its aim is similar to Ridge Regression, but Ridge Regression can't zero out regression coefficients: you either end up including all the coefficients in the model or none of them. LASSO achieves this by using absolute values in the penalty function instead of squares. Hence, in contrast to Ridge, LASSO does both parameter shrinkage and variable selection automatically. It is capable of reducing the variability and improving the accuracy of linear regression models.
- **Elastic Net Regression:** A hybrid of Lasso and Ridge Regression. Elastic Net is a regularized regression method that linearly combines the L1 and L2 penalties of the Lasso and Ridge methods. It is useful when there are multiple predictors that are highly correlated.
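In symbols, the penalties above differ as follows (these are the standard textbook formulations, shown here for reference rather than taken from SMILE's documentation):

```latex
\textbf{OLS:} \quad \min_{\beta}\; \sum_{i}\bigl(y_i - x_i^{\top}\beta\bigr)^2 \\
\textbf{Ridge:} \quad \min_{\beta}\; \sum_{i}\bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j}\beta_j^2 \\
\textbf{Lasso:} \quad \min_{\beta}\; \sum_{i}\bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j}\lvert\beta_j\rvert \\
\textbf{Elastic Net:} \quad \min_{\beta}\; \sum_{i}\bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda_1 \sum_{j}\lvert\beta_j\rvert + \lambda_2 \sum_{j}\beta_j^2
```

The squared (L2) penalty shrinks coefficients toward zero but never exactly to zero, while the absolute-value (L1) penalty can zero them out entirely, which is why Lasso performs variable selection and Ridge does not.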

Fascinating, isn’t it? I bet a lot of you want to implement these. Scala lovers, this one is for you.

So smile, because SMILE is here. SMILE’s regression algorithms live in the package **smile.regression**, and all algorithms implement the interface **Regression**, which has a single method, **predict**, to apply the model to an instance.
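To make the shape of that interface concrete, here is a simplified, self-contained sketch (the names mirror SMILE's, but this toy `MeanModel` is purely illustrative and not part of the library):

```scala
// A simplified sketch of the shape of SMILE's Regression interface:
// a single method that maps one instance to a predicted value.
trait Regression[T] {
  def predict(x: T): Double
}

// A toy model implementing the interface: it ignores the input
// and always predicts the mean of the training responses.
class MeanModel(y: Array[Double]) extends Regression[Array[Double]] {
  private val mean = y.sum / y.length
  def predict(x: Array[Double]): Double = mean
}

val model = new MeanModel(Array(1.0, 2.0, 3.0))
println(model.predict(Array(0.0))) // 2.0
```

Every model SMILE trains for you, whether OLS, Ridge, or LASSO, is used the same way: fit it once, then call `predict` on new instances.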

Now let’s talk Scala.

SBT dependency to be added:

```scala
libraryDependencies += "com.github.haifengl" % "smile-scala_2.12" % "1.3.1"
```

There is a trait **smile.regression.Operators** that provides methods for all the types of regression techniques. Some of these methods are:

```scala
def ols(x: Array[Array[Double]], y: Array[Double], method: String = "qr"): OLS

def ridge(x: Array[Array[Double]], y: Array[Double], lambda: Double): RidgeRegression

def lasso(x: Array[Array[Double]], y: Array[Double], lambda: Double, tol: Double = 1E-3, maxIter: Int = 5000): LASSO
```

It’s really simple to use these methods. Here is a code snippet to showcase the ease of **SMILE**.

```scala
import smile.regression.Operators

object SmileExample extends App with Operators {

  // Training data: each row of x holds the predictor values for one
  // observation, and y holds the corresponding response values.
  val x = Array(
    Array(234.289, 235.6, 159.0, 107.608, 1947, 60.323),
    Array(259.426, 232.5, 145.6, 108.632, 1948, 61.122),
    Array(258.054, 368.2, 161.6, 109.773, 1949, 60.171),
    Array(284.599, 335.1, 165.0, 110.929, 1950, 61.187),
    Array(328.975, 209.9, 309.9, 112.075, 1951, 63.221),
    Array(346.999, 193.2, 359.4, 113.270, 1952, 63.639),
    Array(365.385, 187.0, 354.7, 115.094, 1953, 64.989),
    Array(363.112, 357.8, 335.0, 116.219, 1954, 63.761),
    Array(397.469, 290.4, 304.8, 117.388, 1955, 66.019),
    Array(419.180, 282.2, 285.7, 118.734, 1956, 67.857),
    Array(442.769, 293.6, 279.8, 120.445, 1957, 68.169),
    Array(444.546, 468.1, 263.7, 121.950, 1958, 66.513),
    Array(482.704, 381.3, 255.2, 123.366, 1959, 68.655),
    Array(502.601, 393.1, 251.4, 125.368, 1960, 69.564),
    Array(518.173, 480.6, 257.2, 127.852, 1961, 69.331),
    Array(554.894, 400.7, 282.7, 130.081, 1962, 70.551))

  val y = Array(83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0,
    101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9)

  val maxIterations = 1000
  val tolerance = 1E-3

  println(ridge(x, y, 0.0057))
  println(lasso(x, y, 0.0057, tolerance, maxIterations))
}
```

Here,

- **x** holds the explanatory variables (predictors),
- **y** holds the response values,
- **0.0057** is the regularization parameter,
- **tolerance** is the tolerance for stopping iterations,
- **maxIterations** is the maximum number of iterations.
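To get a feel for what the regularization parameter actually does, here is a plain-Scala sketch (no SMILE needed) of ridge regression for a single predictor with no intercept, where the closed-form solution is simply `w = (x·y) / (x·x + lambda)`:

```scala
// One-feature ridge regression in closed form: the lambda term in the
// denominator shrinks the fitted coefficient toward zero.
def ridgeCoefficient(x: Array[Double], y: Array[Double], lambda: Double): Double = {
  val xy = x.zip(y).map { case (a, b) => a * b }.sum // x · y
  val xx = x.map(a => a * a).sum                     // x · x
  xy / (xx + lambda)
}

val xs = Array(1.0, 2.0, 3.0)
val ys = Array(2.0, 4.0, 6.0) // exact relationship: y = 2x

println(ridgeCoefficient(xs, ys, 0.0)) // 2.0: with lambda = 0 this is plain least squares
println(ridgeCoefficient(xs, ys, 1.0)) // ~1.867: a positive lambda shrinks the coefficient
```

SMILE's `ridge` does the multivariate version of the same thing, which is why larger values of the regularization parameter pull all the coefficients closer to zero.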

So here we finish our dive into regression techniques for machine learning using Scala. I hope this encourages both you and me to explore the SMILE library for Machine Learning in Scala further. In our next blog we will be diving deep into Regression Trees.

References: https://haifengl.github.io/smile/