MachineX: Simplifying Logistic Regression

Logistic regression is one of the most popular machine learning algorithms for binary classification because it is simple and performs well on a wide range of problems. It is used when the outcome is a discrete variable, typically binary (dichotomous), and it works best when the classes are approximately linearly separable. It can also be extended to cases where the dependent variable has more than two categories. Typical examples include predicting who will win an election, whether a student will pass an exam, or whether an email is spam. This is called a classification problem because we are trying to determine which class an observation best fits.

Linear Regression vs Logistic Regression

In linear regression, the outcome is continuous: it can take any one of an infinite number of possible values. In logistic regression, the outcome has only a limited number of possible values, generally 0 or 1.

Linear regression requires a linear relationship between the dependent and independent variables. Logistic regression does not require a linear relationship between the outcome itself and the independent variables; it instead assumes a linear relationship between the independent variables and the log-odds of the outcome.

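Logistic regression works best when the independent variables are not highly correlated with each other (little or no multicollinearity). Linear regression can still be fitted when the independent variables are correlated, although strong correlation makes its coefficients unstable as well.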
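
The difference in outcomes can be sketched in a few lines of Python. The intercept and slope below are hypothetical values chosen only for illustration, and the 0.5 cut-off is a common convention rather than a requirement. The same linear score is returned as-is by the linear model, while the logistic model turns it into a probability and then a class.

```python
import math

def linear_prediction(x, intercept, slope):
    """Linear regression output: any real number."""
    return intercept + slope * x

def logistic_prediction(x, intercept, slope):
    """Logistic regression output: a probability strictly between 0 and 1."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
intercept, slope = -4.0, 1.5

for x in [0, 2, 4, 6]:
    y_linear = linear_prediction(x, intercept, slope)
    p = logistic_prediction(x, intercept, slope)
    label = 1 if p >= 0.5 else 0  # assumed decision threshold of 0.5
    print(f"x={x}: linear={y_linear:+.1f}, probability={p:.3f}, class={label}")
```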

Terminology Used in Logistic Regression

Probability- Probability is the measure of the likelihood that an event will occur. It is quantified as a number between 0 and 1.
Odds- Odds are the ratio of the probability that an event occurs to the probability that it does not occur:

odds = P(occurring) / P(not occurring), where P = probability

Odds Ratio- The odds ratio for a variable in logistic regression represents how the odds change with a 1-unit increase in that variable while all other variables are held constant. It is defined as the ratio of two odds.
Logit- In logistic regression, we need a function that links the linear combination of independent variables, which can take any real value, to a probability between 0 and 1. That link function is the logit, the natural log of the odds:

ln(odds) = ln(p / (1 - p)) = logit(p)
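
To make these terms concrete, here is a minimal Python sketch that converts a probability to odds and to the logit, and then back again. The function names (`odds`, `logit`, `inverse_logit`) are illustrative choices, not part of any particular library.

```python
import math

def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Natural log of the odds (requires 0 < p < 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map any real number z back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

for p in [0.1, 0.5, 0.8]:
    z = logit(p)
    print(f"p={p}: odds={odds(p):.3f}, logit={z:.3f}, back to p={inverse_logit(z):.3f}")
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0. Probabilities above 0.5 give positive logits, and probabilities below 0.5 give negative ones.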

In logistic regression, we estimate an unknown probability for any given linear combination of independent variables.

Regression coefficients for logistic regression are calculated using maximum likelihood estimation (MLE).
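
As a rough illustration of maximum likelihood estimation, the sketch below fits a single-variable model by gradient ascent on the average log-likelihood. Gradient ascent is only one way to maximize the likelihood, and statistical packages typically use other optimizers. The data set (hours studied and pass/fail) is made up, and the names `fit`, `log_likelihood`, `learning_rate` and `steps` are assumptions made for this example.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(xs, ys, b0, b1):
    """Log-likelihood of the 0/1 outcomes ys under coefficients b0 and b1."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(xs, ys, learning_rate=0.1, steps=5000):
    """Estimate b0 and b1 by gradient ascent on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals (observed outcome minus predicted probability) drive the gradient.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        grad_b0 = sum(residuals) / n
        grad_b1 = sum(r * x for r, x in zip(residuals, xs)) / n
        b0 += learning_rate * grad_b0
        b1 += learning_rate * grad_b1
    return b0, b1

# Made-up data: hours studied and whether the student passed (1) or failed (0).
# The classes overlap on purpose; with perfectly separated data the estimates would not converge.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit(hours, passed)
print(f"b0={b0:.3f}, b1={b1:.3f}")
print(f"log-likelihood at start: {log_likelihood(hours, passed, 0.0, 0.0):.3f}")
print(f"log-likelihood after fitting: {log_likelihood(hours, passed, b0, b1):.3f}")
```

The fitted coefficients are the values under which the observed outcomes are most probable, which is why the log-likelihood after fitting is higher than at the starting point.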

In its standard single-variable form, the estimated regression equation is

p = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))

which is equivalent to logit(p) = b0 + b1*x. Here x is the independent variable, and b0 and b1 are the regression coefficients.
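
The link between the regression coefficients and the odds ratio can be checked numerically. The sketch below assumes the single-variable form logit(p) = b0 + b1*x and uses hypothetical coefficient values. It shows that raising x by one unit multiplies the odds by e^b1.

```python
import math

def predicted_probability(x, b0, b1):
    """Estimated probability for a single independent variable x."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients, not estimated from real data.
b0, b1 = -3.0, 1.2

p_at_2 = predicted_probability(2.0, b0, b1)
p_at_3 = predicted_probability(3.0, b0, b1)

# Ratio of the odds after and before a 1-unit increase in x.
odds_ratio = odds(p_at_3) / odds(p_at_2)

print(f"p(x=2)={p_at_2:.3f}, p(x=3)={p_at_3:.3f}")
print(f"odds ratio={odds_ratio:.4f}, exp(b1)={math.exp(b1):.4f}")
```

The two printed values on the last line agree up to floating-point error, and the same ratio holds for any pair of x values that are one unit apart.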

Application of Logistic Regression

It can be applied anywhere the outcome is binary, such as 0 or 1. Some applications are:

Predicting whether a student will pass or fail based on hours of study or other relevant information.
Predicting the approval of a loan based on the credit score.
Predicting the failure of a firm.