Feature selection is a way of selecting the subset of the most relevant features from the original feature set by removing redundant, irrelevant, or noisy features.
Features are the input variables that we provide to our models. Each column in our dataset constitutes a feature. To train an optimal model, we need to make sure that we use only the essential features. If we have too many features, the model can capture unimportant patterns and learn from noise.
The saying "sometimes less is more" applies to machine learning models as well. Hence, feature selection is one of the important steps in building a machine learning model. Its goal is to find the best possible set of features for the model.
We can define feature selection as "the process of automatically or manually selecting the subset of the most appropriate and relevant features to be used in model building." Feature selection is performed by either including the important features or excluding the irrelevant ones, without changing the features themselves.
Why Do We Need Feature Selection?
Machine learning models follow a simple rule: whatever goes in comes out. If we put garbage into our model, we can expect the output to be garbage too. Here, garbage refers to noise in our data.
For example, consider a table containing information about cars.
Figure 1: Old cars dataset
In the table above, the model of the car, the year of manufacture, and the miles it has traveled are clearly important for deciding whether the car is old enough to be scrapped. The name of the previous owner, however, has no bearing on that decision. Worse, it can mislead the algorithm into finding spurious patterns between names and the other features. Hence, we can drop that column.
Figure 2: Dropping columns for feature selection
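In pandas, this kind of manual selection is just a column drop. Below is a minimal sketch using a made-up miniature version of the cars table (the column names and values are illustrative, not from the original dataset):

```python
import pandas as pd

# A small, hypothetical version of the cars table from Figure 1.
cars = pd.DataFrame({
    "model": ["Civic", "Corolla", "Focus"],
    "year": [2001, 1998, 2005],
    "miles": [180_000, 220_000, 90_000],
    "previous_owner": ["Alice", "Bob", "Carol"],  # carries no signal for the target
})

# Manual feature selection: drop the column that cannot help the model.
selected = cars.drop(columns=["previous_owner"])
print(list(selected.columns))  # ['model', 'year', 'miles']
```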
Below are some benefits of using feature selection in machine learning:
- It helps avoid the curse of dimensionality.
- The selected features are easier for researchers to interpret.
- It reduces training time.
- It reduces overfitting and hence enhances generalization.
Feature Selection Techniques
There are mainly two types of Feature Selection techniques, which are:
Supervised Feature Selection technique
We can use this technique for labeled datasets; here, the target variable is taken into account when evaluating features.
Unsupervised Feature Selection technique
We can use this technique for unlabeled datasets; here, the target variable is ignored.
There are mainly three techniques under supervised feature selection:
1. Wrapper Methods
In the wrapper methodology, feature selection is treated as a search problem: different combinations of features are prepared, evaluated, and compared with one another. The algorithm is trained iteratively on a subset of features.
Based on the model's output, features are added or removed, and the model is trained again on the new feature set.
Some techniques of wrapper methods are:
- Forward selection – Forward selection is an iterative process that begins with an empty set of features. In each iteration, it adds a feature and evaluates whether performance improves. The process continues until adding a new feature no longer improves the model.
- Backward elimination – Backward elimination is also iterative, but works in the opposite direction to forward selection. It begins with all the features and repeatedly removes the least significant one. The elimination continues until removing a feature no longer improves the performance of the model.
- Exhaustive feature selection – Exhaustive feature selection is the most thorough approach: it evaluates every possible feature subset by brute force and returns the best-performing one. Because the number of subsets grows exponentially with the number of features, it is practical only for small feature sets.
- Recursive feature elimination – Recursive feature elimination (RFE) is a greedy optimization approach that selects features by repeatedly training a model, ranking the features, and discarding the weakest ones, so that smaller and smaller subsets of features are considered.
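The wrapper methods above can be sketched with scikit-learn, assuming it is available: `SequentialFeatureSelector` implements forward selection and backward elimination, and `RFE` implements recursive feature elimination. The iris dataset, the logistic regression base model, and the choice of keeping 2 features are arbitrary illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Forward selection: start empty, add the feature that helps most each round.
forward = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",
).fit(X, y)

# Backward elimination: start with all features, drop the least useful each round.
backward = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="backward",
).fit(X, y)

# RFE: train, rank features by coefficient magnitude, drop the weakest, repeat.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)

print(forward.get_support())   # boolean mask of kept features
print(backward.get_support())
print(rfe.ranking_)            # 1 = selected; higher = eliminated earlier
```

Note that forward and backward search need not agree on the same subset; each is a greedy heuristic over the search space, not an exhaustive search.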
2. Filter Methods
In the filter method, features are selected on the basis of statistical measures. This method does not depend on the learning algorithm; instead, it chooses features as a pre-processing step. It filters out irrelevant features and redundant columns by ranking features with different metrics. The advantages of filter methods are that they require little computation time and do not overfit the data.
Some common techniques of Filter methods are as follows:
- Information Gain
- Chi-square Test
- Fisher’s Score
- Missing Value Ratio
Information Gain: Information gain measures the reduction in entropy achieved by splitting the dataset on a variable. For feature selection, we calculate the information gain of each variable with respect to the target variable and keep the variables with the highest gain.
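As a sketch, scikit-learn's `mutual_info_classif` estimates the mutual information (the information-gain criterion) between each feature and a categorical target; the iris dataset here is just an illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# One non-negative score per feature; higher = more information about the target.
mi = mutual_info_classif(X, y, random_state=0)
print(mi)
```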
Chi-square Test: The chi-square test measures the dependence between categorical variables. The chi-square statistic is calculated between each feature and the target variable, and the desired number of features with the best chi-square scores is selected.
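A minimal sketch with scikit-learn's `SelectKBest` and `chi2` scorer (which requires non-negative feature values); the iris dataset and `k=2` are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # iris features are all non-negative

# Score each feature against the target and keep the 2 best ones.
selector = SelectKBest(chi2, k=2).fit(X, y)
X_new = selector.transform(X)

print(selector.scores_)  # chi-square statistic per feature
print(X_new.shape)       # (150, 2)
```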
Fisher's Score: Fisher's score is another popular technique for feature selection. It ranks the variables by Fisher's criterion in descending order; we then select the variables with the largest scores.
Missing Value Ratio: The missing value ratio of each feature can be compared against a threshold to decide whether to keep it. It is computed as the number of missing values in a column divided by the total number of observations; features whose ratio exceeds the threshold are dropped.
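The ratio is straightforward to compute with pandas; the toy DataFrame and the 0.6 threshold below are illustrative assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],    # 50% missing
    "b": [1.0, 2.0, 3.0, 4.0],          # 0% missing
    "c": [np.nan, np.nan, np.nan, 4.0], # 75% missing
})

# Missing value ratio = missing count per column / total observations.
ratio = df.isna().sum() / len(df)

# Keep only the features at or below the chosen threshold.
threshold = 0.6
kept = ratio[ratio <= threshold].index.tolist()
print(kept)  # ['a', 'b']
```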
3. Embedded Methods
Embedded methods combine the advantages of filter and wrapper methods: they consider interactions between features while keeping the computational cost low. They are fast like filter methods, but more accurate.
These methods are also iterative: each iteration of model training is evaluated, and the features that contribute most in that iteration are identified. Some common embedded techniques are:
- Regularization – Regularization adds a penalty term to the parameters of a machine learning model to avoid overfitting. With an L1 penalty, some coefficients shrink exactly to zero, and the features with zero coefficients can be removed from the dataset. Common techniques include L1 regularization (Lasso) and Elastic Net (a combination of L1 and L2 regularization).
- Random Forest Importance – Tree-based methods expose feature importances that provide a natural way of selecting features. Feature importance indicates which features matter most in model building, i.e., which have the greatest impact on the target variable. Random Forest is one such tree-based method: a bagging algorithm that aggregates many decision trees, it automatically ranks features by the decrease in impurity (Gini impurity) they produce, averaged over all the trees.
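Both embedded techniques can be sketched with scikit-learn; the synthetic regression data for the Lasso and the iris dataset for the random forest are illustrative assumptions, not fixed recipes:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso

# --- L1 regularization (Lasso): unhelpful coefficients shrink to zero ---
rng = np.random.default_rng(0)
X_reg = rng.normal(size=(200, 5))
# Only the first two columns drive the target; the other three are pure noise.
y_reg = 3 * X_reg[:, 0] - 2 * X_reg[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X_reg, y_reg)
kept = np.flatnonzero(lasso.coef_)  # indices of features with non-zero coefficients
print(lasso.coef_)                  # noise features get (exactly) zero coefficients

# --- Random Forest importance: mean decrease in Gini impurity over all trees ---
X_clf, y_clf = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_clf, y_clf)
importances = rf.feature_importances_  # non-negative, sums to 1
print(importances)
```

In both cases the selection falls out of model training itself, which is what distinguishes embedded methods from the filter and wrapper families.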
In conclusion, in this blog we learned why we need feature selection techniques in machine learning and the main methods for performing it. Happy Learning!