Feature Selection
Sometimes more is not better. In machine learning, feeding the model many unnecessary features raises the chance of overfitting, and training takes longer to converge because the model also has to learn which features matter and which do not.
Feature Selection algorithms:
Depending on Data:
- Percentage of missing values
  - Remove the feature if most of its values are missing
  - Otherwise impute (refer to Handling Missing Data)
- Drop variables with zero variance (they carry no information); see the sketch after this list
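A minimal pandas/scikit-learn sketch of the data-driven checks above, using a small hypothetical DataFrame; the 0.6 missing-value cut-off and the mean imputation are arbitrary example choices, not fixed rules.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical raw feature table.
df = pd.DataFrame({
    "a": [1.0, None, None, None, None],   # mostly missing -> drop
    "b": [3.0, 3.0, 3.0, 3.0, 3.0],       # zero variance  -> drop
    "c": [1.0, 2.0, None, 4.0, 5.0],      # few missing    -> impute
})

# 1) Percentage of missing values per column.
missing_ratio = df.isna().mean()
mostly_missing = missing_ratio[missing_ratio > 0.6].index   # 0.6 is an arbitrary cut-off
df = df.drop(columns=mostly_missing)

# 2) Impute the remaining gaps (simple mean imputation as an example).
df = df.fillna(df.mean(numeric_only=True))

# 3) Drop zero-variance columns.
selector = VarianceThreshold(threshold=0.0)
selector.fit(df)
df = df.loc[:, selector.get_support()]

print(df.columns.tolist())   # -> ['c']
```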
Depending on Redundancy:
- Pairwise correlation (refer to Pearson Correlation); see the sketch after this list
- Check multicollinearity to remove groups of correlated features
- Use Principal Component Analysis (PCA) to reduce dimensionality by combining features into new components
  - good for compressing many correlated features into a few components
  - bad for interpretability
- Use cluster analysis to find out which features are related (refer to Hierarchical Clustering)
  - good if the dataset has multicollinearity
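A possible sketch of the redundancy checks on made-up data: a pairwise Pearson correlation filter (the 0.95 threshold is an arbitrary example) followed by PCA as an alternative that compresses correlated columns into a few components.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical numeric feature matrix; in practice use your own DataFrame.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 * 2 + rng.normal(scale=0.01, size=200),  # almost a copy of x1
    "x3": rng.normal(size=200),
})

# 1) Pairwise correlation filter: drop one feature from each highly
#    correlated pair (keep the upper triangle to avoid double counting).
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
reduced = df.drop(columns=to_drop)
print(to_drop)   # -> ['x2']

# 2) Alternative: PCA compresses correlated features into uncorrelated
#    components (at the cost of interpretability).
pca = PCA(n_components=2)
components = pca.fit_transform(df)
print(pca.explained_variance_ratio_)
```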
Greedy:
- Correlation with the target
- Forward Feature Selection
- Backward Feature Elimination
- Stepwise Selection (a sketch of forward/backward selection follows this list)
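One way to run the greedy wrappers is scikit-learn's SequentialFeatureSelector. The sketch below uses the breast-cancer toy dataset and a scaled logistic regression as an example estimator; the budget of 5 features is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Forward selection: start from an empty set and greedily add the feature
# that improves the cross-validated score most, until 5 features are kept.
# Use direction="backward" for backward feature elimination.
estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
sfs = SequentialFeatureSelector(
    estimator,
    n_features_to_select=5,    # arbitrary example budget
    direction="forward",
    cv=5,
)
sfs.fit(X, y)
print(list(X.columns[sfs.get_support()]))
```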
Embedded Methods:
- Random Forest feature importances (see the sketch after this list)
- Feature selection using Decision Tree importances
- L1 regularization (Lasso Regression)
- Elastic Net Regression
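A sketch of the embedded approaches with SelectFromModel: tree-based importances from a Random Forest, then an L1-penalised model (an L1 logistic regression stands in for Lasso here because the toy target is a class label). Thresholds and hyperparameters are illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Random Forest importances: keep features whose importance is above the
# median importance ("median" is an arbitrary example threshold).
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf_selector = SelectFromModel(rf, threshold="median").fit(X, y)
print(list(X.columns[rf_selector.get_support()]))

# L1 selection: the L1 penalty drives the coefficients of unhelpful
# features to exactly zero; SelectFromModel keeps the non-zero ones.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1_selector = SelectFromModel(l1_model).fit(X, y)
print(list(X.columns[l1_selector.get_support()]))
```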
Pros:
- Improved model performance
- Reduced overfitting
- Increased interpretability
Cons:
- Running the selection procedure itself adds computational cost