For numerical features I can perform feature selection at any time, but when I have a categorical feature, should I perform feature selection after encoding? If I do, should I remove the less important features from both train and test?
Yes, when performing feature selection on categorical features, it is recommended to do so after encoding them. This is because many feature selection algorithms require numerical inputs, so the categorical features need to be encoded before they can be processed.
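As a rough illustration of why encoding comes first, the sketch below one-hot encodes a categorical column and only then scores each resulting column against the target. The toy data, the `colour` feature, and the helper names `one_hot` and `abs_correlation` are all made up for this example, and absolute Pearson correlation is just one possible score, not the only valid choice.

```python
from statistics import mean, pstdev

def one_hot(values, categories):
    """Encode each value as a list of 0/1 floats, one slot per known category."""
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def abs_correlation(column, target):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    sx, sy = pstdev(column), pstdev(target)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(column), mean(target)
    cov = mean([(x - mx) * (y - my) for x, y in zip(column, target)])
    return abs(cov / (sx * sy))

# Hypothetical toy data: one categorical feature and a binary target.
colour = ["red", "blue", "green", "red", "blue", "green"]
target = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]

# Step 1: encode. The raw strings cannot be fed to a numeric score.
categories = sorted(set(colour))
encoded = one_hot(colour, categories)

# Step 2: score each encoded column separately.
scores = {
    f"colour={c}": abs_correlation([row[i] for row in encoded], target)
    for i, c in enumerate(categories)
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Note that one categorical feature becomes several columns after one-hot encoding, so a selection step may keep some categories and drop others.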
As for whether to remove the less important features from both train and test: yes, always. The model must see the same columns at evaluation time as it saw during training, regardless of whether the selection method is a statistical filter (such as correlation or mutual information) or model-based (such as recursive feature elimination).
What differs between train and test is not which features are removed but which data is used to decide. The selection should be fitted on the training set only, and the resulting list of features should then be applied unchanged to the test set. If the test set takes part in choosing the features, information leaks from it into the model and the evaluation becomes optimistically biased.
In summary, perform feature selection after encoding categorical features, choose the features using the training data only, and drop the same features from both the train and test datasets.
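To keep training and evaluation on the same columns, a common pattern is to decide which features to keep using the training data alone and then apply that same choice to the test set. The following is a minimal sketch of that pattern, assuming the data is already encoded. The functions `fit_selector` and `transform` and the toy arrays are hypothetical names invented for this example, and the correlation score stands in for whatever selection method is actually used.

```python
from statistics import mean, pstdev

def abs_correlation(column, target):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    sx, sy = pstdev(column), pstdev(target)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(column), mean(target)
    cov = mean([(x - mx) * (y - my) for x, y in zip(column, target)])
    return abs(cov / (sx * sy))

def fit_selector(X_train, y_train, k):
    """Pick the k best-scoring column indices using training data only."""
    n_cols = len(X_train[0])
    scores = [
        abs_correlation([row[j] for row in X_train], y_train)
        for j in range(n_cols)
    ]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def transform(X, keep):
    """Keep only the chosen columns; works for train and test alike."""
    return [[row[j] for j in keep] for row in X]

# Hypothetical, already-encoded data (two one-hot columns, one numeric column).
X_train = [
    [1.0, 0.0, 5.0],
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 4.0],
    [0.0, 1.0, 4.0],
]
y_train = [1.0, 0.0, 1.0, 0.0]
X_test = [[1.0, 0.0, 2.0], [0.0, 1.0, 6.0]]

# The decision is made from the training set; the test set is never inspected.
keep = fit_selector(X_train, y_train, k=2)

# The same column indices are then applied to both sets.
X_train_sel = transform(X_train, keep)
X_test_sel = transform(X_test, keep)
```

In practice, libraries such as scikit-learn support the same fit-on-train, apply-to-both pattern by chaining the encoder and the selector in a `Pipeline`.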