A Novel Approach for Ensemble Feature Selection Using Clustering with Automatic Threshold

Feature selection (FS) is a core part of the data processing pipeline. The use of ensembles in FS is a relatively new approach that aims to produce more diversity in the feature dataset, which provides better performance as well as more robust and accurate results. An aggregation step combines the output of each FS method and generates a single feature subset. In this paper, a novel ensemble method for FS, “EFSCAT”, is proposed, which ranks all the features and then clusters the most related features. To reduce the size of each ranking, an automatic threshold is introduced in every ranker. This added thresholding step improves computational efficiency because it cuts off the low-ranking features in each ranker's initial ranking. Mean-shift clustering is then used to combine the results of the rankers, which makes the aggregation process very time efficient. “EFSCAT” makes the classification more robust and stable.
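
The rank, threshold and cluster steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the threshold rule, the quantity that is clustered, or the bandwidth, so the mean-score cutoff, the 1-D flat-kernel mean shift over averaged normalized scores, the choice of the highest-scoring cluster, and all function names below are hypothetical.

```python
from statistics import mean


def min_max(scores):
    """Rescale one ranker's scores to [0, 1] so rankers are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def threshold_ranker(scores):
    """Keep features scoring at or above the ranker's mean.

    ASSUMPTION: the mean is used as the automatic cutoff; the source
    does not state which rule EFSCAT applies.
    """
    cutoff = mean(scores)
    return {i for i, s in enumerate(scores) if s >= cutoff}


def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift on 1-D points; returns one mode per point."""
    modes = list(points)
    for _ in range(max_iter):
        shifted = []
        for m in modes:
            # Move each mode to the mean of the points inside its window.
            window = [p for p in points if abs(p - m) <= bandwidth]
            shifted.append(mean(window))
        moved = max(abs(a - b) for a, b in zip(shifted, modes))
        modes = shifted
        if moved < tol:
            break
    return modes


def ensemble_select(ranker_scores, bandwidth=0.2):
    """Threshold each ranker, then cluster the survivors by aggregate score.

    ASSUMPTION: survivors are clustered on their mean normalized score and
    the cluster with the highest mode is returned as the final subset.
    """
    normalized = [min_max(s) for s in ranker_scores]
    kept_sets = [threshold_ranker(s) for s in normalized]
    candidates = sorted(set().union(*kept_sets))
    points = [mean([n[f] for n in normalized]) for f in candidates]
    modes = mean_shift_1d(points, bandwidth)
    best = max(modes)
    return [f for f, m in zip(candidates, modes) if abs(m - best) <= bandwidth / 2]


# Toy scores from three hypothetical rankers over six features.
scores = [
    [0.9, 0.8, 0.1, 0.2, 0.85, 0.05],
    [0.7, 0.9, 0.2, 0.1, 0.8, 0.0],
    [1.0, 0.7, 0.0, 0.6, 0.9, 0.1],
]
print(ensemble_select(scores))  # [0, 1, 4]
```

In this toy run, feature 3 passes the threshold of the third ranker only; its low aggregate score places it in a separate cluster, so it is dropped during aggregation. Because thresholding shrinks the candidate list before clustering, the mean-shift step operates on fewer points, which is the efficiency argument made above.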