A computational experience for automatic feature selection on big data frameworks
Free (open access)
Volume 11 (2016), Issue 3
Pages 168-177
Y. ORENES, A. RABASA, A. PÉREZ-MARTÍN, J.J. RODRÍGUEZ-SALA & J. SÁNCHEZ-SORIANO
Classification rule systems are among the predictive analytics techniques applied to Big Data problems, where datasets commonly have millions of rows and dozens of variables (attributes). A classification rule system consists of a set of rules, each with an antecedent (a variable or set of variables, numeric or nominal) and a consequent (the target variable, which must be nominal). When the antecedent variables are numeric, many algorithms that generate classification rules rely on traditional automatic feature selection methods based on well-established techniques such as discriminant analysis or cluster analysis. In this paper, the authors compare their own feature selection and classification method, RBS (originally designed to handle only nominal variables), with classical feature selection methods. After formally defining the method, the paper presents the design of a computational experiment that allows a qualitative and quantitative comparison of the adapted RBS with other feature selection methods. Finally, the optimal conditions for applying each method are discussed and future research areas in automatic feature selection are identified.
Keywords: big data, classification rule systems, feature selection