Selective Bayesian Classifier: Feature Selection For The Naive Bayesian Classifier Using Decision Trees
C Ratanamahatana & D Gunopulos
The Naive Bayesian classifier (NB) is known to work very well on some domains and poorly on others. Its performance suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than NB on such domains. This paper describes a Selective Bayesian classifier (SBC) that uses only those features that C4.5 would use in its decision tree when learning from a small sample of the training set, thereby combining two classifiers of different natures. Experiments conducted on eleven datasets indicate that SBC performs reliably better than NB on all domains, and that SBC outperforms C4.5 on many of the datasets on which C4.5 outperforms NB. In most cases SBC can also eliminate more than half of the original attributes, which can greatly reduce the size of the training and test data as well as the running time. Further, the SBC algorithm typically learns faster than both C4.5 and NB, needing fewer training examples to reach high classification accuracy.

1 Introduction

Two of the most widely used and successful methods of classification are C4.5 decision trees and Naive Bayesian learning (NB). While C4.5 constructs decision trees by using features to split the training set into positive and negative examples until it achieves high accuracy on the training set, NB represents each class with a probabilistic summary and finds the most likely class for each example it is asked to classify.
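The selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tree builder is a simplified stand-in for C4.5 (plain information gain, categorical features only, no pruning), and the names `tree_features`, `NaiveBayes`, and `fit_sbc`, as well as the defaults `sample_fraction=0.1`, `max_depth=3`, and `seed=0`, are assumptions made for illustration. The Laplace smoothing in the NB part is likewise a common choice rather than something the text specifies.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def tree_features(rows, labels, features, max_depth):
    """Grow a small information-gain tree and return the feature indices it uses.

    Simplified stand-in for C4.5: categorical splits, no gain ratio, no pruning.
    """
    if max_depth == 0 or len(set(labels)) <= 1 or not features:
        return set()
    base = entropy(labels)
    best, best_gain = None, 1e-9  # require a strictly positive gain
    for f in features:
        groups = defaultdict(list)
        for row, y in zip(rows, labels):
            groups[row[f]].append(y)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        if base - remainder > best_gain:
            best, best_gain = f, base - remainder
    if best is None:
        return set()
    used = {best}
    rest = [f for f in features if f != best]
    parts = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        parts[row[best]][0].append(row)
        parts[row[best]][1].append(y)
    for sub_rows, sub_labels in parts.values():
        used |= tree_features(sub_rows, sub_labels, rest, max_depth - 1)
    return used


class NaiveBayes:
    """Categorical Naive Bayes restricted to a chosen subset of feature indices."""

    def __init__(self, features):
        self.features = sorted(features)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.counts = defaultdict(Counter)  # (feature, class) -> value counts
        self.values = defaultdict(set)      # feature -> values seen in training
        for row, y in zip(rows, labels):
            for f in self.features:
                self.counts[(f, y)][row[f]] += 1
                self.values[f].add(row[f])
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())

        def log_posterior(c):
            # log prior plus Laplace-smoothed log likelihood of each kept feature
            score = math.log(self.class_counts[c] / total)
            for f in self.features:
                num = self.counts[(f, c)][row[f]] + 1
                den = self.class_counts[c] + len(self.values[f])
                score += math.log(num / den)
            return score

        return max(self.class_counts, key=log_posterior)


def fit_sbc(rows, labels, sample_fraction=0.1, max_depth=3, seed=0):
    """Select features with a tree grown on a small sample, then fit NB on all rows."""
    rng = random.Random(seed)
    n = max(1, round(len(rows) * sample_fraction))
    idx = rng.sample(range(len(rows)), n)
    all_features = list(range(len(rows[0])))
    selected = tree_features([rows[i] for i in idx], [labels[i] for i in idx],
                             all_features, max_depth)
    if not selected:  # the tree made no split: fall back to every feature
        selected = set(all_features)
    return NaiveBayes(selected).fit(rows, labels), selected
```

Calling `fit_sbc(rows, labels)` on a list of equal-length tuples of categorical values returns the fitted model together with the set of selected feature indices. On data where one feature is an exact copy of another, the tree splits on only one of the two, so the redundant copy never reaches NB; this is the kind of correlated-feature case in which the abstract says plain NB suffers.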