A Comparison Of Some Classification Techniques

P S S Coelho; N F F Ebecken

doi:10.2495/DATA020551

WIT Press

Price

Free (open access)

Transaction

WIT Transactions on Information and Communication Technologies

Published

2002

Abstract

Classification assigns labels, or classes, to distinguish groups of objects. In general, these labels are known beforehand from objects that have already been classified. In Data Mining tasks, the objects are records, i.e., they are described by a set of attributes, which may be categorical or continuous. The objective is to build models that characterize the classes of the records from their attributes (values, distribution, pattern, etc.). Many different techniques for record classification are available today, and they differ in the heuristics they use. This article compares some of the most popular classification techniques: Decision Trees, Bayesian Algorithms (Statistical Methods), Classification Based on Rule Induction, and Classification Based on Association Rules. The comparison relies mainly on the predictive accuracy criterion. Speed, robustness, scalability and interpretability are also discussed, but they were not quantified for a mathematical comparison. The classification models were built from two relational tables of real data. The first contains data on meteorological conditions in the region of the International Airport of Rio de Janeiro; it has 26482 records with 19 variables (one of which is the class label). The second comes from an insurance company and has 130143 records with 63 independent variables (attributes) and one dependent variable (the class label). Both tables were prepared beforehand. The results of the comparison are presented in tables.

1 Introduction

Data Mining activities can be regarded as concentrating on the development of models that represent some knowledge contained in the data
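
The predictive accuracy criterion named in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data. The toy records, the attribute values, the class names `MajorityClass` and `CategoricalNaiveBayes`, and the helper `predictive_accuracy` are all hypothetical, chosen only to show how two classifiers might be compared on held-out records. A majority-class baseline stands in as the reference point, and a small categorical Naive Bayes stands in for the Bayesian family mentioned above.

```python
from collections import Counter, defaultdict
import math


def predictive_accuracy(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class MajorityClass:
    """Baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with Laplace smoothing."""

    def fit(self, X, y):
        self.class_counts_ = Counter(y)
        self.n_ = len(y)
        # counts_[(attribute index, class)][value] = number of training records
        self.counts_ = defaultdict(Counter)
        # values_[attribute index] = distinct values seen for that attribute
        self.values_ = defaultdict(set)
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                self.counts_[(j, label)][value] += 1
                self.values_[j].add(value)
        return self

    def _log_score(self, row, label):
        # log P(class) + sum over attributes of log P(value | class)
        score = math.log(self.class_counts_[label] / self.n_)
        for j, value in enumerate(row):
            num = self.counts_[(j, label)][value] + 1
            den = self.class_counts_[label] + len(self.values_[j])
            score += math.log(num / den)
        return score

    def predict(self, X):
        classes = list(self.class_counts_)
        return [max(classes, key=lambda c: self._log_score(row, c)) for row in X]


# Hypothetical toy records (humidity, wind) with a made-up class label.
X_train = [
    ("humid", "calm"), ("humid", "calm"), ("humid", "windy"), ("dry", "calm"),
    ("dry", "windy"), ("dry", "windy"), ("humid", "calm"), ("dry", "calm"),
]
y_train = ["fog", "fog", "clear", "clear", "clear", "clear", "fog", "clear"]

# Held-out records used only for measuring accuracy.
X_test = [("humid", "calm"), ("dry", "windy"), ("humid", "windy"), ("dry", "calm")]
y_test = ["fog", "clear", "clear", "clear"]

models = {"majority": MajorityClass(), "naive_bayes": CategoricalNaiveBayes()}
results = {
    name: predictive_accuracy(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in models.items()
}

for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Measuring accuracy on records that were not used for training is what makes the number an estimate of predictive, rather than descriptive, performance. How the paper itself split its two tables is not stated in the abstract.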
