WIT Press


A Comparison Of Traditional And Rough Set Approaches To Missing Attribute Values In Data Mining

Price

Free (open access)

Volume

42

Pages

9

Page Range

155 - 163

Published

2009

Size

150 kb

Paper DOI

10.2495/DATA090161

Copyright

WIT Press

Author(s)

J. W. Grzymala-Busse

Abstract

Real-life data sets are often incomplete, i.e., some attribute values are missing. In this paper we compare traditional, frequently used methods of handling missing attribute values, which are based on preprocessing, with another class of methods in which rule induction is performed directly on incomplete data sets, i.e., handling missing attribute values and rule induction are conducted concurrently. In our experiments four traditional methods of handling missing attribute values were applied: Most Common Value, Concept Most Common Value, Closest Fit, and Concept Closest Fit. Both Closest Fit methods were enhanced by a rough set approach to missing attribute values. On the same typical data sets, experiments were also conducted with the MLEM2 rule induction algorithm, which is based on rough set theory, using three different rough-set interpretations of missing attribute values: lost values, “do not care” conditions, and attribute-concept values. The best method is Concept Closest Fit enhanced by interpreting the remaining missing attribute values as lost values.
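To make the two simplest preprocessing methods named in the abstract concrete, the following is a minimal Python sketch, not the author's implementation. It assumes that a missing value is marked with `?`, that a data set is a list of dictionaries, and that a "concept" is the set of cases sharing one decision value. The function names and the `temp`/`flu` toy data are hypothetical.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing attribute value


def most_common(values):
    """Return the most frequent non-missing value, or MISSING if none exist."""
    known = [v for v in values if v != MISSING]
    return Counter(known).most_common(1)[0][0] if known else MISSING


def impute_most_common(rows, attr):
    """Most Common Value: fill gaps with the most frequent value of attr overall."""
    fill = most_common(r[attr] for r in rows)
    return [dict(r, **{attr: fill}) if r[attr] == MISSING else dict(r)
            for r in rows]


def impute_concept_most_common(rows, attr, decision):
    """Concept Most Common Value: fill gaps with the most frequent value of attr
    among cases that share the same decision value (the same concept)."""
    fills = {
        concept: most_common(r[attr] for r in rows if r[decision] == concept)
        for concept in {r[decision] for r in rows}
    }
    return [dict(r, **{attr: fills[r[decision]]}) if r[attr] == MISSING else dict(r)
            for r in rows]


# Hypothetical toy data: "temp" is an attribute, "flu" is the decision.
rows = [
    {"temp": "high", "flu": "yes"},
    {"temp": "high", "flu": "yes"},
    {"temp": "?", "flu": "yes"},
    {"temp": "normal", "flu": "no"},
    {"temp": "normal", "flu": "no"},
    {"temp": "normal", "flu": "no"},
    {"temp": "?", "flu": "no"},
]

by_overall = impute_most_common(rows, "temp")
by_concept = impute_concept_most_common(rows, "temp", "flu")
```

On this toy data the two methods disagree for the third case: the overall most common value of `temp` is `normal`, while within the concept `flu = yes` it is `high`. If every value of an attribute is missing within a concept, this sketch leaves the gap in place rather than guessing.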

Keywords

missing attribute values, incomplete data sets, concept approximations, LERS data mining system, MLEM2 algorithm.