A Comparison Of Traditional And Rough Set Approaches To Missing Attribute Values In Data Mining

J. W. Grzymala-Busse

doi:10.2495/DATA090161

WIT Press

Price: Free (open access)

Transaction: WIT Transactions on Information and Communication Technologies

Page Range: 155-163

Published: 2009

Size: 150 kb

Abstract

Real-life data sets are often incomplete, i.e., some attribute values are missing. In this paper we compare traditional, frequently used methods of handling missing attribute values, which are based on preprocessing, with another class of methods in which rule induction is performed directly on incomplete data sets, i.e., missing attribute values are handled concurrently with rule induction. In our experiments four traditional methods of handling missing attribute values were applied: Most Common Value, Concept Most Common Value, Closest Fit, and Concept Closest Fit. Both Closest Fit methods were enhanced by a rough set approach to missing attribute values. On the same typical data sets, we also conducted experiments with three rough-set interpretations of missing attribute values (lost values, "do not care" conditions, and attribute-concept values), using the MLEM2 rule induction algorithm, which is based on rough set theory. The best method is Concept Closest Fit enhanced by interpreting the remaining missing attribute values as lost values.
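The four preprocessing methods named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes symbolic attributes only (numerical attributes, which would need a mean and a normalized distance, are omitted), assumes missing values are encoded as "?", and assumes a per-attribute distance of 0 for equal known values and 1 otherwise (including when either value is missing). The function names, the `per_concept` flag, and the toy data are hypothetical. A value that is still missing in the nearest neighbour is left as "?", which mirrors the abstract's idea of handing the remaining missing values to a rough-set (lost value) interpretation.

```python
from collections import Counter
from typing import Dict, List, Optional

Case = Dict[str, str]
MISSING = "?"  # assumed marker for a missing symbolic value


def _most_common(values: List[str]) -> Optional[str]:
    """Most frequent known value; ties go to the value encountered first."""
    known = [v for v in values if v != MISSING]
    if not known:
        return None
    return Counter(known).most_common(1)[0][0]


def most_common_value(cases: List[Case], attributes: List[str], decision: str,
                      per_concept: bool = False) -> List[Case]:
    """Most Common Value (per_concept=False) or Concept Most Common Value
    (per_concept=True): fill each missing value with the most frequent known
    value of that attribute among all cases, or among cases of the same concept."""
    filled = []
    for case in cases:
        new = dict(case)
        for a in attributes:
            if case[a] == MISSING:
                pool = [c for c in cases
                        if not per_concept or c[decision] == case[decision]]
                value = _most_common([c[a] for c in pool])
                if value is not None:  # otherwise the value stays missing
                    new[a] = value
        filled.append(new)
    return filled


def _distance(x: Case, y: Case, attributes: List[str]) -> int:
    """Assumed distance: 0 per attribute when both values are known and equal, else 1."""
    return sum(0 if x[a] == y[a] != MISSING else 1 for a in attributes)


def closest_fit(cases: List[Case], attributes: List[str], decision: str,
                per_concept: bool = False) -> List[Case]:
    """Closest Fit (per_concept=False) or Concept Closest Fit (per_concept=True):
    copy missing values from the nearest other case, searching only the same
    concept in the concept variant. A value still missing in that neighbour
    stays "?", i.e. it is left for a rough-set (lost value) interpretation."""
    filled = []
    for i, case in enumerate(cases):
        new = dict(case)
        if any(case[a] == MISSING for a in attributes):
            candidates = [c for j, c in enumerate(cases) if j != i and
                          (not per_concept or c[decision] == case[decision])]
            if candidates:
                # min() keeps the first case among equally close candidates
                nearest = min(candidates,
                              key=lambda c: _distance(case, c, attributes))
                for a in attributes:
                    if case[a] == MISSING:
                        new[a] = nearest[a]
        filled.append(new)
    return filled


# Toy decision table (hypothetical data); "Flu" is the decision.
ATTRS = ["Temperature", "Headache"]
CASES = [
    {"Temperature": "high", "Headache": "?", "Flu": "yes"},
    {"Temperature": "high", "Headache": "yes", "Flu": "yes"},
    {"Temperature": "normal", "Headache": "no", "Flu": "no"},
    {"Temperature": "?", "Headache": "no", "Flu": "no"},
    {"Temperature": "high", "Headache": "no", "Flu": "no"},
    {"Temperature": "normal", "Headache": "no", "Flu": "no"},
]

if __name__ == "__main__":
    for name, table in [
        ("Most Common Value", most_common_value(CASES, ATTRS, "Flu")),
        ("Concept Most Common Value", most_common_value(CASES, ATTRS, "Flu", True)),
        ("Closest Fit", closest_fit(CASES, ATTRS, "Flu")),
        ("Concept Closest Fit", closest_fit(CASES, ATTRS, "Flu", True)),
    ]:
        print(name, table[0]["Headache"], table[3]["Temperature"])
```

On this toy table, Most Common Value fills case 0's Headache with "no" (the global majority), while Concept Most Common Value fills it with "yes", the only known value within the "Flu = yes" concept. This illustrates why the concept-restricted variants can preserve information that is relevant to the decision.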

Keywords

missing attribute values, incomplete data sets, concept approximations, LERS data mining system, MLEM2 algorithm.
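The rough-set interpretations in the abstract (lost values, "do not care" conditions, attribute-concept values) act on incomplete data through concept approximations. A minimal sketch of one common formulation, based on characteristic sets, is shown below. It rests on several assumptions: "?", "*" and "-" mark lost, "do not care" and attribute-concept values respectively; a lost value keeps a case out of every block of that attribute; a "do not care" value puts it in every block; and an attribute-concept value puts it in the blocks of the values that occur for that attribute within the case's own concept. All function names and data are hypothetical, and this is not the MLEM2 algorithm or the LERS system. It only computes the lower and upper approximations that a rule inducer could then work from.

```python
from typing import Dict, List, Set, Tuple

Case = Dict[str, str]
LOST, DO_NOT_CARE, ATTRIBUTE_CONCEPT = "?", "*", "-"  # assumed markers
SPECIAL = {LOST, DO_NOT_CARE, ATTRIBUTE_CONCEPT}


def _concept_values(cases: List[Case], i: int, attr: str, decision: str) -> Set[str]:
    """Specified values of attr among cases in the same concept as case i."""
    return {c[attr] for c in cases
            if c[decision] == cases[i][decision] and c[attr] not in SPECIAL}


def blocks(cases: List[Case], attr: str, decision: str) -> Dict[str, Set[int]]:
    """Block [(attr, v)] (a set of case indices) for every specified value v."""
    values = {c[attr] for c in cases if c[attr] not in SPECIAL}
    result: Dict[str, Set[int]] = {v: set() for v in values}
    for i, c in enumerate(cases):
        val = c[attr]
        if val == LOST:
            targets: Set[str] = set()  # lost: excluded from every block
        elif val == DO_NOT_CARE:
            targets = values  # "do not care": member of every block
        elif val == ATTRIBUTE_CONCEPT:
            targets = _concept_values(cases, i, attr, decision)
        else:
            targets = {val}
        for v in targets:
            result[v].add(i)
    return result


def characteristic_set(cases: List[Case], i: int, attributes: List[str],
                       decision: str) -> Set[int]:
    """Intersect, over all attributes, the blocks compatible with case i."""
    k = set(range(len(cases)))
    for a in attributes:
        val = cases[i][a]
        if val in (LOST, DO_NOT_CARE):
            continue  # this attribute does not restrict case i
        blk = blocks(cases, a, decision)
        vals = (_concept_values(cases, i, a, decision)
                if val == ATTRIBUTE_CONCEPT else {val})
        if vals:  # no known values in the concept: no restriction
            k &= set().union(*(blk[v] for v in vals))
    return k


def concept_approximations(cases: List[Case], attributes: List[str], decision: str,
                           concept_value: str) -> Tuple[Set[int], Set[int]]:
    """Lower and upper approximations of the concept {x : decision(x) = concept_value}."""
    concept = {i for i, c in enumerate(cases) if c[decision] == concept_value}
    lower: Set[int] = set()
    upper: Set[int] = set()
    for i in concept:
        k = characteristic_set(cases, i, attributes, decision)
        upper |= k
        if k <= concept:  # K(x) entirely inside the concept
            lower |= k
    return lower, upper


# Toy incomplete table (hypothetical data); "Flu" is the decision.
ATTRS = ["Temperature", "Headache"]
CASES = [
    {"Temperature": "high", "Headache": "?", "Flu": "yes"},
    {"Temperature": "high", "Headache": "yes", "Flu": "yes"},
    {"Temperature": "normal", "Headache": "no", "Flu": "no"},
    {"Temperature": "*", "Headache": "no", "Flu": "no"},
    {"Temperature": "high", "Headache": "no", "Flu": "no"},
    {"Temperature": "-", "Headache": "yes", "Flu": "yes"},
]

if __name__ == "__main__":
    for value in ("yes", "no"):
        print(value, concept_approximations(CASES, ATTRS, "Flu", value))
```

Because case 0's Headache value is lost, its characteristic set also covers cases from the "Flu = no" concept. Case 0 therefore contributes to the upper but not the lower approximation of "Flu = yes".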
