Multivariate Interdependent Discretization In Discovering The Best Correlated Attribute
S. Chao & Y. P. Li
The decision tree is one of the most widely used and practical methods in data mining. However, many discretization algorithms developed in this field are univariate: they discretize each continuous-valued attribute independently, without considering its interdependence with other attributes, and at most take the class attribute into account. Such univariate discretization is inadequate for the critical problems that arise especially in the medical domain. In this paper, we propose a new multivariate discretization method called Multivariate Interdependent Discretization for Continuous Attributes (MIDCA). The method combines the normalized relief and information measures to find the best correlated attribute for each continuous-valued attribute being discretized, and then uses that attribute as the interdependent attribute to carry out the multivariate discretization. We believe that a good multivariate discretization scheme for continuous-valued attributes should rely heavily on their respective best correlated attributes. Within an attribute space, each attribute should have at least one most relevant attribute, which may differ from attribute to attribute. Our multivariate discretization algorithm minimizes the uncertainty between the interdependent attribute and the continuous-valued attribute being discretized while maximizing their correlation. The method can be used as a pre-processing step for learning algorithms. The empirical results compare the performance of MIDCA with that of various discretization methods for two decision tree algorithms, ID3 and C4.5, on twelve real-life datasets from the UCI repository. Keywords: multivariate discretization, interdependent feature, correlated attribute, data mining, machine learning.
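The idea of finding, for each continuous-valued attribute, the attribute most correlated with it can be illustrated with a minimal sketch. It is not the MIDCA procedure itself: the abstract does not specify how the normalized relief and information measures are combined, so the sketch substitutes symmetric uncertainty (a normalized mutual information) as a stand-in score and uses equal-width binning only to make the entropies computable. The function names, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_bins(values, k=3):
    """Assumed preliminary step: map continuous values to k equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # Clamp so that the maximum value falls in the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def symmetric_uncertainty(x, y):
    """Stand-in correlation score: 2 * I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2 * mutual_info / (hx + hy)


def best_correlated_attribute(data, target, k=3):
    """Return the attribute (other than `target`) scoring highest against it.

    `data` maps hypothetical attribute names to lists of continuous values.
    """
    binned = {name: equal_width_bins(col, k) for name, col in data.items()}
    scores = {
        name: symmetric_uncertainty(binned[target], col)
        for name, col in binned.items()
        if name != target
    }
    return max(scores, key=scores.get), scores


# Toy data (invented for illustration): "b" is a linear function of "a",
# while "c" is a shuffled sequence.
data = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],
    "c": [5, 1, 4, 2, 6, 3],
}
best, scores = best_correlated_attribute(data, "a")
```

In this toy example, `best` is `"b"`, which would then play the role of the interdependent attribute when discretizing `"a"`. A relief-based term, as named in the abstract, would have to be added to the score to approximate the measure the paper describes.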