A Bayesian Approach For Supervised Discretization

M. Boullé

doi:10.2495/DATA040191

WIT Press


Transaction: WIT Transactions on Information and Communication Technologies

Published: 2004

Abstract

In supervised machine learning, some algorithms are restricted to discrete data and therefore need to discretize continuous attributes. In this paper, we present a new discretization method called MODL, based on a Bayesian approach. The MODL method relies on a model space of discretizations and on a prior distribution defined on this model space. This makes it possible to set up an evaluation criterion for discretizations that is minimal for the most probable discretization given the data, i.e. the Bayes optimal discretization. We compare this approach with the MDL approach and with the statistical approaches used in other discretization methods, from both a theoretical and an experimental point of view. Extensive experiments show that the MODL method builds high-quality discretizations.

Keywords: supervised learning, data preparation, discretization, Bayesianism.

1 Introduction

While real data often comes in mixed format, both discrete and continuous, many induction algorithms rely on discrete attributes and need to discretize continuous attributes, i.e. to slice their domain into a finite number of intervals. More generally, using discretization to preprocess continuous attributes often provides many advantages. Discrete values are generally more understandable than continuous values, for both users and experts. Many classification algorithms are more accurate and run faster when discretization is used. Discretization of continuous attributes has been studied extensively in the past [6, 7, 9, 12, 16]. For example, decision tree algorithms exploit a discretization method to handle continuous attributes. C4.5 [13] uses the information gain based on Shannon entropy. CART [5] applies the Gini
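The abstract describes the MODL criterion only in words, so the sketch below makes it concrete under an explicit assumption on our part (the formula is not given on this page): the prior is taken to be hierarchical and uniform at each level (number of intervals I uniform on 1..n, interval bounds uniform given I, class distribution uniform within each interval), and the likelihood is multinomial within each interval. Under that assumption, the cost -log P(M) - log P(D | M) of a discretization M is log n + log C(n+I-1, I-1) + sum_i log C(n_i+J-1, J-1) + sum_i log(n_i! / (n_i1! ... n_iJ!)), where n is the number of instances, J the number of classes, n_i the size of interval i and n_ij its count of class j; the most probable discretization is the one that minimizes this cost. The function names are hypothetical, and the exhaustive dynamic program is only a simple exact search for small n, not necessarily the optimization procedure used in the paper.

```python
from math import lgamma, log


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def interval_cost(counts: list[int]) -> float:
    """Assumed per-interval cost: uniform prior on the class distribution
    plus -log of the multinomial likelihood of the labels in the interval."""
    n_i, j = sum(counts), len(counts)
    prior = log_binom(n_i + j - 1, j - 1)
    likelihood = lgamma(n_i + 1) - sum(lgamma(c + 1) for c in counts)
    return prior + likelihood


def modl_cost(intervals: list[list[int]]) -> float:
    """Assumed total cost of a discretization, given per-interval class counts."""
    n = sum(sum(c) for c in intervals)
    i = len(intervals)
    return log(n) + log_binom(n + i - 1, i - 1) + sum(interval_cost(c) for c in intervals)


def optimal_discretization(labels: list[int], num_classes: int) -> tuple[float, list[int]]:
    """Exact minimization of modl_cost by dynamic programming.

    `labels` are class indices of the instances sorted by attribute value;
    a cut may fall between any two consecutive instances (distinct values assumed).
    Returns (best cost, exclusive end position of each interval).
    """
    n = len(labels)
    # prefix[k][j] = number of class-j instances among the first k instances
    prefix = [[0] * num_classes]
    for y in labels:
        row = prefix[-1][:]
        row[y] += 1
        prefix.append(row)

    def cost(a: int, b: int) -> float:
        return interval_cost([prefix[b][j] - prefix[a][j] for j in range(num_classes)])

    inf = float("inf")
    # best[k][b]: minimal sum of interval costs covering the first b instances with k intervals
    best = [[inf] * (n + 1) for _ in range(n + 1)]
    back = [[0] * (n + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for k in range(1, n + 1):
        for b in range(k, n + 1):
            for a in range(k - 1, b):
                if best[k - 1][a] == inf:
                    continue
                candidate = best[k - 1][a] + cost(a, b)
                if candidate < best[k][b]:
                    best[k][b] = candidate
                    back[k][b] = a

    # Add the global terms, which depend only on the number of intervals k
    best_total, best_k = inf, 1
    for k in range(1, n + 1):
        total = log(n) + log_binom(n + k - 1, k - 1) + best[k][n]
        if total < best_total:
            best_total, best_k = total, k

    # Walk the back-pointers to recover the interval ends
    ends, b = [], n
    for k in range(best_k, 0, -1):
        ends.append(b)
        b = back[k][b]
    return best_total, ends[::-1]
```

On a perfectly separable toy sample (five instances of class 0 followed by five of class 1), `optimal_discretization([0] * 5 + [1] * 5, 2)` places a single cut after the fifth instance, whereas on alternating labels it keeps one interval: the extra prior cost of a cut is only repaid when the class distributions on either side differ enough.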
