WIT Press

Ordering Attributes For Missing Values Prediction And Data Classification

E R Hruschka Jr & N F F Ebecken


This work shows the application of the Bayesian K2 learning algorithm as a data classifier and preprocessor, combined with an attribute order searcher to improve the results. One factor that influences K2 performance is the initial order of the attributes in the dataset; however, in most cases the algorithm is applied without special attention to this preorder. The present work applies an empirical method to select an appropriate attribute order before running the learning algorithm (K2), and then performs the data preparation and classification tasks. To analyze the results, the data are first classified without considering the initial order of the attributes. A good variable order is then sought, and the classification is repeated using this attribute sequence. Once these results are obtained, the same algorithm is used to substitute missing values in the learning dataset, to verify how the process performs on this kind of task. The dataset used comes from the standard classification problem databases of the UCI Machine Learning Repository. The results are compared empirically in terms of their mean and standard deviation.

1. Introduction

The aim of the present work is to show how defining a good attribute preorder can influence the results of a classification task, both with and without missing values. To achieve this objective, a preorder searcher is implemented; it prepares the data for a Bayesian classifier algorithm that learns from the data and classifies the objects. A Bayesian classifier uses a Bayesian network as its knowledge base [1]. This network is a directed acyclic graph (DAG) in which the nodes represent the