WIT Press


Generating Clusters’ Explanations In Just One Data Scan

Price

Free (open access)

Volume

28

Pages

Published

2002

Size

636 kb

Paper DOI

10.2495/DATA020111

Copyright

WIT Press

Author(s)

H A do Prado, P M Engel & A Haëndchen Filho

Abstract

Knowledge discovery from unlabeled data comprises two main tasks: identification of “natural groups” and analysis of these groups in order to interpret their meaning. These tasks are accomplished by unsupervised and supervised learning, respectively, and correspond to the phases of the discovery process described by Langley. Efforts in the Knowledge Discovery from Databases (KDD) research field have addressed these two processes along two main dimensions: (1) scaling up the learning algorithms to very large databases and (2) improving the efficiency of the KDD process. In this paper we argue that the advances achieved in scaling up supervised and unsupervised learning algorithms allow us to combine these two processes in just one stream, providing extensional and intensional descriptions of unlabeled data. We propose a framework, called Orpheo, which enables the integration of any two unsupervised and supervised algorithms to compose the stream. A particular advantage of our approach is that, if the two algorithms are incremental, the system will build the model in just one data scan. This characteristic satisfies important desiderata for clustering in data mining posed by Bradley et al. The framework is instantiated using as building blocks two incremental neural
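
The one-scan idea described in the abstract can be illustrated with a minimal sketch. It is not the Orpheo implementation: the abstract names neural building blocks, whereas this example swaps in two simple stand-ins chosen only for brevity. `LeaderClusterer` (a leader-style incremental clusterer) and `RangeDescriber` (a per-cluster attribute-range summary playing the role of the supervised stage) are hypothetical names, as are `one_scan` and the `radius` parameter. Each record is read once, labeled by the clusterer, and immediately passed with its label to the describer.

```python
from typing import Dict, Iterable, List, Tuple


class LeaderClusterer:
    """Assumed stand-in for the unsupervised stage (not the paper's algorithm).

    Each record joins the nearest centroid if it lies within `radius`;
    otherwise it opens a new cluster.
    """

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def learn_one(self, x: List[float]) -> int:
        best, best_dist = -1, float("inf")
        for i, c in enumerate(self.centroids):
            dist = sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            if dist < best_dist:
                best, best_dist = i, dist
        if best == -1 or best_dist > self.radius:
            self.centroids.append(list(x))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Update the running mean of the winning centroid.
        self.counts[best] += 1
        n = self.counts[best]
        self.centroids[best] = [
            c + (a - c) / n for a, c in zip(x, self.centroids[best])
        ]
        return best


class RangeDescriber:
    """Assumed stand-in for the supervised stage (not the paper's algorithm).

    Keeps, per cluster label, the (min, max) seen for every attribute.
    """

    def __init__(self) -> None:
        self.ranges: Dict[int, List[Tuple[float, float]]] = {}

    def learn_one(self, x: List[float], label: int) -> None:
        old = self.ranges.get(label)
        if old is None:
            self.ranges[label] = [(v, v) for v in x]
        else:
            self.ranges[label] = [
                (min(lo, v), max(hi, v)) for (lo, hi), v in zip(old, x)
            ]


def one_scan(records: Iterable[List[float]], radius: float):
    """Read each record exactly once, feeding both stages in sequence."""
    clusterer = LeaderClusterer(radius)
    describer = RangeDescriber()
    labels: List[int] = []
    for x in records:
        label = clusterer.learn_one(x)   # unsupervised step
        describer.learn_one(x, label)    # supervised step on the fresh label
        labels.append(label)
    return labels, describer.ranges


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
    labels, ranges = one_scan(data, radius=1.0)
    print(labels)   # extensional description: cluster membership per record
    print(ranges)   # intensional description: attribute ranges per cluster
```

In this sketch the list of labels stands in for the extensional description and the per-cluster ranges for the intensional one. Because both stages update their state one record at a time, the loop in `one_scan` never revisits the data.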

Keywords