WIT Press

Generating Clusters’ Explanations In Just One Data Scan


Free (open access)



H A do Prado, P M Engel & A Haëndchen Filho


Knowledge discovery from unlabeled data comprises two main tasks: identifying “natural groups” and analyzing these groups in order to interpret their meaning. These tasks are accomplished by unsupervised and supervised learning, respectively, and correspond to the phases of the discovery process described by Langley. Research in the Knowledge Discovery from Databases (KDD) field has addressed these two processes along two main dimensions: (1) scaling up the learning algorithms to very large databases and (2) improving the efficiency of the KDD process. In this paper we argue that the advances achieved in scaling up supervised and unsupervised learning algorithms allow us to combine these two processes in a single stream, providing extensional and intensional descriptions of unlabeled data. We propose a framework, called Orpheo, which enables the integration of any unsupervised algorithm with any supervised algorithm to compose the stream. A particular advantage of our approach is that, if the two algorithms are incremental, the system builds the model in just one data scan. This characteristic satisfies important desiderata for clustering in data mining posed by Bradley et al. The framework is instantiated using as building blocks two incremental neural