WIT Press


An Incremental Multi-Centroid, Multi-Run Sampling Scheme For K-medoids-based Algorithms

Price

Free (open access)

Volume

28

Pages

Published

2002

Size

695 kb

Paper DOI

10.2495/DATA020531

Copyright

WIT Press

Author(s)

S-C Chu, J F Roddick & J-S Pan

Abstract

Data clustering has become an important task for discovering significant patterns and characteristics in large spatial databases. The Multi-Centroid, Multi-Run Sampling Scheme (MCMRS) was shown in our previous work to be effective in improving k-medoids-based clustering algorithms. In this paper, a more advanced sampling scheme, termed the Incremental Multi-Centroid, Multi-Run Sampling Scheme (IMCMRS), is proposed for k-medoids-based clustering algorithms. Experimental results demonstrate that, compared with CLARA and CLARANS, the proposed scheme not only reduces computation time by more than 80% but also reduces the average distance per object. IMCMRS is also superior to MCMRS.

1 Introduction

Clustering is a useful practice of classification imposed over a finite set of objects. The goal of clustering is to group objects into classes such that objects within a group share similar characteristics, while dissimilar objects fall into separate groups. Various clustering algorithms have been proposed and designed to fit different data formats and application constraints, including k-means [16], k-medoids [11], BIRCH [18], CURE [8], CHAMELEON [10], DBSCAN [4],

Keywords