A Kernel Density Smoothing Method For Determining An Optimal Number Of Clusters In Continuous Data
Free (open access)
165 - 178
J. Bugrien, K. Mwitondi & F. Shuweihdi
While data clustering algorithms are becoming increasingly popular across scientific, industrial and social data mining applications, model complexity remains a major challenge. Most clustering algorithms do not incorporate a mechanism for finding an optimal scale parameter that corresponds to an appropriate number of clusters. We propose a kernel-density smoothing-based approach to data clustering. Its main ideas derive from two unsupervised clustering approaches: kernel density estimation (KDE) and scale-space clustering (SSC). The method determines the optimal number of clusters by first finding dense regions in the data and then separating them using data-dependent parameter estimates. The inherent number of arbitrarily shaped clusters is detected without a priori information, and the optimal number is then selected by comparing different levels of smoothing. We demonstrate the applicability of the proposed method under both nested and non-nested hierarchical clustering methodologies. Simulated and real data results are presented to validate the performance of the method, with repeated runs showing high accuracy and reliability.
Keywords: BASINS -1, data clustering, data mining, kernel density estimation, local optimization, scale-space clustering, supervised learning, unsupervised learning.
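The general idea the abstract describes, counting dense regions at several levels of smoothing, can be illustrated with a minimal one-dimensional sketch. This is not the authors' algorithm: it assumes a Gaussian kernel, a fixed evaluation grid, and a crude persistence rule (the mode count that occurs at the most bandwidths), and the names `kde`, `count_modes` and `most_persistent_count` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter


def kde(data, h, grid):
    """Gaussian kernel density estimate of `data` at each grid point (bandwidth h)."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)
            for x in grid]


def make_grid(data, n=600, pad_frac=0.25):
    """Evenly spaced grid covering the data range plus a margin (assumed choice)."""
    lo, hi = min(data), max(data)
    pad = pad_frac * (hi - lo)
    lo, hi = lo - pad, hi + pad
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]


def count_modes(data, h, grid=None):
    """Count interior local maxima of the KDE; each mode is one dense region."""
    grid = grid if grid is not None else make_grid(data)
    dens = kde(data, h, grid)
    # '>' on the left and '>=' on the right counts a flat-topped peak exactly once
    # and ignores flat stretches where the density underflows to zero.
    return sum(1 for i in range(1, len(dens) - 1)
               if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1])


def most_persistent_count(data, bandwidths):
    """Return the mode count seen at the largest number of bandwidths.

    Assumption: frequency over a bandwidth list stands in for the lifetime of a
    clustering across smoothing levels; it is a simplification for illustration.
    """
    grid = make_grid(data)
    counts = Counter(count_modes(data, h, grid) for h in bandwidths)
    return counts.most_common(1)[0][0]


# Toy data: three tight, well-separated groups centred at 0, 10 and 20.
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
data = [c + o for c in (0.0, 10.0, 20.0) for o in offsets]

bandwidths = [0.5, 0.75, 1.0, 1.5, 2.0, 20.0, 40.0]
for h in bandwidths:
    print(h, count_modes(data, h))
print("most persistent number of clusters:", most_persistent_count(data, bandwidths))
```

Small bandwidths resolve the individual groups, while very large bandwidths merge everything into a single mode. A count that stays stable over a wide range of smoothing levels is a natural candidate for the number of clusters.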