Cluster Discovery In Spatial Data Mining: A Variable Resolution Approach
A J Brimicombe
Spatial data mining seeks to discover meaningful patterns in data where a key dimension of the data is geographical location. This spatial dimension becomes important when data refer to specific locations and/or exhibit significant spatial dependence, which must be taken into consideration if meaningful patterns are to emerge. For point data there are two main groups of approaches. One stems from traditional statistical techniques such as k-means clustering, in which every point is assigned to a spatial grouping, resulting in a spatial segmentation. The segmentation has k sub-regions and is usually space-filling and non-overlapping (i.e. a tessellation), so that every point falls within a spatial segment. The difficulty with this approach lies in defining the k initial centroid locations at the outset of any data mining. The other broad approach searches for ‘hotspots’, which can be loosely defined as a localised excess of some incidence rate. In this approach not all points are necessarily assigned to clusters. It is the mainstay of those approaches that seek to identify any significantly elevated risk above what might be expected from an at-risk background population. Definition of the population at risk is clearly critical, and in some data mining applications it is not possible at the outset. This paper presents a novel variable resolution approach to cluster discovery which acts in the first instance to define spatial concentrations in the absence of a population at risk. The cluster centroids are then used as initial centroids for techniques such as k-means clustering to arrive at a segmentation on the basis of point attributes. The variable resolution technique can thus be viewed as a bridge between the two broad approaches to knowledge discovery in mining point data sets. The technique is equally applicable to the mining of business, crime, health and environmental data. A business-oriented case study is presented here.
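The two-stage idea described above (first locate spatial concentrations, then use their centroids to seed k-means) can be illustrated with a minimal sketch. This is not the variable resolution technique of the paper, whose details are not given in the abstract: a fixed-size grid count stands in for the cluster discovery step, and the k-means stage segments on coordinates only rather than on point attributes. The function names, the cell size, the count threshold and the sample points are all hypothetical and chosen purely for illustration.

```python
from math import dist


def dense_cell_centroids(points, cell_size, min_count):
    """Stand-in for cluster discovery (assumption: a fixed grid, not the
    paper's variable resolution method). Returns the mean location of the
    points in every grid cell that holds at least min_count points."""
    cells = {}
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append((x, y))
    seeds = []
    for key in sorted(cells):  # sorted for a reproducible seed order
        members = cells[key]
        if len(members) >= min_count:
            n = len(members)
            seeds.append((sum(p[0] for p in members) / n,
                          sum(p[1] for p in members) / n))
    return seeds


def seeded_kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means with initial centroids supplied by the caller, so k
    does not have to be guessed: it is the number of seeds."""
    centroids = list(seeds)
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                n = len(members)
                centroids[j] = (sum(p[0] for p in members) / n,
                                sum(p[1] for p in members) / n)
    return centroids, labels


# Hypothetical sample: two concentrations plus one isolated point.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.5, 8.5), (8.7, 8.4), (8.4, 8.8),
          (4.5, 4.0)]

seeds = dense_cell_centroids(points, cell_size=2.0, min_count=3)
centroids, labels = seeded_kmeans(points, seeds)
```

In this sketch the isolated point is ignored when seeds are found, as in a hotspot search where not all points belong to a cluster, but it is still assigned to a segment by the k-means stage, which is what produces a complete segmentation.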