Mining Geospatial Data In A Transductive Setting
Free (open access)
A. Appice, N. Barile, M. Ceci, D. Malerba & R. P. Singh
Many organizations collect large amounts of spatially referenced data. Spatial data mining aims to discover interesting, implicit knowledge in such data. Classification, in particular, has been investigated extensively in the classical inductive setting, where only labeled examples are used to build a classifier. This discards a large amount of information potentially conveyed by the unlabeled instances to be classified. In this work, spatial classification is based on transduction, an inference mechanism “from particular to particular” that uses both labeled and unlabeled data to build a classifier whose main goal is to classify (only) the unlabeled data as accurately as possible. The proposed method, named TRANSC, employs a principled probabilistic classification method from multi-relational data mining to address the challenges posed by spatial data. The predictive accuracy of TRANSC has been evaluated on two real-world spatial datasets.

1 Introduction

The expanding market for spatial databases and Geographic Information System (GIS) technologies is driven by pressure from the public sector, environmental agencies and industry to provide innovative solutions for applications that involve spatial data, that is, collections of (spatial) objects organized in thematic layers (e.g., roads, rivers). A thematic layer is characterized by a geometrical representation as well as several non-spatial attributes, called thematic attributes. A GIS provides the functionality to store, retrieve and manage both the geometrical representation and the thematic attributes stored in a spatial database. Beyond these functionalities, the range of GIS applications can be extended by adding spatial data mining facilities that extract implicit knowledge from georeferenced data.