Use Of Graph Theory For Data Mining In Public Health
Free (open access)
P Bath, C Craigs, R Maheswaran, J Raymond & P Willett
Data mining problems are common in public health, for example identifying disease clusters and multidimensional patterns within large databases, such as socioeconomic differentials in health. Although numerous data mining methods have been developed, those currently available are not designed to handle complex pattern-searching queries. The aim of the study reported here was to test graph-theoretical methods for data mining in public health databases, specifically to identify deprived areas that are surrounded by affluent areas and deprived areas that are surrounded by other deprived areas. Graph theory, in the form of the maximum common subgraph isomorphism (mcs) method, was used to search a database containing information on the 10,920 enumeration districts (EDs) of the Trent Region of England. Each ED was allocated to a deprivation quintile based on its Townsend Deprivation Score. The mcs program was then used to identify deprived EDs that are adjacent to deprived EDs and deprived EDs that are adjacent to affluent EDs. The program identified 1528 deprived EDs adjacent to at least two deprived EDs, 1181 adjacent to at least three, 802 adjacent to at least four, and 505 adjacent to at least five. It also identified 147 deprived EDs adjacent to at least two affluent EDs, 54 adjacent to at least three, 14 adjacent to at least four, and six adjacent to at least five. The retrieved EDs were then used for hypothesis testing using statistical methods. The study demonstrates the potential of graph-theoretical techniques for data mining in public health databases.
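The adjacency queries described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' mcs program: it replaces maximum common subgraph matching with a simple neighbour count over an adjacency list, which is enough to express "a deprived ED adjacent to at least k EDs of a given quintile". The function name `find_eds`, the toy ED labels, the edge list, and the convention that quintile 5 is most deprived and quintile 1 most affluent are all assumptions made for illustration.

```python
from collections import defaultdict

def find_eds(adjacency, quintile, centre_q, neighbour_q, min_neighbours):
    """Return EDs in quintile centre_q that are adjacent to at least
    min_neighbours EDs in quintile neighbour_q (simple neighbour count,
    not maximum common subgraph matching)."""
    hits = []
    for ed, neighbours in adjacency.items():
        if quintile[ed] != centre_q:
            continue
        matching = sum(1 for nb in neighbours if quintile[nb] == neighbour_q)
        if matching >= min_neighbours:
            hits.append(ed)
    return sorted(hits)

# Hypothetical toy data: nodes are EDs, edges join EDs that share a boundary.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("D", "E"), ("E", "F"), ("E", "G")]
adjacency = defaultdict(set)
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

# Assumed convention: 5 = most deprived quintile, 1 = most affluent quintile.
quintile = {"A": 5, "B": 5, "C": 5, "D": 1, "E": 5, "F": 1, "G": 1}

DEPRIVED, AFFLUENT = 5, 1
deprived_near_deprived = find_eds(adjacency, quintile, DEPRIVED, DEPRIVED, 2)
deprived_near_affluent = find_eds(adjacency, quintile, DEPRIVED, AFFLUENT, 2)
print(deprived_near_deprived)  # ['A', 'B', 'C']
print(deprived_near_affluent)  # ['E']
```

Raising `min_neighbours` from two to five reproduces the shape of the nested queries reported above, where each stricter threshold returns a subset of the previous result. A subgraph-matching approach such as the one the study used can express richer patterns than this neighbour count, for example constraints on how the neighbours relate to one another.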