WIT Press

Data Mining And Population Genetics Of Birth Defects: Preliminary Investigation

B. Little


Population-level inbreeding can be estimated from mating patterns in marriage records. Birth defect registries separately collect information on the frequency and types of birth defects. The U.S. national census provides population structure data down to the county or postal-code level in another separate database, and state health departments maintain vital statistics on marriage, reproduction, birth defects, and mortality in yet other databases. More than 6 million Texas marriages were analysed to estimate inbreeding at the county, regional, and state levels. Three types of birth defect were analysed: (1) one known to be associated with an autosomal recessive inheritance pattern (Ventricular Septal Defect, VSD); (2) one of unknown aetiology, speculated to be associated with a very rare autosomal recessive gene (Ebstein’s Anomaly); and (3) one known to be unrelated to parental consanguinity (Fetal Alcohol Syndrome, FAS). A significant relationship was found between estimated local consanguinity and VSD. Ebstein’s Anomaly showed a strong relationship to the estimated population level of inbreeding, suggesting a major recessive gene influence. For FAS, a birth defect known to be caused by environmental rather than genetic factors, no relationship to estimated inbreeding was found. In conclusion, once population genetic data held in very large databases (VLDBs) were merged into a data structure suited to data mining, data mining revealed patterns of birth defects. In this instance, it produced a viable hypothesis for the cause of an extremely rare birth defect of unknown aetiology. Hence, data mining of population genetic VLDBs can yield new information that may help guide genomic and clinical research.
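The abstract does not say how inbreeding was estimated from the marriage records or which statistical test linked it to defect rates. The sketch below is only an illustration of the general pipeline and rests on several assumptions. It uses Crow and Mange's isonymy method, in which the proportion I of marriages between partners who share a surname gives F ≈ I/4. Each marriage record is assumed to be a hypothetical (county, groom surname, bride's maiden surname) tuple, and county-level defect counts and births are taken as already extracted from a registry. A plain Pearson correlation stands in for whatever analysis the paper actually used. All function names are illustrative.

```python
from collections import Counter
from math import sqrt


def isonymy_inbreeding(marriages):
    """Estimate F per county from hypothetical (county, groom, bride_maiden) tuples.

    Assumes the isonymy estimator F = I / 4, where I is the proportion of
    marriages in which both partners share a surname.
    """
    totals, isonymous = Counter(), Counter()
    for county, groom, bride in marriages:
        totals[county] += 1
        if groom.strip().upper() == bride.strip().upper():
            isonymous[county] += 1
    return {c: isonymous[c] / totals[c] / 4 for c in totals}


def defect_rates(cases, births):
    """Defect cases per birth for every county with at least one recorded birth."""
    return {c: cases.get(c, 0) / n for c, n in births.items() if n > 0}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def merge_and_correlate(f_by_county, rate_by_county):
    """Join the two county-keyed tables (inner join) and correlate F with defect rate."""
    counties = sorted(set(f_by_county) & set(rate_by_county))
    xs = [f_by_county[c] for c in counties]
    ys = [rate_by_county[c] for c in counties]
    return counties, pearson(xs, ys)


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    marriages = [("A", "Smith", "Smith"), ("A", "Smith", "Jones"),
                 ("B", "Lee", "Kim"), ("B", "Lee", "Park"),
                 ("C", "Cruz", "Cruz"), ("C", "Diaz", "Diaz")]
    f = isonymy_inbreeding(marriages)
    rates = defect_rates({"A": 2, "B": 1, "C": 3},
                         {"A": 1000, "B": 1000, "C": 1000})
    print(merge_and_correlate(f, rates))
```

The county key does the work of the "merge" step here. In real data, the marriage, registry, and census tables would first need to be harmonised to a common geographic code before any join like this is meaningful.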