Data Mining Highly Multiple Time Series Of Astronomical Observations
This is a case study of data mining a large data set of astronomical interest. Our first concern is the outliers that apparently exist in the data set. We used a robust method to fit curves and identify outliers, and estimated the occurrence intensity of outliers. We find that this intensity varies considerably over time. In addition, we designed a test that led to rejection of the hypothesis that all observation series are independent of each other. Combining this result with our estimate of the occurrence intensity of outliers, we believe that there are common factors transiently acting on many series of observations. Finally, we analyse gaps in the time series and summarise simple but possibly interesting characteristics of the data from a methodological viewpoint of data mining.

Keywords: data mining, highly multiple time series, loess, MACHO project, nonparametric curve fitting, outliers.

1 Introduction

The MACHO Project is a collaboration between scientists at the Mt. Stromlo and Siding Spring Observatories, the Center for Particle Astrophysics at the Santa Barbara, San Diego, and Berkeley campuses of the University of California, and the Lawrence Livermore National Laboratory. Its primary aim is to test the hypothesis that a significant fraction of the dark matter in the halo of the Milky Way is made up of objects like brown dwarfs or planets; these objects have come to be known as MACHOs, for MAssive Compact Halo Objects. The signature of these objects is the occasional amplification of the light from extragalactic stars by the gravitational lens effect. The amplification can be large, but events are extremely rare: it is necessary to monitor photometrically several million stars for a period of