Time Series Data Analysis And Pre-process On Large Databases
G Guo, H Wang & D Bell
In this paper we introduce a novel classification algorithm, MCC (Minimal Cover Classification), which works well for both numerical and categorical data. Given a new data tuple, MCC returns, for each class, a value measuring the likelihood that the tuple belongs to that class. We then apply MCC to real stock market data to predict the ‘upward’ or ‘downward’ trend of k-day stock returns. To improve prediction accuracy, we use the discrete Fourier transform and its inverse to filter noise whilst preserving the global trend of the time series in the time domain. Experimental results show that MCC is comparable to C4.5. When MCC is used to predict the ‘upward’ or ‘downward’ trend of k-day stock returns, the average hit rate on pre-processed data is 20.55% higher than on the original data. In other words, applying MCC to noise-filtered time series markedly improves prediction accuracy.

1 Introduction

Time series account for a large share of the data stored in databases. A common task with a time series database is to look for occurrences of a particular pattern within a longer sequence. Such queries have obvious applications in many fields: identifying patterns associated with growth in stock prices, identifying non-obvious relationships between two time series of weather data, or detecting anomalies in an online robot monitoring system.
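The pre-processing step in the abstract (a DFT followed by an inverse DFT to remove noise while keeping the global trend) can be sketched as a low-pass filter. The sketch below is illustrative and does not reproduce the authors' implementation. The abstract does not say how many Fourier coefficients are kept, how a k-day return is defined, or how the hit rate is computed. The cutoff `keep`, the return definition p[t+k]/p[t] − 1 and the helper names `dft_lowpass`, `kday_trend_labels` and `hit_rate` are therefore hypothetical. MCC itself is not reproduced here.

```python
import numpy as np


def dft_lowpass(series, keep):
    """Hypothetical helper: keep the DC term and the `keep` lowest nonzero
    frequencies of the DFT, zero the rest, and invert to the time domain."""
    x = np.asarray(series, dtype=float)
    coeffs = np.fft.rfft(x)                 # one-sided DFT of a real-valued series
    coeffs[keep + 1:] = 0                   # high-frequency terms treated as noise
    return np.fft.irfft(coeffs, n=len(x))   # inverse DFT, same length as the input


def kday_trend_labels(prices, k):
    """Hypothetical labelling (k >= 1): 1 ('upward') if the assumed k-day
    return p[t+k] / p[t] - 1 is positive, else 0 ('downward')."""
    p = np.asarray(prices, dtype=float)
    returns = p[k:] / p[:-k] - 1.0
    return (returns > 0).astype(int)


def hit_rate(predicted, actual):
    """Fraction of trend predictions that match the actual labels."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(256)
    clean = 10 + np.sin(2 * np.pi * 3 * t / 256)     # synthetic slow "global movement"
    noisy = clean + rng.normal(0, 0.5, size=t.size)  # synthetic noise, not real stock data
    smooth = dft_lowpass(noisy, keep=5)              # keep=5 is an arbitrary choice
    print("RMSE noisy :", np.sqrt(np.mean((noisy - clean) ** 2)))
    print("RMSE smooth:", np.sqrt(np.mean((smooth - clean) ** 2)))
    print("first upward/downward labels:", kday_trend_labels(smooth, k=5)[:10])
```

Keeping only the lowest frequencies retains slow movements and discards fast fluctuations, which is one plausible reading of "filter noise whilst preserving the trend". Filtering the whole series at once, as this sketch does, uses values that lie after each prediction date. A prediction experiment would normally filter only the data available at prediction time.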