Grid-based Data Mining For Market Basket Analysis In The Retail Sector
Free (open access)
R. P. Singh, A. Turi & D. Malerba
Recent advances in computing, communications, and digital storage technologies, together with the development of high-throughput data-acquisition technologies, have made it possible to gather and store very large volumes of data. The warehouses of international retailers (such as Wal-Mart) are typically multi-terabyte databases that contain information about retail transactions by customers all over the world. The emergence of these large data sets creates a growing need to analyze them across geographical boundaries using distributed and parallel systems such as the Grid infrastructure, thereby unlocking the intelligence hidden within these geographically distributed databases. Market basket analysis is a method for discovering consumer purchasing patterns by extracting associations or co-occurrences from the stores' transaction databases. This is a typical association rule mining task, for which the Apriori algorithm is widely used to find the large item-sets. Because the traditional sequential Apriori algorithm cannot cope with data of this size, this paper outlines a strategy for a parallel and distributed association rule mining algorithm.

Keywords: grid, data mining, market basket analysis, retail sector.

1 Introduction

The data that businesses collect about their customers is one of their greatest assets. Buried within this vast amount of data is valuable information that could make a significant difference to the way in which a business organization runs its business, interacts with its current and prospective customers, and gains a competitive edge over its competitors. Market basket analysis applies
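To make the idea of mining large item-sets over geographically distributed transaction data more concrete, the following is a minimal sketch in Python. It assumes a count-distribution scheme: each site counts candidate item-sets over its own transactions, and only the counts are merged globally. This is an illustration under stated assumptions, not necessarily the strategy the paper outlines. The function names (`local_counts`, `generate_candidates`, `distributed_apriori`), the sample baskets, and the support threshold are all hypothetical, and the sites are simulated as in-process lists rather than real Grid nodes.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, candidates):
    """Count, at one site, how many local baskets contain each candidate."""
    counts = Counter()
    for basket in transactions:
        for cand in candidates:
            if cand <= basket:  # candidate is a subset of the basket
                counts[cand] += 1
    return counts


def generate_candidates(frequent, k):
    """Build k-item candidates whose (k-1)-item subsets are all frequent."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def distributed_apriori(sites, min_support):
    """Return {item-set: global count} for item-sets meeting min_support.

    `sites` is a list of transaction lists, one per (simulated) site.
    On a real Grid, local_counts would run where the data resides and
    only the resulting counts would be exchanged (assumption).
    """
    total = sum(len(site) for site in sites)
    threshold = min_support * total
    all_items = sorted({item for site in sites for basket in site for item in basket})
    candidates = [frozenset([item]) for item in all_items]
    result = {}
    k = 1
    while candidates:
        # Merge the per-site counts into global counts.
        global_counts = Counter()
        for site in sites:
            global_counts.update(local_counts(site, candidates))
        frequent = {c for c in candidates if global_counts[c] >= threshold}
        for itemset in frequent:
            result[itemset] = global_counts[itemset]
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


# Hypothetical transactions held at two stores.
sites = [
    [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}],
    [{"milk", "butter"}, {"bread", "milk"}, {"bread", "milk", "butter"}],
]
result = distributed_apriori(sites, min_support=0.5)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this sketch only candidate counts cross site boundaries, never the raw transactions, which is the main reason a count-distribution approach suits data that is too large to centralize.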