F Di Fiore
Many recent techniques try to evaluate the interestingness of patterns automatically extracted by data mining algorithms. In this paper we propose an interestingness measure that can be analyzed visually and that treats unexpectedness from a data-driven perspective, addressing both objective and subjective issues. The proposed approach shifts the focus from patterns to the data underlying them, considered as the primary source of information: data can be visualized more effectively than any artificial, however compact, representation of them. Starting from the well-known star schema model, we propose a new operational definition of patterns, based on the visual evaluation of the distributions of the dimensions involved. We then suggest a scheme for evaluating the interestingness of patterns based on this representation: we show how to choose the best ordering of the categories of a dimension, which depends strictly on the measure involved. To cope with the subjective nature of interestingness, we propose a supervised learning approach based on the same visually driven perspective, able to customize interestingness with respect to users’ expectations. The last section of the paper supports these claims by outlining results obtained from our system in use.

1 Introduction

Whatever business managers have to administer, they must nowadays deal with a huge amount of data, typically collected over past years of activity: the more detailed the information, the more abundant the data. The basic objective of data mining techniques is to extract useful patterns (i.e. actionable knowledge) from these large databases.