WIT Press


An Efficient Bayesian Network Approach For Discovering Interesting Patterns

Price

Free (open access)

Paper DOI

10.2495/DATA060111

Volume

37

Pages

11

Published

2006

Size

437 kb

Author(s)

R. Malhas & Z. Al Aghbari

Abstract

The main problem faced by all association rule/pattern mining algorithms is that they produce a large number of rules, which creates a secondary mining problem: mining the interesting association rules/patterns. The problem is compounded by the fact that discovered rules that merely restate ‘common knowledge’ are not interesting, yet they are usually strong rules with high support and confidence levels, the classical measures. In this paper, we present an efficient algorithm for discovering interesting (unexpected) patterns based on background knowledge, represented by a Bayesian network. A pattern/rule is unexpected if it is ‘surprising’ to the user. The algorithm profiles a pattern as interesting (unexpected) if the absolute difference between its support estimated from the dataset and its support estimated from the Bayesian network exceeds a user-specified threshold (ε). Itemsets with the most strongly diverging supports are considered the most interesting. The efficiency of the Java implementation of the algorithm is verified experimentally.

1 Introduction

Since the inception of the classical Apriori algorithm [1] for mining association rules, the development of interestingness measures has been an active area of research, aimed at mining interesting patterns out of a sheer volume of obvious and irrelevant rules. The problem is compounded because discovered rules that merely restate ‘common knowledge’ are not interesting, yet they are usually strong rules with high support and confidence levels, the classical measures in [1]. In this paper, we present an efficient algorithm that discovers interesting/unexpected patterns based on background knowledge, represented by
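The interestingness test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the two-node network `A -> B`, its probability tables, the toy rows, the candidate itemsets, and all function names are hypothetical, and the network's support is computed by brute-force enumeration, which only works for very small networks.

```python
from itertools import product

# Hypothetical background knowledge: a two-node network A -> B over binary attributes.
P_A = {1: 0.5, 0: 0.5}
P_B_GIVEN_A = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # indexed as [a][b]

# Hypothetical dataset: each row assigns a value to A and to B.
ROWS = (
    [{"A": 1, "B": 1}] * 2 + [{"A": 1, "B": 0}] * 3
    + [{"A": 0, "B": 1}] * 2 + [{"A": 0, "B": 0}] * 3
)

# Hypothetical candidate itemsets, written as partial assignments.
CANDIDATES = [{"A": 1}, {"B": 1}, {"A": 1, "B": 1}]


def bn_support(itemset):
    """Probability the network assigns to the partial assignment `itemset`."""
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        world = {"A": a, "B": b}
        if all(world[var] == val for var, val in itemset.items()):
            total += P_A[a] * P_B_GIVEN_A[a][b]  # chain rule: P(a) * P(b | a)
    return total


def data_support(itemset, rows):
    """Fraction of rows that agree with every item in `itemset`."""
    hits = sum(all(row[var] == val for var, val in itemset.items()) for row in rows)
    return hits / len(rows)


def interestingness(itemset, rows):
    """Absolute difference between the dataset support and the network support."""
    return abs(data_support(itemset, rows) - bn_support(itemset))


def interesting_itemsets(candidates, rows, epsilon):
    """Keep itemsets whose divergence exceeds epsilon, most divergent first."""
    scored = [(interestingness(c, rows), c) for c in candidates]
    kept = [(score, c) for score, c in scored if score > epsilon]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, itemset in interesting_itemsets(CANDIDATES, ROWS, epsilon=0.1):
        print(itemset, round(score, 3))
```

In this toy setting the itemset `{A=1}` is dropped because the data and the network agree on its support, while `{A=1, B=1}` ranks first because the network expects it far more often than the data shows.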

Keywords

interesting patterns, association rules, frequent itemsets, Bayesian network, background knowledge.