WIT Press


Mining Association Rules With Negative Terms Using Candidate Pruning

Price

Free (open access)

Paper DOI

10.2495/DATA040151

Volume

33

Pages

10

Published

2004

Size

247 kb

Author(s)

T. Shintani & D. Hayashi

Abstract

In this paper, we discuss association rules with negative terms, in which negative and affirmative conditions are intermingled, such as “80% of customers who buy A and B but do not buy X, also buy C and D”. Association rules with negative terms can yield rules with higher confidence, and therefore more valuable information. To find them, itemsets containing negative conditions must be checked. We propose two candidate pruning methods suited to handling these itemsets: upper bound pruning and database partition pruning. Upper bound pruning detects itemsets that cannot generate rules satisfying the user-specified minimum thresholds. Database partition pruning detects itemsets that do not appear in the database. Through performance evaluations, we show that the proposed methods not only reduce the number of candidate itemsets but also avoid finding frequent itemsets that are useless for rule derivation. Moreover, we show an example of rules obtained by applying the proposed methods to a real dataset: hospitalization data from cardiovascular medicine at the University of Tokyo Hospital.

1 Introduction

Mining association rules within a large database is a representative problem in data mining. Several effective algorithms have been proposed [1, 2], but they take only affirmative information into account. To apply association rule mining to more complicated applications, we must consider rules that contain negative conditions. Negative conditions in association rules were considered in [3, 4]. [3] introduced a negative association rule X ⇒ ¬Y, such as “60% of customers who buy A and B do not buy D”. Furthermore, the other forms of negative

Keywords

association rule, negative term, candidate pruning, medical data.
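
The rule form described in the abstract can be illustrated with a minimal sketch. It only shows how the confidence of a rule with a negative term is computed by brute force. It is not the paper's mining algorithm and implements neither of the two pruning methods. The transaction data and the names `matches`, `support_count`, and `confidence` are hypothetical and chosen for illustration.

```python
from typing import FrozenSet, List

# Hypothetical toy transactions (not taken from the paper).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"A", "B", "C", "D"}),
    frozenset({"A", "B", "C", "D"}),
    frozenset({"A", "B", "C", "D"}),
    frozenset({"A", "B", "C", "D"}),
    frozenset({"A", "B"}),
    frozenset({"A", "B", "X"}),
    frozenset({"A", "B", "X"}),
    frozenset({"A", "B", "X", "C", "D"}),
]


def matches(transaction: FrozenSet[str],
            positive: FrozenSet[str],
            negative: FrozenSet[str]) -> bool:
    """True if the transaction contains every positive item and no negative item."""
    return positive <= transaction and not (negative & transaction)


def support_count(transactions: List[FrozenSet[str]],
                  positive: FrozenSet[str],
                  negative: FrozenSet[str] = frozenset()) -> int:
    """Number of transactions satisfying the positive and negative conditions."""
    return sum(1 for t in transactions if matches(t, positive, negative))


def confidence(transactions: List[FrozenSet[str]],
               ante_pos: FrozenSet[str],
               ante_neg: FrozenSet[str],
               cons_pos: FrozenSet[str]) -> float:
    """Confidence of the rule (ante_pos and not ante_neg) => cons_pos."""
    ante = support_count(transactions, ante_pos, ante_neg)
    if ante == 0:
        return 0.0  # the antecedent never occurs, so the rule is undefined; report 0
    both = support_count(transactions, ante_pos | cons_pos, ante_neg)
    return both / ante


# Affirmative rule: {A, B} => {C, D}
plain = confidence(TRANSACTIONS, frozenset({"A", "B"}), frozenset(),
                   frozenset({"C", "D"}))
# Rule with a negative term: {A, B, not X} => {C, D}
with_negation = confidence(TRANSACTIONS, frozenset({"A", "B"}), frozenset({"X"}),
                           frozenset({"C", "D"}))
print(plain, with_negation)
```

On this toy data, excluding transactions that contain X raises the confidence of the rule from 0.625 to 0.8, which is the kind of gain the abstract refers to. A brute-force scan like this checks every condition against every transaction, which is why candidate pruning matters once negative conditions enlarge the space of itemsets.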