A Rules Reduction Algorithm Based On Significance Measure
Free (open access)
63 - 72
R. Abadía, M. Almiñana, L. F. Escudero, A. Pérez-Martín, A. Rabasa & L. Santamaría
Many rule systems generated from decision trees (such as CART, ID3 or C4.5) or from direct frequency-counting methods usually contain non-significant or even contradictory rules. Most existing papers on the subject achieve substantial reductions of the generated rule sets by searching for and removing redundancies and conflicts and by simplifying similar rules. In this paper we propose an algorithm (RBS: Reduction Based on Significance) that assigns a significance value to each rule in the system. Experts may use this value to identify which rules should be preferred and to understand the exact degree of correlation between different rule attributes. The significance is calculated from the support and confidence parameters. For each rule, if its support is above a minimum level and its confidence falls within a critical interval, the algorithm calculates its significance ratio. The rule space is thus divided according to these critical boundaries, which are calculated by an incremental method, and the significance function is defined in each of the resulting intervals. Like other rule reduction methods, our approach can be applied to rule sets generated from decision trees or frequency-counting algorithms, entirely independently of the generating method and once the rule set has been created. Consequently, the RBS algorithm does not change the original accuracy of the rules. The proposed method has been run on three different data sets: two belong to the UCI (University of California, Irvine) standard repository, and the third is a real irrigation data set provided by the users. The validity of our reduction approach on the latter data set is supervised and checked by experts. The computational experience reported in this paper yields rule sets that are smaller, ordered and more easily understandable than the original ones. Keywords: classification rules, reduction, significance measures, support, confidence, regions of significance.
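The filtering step described in the abstract (a minimum support level, a critical confidence interval, then a significance value used for ordering) can be illustrated with a minimal Python sketch. It is not the RBS algorithm itself: the abstract does not give the significance function or the incremental method that finds the critical boundaries, so `placeholder_significance`, the fixed thresholds `min_support` and `conf_interval`, the `Rule` structure and the `label` key are all assumptions introduced here for illustration. Support and confidence follow their usual definitions for classification rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape: IF all (attribute, value) pairs hold THEN class."""
    antecedent: tuple  # e.g. (("soil", "dry"),)
    consequent: str    # predicted class label


def support_confidence(rule, records, label_key="label"):
    """Return (support, confidence) of a rule over a list of dict records."""
    covered = [r for r in records
               if all(r.get(attr) == val for attr, val in rule.antecedent)]
    correct = [r for r in covered if r[label_key] == rule.consequent]
    support = len(correct) / len(records) if records else 0.0
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence


def placeholder_significance(support, confidence):
    # ASSUMPTION: stand-in only. The paper defines its own significance
    # function per region of the support/confidence space.
    return support * confidence


def rank_rules(rules, records, min_support=0.1, conf_interval=(0.6, 1.0)):
    """Keep rules passing both thresholds, ordered by descending significance.

    ASSUMPTION: thresholds are fixed inputs here; in the paper the critical
    boundaries are calculated by an incremental method.
    """
    low, high = conf_interval
    scored = []
    for rule in rules:
        support, confidence = support_confidence(rule, records)
        if support >= min_support and low <= confidence <= high:
            scored.append((rule, placeholder_significance(support, confidence)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the scoring only reads an existing rule set and a data set, it can be run after any rule generator (decision tree or frequency counting) without modifying the rules themselves, which matches the independence property the abstract claims.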