CC4.5: Cost-sensitive Decision Tree Pruning
J. Cai, J. Durkin & Q. Cai
There are many methods to prune decision trees, but cost-sensitive pruning has received much less investigation, even though it offers additional flexibility and improved performance. In this paper, we introduce CC4.5, a cost-sensitive decision tree pruning algorithm based on the C4.5 algorithm. CC4.5 constructs the original decision tree with the same method as C4.5, but its pruning methods differ from those of C4.5. CC4.5 includes three cost-sensitive pruning methods to deal with misclassification cost in the decision tree. Unlike many other pruning algorithms, CC4.5 uses intelligent inexact classification to consider both error and cost when pruning. Moreover, experiments show that CC4.5 produces decision trees with improved cost, while their comprehensibility and accuracy remain satisfactory.

Keywords: decision tree pruning, cost-sensitive, C4.5, CC4.5, intelligent inexact classification.

1 Introduction

Decision tree technology has proven to be a valuable way of capturing human decision making within a computer. However, it often suffers the disadvantage of developing very large trees that are incomprehensible to experts. To solve this problem, researchers in the field have taken much interest in tree pruning [1, 2, 3]. Tree pruning methods transform a large tree into a small tree that is easily understood. But one main problem for many traditional decision tree pruning methods is that when we prune a decision tree, we always assume that all mis-