WIT Press


Fast Outlier Detection Using Rough Sets Theory

Price

Free (open access)

Paper DOI

10.2495/DATA080031

Volume

40

Pages

10

Page Range

25 - 34

Published

2008

Size

692 kb

Author(s)

F. Shaari, A. A. Bakar & A. R. Hamdan

Abstract

In many knowledge discovery applications, finding outliers is more interesting than finding inliers in a dataset. Outliers are perceived as rare cases in a dataset, described as abnormal data in the information table. Outlier detection is applied in many important applications, such as fraud detection systems, to uncover suspicious objects that may hold important knowledge hidden in the system. A new outlier detection technique based on Rough Sets Theory (RST) is proposed. RSetOF is a new measure of the outlier factor based on RST. By employing this factor, a new formulation for detecting outliers is established, and the outlyingness of objects in a dataset is identified using this measure. To evaluate detection, two measurements are presented: the top n ratio and the coverage ratio. Finding the top n outliers among all objects allows outliers to be searched from the top-ranked records, ranked by the least outlier factor value. The ability to detect outliers within the top n records indicates how fast the detection is. The efficiency of the technique is then tested by obtaining the coverage ratio value. The maximum percentage of coverage obtained shows the maximum number of detected outliers belonging to rare cases. A comparison is carried out to examine the performance of RSetAlg against a selected outlier detection method, the Frequent Pattern method referred to as FindFPOF. Ten benchmark datasets for assessing outlier detection techniques are used for this purpose. The experimental results show that the proposed technique is competitive and detects outliers faster than the other technique. The fast and efficient detection of outliers demonstrates the potential of this new outlier detection technique based on RST.
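The two evaluation measurements named in the abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions below are assumptions: the top n ratio is taken as n divided by the total number of records, and the coverage ratio as the fraction of all rare-case records that appear among the top n. Records are ranked by ascending outlier factor, following the abstract's statement that ranking is based on the least factor value. The function name `top_n_evaluation` and its parameters are hypothetical, and the scores are plain numbers standing in for RSetOF values, which this sketch does not compute.

```python
from typing import Sequence, Tuple


def top_n_evaluation(
    scores: Sequence[float], is_rare: Sequence[bool], n: int
) -> Tuple[float, float]:
    """Return (top_n_ratio, coverage_ratio) under the assumed definitions.

    scores  : one outlier-factor value per record (stand-in for RSetOF)
    is_rare : True where the record belongs to a known rare class
    n       : how many top-ranked records to inspect
    """
    if len(scores) != len(is_rare):
        raise ValueError("scores and is_rare must have the same length")
    if not scores:
        raise ValueError("at least one record is required")
    if not 0 <= n <= len(scores):
        raise ValueError("n must be between 0 and the number of records")

    # Assumption: the lowest factor value marks the most outlying record.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    top = ranked[:n]

    detected = sum(1 for i in top if is_rare[i])
    total_rare = sum(1 for flag in is_rare if flag)

    top_ratio = n / len(scores)
    coverage = detected / total_rare if total_rare else 0.0
    return top_ratio, coverage


if __name__ == "__main__":
    # Toy data only; these are not results from the paper.
    toy_scores = [0.9, 0.1, 0.8, 0.2, 0.7]
    toy_rare = [False, True, False, False, True]
    for n in range(1, len(toy_scores) + 1):
        print(n, top_n_evaluation(toy_scores, toy_rare, n))
```

Under these assumed definitions, a method that reaches full coverage at a smaller top n ratio would be the faster detector in the sense the abstract describes.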

Keywords

outlier detection, rare, deviate, exception, deviation, anomaly, infrequent, small, imbalance.