Naive Rule Induction For Text Classification Based On Key-phrases

N. N. Karanikolas; C. Skourlas

doi:10.2495/DATA050181

WIT Press

Price

Free (open access)

Transaction

WIT Transactions on Information and Communication Technologies

Published

2005

Size

361 kb

Abstract

In this paper we focus on the induction of naive rules for classifying text documents. We briefly describe an algorithm that creates key-phrases from a given set of documents; these key-phrases are organized and used as features for the automatic classification of new documents. The algorithm specifies an Authority List containing key-phrases that occur frequently within the documents of only one or a few classes in the training set. This property permits the creation of naive rules that measure the similarity of new documents to the existing classes.

Keywords: text data mining, text classification, instance based learning, rule induction.

1 Introduction

Key-phrases, or search terms, can be defined as sequences of adjacent words within a text window (e.g. five successive words of the text, or a sentence) that form a meaningful, descriptive phrase related to the content of the text document. Such terms can be used as features for classifying (text) documents. Since not every key-phrase is appropriate for discriminating between documents, we have to examine and apply methods for selecting the appropriate ones. Hence, a prerequisite for such a classification method is the use and maintenance of a list of key-phrases, the so-called “Authority List” (Karanikolas and Skourlas [4]). An interesting problem is the reduction of the search space needed for the extraction of candidate key-phrases. In classification learning, a learning scheme takes a set of classified examples from which it is expected to learn a way of classifying unseen
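The idea of an Authority List and naive similarity rules can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given in this excerpt. The sketch assumes that candidate key-phrases are runs of one to three adjacent words, that "frequently" means appearing in at least `min_docs` training documents of a class, and that "one or a few classes" means at most `max_classes` classes. The function names, thresholds, and toy training data are all hypothetical.

```python
from collections import Counter, defaultdict


def candidate_phrases(text, max_len=3):
    """Yield runs of 1..max_len adjacent lowercase words (assumed phrase model)."""
    words = text.lower().split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def build_authority_list(training, min_docs=2, max_classes=1, max_len=3):
    """Keep phrases that are frequent in a class and seen in few classes.

    `training` is a list of (text, class_label) pairs. The thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    # phrase -> class -> number of training documents containing the phrase
    doc_freq = defaultdict(Counter)
    for text, label in training:
        for phrase in set(candidate_phrases(text, max_len)):
            doc_freq[phrase][label] += 1

    authority = {}
    for phrase, per_class in doc_freq.items():
        frequent = {c: n for c, n in per_class.items() if n >= min_docs}
        if frequent and len(per_class) <= max_classes:
            authority[phrase] = frequent
    return authority


def classify(text, authority, max_len=3):
    """Naive rule: score each class by how many of its phrases the text contains."""
    scores = Counter()
    for phrase in set(candidate_phrases(text, max_len)):
        for label in authority.get(phrase, {}):
            scores[label] += 1
    return scores.most_common()


# Hypothetical toy training set, for illustration only.
training = [
    ("stock market prices fall", "finance"),
    ("stock market rally continues", "finance"),
    ("football match result", "sport"),
    ("football match postponed", "sport"),
]
authority = build_authority_list(training)
ranking = classify("the stock market today", authority)
print(ranking)
```

Counting matching phrases is only one possible similarity measure. A weighted score, for example one based on the stored document frequencies, would fit the same structure.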
