Textual Document Pre-processing And Feature Extraction In OLEX

R. Curia; M. Ettorre; L. Gallucci; S. Iiritano; P. Rullo

doi:10.2495/DATA050171

WIT Press

Transaction

WIT Transactions on Information and Communication Technologies

Published

2005

Abstract

Knowledge Discovery in Text (KDT) has emerged as a challenging application because of the large amount of textual documents available from heterogeneous sources. OLEX is a KDT system for text classification developed at Exeura. A critical step of a KDT process is the pre-processing phase, which consists of a number of complex tasks aimed at making documents “machine readable”. This paper describes the OLEX Pre-processing Module (OPM), advanced software based on a general framework that supports the extraction of relevant linguistic, syntactic and structural features from texts. A key aspect of OPM is its support for parallel text annotation.

1 Introduction

Managing the huge amount of textual documents available on the web and on intranets has become an important problem in Knowledge Management, so techniques and tools for text categorization are needed [1]. Textual document collections can be seen as sources of unstructured data from which knowledge can be mined through Knowledge Discovery in Text (KDT) [2], an interactive and iterative process based on four phases:

• Document Acquisition
• Document Pre-Processing
• Text Mining
• Result Interpretation
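To make the four phases concrete, here is a minimal sketch of how they might chain together. It is not OLEX or OPM code: the functions `acquire`, `preprocess`, `mine`, `interpret` and `kdt_pipeline`, the stopword list, and the keyword-based "classifier" are all hypothetical placeholders. The pre-processing step only lowercases, tokenizes, removes stopwords and counts terms, which is far simpler than the linguistic, syntactic and structural feature extraction the paper describes. Documents are pre-processed concurrently to hint at the idea of parallel annotation. The interactive, iterative character of KDT (going back to earlier phases after interpreting results) is not modelled here.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "of", "and", "a", "an", "on", "in", "for", "to", "is", "are"}


def acquire(sources: List[str]) -> List[str]:
    """Document Acquisition: keep only non-empty raw documents."""
    return [s for s in sources if s.strip()]


def preprocess(doc: str) -> Dict[str, int]:
    """Document Pre-Processing (toy version): lowercase, tokenize,
    drop stopwords and count term frequencies."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return dict(Counter(t for t in tokens if t not in STOPWORDS))


def mine(features: List[Dict[str, int]], keyword: str) -> List[int]:
    """Text Mining (toy version): flag documents whose features contain a keyword."""
    return [i for i, f in enumerate(features) if keyword in f]


def interpret(hits: List[int], docs: List[str]) -> List[str]:
    """Result Interpretation: map flagged indices back to the original texts."""
    return [docs[i] for i in hits]


def kdt_pipeline(sources: List[str], keyword: str, workers: int = 4) -> List[str]:
    docs = acquire(sources)
    # Pre-process documents concurrently; Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        features = list(pool.map(preprocess, docs))
    hits = mine(features, keyword)
    return interpret(hits, docs)


corpus = ["Text categorization of web documents", "", "Parallel annotation of texts"]
print(kdt_pipeline(corpus, "annotation"))  # ['Parallel annotation of texts']
```

Threads keep the sketch simple. For CPU-heavy annotation in CPython, a `ProcessPoolExecutor` would usually be the better choice, because the global interpreter lock limits how much threads can speed up pure-Python work.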
