WIT Press


Preprocessing Method And Similarity Measures In Clustering-based Text Mining: A Preliminary Study

Price

Free (open access)

Paper DOI

10.2495/DATA030071

Volume

29

Pages

8

Published

2003

Size

310 kb

Author(s)

S. Iiritano, M. Ruffolo & P. Rullo

Abstract

Affiliations: Exeura s.r.l.; Dipartimento di Elettronica, Informatica e Sistemistica; Istituto di Calcolo e Reti ad Alte Prestazioni del Consiglio Nazionale delle Ricerche; Dipartimento di Matematica, Università della Calabria, 87036 Rende (CS), Italy

Knowledge Discovery in Text (KDT) has emerged as a challenging application due to the large amount of textual documents available from heterogeneous sources. An approach to knowledge discovery in text is based on clustering techniques in which the quality of results strongly depends on features extracted from documents and on similarity coefficients defined on them. In this work we present a framework for textual document preprocessing useful for the extraction of relevant features (i.e. lemma and word). Moreover, we define two similarity coeffi

Keywords