Preprocessing Method And Similarity Measures In Clustering-based Text Mining: A Preliminary Study
Price
Free (open access)
Volume
29
Pages
8
Published
2003
Size
310 kb
Paper DOI
10.2495/DATA030071
Copyright
WIT Press
Author(s)
S. Iiritano, M. Ruffolo & P. Rullo
Author affiliations
Exeura s.r.l.; Dipartimento di Elettronica, Informatica e Sistemistica; Istituto di CAlcolo e Reti ad alte prestazioni del Consiglio Nazionale delle Ricerche; Dipartimento di Matematica, Università della Calabria, 87036 Rende (CS), Italy
Abstract
Knowledge Discovery in Text (KDT) has emerged as a challenging application because of the large number of textual documents available from heterogeneous sources. One approach to knowledge discovery in text is based on clustering techniques, in which the quality of the results depends strongly on the features extracted from the documents and on the similarity coefficients defined on those features. In this work we present a framework for textual document preprocessing that extracts relevant features (i.e., lemmas and words). Moreover, we define two similarity coefficients …
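The abstract ties clustering quality to two choices: which features are extracted (words or lemmas) and which similarity coefficient compares them. The sketch below illustrates that interaction. It is not the authors' framework, and the paper's two coefficients are not reproduced because the abstract is truncated before they are defined. As a stand-in it uses standard cosine similarity over term-frequency vectors. The stop-word list, the `TOY_LEMMAS` lookup table, and the function names `extract_features` and `cosine_similarity` are hypothetical illustrations. A real system would use a proper stop-word resource and a morphological lemmatizer.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources: a tiny stop-word list and a lemma lookup table
# standing in for a real stop-word resource and a morphological lemmatizer.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}
TOY_LEMMAS = {"documents": "document", "clusters": "cluster"}


def extract_features(text: str, use_lemmas: bool = False) -> Counter:
    """Return term frequencies of word features (or toy lemma features) in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())          # lowercase, split on letter runs
    terms = [t for t in tokens if t not in STOP_WORDS]    # drop stop words
    if use_lemmas:
        terms = [TOY_LEMMAS.get(t, t) for t in terms]     # map inflected forms to a lemma
    return Counter(terms)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    d1 = "Clustering of text documents"
    d2 = "Text clustering and document mining"
    for lemmas in (False, True):
        f1 = extract_features(d1, use_lemmas=lemmas)
        f2 = extract_features(d2, use_lemmas=lemmas)
        print(lemmas, round(cosine_similarity(f1, f2), 3))
    # prints:
    # False 0.577
    # True 0.866
```

With plain word features, "documents" and "document" are treated as different terms, so the two texts share only two features. Mapping both to the lemma "document" adds a third shared feature and raises the similarity. This is the kind of sensitivity to feature extraction that the abstract points to.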