Preprocessing Method and Similarity Measures in Clustering-Based Text Mining: A Preliminary Study

S. Iiritano; M. Ruffolo; P. Rullo

doi:10.2495/DATA030071

Transaction: WIT Transactions on Information and Communication Technologies

Published: 2003

Publisher: WIT Press

Affiliations: (1) Exeura s.r.l.; (2) Dipartimento di Elettronica, Informatica e Sistemistica; (3) Istituto di Calcolo e Reti ad alte prestazioni del Consiglio Nazionale delle Ricerche; (4) Dipartimento di Matematica, Università della Calabria, 87036 Rende (CS), Italy

Abstract

Knowledge Discovery in Text (KDT) has emerged as a challenging application area because of the large amount of textual documents available from heterogeneous sources. One approach to KDT is based on clustering techniques. The quality of their results depends strongly on the features extracted from the documents and on the similarity coefficients defined on those features. In this work we present a framework for textual document preprocessing that supports the extraction of relevant features (i.e., lemmas and words). Moreover, we define two similarity coefficients [...]
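
The abstract states that clustering quality depends on two things: the extracted features (words or lemmas) and the similarity coefficient defined over them. The following is a minimal sketch of that dependency under stated assumptions. It does not reproduce the paper's two coefficients, whose definitions are cut off here. The stand-ins are the standard cosine and Jaccard coefficients. The stop-word list and the lemma table (`HYPOTHETICAL_LEMMAS`) are hypothetical placeholders; a real pipeline would use a morphological analyser or lemmatizer.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "is", "are", "to", "for"}

# Hypothetical hand-written lemma table; a real pipeline would use a
# morphological analyser or lemmatizer instead.
HYPOTHETICAL_LEMMAS = {"documents": "document", "texts": "text"}


def extract_features(text, lemmas=None):
    """Return term frequencies of the words (or lemmas) in `text`.

    Tokens are runs of Unicode letters after lowercasing. If `lemmas` is
    given, each token is replaced by its listed lemma, if any.
    Stop words are dropped after the lemma mapping.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if lemmas is not None:
        tokens = [lemmas.get(t, t) for t in tokens]
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine coefficient of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Jaccard coefficient on the feature sets (frequencies ignored)."""
    sa, sb = set(a), set(b)
    if not (sa | sb):
        return 0.0
    return len(sa & sb) / len(sa | sb)


docs = [
    "Clustering groups similar documents.",
    "Document clustering groups similar texts.",
    "Lemmas reduce inflected words to a base form.",
]
words = [extract_features(d) for d in docs]
lemmatized = [extract_features(d, HYPOTHETICAL_LEMMAS) for d in docs]

# Word features: "documents" and "document" are different features.
print(round(cosine_similarity(words[0], words[1]), 3), jaccard_similarity(words[0], words[1]))  # 0.671 0.5
# Lemma features: both map to "document", so the overlap grows.
print(round(cosine_similarity(lemmatized[0], lemmatized[1]), 3), jaccard_similarity(lemmatized[0], lemmatized[1]))  # 0.894 0.8
# Unrelated document: no shared features.
print(cosine_similarity(words[0], words[2]))  # 0.0
```

With surface words, the two related sentences share 3 of 6 distinct features (Jaccard 0.5). After mapping "documents" and "document" to one lemma, they share 4 of 5 (Jaccard 0.8). This illustrates how the choice of features shifts similarity scores. It does not report the paper's own experimental findings.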
