WIT Press

Textual Data Mining By Parsing


Free (open access)

A Bellacicco


The paper addresses the problem of identifying the specific answer to any question put to a search engine during web exploration. The practical problem is to avoid the usual stream of thousands of pages returned by the search engine, without relying on an a priori categorization of the theme to which the question is directed. The system described in the paper, called IRAS, is a follow-up to a previous theoretical framework, called TOM, presented at MIS 2002, Bellacicco [1]. The innovation of IRAS is the use of parsing algorithms to identify the semantic organization of the statements, so that the answer is specific to the real content of the question. The problem faced by IRAS is therefore to overcome not only the stream of thousands of pages from the web but also the stream of unspecific answers, which result from the ambiguity of using the terms of the query without specifying their role in the statement. The parsing of the query, together with the parsing, selection and clustering of the statements, is the primary tool of IRAS.

1 Introduction

Data mining and textual mining differ mainly in the use of tools that involve numerical operations in addition to logical operations, and in the use of tools linked to operators that take into account the specific role of the data. In the numerical case the operators are functors which map the data either into a geometric hyperspace or onto a hypersurface. The choice of the geometry