WIT Press


A Web Mining Process For E-Knowledge Services

Price

Free (open access)

Volume

37

Pages

12

Published

2006

Size

882 kb

Paper DOI

10.2495/DATA060261

Copyright

WIT Press

Author(s)

M. Castellano, F. Fiorino, F. Arcieri, V. Summo & G. Bellone de Grecis

Abstract

The purpose of this paper is to describe a Web Mining process that supports specialized e-Knowledge services. A new reference architecture is proposed, based on an orchestration of reusable building blocks with well-defined tasks and the ability to interoperate. The system is designed to support a decision maker in a service-oriented way by adopting a clear separation of tasks: crawling, pre-processing, information extraction, information retrieval, text mining and presentation of results. It allows the analysis of Web information by extracting, selecting, processing and modelling huge amounts of data in order to discover rules and patterns in a distributed and heterogeneous environment of information resources. Finally, the Reputation Management process is presented as a case study.

1 Introduction

The digital universe known as the World Wide Web is a vast place that includes literally billions of Web pages, and it is estimated to keep growing at an accelerating rate of 7.3 million pages per day (Cyveillance, 2003). With this amount of data available online, the WWW is today considered a popular and interactive medium for disseminating information. In the beginning, it was an instrument used primarily by universities and research communities; nowadays it is a tool that makes it easy both to access and to publish information [3, 8]. Moreover, the available information is extremely distributed and heterogeneous:

Keywords

Web Mining, text mining, Web crawling, information extraction, information retrieval, reputation management.
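
The separation of tasks named in the abstract (crawling, pre-processing, information extraction, information retrieval, text mining, presentation of results) can be illustrated with a minimal sketch. This is not the authors' architecture: the in-memory FAKE_WEB, the two-word opinion LEXICON, and every function name below are hypothetical, chosen only to show how independent building blocks might be orchestrated for a reputation-style query.

```python
import re
from collections import Counter

# Hypothetical in-memory "web": url -> (html, outgoing links).
# It stands in for real HTTP fetching so the sketch runs offline.
FAKE_WEB = {
    "a": ("<p>Acme service is great</p>", ["b"]),
    "b": ("<p>Acme support was slow</p>", ["a", "c"]),
    "c": ("<p>Weather is great today</p>", []),
}

# Hypothetical opinion lexicon used by the extraction step.
LEXICON = {"great": "positive", "slow": "negative"}


def crawl(seed):
    """Crawling: breadth-first traversal from the seed; returns {url: html}."""
    seen, queue, pages = {seed}, [seed], {}
    while queue:
        url = queue.pop(0)
        html, links = FAKE_WEB[url]
        pages[url] = html
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def preprocess(pages):
    """Pre-processing: strip tags, lowercase, split into word tokens."""
    return {
        url: re.findall(r"[a-z]+", re.sub(r"<[^>]+>", " ", html).lower())
        for url, html in pages.items()
    }


def extract(docs):
    """Information extraction: attach the opinion labels found in each document."""
    return {
        url: {"tokens": toks, "opinions": [LEXICON[t] for t in toks if t in LEXICON]}
        for url, toks in docs.items()
    }


def retrieve(records, query):
    """Information retrieval: keep only documents that mention the query term."""
    return {url: rec for url, rec in records.items() if query in rec["tokens"]}


def mine(records):
    """Text mining: aggregate opinion labels over the retrieved documents."""
    counts = Counter()
    for rec in records.values():
        counts.update(rec["opinions"])
    return counts


def present(query, counts):
    """Presentation: render a one-line summary for the decision maker."""
    return f"{query}: {counts['positive']} positive, {counts['negative']} negative"


def run_pipeline(seed, query):
    """Orchestration: chain the building blocks in a fixed order."""
    records = extract(preprocess(crawl(seed)))
    return present(query, mine(retrieve(records, query)))


if __name__ == "__main__":
    print(run_pipeline("a", "acme"))
```

Because each step only consumes the previous step's output, any single block (for example the crawler) could in principle be swapped for a real implementation without touching the others, which is the kind of reuse the abstract describes.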