A Web Mining Process For E-Knowledge Services
Free (open access)
M. Castellano, F. Fiorino, F. Arcieri, V. Summo & G. Bellone de Grecis
The purpose of this paper is to describe a Web Mining process that supports specialized e-Knowledge services. A new reference architecture is proposed, based on an orchestration of reusable building blocks with well-defined tasks and the ability to interoperate with one another. The system is designed to support a decision maker in a service-oriented way by adopting a clear separation of tasks: crawling, pre-processing, information extraction, information retrieval, text mining and presentation of results. It allows the analysis of Web information by extracting, selecting, processing and modelling huge amounts of data, in order to discover rules and patterns in a distributed and heterogeneous content environment of informative resources. Finally, the Reputation Management process is presented as a case study.

Keywords: Web Mining, text mining, Web crawling, information extraction, information retrieval, reputation management.

1 Introduction

The digital universe known as the World Wide Web is a vast place that includes literally billions of Web pages, and it is estimated to continue growing at an accelerating rate of 7.3 million pages per day (Cyveillance, 2003). With this amount of data available online, the WWW is today considered a popular and interactive medium for disseminating information. At the beginning, it was an instrument used primarily by universities and research communities; nowadays it is a tool that makes it easy to access and publish information [3, 8]. Moreover, the available information is extremely distributed and heterogeneous: