Personalization Of Information Delivery Based On Web Mining And DNA Classification
Free (open access)
31 - 40
M. Santos & J. Amado
It is hard to find any medium with a growth rate as high as that of the World Wide Web. At the same time, it is hard to find one that stores within itself such an amount of metadata, useful for an in-depth study. It is wrong to look at the WWW simply as a kind of information store. Although all its contents are information in one way or another, there are quite a few ways of letting users interact with that information: manipulating it (via Ajax-based applications), altering it (through wikis), adding to it (via blogs and web sites themselves), or transforming and amplifying its meanings. These are only a few examples of what can be done today. Web site access logs are the main source of information on how the WWW is used. Rather than asking users whether they viewed the pages (as a TV station might do), any web site has the means to keep a permanent record of its visitors. By analyzing these logs, we are able to get a better understanding of the roles played by the web site. In this paper we borrow a few concepts from biology in order to establish a kind of ‘DNA’ for each document on the web site of the Portuguese Tribunal de Contas (Court of Auditors). We do this by looking at the WWW as an information source and by processing what we find. At the same time, we try to extend the same approach to the users who looked for those documents, by processing the web access logs. The results of such an approach might enable future uses of automatic document classification, as well as an effective personalization of information delivery.
Keywords: web mining, DNA, access logs.