A New Algorithm To Measure Relevance Among Web Pages
M. S. Sadi, M. M. H. Rahman & S. Horiguchi
This paper proposes a new algorithm to measure relevance among Web pages (RWP) using a hybrid method of hyperlink analysis and content analysis. We take an approach to Web searching in which the input to the search process is not a set of query terms but the URL of a page, and the output is a set of related Web pages. A related Web page is one that addresses the same topic as the original page. The proposed algorithm first uses only the connectivity information in the Web (i.e., the links between pages) and then the content of the pages. To evaluate its performance, the algorithm is compared with existing algorithms. Experimental results show that RWP outperforms existing algorithms in finding relevant Web pages. RWP improves search efficiency and broadens the application area of Web-related research.

Keywords: Web mining, relevant pages, hyperlink analysis, content analysis.

1 Introduction

The World Wide Web is a rich source of information and continues to expand in size and complexity. Retrieving the required Web pages efficiently and effectively is an increasingly difficult challenge [1–3]. There are many ways to find relevant pages. For example, as indicated in , Netscape uses Web page content analysis, usage pattern information, and linkage analysis to find relevant pages. Among the approaches to finding relevant pages, hyperlink analysis has its own advantages. First, the hyperlink is one of the most obvious features of the Web and can be easily extracted by parsing the Web page code. More importantly, in most cases hyperlinks encode a considerable amount of latent human judgment [5, 6].
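The two-stage idea described above (connectivity first, then content) can be illustrated with a minimal sketch. This is not the RWP algorithm itself, whose details are not given in this section. It assumes co-citation (pages sharing a parent with the source page) as the link-based stage and bag-of-words cosine similarity as the content-based stage. The function names, the `links` and `text` dictionaries, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def cocitation_candidates(links, source):
    """Stage 1 (assumed): pages that share a parent with `source`.

    `links` maps each page to the list of pages it links to.
    Returns a Counter of candidate -> number of shared parents.
    """
    parents = [p for p, outs in links.items() if source in outs]
    scores = Counter()
    for parent in parents:
        for sibling in set(links[parent]):
            if sibling != source:
                scores[sibling] += 1
    return scores


def cosine(a, b):
    """Cosine similarity between two texts using raw term counts."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = sqrt(sum(v * v for v in ta.values())) * sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def related_pages(links, text, source, top_k=3):
    """Stage 2 (assumed): re-rank link-based candidates by content similarity.

    Ties in content similarity are broken by the co-citation count.
    """
    candidates = cocitation_candidates(links, source)
    src_text = text.get(source, "")
    ranked = sorted(
        candidates,
        key=lambda u: (cosine(src_text, text.get(u, "")), candidates[u]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy link graph and page texts, invented for this example.
    links = {"hub1": ["a", "b", "c"], "hub2": ["a", "b"], "hub3": ["c", "d"]}
    text = {
        "a": "web mining hyperlink analysis",
        "b": "hyperlink analysis web pages",
        "c": "cooking recipes pasta",
    }
    print(related_pages(links, text, "a"))
```

In this toy graph, the input is the URL-like key "a" rather than a set of query terms, and the output is a ranked list of candidate pages. Restricting the content comparison to link-derived candidates keeps the more expensive text comparison limited to a small set of pages.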