WIT Press


Searching Relationships Between Enterprise Websites Using Graph Based Web Crawling

Price

Free (open access)

Paper DOI

10.2495/DATA090071

Volume

42

Pages

9

Page Range

61 - 69

Published

2009

Size

660 kb

Author(s)

R. C. F. De Souza, G. M. Caputo & N. F. F. Ebecken

Abstract

The objective of this paper is to find explicit web relationships using enterprise websites as seeds. We apply a web crawler to find these relationships in a hierarchy that starts from the given seed, using its external links to construct a tree weighted by Jaccard score. The proposed methodology searches outward from the root node, following links, for related enterprises such as potential partners, suppliers, and clients. We crawl the whole site to find external links using the Breadth First Search (BFS) algorithm and build a tree structure containing only the external links of interest. The algorithms were implemented with very simple computational components and may produce interesting results for analyzing the domain of the sites, their structure, and how they link to each other within their field of activity.
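The crawl and scoring steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which sets the Jaccard score compares, so the sketch assumes it is computed over the sets of external domains each site links to. The names `external_domains`, `jaccard`, and `PAGES` are hypothetical, and an in-memory link map stands in for real HTTP fetching.

```python
from collections import deque
from urllib.parse import urlparse


def external_domains(seed, pages):
    """Breadth-first crawl of the seed's own site.

    Internal links are queued for further crawling; external links are
    not followed, only their domains are collected.
    `pages` maps a URL to the list of links found on that page
    (a stand-in for fetching and parsing HTML).
    """
    home = urlparse(seed).netloc
    queue = deque([seed])
    visited = {seed}
    found = set()
    while queue:
        url = queue.popleft()
        for link in pages.get(url, []):
            domain = urlparse(link).netloc
            if domain != home:
                found.add(domain)      # external link: record its domain
            elif link not in visited:
                visited.add(link)      # internal link: crawl it later
                queue.append(link)
    return found


def jaccard(a, b):
    """Jaccard score |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy link map (assumed data, for illustration only).
PAGES = {
    "http://acme.example/": [
        "http://acme.example/partners",
        "http://supplier.example/",
    ],
    "http://acme.example/partners": [
        "http://client.example/",
        "http://acme.example/",
    ],
    "http://supplier.example/": [
        "http://client.example/",
        "http://bank.example/",
    ],
}

root = external_domains("http://acme.example/", PAGES)
child = external_domains("http://supplier.example/", PAGES)

# Assumed edge weight between the seed and one of its external links.
score = jaccard(root, child)
```

Under these assumptions, the edge from the seed to `supplier.example` would carry the weight `score`. Repeating the crawl for each external domain found yields the next level of the tree.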

Keywords

link analysis, BFS algorithm, web crawling.