David Gibson, Jon M. Kleinberg and Prabhakar Raghavan
Summary
This paper argues against the common view that the web is a conglomerate of unstructured pages. The authors instead present a picture of the web as composed of communities: groups of authoritative pages strongly connected to one another via hub pages. This structure is uncovered using the HITS algorithm, described below, which discovers authoritative pages on the principle of 'broad topic information discovery'. Experimental evidence shows that root communities stabilize after about 50 iterations, and that the same pages tend to recur even when a different seed set is used for the same query. This observation strongly supports the communities argument. In addition, the authors note that with this method a more specific query tends to return pages on a broader topic: the specific topic is the child of a broader parent topic, and the parent's pages are returned. This behavior is correlated with the density of links and pages on a given topic: denser topics are less likely to be generalized. Finally, the movement of a topic can be observed over time, as pages relating to the topic either become heavily linked to during the height of interest in it, or filter out and are removed.
Methods
The HITS algorithm begins with a seed set of hub and authority pages about a given topic. These two types of pages have a mutually reinforcing relationship: a good hub links to many good authorities, and a good authority is linked to by many good hubs. This allows an iterative algorithm to repeatedly add to the weight a page receives in an overall ranking. A set of pages around the seed set is extracted, and the algorithm loops over all of these pages, updating their scores until convergence. The resulting weights correspond to the principal eigenvectors of matrices derived from the link structure. Note that page text and its relationship to the query are considered only in the seed step of the algorithm.
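The iteration described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `hits` and `normalize`, the dict-of-lists graph representation, the L2 normalization, and the fixed iteration count (defaulting to 50, the stabilization point mentioned in the summary) are all assumptions made for the example, and the tiny graph at the end is hypothetical.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a link graph.

    links maps each page to the pages it links to. This stands in for
    the set of pages extracted around the seed set; building that set
    from a query is not shown here.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum the authority scores of pages linked to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, ()))
            for p in pages
        }

        # Rescale so the scores stay bounded across iterations.
        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical graph: three hub-like pages pointing at two authorities.
example_links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
}
hub_scores, auth_scores = hits(example_links)
```

In matrix terms, with A the adjacency matrix of the extracted pages, the authority scores from this kind of iteration approach the principal eigenvector of A^T A and the hub scores that of A A^T, which is the eigenvector correspondence mentioned above. A production version would more likely test for convergence than run a fixed number of rounds.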
Assumptions
Keywords
Hypertextual communities, information exploration, WWW, collaborative annotation, hub, authority, HITS
Rating
8
Bibtex Entry
@inproceedings{gibson98,
author = "David Gibson and Jon M. Kleinberg and Prabhakar Raghavan",
title = "Inferring Web Communities from Link Topology",
booktitle = "{UK} Conference on Hypertext",
pages = "225--234",
year = "1998",
url = "citeseer.nj.nec.com/kleinberg98inferring.html"
}