Authoritative sources in a hyperlinked environment

Jon M. Kleinberg

Back to index

Summary

This paper discusses an algorithm for constructing focused subgraphs for a particular topic, utilizing the notion of hub and authority pages. An iterative algorithm is applied to a graph of seed pages to find the 'densest' collection of pages about a topic, by successively increasing the weights of the hub and authority pages. This iterative algorithm takes advantage of the mutually reinforcing relationship between important hub and authority pages. It is interesting that the most densely linked subgraph appears quickly, within 20 iterations. The top authorities and hub pages are then filtered out, which results in a discrete principal eigenvector of the weights of these pages. The nonprincipal eigenvector is also analyzed in the results, and the information it provides gives an insight into further breakdown of the pages within a topic. (see the abortion search for a poignant example). The experimental section shows how accurately this method works to retrieve the most important, relevant pages. There is also a section comparing this analysis to other work such as relationships between scientific citations and clustering of similar documents. Finally an analysis of the generality of a query is touched upon, reiterating previous work by Dr Menczer. The conclusions say that the authors are able to infer global notions of structure within the unstructured environment of the internet, and apply these findings in a concise algorithm. This work has been widely cited, and definitely gives the field a new approach for search problems in a global environment.

Methods

Kleinberg presents the details of three algorithms: subgraph, iterate, and filter. Subgraph constructs a graph of pages from an initial root set of pages that relate to a query. To this root set, pages that the root set links to are added, then all intrinsic pages (pages with same domain name) are deleted. This deletion of intrinsic pages is a heuristic method which simplifies the result set and maintains the property that important pages are all included. From this subgraph, an iterative algorithm is applied that increases the hub and authority scalar values for each page by counting the number of links to and from a page, respectively. An equilibrium is reached for the hub and authority scores generally within 20 iterations. Then a filtering algorithm is applied that ranks the pages through the values obtained by the iterate algorithm, which leaves a result set of the most important pages.

Keywords

graph algorithms, hypertext structure, link analysis, www, focused subgraph, hub, authority, principal eigenvector, information retrieval

Rating

8

Bibtex Entry

@article{ kleinberg99,

author = "Jon M. Kleinberg",

title = "Authoritative sources in a hyperlinked environment",

journal = "Journal of the ACM",

volume = "46",

number = "5",

pages = "604--632",

year = "1999",

url = "citeseer.nj.nec.com/kleinberg97authoritative.html"

}

 

Back to index