Information retrieval in the World-Wide Web: Making client-based searching feasible

P. M. E. De Bra and R. D. J. Post


Summary

This paper presents the details of the "Fish Search" algorithm and the Lagoon cache system. The Fish search algorithm follows the metaphor of a school of fish let loose in a lagoon, looking for food and reproducing. The fish's 'food' is the equivalent of a page relevant to the user's query. The fish search within a set of cached documents that reside locally on a server: the Lagoon cache. The paper explores the details of the Lagoon cache, and its most interesting idea is how often cached documents should be refreshed. Basically, documents are refreshed according to how frequently users request them, on the assumption that the majority of documents do not need to be refreshed regularly. Unfortunately, I disagree with this infrequent refresh strategy, though I understand the network limitations of 1994. The majority of documents I search today are news pages such as Slashdot and CNN, and databases such as Citeseer, all of which change frequently; under this model, stale cached copies would not suffice. The paper does not include any experimental measurements of the cache's efficiency compared with non-caching strategies, and it leaves out the details of how the Fish search is actually carried out within the Lagoon cache.
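To make the usage-driven refresh policy concrete, here is a minimal Python sketch. It is not Lagoon's actual rule, which the review only summarizes: the class `UsageCache`, the `max_age` and `min_hits` thresholds, and the injected `fetch` callable are all hypothetical. A cached entry is refetched only when it is both stale and has been requested often enough since its last fetch; rarely used documents keep being served from the cache.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: str
    fetched_at: float  # time of the last fetch, in seconds
    hits: int = 0      # requests served since the last fetch


class UsageCache:
    """Hypothetical usage-driven cache: only popular, stale entries are refetched."""

    def __init__(self, fetch, max_age=86400.0, min_hits=5):
        self.fetch = fetch        # callable url -> body; stands in for the network
        self.max_age = max_age    # assumed staleness threshold, in seconds
        self.min_hits = min_hits  # assumed popularity threshold
        self.entries = {}

    def get(self, url, now):
        entry = self.entries.get(url)
        if entry is None:
            # First request: fetch and store.
            entry = self.entries[url] = CacheEntry(self.fetch(url), now)
        elif (now - entry.fetched_at > self.max_age
              and entry.hits >= self.min_hits):
            # Stale and popular: refresh from the origin server.
            entry.body, entry.fetched_at, entry.hits = self.fetch(url), now, 0
        # Otherwise serve the cached copy, however old it is.
        entry.hits += 1
        return entry.body
```

With `max_age=10` and `min_hits=2`, a document requested once and then again 100 seconds later is still served from the cache; only a further request triggers a refetch. A fast-changing page such as a news front page may therefore be served out of date for `max_age` seconds or longer, which is the weakness the review objects to.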

Methods

The Fish search was implemented as a plugin to the Mosaic browser for the X Window System. Fish search lets the user enter a keyword search or a regular-expression search, or add an external filter. Agents, or 'fish', are sent out over the locally cached corpus to retrieve relevant documents using a heuristic, depth-first strategy. 'Highlighted' links, i.e. pages visited before, are not added to the queue that holds the documents still to be explored, though a link that is already in the queue may have its priority changed when it is encountered again.
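The search procedure described above can be sketched in Python. This is a simplified illustration, not the authors' code: the corpus layout (a dict from URL to page text and outgoing links), the `depth` parameter, the `max_pages` budget, and the exact promotion rule are assumptions. A regular expression stands in for the query, and a matching page counts as 'food'. Links found on a relevant page go to the front of the queue with full depth (the fish reproduces), while links found on an irrelevant page go to the back with one less unit of depth, so a fish that finds nothing eventually dies. Visited ('highlighted') links are never queued again, and a queued link found again from a relevant page is promoted to the front.

```python
import re
from collections import deque


def fish_search(corpus, start, pattern, depth=3, max_pages=50):
    """Simplified fish search over a locally cached corpus (hypothetical sketch).

    corpus    -- dict: url -> (page_text, [linked_urls]); an assumed layout
    pattern   -- regular expression standing in for the user's query
    depth     -- a fish's lifetime; each irrelevant page costs one unit
    max_pages -- budget on the number of pages examined
    """
    query = re.compile(pattern, re.IGNORECASE)
    queue = deque([(start, depth)])  # (url, remaining depth); front = highest priority
    queued = {start: depth}          # urls currently waiting in the queue
    visited = set()                  # "highlighted" links: never queued again
    relevant = []

    while queue and len(visited) < max_pages:
        url, remaining = queue.popleft()
        del queued[url]
        visited.add(url)
        if url not in corpus:        # link points outside the cache; skip it
            continue
        text, links = corpus[url]
        found_food = bool(query.search(text))
        if found_food:
            relevant.append(url)
        # Food restores full depth; otherwise the offspring are weaker.
        child_depth = depth if found_food else remaining - 1
        if child_depth <= 0:
            continue                 # this fish dies: its links are not followed

        front, back = [], []
        for link in dict.fromkeys(links):  # de-duplicate, keep order
            if link in visited:
                continue
            if link in queued:
                if not found_food:
                    continue         # already waiting; keep its current priority
                # Found again from a relevant page: promote it to the front.
                queue.remove((link, queued[link]))
            (front if found_food else back).append((link, child_depth))
            queued[link] = child_depth
        queue.extendleft(reversed(front))  # depth-first: relevant children first
        queue.extend(back)                 # other links wait at the back
    return relevant
```

For example, with a `max_pages` budget of three, a link first queued behind an irrelevant sibling is still examined if a relevant page also points to it, because the promotion moves it ahead of that sibling.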

Keywords

Fish search, information retrieval, Lagoon cache, highlighting links, search depth, recency

Rating

6

Bibtex Entry

@article{debra94,
  author  = "P. M. E. De Bra and R. D. J. Post",
  title   = "Information retrieval in the World-Wide Web: Making client-based searching feasible",
  journal = "Computer Networks and ISDN Systems",
  volume  = "27",
  number  = "2",
  pages   = "183--192",
  year    = "1994",
  url     = "citeseer.nj.nec.com/99604.html"
}

 
