Literature Review for Web-Based Agents

This page serves two purposes: to organize my research path and to allow others to build on the work deposited here. Each general topic lists relevant papers, their authors, and my personal rating of each paper's utility. Ratings run from 1 to 10, with 10 the highest and 5 average. Details about each paper are available through its link. Papers are listed in no particular order.

 

InfoSpiders

Paper Title Author Features
Rating
Artificial life applied to adaptive information agents Menczer, Belew, Willuhn Evolve a population of agents using local decisions to exploit shared resources
7
Is agent-based online search feasible? Menczer Distributed populations of adaptive agents are scalable and use Q-learning and cloning to evolve
8
Evaluating topic-driven web crawlers Menczer Assessment using classification, a retrieval system, and mean topic similarity
7
Adaptive retrieval agents: internalizing context and scaling up to the web Menczer and Belew Learning techniques: evolutionary adaptation by local selection and environment internalization
8
ARACHNID: Adaptive retrieval agents choosing heuristic neighborhoods for information discovery Menczer Distributed, adaptive agents that make reproductive decisions locally
8
Adaptive information agents in a distributed textual environment Menczer and Belew Adaptive agents exploit link topology using context and connectivity between pages
8
Latency-dependent fitness in evolutionary multithreaded Web agents Degeratu, Pant, Menczer Variable cost in energy based on the network download time increases InfoSpiders' speed
7
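
The papers above share an energy-based local selection scheme: each agent gains energy from relevant pages, pays a cost per download, reproduces when its energy passes a threshold, and dies when its energy runs out. The sketch below is a minimal illustration of one such selection step, not the authors' implementation. The names `Agent` and `local_selection_step`, the fixed `cost`, and the `threshold` value are all assumptions made for this example; in the latency-dependent variant the cost would vary with download time rather than stay constant.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical agent state: only the energy level is modeled here.
    energy: float


def local_selection_step(agents, benefits, cost=0.1, threshold=1.0):
    """Apply one round of local selection.

    agents:   list of Agent
    benefits: energy gained by each agent from the page it just visited
              (assumed to be a relevance estimate supplied by the caller)
    cost:     energy paid per page download (assumed constant here)
    """
    next_population = []
    for agent, benefit in zip(agents, benefits):
        energy = agent.energy + benefit - cost
        if energy <= 0:
            # The agent has run out of energy and is removed.
            continue
        if energy > threshold:
            # The agent reproduces; parent and offspring split the energy.
            half = energy / 2
            next_population.append(Agent(half))
            next_population.append(Agent(half))
        else:
            next_population.append(Agent(energy))
    return next_population
```

Each decision uses only the agent's own energy, so no global ranking of the population is needed. That locality is what the papers above point to as the source of scalability.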
     

PageRank algorithm and Google

The anatomy of a large-scale hypertextual web search engine Brin and Page Architecture of Google, description of three steps: crawling, indexing, and sorting
9
The Connectivity Server: fast access to linkage information on the web Bharat et al Produce the neighborhood (back and forward links) around a page efficiently
6
The Term Vector Database: fast access to indexing terms for web pages Stata, Bharat and Maghoul Method of mapping pageIDs to terms, efficient form of indexing
8
Efficient crawling through URL ordering Cho et al PageRank algorithm to select links based on number of forward and back links
6
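
As a rough illustration of how a PageRank-style score can order a crawl frontier, here is a minimal power-iteration sketch. It is a simplified textbook formulation and not the code from any paper listed above. The function names `pagerank` and `order_frontier`, the damping factor of 0.85, and the even redistribution of rank from pages with no outgoing links are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its damped rank among its forward links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no forward links spreads its
                # rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def order_frontier(frontier, rank):
    """Sort candidate URLs so the highest-ranked page is crawled first."""
    return sorted(frontier, key=lambda url: rank.get(url, 0.0), reverse=True)
```

A crawler would recompute the scores periodically on the portion of the graph seen so far and pop the frontier in that order.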
     

Hub and Authority-based algorithms

Automatic resource compilation by analyzing hyperlink structure and associated text Chakrabarti Rank pages using link context and iterative hub and authority scores
7
Authoritative sources in a hyperlinked environment Kleinberg Foundational paper using iterative hub and authority scores on a subgraph to rank pages
8
An efficient algorithm to rank web resources Zhang and Dong Predict best next link by using a Markov model to map the user's browsing history
5
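
The iterative hub and authority scores mentioned above can be sketched in a few lines. This is a minimal illustration of the mutual-reinforcement update, assuming the query-specific subgraph has already been built. The function name `hits`, the fixed iteration count, and the dictionary-based graph representation are choices made for this example.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores on a link subgraph.

    links maps each page to the list of pages it links to.
    Returns (hub, authority) dictionaries, each normalized to unit length.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth
```

Good hubs point to good authorities and good authorities are pointed to by good hubs, so the two score vectors are updated in alternation until they settle.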
     

MetaCrawlers and Meta Search Engines

The MetaCrawler architecture for resource aggregation on the web Selberg and Etzioni Using 'Harness', query search engines and parse their results, return ranking
6
Query routing for web search engines: architecture and experiments Sugiura and Etzioni Route a query to a specialized search engine based on the search engine's topic
8
     

Internet Topology and Similarity

Link Analysis in Web Information Retrieval Henzinger Comparison of query-dependent and query-independent link analysis algorithms
6

Inferring Web Communities from Link Topology Gibson, Kleinberg, Raghavan Discovering 'communities' of pages for a topic
8
Effect of Environmental Structure on Evolutionary Adaptation Fletcher, Bedau and Zwick Using environmental information and resource utility, assess agents internally and externally
5
A system for automatic personalized tracking of scientific literature on the web Bollacker, Lawrence, Giles Determining text and citation relatedness for CiteSeer, assisted by user profiles
7
The Small-World Phenomenon: An Algorithmic Perspective Kleinberg Web pages are linked by chains of acquaintances; gives bounds on the length of such chains
7

Community-Based Service Location Singh, Yu and Venkatraman Pages form social networks, referrals are used to identify important services
7
Diameter of the World Wide Web Albert, Jeong and Barabasi The average number of links between any two web pages is 19
7
     

Personalizable Internet Agents

SPHINX: A framework for creating personal, site-specific web crawlers Miller and Bharat Customizable crawler framework allowing users to add their own search heuristics
7
Is there an intelligent agent in your future? Hendler Ideal internet agent must be communicative, capable, autonomous, and adaptive
4
Training Intelligent Agents using Human Internet Data Sklar, Blair, Funes, Pollack Creation of a diverse population of agents from human trainers
5
     

Learning Techniques (that really don't fit into the categories above)

Efficient web spidering with reinforcement learning Rennie and McCallum Follow links based on 'immediate reward', using the text surrounding the link
8
How learning improves the performance of evolutionary agents: a case study with an information retrieval system for a distributed environment Pereira and Costa Evolutionary algorithm for internet queries, based on link and keyword learning
3
Adding support for dynamic and focused search with Fetuccino Ben-Shaul et al Shark search
7
     

Misc

Information retrieval in the World-Wide Web: Making client-based searching feasible De Bra and Post Fish search, Lagoon cache, evolutionary algorithm for search query
6
Scalable Internet resource discovery: research problems and approaches Bowman et al Efficient method to scale resource discovery using replicated servers and caching
5