| This page serves two purposes: to organize my research path, and to let others build on the work deposited here. Each general topic lists relevant documents, their authors, and my personal rating of each paper's usefulness. Ratings run from 1 to 10, with 10 as the highest and 5 as average. Details about each paper are available through the link. Papers are in no particular order. |
| Paper Title | Author | Features | Rating |
| Artificial life applied to adaptive information agents | Menczer, Belew, Willuhn | Evolve a population of agents using local decisions to exploit shared resources | 7 |
| Is agent-based online search feasible? | Menczer | Distributed populations of adaptive agents are scalable, use Q-learning and cloning to evolve | 8 |
| Evaluating topic-driven web crawlers | Menczer | Assessment using classification, a retrieval system, and mean topic similarity | 7 |
| Adaptive retrieval agents: internalizing context and scaling up to the web | Menczer and Belew | Learning techniques: evolutionary adaptation by local selection and environment internalization | 8 |
| ARACHNID: Adaptive retrieval agents choosing heuristic neighborhoods for information discovery | Menczer | Distributed, adaptive agents that make reproductive decisions locally | 8 |
| Adaptive information agents in a distributed textual environment | Menczer and Belew | Adaptive agents exploit link topology using context and connectivity between pages | 8 |
| Latency-dependent fitness in evolutionary multithreaded Web agents | Degeratu, Pant, Menczer | Variable energy cost based on network download time increases InfoSpiders' speed | 7 |
| The anatomy of a large-scale hypertextual web search engine | Brin and Page | Architecture of Google, description of three steps: crawling, indexing, and sorting | 9 |
| The Connectivity Server: fast access to linkage information on the web | Bharat et al. | Produce the neighborhood (back and forward links) around a page efficiently | 6 |
| The Term Vector Database: fast access to indexing terms for web pages | Stata, Bharat and Maghoul | Method of mapping pageIDs to terms, efficient form of indexing | 8 |
| Efficient crawling through URL ordering | Cho et al. | PageRank algorithm to select links based on number of forward and back links | 6 |
| Automatic resource compilation by analyzing hyperlink structure and associated text | Chakrabarti | Rank pages using link context and iterative hub and authority scores | 7 |
| Authoritative sources in a hyperlinked environment | Kleinberg | Foundational paper using iterative hub and authority scores on a subgraph to rank pages | 8 |
| An efficient algorithm to rank web resources | Zhang and Dong | Predict best next link by using a Markov model to map the user's browsing history | 5 |
| The MetaCrawler architecture for resource aggregation on the web | Selberg and Etzioni | Using 'Harness', query search engines and parse their results, return ranking | 6 |
| Query routing for web search engines: architecture and experiments | Sugiura and Etzioni | Route a query to a specialized search engine based on the search engine's topic | 8 |
| Link Analysis in Web Information Retrieval | Henzinger | Comparison of query-dependent and query-independent link analysis algorithms | 6 |
| | Gibson, Kleinberg, Raghavan | Discovering 'communities' of pages for a topic | 8 |
| Effect of Environmental Structure on Evolutionary Adaptation | Fletcher, Bedau and Zwick | Using environmental information and resource utility, assess agents internally and externally | 5 |
| A system for automatic personalized tracking of scientific literature on the web | Bollacker, Lawrence, Giles | Determining text and citation relatedness for CiteSeer, assisted by user profiles | 7 |
| The Small-World Phenomenon: An Algorithmic Perspective | Kleinberg | Web pages are linked by chains of acquaintances, bounding size of chain | 7 |
| | Singh, Yu and Venkatraman | Pages form social networks, referrals are used to identify important services | 7 |
| Diameter of the World Wide Web | Albert, Jeong and Barabasi | The average number of links between any two web pages is 19 | 7 |
| SPHINX: A framework for creating personal, site-specific web crawlers | Miller and Bharat | Customizable crawler framework allowing users to add their own search heuristics | 7 |
| Is there an intelligent agent in your future? | Hendler | Ideal internet agent must be communicative, capable, autonomous, and adaptive | 4 |
| Training Intelligent Agents using Human Internet Data | Sklar, Blair, Funes, Pollack | Creation of a diverse population of agents from human trainers | 5 |
| Efficient web spidering with reinforcement learning | Rennie and McCallum | Follow links based on 'immediate reward', using the text surrounding the link | 8 |
| How learning improves the performance of evolutionary agents: a case study with an information retrieval system for a distributed environment | Pereira and Costa | Evolutionary algorithm for internet queries, based on link and keyword learning | 3 |
| Adding support for dynamic and focused search with Fetuccino | Ben-Shaul et al. | Shark search | 7 |
| Information retrieval in the World-Wide Web: Making client-based searching feasible | De Bra and Post | Fish search, Lagoon cache, evolutionary algorithm for search query | 6 |
| Scalable Internet resource discovery: research problems and approaches | Bowman et al. | Efficient method to scale resource discovery using replicated servers and caching | 5 |