A. Sugiura and O. Etzioni
Summary
This paper presents Q-Pilot, an application that recognizes specialized search engines and routes each user query to the most appropriate one. First, Q-Pilot identifies the topic of each specialized search engine off-line, using a 'neighborhood-based topic identification' technique. This technique generates queries for the search engine and determines the scope of the search engine's 'neighborhood'. Then, when a user enters a query, the query is clustered into a topic and directed to the search engine that best matches that topic. If the topic is ambiguous, Q-Pilot may prompt the user to choose the intended one. The reported experimental results support the robustness of the method, and the authors claim accurate routing across more than 400 categories.
The authors clearly invested considerable effort in the application's user interface, and they back the work with strong experimental results. The paper is recent and actively cited, and it looks like a promising direction for future work.
Keywords
web search, query routing, query expansion, search engines, Q-Pilot, off-line
Methods
Neighborhood-based topic identification is accomplished using front-page, back-link, or database sampling methods. The front-page method uses the high-level description that a search engine gives of itself on its own home page (e.g., securityfocus.com searches for security items, and this is stated on its home page). The back-link method describes an engine's content using previously indexed pages that link to it (backlinks), together with a topic frequency count, which queries the search engine with a list of topic terms and measures the search engine's accuracy. Database sampling generates training queries and indexes the resulting pages.
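The front-page and back-link ideas above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the "neighborhood" is simply a list of page texts (a home page or backlink pages) and that topics are found by counting occurrences of candidate topic terms. The function names, the `top_k` parameter, and the sample pages are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def topic_term_counts(pages, topic_terms):
    """Count how often each candidate topic term occurs in the neighborhood pages.

    `pages` stands in for the front page or the backlink pages of one engine
    (an assumption made for this sketch).
    """
    vocabulary = set(topic_terms)
    counts = Counter()
    for page in pages:
        counts.update(tok for tok in tokenize(page) if tok in vocabulary)
    return counts


def identify_topics(pages, topic_terms, top_k=3):
    """Return the `top_k` most frequent topic terms as the engine's topics."""
    counts = topic_term_counts(pages, topic_terms)
    return [term for term, _ in counts.most_common(top_k)]


# Hypothetical neighborhood for a security-oriented search engine.
neighborhood = [
    "SecurityFocus: security news and vulnerability database",
    "Search for security advisories at this site",
    "A great security and privacy resource",
]
candidate_terms = ["security", "privacy", "cooking"]
print(identify_topics(neighborhood, candidate_terms, top_k=2))
```

A real system would need stemming, stop-word handling, and some weighting of where a term appears; plain term counts are used here only to keep the idea visible.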
The clustering method that determines the topic of a given query uses the highest word co-occurrences in pages to assign the query to up to three mutually exclusive categories. The appropriate search engine is then selected by a goodness measure, which is the sum of the frequencies and the word co-occurrences in the search engine selection index.
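The routing step can be sketched in the same spirit. This is a rough illustration under stated assumptions, not the paper's algorithm: co-occurrence is approximated by counting pages that contain both a query word and a topic term, the selection index is a plain dictionary of per-engine term weights, and goodness is the sum of those weights over the query's topics. The sketch does not enforce mutual exclusivity between categories, and all names and data (including `recipes.example`) are hypothetical.

```python
import re


def tokenize(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_topics(query, pages, topic_terms, max_topics=3):
    """Pick up to `max_topics` topic terms that co-occur most with the query words."""
    query_words = tokenize(query)
    scores = {}
    for page in pages:
        words = tokenize(page)
        if not (query_words & words):
            continue  # page shares no word with the query
        for term in topic_terms:
            if term in words:
                scores[term] = scores.get(term, 0) + 1
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:max_topics]


def goodness(engine_weights, topics):
    """Sum the engine's index weights over the query's topics (assumed measure)."""
    return sum(engine_weights.get(topic, 0) for topic in topics)


def route(query, pages, topic_terms, selection_index):
    """Return the engine with the highest goodness, or None if no topic is found."""
    topics = query_topics(query, pages, topic_terms)
    if not topics:
        return None
    return max(selection_index, key=lambda e: goodness(selection_index[e], topics))


# Hypothetical page sample and selection index.
sample_pages = [
    "buffer overflow exploit security advisory",
    "overflow bug in security software",
    "chocolate cake recipe cooking",
]
terms = ["security", "cooking", "software"]
index = {
    "securityfocus.com": {"security": 5, "software": 1},
    "recipes.example": {"cooking": 7},
}
print(route("buffer overflow", sample_pages, terms, index))
```

Returning `None` when no topic is found marks the point where a system like the one described might fall back to a general search engine or prompt the user.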
Rating
8
Bibtex Entry
@inproceedings{ sugiura00,
author = "A. Sugiura and O. Etzioni",
title = "Query Routing for Web Search Engines: Architecture and Experiments",
text = "WWW9 Conference, 2000.",
year = "2000",
url = "http://citeseer.nj.nec.com/context/1447883/0"
}