Public Release: 

Single click generates lists to end all lists

New Scientist

USING search engines to compile a list, such as the top 50 greatest blues guitarists by record sales, involves a lot of drudge work because you have to visit many web pages to gather the data you need. But the next step in search engine technology could make creating such lists possible with a single mouse click. KnowItAll, a search engine under development at the University of Washington, Seattle, trawls the web for data and then collates it in the form of a list.

The approach is unique, says its developer, Oren Etzioni, because it generates information that probably doesn't exist on any single web page. The US Department of Defense's research arm, DARPA, and Google are so impressed that they are providing funding for the project.

Etzioni's ultimate aim is to have KnowItAll answer questions such as "list all British scientists born before 1900". The software cannot do that yet, because it lacks a module that can understand "natural-language" questions of this type. That will come later, he says. What it can do, however, is take a phrase like "list scientists" and return a list of people that it believes, with a high degree of confidence, are (or were) scientists.

For any input noun, such as "scientists", "guitarists", "gardeners" or "actors", KnowItAll tries to find sentences on websites that contain that noun and looks for words that often appear after it. In this way it might find the phrases "scientists such as" and "scientists including". It then feeds these to 12 search engines and extracts the words that tend to follow, which are often scientists' names. But certain phrases like "scientists such as botanists" also fulfil the search criteria. The software can work out that "botanists" is not a name, and it can use this to inject "botanists such as" into the engines to obtain an even fuller list of scientists' names.
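The extraction step described above can be illustrated with a minimal Python sketch. It is not KnowItAll's actual code: the function names `extract_candidates` and `harvest`, the two cue phrases, the in-memory list of sentences standing in for search engine results, and the rule that a capitalised word marks a name are all assumptions made for illustration.

```python
import re

# Assumed cue phrases; the article mentions "such as" and "including".
CUES = ("such as", "including")

# Assumption: a run of capitalised words is treated as a proper name.
NAME_RE = re.compile(r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*")


def extract_candidates(noun, sentences):
    """Return (names, subclasses) found after '<noun> <cue>' in the sentences."""
    pattern = re.compile(
        r"\b" + re.escape(noun) + r"\s+(?:" + "|".join(CUES) + r")\s+([^.;]+)",
        re.IGNORECASE,
    )
    names, subclasses = [], []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            tail = match.group(1).strip()
            # A lowercase word right after the cue is taken to be a
            # subclass noun ("botanists"), not a person's name.
            lowercase_start = re.match(r"[a-z]+", tail)
            if lowercase_start:
                subclasses.append(lowercase_start.group(0))
            names.extend(NAME_RE.findall(tail))
    return names, subclasses


def harvest(noun, sentences):
    """Run one extraction pass, then one more pass for each subclass found."""
    names, subclasses = extract_candidates(noun, sentences)
    for subclass in subclasses:
        more_names, _ = extract_candidates(subclass, sentences)
        names.extend(more_names)
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(names))


# Hypothetical sentences standing in for text returned by search engines.
corpus = [
    "Famous scientists such as Marie Curie and Isaac Newton changed physics.",
    "Scientists such as botanists study plants.",
    "The panel invited scientists including Albert Einstein.",
    "Botanists such as Carl Linnaeus classified plants.",
]

print(harvest("scientists", corpus))
```

In this toy corpus the second pass is what picks up "Carl Linnaeus", who is only ever mentioned as a botanist. A real system would need far more robust name detection than a capitalisation check.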

KnowItAll then returns a long list of scientists' names, each one accompanied by its percentage probability of being correct, as measured by frequency of occurrence of the names on websites. Users will be able to choose the level of confidence they want in the data. KnowItAll is also able to find words that often occur close to the search term. In the case of "scientists" these might be words like "DNA" and "quantum". It uses them to refine the probability that a person is indeed a scientist.
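As a rough illustration of frequency-based scoring with a user-chosen confidence level, the sketch below scores each candidate by the percentage of sampled pages on which it was extracted. The article does not give KnowItAll's actual formula, so this measure, the function names `score_candidates` and `filter_by_confidence`, and the sample data are all assumptions.

```python
from collections import Counter


def score_candidates(extractions_per_page):
    """Map each candidate to the percentage of pages on which it was extracted.

    Assumption: "frequency of occurrence" is approximated by page share.
    """
    total_pages = len(extractions_per_page)
    if total_pages == 0:
        return {}
    page_counts = Counter()
    for names in extractions_per_page:
        page_counts.update(set(names))  # count each name once per page
    return {name: 100.0 * n / total_pages for name, n in page_counts.items()}


def filter_by_confidence(scores, min_percent):
    """Keep candidates at or above the chosen threshold, highest score first."""
    kept = [(name, pct) for name, pct in scores.items() if pct >= min_percent]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical extraction results, one list of names per web page.
pages = [
    ["Marie Curie", "Isaac Newton"],
    ["Marie Curie"],
    ["Marie Curie", "Elvis Presley"],
    ["Isaac Newton"],
]

scores = score_candidates(pages)
print(filter_by_confidence(scores, 50))
# prints [('Marie Curie', 75.0), ('Isaac Newton', 50.0)]
```

Raising the threshold trades a shorter list for fewer wrong entries: here "Elvis Presley", extracted on only one page in four, drops out at the 50 per cent level.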

###

Author: Celeste Biever

New Scientist issue: 8 May 2004

PLEASE MENTION NEW SCIENTIST AS THE SOURCE OF THIS STORY AND, IF PUBLISHING ONLINE, PLEASE CARRY A HYPERLINK TO: http://www.newscientist.com

"These articles are posted on this site to give advance access to other authorised media who may wish to quote extracts as part of fair dealing with this copyrighted material. Full attribution is required, and if publishing online a link to www.newscientist.com is also required. Advance permission is required before any and every reproduction of each article in full - please contact celia.thomas@rbi.co.uk. Please note that all material is copyright of Reed Business Information Limited and we reserve the right to take such action as we consider appropriate to protect such copyright."
