The approach is unique, says its developer, Oren Etzioni, because it generates information that probably doesn't exist on any single web page. The US Department of Defense's research arm, DARPA, and Google are so impressed that they are providing funding for the project.
Etzioni's ultimate aim is to have KnowItAll answer questions such as "list all British scientists born before 1900". The software cannot do that yet, because it lacks a module that can understand "natural-language" questions of this type. That will come later, he says. What it can do, however, is take a phrase like "list scientists" and return a list of people who, it believes with a high degree of confidence, are (or were) scientists.
For any input noun (say "scientists", "guitarists", "gardeners" or "actors"), KnowItAll tries to find sentences on websites that contain that noun and looks for words that often appear after it. In this way it might find the phrases "scientists such as" and "scientists including". It then feeds these phrases to 12 search engines and extracts the words that tend to follow them, which are often scientists' names. But certain phrases, such as "scientists such as botanists", also fulfil the search criteria. The software can work out that "botanists" is not a name, and it can use this to inject "botanists such as" into the engines to obtain an even fuller list of scientists' names.
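The article gives no implementation details, so the following is only a minimal sketch of the idea, not KnowItAll's code. It assumes that a small in-memory list of sentences (the hypothetical `CORPUS`) stands in for results from the 12 search engines. It also assumes that a crude capitalisation rule decides whether the word after "such as" or "including" is a name or a sub-class noun like "botanists". The function name `harvest` is likewise illustrative.

```python
import re
from collections import Counter

# Hypothetical stand-in for search-engine results: a few sentences of web text.
CORPUS = [
    "Famous scientists such as Marie Curie changed the world.",
    "Many scientists including Albert Einstein wrote letters.",
    "Scientists such as botanists study plants.",
    "Botanists such as Carl Linnaeus named thousands of species.",
    "Great scientists such as Albert Einstein are remembered.",
]


def harvest(noun, corpus, max_rounds=3):
    """Collect candidate names that follow '<noun> such as' / '<noun> including'.

    Assumed heuristic: a run of capitalised words is treated as a name, while a
    single lowercase word is treated as a sub-class noun (e.g. 'botanists') and
    is queued so its own 'such as' phrase is searched in the next round.
    """
    names = Counter()
    frontier, seen = [noun], {noun.lower()}
    for _ in range(max_rounds):
        discovered = []
        for term in frontier:
            pattern = re.compile(
                # The noun and the cue phrase are matched case-insensitively...
                r"(?i:\b" + re.escape(term) + r"\s+(?:such\s+as|including)\s+)"
                # ...but the captured word is case-sensitive: group 1 is a
                # capitalised name, group 2 is a lowercase sub-class noun.
                + r"(?:([A-Z][a-z]+(?: [A-Z][a-z]+)*)|([a-z]+))"
            )
            for sentence in corpus:
                for m in pattern.finditer(sentence):
                    name, subclass = m.group(1), m.group(2)
                    if name:
                        names[name] += 1
                    elif subclass not in seen:
                        seen.add(subclass)
                        discovered.append(subclass)
        if not discovered:
            break  # no new sub-classes to bootstrap from
        frontier = discovered
    return names


if __name__ == "__main__":
    print(harvest("scientists", CORPUS).most_common())
```

Here "botanists" is never counted as a name. Instead it seeds a second round, which is how "Carl Linnaeus" ends up in the list even though no sentence calls him a scientist directly.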
KnowItAll then returns a long list of scientists' names, each accompanied by a percentage probability that it is correct, based on how frequently the name occurs on websites. Users will be able to choose the level of confidence they want in the data. KnowItAll can also find words that often occur close to the search term. In the case of "scientists", these might be words like "DNA" and "quantum". It uses them to refine the probability that a person is indeed a scientist.
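The article does not say how the probabilities are actually computed, so the scoring rule below is an invented toy formula, not KnowItAll's. It is only meant to show the two ingredients described: a base score from how often a name is found, and a boost when the pages mentioning it also contain context words such as "DNA" or "quantum". The page records in `PAGES`, the `boost` weight and the function names are all hypothetical.

```python
from collections import Counter

# Context words the article mentions for "scientists".
CONTEXT_WORDS = {"DNA", "quantum"}

# Hypothetical extraction results: (name found, words seen near it on that page).
PAGES = [
    ("Albert Einstein", {"quantum", "physics"}),
    ("Albert Einstein", {"relativity"}),
    ("Albert Einstein", {"quantum"}),
    ("Marie Curie", {"radioactivity"}),
    ("Marie Curie", {"quantum"}),
    ("Rosalind Franklin", {"DNA"}),
    ("Bob Dylan", {"guitar", "lyrics"}),
]


def score_names(pages, context_words, boost=0.2):
    """Toy confidence score in [0, 1] (assumed formula, for illustration only)."""
    mentions = Counter(name for name, _ in pages)
    # Count pages where the name appears alongside at least one context word.
    context_hits = Counter(name for name, words in pages if words & context_words)
    top = max(mentions.values())
    scores = {}
    for name, n in mentions.items():
        base = n / top                       # frequency relative to the most-found name
        support = context_hits[name] / n     # share of mentions near context words
        scores[name] = min(1.0, base + boost * support)
    return scores


def above_threshold(scores, threshold):
    """Names whose score meets the user's chosen confidence level, best first."""
    kept = (name for name, s in scores.items() if s >= threshold)
    return sorted(kept, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    scores = score_names(PAGES, CONTEXT_WORDS)
    print(above_threshold(scores, 0.5))
```

In this toy data, "Rosalind Franklin" and "Bob Dylan" are each found once. The "DNA" context lifts Franklin above a 0.5 threshold while Dylan stays below it. A user who demands higher confidence (say 0.8) gets a shorter list.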
Author: Celeste Biever
New Scientist issue: 8 May 2004
"These articles are posted on this site to give advance access to other authorised media who may wish to quote extracts as part of fair dealing with this copyrighted material. Full attribution is required, and if publishing online a link to www.newscientist.com is also required. Advance permission is required before any and every reproduction of each article in full - please contact firstname.lastname@example.org. Please note that all material is copyright of Reed Business Information Limited and we reserve the right to take such action as we consider appropriate to protect such copyright."