In the futuristic world of "Star Trek," computers listen to human speech, then follow orders or answer questions with near-perfect precision. But present-day computers are not so skillful. Current machines can respond to brief, clearly uttered instructions such as "open the door," but asking them to understand casual conversation is asking for trouble. Even state-of-the-art systems stumble over words that sound alike or have more than one meaning. To a computer, the phrases "recognize speech" and "wreck a nice beach" sound the same. When people slur their words or speak with a regional accent, computerized comprehension becomes even tougher.
In the face of such obstacles, researchers at The Johns Hopkins University are developing new tools to help computers understand speech. To support this work, they recently received a $750,000 National Science Foundation grant. The Hopkins team was one of 15 nationwide picked to participate in a $10-million NSF project aimed at developing more natural interaction between computers and humans. When the Hopkins software is perfected, it could allow a blind person to dictate a letter to a computer with far greater accuracy than existing systems. It could also provide a powerful new way to search through hours of recorded speeches and news reports that have not been transcribed. "The grand goal," says Eric Brill, a researcher at Hopkins' Center for Language and Speech Processing, "would be to have a computer understand any kind of human speech."
Significant advances are needed. Current speech recognition software often trips over garbled or sound-alike words. When that happens, it looks at the previous word or two for help, says Brill, an assistant professor in the Department of Computer Science. With this method, the machine can decide that the words it heard after "walking" were more likely to be "the dog" than "the dock." Still, this existing software will incorrectly transcribe about 40 percent of the words it hears in a casual conversation. "It's not enough," Brill says. "We need to be much more sophisticated in predicting what was just said, based on what the computer has already heard. It will be a very long time before we're down to a 1 percent error rate. But the system becomes more and more useful as the error rate goes down."
To cut down on mistakes, Hopkins researchers are teaching computers to examine the structure of a sentence, not just a couple of neighboring words. Just as grammar school children are taught to do, a computer could be programmed to break a sentence into its subject, verb, object and modifiers. With this knowledge, it could make better guesses about troublesome words. "If the main verb of the sentence is 'drive,' then 'spaghetti' is an unlikely object," explains Brill. "But 'car' is a likely object, or maybe 'golf ball.' We are trying to get the computer to ask questions like that."
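For readers curious how such guesses might look in practice, here is a minimal Python sketch of both ideas: choosing a word from the two words before it, and choosing an object that fits the main verb. It is not the Hopkins software. The function names (pick_word, pick_object) and every count in the tables are invented for illustration, and a real system would estimate such numbers from large collections of text.

```python
from collections import Counter

# Hypothetical counts of three-word sequences, invented for illustration only.
TRIGRAM_COUNTS = Counter({
    ("walking", "the", "dog"): 50,
    ("walking", "the", "dock"): 2,
    ("on", "the", "dock"): 30,
    ("on", "the", "dog"): 5,
})

# Hypothetical counts of how often a noun appears as the object of a verb.
VERB_OBJECT_COUNTS = Counter({
    ("drive", "car"): 40,
    ("drive", "golf ball"): 8,
})


def pick_word(previous_two, candidates, counts=TRIGRAM_COUNTS):
    """Return the candidate seen most often after the two preceding words."""
    # Adding 1 keeps unseen sequences unlikely rather than impossible.
    return max(candidates, key=lambda word: counts[(*previous_two, word)] + 1)


def pick_object(verb, candidates, counts=VERB_OBJECT_COUNTS):
    """Return the candidate that most often serves as the object of the verb."""
    return max(candidates, key=lambda noun: counts[(verb, noun)] + 1)


if __name__ == "__main__":
    print(pick_word(("walking", "the"), ["dock", "dog"]))
    print(pick_object("drive", ["spaghetti", "car"]))
```

With these made-up counts, the first call favors "dog" after "walking the," and the second favors "car" over "spaghetti" as something to drive. The second function assumes the sentence has already been broken into verb and object, which is the harder step the researchers describe.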
To further aid comprehension, the Hopkins team wants the computer to figure out the subject of a conversation, such as science, music, politics or food. If words such as "home run," "catcher" and "foul ball" turn up, the computer should sense that "sports" is the topic and that the baffling word is probably "pitcher," not "picture." "It's very important in speech recognition to know what's under discussion," says David Yarowsky, another CLSP researcher who is also an assistant professor of computer science. "So the computer will have these 'topic detectors' running in the background. Are we talking about education or politics? Are we talking about a school principal or a legal principle? Both words sound alike, but the computer will give them different weights depending on what topic it thinks you're talking about."
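A "topic detector" of the kind Yarowsky describes could be sketched, in very simplified form, as follows. This Python example is an illustration under stated assumptions, not the researchers' method: the keyword lists, the weights, the "photography" topic and the function names (detect_topic, resolve) are all hypothetical. The detector counts topic keywords in what has been heard so far, then prefers the sound-alike word with the higher weight for that topic.

```python
# Hypothetical keyword lists for two topics, invented for illustration only.
TOPIC_KEYWORDS = {
    "sports": {"home run", "catcher", "foul ball", "inning"},
    "photography": {"camera", "lens", "frame", "exposure"},
}

# Hypothetical weights: how well each sound-alike word fits each topic.
WORD_TOPIC_WEIGHTS = {
    "pitcher": {"sports": 0.9, "photography": 0.1},
    "picture": {"sports": 0.2, "photography": 0.9},
}


def detect_topic(context):
    """Return the topic whose keywords appear most often in the context."""
    text = context.lower()
    scores = {
        topic: sum(keyword in text for keyword in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    # If no keyword matches, this falls back to the first topic listed.
    return max(scores, key=scores.get)


def resolve(context, candidates):
    """Pick the sound-alike candidate with the highest weight for the topic."""
    topic = detect_topic(context)
    return max(candidates, key=lambda word: WORD_TOPIC_WEIGHTS[word][topic])


if __name__ == "__main__":
    heard = "The catcher dropped the foul ball after the home run"
    print(detect_topic(heard))
    print(resolve(heard, ["picture", "pitcher"]))
```

In this toy version the baseball terms push the topic toward "sports," so the baffling word is resolved as "pitcher." A practical system would more likely keep a running probability for every topic rather than commit to a single one.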
Together, the added attention to linguistics and world knowledge should help computers recognize human speech with far greater accuracy, the Hopkins researchers say. Within five years, Yarowsky predicts, a computer may serve as an audio search engine. It could "listen" to hours of radio and television news reports and locate virtually every speech or interview in which Secretary of State Madeleine Albright has discussed human rights issues involving China, for example. "For this purpose, you don't have to have a perfect speech recognizer," Yarowsky says. "You don't have to get every word right to recognize that Mrs. Albright is speaking about human rights in China. This technology has the potential to revolutionize the way we retrieve things that were never in text form to begin with, but were only recorded as speech."
The sort of speech recognition seen on "Star Trek" may be many years away, but the Hopkins researchers are moving in that direction. "Human-computer interaction is the major goal here," says Yarowsky. "We want to make it easier for people to interact with machines."