PITTSBURGH--In response to the humanitarian crisis in Haiti, scientists at Carnegie Mellon University's Language Technologies Institute (LTI) have publicly released spoken and textual data they've compiled on Haitian Creole so that translation tools desperately needed by doctors, nurses and other relief workers on the earthquake-ravaged island can be rapidly developed.
Since Carnegie Mellon began to make the data publicly available last week, a team at Microsoft Research has used it to help develop an experimental, web-based system for translating between English and Haitian Creole (http://www.
"French speakers can sort of puzzle through it, but Creole isn't penetrable if you don't know French," Frederking said. Few translation resources are available for the language, he added.
The Carnegie Mellon database for Haitian Creole was created in the late 1990s for Diplomat, a project sponsored by the Defense Advanced Research Projects Agency. The project was headed by Jaime Carbonell, LTI director, and focused on developing portable, speech-to-speech translation devices that could be deployed rapidly for Haitian Creole and other languages of special interest to the Department of Defense. Frederking and Alex Rudnicky, principal systems scientist in the Computer Science Department, served as co-principal investigators.
A prototype Haitian Creole translation system was delivered to the U.S. Army, but "as far as we know, nobody ever field-tested it," Frederking said. The project ended in the late 1990s, but LTI retained the data compiled and produced for the project.
Following the Jan. 12 earthquake, LTI researchers decided to begin work on an updated translation system for Haitian Creole that would incorporate the latest translation technologies. To aid other groups pursuing parallel efforts worldwide, they also opted to release the data publicly at www.speech.cs.cmu.edu/haitian/, with minimal restrictions on its use. In addition to the Diplomat material, other data developed by researchers at LTI and elsewhere are being added to the site as they become available.
Given the extreme poverty of Haiti, "nobody is going to make money on a Haitian Creole translator," Frederking said. "But translation systems could be an important tool, both for the relief workers now involved in emergency response and in the long term as rebuilding takes place."
LTI, which focuses on such topics as machine translation, speech processing, information retrieval, text mining and computer-assisted language learning, is one of seven academic units in Carnegie Mellon's School of Computer Science.
About Carnegie Mellon: Carnegie Mellon (www.cmu.edu) is a private, internationally ranked research university with programs in areas ranging from science, technology and business to public policy, the humanities and the fine arts. More than 11,000 students in the university's seven schools and colleges benefit from a small student-to-faculty ratio and an education characterized by its focus on creating and implementing solutions for real problems, interdisciplinary collaboration and innovation. A global university, Carnegie Mellon has its main U.S. campus in Pittsburgh, Pa., campuses in California's Silicon Valley and Qatar, and programs in Asia, Australia and Europe. The university is in the midst of a $1 billion fundraising campaign, titled "Inspire Innovation: The Campaign for Carnegie Mellon University," which aims to build its endowment, support faculty, students and innovative research, and enhance the physical campus with equipment and facility improvements.