
Grammar Lost Translation Machine In Researchers Fix Will

Sep 09, Technology



The makers of a University of Southern California computer translation system consistently rated among the world's best are teaching their software something new: English grammar.

Image: A Tree Grows in Translation. Grammatical structure, long in second place, is emerging as a key to better English in the finished product.

Most modern "machine translation" systems, including the highly rated one created by USC's Information Sciences Institute, rely on brute force correlation of vast bodies of pre-translated text from such sources as newspapers that publish in multiple languages.

Software matches up phrases that consistently show up in parallel fashion, such as the English "my brother's pants" and the Spanish "los pantalones de mi hermano," and then uses these matches to piece together translations of new material.
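As a rough illustration of the phrase-matching idea, here is a minimal sketch. It assumes a tiny hand-written phrase table and a greedy longest-match lookup. The table entries and the function name `translate` are hypothetical and are not drawn from the ISI system, which learns its phrase pairs statistically from large bilingual corpora.

```python
# Toy phrase table mapping Spanish phrases to English phrases.
# Entries are illustrative assumptions, not output of any real system.
PHRASE_TABLE = {
    ("los", "pantalones", "de", "mi", "hermano"): "my brother's pants",
    ("mi", "hermano"): "my brother",
    ("los", "pantalones"): "the pants",
    ("son",): "are",
    ("nuevos",): "new",
}


def translate(sentence, table=PHRASE_TABLE):
    """Greedy longest-match phrase translation; unknown words pass through."""
    words = sentence.lower().split()
    max_len = max(len(key) for key in table)
    output = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in table:
                output.append(table[chunk])
                i += n
                break
        else:
            # No phrase matched: copy the source word unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate("Los pantalones de mi hermano son nuevos"))
```

The sketch does no re-ordering between phrases, which is one reason output pieced together this way can read as something other than fluent English.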

It works, but only to a point. ISI machine translation expert Daniel Marcu says that when such a system is "trained on enough relevant bilingual text ... it can break a foreign language up into phrasal units, translate each of them fairly well into English, and do some re-ordering. However, even in this good scenario, the output is still clearly not English. It takes too long to read, and it is unsatisfactory for commercial use."

So Marcu and colleague Kevin Knight, both ISI project leaders who also hold appointments in the computer science department of the USC Viterbi School of Engineering, have begun an intensive $285,000 effort called the Advanced Language Modeling for Machine Translation project. Its aim is to improve the system they created at ISI by subjecting the texts that come out of their translation engine to a follow-on step: grammatical processing.

The step seems simple, but is actually imposingly difficult. "For example, there is no robust algorithm that returns 'grammatical' or 'ungrammatical' or 'sensible' or 'nonsense' in response to a user-typed sequence of words," Marcu notes.

The problem grows out of a feature of natural language noted decades ago by M.I.T. language theorist Noam Chomsky. Language users have a literally limitless ability to nest and cross-nest phrases and ideas into intricate referential structures, as in: "I was looking for the stirrups from the saddle that my ex-wife's oldest daughter took with her when she went to Jack's new place in Colorado three years ago, but all she had were Louise's second-hand saddle shoes, the ones Ethel's dog chewed during the fire."

Unraveling these verbal cobwebs (or, in the more common description, tracing branching "trees" of connections) is such a daunting task that programmers long ago went in the brute force direction of matching phrases and hoping that the relation of the phrases would become clear to readers.

With the limits of this approach becoming clear, researchers have now begun applying computing power to trying to assemble grammatical rules. According to Knight, one crucial step has been the creation of a large database of English text whose syntax has been hand-decoded by humans, the "Penn Treebank."
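Treebank annotations are commonly written as nested brackets, with a syntactic label opening each bracket. The following is a minimal sketch of reading such a string into a tree. The example sentence, its labels, and the helper names `parse_brackets` and `leaves` are illustrative assumptions, not an excerpt from the Penn Treebank or its tools.

```python
def parse_brackets(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Leaves are plain word strings. Assumes well-formed input.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # step past the closing ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    _label, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


# Hypothetical annotated sentence in bracketed form.
tree = parse_brackets(
    "(S (NP (DT the) (NN dog)) (VP (VBD chewed) (NP (DT the) (NNS shoes))))"
)
print(tree[0], leaves(tree))
```

Each nested tuple corresponds to one branch of the "tree" of connections described above, so the structure a human annotator marked can be recovered mechanically.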

Using this and other sources, computer scientists have begun developing ways to model the observed rules. A preliminary study by Knight and two colleagues in 2003 showed that this approach might be able to improve translations.

Accordingly, the researchers describe their study this way: "We propose to implement a trainable tree-based language model and parser, and to carry out empirical machine-translation experiments with them. USC/ISI's state-of-the-art machine translation system already has the ability to produce, for any input sentence, a list of 25,000 candidate English outputs. This list can be manipulated in a post-processing step. We will re-rank these lists of candidate string translations with our tree-based language model, and we plan for better translations to rise to the top of the list."
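The re-ranking step can be sketched in a few lines. This is an illustration under stated assumptions only: the tag lexicon, the tag-pair probabilities, and the functions `grammar_logp` and `rerank` are all hypothetical, and the simple tag-pair score stands in for the trainable tree-based language model the researchers propose, which would score whole parse trees.

```python
import math

# Hypothetical part-of-speech lexicon and tag-pair log-probabilities.
# This is a stand-in for a trained tree-based language model.
TAGS = {"the": "DET", "dog": "NOUN", "cat": "NOUN", "chased": "VERB"}
TAG_BIGRAM_LOGP = {
    ("<s>", "DET"): math.log(0.6),
    ("DET", "NOUN"): math.log(0.9),
    ("NOUN", "VERB"): math.log(0.5),
    ("VERB", "DET"): math.log(0.5),
    ("NOUN", "</s>"): math.log(0.4),
}
UNSEEN_LOGP = math.log(0.01)  # penalty for tag pairs never observed


def grammar_logp(sentence):
    """Score how plausible the tag sequence of a sentence is."""
    tags = ["<s>"] + [TAGS.get(w, "UNK") for w in sentence.split()] + ["</s>"]
    return sum(
        TAG_BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(tags, tags[1:])
    )


def rerank(candidates, weight=1.0):
    """Sort (english_string, translation_logp) pairs, best first.

    The combined score adds the weighted grammar score to the score the
    translation engine already assigned.
    """
    return sorted(
        candidates,
        key=lambda c: c[1] + weight * grammar_logp(c[0]),
        reverse=True,
    )


# Two invented candidates; the engine slightly prefers the scrambled one.
candidates = [
    ("dog the chased cat the", -2.0),
    ("the dog chased the cat", -2.5),
]
print(rerank(candidates)[0][0])
```

With `weight=0.0` the engine's original ordering is kept, so the weight controls how much say the grammar score gets over the translation score.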

One crucial trick that the system must be able to do is to pick out separate trees from the endless strings of words. But Knight believes this is doable, and in the short term rather than the long term.

Referring to the annual review of translation systems by the National Institute of Standards and Technology, in which ISI consistently gains top scores, Knight said, "We want to have the grammar module installed and working by the next evaluation, in August 2006."

Knight and Marcu are cofounders and, respectively, chief scientist and chief technology and operating officer of a spinoff company, Language Weaver.

Source: University of Southern California

