

The Global Advisor Newsletter -  Tips for improving the process and reducing the cost of website localization. Bringing Medical Devices to Market - Useful links. Celebrating notable anniversaries...

Features articles of interest on language translation and localization, culture, language technology and other related topics. The goal of the Global Advisor Newsletter is to inform and entertain.


Forty-Seventh Edition - Translation Memory (TM) and Machine Translation (MT)

In a perfect world, it would be possible to feed any text to a computer and have it translated “on the fly” into any language, flawlessly and cheaply. The researchers and linguists who have been working on the design of the perfect automated system have been dreaming about this for more than half a century. But in the real world, the perfect MT system has not been invented yet. Unedited text translated by an ‘untrained’ MT system is still...well...text translated by a machine. The quality of the translation ranges from understandable, but awkward or completely wrong, to simply unintelligible.

Machine translation began in 1953, seventeen years after Konrad Zuse, a construction engineer for the Henschel Aircraft Company in Berlin, developed a series of automatic calculators to help him with complex engineering calculations, work that earned him the semiofficial title of “inventor of the modern computer.” The first MT systems were Direct Systems. They simply translated the Source Language (SL) text into the Target Language (TL) directly, using a bilingual dictionary as reference. Think of the hand-held travel dictionary that converts the phrases travelers use most frequently. Direct MT systems do not analyze or disambiguate the SL text. They translate only simple text, or text with a very low level of ambiguity (some are capable of finding entries for past participles, gerunds, noun plural forms, and adjective forms). If an SL word has more than one meaning, this approach can produce the wrong result in the TL. For example, the translation of the “I am fine” response to “How are you?” could conceivably become “I am a traffic ticket” in the TL. Mistakes such as this will amuse your hosts in the country you are visiting, who will still appreciate your effort to communicate with them in their language, but they are unacceptable in a formal translation.
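The word-for-word behavior described above can be sketched in a few lines of Python. This is only an illustration, not how any particular commercial system worked: the tiny English-to-Spanish dictionary and the function name `direct_translate` are assumptions made for the example. Because each SL word maps to exactly one TL word, the ambiguous word “fine” comes out in its “penalty” sense.

```python
# Toy bilingual dictionary (English -> Spanish); the entries are illustrative only.
# Each SL word maps to exactly one TL word, as in a direct system that
# does not analyze or disambiguate the source text.
BILINGUAL_DICT = {
    "i": "yo",
    "am": "soy",
    "fine": "multa",  # "fine" in the sense of a penalty: the wrong sense here
}


def direct_translate(sentence, dictionary=BILINGUAL_DICT):
    """Translate word by word; words missing from the dictionary pass through unchanged."""
    words = sentence.lower().rstrip(".!?").split()
    return " ".join(dictionary.get(word, word) for word in words)


print(direct_translate("I am fine"))  # prints "yo soy multa"
```

The output reads roughly as “I am a fine (penalty)”, which is the kind of error that a system without source-text analysis cannot avoid.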

Since the 1950s, computing technology has taken giant leaps, and MT along with it. These advances enabled the development of Indirect MT systems, which begin the translation process by analyzing the source text to disambiguate the SL sentences and enhance the quality of the translation.

In 1992, Hutchins and Somers (An introduction to machine translation, Academic Press, London) identified two types of Indirect MT systems: Transfer-based and Interlingual.

In “The development and use of machine translation systems and computer-based translation tools”, John Hutchins explains the differences between these two types of MT system:

“There have in fact been two basic ‘indirect’ approaches. In one the abstract representation is designed to be a kind of language-independent ‘interlingua’, which can potentially serve as an intermediary between a large number of natural languages. Translation is therefore in two basic stages: from the source language into the interlingua, and from the interlingua into the target language. In the other indirect approach (in fact, more common approach) the representation is converted first into an equivalent representation for the target language. Thus there are three basic stages: analysis of the input text into an abstract source representation, transfer to an abstract target representation, and generation into the output language.”

Transfer Based Indirect MT System

A Transfer Based Indirect MT System analyzes and parses the source language text independently of the TL text. The results of this analysis are then used during the transfer step to find the corresponding words and structures in the TL.
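The three stages that Hutchins describes (analysis, transfer, generation) can be sketched as three small functions. This is a minimal sketch under stated assumptions: the part-of-speech table, the transfer lexicon, and the single reordering rule (an English adjective before a noun becomes a Spanish noun before an adjective) are invented for illustration and are far simpler than anything in a real system.

```python
# Hypothetical toy resources; a real system would use full grammars and lexicons.
SOURCE_POS = {"the": "DET", "red": "ADJ", "car": "NOUN"}
TRANSFER_LEXICON = {"the": "el", "red": "rojo", "car": "coche"}


def analyze(sentence):
    """Analysis: tag each SL word with a part of speech, with no reference to the TL."""
    return [(word, SOURCE_POS.get(word, "UNK")) for word in sentence.lower().split()]


def transfer(source_repr):
    """Transfer: swap in TL words, then apply one structural rule (ADJ NOUN -> NOUN ADJ)."""
    target = [(TRANSFER_LEXICON.get(word, word), pos) for word, pos in source_repr]
    i = 0
    while i < len(target) - 1:
        if target[i][1] == "ADJ" and target[i + 1][1] == "NOUN":
            target[i], target[i + 1] = target[i + 1], target[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return target


def generate(target_repr):
    """Generation: produce the TL sentence from the target representation."""
    return " ".join(word for word, _ in target_repr)


def transfer_translate(sentence):
    return generate(transfer(analyze(sentence)))


print(transfer_translate("the red car"))  # prints "el coche rojo"
```

Unlike the direct approach, the word order of the output is decided from the analysis of the source sentence rather than copied from it.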
 

Interlingual MT Systems

The Interlingual system routes the translation through an intermediate language (interlingua): the SL text is first translated into the interlingua, and the TL text is then generated from the interlingua rather than directly from the SL text.

According to Hutchins & Somers (1992), an interlingua is the intermediate representation of meaning that “includes all information necessary for the generation of the target text without ‘looking back’ to the original text. The representation is thus a projection from the source text and at the same time acts as the basis for the generation of the target text; it is an abstract representation of the target text as well as a representation of the source text.”

Some researchers used an artificial language, such as Esperanto, as the interlingua because artificial languages are considered to be more regular and consistent in their morphology and syntax.
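The two-stage interlingual idea can be illustrated with a toy sketch in which language-neutral concept identifiers stand in for the interlingua. Everything here is an assumption made for the example: the concept names, the tiny vocabularies, and the function names `to_interlingua` and `from_interlingua`. Note that the generation step sees only the concepts and never looks back at the source words.

```python
# Hypothetical concept identifiers play the role of the interlingua.
ANALYSIS = {
    "en": {"water": "CONCEPT_WATER", "house": "CONCEPT_HOUSE"},
    "es": {"agua": "CONCEPT_WATER", "casa": "CONCEPT_HOUSE"},
}
GENERATION = {
    "es": {"CONCEPT_WATER": "agua", "CONCEPT_HOUSE": "casa"},
    "de": {"CONCEPT_WATER": "Wasser", "CONCEPT_HOUSE": "Haus"},
}


def to_interlingua(text, source_lang):
    """Stage 1: map SL words to concepts (raises KeyError for unknown words)."""
    return [ANALYSIS[source_lang][word] for word in text.lower().split()]


def from_interlingua(concepts, target_lang):
    """Stage 2: generate TL words from the concepts alone."""
    return " ".join(GENERATION[target_lang][concept] for concept in concepts)


concepts = to_interlingua("water house", "en")
print(from_interlingua(concepts, "de"))  # prints "Wasser Haus"
print(from_interlingua(concepts, "es"))  # prints "agua casa"
```

In this sketch, adding a new target language only requires a new generation table, which is one reason an interlingua can serve as an intermediary between many languages.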

Commercially available MT systems fit the three basic system types (‘transfer’, ‘direct’ and ‘interlingual’). Some of the best-known systems in the industry, such as Systran, Logos and the Fujitsu Atlas systems, are based on ‘direct translation’, but they are vastly improved versions of their predecessors. They are highly modular, easily modifiable and extendable. Systran powers the popular Babel Fish Translation service that is available on the Internet. Systran was originally developed around 1970 to translate Russian to English. Today, it offers a large number of language pairs, including most European as well as some Asian languages. The Commission of the European Communities bought an English-French version of this system even though the quality of the translation was initially poor. After a considerable investment in time and effort by the Commission's evaluators and lexical coding specialists, the quality of the translation had improved enough to support post-editing in multiple language pairs.

There have also been a number of ‘transfer’ systems. One of these is METAL, which was supported by Siemens, Germany during the 1980s. The system became available commercially in the 1990s but sales were disappointing, and Siemens transferred the rights to METAL to GMS and LANT. The most famous systems based on the ‘transfer’ process were Ariane at GETA in Grenoble, France - a project that began in the 1960s - and Eurotra - a project funded by the Commission of the European Communities. Neither system met expectations. In the late 1980s Japanese Government agencies, in cooperation with researchers from China, Thailand, Malaysia and Indonesia, sponsored an ‘interlingua’ system for Asian languages. After more than a decade of effort, this system has not produced the desired results. (For surveys of MT research and development in the 1980s and early 1990s see Hutchins 1993, 1994.)

Our own experience with ‘untrained’ MT systems is that translators prefer to start ‘from scratch’ rather than post-edit an automated translation. However, even ‘untrained’ MT systems are helpful in many ways. For example, Alta Vista Babel Fish makes it possible to get an idea of what a website written in a language we do not know is about - we call this ‘gisting’.

Computer Assisted Translation (CAT)

Reverse the letters that make up the MT acronym and you get TM (Translation Memory), a CAT system that provides a middle ground between MT and unassisted translation. TM is embraced by most translators and translation services. For more on TM systems, please refer to http://www.intersolinc.com/terminology_management.htm

The basic difference between MT and TM systems is that TM systems are intended as an aid to the translator; they are not automated translation systems. Essentially, they allow translators to maintain a collection of terms and phrases that they have previously translated so they can re-use them. When a modified or new version of the SL text closely resembles a stored phrase, the TM system retrieves the stored translation, and the translator post-edits it until it is a 100% match with the new SL text. This process is called “fuzzy matching”. The following figure illustrates this process:

In this English to Spanish translation, using the TRADOS TM system, the phrase on the top row (with the MS Word icon) is the new text to be translated. The phrase in the bottom row, indicated by the US flag (for the English-US version), is the SL phrase stored in the TM database. The TL (Spanish) text directly underneath is the "fuzzy match" phrase that the translator will have to edit to get to a 100% match with the new SL text.
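The lookup step behind a fuzzy match can be approximated with Python's standard `difflib` module. This is a sketch under stated assumptions, not the algorithm TRADOS uses: the sample memory entries, the function name `best_match`, and the 0.7 threshold are all invented for the example, and commercial TM tools compute their match percentages in their own ways.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (SL segment, stored TL translation) pairs.
TRANSLATION_MEMORY = [
    ("Press the red button to start the machine.",
     "Presione el botón rojo para arrancar la máquina."),
    ("Turn off the machine before cleaning.",
     "Apague la máquina antes de limpiarla."),
]


def best_match(new_segment, memory=TRANSLATION_MEMORY, threshold=0.7):
    """Return (stored SL, stored TL, score) for the closest stored segment,
    or None if no stored segment reaches the threshold."""
    best_pair, best_score = None, 0.0
    for source, target in memory:
        score = SequenceMatcher(None, new_segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_pair, best_score = (source, target), score
    if best_pair is None or best_score < threshold:
        return None
    return best_pair[0], best_pair[1], best_score


match = best_match("Press the green button to start the machine.")
if match is not None:
    stored_source, stored_target, score = match
    print(f"{score:.0%} match: {stored_target}")
```

A score of 1.0 corresponds to a 100% match that needs no editing; anything between the threshold and 1.0 is a fuzzy match that the translator edits by hand, here by changing “rojo” to reflect “green”.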

TM has also been used as the ‘front-end’ system for MT systems. MT systems can be ‘trained’ using phrases taken from TM databases.

In conclusion, TM and MT systems are only as good as the data that is fed into them, and in both cases that data consists of words and phrases originally produced by human translators. As computing technology continues to advance, the synergy between the computer and the human translator will increase and the results will be mutually beneficial.
 

References

http://inventors.about.com/library/blcoindex.htm
http://www.foreignword.com/Technology/art/Hutchins/hutchins99_2.htm#hist
Different Approaches to Machine Translation, by Jaap Van Der Meer, published in MultiLingual Computing Technology, #71, Volume 16, Issue 3

 


Copyright 1996 - 2008 InterSol, Inc. All Rights Reserved