Machine translation (MT) is automated translation. It is the process by which computer software is used to translate a text from one language to another.
In any translation, human or automated, the meaning of a text in the original (source) language must be fully conveyed in the target language. On the surface this seems straightforward, but it is far more complex: translation is not mere word-for-word substitution.
A translator must interpret and analyze all of the elements in the text and know how each word may influence another. This requires extensive expertise in grammar, syntax (sentence structure), semantics (meanings), etc., in the source and target languages, as well as familiarity with each local region.
Human and machine translation each have their share of challenges. For example, no two translators are likely to produce identical translations of the same text in the same language pair, and it may take several rounds of revision to satisfy the customer. But the greater challenge lies in how machine translation can produce publishable-quality translations.
Rule-based machine translation relies on countless built-in linguistic rules and on bilingual dictionaries containing millions of entries for each language pair.
The software parses text and creates a transitional representation from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules. The software uses these complex rule sets and then transfers the grammatical structure of the source language into the target language.
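The parse, transfer, and generate steps described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the four-word English-to-Spanish lexicon, the single reordering rule, and the function names `analyze`, `transfer`, and `generate`. A real rule-based system uses far richer lexicons and rule sets.

```python
# Toy lexicon (hypothetical): source word -> (part of speech, target word).
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "big": ("ADJ", "grande"),
    "car": ("NOUN", "coche"),
}

def analyze(sentence):
    """Parse step: tag each source word, giving a simple intermediate representation."""
    # Words missing from the toy lexicon raise KeyError; a real system would handle them.
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]

def transfer(tagged):
    """Transfer step: apply one structural rule, English ADJ NOUN -> Spanish NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out

def generate(tagged):
    """Generation step: emit the target-language word for each item."""
    return " ".join(LEXICON[word][1] for word, _ in tagged)

def translate(sentence):
    return generate(transfer(analyze(sentence)))

print(translate("the red car"))  # el coche rojo
```

Even this toy shows why quality depends on the size of the lexicon and the rule set: every word and every structural difference between the two languages has to be covered explicitly.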
Rule-based machine translation systems are built on gigantic dictionaries and sophisticated linguistic rules. Users can improve translation quality by adding their own terminology to the translation process. They create user-defined dictionaries that override the system's default settings.
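As a rough sketch of how a user-defined dictionary can take precedence over the system defaults, the example below layers two dictionaries with Python's `collections.ChainMap`. The dictionary contents and the `translate_term` helper are hypothetical and do not reflect any particular product's format.

```python
from collections import ChainMap

# Hypothetical system defaults: "file" is translated as the hand tool.
DEFAULT_DICTIONARY = {"mouse": "ratón", "window": "ventana", "file": "lima"}

# Hypothetical user dictionary for a software domain: "file" means a computer file.
USER_DICTIONARY = {"file": "archivo"}

# ChainMap searches its mappings in order, so user entries win over defaults.
lookup = ChainMap(USER_DICTIONARY, DEFAULT_DICTIONARY)

def translate_term(term):
    """Return the preferred translation, or the term unchanged if it is unknown."""
    return lookup.get(term, term)

print(translate_term("file"))    # archivo (user entry overrides the default)
print(translate_term("window"))  # ventana (falls back to the default)
```

Keeping the user entries in a separate layer means the defaults stay untouched and the customization can be removed or swapped per project.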
In most cases, there are two steps: an initial investment that significantly increases the quality at a limited cost, and an ongoing investment to increase quality incrementally. While rule-based MT brings companies to a reasonable quality threshold, the quality improvement process is generally long and expensive.
This has been a contributing factor to the slow adoption and usage of MT in the localization industry. Surely, there must be a better approach!
Statistical machine translation uses statistical translation models generated from the analysis of monolingual and bilingual content. Essentially, this approach uses computing power to build sophisticated data models that translate from a source language into a target language. This makes statistical MT a far simpler proposition to use, and it has been a significant factor in the broader adoption of statistical machine translation technology in the localization industry.
Building statistical translation models is a quick process, and the technology relies on existing multilingual corpora (which in most cases are existing translation memories). Most professional translators have these freely available.
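To make the idea of learning a translation model from a bilingual corpus concrete, here is a minimal sketch that estimates word translation probabilities with expectation-maximization in the style of IBM Model 1. This is one classic technique chosen for illustration, not necessarily what any given statistical MT product uses. The three-sentence German-English corpus and the function names are assumptions, and the sketch omits the NULL word and everything else a production system would add (phrase tables, a language model, a decoder).

```python
from collections import defaultdict

def train_word_model(pairs, iterations=20):
    """Estimate t(target_word | source_word) from sentence pairs via EM (IBM Model 1 style)."""
    target_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words.
    prob = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) alignments
        total = defaultdict(float)  # expected count of each source word
        for src, tgt in pairs:
            for t in tgt:
                # E-step: share each target word among the source words in the pair.
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # M-step: re-estimate probabilities from the expected counts.
        for (t, s), c in count.items():
            prob[(t, s)] = c / total[s]
    return dict(prob)

def best_translation(model, source_word):
    """Pick the most probable target word for a source word."""
    candidates = {t: p for (t, s), p in model.items() if s == source_word}
    return max(candidates, key=candidates.get)

# Hypothetical toy corpus: (source sentence, target sentence), split into words.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
model = train_word_model(corpus)
print(best_translation(model, "haus"))  # house
print(best_translation(model, "buch"))  # book
```

No dictionary or grammar rule is written by hand here: the word correspondences emerge from co-occurrence statistics alone, which is why the size and quality of the corpus matter so much.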
While a minimum of 2 million words for a specific domain is typically required, it is theoretically possible to reach an acceptable quality threshold with far fewer. Additionally, statistical machine translation is CPU-intensive and requires an extensive hardware configuration to run translation models at acceptable performance levels.
Rule-based MT is by nature predictable. Dictionary-based customization can improve quality and compliance with corporate terminology. But translation results may lack the fluency readers expect. In terms of investment, the customization cycle needed to reach the quality threshold can be long and costly.
Statistical MT provides good quality when qualified corpora are available. The translation is fluent, meaning it reads well and therefore meets user expectations. Training from good corpora is automated and cheaper. However, statistical MT requires significant hardware to build and manage large translation models.
Statistical machine translation technology is growing in acceptance and is by far the clear leader of the two technologies.
While statistical MT requires significant computer processing power to build high-quality translation models, the increasing availability of cloud-based computing makes this technology a game changer for the localization industry.
Statistical MT is the next TM!
Tony O'Dowd - Evangelist MT