March 19, 2009
Machine Translation Playoffs
(MT = Machine Translation, not Montana.)
I just finished testing six MT products against each other. Skip to the bottom to see my conclusions, or just read along and learn about a very interesting (and soon to be hot) subject.
I’ve followed the progress of machine translation for 15 years; basically I’m fascinated by it. During that time I’ve owned, actively used, and kept up to date several desktop MT programs, and I’ve experimented with online versions as well. Many people like to laugh at MT, and some of the translations are pretty funny, or worse, mangled. Professional translators in particular enjoy deriding MT, for obvious reasons. But now, quietly, many of them are incorporating MT into their workflow.
Personally, even in the primitive versions years ago, I have found MT to be extremely useful in my work, like a super-dictionary. Paste in a block of text in the source language, it quickly spits out, at the very least, a big pile of useful words in the target language. For a person like myself with very weak Spanish, it gives a jump-start to anything I’m trying to write.
Well, folks, fast-forward to 2009. If you’re reading it here first, I’m pleased: MT is at a tipping point where the output is suddenly not just a pile of useful words, but something a person can basically read and understand. And for reasons I’m about to explain, it’s going to get much better, very quickly. Lots of smart people have been working on these systems in the background for years, while the world itself has been changing. Now something big has happened. I didn’t read about this; I can see it with my own eyes. Below are the results of the testing I did. But first, some background.
Very briefly, here’s what’s happened in MT. All early MT, up until maybe five years ago, was what they call rule-based. This was the only way to do it in those days. It means programming grammatical rules from both source and target languages into a computer, so that the computer can parse the source sentence into nouns, verbs, etc., and effectively guess appropriate replacement words or phrases, stringing them together into a grammatically correct target sentence. People worked very hard on these programs, and their output got better over the years.
More recently a completely different approach has become possible. Many academics in the field of computational linguistics saw this coming years ago: a statistical approach to MT that might, in theory, work much better than the rule-based approach. (Computational linguistics is the general field that includes voice recognition, text-to-speech, and of course translation.) Statistical MT works much like Google Search. It takes advantage of newly available cloud computing, along with newly available, voluminous parallel text (a.k.a. aligned bilingual material) in machine-readable form. A good example of parallel text is the United Nations website: massive, deep, easy for a computer to read, and every page perfectly translated by professional translators into several target languages. The idea is simple, the execution less so. But once the algorithms are worked out, you can do any language pair you want as long as you have a big enough corpus of parallel text. Without any knowledge of meaning, the computer looks for statistical correlations at the word, phrase, and sentence level, and after enough exposure to enough good human translations, it is able to guess (much of the time) a workable translation.
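To make "statistical correlations at the word level" concrete, here is a minimal sketch of one classic textbook technique, IBM Model 1 word alignment. It is an illustration only: I am not claiming it is what any of the products tested below actually run, and the three-sentence English-Spanish corpus and the function names are made up for the example. It also skips the usual NULL word and assumes every sentence is non-empty.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(f, e)] = P(target word f | source word e) with EM.

    `pairs` is a list of (source_words, target_words) tuples taken from
    aligned sentences. Simplified: no NULL word, no smoothing.
    """
    src_vocab = {e for src, _ in pairs for e in src}
    tgt_vocab = {f for _, tgt in pairs for f in tgt}

    # Start with no knowledge at all: every pairing is equally likely.
    uniform = 1.0 / len(tgt_vocab)
    t = {(f, e): uniform for f in tgt_vocab for e in src_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        for src, tgt in pairs:
            for f in tgt:
                # Share one unit of credit for f among the source words
                # in the same sentence, in proportion to the current t.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # Re-estimate the probabilities from the expected counts.
        t = {(f, e): count[(f, e)] / total[e] for (f, e) in t}
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus of aligned English/Spanish sentences.
corpus = [
    ("the house".split(), "la casa".split()),
    ("the table".split(), "la mesa".split()),
    ("a table".split(), "una mesa".split()),
]

table = train_ibm_model1(corpus)
for word in ("the", "house", "table", "a"):
    print(word, "->", best_translation(table, word))
```

Nothing in the code knows any grammar or any Spanish. It only notices that "house" keeps company with "casa" more reliably than with "la", and with more sentences those guesses sharpen.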
With time, the algorithms are improved and debugged. With time, the system gobbles more and more parallel text. Then, just as Google Search learns from human Web designers which pages are important, statistical machine translation learns from human translators which phrases sound best to humans.
This neatly steps around several insurmountable problems with the rule-based approach. For example, think how many meanings there are to the simple English word "set." Just knowing whether it’s in the position of a noun or a verb won’t give you a good translation. The statistical approach weighs the odds, without any intrinsic intelligence, based on whether words such as "tennis," "table," "hen," or "Boolean" appear nearby. No human judgment is involved, except at the level of writing the original learning algorithms. Think, too, about all of the weird expressions in any language. A good human translator knows what to do with "death warmed over." With any luck the algorithm will remember seeing this phrase in parallel text before, and simply plagiarize the human translator, rather than swapping in translations for "death" and "warm."
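The "set" example can be sketched in a few lines. This is a rough illustration using a naive Bayes-style score with add-one smoothing, which is my own choice of technique and not necessarily what any real MT system does. The training words are invented for the example, and the three Spanish renderings are only there to show the idea.

```python
import math
from collections import Counter

# Hypothetical, hand-made training data: words seen near "set" in
# parallel text, grouped by the Spanish word the human translator chose.
EXAMPLES = {
    "set": ["tennis", "match", "won", "tennis", "serve"],
    "poner": ["table", "dinner", "forks", "table", "plates"],
    "conjunto": ["boolean", "elements", "union", "empty", "boolean"],
}


def choose_translation(context, examples=EXAMPLES):
    """Pick the rendering of "set" whose past contexts best match `context`."""
    vocab = {w for words in examples.values() for w in words}
    best, best_score = None, -math.inf
    for translation, words in examples.items():
        counts = Counter(words)
        score = 0.0
        for w in context:
            # Add-one smoothing so unseen words do not zero out a candidate.
            score += math.log((counts[w.lower()] + 1) / (len(words) + len(vocab)))
        if score > best_score:
            best, best_score = translation, score
    return best


print(choose_translation("please set the table".split()))
print(choose_translation("she won the first set in tennis".split()))
print(choose_translation("a Boolean set".split()))
```

The function never looks at grammar. It just counts which neighbors showed up with which translation before, and bets accordingly.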
Long story short, I’m here to report that sometime in the last few years the statistical approach overtook the rule-based approach. And right now, statistics are leaving rules in the dust. A decade ago nobody knew for sure which method would prevail, but now the results speak for themselves. Rules still have their place (they can be included in the algorithms to help the computer correlate parallel text more accurately), but the heart of machine translation is now massive computational horsepower operating on massive volumes of (hopefully high-quality) parallel text. Added to this, all of the leaders in the field are designing in a collaborative component, meaning that the system can learn from its own mistakes by capturing the corrections made by human translators during post-editing. Anyway, suffice it to say there’s a lot going on.
The testing I did was for my own purposes, to see which MT tool I wanted to use for day-to-day chores like cranking out e-mails. I ran a passage from this blog, about 500 words, through each product and carefully compared the output for readability and usability. Here’s how I ranked them:
1st place: Google Translate
Output was far from perfect, but overall it felt cleanest and easiest to read. It had fewer mangled phrases, and occasionally it showed uncanny common sense. Nice interface. Not only that, Google cares a lot about MT and is in this race to win, so their translations are certain to improve quickly. MT is a perfect match for Google as a company; it involves everything they do best. They know that if they can achieve winner-take-all in translation, like they did with search, it will be huge.
Close 2nd place: Babylon/Language Weaver
Language Weaver is a much smaller, independent company 100% devoted to MT, founded by former academics in computational linguistics. Up until now its product has only been available to large corporations and the government. I looked into buying a license for myself; it would have cost $5000, plus $1000/year to maintain, and that’s unidirectional only. Recently, though, they quietly signed a contract to provide back-end translations for Babylon, an Israeli company with an affordable desktop product. Output was very impressive on my sample text: different from Google, sometimes better, sometimes worse.
3rd place: Windows Live Translator
Microsoft has always had a commitment to computational linguistics, and of course they’ve entered the MT race. It’s a bit more of a sideline for them, and they’re clearly playing catch-up. Output included some unforgivable mistakes, but at the same time, the occasional brilliant translation — almost spooky how human. It wasn’t as readable overall though, due to the bloopers. But just like Google and Language Weaver, it is likely to improve quickly. And true to their culture they’ve done some really beautiful work on the interface.
Distant last: Rule-based MT
I own and use (make that "did use") three rule-based products. Systran has been the perceived leader in MT for years, but in my test it did awfully. I’m not going to use it anymore unless I can’t get a connection (all three statistical products require that big server in the sky). Another product, LEC Translate 2005, cost me $200 when I bought it, and just never worked. After this latest test, I uninstalled it. The third product is called Power Translator Pro 7. It’s old: it was the reincarnation of Globalink, a very early MT program, and must be at least eight years old now. Oddly, it’s also the precursor to LEC Translate. I say oddly because, antiquated as it is, it had decent quality, probably even a little better than Systran. I’m going to keep it. It’s a good old horse, and has some nice dictionary tools.
Bear in mind this was a single test of 500 words, and just English-Spanish. But things are changing so fast that a person really should do their own tests once a year anyway, and see what works best for them. I just wrote an e-mail to my domain provider in Mexico. My method was to quickly dictate in English to Google Translate, click, then clean up the Spanish output using alternative suggestions from Windows Live, copy+paste. It worked great: at my level of proficiency I got the e-mail done painlessly, and much faster than if I had written in Spanish directly. As soon as I finish this post I intend to purchase Babylon to get access to Language Weaver. I’ll tell you how that works. Meanwhile, if you want to just have fun, go to Google Translate and treat yourself to a half-hour exploring the Chinese or Arabic web. Check out Google’s hover feature. Then if you’re still enjoying yourself, visit Windows Live Translator, and play with their Bilingual Viewer to see where the interfaces are going. The whole thing is fun to watch, and moving very quickly.
Filed by Pete under Knowledge management, Progress notes, Recommendations
2 Comments



Pete,
Thanks for an interesting post. I have referenced it on http://www.international-english.co.uk/mt-evaluation.html.
[...] translation, reporting the state of things back then, and testing six leading MT products ("MT playoffs, March 2009"). The results had Google in first place, with Language Weaver next, and Microsoft third. I [...]