In my article Translation 33, I attempted a rough assessment of the efficiency of free online translation software offered by Google, Microsoft (BING), and the venerable Yahoo Babel Fish.
In that test, both Google and Microsoft proved competent at translating French and Spanish into English, at least at this general level. My stated next step was to check online translation from languages with a different script and/or syntax, taking Russian as an example of a different script (Cyrillic) and Hindi as an example of both a different script (Devanagari) and a different syntax. That is what this new article attempts, using short extracts from the Russian and Hindi Wikipedias.
One preliminary and very topical point: Yahoo’s Babel Fish cannot be included here because, as of 30 May 2012, it has been either subsumed into or replaced by BING Translator, as the following recent note from Microsoft indicates:
“We are pleased to welcome Yahoo! Babel Fish users to the Bing Translator family. We have been working closely with our friends at Yahoo! to make this an easy transition, and Bing Translator is a natural upgrade to the experience with Yahoo! Babel Fish. We support all the languages you used with Babel Fish, and provide a superset of all the features.”
Let us now look at the Russian to English situation. The piece chosen is an extract of 172 words from the Russian Wikipedia article on the Indian writer, intellectual and activist Arundhati Roy (Section: Политическая деятельность, Politicheskaya dyeyatyel’nost’, “Political activities”). The style is simple. Here are the two translations for comparison:
Google Translate: (http://translate.google.com)
Subsequently, Arundhati Roy has used his celebrity to draw public attention to important political issues. In a number of essays and speeches, it is opposed to nuclear weapons in India and neighboring Pakistan, as well as against Indian nationalism [citation needed 537 days]. She also took part in protest actions against the dam project on the Narmada River, as such projects are usually at the expense of the earth’s poorest and marginalized populations. Due to its popularity, Roy was able to attract the attention of national and international media to these problems.
Arundhati Roy’s literary activity is completely focused on illumination and critique of political and social themes. She opposed the so-called “war on terrorism” launched by the United States against the war in Iraq and against the policy of the World Bank and WTO. With its rigid stand it has become one of the best known environmental activists, anti-globalization movement and the peace.
In 2002 the High Court in Delhi has sentenced her to prison because she blamed the judges that they wanted to suppress the protests against the construction of a dam on the Narmada River. However, the symbolic conclusion was only one day. (196 words)
The Microsoft BING version: (http://www.microsofttranslator.com)
Subsequently, Arundhati Roy had used their celebrity to draw public attention to important political problems. In a number of essays and speeches, she spoke out against nuclear weapons for India and neighbouring Pakistan, as well as against Indian nationalism. She also participated in the protest activities against the dam project on the River Narmade, as such projects are carried out, usually at the expense of the poor and disenfranchised. Due to its popularity Roy was able to draw the attention of the national and international MEDIA.
Arundhati Roy completely literary activity focused on reporting and criticism of the political and social order. She opposed the so-called “war on terrorism” launched by the United States against the war in Iraq, as well as against the policies of the World Bank and the WTO. Thanks to its unwavering position it has become one of the most prominent environmental activists, peace and antiglobalizacionnogo movements.
In 2002, the Supreme Court in New Delhi sentenced her to jail because she accused judges that those would like to suppress protests against the construction of a dam on the River Narmade. However, the symbolic conclusion was only one day. (197 words)
As with the previous passages from French and Spanish, both Google and Microsoft convey an English version which makes good general sense, without reaching the standard of a professional translation. In spite of obvious flaws and errors, they are both useful to readers who do not speak Russian. More detailed examination of the translations and the Wikipedia originals will be of particular interest to professional translators and experts in Machine Translation (MT).
It would therefore seem reasonable to hypothesise that the Cyrillic script is not a special obstacle to “Western” Machine translation – just as it is reasonable to state that in learning Russian as a second language the Russian script does not present major problems in reading or writing, at least for Europeans.
Now we can move on to present for examination and comparison the performances by Google and Microsoft with an extract from Hindi Wikipedia (201 words; 10 sentences) on the same writer and activist (Section: kraaNtikaaree vichaar. Revolutionary idea). Warning: Readers may be bemused or irritated by these results.
Google Translate:
From U.S. imperialism, nuclear arms race, building dams on the Narmada, etc. Many
Local – international issues have been a voice against Arundhati Roy. But now
He believes that at least non-violent protest and civil disobedience in India
Movements is not the point.
Parliamentary system, part of the Communists and who believe in violent resistance
Arundhati admits that he caught in the Maoist ideology of Gandhi fan
Are not. In his words: – “After all, Gandhi was a superstar. When they are hungry – used to strike the
The hunger – strike were sitting on the superstar. But I do not believe in superstar politics.
If the people of the slums, hunger – strike that, no it does not care. ”
Arundhati Bajharwad that go on flowing in the flow of voices of protest in India
Being ignored. Against anti-people system, judiciary and media
Efforts have failed to impress. He said, “I understand that for us
Is very important to consider where we are right, wrong and where. We have given the arguments
They are right … But the violence did not work. ” (184 words)
The Microsoft BING version:
US imperialism from the nuclear arms race, the Narmada dam, construction etc
Local-international issues that Arundhati Roy-voice ascendant are. But now
He believes in non-violent protests at India and civil disobedience
Movements not talk.
Became part of the parliamentary system and violent resistance in the count Communists
The Maoists of ideologies in which Arundhati admits that Gandhi’s andhabhakt
Are not. In their words-“end Gandhi a superstar. The hunger-strike, so they
The hunger-strike on superstars. But I do not mind you in the superstar.
If the people of a slum, a hunger-strike that it doesn’t care. ”
Arundhati believes that going away bazarvad flow-down of vowels in India
Unheard. janvirodhi system-the judiciary and media
Efforts have failed to impress. He said, “I think for us
It is important to consider where we are great, and where the wrong right. We gave arguments
They are right … But nonviolence is not effective.” (150 words)
These unsatisfactory performances (which, in my experience, are neither uncommon nor unrepresentative) clearly need much more attention and comment than the Russian translations above, or the French and Spanish ones. For Machine Translation, there is much more work to be done before satisfactory translations from Hindi to English (and some other languages) can be achieved.
From a reading of the English and without any reference to the original, the best that can be said of the translations is that they give glimpses of the subject material but they are not very useful. One can also see that the syntax is disjointed, many sentences are incomplete, and some references are inaccurate. In both Google and Microsoft versions all lines begin with a capital letter (which suggests a new sentence is beginning). From a comparison with the original one may add that the translations also offer some false information or impressions, as well as obvious problems with vocabulary identification and pronoun gender.
The reason why the Google and Microsoft translation systems have not yet been able to cope more satisfactorily with Hindi (and presumably with a number of other languages) is that they still have basic problems in identifying the complicated script, the very “different” syntax of Hindi and even the organisation of print, sentences and paragraphs. First of all, Hindi does not use upper case letters (or italic or bold distinctions). Secondly, the main punctuation mark is a vertical bar, which serves as a full stop. Commas are used, but often sparsely. The inability to deal with these characteristics must surely contribute to the peculiar look of the translations above, with a capital letter at the beginning of each line.
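The point about the vertical bar can be made concrete with a small sketch. The following Python (3.7 or later) is only an illustration under stated assumptions, not a description of how Google or Microsoft actually pre-process Hindi. It joins hard-wrapped lines and then splits the text on the Devanagari danda (Unicode U+0964) rather than on line breaks or Western full stops. The function name split_hindi_sentences is hypothetical, and the sample text is simply a fragment of the first sentence of the Hindi extract, wrapped and repeated for the demonstration.

```python
import re
from typing import List

DANDA = "\u0964"         # DEVANAGARI DANDA, the Hindi full stop
DOUBLE_DANDA = "\u0965"  # DEVANAGARI DOUBLE DANDA, used mainly in verse


def split_hindi_sentences(text: str) -> List[str]:
    """Split Hindi text into sentences (hypothetical helper, illustration only)."""
    # Collapse line breaks and repeated spaces: a line break in wrapped
    # text is a layout accident, not a sentence boundary.
    flat = " ".join(text.split())
    # Split just after a danda, double danda, question mark or exclamation mark.
    boundary = f"(?<=[{DANDA}{DOUBLE_DANDA}?!])\\s*"
    return [part for part in re.split(boundary, flat) if part]


# Sample built from a fragment of the article's Hindi extract, wrapped
# across lines and repeated; it is an assumption made for the demonstration.
sample = (
    "के ख़िलाफ़ आवाज़ बुलंद करती रही हैं\n"
    "अरुंधति राय। के ख़िलाफ़ आवाज़ बुलंद\n"
    "करती रही हैं अरुंधति राय।"
)

for sentence in split_hindi_sentences(sample):
    print(sentence)
```

A text handled this way reaches the translator as whole sentences, each ending in a danda, instead of as a series of short lines that each look like a new sentence.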
Finally, let us look at the first sentence of the Hindi Wikipedia original (in transliterated form) to get a further glimpse of what can go wrong.
Amreekee saMraajyavaad se lekar, parmaanu hathiyaaroN ki hor, Narmada par baaNdh nirmaan aadi kaee sthaaneeye – antarrashtreeya mudhoN ke khilaaf avaaz bulaNd kartee rahee haiN arundhati raay.
(my rough translation:)
From American imperialism, the nuclear arms race, to the construction of the Narmada Dam, etc., Arundhati Roy is raising her voice loudly on many local and international issues.
In the Hindi word order, a list of nominal groups is followed by “etc.” and then (literally) “several local-international issues against” (an example of the numerous Hindi “postpositions”, which are very basic and frequent sentence elements) and, finally, the sentence’s Verb and Subject (Arundhati Roy). Very different from: “From U.S. imperialism, nuclear arms race, building dams on the Narmada, etc. Many Local – international issues have been a voice against Arundhati Roy.” and “US imperialism from the nuclear arms race, the Narmada dam, construction etc Local-international issues that Arundhati Roy-voice ascendant are.”
I gave both systems a second chance by submitting the last part of that first sentence on its own. Without the cumbersome word order, Google did better but BING did not.
के ख़िलाफ़ आवाज़ बुलंद करती रही हैं अरुंधति राय
ke khilaaf aavaaz bulaNd kartee rahee haiN aruNdhati raay [roy]
Google: Arundhati Roy has been a voice against
BING: Is Arundhati Roy of that lofty-sounds
We must be grateful to Google and Microsoft for their valuable work on Hindi but we must also hope that the massive problems, briefly signposted in the above exercise, can be solved in the not too distant future. And similarly for other problem languages.
The next logical step would be to examine the quality of Google and BING translation from English into other languages. I will do my best at a later date, using the same four languages.
Au revoir. Hasta luego. Do sveedanya. Phir milenge.