
Translation 36. Free Internet Translation Software: The Contest between Google Translate and Microsoft’s BING Translator. Russian and Hindi

13 June 2012

In my article Translation 33, I attempted a rough assessment of the efficiency of free online translation software offered by Google, Microsoft (BING), and the venerable Yahoo Babel Fish.
In this test, both Google and Microsoft proved competent at translating French and Spanish into English (at this general level). My stated next step was to check the online translation of languages with different scripts and/or syntax by looking at Russian (as an example of a different script, Cyrillic) and Hindi (different in both script, Devanagari, and syntax). This is what is attempted in this new article, using short extracts from the Russian and Hindi Wikipedias.

A preliminary and very topical comment: further reference to Yahoo’s Babel Fish will not be possible here because, as of 30 May 2012, Yahoo Babel Fish has been either subsumed into or replaced by BING Translator, as indicated in the following recent note from Microsoft:
“We are pleased to welcome Yahoo! Babel Fish users to the Bing Translator family. We have been working closely with our friends at Yahoo! to make this an easy transition, and Bing Translator is a natural upgrade to the experience with Yahoo! Babel Fish. We support all the languages you used with Babel Fish, and provide a superset of all the features.”

Let us now look at the Russian to English situation. The piece chosen is an extract of 172 words from the Russian Wikipedia article on the Indian writer, intellectual and activist, Arundhati Roy (Section: Политическая деятельность (Politicheskaya dyeyatyel’nost’), Political activities). The style is simple. Here are the two translations for comparison:

Google Translate: (http://translate.google.com)
Subsequently, Arundhati Roy has used his celebrity to draw public attention to important political issues. In a number of essays and speeches, it is opposed to nuclear weapons in India and neighboring Pakistan, as well as against Indian nationalism [citation needed 537 days]. She also took part in protest actions against the dam project on the Narmada River, as such projects are usually at the expense of the earth’s poorest and marginalized populations. Due to its popularity, Roy was able to attract the attention of national and international media to these problems.

Arundhati Roy’s literary activity is completely focused on illumination and critique of political and social themes. She opposed the so-called “war on terrorism” launched by the United States against the war in Iraq and against the policy of the World Bank and WTO. With its rigid stand it has become one of the best known environmental activists, anti-globalization movement and the peace.

In 2002 the High Court in Delhi has sentenced her to prison because she blamed the judges that they wanted to suppress the protests against the construction of a dam on the Narmada River. However, the symbolic conclusion was only one day. (196 words)

The Microsoft BING version: (http://www.microsofttranslator.com)

Subsequently, Arundhati Roy had used their celebrity to draw public attention to important political problems. In a number of essays and speeches, she spoke out against nuclear weapons for India and neighbouring Pakistan, as well as against Indian nationalism. She also participated in the protest activities against the dam project on the River Narmade, as such projects are carried out, usually at the expense of the poor and disenfranchised. Due to its popularity Roy was able to draw the attention of the national and international MEDIA.

Arundhati Roy completely literary activity focused on reporting and criticism of the political and social order. She opposed the so-called “war on terrorism” launched by the United States against the war in Iraq, as well as against the policies of the World Bank and the WTO. Thanks to its unwavering position it has become one of the most prominent environmental activists, peace and antiglobalizacionnogo movements.

In 2002, the Supreme Court in New Delhi sentenced her to jail because she accused judges that those would like to suppress protests against the construction of a dam on the River Narmade. However, the symbolic conclusion was only one day. (197 words)

As with the previous passages from French and Spanish, both Google and Microsoft convey an English version which makes good general sense, without reaching the standard of a professional translation. In spite of obvious flaws and errors, they are both useful to readers who do not speak Russian. More detailed examination of the translations and the Wikipedia originals will be of particular interest to professional translators and experts in Machine Translation (MT).

It would therefore seem reasonable to hypothesise that the Cyrillic script is not a special obstacle to “Western” Machine translation – just as it is reasonable to state that in learning Russian as a second language the Russian script does not present major problems in reading or writing, at least for Europeans.

Now we can move on to present for examination and comparison the performances by Google and Microsoft with an extract from Hindi Wikipedia (201 words; 10 sentences) on the same writer and activist (Section: kraaNtikaaree vichaar. Revolutionary idea). Warning: Readers may be bemused or irritated by these results.

Google

From U.S. imperialism, nuclear arms race, building dams on the Narmada, etc. Many
Local – international issues have been a voice against Arundhati Roy. But now
He believes that at least non-violent protest and civil disobedience in India
Movements is not the point.
Parliamentary system, part of the Communists and who believe in violent resistance
Arundhati admits that he caught in the Maoist ideology of Gandhi fan
Are not. In his words: – “After all, Gandhi was a superstar. When they are hungry – used to strike the
The hunger – strike were sitting on the superstar. But I do not believe in superstar politics.
If the people of the slums, hunger – strike that, no it does not care. ”

Arundhati Bajharwad that go on flowing in the flow of voices of protest in India
Being ignored. Against anti-people system, judiciary and media
Efforts have failed to impress. He said, “I understand that for us
Is very important to consider where we are right, wrong and where. We have given the arguments
They are right … But the violence did not work. ” (184 words)

Microsoft BING

US imperialism from the nuclear arms race, the Narmada dam, construction etc

Local-international issues that Arundhati Roy-voice ascendant are. But now
He believes in non-violent protests at India and civil disobedience
Movements not talk.
Became part of the parliamentary system and violent resistance in the count Communists
The Maoists of ideologies in which Arundhati admits that Gandhi’s andhabhakt
Are not. In their words-“end Gandhi a superstar. The hunger-strike, so they
The hunger-strike on superstars. But I do not mind you in the superstar.
If the people of a slum, a hunger-strike that it doesn’t care. ”
Arundhati believes that going away bazarvad flow-down of vowels in India
Unheard. janvirodhi system-the judiciary and media
Efforts have failed to impress. He said, “I think for us
It is important to consider where we are great, and where the wrong right. We gave arguments
They are right … But nonviolence is not effective.” (150 words)

These unsatisfactory performances (which, in my experience, are neither uncommon nor unrepresentative) clearly need much more attention and comment than the Russian translations above, or the French and Spanish ones. For Machine Translation, there is much more work to be done before satisfactory translations from Hindi to English (and some other languages) can be achieved.

From a reading of the English and without any reference to the original, the best that can be said of the translations is that they give glimpses of the subject material but they are not very useful. One can also see that the syntax is disjointed, many sentences are incomplete, and some references are inaccurate. In both Google and Microsoft versions all lines begin with a capital letter (which suggests a new sentence is beginning). From a comparison with the original one may add that the translations also offer some false information or impressions, as well as obvious problems with vocabulary identification and pronoun gender.

The reason why the Google and Microsoft translation systems have not yet been able to cope more satisfactorily with Hindi (and presumably with a number of other languages) is that they still have basic problems in identifying the complicated script, the very “different” syntax of Hindi and even the organisation of print, sentences and paragraphs. First of all, Hindi does not use upper case letters (nor italic or bold distinctions). Secondly, the main punctuation mark is a vertical bar, which serves as a full stop. Commas are used, but often sparsely. The inability to deal with these characteristics must surely contribute to the peculiar look of the translations above, with initial capital letters at the beginning of each line.
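The punctuation point can be illustrated with a minimal Python sketch that treats the vertical bar (the danda, ।, Unicode U+0964) as the sentence boundary and ignores line breaks. This is only an illustration under stated assumptions: the function name split_hindi_sentences and the two-sentence sample are invented for the example, and nothing here describes how Google or BING actually segment Hindi text.

```python
import re

DANDA = "\u0964"  # Devanagari danda, used as the Hindi full stop

def split_hindi_sentences(text):
    """Split text into sentences on the danda, '?' or '!' (hypothetical helper)."""
    # Join hard line breaks first, so that the end of a printed line
    # is not mistaken for the end of a sentence.
    joined = " ".join(text.split())
    parts = re.split("[" + DANDA + "?!]", joined)
    return [p.strip() for p in parts if p.strip()]

# Invented sample: two short sentences, the second broken across lines.
sample = "राम घर गया।\nसीता बाजार\nगई।"
for sentence in split_hindi_sentences(sample):
    print(sentence)
```

A splitter of this kind returns two sentences for the sample, whereas treating every line break as a boundary would return three fragments, which is the sort of effect visible in the line-by-line output above.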

Finally, let us look at the first sentence of the Hindi Wikipedia original (in transliterated form) to get a further glimpse of what can go wrong.

Amreekee saMraajyavaad se lekar, parmaanu hathiyaaroN ki hor, Narmada par baaNdh nirmaan aadi kaee sthaaneeye – antarrashtreeya mudhoN ke khilaaf avaaz bulaNd kartee rahee haiN arundhati raay. 

(my rough translation:)
From American imperialism, the nuclear arms race, to the construction of the Narmada Dam, etc., Arundhati Roy is raising her voice loudly on many local and international issues.

In the Hindi word order, a list of nominal groups is followed by “etc.” and then (literally) “several local-international issues against” (an example of the numerous Hindi “postpositions”, which are very basic and frequent sentence elements) and, finally, the sentence’s Verb and Subject (Arundhati Roy). This is very different from: “From U.S. imperialism, nuclear arms race, building dams on the Narmada, etc. Many Local – international issues have been a voice against Arundhati Roy.” and “US imperialism from the nuclear arms race, the Narmada dam, construction etc Local-international issues that Arundhati Roy-voice ascendant are.”

I gave both systems a second chance by submitting the last part of that first sentence on its own. Without the cumbersome word order, Google did better but BING did not.

के ख़िलाफ़ आवाज़ बुलंद करती रही हैं अरुंधति राय
ke khilaaf aavaaz bulaNd kartee rahee haiN aruNdhati raay [roy]

Google: Arundhati Roy has been a voice against
BING: Is Arundhati Roy of that lofty-sounds
*

We must be grateful to Google and Microsoft for their valuable work on Hindi but we must also hope that the massive problems, briefly signposted in the above exercise, can be solved in the not too distant future. And similarly for other problem languages.

The next logical step would be to examine the quality of Google and BING translation from English into other languages. I will do my best at a later date, using the same four languages.

Au revoir. Hasta luego. Do sveedanya. Phir milenge.


Favourites on this Blog – for Holiday Reading

27 December 2011

Of the one hundred and eleven blogs posted here since 2008, these are the 16 that have attracted most attention. Unlike other more ephemeral blogs, the subject matter seems to remain of interest.

With my good wishes for the New Year.

General
1. ‘New Hope for Disempowered Women’

2. ‘The Fragmentation of Information in Wikipedia’

3. ‘Please dress up the Em dash’

4. ‘Global warming debate. 1’

5. ‘Global warming debate. 2’
‘Global Warming Controversy. Part 2. Global Warming Scepticism: Some Basic Data & Chronological Notes’

6. ‘Julia Owen and bee stings in Bromley’

7. ‘Julia Owen, Retinitis Pigmentosa, and the Media. Part 1’
(Part 2 will follow in the New Year.)

Languages

1. Of 33 offerings on Translation and Interpreting topics, this item has captured most attention:
‘Translation 8. Fluency in foreign languages. The case of Dr Condoleezza Rice’
(See also ‘Translation. 30’.)

2. ‘Translation 32. David Bellos’s Revealing Book on Translation and the Meaning of Everything’

3. ‘Spanish Pronunciation in the Media’

Spain

1. ‘The European Union’s verdict on the Franco Régime in Spain (1939-1975)’

2. ‘Justo Gallego – the lone twentieth century Cathedral Builder’

India

1. ‘Contemporary India. 1. Basic Sources of Information’

2. ‘A Visit to Sathya Sai Baba’s ashram in October 2008’

3. ‘Sathya Sai Baba: Questionable Stories and Claims. Part 1’

4. ‘Fuzzy Dates in the Official Biography of Sathya Sai Baba. A Re-examination’

Translation 33. Free Internet Translation Software: The Contest between Google Translate and Microsoft’s BING Translator

24 November 2011

Machine Translation (MT) software comes in many forms and in two specific categories: commercial, and free of charge. At the top end of the commercial offerings are sophisticated and expensive software tools used by professional freelance translators and translation companies in order to ease and speed up their laborious tasks. The name TRADOS is one of the most used. It offers packages costing from 600 to 2,500 Euros. At the lower commercial level there are many products costing between $60 and $120 for help with translating between the major European languages, or at least between English and those languages. For free Internet translation services, the current leader is Google Translate, closely followed by its recent challenger Microsoft’s BING Translator. Both produce fast but basic translations of all sorts of Internet material, in a very wide range of languages and language pairs. For a number of years, the earlier Internet leader was Altavista’s Babelfish. Under the Yahoo label, this free programme is still available and widely used but with the two younger competitors making fast progress with their more effective MT formulas, it is showing its age.

As a preliminary sample of MT, take the following absurdly easy test used by blog researchers comparing and rating ten budget software translation packages. From this site, the references lead to this basic test, from Spanish to English.

“Abuela, ¿por qué tienes los ojos tan grandes?” Caperucita Roja preguntó. “Para que yo pueda ver mejor,” Dijo la abuela. “¡Oh, abuelita, ¿por qué tienes la boca tan grande?” “Para poder comerte mejor!” Entonces, la abuela salta de la cama.

They offer the following as a “Correct Translation” against which to compare the ten commercial contenders:
“Grandma, why do you have such big eyes?” Little Red Riding Hood asked. “So that I can see better.” the grandma said. “Oh, Grandma, why do you have such a big mouth?” “So I can eat better!” Then, the grandma jumps out of the bed.

For a description of the three major free Internet MT systems listed above and a judgement on their relative qualities, see John Yunker’s articles on the work of Ethan Shen, starting with this one and following the links. (Shen pronounces Google Translate the overall winner.)

Another strong recommendation of Google’s quality and breadth of coverage, as well as a clear explanation of the Google method, is to be found in Chapter 23 of David Bellos’s recent wide-ranging book on Translation, Is That a Fish in Your Ear? – by now a runaway bestseller.

The chapter offers a potted history of MT and expresses Bellos’s very positive view of the advances in MT achieved by Google, emphasising its novel approach to the task of MT. In a recent article, Bellos offers an edited version of pages 263-266 of that chapter (‘The Adventure of Automated Language Translation Machines’) in which, in characteristic manner, he succinctly explains the complex Google system to us:

“Using software originally developed in the 1980s by researchers at IBM, Google has created an automatic translation tool that is unlike all others. It is not based on the intellectual presuppositions of early machine translation efforts – it isn’t an algorithm designed only to extract the meaning of an expression from its syntax and vocabulary.
“In fact, at bottom, it doesn’t deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.
“It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation.
“The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.
“Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what’s been submitted to it.”
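The statistical idea in the last paragraph of this quotation can be sketched in a few lines of Python. This is a toy illustration only, not Google’s system: the miniature paired corpus and the names PAIRED_CORPUS and most_probable_translation are invented for the example, and a real system works with vastly larger corpora and far more elaborate models.

```python
from collections import Counter

# Hypothetical miniature set of "paired documents":
# (source phrase, translation found alongside it).
PAIRED_CORPUS = [
    ("crise financière", "financial crisis"),
    ("crise financière", "financial crisis"),
    ("crise financière", "financial crash"),
    ("bulle de prix", "price bubble"),
    ("bulle de prix", "bubble of awards"),
    ("bulle de prix", "price bubble"),
]

def most_probable_translation(phrase, corpus):
    """Return (translation, relative frequency) for the most frequent
    pairing of `phrase`, or None if the phrase was never seen."""
    counts = Counter(target for source, target in corpus if source == phrase)
    if not counts:
        return None
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())

print(most_probable_translation("bulle de prix", PAIRED_CORPUS))
print(most_probable_translation("dette publique", PAIRED_CORPUS))
```

The point of the sketch is that nothing in it analyses meaning or grammar: the choice rests entirely on how often a pairing has been seen before, and a phrase that has never been seen yields nothing at all.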

Although he admits that Google Translate results are not always satisfactory, Bellos forecasts a rosy future for MT and for Google in particular as it improves and adds to its fabulous corpora in 58 languages.

To give an idea of the standard of translation achieved by Google, and to give a glimpse of what Professor Bellos’s enthusiasm is founded on, I propose to offer and examine samples of translations into English from four languages. The additional factor is that BING (which offers 2-way translations to and from 37 languages as compared with the 58 Google pairs cited by Professor Bellos) will be subjected to the same tests, as evidence of this battle of the Free to Ether Translation Titans. (Results from Yahoo’s Babelfish are offered at the end of the piece.)

Firstly (in the current article) I present and compare translations from French and Spanish into English. In a later blog article I hope to offer similar material from Russian and Hindi (probably transliterated to fit in the WordPress system). From these disparate examples, we may be able to discern the strengths of the two software programmes and some of the problems which still remain to be overcome in the search for workable and useful translations into and out of all printed languages.

By way of Prologue to the proposed comparisons, if we try the ‘Little Red Riding Hood’ test sample on Google and BING, we get the following results.

Google Translate:
“Grandma, why are your eyes so big?” Little Red Riding Hood said. “So that I can see better,” said the grandmother. “Oh, Grandma, why your mouth is so big?” “To eat better!” Then the grandmother jumps out of bed.

There are two unsatisfactory translations here:
why your mouth is so big?” “To eat better!”

BING Translator:
“Grandmother, why you have such large eyes?” Little Red Riding Hood asked. “So that I can see better,” said the grandmother. “Oh, grandmother, why have the big mouth?!” “To be able to eat better!” Then Grandma jumps out of bed.

Again, two unsatisfactory translations, and for the same segment:
why have the big mouth?!” and “To be able to eat better!”

Both Google and BING completely miss the agglutinated Spanish pronoun in “comer” + “-te” (“to eat YOU better”), but, IMHO, Google is marginally in front of BING in the second listed infelicity.

Now let us move on to a more challenging test of MT ability. For this I have chosen short segments from French and Spanish Wikipedia on a topic of recent interest.

1.
French Wikipedia: ‘Crise financière mondiale débutant en 2007’
“La crise financière mondiale qui a commencé en 2007 est une crise financière marquée par une crise de liquidité et parfois par des crise [sic: = crises] de solvabilité tant au niveau des banques que des Etats, et une raréfaction du crédit au niveau des entreprises. Amorcée en juillet 2007, elle trouve son origine dans le dégonflement de bulles de prix (dont la bulle immobilière américaine des années 2000) et les pertes importantes des établissements financiers provoquées par la crise des subprimes. C’est la crise la plus grave de l’histoire des bourses de valeurs, après celle de 1873, découlant de la crise bancaire de mai 1873.
“La crise financière de l’automne 2008 amplifie le mouvement et provoque une chute des cours des marchés boursiers et la faillite de plusieurs établissements financiers. Pour éviter une crise systémique, les Etats doivent intervenir et sauver de nombreuses banques ce qui provoquera une crise de la dette publique en Islande d’abord puis en Irlande. Par ailleurs, elle provoque une récession touchant l’ensemble de la planète. Les finances publiques ont été lourdement sollicitées pour résoudre cette crise. Le déficit public s’est creusé dans de nombreux pays, après un recul du produit intérieur brut mondial de 2,2% en 2009.”
(200 words)

(Points for attention are italicised.)

Google:
“The global financial crisis that began in 2007 is a financial crisis marked by a liquidity crisis and sometimes by solvency crisis at both banks of the States, and a credit crunch at the company level. Began in July 2007, it has its origins in the bursting of price bubbles (including the U.S. housing bubble of the 2000s) and losses of financial institutions caused by the subprime crisis. This is the worst crisis in the history of stock exchanges, after that of 1873, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a fall in equity markets and the collapse of several financial institutions. To avoid a systemic crisis, governments must step in and save many banks which will cause a crisis of public debt in Iceland first and then in Ireland. Moreover, it causes a recession in the entire world. Public finances were heavily used to solve this crisis. The deficit has widened in many countries, after a decline in global GDP by 2.2% in 2009.” (177 words)

Microsoft:

“The global financial crisis that began in 2007 is a financial crisis marked by a crisis of liquidity and solvency crisis sometimes both at the level of the banks of the States, and a scarcity of credit at the level of enterprises. Started in July 2007, it finds its origin in the bursting of bubbles of awards (including the the 2000 US housing bubble) and losses of financial institutions caused by the subprime crisis. It is the most serious crisis in the history of stock exchanges, after 1873, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a collapse in stock market prices and the bankruptcy of several financial institutions. To prevent a systemic crisis, States should intervene and save many banks which will cause a crisis of public debt in Iceland first and then in Ireland. In addition, it causes a recession affecting the entire planet. Public finances were heavily sought to resolve this crisis. The public deficit widened in many countries, after a decline of 2.2% in 2009 world gross domestic product.” (169 words)

These are worthy attempts, useful to the general reader looking for a gist, and produced, on demand, in a few seconds. All that is needed to make them more reliable is shown below (in bold type).

Google, improved:

“The global financial crisis that began in 2007 is a financial crisis marked by a liquidity crisis and sometimes by solvency crises for both banks and States, and a credit crunch at the company level. Beginning in July 2007, it has its origins in the bursting of price bubbles (including the U.S. housing bubble of the 2000s) and serious losses by financial institutions caused by the subprime crisis. This is the worst crisis in the history of stock exchanges, after that of 1873, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a fall in equity markets and the collapse of several financial institutions. To avoid a systemic crisis, governments had to step in and save many banks, which was to cause a crisis of public debt first in Iceland and then in Ireland. Moreover, it caused a recession in the entire world. Public finances were heavily used to solve this crisis. The deficit has widened in many countries, after a decline in global GDP of 2.2% in 2009.” (177 words)

BING, improved:

“The global financial crisis that began in 2007 is a financial crisis marked by a crisis of liquidity and sometimes by solvency crises both at the level of the banks and of the States, and by a scarcity of credit at the company level. Commencing in July 2007, it has its origin in the bursting of price bubbles (including the 2000 US housing bubble) and the serious losses of financial institutions caused by the subprime crisis. It is the most serious crisis in the history of stock exchanges, after the 1873 crisis, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a collapse in stock market prices and the bankruptcy of several financial institutions. To prevent a systemic crisis, States had to intervene and save many banks, which was to cause a crisis of public debt first in Iceland and then in Ireland. In addition, it caused a recession affecting the entire planet. Public finances were heavily drawn on to resolve this crisis. The public deficit widened in many countries, after a decline of 2.2% in world gross domestic product in 2009.” (169 words)

2.
Spanish Wikipedia: ‘Crisis económica de 2008-2011’

“Por crisis económica de 2008 a 2011 se conoce a la crisis económica mundial que comenzó ese año, originada en los Estados Unidos. Entre los principales factores causantes de la crisis estarían los altos precios de las materias primas, la sobrevalorización del producto, una crisis alimentaria mundial y energética, una elevada inflación planetaria y la amenaza de una recesión en todo el mundo, así como una crisis crediticia, hipotecaria y de confianza en los mercados. La causa raíz de toda crisis según la Teoría austríaca del ciclo económico es una expansión artificial del crédito. En palabras de Jesús Huerta de Soto «esta crisis surge de la expansión crediticia ficticia orquestada por los bancos centrales, y que ha motivado que los empresarios invirtieran donde no debían”.
“La crisis iniciada en el 2008 ha sido señalada por muchos especialistas internacionales como la “crisis de los países desarrollados”, ya que sus consecuencias se observan fundamentalmente en los países más ricos del mundo.” (159 words)

(Points for attention are italicised.)

Google

In economic crisis from 2008 to 2011 is known to the world economic crisis that began that year, which originated in the United States. Among the main factors causing the crisis would be the high prices of raw materials, the overvaluation of the product, a global food and energy crisis, high inflation and the threat of global recession around the world and a credit crisis trust and mortgage markets. The root cause of all crises as the Austrian theory of business cycle is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis arises from the fictitious credit expansion orchestrated by central banks, and has motivated entrepreneurs to invest where there were”.
The crisis that began in 2008 has been noted by many international experts as the “crisis of the developed countries”, since its effects are observed mainly in the richer countries of the world. (150 words)

Microsoft

The global economic crisis that began that year, originating in the United States is known by economic crisis of 2008 to 2011. Among the major causative factors of the crisis would be high prices of raw materials, the sobrevalorización of the product, energy and global food crisis, high global inflation and the threat of a recession around the world, as well as a loan, mortgage crisis and confidence in the markets. Caused by following every crisis according to the Austrian business cycle theory is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis stems from the fictional credit expansion orchestrated by central banks, and that has motivated entrepreneurs to invest where wrong”.
The crisis which began in 2008 has been brought by many international experts as the ‘crisis of developed countries’, already that its consequences are observed mainly in countries richest in the world. (150 words)

Google, improved

The economic crisis of 2008 to 2011 is the title given to the world economic crisis that began that year and originated in the United States. Among the main factors causing the crisis would be the high prices of raw materials, the overvaluation of the product, a global food and energy crisis, high inflation and the threat of global recession around the world and a crisis in credit, mortgages and market confidence. The root cause of all crises according to the Austrian theory of the business cycle is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis arises from the fictitious credit expansion orchestrated by central banks, and has motivated entrepreneurs to invest where they should not have done”.
The crisis that began in 2008 has been noted by many international experts as the “crisis of the developed countries”, since its effects are observed mainly in the richer countries of the world. (158 words)

BING, improved

The global economic crisis that began in 2008, originating in the United States, is known as the economic crisis of 2008 to 2011. Among the major causative factors of the crisis would be high prices of raw materials, the overvaluation of the product, a global food and energy crisis, high global inflation and the threat of a recession around the world, as well as a loan crisis, a mortgage crisis and loss of confidence in the markets. The root cause of every crisis, according to the Austrian business cycle theory is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis stems from the fictional credit expansion orchestrated by central banks, and that has motivated entrepreneurs to invest where they should not have done.”
The crisis which began in 2008 has been labelled by many international experts as the ‘crisis of developed countries’, since its consequences are observed mainly in the richest countries in the world. (161 words)

So, on the above evidence, both of these translation tools, Google and BING, offer a very useful BASIC – and lightning fast – FREE service for French and Spanish to English to millions of Internet users. (The situation of English INTO French and Spanish needs separate attention and may be dealt with in a future blog.)

For comparison, here are the results obtained from Yahoo’s Babelfish with its updated but ageing technology. Note the higher number of italicised items, and their nature.

1. French to English:
“The world financial crisis which started in 2007 is a financial crisis marked by a crisis of liquidity and sometimes by crisis of solvency as well on the level of the banks as of the States, and a rarefaction of the credit on the level of the companies. Started in July 2007, it finds its origin in the deflation of bubbles of price (of which the American real estate bubble of the years 2000) and important losses of the financial institutions caused by the crisis of the subprimes. C’ is the most serious crisis of l’ history of the purses of values, after that of 1873, rising from the banking crisis of May 1873. The financial crisis of l’ autumn 2008 amplifies the movement and causes a fall of the courses of the stockmarkets and the bankruptcy of several financial institutions. To avoid a systemic crisis, the States must intervene and save many banks what will cause a crisis of the national debt in Iceland d’ access then in Ireland.
In addition, it causes a recession concerning l’ together of planet. Public finances were heavily requested to solve this crisis. The public deficit s’ is dug in many countries, after a retreat of the world gross domestic product of 2,2% in 2009.”

2. Spanish to English:

By economic crisis from 2008 to 2011 it is known world-wide the economic crisis that began that year, originated in the United States. Between the main factors causes of the crisis they would be the high prices of the raw materials, the sobrevaluation of the product, world-wide an nourishing crisis and energetics, a high planetary inflation and the threat of a recession anywhere in the world, as well as a credit, hypothecating crisis and of confidence in the markets. The root cause of all crisis according to the Austrian Theory of the economic cycle is an artificial expansion of the credit. In words of Jesus Kitchen garden of Grove “this crisis arises from the fictitious credit expansion orchestrated by the central banks, and that have motivated that the industralists invested where they did not have”. The crisis initiated in the 2008 has been indicated by many international specialists like the “crisis of the developed countries”, since their consequences are observed essentially in the richest countries of the world.”
*

In a later blog, passages will be selected from two languages which are “more foreign” to English speakers, and for which less raw material has been available to the colossal Internet data banks on which Google Translate and Microsoft Translator rely for their lightning fast searches. The samples will be taken from Russian and Hindi, languages whose structures differ more basically from English than its familiar French and Spanish cousins.

Da svidanya. Phir milenge

(For a lighter and enlightening finish to this long essay, Google’s own explanation of its system is to be found here.)

Translation 31. David Bellos on Google Translate and Much Else

24 September 2011

In this article published in The Independent recently (which I found insightful but about which many of the 128 commenters expressed reservations and doubts), Professor David Bellos, a French literary translator, sings the praises of Google Translate, that extraordinary multilingual translation tool used by millions of web surfers. Bellos illuminates several points which many casual users of Google Translate may not have been aware of, including two which aroused my special interest.
1. English is the predominant “tool” in the Google operation. Because of its ubiquitousness in print, English provides Google with the major part of the input material on which its very complex translation operations are based.
2. Literary translators (like Bellos) play a quite important part in providing the essential raw material on which Google Translate relies.

Much more importantly, this thousand-word article is an extract from Bellos’s recently published book on translation, Is That a Fish in Your Ear? Translation and the Meaning of Everything, published in the USA (by Faber) and more recently in the UK (by Penguin). (Check the Wikipedia page on David Bellos.)

I have just ordered a copy and will offer more comments when I receive the book.
*
(See also a much earlier article, with references, published by the New York Times.)

Parochial Health Alert for Your Consideration: Leopard Geckos from Overseas

5 June 2009

A small but significant number of my potential 6 billion readers should be grateful to The Mornington and Southern Peninsula Mail (Victoria, Australia) for alerting its few thousand readers to the possible dangers (to a small minority of human beings, and to other reptiles) of keeping as pets imported leopard geckos from India, Pakistan or Afghanistan.

Here is their sombre news of 3 June 2009, available on http://www.morningtonpeninsulamail.com.au

Gecko Raid Nets Collector

“A Bittern woman is likely to be charged with keeping adult leopard geckos following a raid by wildlife officers.

Department of Sustainability and Environment investigator Keith Larner said the woman, 41, faced penalties of up to A$110,000 in fines and/or two years in jail for keeping the banned lizards.

DSE and Victoria Police officers obtained a search warrant and raided her home after DSE received an anonymous call to its customer service centre.

Officers found three leopard geckos, which are native to India, Afghanistan and Pakistan and can carry parasitic diseases including cryptosporidium and coccidiosis, which are highly contagious to humans and reptiles.

“We don’t know who called, but we greatly appreciate the information,” Mr Larner said.

He said it was illegal under state and federal legislation to possess, breed, or trade exotic reptiles such as leopard geckos.

“Keeping exotic reptiles is selfish and highly irresponsible,” Mr Larner said. “It’s alarming to see the lengths people will go to just so they can have an exotic pet.”

Consistent with most exotic reptiles seized in Victoria, the geckos will be euthanised due to the risk of disease.

The leopard gecko has been captive-bred in the United States for more than 30 years and is one of the most commonly kept lizards, with some living to 25 years old.

It comes in a variety of colours, patterns and sizes, and grows up to 28 centimetres long.

Rare coloured geckos can cost $4,000. Collectors are intrigued by its eyelids, ability to wash its eyes with its tongue and a tail that drops off when it is threatened.

Unlike other species of gecko, leopard geckos have small claws instead of adhesive toe pads and cannot climb walls.

(Anyone with information on exotic reptiles held in the community can call the DSE on 136 186. Information will be treated in confidence.)”

*

Interestingly, Wikipedia, less alarmist but more ubiquitous than The Mornington and Southern Peninsula Mail (Victoria, Australia), gives the following information under ‘Cryptosporidiosis’, describing it as an acute short-term infection – except for those with immune system problems. Wikipedia does not, however, duplicate this information in its article on ‘Leopard Gecko’. But any observant Wiki-serf can rectify that in a couple of minutes.

“Cryptosporidiosis, also known as crypto,[1] is a parasitic disease caused by Cryptosporidium, a protozoan parasite in the phylum Apicomplexa. It affects the intestines of mammals and is typically an acute short-term infection. It is spread through the fecal-oral route, often through contaminated water;[1] the main symptom is self-limiting diarrhea in people with intact immune systems. In immunocompromised individuals, such as AIDS patients, the symptoms are particularly severe and often fatal. Cryptosporidium is the organism most commonly isolated in HIV positive patients presenting with diarrhea.”

Wikipedia also classifies ‘Coccidia’ as a human parasite.

The Fragmentation of Information in Wikipedia

30 April 2009

(A preliminary analysis, with reference to Spanish Wikipedia’s multiple offerings on the Spanish Civil War / la Guerra Civil Española)


In the Spanish version of Wikipedia (which currently covers a wide range of 467,000 entries; the ‘senior’ English Wikipedia claims 2,859,000 items), there are multiple separate entries on the core subject, Spanish Civil War (Guerra Civil Española), an international encyclopedia topic which has been widely discussed for over 70 years and which, according to some estimates, has inspired 12,000 books and pamphlets in many languages.


The widely dispersed multiplicity of Wikipedia entries on many subjects is at least partly due to Wikipedia’s own intricate rules, prohibitions and recommendations and its faithful Users’ successful or unsuccessful adherence to them. For example, in specific advice to new wikipedian contributors, Wikipedia’s strong preference for short articles is stressed and an optimal length of 32 KB (i.e. about 1,000 words, or 2 and a half pages) is recommended. Another cause of data fragmentation is a process which Wikipedia itself terms ‘forks’ or ‘forking’. This splitting of a topic into various entries (with different titles) is the subject of a set of basic labyrinthine Wikipedia rules and analyses (in its English version). These are intended to demarcate the difference between a “Content fork” (two articles on one topic, particularly in cases of disagreements or similar difficulties among contributors and ‘referees’), which Wikipedia strictly forbids, and a POV (Point of View) Fork, which it recommends. The latter notably covers cases separating Critical aspects from the Topic itself, as often happens where a set of political, religious or spiritual beliefs and activities is offered in one entry and any criticism of these beliefs and activities, or a description of the relevant Organisation, is relegated to a separate (and often alphabetically distant) entry. However, in addition to this sort of approved dilution of major (or controversial) topics, many unrecommended content forks also occur on Wikipedia, and remain there, without being deleted or fused with other major aspects, as Wikipedia expressly stipulates. (See WP:CFORK and WP:POVFORK.) A collateral consequence of these anomalies is that, to be more realistic, Wikipedia’s statistics for its total entries should be adjusted to take this bloating factor into account.


A further problem is that, unless very comprehensive linkage is provided between fragmented segments of information on a core topic (in a medium which offers instant direct hyperlinks), the encyclopedia reader will not have easy access to enough of these ‘forks’. This is precisely what seems to have occurred in the case of the multiple Spanish entries for the Civil War. Here the informational value of the sum of knowledge contributed is compromised by the inadequate number of links between an accumulation of well over one hundred related entries, especially between the major ones, often of the ‘Point of View’ type (for example, Terror Rojo en España, Represión franquista, Bando nacional (Nationalist), Bando republicano).

The rest of this brief article will present evidence gleaned from a survey of the information offered by the Spanish Wikipedia in relation to a very prominent and complex topic: ‘la Guerra Civil Española’

List of Articles

Guerra Civil Española

This general article should be the longest and principal one, with adequate references and Hyperlinks to relevant related Wikipedia entries. Unfortunately, official action has been taken to freeze or mummify it in a ‘protected’ form, presumably to guard against the risk of vandalism, perhaps in the wake of the recent strong debates in Spain relating to ‘revisionism’ on the subject of the War, the participants, the antecedents and aftermath. Therefore, in protected entries, Wikipedia’s celebrated openness to all contributors is suspended, until the protection is lifted by the ‘burócratas’ (trusted supervisors). In this case, it means that no changes can be made to improve the inadequate links to other articles and that the inexplicably inadequate ‘Bibliography’ of 5 items (only one of which is a major one) lowers the value of the entry and its use to readers. (The existence of this pathetic Bibliography in an otherwise lengthy and informative article is an interesting example of the weaker aspects of the otherwise fabulous Wikipedia project, which insists so strongly on the backing of reputable sources and one or two other problematical criteria.)

The diversity of many other segments and the presence and absence of direct links within the ‘Guerra Civil Española’ topic form the body of this article.

Francisco Franco

From the list of links offered above and below, the only ones given are: ‘Franquismo’ and ‘Simbología del Franquismo’.

Dictadura de Francisco Franco

Franquismo

Terror Rojo en España (Red Terror in Spain)
This includes a section on ‘Terror Blanco y Rojo’ and a few paragraphs in English, from Antony Beevor and Stanley Payne, probably from the English Wikipedia entry: ‘White Terror in Spain’.

Valle de los Caídos
(An unbalanced entry, with no links.)

Personajes relevantes de la Guerra Civil Española

Simbología del franquismo

Cronología de la Guerra Civil Española

Bando nacional (The Nationalist Side, i.e. The Franco Uprising)
Brief. Links to: ‘Guerra Civil Española’ and ‘Nacionalismo español’.

Bando republicano (The Republican Side, i.e. The variegated Supporters of the Left-wing Republican Government)
Equally brief. Links to: ‘G.C.E.’ and ‘Revolución española de 1936’.

Ofensiva de Cataluña

Guerra Civil Española en el País Vasco (… in the Basque Country)

There is also a considerable number of articles (short and long) on the war, battles in other different regions of Spain, atrocities, victims, etc.

Wikipedia entries published during the current vigorous debate in Spain, since 2004

Since 2000, many revisionist books and some replies have been published in Spain (some of them are bestsellers) on different aspects of the Spanish Civil War, whose 70th anniversary was greatly celebrated by both ‘sides’ – and others – in 2006. Moreover, in the 2004 elections, the Socialist Party and its allies dramatically defeated the ruling nationalist conservative Partido Popular, a slightly ironic replay of 1934, two years before the outbreak of the Spanish Civil War.

Ley de memoria histórica de España
Old controversial claims have finally been promoted in this new law. Links: ‘Franquismo’, ‘Guerra Civil Española’ and ‘Víctimas de la Guerra Civil Española’.

Represión política en España
Published in August 2006 (apparently by Professor Ángel Luis Alfaro, one of the few wikipedians who do not hide behind a pseudonym). Links to: ‘Guerra Civil Española’ and ‘Franquismo’.

A part of this historical entry covers the Civil War. The Bibliography is brief, but interesting. A few items should be added. Unlike many other entries, this one offers many useful links to entries dealing with important topics of the Civil War.

Víctimas de la Guerra Civil Española
First published on 23 October 2005 by User ‘Nemo’. Links to all of the following:
Guerra Civil Española
Revolución social española de 1936
Depuración del Magisterio español tras la Guerra Civil Española
Causa General
Anexo:Mortalidad en la Guerra Civil Española, por inscripción en juzgados
Víctimas de la Guerra Civil en Navarra
Víctimas de la persecución religiosa durante la Guerra Civil Española
Masacre de Badajoz
Matanzas de Paracuellos
Crímenes del túnel de la muerte de Usera
Las checas
Víctimas de la Guerra Civil en Cantabria
Masacre de la carretera Málaga-Almería
Las Trece Rosas
Niños de Rusia
Represión política en España
Represión franquista
Exilio republicano

Víctimas de la persecución religiosa durante la Guerra Civil Española
Links to: ‘G.C.E.’ and ‘Revolución española de 1936’.

Depuración del Magisterio español tras la Guerra Civil Española
A long essay on an alleged Francoist postwar injustice, published on 29 January 2007 by an anonymous non-User. No links are given.

Causa General

An investigation into crimes committed during the “Red” occupation of Spain, ordered by Franco in 1940. A short stub, posted on 5 February 2008.

Represión franquista
First published on 13 September 2008. Its counterpart in the English Wikipedia is ‘White Terror (Spain)’ but this version is briefer. It offers a link with ‘Represión política en España’ but, because of the contents of ‘Terror Rojo…’ (see above), this entry appears to be superfluous and therefore in need of deletion.

Categorías y Anexos

These general ‘Categories’ and ‘Appendices’ offer links to further lists of entries, or to specific details relevant to the main topic: the Civil War in Spain. Among the latter is the following very recent item:

Anexo:Imputados en el auto de 16 de octubre de 2008 del Juzgado Central de Instrucción nº 005 de la Audiencia Nacional

This presents a list of 35 deceased top Francoist officials (including the ‘Caudillo’ himself) who were declared to be no longer legally responsible for illegal detention and crimes against humanity during the Civil War and Postwar periods.

The following Appendix is a painstaking gathering of data on Civil War deaths as recorded in Municipal Registries.
Anexo:Mortalidad en la Guerra Civil Española, por inscripción en juzgados

It was first published by User ‘Jorab’ on 20 November 2007. This Basque ‘wikipedista’ is an example of those dedicated individual contributors of data who supply the major part of Wikipedia information (in all languages), with a total of 9939 contributions to his credit – most of them on similar detailed aspects of the Spanish Civil War in his region. (All statistical details like Users’ numbers of contributions, dates, etc., as well as the contributions themselves, are carefully recorded and updated, and are instantly available from the Wikipedia system.)

Categoría:Guerra Civil Española
Another reference list of articles on the War.

Categoría:Franquismo
Another list of articles about Francoism.

Categoría:Batallas de la Guerra Civil Española
34 separate articles.

Categoría:Víctimas de la represión en la zona republicana
Articles on individual victims or groups of victims in the Republican Government zones.

Categoría:Víctimas de la represión en la zona franquista
Articles on individual victims or groups of victims of the Franco-held zones.

The above list may be of some use as a reading guide for the subject under examination, but it would have been more appropriate if Wikipedia had devised a better way of presenting its major or multi-faceted topics. As can easily be appreciated, the content of the above articles is encyclopedic in quantity but the Wikipedia way of arranging it and presenting it to Internet readers needs further refining.



Harry McCracken on Knol

3 September 2008

The purpose of today’s short blog is to draw further attention (where necessary) to three stimulating and informative articles by a senior IT expert and former Editor-in-Chief of the prestigious magazine PC World, Harry McCracken. The 3 timely contributions raise two separate but connected issues concerning the Google colossus: its voracious appetite for IT projects in very different fields and the recent unveiling to netizens of the beta version of Google’s penultimate tentacle, Knol, as a new (and commercial) competitor in the online encyclopedia market. (And only a few days ago the IT world witnessed the baptism of their Chrome Browser…)

In just under a year, McCracken has shared his changing views about Knol on three occasions:

a basically negative reception of the announcement of the Knol project:

16 December 2007: Knol: ‘Google Ennui Sets In, at Least For Me’;

an optimistic greeting of the recent launch of the first Knol beta offerings:

23 July 2008: ‘Knol’s Well: Google’s Encyclopedia Looks Cool’;

an expression of doubts about perceived weaknesses and inconsistencies in the Knol project based on a small sample of comparisons between the larger number of Knols currently available and Wikipedia offerings:

1 September 2008: ‘Google’s Knol: So Far, Not So Good’.

If you have not read Harry’s evidence and arguments contained in the latest of these three (which provides links to the two previous articles), I strongly recommend it to you.

Google’s Knol: So Far, Not so Good.

Of particular interest is that although, as McCracken admits, his opinion has twice veered sharply, there is a consistency in his concerns that in addition to its phenomenally successful stellar enterprises, Google has shown a tendency to launch projects which do not achieve the sort of resounding success that Google Search and Google Maps, for example, have garnered. McCracken’s reservation is that, although he still feels the idea of Knol is cool, on the present preliminary evidence, there is reasonable cause to suspect that Knol may end up as one of that second category.

Just one short quotation to whet your appetite further:

“…Google is better at getting things started than finishing them. Services like Google Base and Google Page Creator remain rough drafts at best, eons after they debuted. Even a company with resources as vast as Google can’t do everything and do everything well.”

In the two days following that blog article on 1 September 2008, Harry has already posted a barrage of articles about Google’s launch of the beta of their Chrome Web Browser project.

Perhaps he never sleeps.