
Translation 36. Free Internet Translation Software: The Contest between Google Translate and Microsoft’s BING Translator. Russian and Hindi

13 June 2012

In my article Translation 33, I attempted a rough assessment of the efficiency of free online translation software offered by Google, Microsoft (BING), and the venerable Yahoo Babel Fish.
In this test both Google and Microsoft proved to be competent in French and Spanish (into English) translation (at this general level). My stated next step was to check the online translation of other languages with different scripts and/or syntax by taking a look at Russian (as an example of a different script, Cyrillic) and Hindi (both script, Devanagari, and syntax).  This is what is attempted in this new article (using short extracts from Russian and Hindi Wikipedias).

A preliminary and very topical comment to make is that further reference to Yahoo’s Babel Fish will not be possible here because, as of 30 May, 2012, Yahoo Babel Fish has been either subsumed into or replaced by BING Translator, as indicated in the following recent note from Microsoft:
“We are pleased to welcome Yahoo! Babel Fish users to the Bing Translator family. We have been working closely with our friends at Yahoo! to make this an easy transition, and Bing Translator is a natural upgrade to the experience with Yahoo! Babel Fish. We support all the languages you used with Babel Fish, and provide a superset of all the features.”

Let us now look at the Russian to English situation. The piece chosen is an extract of 172 words from the Russian Wikipedia article on the Indian writer, intellectual and activist, Arundhati Roy (Section: Политическая деятельность (Politicheskaya dyeyatyel’nost’), Political activities). The style is simple. Here are the two translations for comparison:

Google Translate: (http://translate.google.com)
Subsequently, Arundhati Roy has used his celebrity to draw public attention to important political issues. In a number of essays and speeches, it is opposed to nuclear weapons in India and neighboring Pakistan, as well as against Indian nationalism [citation needed 537 days]. She also took part in protest actions against the dam project on the Narmada River, as such projects are usually at the expense of the earth’s poorest and marginalized populations. Due to its popularity, Roy was able to attract the attention of national and international media to these problems.

Arundhati Roy’s literary activity is completely focused on illumination and critique of political and social themes. She opposed the so-called “war on terrorism” launched by the United States against the war in Iraq and against the policy of the World Bank and WTO. With its rigid stand it has become one of the best known environmental activists, anti-globalization movement and the peace.

In 2002 the High Court in Delhi has sentenced her to prison because she blamed the judges that they wanted to suppress the protests against the construction of a dam on the Narmada River. However, the symbolic conclusion was only one day. (196 words)

The Microsoft BING version: (http://www.microsofttranslator.com)

Subsequently, Arundhati Roy had used their celebrity to draw public attention to important political problems. In a number of essays and speeches, she spoke out against nuclear weapons for India and neighbouring Pakistan, as well as against Indian nationalism. She also participated in the protest activities against the dam project on the River Narmade, as such projects are carried out, usually at the expense of the poor and disenfranchised. Due to its popularity Roy was able to draw the attention of the national and international MEDIA.

Arundhati Roy completely literary activity focused on reporting and criticism of the political and social order. She opposed the so-called “war on terrorism” launched by the United States against the war in Iraq, as well as against the policies of the World Bank and the WTO. Thanks to its unwavering position it has become one of the most prominent environmental activists, peace and antiglobalizacionnogo movements.

In 2002, the Supreme Court in New Delhi sentenced her to jail because she accused judges that those would like to suppress protests against the construction of a dam on the River Narmade. However, the symbolic conclusion was only one day. (197 words)

As with the previous passages from French and Spanish, both Google and Microsoft convey an English version which makes good general sense, without reaching the standard of a professional translation. In spite of obvious flaws and errors, they are both useful to readers who do not speak Russian. More detailed examination of the translations and the Wikipedia originals will be of particular interest to professional translators and experts in Machine Translation (MT).
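For readers who want to make that more detailed examination themselves, here is a minimal sketch of a word-by-word comparison, using the opening sentence of each Russian translation quoted above. It relies only on Python's standard `difflib` module; the function name `word_differences` is my own invention for illustration, and the approach is just one rough way of lining up two versions, not a measure of translation quality.

```python
import difflib

# Opening sentence of each machine translation, as quoted above.
GOOGLE = ("Subsequently, Arundhati Roy has used his celebrity to draw public "
          "attention to important political issues.")
BING = ("Subsequently, Arundhati Roy had used their celebrity to draw public "
        "attention to important political problems.")


def word_differences(a: str, b: str) -> list:
    """Return (words_in_a, words_in_b) pairs wherever the two texts diverge."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replaced, inserted or deleted stretches
            diffs.append((" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2])))
    return diffs


if __name__ == "__main__":
    for google_words, bing_words in word_differences(GOOGLE, BING):
        print(f"Google: {google_words!r:15} BING: {bing_words!r}")
```

On this sentence the divergences include the pronoun ("his" against "their"), neither of which is the "her" that an English reader expects for Arundhati Roy.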

It would therefore seem reasonable to hypothesise that the Cyrillic script is not a special obstacle to “Western” Machine translation – just as it is reasonable to state that in learning Russian as a second language the Russian script does not present major problems in reading or writing, at least for Europeans.

Now we can move on to present for examination and comparison the performances by Google and Microsoft with an extract from Hindi Wikipedia (201 words; 10 sentences) on the same writer and activist (Section: kraaNtikaaree vichaar. Revolutionary idea). Warning: Readers may be bemused or irritated by these results.

Google

From U.S. imperialism, nuclear arms race, building dams on the Narmada, etc. Many
Local – international issues have been a voice against Arundhati Roy. But now
He believes that at least non-violent protest and civil disobedience in India
Movements is not the point.
Parliamentary system, part of the Communists and who believe in violent resistance
Arundhati admits that he caught in the Maoist ideology of Gandhi fan
Are not. In his words: – “After all, Gandhi was a superstar. When they are hungry – used to strike the
The hunger – strike were sitting on the superstar. But I do not believe in superstar politics.
If the people of the slums, hunger – strike that, no it does not care. ”

Arundhati Bajharwad that go on flowing in the flow of voices of protest in India
Being ignored. Against anti-people system, judiciary and media
Efforts have failed to impress. He said, “I understand that for us
Is very important to consider where we are right, wrong and where. We have given the arguments
They are right … But the violence did not work. ” (184 words)

Microsoft BING

US imperialism from the nuclear arms race, the Narmada dam, construction etc

Local-international issues that Arundhati Roy-voice ascendant are. But now
He believes in non-violent protests at India and civil disobedience
Movements not talk.
Became part of the parliamentary system and violent resistance in the count Communists
The Maoists of ideologies in which Arundhati admits that Gandhi’s andhabhakt
Are not. In their words-“end Gandhi a superstar. The hunger-strike, so they
The hunger-strike on superstars. But I do not mind you in the superstar.
If the people of a slum, a hunger-strike that it doesn’t care. ”
Arundhati believes that going away bazarvad flow-down of vowels in India
Unheard. janvirodhi system-the judiciary and media
Efforts have failed to impress. He said, “I think for us
It is important to consider where we are great, and where the wrong right. We gave arguments
They are right … But nonviolence is not effective.” (150 words)

These unsatisfactory performances (which, in my experience, are neither uncommon nor unrepresentative) clearly need much more attention and comment than the Russian translations above, or the French and Spanish ones. For Machine Translation, there is much more work to be done before satisfactory translations from Hindi to English (and some other languages) can be achieved.

From a reading of the English and without any reference to the original, the best that can be said of the translations is that they give glimpses of the subject material but they are not very useful. One can also see that the syntax is disjointed, many sentences are incomplete, and some references are inaccurate. In both Google and Microsoft versions all lines begin with a capital letter (which suggests a new sentence is beginning). From a comparison with the original one may add that the translations also offer some false information or impressions, as well as obvious problems with vocabulary identification and pronoun gender.

The reason why the Google and Microsoft translation systems have not yet been able to cope more satisfactorily with Hindi (and presumably with a number of other languages) is that they still have basic problems in identifying the complicated script, the very “different” syntax of Hindi and even the organisation of print, sentences and paragraphs. First of all, Hindi does not use upper case letters (nor italics or bold distinctions). Secondly, the main punctuation mark is a vertical bar (the danda, ।), which serves as a full stop. Commas are used but often sparsely. The inability to deal with these characteristics must surely contribute to the peculiar look of the translations above, with initial capital letters at the beginning of each line.
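To make the punctuation point concrete, here is a minimal sketch of how Hindi text could be divided into sentences at the vertical bar rather than at line breaks. It is my own illustration, not a description of what Google or Microsoft actually do: the function name `split_hindi_sentences` is hypothetical, and the two-sentence sample ("This is the first sentence. This is the second sentence.") is invented, with a line break placed in mid-sentence on purpose.

```python
import re

DANDA = "\u0964"  # the Devanagari danda (vertical bar) used as a full stop

# Invented sample: two short sentences, hard-wrapped in the middle of the first.
SAMPLE = "यह पहला\nवाक्य है। यह दूसरा वाक्य है।"


def split_hindi_sentences(text: str) -> list:
    """Split text at danda, '?' or '!', ignoring line breaks inside sentences."""
    # Collapse line breaks and repeated spaces so that the way the text was
    # wrapped on screen is not mistaken for a sentence boundary.
    flat = " ".join(text.split())
    # Each piece is a run of ordinary characters plus an optional end mark.
    pieces = re.findall(rf"[^{DANDA}?!]+[{DANDA}?!]?", flat)
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    for number, sentence in enumerate(split_hindi_sentences(SAMPLE), start=1):
        print(number, sentence)
```

The sample yields two sentences, although it is printed on two lines with the break in the wrong place. A system that treated each printed line as a unit would instead produce the kind of fragments, each with its own capital letter, seen in the translations above.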

Finally, let us look at the first sentence of the Hindi Wikipedia original (in transliterated form) to get a further glimpse of what can go wrong.

Amreekee saMraajyavaad se lekar, parmaanu hathiyaaroN ki hor, Narmada par baaNdh nirmaan aadi kaee sthaaneeye – antarrashtreeya mudhoN ke khilaaf avaaz bulaNd kartee rahee haiN arundhati raay. 

(my rough translation:)
From American imperialism, the nuclear arms race, to the construction of the Narmada Dam, etc., Arundhati Roy is raising her voice loudly on many local and international issues.

In the Hindi word order, a list of nominal groups is followed by “etc.” and then (literally) “several local-international issues against” (an example of the numerous Hindi “postpositions”, which are very basic and frequent sentence elements) and, finally, the sentence’s Verb and Subject (Arundhati Roy). Very different from: “From U.S. imperialism, nuclear arms race, building dams on the Narmada, etc. Many Local – international issues have been a voice against Arundhati Roy.” and “US imperialism from the nuclear arms race, the Narmada dam, construction etc Local-international issues that Arundhati Roy-voice ascendant are.”

I gave both systems a second chance by submitting the last part of that first sentence on its own. Without the cumbersome word order, Google did better but BING did not.

के ख़िलाफ़ आवाज़ बुलंद करती रही हैं अरुंधति राय
ke khilaaf aavaaz bulaNd kartee rahee haiN aruNdhati raay [roy]

Google: Arundhati Roy has been a voice against
BING: Is Arundhati Roy of that lofty-sounds
*

We must be grateful to Google and Microsoft for their valuable work on Hindi but we must also hope that the massive problems, briefly signposted in the above exercise, can be solved in the not too distant future. And similarly for other problem languages.

The next logical step would be to examine the quality of Google and BING translation from English into other languages. I will do my best at a later date, using the same four languages.

Au revoir. Hasta luego. Do sveedanya. Phir milenge.


Favourites on this Blog – for Holiday Reading

27 December 2011

Of the one hundred and eleven blogs posted here since 2008, these are the 16 that have attracted most attention. Unlike other more ephemeral blogs, the subject matter seems to remain of interest.

With my good wishes for the New Year.

General
1. ‘New Hope for Disempowered Women’
2. ‘The Fragmentation of Information in Wikipedia’
3. ‘Please dress up the Em dash’
4. ‘Global warming debate. 1’
5. ‘Global warming debate. 2’
   ‘Global Warming Controversy. Part 2. Global Warming Scepticism: Some Basic Data & Chronological Notes’
6. ‘Julia Owen and bee stings in Bromley’
7. ‘Julia Owen, Retinitis Pigmentosa, and the Media. Part 1’
   (Part 2 will follow in the New Year.)

Languages

1. Of 33 offerings on Translation and Interpreting topics, this item has captured most attention:
   ‘Translation 8. Fluency in foreign languages. The case of Dr Condoleezza Rice’
   (See also ‘Translation. 30’.)
2. ‘Translation 32. David Bellos’s Revealing Book on Translation and the Meaning of Everything’
3. ‘Spanish Pronunciation in the Media’

Spain

1. ‘The European Union’s verdict on the Franco Régime in Spain (1939-1975)’
2. ‘Justo Gallego – the lone twentieth century Cathedral Builder’

India

1. ‘Contemporary India. 1. Basic Sources of Information’
2. ‘A Visit to Sathya Sai Baba’s ashram in October 2008’
3. ‘Sathya Sai Baba: Questionable Stories and Claims. Part 1’
4. ‘Fuzzy Dates in the Official Biography of Sathya Sai Baba. A Re-examination’

Translation 33. Free Internet Translation Software: The Contest between Google Translate and Microsoft’s BING Translator

24 November 2011

Machine Translation (MT) software comes in many forms and in two specific categories: commercial, and free of charge. At the top end of the commercial offerings are sophisticated and expensive software tools used by professional freelance translators and translation companies in order to ease and speed up their laborious tasks. The name TRADOS is one of the most used. It offers packages costing from 600 to 2,500 Euros. At the lower commercial level there are many products costing between $60 and $120 for help with translating between the major European languages, or at least between English and those languages. For free Internet translation services, the current leader is Google Translate, closely followed by its recent challenger Microsoft’s BING Translator. Both produce fast but basic translations of all sorts of Internet material, in a very wide range of languages and language pairs. For a number of years, the earlier Internet leader was Altavista’s Babelfish. Under the Yahoo label, this free programme is still available and widely used but with the two younger competitors making fast progress with their more effective MT formulas, it is showing its age.

As a preliminary sample of MT, take the following absurdly easy test used by blog researchers comparing and rating ten budget software translation packages. From this site, the references lead to this basic test, from Spanish to English.

“Abuela, ¿por qué tienes los ojos tan grandes?” Caperucita Roja preguntó. “Para que yo pueda ver mejor,” Dijo la abuela. “¡Oh, abuelita, ¿por qué tienes la boca tan grande?” “Para poder comerte mejor!” Entonces, la abuela salta de la cama.

They offer the following as a “Correct Translation” against which to compare the ten commercial contenders:
“Grandma, why do you have such big eyes?” Little Red Riding Hood asked. “So that I can see better.” the grandma said. “Oh, Grandma, why do you have such a big mouth?” “So I can eat better!” Then, the grandma jumps out of the bed.

For a description of the three major free Internet MT systems listed above and a judgement on their relative qualities, see John Yunker’s articles on the work of Ethan Shen (starting with this one and following the links). (Shen pronounces Google Translate the overall winner.)

Another strong recommendation of Google’s quality and breadth of coverage, as well as a clear explanation of the Google method, is to be found in Chapter 23 of David Bellos’s recent wide-ranging book on Translation, Is That a Fish in Your Ear? (by now a runaway bestseller).

The chapter offers a potted history of MT and expresses Bellos’s very positive view of the advances in MT achieved by Google, emphasising its novel approach to the task of MT. In a recent article, Bellos offers an edited version of pages 263-266 of that chapter (‘The Adventure of Automated Language Translation Machines’) in which, in characteristic manner, he succinctly explains the complex Google system to us:

“Using software originally developed in the 1980s by researchers at IBM, Google has created an automatic translation tool that is unlike all others. It is not based on the intellectual presuppositions of early machine translation efforts – it isn’t an algorithm designed only to extract the meaning of an expression from its syntax and vocabulary.
“In fact, at bottom, it doesn’t deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.
“It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation.
“The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.
“Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what’s been submitted to it.”
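The statistical idea Bellos describes can be illustrated in miniature. What follows is a toy sketch under loud assumptions, and in no way Google's actual system: the six-pair “corpus” is invented (only the Spanish phrases come from the Little Red Riding Hood test above; the English renderings attached to them are hypothetical), the function name `most_probable_translation` is mine, and real systems work with vastly larger data and far more refined statistics. The sketch simply counts how often each English version has been paired with a given source phrase and returns the most frequent one.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical miniature "parallel corpus" of (Spanish, English) pairs.
PAIRS: List[Tuple[str, str]] = [
    ("para poder comerte mejor", "so I can eat you better"),
    ("para poder comerte mejor", "so I can eat you better"),
    ("para poder comerte mejor", "to be able to eat better"),
    ("la abuela", "the grandmother"),
    ("la abuela", "grandma"),
    ("la abuela", "the grandmother"),
]


def most_probable_translation(
    source: str, pairs: List[Tuple[str, str]]
) -> Optional[Tuple[str, float]]:
    """Return the most frequent paired version and its relative frequency."""
    counts = Counter(target for src, target in pairs if src == source)
    if not counts:
        return None  # the phrase has never "been said before" in this corpus
    target, seen = counts.most_common(1)[0]
    return target, seen / sum(counts.values())


if __name__ == "__main__":
    for phrase in ("la abuela", "para poder comerte mejor", "el lobo"):
        print(phrase, "->", most_probable_translation(phrase, PAIRS))
```

Note that nothing in the sketch “understands” the phrase. It only reports what has most often stood alongside it, which is also why a phrase absent from the corpus (“el lobo” here) gets no answer at all.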

Although he admits that Google Translate results are not always satisfactory, Bellos forecasts a rosy future for MT and for Google in particular as it improves and adds to its fabulous corpora in 58 languages.

To give an idea of the standard of translation achieved by Google, and to give a glimpse of what Professor Bellos’s enthusiasm is founded on, I propose to offer and examine samples of translations into English from four languages. The additional factor is that BING (which offers 2-way translations to and from 37 languages as compared with the 58 Google pairs cited by Professor Bellos) will be subjected to the same tests, as evidence of this battle of the Free to Ether Translation Titans. (Results from Yahoo’s Babelfish are offered at the end of the piece.)

Firstly (in the current article) I present and compare translations from French and Spanish into English. In a later blog article I hope to offer similar material from Russian and Hindi (probably transliterated to fit in the WordPress system). From these disparate examples, we may be able to discern the strengths of the two software programmes and some of the problems which still remain to be overcome in the search for workable and useful translations into and out of all printed languages.

By way of Prologue to the proposed comparisons, if we try the ‘Little Red Riding Hood’ test sample on Google and BING, we get the following results.

Google Translate:
“Grandma, why are your eyes so big?” Little Red Riding Hood said. “So that I can see better,” said the grandmother. “Oh, Grandma, why your mouth is so big?” “To eat better!” Then the grandmother jumps out of bed.

There are two unsatisfactory translations here:
why your mouth is so big?” “To eat better!”

BING Translator:
“Grandmother, why you have such large eyes?” Little Red Riding Hood asked. “So that I can see better,” said the grandmother. “Oh, grandmother, why have the big mouth?!” “To be able to eat better!” Then Grandma jumps out of bed.

Again, two unsatisfactory translations, and for the same segment:
why have the big mouth?!” and “To be able to eat better!”

Both Google and BING completely miss the enclitic Spanish pronoun attached to the verb, “comer” + “-te” (“to eat YOU better”), but, IMHO, Google is marginally ahead of BING in the second listed infelicity.

Now let us move on to a more challenging test of MT ability. For this I have chosen short segments from French and Spanish Wikipedia on a topic of recent interest.

1.
French Wikipedia: ‘Crise financière mondiale débutant en 2007’
“La crise financière mondiale qui a commencé en 2007 est une crise financière marquée par une crise de liquidité et parfois par des crise [sic: = crises] de solvabilité tant au niveau des banques que des Etats, et une raréfaction du crédit au niveau des entreprises. Amorcée en juillet 2007, elle trouve son origine dans le dégonflement de bulles de prix (dont la bulle immobilière américaine des années 2000) et les pertes importantes des établissements financiers provoquées par la crise des subprimes. C’est la crise la plus grave de l’histoire des bourses de valeurs, après celle de 1873, découlant de la crise bancaire de mai 1873.
“La crise financière de l’automne 2008 amplifie le mouvement et provoque une chute des cours des marchés boursiers et la faillite de plusieurs établissements financiers. Pour éviter une crise systémique, les Etats doivent intervenir et sauver de nombreuses banques ce qui provoquera une crise de la dette publique en Islande d’abord puis en Irlande. Par ailleurs, elle provoque une récession touchant l’ensemble de la planète. Les finances publiques ont été lourdement sollicitées pour résoudre cette crise. Le déficit public s’est creusé dans de nombreux pays, après un recul du produit intérieur brut mondial de 2,2% en 2009.”
(200 words)

(Points for attention are italicised.)

Google:
“The global financial crisis that began in 2007 is a financial crisis marked by a liquidity crisis and sometimes by solvency crisis at both banks of the States, and a credit crunch at the company level. Began in July 2007, it has its origins in the bursting of price bubbles (including the U.S. housing bubble of the 2000s) and losses of financial institutions caused by the subprime crisis. This is the worst crisis in the history of stock exchanges, after that of 1873, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a fall in equity markets and the collapse of several financial institutions. To avoid a systemic crisis, governments must step in and save many banks which will cause a crisis of public debt in Iceland first and then in Ireland. Moreover, it causes a recession in the entire world. Public finances were heavily used to solve this crisis. The deficit has widened in many countries, after a decline in global GDP by 2.2% in 2009.” (177 words)

Microsoft:

“The global financial crisis that began in 2007 is a financial crisis marked by a crisis of liquidity and solvency crisis sometimes both at the level of the banks of the States, and a scarcity of credit at the level of enterprises. Started in July 2007, it finds its origin in the bursting of bubbles of awards (including the the 2000 US housing bubble) and losses of financial institutions caused by the subprime crisis. It is the most serious crisis in the history of stock exchanges, after 1873, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a collapse in stock market prices and the bankruptcy of several financial institutions. To prevent a systemic crisis, States should intervene and save many banks which will cause a crisis of public debt in Iceland first and then in Ireland. In addition, it causes a recession affecting the entire planet. Public finances were heavily sought to resolve this crisis. The public deficit widened in many countries, after a decline of 2.2% in 2009 world gross domestic product.” (169 words)

These are worthy attempts, useful to the general reader looking for a gist, and produced, on demand, in a few seconds. All that is needed to make them more reliable is shown below (in bold type).

Google, improved:

“The global financial crisis that began in 2007 is a financial crisis marked by a liquidity crisis and sometimes by solvency crises for both banks and States, and a credit crunch at the company level. Beginning in July 2007, it has its origins in the bursting of price bubbles (including the U.S. housing bubble of the 2000s) and serious losses by financial institutions caused by the subprime crisis. This is the worst crisis in the history of stock exchanges, after that of 1873, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a fall in equity markets and the collapse of several financial institutions. To avoid a systemic crisis, governments had to step in and save many banks, which was to cause a crisis of public debt first in Iceland and then in Ireland. Moreover, it caused a recession in the entire world. Public finances were heavily used to solve this crisis. The deficit has widened in many countries, after a decline in global GDP of 2.2% in 2009.” (177 words)

BING, improved:

“The global financial crisis that began in 2007 is a financial crisis marked by a crisis of liquidity and sometimes by solvency crises both at the level of the banks and of the States, and by a scarcity of credit at the company level. Commencing in July 2007, it has its origin in the bursting of price bubbles (including the 2000 US housing bubble) and the serious losses of financial institutions caused by the subprime crisis. It is the most serious crisis in the history of stock exchanges, after the 1873 crisis, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a collapse in stock market prices and the bankruptcy of several financial institutions. To prevent a systemic crisis, States had to intervene and save many banks, which was to cause a crisis of public debt first in Iceland and then in Ireland. In addition, it caused a recession affecting the entire planet. Public finances were heavily drawn on to resolve this crisis. The public deficit widened in many countries, after a decline of 2.2% in world gross domestic product in 2009.” (169 words)

2.
Spanish Wikipedia: ‘Crisis económica de 2008-2011’

“Por crisis económica de 2008 a 2011 se conoce a la crisis económica mundial que comenzó ese año, originada en los Estados Unidos. Entre los principales factores causantes de la crisis estarían los altos precios de las materias primas, la sobrevalorización del producto, una crisis alimentaria mundial y energética, una elevada inflación planetaria y la amenaza de una recesión en todo el mundo, así como una crisis crediticia, hipotecaria y de confianza en los mercados. La causa raíz de toda crisis según la Teoría austríaca del ciclo económico es una expansión artificial del crédito. En palabras de Jesús Huerta de Soto «esta crisis surge de la expansión crediticia ficticia orquestada por los bancos centrales, y que ha motivado que los empresarios invirtieran donde no debían”.
“La crisis iniciada en el 2008 ha sido señalada por muchos especialistas internacionales como la “crisis de los países desarrollados”, ya que sus consecuencias se observan fundamentalmente en los países más ricos del mundo.” (159 words)

(Points for attention are italicised.)

Google

In economic crisis from 2008 to 2011 is known to the world economic crisis that began that year, which originated in the United States. Among the main factors causing the crisis would be the high prices of raw materials, the overvaluation of the product, a global food and energy crisis, high inflation and the threat of global recession around the world and a credit crisis trust and mortgage markets. The root cause of all crises as the Austrian theory of business cycle is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis arises from the fictitious credit expansion orchestrated by central banks, and has motivated entrepreneurs to invest where there were”.
The crisis that began in 2008 has been noted by many international experts as the “crisis of the developed countries”, since its effects are observed mainly in the richer countries of the world. (150 words)

Microsoft

The global economic crisis that began that year, originating in the United States is known by economic crisis of 2008 to 2011. Among the major causative factors of the crisis would be high prices of raw materials, the sobrevalorización of the product, energy and global food crisis, high global inflation and the threat of a recession around the world, as well as a loan, mortgage crisis and confidence in the markets. Caused by following every crisis according to the Austrian business cycle theory is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis stems from the fictional credit expansion orchestrated by central banks, and that has motivated entrepreneurs to invest where wrong”.
The crisis which began in 2008 has been brought by many international experts as the ‘crisis of developed countries’, already that its consequences are observed mainly in countries richest in the world. (150 words)

Google, improved

The economic crisis of 2008 to 2011 is the title given to the world economic crisis that began that year and originated in the United States. Among the main factors causing the crisis would be the high prices of raw materials, the overvaluation of the product, a global food and energy crisis, high inflation and the threat of global recession around the world and a crisis in credit, mortgages and market confidence. The root cause of all crises according to the Austrian theory of the business cycle is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis arises from the fictitious credit expansion orchestrated by central banks, and has motivated entrepreneurs to invest where they should not have done”.
The crisis that began in 2008 has been noted by many international experts as the “crisis of the developed countries”, since its effects are observed mainly in the richer countries of the world. (158 words)

BING, improved

The global economic crisis that began in 2008, originating in the United States, is known as the economic crisis of 2008 to 2011. Among the major causative factors of the crisis would be high prices of raw materials, the overvaluation of the product, a global food and energy crisis, high global inflation and the threat of a recession around the world, as well as a loan crisis, a mortgage crisis and loss of confidence in the markets. The root cause of every crisis, according to the Austrian business cycle theory is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis stems from the fictional credit expansion orchestrated by central banks, and that has motivated entrepreneurs to invest where they should not have done.”
The crisis which began in 2008 has been labelled by many international experts as the ‘crisis of developed countries’, since its consequences are observed mainly in the richest countries in the world. (161 words)

So, on the above evidence, both of these translation tools, Google and BING, offer a very useful BASIC – and lightning fast – FREE service for French and Spanish to English to millions of Internet users. (The situation of English INTO French and Spanish needs separate attention and may be dealt with in a future blog.)

For comparison, here are the results obtained from Yahoo’s Babelfish with its updated but ageing technology. Note the higher number of italicised items, and their nature.

1. French to English:
“The world financial crisis which started in 2007 is a financial crisis marked by a crisis of liquidity and sometimes by crisis of solvency as well on the level of the banks as of the States, and a rarefaction of the credit on the level of the companies. Started in July 2007, it finds its origin in the deflation of bubbles of price (of which the American real estate bubble of the years 2000) and important losses of the financial institutions caused by the crisis of the subprimes. C’ is the most serious crisis of l’ history of the purses of values, after that of 1873, rising from the banking crisis of May 1873. The financial crisis of l’ autumn 2008 amplifies the movement and causes a fall of the courses of the stockmarkets and the bankruptcy of several financial institutions. To avoid a systemic crisis, the States must intervene and save many banks what will cause a crisis of the national debt in Iceland d’ access then in Ireland.
In addition, it causes a recession concerning l’ together of planet. Public finances were heavily requested to solve this crisis. The public deficit s’ is dug in many countries, after a retreat of the world gross domestic product of 2,2% in 2009.”

2. Spanish to English

“By economic crisis from 2008 to 2011 it is known world-wide the economic crisis that began that year, originated in the United States. Between the main factors causes of the crisis they would be the high prices of the raw materials, the sobrevaluation of the product, world-wide an nourishing crisis and energetics, a high planetary inflation and the threat of a recession anywhere in the world, as well as a credit, hypothecating crisis and of confidence in the markets. The root cause of all crisis according to the Austrian Theory of the economic cycle is an artificial expansion of the credit. In words of Jesus Kitchen garden of Grove “this crisis arises from the fictitious credit expansion orchestrated by the central banks, and that have motivated that the industralists invested where they did not have”. The crisis initiated in the 2008 has been indicated by many international specialists like the “crisis of the developed countries”, since their consequences are observed essentially in the richest countries of the world.”
*

In a later blog, passages will be selected from two languages which are “more foreign” to English speakers, and for which less raw material has been available to the colossal Internet data banks on which Google Translate and Microsoft Translator rely for their lightning fast searches. The samples will be taken from Russian and Hindi, languages whose structures differ more fundamentally from English than do those of its familiar French and Spanish cousins.

Da svidanya. Phir milenge

(For a lighter and enlightening finish to this long essay, Google’s own explanation of its system is to be found here.)

Translation 31. David Bellos on Google Translate and Much Else

24 September 2011

In this article published in The Independent recently (which I found insightful but about which many of the 128 commenters expressed reservations and doubts), Professor David Bellos, a French literary translator, sings the praises of Google Translate, that extraordinary multilingual translation tool used by millions of web surfers. Bellos illuminates several points which many casual users of Google Translate may not have been aware of, including two which aroused my special interest.
1. English is the predominant “tool” in the Google operation. Because of its ubiquity in print, English provides Google with the major part of the input material on which its very complex translation operations are based.
2. Literary translators (like Bellos) play a quite important part in providing the essential raw material on which Google Translate relies.

Much more importantly, this thousand-word article is an extract from Bellos’s recently published book, Is That a Fish in Your Ear? Translation and the Meaning of Everything, published in the USA (by Faber) and more recently in the UK (by Penguin). (Check the Wikipedia page on David Bellos.)

I have just ordered a copy and will offer more comments when I receive the book.
*
(See also a much earlier article, with references, published by the New York Times.)

The Fragmentation of Information in Wikipedia

30 April 2009

(A preliminary analysis, with reference to Spanish Wikipedia’s multiple offerings on the Spanish Civil War / la Guerra Civil Española)


In the Spanish version of Wikipedia (which currently covers a wide range of 467,000 entries; the ‘senior’ English Wikipedia claims 2,859,000 items), there are multiple separate entries on the core subject, Spanish Civil War (Guerra Civil Española), an international encyclopedia topic which has been widely discussed for over 70 years and which, according to some estimates, has inspired 12,000 books and pamphlets in many languages.


The widely dispersed multiplicity of Wikipedia entries on many subjects is at least partly due to Wikipedia’s own intricate rules, prohibitions and recommendations and its faithful Users’ successful or unsuccessful adherence to them. For example, in specific advice to new wikipedian contributors, Wikipedia’s strong preference for short articles is stressed and an optimal length of 32 KB (i.e. about 1,000 words, or 2 and a half pages) is recommended. Another cause of data fragmentation is a process which Wikipedia itself terms ‘forks’ or ‘forking’. This splitting of a topic into various entries (with different titles) is the subject of a set of basic labyrinthine Wikipedia rules and analyses (in its English version). These are intended to demarcate the difference between a “Content fork” (two articles on one topic, particularly in cases of disagreements or similar difficulties among contributors and ‘referees’), which Wikipedia strictly forbids, and a POV (Point of View) Fork, which it recommends, notably in cases separating Critical aspects from the Topic itself. This often happens in topics where a set of political, religious or spiritual beliefs and activities is offered in one entry and any criticism of these beliefs and activities, or a description of the relevant Organisation, is relegated to a separate (and often alphabetically distant) entry. However, in addition to this sort of approved dilution of major (or controversial) topics, many unrecommended content forks also occur on Wikipedia, and remain there, without being deleted or fused with other major aspects, as Wikipedia expressly stipulates. (See WP:CFORK and WP:POVFORK.) A collateral consequence of these anomalies is that, to be more realistic, Wikipedia’s statistics for its total entries should be adjusted to take this bloating factor into account.


A further problem is that, unless very comprehensive linkage is provided between fragmented segments of information on a core topic (in a medium which offers instant direct hyperlinks), the encyclopedia reader will not have easy access to enough of these ‘forks’. This is precisely what seems to have occurred in the case of the multiple Spanish entries for the Civil War. Here the informational value of the sum of knowledge contributed is compromised by the inadequate number of links between an accumulation of well over one hundred related entries, especially between the major ones, often of the ‘Point of View’ type (for example, Terror Rojo en España, Represión franquista, Bando nacional (Nationalist), Bando republicano).

The rest of this brief article will present evidence gleaned from a survey of the information offered by the Spanish Wikipedia in relation to a very prominent and complex topic: ‘la Guerra Civil Española’.

List of Articles

Guerra Civil Española

This general article should be the longest and principal one, with adequate references and Hyperlinks to relevant related Wikipedia entries. Unfortunately, official action has been taken to freeze or mummify it in a ‘protected’ form, presumably to guard against the risk of vandalism, perhaps in the wake of the recent strong debates in Spain relating to ‘revisionism’ on the subject of the War, the participants, the antecedents and aftermath. Therefore, in protected entries, Wikipedia’s celebrated openness to all contributors is suspended until the protection is lifted by the ‘burócratas’ (trusted supervisors). In this case, it means that no changes can be made to improve the inadequate links to other articles and that the inexplicably inadequate ‘Bibliography’ of 5 items (only one of which is a major one) lowers the value of the entry and its use to readers. (The existence of this pathetic Bibliography in an otherwise lengthy and informative article is an interesting example of the weaker aspects of the otherwise fabulous Wikipedia project, which insists so strongly on the backing of reputable sources and one or two other problematical criteria.)

The diversity of many other segments and the presence and absence of direct links within the ‘Guerra Civil Española’ topic form the body of this article.

Francisco Franco

From the list of links offered above and below, the only ones given are: ‘Franquismo’ and ‘Simbología del Franquismo’.

Dictadura de Francisco Franco

Franquismo

Terror Rojo en España (Red Terror in Spain)
This includes a section on ‘Terror Blanco y Rojo’ and a few paragraphs in English, from Antony Beevor and Stanley Payne, probably from the English Wikipedia entry: ‘White Terror in Spain’.

Valle de los Caídos
(An unbalanced entry, with no links.)

Personajes relevantes de la Guerra Civil Española

Simbología del franquismo

Cronología de la Guerra Civil Española

Bando nacional (The Nationalist Side, i.e. The Franco Uprising)
Brief. Links to: ‘Guerra Civil Española’ and ‘Nacionalismo español’.

Bando republicano (The Republican Side, i.e. The variegated Supporters of the Left-wing Republican Government)
Equally brief. Links to: ‘G.C.E.’ and ‘Revolución española de 1936’.

Ofensiva de Cataluña

Guerra Civil Española en el País Vasco (… in the Basque Country)

There is also a considerable number of articles (short and long) on the war, battles in other regions of Spain, atrocities, victims, etc.

Wikipedia entries published during the current vigorous debate in Spain, since 2004

Since 2000, many revisionist books and some replies have been published in Spain (some of them are bestsellers) on different aspects of the Spanish Civil War, whose 70th anniversary was greatly celebrated by both ‘sides’ – and others – in 2006. Moreover, in the 2004 elections, the Socialist Party and its allies dramatically defeated the ruling nationalist conservative Partido Popular, a slightly ironic replay of 1934, two years before the outbreak of the Spanish Civil War.

Ley de memoria histórica de España
Old controversial claims have finally been promoted in this new law. Links: ‘Franquismo’, ‘Guerra Civil Española’ and ‘Víctimas de la Guerra Civil Española’.

Represión política en España
Published in August 2006 (apparently by Professor Ángel Luis Alfaro, one of the few wikipedians who do not hide behind a pseudonym). Links to: ‘Guerra Civil Española’ and ‘Franquismo’.

A part of this historical entry covers the Civil War. The Bibliography is brief, but interesting. A few items should be added. Unlike many other entries, this one offers many useful links to entries dealing with important topics of the Civil War.

Víctimas de la Guerra Civil Española
First published on 23 October 2005 by User ‘Nemo’. Links to all of the following:
Guerra Civil Española
Revolución social española de 1936
Depuración del Magisterio español tras la Guerra Civil Española
Causa General
Anexo:Mortalidad en la Guerra Civil Española, por inscripción en juzgados
Víctimas de la Guerra Civil en Navarra
Víctimas de la persecución religiosa durante la Guerra Civil Española
Masacre de Badajoz
Matanzas de Paracuellos
Crímenes del túnel de la muerte de Usera
Las checas
Víctimas de la Guerra Civil en Cantabria
Masacre de la carretera Málaga-Almería
Las Trece Rosas
Niños de Rusia
Represión política en España
Represión franquista
Exilio republicano

Víctimas de la persecución religiosa durante la Guerra Civil Española
Links to: ‘G.C.E.’ and ‘Revolución española de 1936’.

Depuración del Magisterio español tras la Guerra Civil Española
A long essay on an alleged Francoist postwar injustice, published on 29 January 2007 by an anonymous non-User. No links are given.

Causa General

An investigation into crimes committed during the “Red” occupation of Spain, ordered by Franco in 1940. A short stub, posted on 5 February 2008.

Represión franquista
First published on 13 September 2008. Its counterpart in the English Wikipedia is ‘White Terror (Spain)’ but this version is briefer. It offers a link with ‘Represión política en España’ but, because of the contents of ‘Terror Rojo…’ (see above), this entry appears to be superfluous and therefore in need of deletion.

Categorías y Anexos

These general ‘Categories’ and ‘Appendices’ offer links to further lists of entries, or to specific details relevant to the main topic: the Civil War in Spain. Among the latter is the following very recent item:

Anexo:Imputados en el auto de 16 de octubre de 2008 del Juzgado Central de Instrucción nº 005 de la Audiencia Nacional

This presents a list of 35 deceased top Francoist officials (including the ‘Caudillo’ himself) who were declared to be no longer legally responsible for illegal detention and crimes against humanity during the Civil War and Postwar periods.

The following Appendix is a painstaking gathering of data on Civil War deaths as recorded in Municipal Registries.
Anexo:Mortalidad en la Guerra Civil Española, por inscripción en juzgados

It was first published by User ‘Jorab’ on 20 November 2007. This Basque ‘wikipedista’ is an example of those dedicated individual contributors of data who supply the major part of Wikipedia information (in all languages), with a total of 9939 contributions to his credit, most of them on similar detailed aspects of the Spanish Civil War in his region. (All statistical details like Users’ numbers of contributions, dates, etc., as well as the contributions themselves, are carefully recorded and updated, and are instantly available from the Wikipedia system.)

Categoría:Guerra Civil Española
Another reference list of articles on the War.

Categoría:Franquismo
Another list of articles about Francoism.

Categoría:Batallas de la Guerra Civil Española
34 separate articles.

Categoría:Víctimas de la represión en la zona republicana
Articles on individual victims or groups of victims in the Republican Government zones.

Categoría:Víctimas de la represión en la zona franquista
Articles on individual victims or groups of victims of the Franco-held zones.

The above list may be of some use as a reading guide for the subject under examination, but it would have been more appropriate if Wikipedia had devised a better way of presenting its major or multi-faceted topics. As can easily be appreciated, the content of the above articles is encyclopedic in quantity but the Wikipedia way of arranging it and presenting it to Internet readers needs further refining.



Fluctuating Specifications for Online Encyclopedias

5 July 2008

by Brian Steel

By the mid-1990s, the information needs of the growing Internet market were being served not only by online versions of traditional commercial encyclopedias like the Encyclopedia Britannica, Grolier’s New Book of Knowledge and the World Book Encyclopedia but also by Microsoft’s vigorously marketed Encarta, which had begun to attract significant numbers of online customers. Both Encarta (http://encarta.msn.com) and Encyclopedia Britannica (www.britannica.com) have maintained a high online profile during the innovative cyber-developments to be described below. (Surfers may even browse the articles of the 1911 (E.B.) edition at http://www.1911encyclopedia.com.) A much more recent online presence, still in its beta stage, is the High Beam Encyclopedia (www.encyclopedia.com), which offers articles from the sixth edition of the Columbia Encyclopedia as well as newspaper and magazine references and a commercial backnumbers service for newspapers, magazines and journals.

At the turn of the century, Larry Sanger was hired by Jimmy Wales to organise the (short-lived) Nupedia project which was launched in March 2000. The aim was to attract encyclopedia entries from volunteer experts for eventual publishing as free content after peer review and approval. During the following twelve months of snail-pace progress, Sanger proposed to speed up the process with preliminary versions in wiki form (Wikipedia), involving voluntary contributions from any Users. Within a further year, this idea had produced such rapid progress that the original idea of articles by experts was discarded and Sanger left the company shortly afterwards. (Ironically, Wales would eventually be forced to reconsider and partially reinstate the experts theme into Wikipedia.)

In the intervening seven years, Wikipedia, financed by the non-profit Wikimedia Foundation and supported by the enthusiastic efforts of thousands of eager volunteers, has been experiencing exponential growth, not only in English but in many other languages. It currently dominates the online encyclopedia field. There is no denying that the 2 million entries of this seething online co-operative venture are of incalculable daily value to its millions of users as a quick free source of reliable data on basic factual topics. On other topics it has proved to be much less satisfactory. (See ‘Wikipedia’s Grudging Recognition of its Self-imposed Limitations’.)

However, with Wikipedia’s success have come many problems and controversies and subsequent necessary adjustments to its rigid structure. For example, the following new departure was announced in mid-2006 with reference to the German Wikipedia:

“The German Wikipedia is set to introduce editing restrictions that may spread to other language versions if successful. This involves identifying a set of “trusted users” and allowing only their changes to be instantly visible. New contributors’ work would be moderated by these users, who might be selected on the basis of how long they have been on the site and the number of their edits that have gone unreverted.”

(http://en.citizendium.org/wiki/Wikipedia)

Incidentally, the following brief extracts from an announcement about the German version of Wikipedia (with its 700,000 articles) are of particular interest to those used to the conditions under which the English Wikipedia has notched up over 2 million articles of varying quality:

“The German Wikipedia is different from the English Wikipedia in a number of aspects.

There are severe rules of relevancy. Contemporary people usually have to reach a high level of fame before an article on them is allowed. […]

Many controversial articles are protected for months and cannot be edited by unsubscribed or recently subscribed users.”

The progress of Wikipedia in its initial rigid form has produced a great deal of alternative Internet activity from disillusioned editors and critics, as well as from users and editors who have preferred to set up their own ‘forks’ or alternative Wiki encyclopedias to produce what they see as less inhibited or more permanent results. The most interesting of these forks are the following ones set up by non-English-speaking groups:

Germany: http://www.wikiweise.de

Russia: http://www.wikiznanie.ru

Spanish-speaking countries: wikilibre.org/index.php/

Perhaps the most interesting ‘fork’ is on display at http://en.wikipilipinas.org

WikiPilipinas is a very interesting but also very localised offshoot, launched in mid-2007. It deals with Philippine-related topics, is non-academic, allows original research and is not bound by the NPOV principle. It presently contains 57,000 articles.

“WikiPilipinas is an encyclopedia dedicated to anything and everything that matters to Filipinos and the Philippines. It is an encyclopedia of Philippine content and includes elements of an almanac, directory and community pages. A centralized repository of Philippine content, it is intended to serve Filipinos anywhere in the world. Wikipilipinas allows Filipinos to document themselves in a manner they deem proper, whether or not it agrees with what foreign sources say.”

A strong indication that the management of Wikipedia is getting tired of the growing intensity of public criticism and disparagement of its inflexible rules and the instability of some of its articles through constant changes or “edit wars” is the recently launched feeder project ‘Veropedia’ (http://en.veropedia.com), to which Wikipedia writers of ‘good’ articles can apply for their articles to be saved INTACT.

Officially, Wikipedia announces this late 2007 development thus:

“Veropedia is a free, advertising-supported Internet encyclopedia project launched in late October 2007.[1][2]

“The site is based around collaboration within Wikipedia, whereby Wikipedia articles that meet Veropedia’s reliability criteria are chosen by its editors, scraped, and then a stable version of the article is kept on Veropedia. Any improvements required for articles to reach a standard suitable for Veropedia occur on Wikipedia itself. This model is intended to provide benefits to both projects with Wikipedia providing a large amount of free content suitable for potential improvement, and Veropedia contributors providing improvements and fact-checking within Wikipedia articles.[1][2][3]

“As of April 2008 the site, still in beta, has checked and imported over 5700 articles[4] from the English Wikipedia into its public database.[5] Although Veropedia intends to eventually support itself completely through advertising as of January 2008 the project is run mainly from personal savings, investments and loans of those involved in the project.[6]”

This novel choice for seasoned Wikipedian editors is openly solicited by a Wikipedia User named Moreschi (and perhaps other editors) with some of the Wikipedia edits he makes. They link (through a superscript hyperlink) to the following announcement:

“If you’ve written a quality article, here’s a suggestion about how to save your stable, quality version, and preserve it from vandalism, spam, POV-pushing, and the addition of inaccuracies that so often decrease the quality of Wikipedia articles over time. Want to really preserve your classy work for humanity? See it expert-reviewed? Get it uploaded to Veropedia (FAQ, see also my user and talk pages.)! You don’t have to do this yourself; though we welcome new contributors, if you feel you haven’t got the time, simply send an email to us suggesting your article as suitable for upload, or any other you might know of that you think good enough. To do this, go to the Main Page that I linked to above, put your mouse on the Contacts tab, and click “Suggest an article”. Cheers, Moreschi Talk 13:38, 10 November 2007 (UTC)”

Moreschi also makes this astonishing further comment on his Wikipedia User page:

“Veropedia is going to be taking up more and more of my time, and I would encourage others who care about Wikipedia’s articles to join in with our efforts there and help out. The best of my work has already been saved there as a quality stable version, and for that reason I do not regret the time I have spent on Wikipedia since March 2006. I’m not overly optimistic about Wikipedia’s condition at the moment, but it is not beyond repair. All it would take is for more to understand that truth is a woman, and she will not let herself be assailed with the cold bludgeons of policy.”

“I will still do everything important: contribute regularly to articles, put in the hours at the coalface at the Opera Project, write and discover quality content for Veropedia.”

The recent appearance of rival fledgeling Web encyclopedia Citizendium and Google’s announced Knol project (still under wraps since December 2007) add further incentives for incorporating greater flexibility into the Wikipedia system.

Citizendium [= Citizens’ Compendium], a project in preparation since 2005 by Larry Sanger.

Launched in early 2007, with the laudable aim of providing expert contributions under contributors’ own names, Citizendium also announced a feeder project called Eduzendium (proposed by Professor Sorin Matei) which would harness the talents of doctoral candidates. In spite of these attractive proposals, the project was not received very optimistically by experts as diverse as Professor Clay Shirky (a Wikimedia advisory board member) and Nicholas Carr, an eloquent critic of Wikipedia. After just over a year of publishing, the progress of this new online encyclopedia (with a non-charismatic name) does not seem very encouraging in terms of properly finished and approved articles.

Knol Web Encyclopedia

Announced in late 2007 by Google, also to consist of expert and peer-reviewed unalterable articles. Apart from one sample published, its initial work has so far been conducted in secret.

At the same time others have been setting up their own Wikipedia-derived encyclopedias and specialist groups have begun to offer restricted wiki-type encyclopedias.

Scholarpedia

A very serious scholarly restricted scientific Wiki, of value to specialists.

Conservapedia

Also aimed at a restricted audience, this wiki-based encyclopedia is written “from a socially and American Conservative Christian viewpoint” in order to counter a perceived “liberal, anti-Christian and anti-American bias” in Wikipedia. Its editorial policies are guided by the “Conservative Commandments”.

Also to be taken into consideration in a survey of online encyclopedias are the offerings of those organisations which (in accordance with Jimmy Wales’s expressed philosophy and wishes) have copied Wikipedia, or parts of it, to their own websites, some of which permit further editing by visitors. Since each site downloads the copies at different times, they enshrine versions of Wikipedia articles which may subsequently undergo significant amendments. Such cyber-debris may therefore be misleading, or may preserve fossilised versions of controversial Wikipedia articles which have (long) since been ‘reverted’ in ‘edit wars’.

Caveat lector! (Online Encyclopedia readers should take care!)

PS If all of these bewildering sources of information become too much, it may be time for a brief ‘R & R’ visit to Uncyclopedia (hosted by Daniel Brandt).

Wikipedia’s Grudging Recognition of its Self-imposed Limitations. An Internet Case Study

26 April 2008

(This is a long blog, offering a digest of the important opinions of sincere and reputable commentators. Your patience and indulgence will be rewarded.)

Recent and foreseeable changes in the Wikipedia modus operandi are, in fact, a belated acknowledgement of the validity of the unrelenting pressure from its many articulate and brave critics. The changes and the reasons for them are also a further encouraging proof of the existence of what one Wikipedia critic, Andrew Orlowski, has dubbed “collective intelligence”, which must surely be seen as a complete antonym for the much-touted ‘Wisdom of Crowds’ (which leads, inevitably, to the creation and popular success of projects like Wikipedia in its present flawed form).

Sources of instant enlightenment on this ongoing Internet controversy:

Wikipedia itself dutifully chronicles 25 pages of criticisms:

http://en.wikipedia.org/wiki/Wikipedia_criticisms

Major sites and individuals critical of Wikipedia:

www.wikitruth.info (A Wiki-based site)

http://www.wikipedia-watch.org (Daniel Brandt)

www.theregister.co.uk (Andrew Orlowski)

www.wikipediareview.com (forums especially for disaffected Wikipedians)

http://uncyclopedia.org (Daniel Brandt – a brilliantly satirical Wiki-based site)

http://ascii.textfiles.com/archives (Jason Scott)

Also: ‘Criticisms of Wikipedia – A Compendium’, 4 January 2008, by The Review:

“(This post was submitted to the forum by The Review’s resident Troubleshooter, Gomi, on January 1, 2008)”

“Gomi: For the New Year, I decided to attempt to compile a list of Wikipedia Review’s criticisms of Wikipedia. I have tried to approach this broadly — I don’t agree with all of these myself, but this is my view of the complaints that come up over and over again. One thing that is clear, after looking at Wikipedia for several years, is that these problems are not getting better, they are getting worse.”

*

Jason Scott

An ex-contributor to Wikipedia, information specialist Scott boldly and perceptively articulated many serious claims in his lecture, ‘The Great Failure of Wikipedia’ on 19 November 2004 (three years after the launch of Wikipedia). In response to an avalanche of Internet correspondence, including the sort of abuse often directed at “apostates” and whistle-blowers, Scott followed this lecture with two other important contributions in 2005, and a further one in 2007 (on the extraordinary and revealing Essjay scandal).

Here is Scott’s spectacularly vernacular verdict on Wikipedia’s performance:

“This is what the inherent failure of wikipedia is. It’s that there’s a small set of content generators, a massive amount of wonks and twiddlers, and then a heaping amount of procedural whackjobs. And the mass of twiddlers and procedural whackjobs means that the content generators stop being so and have to become content defenders. Woe be that your take on things is off from the majority. Even if you can prove something, you’re now in the situation that anybody can change it.”

(Jason Scott (Sadovsky) ‘The Great Failure of Wikipedia’ (19 November 2004) http://ascii.textfiles.com/archives/000060.html)

On Wikipedia as a concept and on the Wikipedia NPOV doctrine:

“Neutral Point of View is a doctrine about how Wikipedia articles should be written. Like wikipedia itself, it is a great idea in theory. In application, of course, it turns into yet another hammer for wonks and whackjobs to beat each other and innocent bystanders.”

(Jason Scott (Sadovsky), ‘The Great Failure of Wikipedia’ (19 November 2004) http://ascii.textfiles.com/archives/000060.html)

On the Open system model:

“I should mention that I’ve actually spent several years doing work for an organization, using software that is, basically, a Wiki. However, there’s only about 12 of us with access, and of the 12 maybe 6 are frequent contributors… And I thought this is how they all were. We just didn’t get in each others’ way. It was quite a shock to be on Wikipedia.”

(Jason Scott (Sadovsky) ‘The Great Failure of Wikipedia’ (19 November 2004) http://ascii.textfiles.com/archives/000060.html)

This is also a fundamental point by critic Nicholas G. Carr:

“… the open source model — when it works effectively — is not as egalitarian or democratic as it is often made out to be. Linux has been successful not just because so many people have been involved, but because the crowd’s work has been filtered through a central authority who holds supreme power as a synthesizer and decision maker.”

(‘The Ignorance of Crowds’, May 2007
http://www.strategy-business.com/press/enewsarticle/enews053107?pg=all&tid=230)

Scott reiterated and clarified his position after intense Internet debate on his writings:

“My primary disagreement with Wikipedia’s approach is not about expertise, accuracy or quality; it is about procedure energy dispersal […]. […] my issues as stated in my previous essay were not about whether Wikipedia was in competition with other reference sources, but how minor procedural decisions have essentially doomed it on its own.”

“As an off-the-cuff example, Wikipedia has a login system, wherein for free and with no effort, you become a “Person”, an entity with a name and a history and even your nice little page that you can use to build a fun little world of pictures and information about your work on Wikipedia. It is essentially effortless, and it is pretty easy to create a mass of user accounts and foment your opinions in votes and other situations. […]”

“… they allow totally anonymous full-content editing by random users. In other words, no accounting at all. People don’t even have to submit to a rubber-stamp login process to begin screwing with entries that someone may have just spent hours getting just right. […]”

[…]

“Wikipedia has a large contingency of users who play the Wikipedia Rules of Etiquette and Procedure like they were Role Playing Games and function within them causing havok and personal gratification at the expense of moving the project forward.”

“Academic review, experts vs. non-experts, use of Wikipedia as a replacement encyclopedia, and other such high-level concerns are way down the road and not my concern; my concern, and ultimately the reason why I have stopped contributing to the project (and why many others have, too) rests in aspects much closer to Wikipedia’s core.”

On Wikipedians’ reactions to criticism (of particular interest to ‘whistleblowers’ and those involved with illuminating the murky world of cults and fundamentalist organisations):

“Some days, I feel like I should have never written anything about Wikipedia, positive or negative. Like many cults, it has extreme members or well-meaning folks who do not understand what they are part of, and who take me on personally and then fall back into the ranks should I respond poorly. Some of them, should I respond within the confines of Wikipedia, point to the rules of discourse on Wikipedia and how I am breaking them.”

(Jason Scott 3 Jan 2005 http://ascii.textfiles.com/archives/000067.html)

A few months later:

“… I will rest my case on a single entry: That of the Swastika.

Here, contained in one entry, is everything that I have issues with regarding the implementation of Wikipedia as it currently stands with its rules. […]”

“With over 1,500 edits done to this entry over its 3 year lifespan, the process of becoming even slightly familiar with the editing pattern could be a full day’s work. […]”

[…]

“The story of the swastika’s entry continues after this, for over 1,200 edits. Dozens of people are involved, lots of facts are lost, many are gained… and you would be hard, hard-pressed to show why many of these folks should be editing the Swastika entry in the first place. Calling this “open source” and comparing it to programming projects is borderline insane: open-source programming projects have a core team with goals in mind that they state clearly, who then decide what gets in and what does not get in. Sometimes this works, and sometimes it does not, but people with anonymous IPs can’t just come in and fundamentally redo the graphics code on the program and then disappear, never to be seen again.”

(Jason Scott, 3 May 2005, ‘Swastipedia’, http://ascii.textfiles.com/archives/000100.html)

*

In October 2005, Andrew Orlowski contributed the following opinions:

“Encouraging signs from the Wikipedia project, where co-founder and überpedian Jimmy Wales has acknowledged there are real quality problems with the online work.

Criticism of the project from within the inner sanctum has been very rare so far, although fellow co-founder Larry Sanger, who is no longer associated with the project, pleaded with the management to improve its content by befriending, and not alienating, established sources of expertise. (i.e., people who know what they’re talking about.)”

[…]

“ Traditionally, Wikipedia supporters have responded to criticism in one of several ways. The commonest is: If you don’t like an entry, you can fix it yourself. Which is rather like going to a restaurant for a date, being served terrible food, and then being told by the waiter where to find the kitchen.”

[…]

“ Thirdly, and here you can see that the defense is beginning to run out of steam, one’s attention is drawn to process issues: such as the speed with which errors are fixed, or the fact that looking up a Wikipedia is faster than using an alternative. This line of argument is even weaker than the first: it’s like going to a restaurant for a date – and being pelted with rotten food, thrown at you at high velocity by the waiters.”

[…]

“Re-working Wikipedia so it presents the user with something minimally readable will be a mammoth task. Although the project has no shortage of volunteers, most add nothing: busying themselves with edits that simply add or takeaway a comma. These are housekeeping tasks that build up credits for the participants, so they can rise higher in the organization.”

“And Wikipedia’s “cabal” has become notorious for deterring knowledgable and literate contributors. One who became weary of the in-fighting, Orthogonal, calls it Wikipedia’s HUAC – the House of Unamerican Activities prominent in the McCarthy era for hunting down and imprisoning the ideologically-incorrect.”

[…]

“One day Wikipedia may well be the most amazing reference work the world has ever seen, lauded for its quality. But to get from here to there it will need real experts and top quality writing – it won’t get there by hoping that its whizzy technical processes remedy such deficiencies. In other words, it will resemble today’s traditional encyclopedias far more than it does today.”

(Andrew Orlowski, ‘Wikipedia founder admits to serious quality problems. Yes, it’s garbage, but it’s delivered so much faster!’ http://www.theregister.co.uk/2005/10/18/wikipedia_quality_problem/page2.html)

*

On the members of the Wikipedia community:

A satirical definition from http://www.wikitruth.info:

“What is a [serious] Wikipedian?

“You can set up a user account, start editing everything you can find, enmesh yourself into the politics, the lameness, the backstabbing and moronity, and fight an ever-present desperate whirlpooling battle of contract law, miserable personalities and microscopic anal details. You can run out of additional information to add to subjects you know, and instead tunnel deep into shit you don’t have the slightest notion about, using your intense knowledge of Wiki-jargon and gaming the system to fight every bastard who tries to change an article in a way you don’t agree with, or which might have any information you’re unable to garner in the first 5 matches of a Google search.” (www.wikitruth.info)

In its wiki article on Wikipedia, http://www.sourcewatch.org makes this critical point:

“Although experts on a subject may edit a page, they ultimately have no more control over the content of that page than anyone else. Contributors with unique knowledge of unusual subjects may be mistrusted by editors with general knowledge, or to put it less diplomatically, little or no knowledge, who rely on searches of other Internet sites to review new information. Administrators or editors might analyze writing skills or rely on opinions about a contributor to inform decisions when they have no knowledge of the subject of an article, or on a poll of individuals as ill-informed about the subject at hand as they are, themselves.”

Sam Vaknin adds:

“Lacking quality control by design, the Wikipedia rewards quantity. The more one posts and interacts with others, the higher one’s status, both informal and official. In the Wikipedia planet, authority is a function of the number of edits, no matter how frivolous. The more aggressive (even violent) a member is; the more prone to flame, bully, and harass; the more inclined to form coalitions with like-minded trolls; the less of a life he or she has outside the Wikipedia, the more they are likely to end up being administrators.”

(Sam Vaknin, ‘The Six Sins of the Wikipedia’, 2 July, 2006,
http://www.americanchronicle.com/articles/viewArticle.asp?articleID=11109)

A striking example of the many Wikipedia scandals of recent years, unearthed because of the persistence of a critic and the over-confidence of a prominent Wikipedian administrator with brazenly false credentials:

In July 2006, following a fascinating feature article on Wikipedia in The New Yorker by Stacy Schiff, Daniel Brandt posted this letter to the critical forum wikipediareview.com:

“Who is Essjay? I would love to ID this guy. I think he’s notable enough for his own biography.
He says that his username derives from his initials, S.J. That would suggest that his first and middle name, or first and surname, start with S and J. But it hasn’t helped my search.
He’s between 30 and 45, and teaches theology to undergrads and grads. He’s a tenured professor. He says that he teaches at a private university in the northeastern U.S., but I have my doubts about this also.
He says he has these degrees: Bachelor of Arts in Religious Studies (B.A.), Master of Arts in Religion (M.A.R.), Doctorate of Philosophy in Theology (Ph.D.), Doctorate in Canon Law (JCD)
I’ve searched on his degrees, and I’ve looked at religion-department faculty lists in the northeast by using this resource. No clues.” […]

(http://wikipediareview.com/index.php?showtopic=2778&mode=threaded)

Six months later, Brandt’s suspicions were confirmed. In February 2007, Wikipedia’s credibility suffered a further body blow when his evidence and an announcement in The New Yorker revealed that one of Wikipedia’s prominent administrators (recently promoted to be a salaried Wikia employee) did not possess the tenured professorship and four academic degrees that he had claimed on his User page and to the journalist Stacy Schiff. After further internal investigation and discussions revealed that the 24-year-old “Essjay” (with a tally of 16,000 edits) had used the prestige of his false credentials in edit disputes, he was eventually asked to resign by Jimmy Wales.

*

On Signs of Change in the System, Nicholas G. Carr:

“Aware of Wikipedia’s flaws, Wales and other contributors have been trying hard to improve the quality of the site’s content. A management team has slowly been taking shape, and it is establishing editorial policies and policing contributions. But even though this nascent hierarchy has already become much more bureaucratic than Linux’s lean managerial structure, it hasn’t yet been able to substantially improve Wikipedia. The failure appears to stem from the makeup of the supervisory group. Whereas the Linux team is a strict meritocracy, Wikipedia’s administrators represent a broader mix of contributors. They’re often chosen on the basis of how much they’ve contributed or how long they’ve contributed rather than on the quality of their contributions or their editorial skill. It seems fair to say that although the bazaar should be defined by diversity, the cathedral should be defined by talent.”

(‘The Ignorance of Crowds’, May 2007 by Nicholas G. Carr
http://www.strategy-business.com/press/enewsarticle/enews053107?pg=all&tid=230)

Nicholas G. Carr’s much earlier brief analysis of Wikipedia is also highly instructive and much wider-ranging.

“In theory, Wikipedia is a beautiful thing – it has to be a beautiful thing if the Web is leading us to a higher consciousness. In reality, though, Wikipedia isn’t very good at all. Certainly, it’s useful – I regularly consult it to get a quick gloss on a subject. But at a factual level it’s unreliable, and the writing is often appalling. I wouldn’t depend on it as a source, and I certainly wouldn’t recommend it to a student writing a research paper.”

(Here Carr gives a critique of two flawed Wikipedia articles (on Bill Gates and Jane Fonda). His analysis was so accurate that Jimmy Wales later admitted the need for improvements.)

Carr continues:

“The promoters of Web 2.0 venerate the amateur and distrust the professional. We see it in their unalloyed praise of Wikipedia, and we see it in their worship of open-source software and myriad other examples of democratic creativity. Perhaps nowhere, though, is their love of amateurism so apparent as in their promotion of blogging as an alternative to what they call ‘the mainstream media’.”

(Nicholas G. Carr, ‘The amorality of Web 2.0’, www.roughtype.com/archives/2005/10/the_amorality_o.php)

*

Transcending the lessons offered by the case of Wikipedia, Carr’s magisterial conclusion to this important essay deserves the widest attention and diffusion in this increasingly ‘amoral’ cyberworld:

“Like it or not, Web 2.0, like Web 1.0, is amoral. It’s a set of technologies – a machine, not a Machine – that alters the forms and economics of production and consumption. It doesn’t care whether its consequences are good or bad. It doesn’t care whether it brings us to a higher consciousness or a lower one. It doesn’t care whether it burnishes our culture or dulls it. It doesn’t care whether it leads us into a golden age or a dark one. So let’s can the millennialist rhetoric and see the thing for what it is, not what we wish it would be.”

(Nicholas G. Carr, ‘The amorality of Web 2.0’, www.roughtype.com/archives/2005/10/the_amorality_o.php)

See also: Fluctuating Specifications for Online Encyclopedias

Etymology, and False Etymology as a Rhetorical Device

13 April 2008

Etymology: “An account of, or the facts relating to, the formation or development of a word and its meaning; the process of tracing the history of a word. The original meaning of a word as shown by its etymology” (Shorter Oxford English Dictionary). For the English language, a majority of etymologies refer to origins in Old English, Germanic languages, French, Latin and Greek. The origins of the word ‘etymology’ itself are to be found in two Greek roots: ‘etymon’ (true) and ‘logos’ (word).

It is not essential to know anything at all about etymologies. Most people survive and prosper without even knowing what the word means. Nevertheless, such knowledge (or where to find it: in reliable dictionaries) often proves to be very useful or indispensable to those who deal closely with (or are interested in) the words of a language. An etymological consultation can also help to avoid serious errors and misunderstandings (and sometimes misleading pronunciations). For example, the differences in meaning between the visually and aurally similar ‘manually’, ‘manly’ and ‘manic’ are easily explained by their etymologies: respectively from a) the Latin word for hand, b) ‘man’, and c) Greek ‘mania’. Similarly, any suspicion of a common relationship between eschatology / eschatological and scatology / scatological can quickly and safely be dispelled by noting the different Greek roots from which the eschat- (last) and scat- (dung) parts are derived.

The Spreading of False Etymologies

In about 630 CE, a Catholic Archbishop named Isidore of Seville published an important encyclopedic series of books in Latin. This reference work continued to be consulted by European Latin scholars for several centuries. In The Etymologies of Isidore of Seville, a recent English translation of this major ecclesiastical work by Stephen A. Barney and three other scholars (Cambridge University Press, 2006), the legendarily poor quality of many of the etymologies offered is stressed and suitable samples are offered:

“Horses (equus) are so called because when they were yoked in a team of 4 they were balanced (aequare).” and “Humus (humus) was the material from which the human (homo) was made.” (I quote from a review by Emily Wilson.)

Another excellent example of how badly Isidore dealt with this minor aspect of his magnum opus (because of unreliable sources and, perhaps, lapses in research rigour) is offered by Adrian Murdoch on his typepad blog:

“The walking stick [baculus in Latin] is said to have been invented by Bacchus, the discoverer of the grapevine, so that people affected by wine might be supported by it.”

Isidore’s, er, habit has nevertheless prospered in recent eras and in specific areas. There is some interesting evidence that etymological explanations seem attractive as a rhetorical device to prove a point, particularly in preaching, but also in other areas. If the promoters of beliefs are trusted by their readers or audience, impressive-sounding etymological proof will usually be accepted without demur, even if demonstrably false (‘false etymology’). In his book on cults, the Reverend Stephen Wookey refers to research which demonstrates that such preachers and orators use inaccurate quotations and false etymologies to make a point. He quotes a blatant example by Mary Baker Eddy, the founder of Christian Science.

“The word Adam is from the Hebrew adamah, signifying the red color of the ground dust, nothing new. Divide the name Adam into two syllables [in English!] and it reads, a dam, or obstruction … it stands for obstruction, error, even the supposed separation of man from God and …” (Wookey, p. 338, from line 12 on). Baker Eddy goes on in similar vein, telling us all the negatives that poor Adam “stands for” for half a page. As Rev. Wookey comments: the Hebrew meaning is simply: man.

One of the clumsiest attempts at etymology for religious indoctrination purposes must surely be the one reported by William J. Petersen (Those Curious New Cults, New Canaan, Connecticut, 1973, p. 115). According to Petersen, one of the beliefs subscribed to by members of Herbert W. Armstrong’s Worldwide Church of God was that the British and the Americans are descended from the so-called Lost Tribes of the ancient Jewish people. As one of his ‘proofs’ of this peculiar assertion, Armstrong suggested that the word ‘Saxon’ was derived from ‘Isaac’s sons’.

Armstrong’s false etymologies are also dealt with in an easily accessible article, ‘The “Lost Tribes” of Herbert W. Armstrong’, in Catholic Answers Magazine: www.ewtn.com/library/ANSWERS/LOSTRB.htm.

Apparently, to further his thesis that the Lost Tribes settled in Britain and America, the preacher wrote a further piece of blatant ‘etymological’ indoctrination:

“The House of Israel is the ‘covenant people’. The Hebrew word for ‘covenant’ is brit. And the word for ‘covenant man’ or ‘covenant people’, would therefore sound, in English word order, ‘Brit-ish’ (the word ish means ‘man’ in Hebrew, and it is also an English suffix on nouns and adjectives). And so, is it mere coincidence that the true covenant people today are called the ‘British’? And they reside in the ‘British Isles’!”

And Armstrong’s disciples swallowed the false etymologies.

The Indian guru, Sathya Sai Baba, has also made frequent use of etymologies as a teaching tool. Many of these are unconvincing except to his unquestioning devotees, who consider him to be Omniscient (and he himself has made that claim). For example, SSB has offered his devotees an idiosyncratic etymology of the ‘Sai’ part of the name that he assumed in 1943: ‘Sai Baba’, from Sai Baba of Shirdi – the Muslim/Hindu saint who died in 1918 – whose reincarnation he claimed to be:

“Sa means ‘Divine’, ai or ayi means ‘mother’ and Baba means ‘father’. The Name indicates Divine Mother and Father …” (Sathya Sai Speaks, Vol. XII, 38:229. These Discourses are translated from Telugu and edited by the Sathya Sai Organisation)

On the real etymology of the original Sai Baba, scholars seem to be agreed. As Kevin R. D. Shepherd writes: “Sai is not a Hindu name, but a Persian word indicative of a holy man. It seems to bear an affinity with the Arabic sa’ih, which in the early medieval era of Islam was used to designate itinerant ascetics of sufi background. It appropriately reflects the Muslim background of the subject. ….” (KRDS, 1986, Chapter 2). See also Sathya Sai Baba’s Claim to be the Reincarnation of Shirdi Sai Baba

In addition to other inventive Sanskrit etymologies for words like Bhagavan, Guru, Hindu, Krishna, etc., Sathya Sai Baba, the alleged polyglot, has occasionally exercised his imagination on foreign terms. For example, here is one of his etymological explanations of Salaam (which most people know as the Arabic greeting: ‘Peace’).

“The Muslims use the term Salaam as a form of greeting. What does the word mean? ‘Sa’ refers to Sai, the Lord who is the embodiment of Truth, Awareness and Bliss (Sat-Chit-Ananda); ‘la’ means ‘layam’ (mergence). Salaam means merging in the Supreme, who is also the embodiment of Truth and Bliss.” (Sathya Sai Speaks, Vol. XVIII, 30:187)

Notice that in this example, SSB arbitrarily reduces ‘Salaam’ to ‘Sa’ plus ‘la’ (= ‘Sala’) to fit in with his extraordinary self-promotional interpretation.

(I have reported his different etymologies for Allah elsewhere on the Internet.)

More recently, a few writers of highly controversial works on history and archaeology (especially on the Internet) have also shown a preference for creative etymologies and other plays on words and names in order to support their contentious theses. (For the use of False Etymology in politics and propaganda, see the corresponding article in Wikipedia, to which this blog piece may be considered a supplement, at least by non-Wikipedians who do not reject the fruits of personal research.)

Gene D. Matlock, in yet another book on the lost Atlantis, puts forward the theory that there was an Atlantis in or close to Mexico. Part of his proof seems to be that there were Mexican “Sanskrit” place names like Atlán, Tlan or Tollán and that their inhabitants were called Atlantecas. (Those ‘Sanskrit’ names look like ordinary Mexican indigenous names.)

The author of a sensational best-selling book about a putative Chinese fleet which circumnavigated the globe in 1421-1423 (Gavin Menzies) offers as one of his exhibits news of an alleged inscription found in the Cape Verde islands. Menzies apparently attributes this to the Chinese Admiral Zheng He, but a critic (www.1421exposed.com) reveals (among many other inconvenient details) that Menzies himself admits that the inscription turned out to be written in the southern Indian language, Malayalam.

And finally, for now, a much-argued Internet thesis that there is a connection between Abraham and his wife Sarah and the Hindu god Brahma and his consort Saraswati (“Sarai-svati” in this case) seems to have foundered on Wikipedia for lack of solid evidence and partly because “a major hole in this hypothesis is that Hebrew is not an Indo-European language, and that the etymologies for each word [offered as proof] are fairly different.”

(See Wikipedia Discussion page for ‘Brahma’; User: ‘Gizza’.)