
Translation 36. Free Internet Translation Software: The Contest between Google Translate and Microsoft’s BING Translator. Russian and Hindi

13 June 2012

In my article Translation 33, I attempted a rough assessment of the efficiency of free online translation software offered by Google, Microsoft (BING), and the venerable Yahoo Babel Fish.
In that test both Google and Microsoft proved competent, at this general level, in translating French and Spanish into English. My stated next step was to check online translation from languages with a different script and/or syntax. I chose Russian as an example of a different script (Cyrillic), and Hindi as an example of both a different script (Devanagari) and a different syntax. That is what this new article attempts, using short extracts from the Russian and Hindi Wikipedias.

A preliminary and very topical comment: Yahoo’s Babel Fish can no longer be included here. As of 30 May 2012, it has been either subsumed into or replaced by BING Translator, as indicated in the following recent note from Microsoft:
“We are pleased to welcome Yahoo! Babel Fish users to the Bing Translator family. We have been working closely with our friends at Yahoo! to make this an easy transition, and Bing Translator is a natural upgrade to the experience with Yahoo! Babel Fish. We support all the languages you used with Babel Fish, and provide a superset of all the features.”

Let us now look at the Russian-to-English situation. The piece chosen is a 172-word extract from the Russian Wikipedia article on the Indian writer, intellectual and activist Arundhati Roy (section: Политическая деятельность (Politicheskaya dyeyatyel’nost’), ‘Political activities’). The style is simple. Here are the two translations for comparison:

Google Translate: (http://translate.google.com)
Subsequently, Arundhati Roy has used his celebrity to draw public attention to important political issues. In a number of essays and speeches, it is opposed to nuclear weapons in India and neighboring Pakistan, as well as against Indian nationalism [citation needed 537 days]. She also took part in protest actions against the dam project on the Narmada River, as such projects are usually at the expense of the earth’s poorest and marginalized populations. Due to its popularity, Roy was able to attract the attention of national and international media to these problems.

Arundhati Roy’s literary activity is completely focused on illumination and critique of political and social themes. She opposed the so-called “war on terrorism” launched by the United States against the war in Iraq and against the policy of the World Bank and WTO. With its rigid stand it has become one of the best known environmental activists, anti-globalization movement and the peace.

In 2002 the High Court in Delhi has sentenced her to prison because she blamed the judges that they wanted to suppress the protests against the construction of a dam on the Narmada River. However, the symbolic conclusion was only one day. (196 words)

The Microsoft BING version: (http://www.microsofttranslator.com)

Subsequently, Arundhati Roy had used their celebrity to draw public attention to important political problems. In a number of essays and speeches, she spoke out against nuclear weapons for India and neighbouring Pakistan, as well as against Indian nationalism. She also participated in the protest activities against the dam project on the River Narmade, as such projects are carried out, usually at the expense of the poor and disenfranchised. Due to its popularity Roy was able to draw the attention of the national and international MEDIA.

Arundhati Roy completely literary activity focused on reporting and criticism of the political and social order. She opposed the so-called “war on terrorism” launched by the United States against the war in Iraq, as well as against the policies of the World Bank and the WTO. Thanks to its unwavering position it has become one of the most prominent environmental activists, peace and antiglobalizacionnogo movements.

In 2002, the Supreme Court in New Delhi sentenced her to jail because she accused judges that those would like to suppress protests against the construction of a dam on the River Narmade. However, the symbolic conclusion was only one day. (197 words)

As with the earlier passages from French and Spanish, both Google and Microsoft produce an English version that makes good general sense, without reaching the standard of a professional translation. In spite of obvious flaws and errors, both are useful to readers who do not speak Russian. A more detailed comparison of the translations with the Wikipedia original will be of particular interest to professional translators and experts in Machine Translation (MT).

It would therefore seem reasonable to hypothesise that the Cyrillic script is no special obstacle to “Western” Machine Translation. In the same way, it is reasonable to state that, for those learning Russian as a second language (at least Europeans), the script presents no major problems in reading or writing.

We can now move on to compare the performances of Google and Microsoft on an extract from Hindi Wikipedia (201 words; 10 sentences) about the same writer and activist (section: kraaNtikaaree vichaar, ‘Revolutionary idea’). Warning: readers may be bemused or irritated by these results.

Google

From U.S. imperialism, nuclear arms race, building dams on the Narmada, etc. Many
Local – international issues have been a voice against Arundhati Roy. But now
He believes that at least non-violent protest and civil disobedience in India
Movements is not the point.
Parliamentary system, part of the Communists and who believe in violent resistance
Arundhati admits that he caught in the Maoist ideology of Gandhi fan
Are not. In his words: – “After all, Gandhi was a superstar. When they are hungry – used to strike the
The hunger – strike were sitting on the superstar. But I do not believe in superstar politics.
If the people of the slums, hunger – strike that, no it does not care. ”

Arundhati Bajharwad that go on flowing in the flow of voices of protest in India
Being ignored. Against anti-people system, judiciary and media
Efforts have failed to impress. He said, “I understand that for us
Is very important to consider where we are right, wrong and where. We have given the arguments
They are right … But the violence did not work. ” (184 words)

Microsoft BING

US imperialism from the nuclear arms race, the Narmada dam, construction etc

Local-international issues that Arundhati Roy-voice ascendant are. But now
He believes in non-violent protests at India and civil disobedience
Movements not talk.
Became part of the parliamentary system and violent resistance in the count Communists
The Maoists of ideologies in which Arundhati admits that Gandhi’s andhabhakt
Are not. In their words-“end Gandhi a superstar. The hunger-strike, so they
The hunger-strike on superstars. But I do not mind you in the superstar.
If the people of a slum, a hunger-strike that it doesn’t care. ”
Arundhati believes that going away bazarvad flow-down of vowels in India
Unheard. janvirodhi system-the judiciary and media
Efforts have failed to impress. He said, “I think for us
It is important to consider where we are great, and where the wrong right. We gave arguments
They are right … But nonviolence is not effective.” (150 words)

These unsatisfactory performances (which, in my experience, are neither uncommon nor unrepresentative) clearly need much more attention and comment than the Russian translations above, or the French and Spanish ones. Machine Translation has much more work to do before it can achieve satisfactory translations from Hindi (and some other languages) into English.

From a reading of the English and without any reference to the original, the best that can be said of the translations is that they give glimpses of the subject material but they are not very useful. One can also see that the syntax is disjointed, many sentences are incomplete, and some references are inaccurate. In both Google and Microsoft versions all lines begin with a capital letter (which suggests a new sentence is beginning). From a comparison with the original one may add that the translations also offer some false information or impressions, as well as obvious problems with vocabulary identification and pronoun gender.

Why have the Google and Microsoft translation systems not yet coped more satisfactorily with Hindi (and presumably with a number of other languages)? The reason is that they still have basic problems with the complicated script, with the very “different” syntax of Hindi, and even with the organisation of print, sentences and paragraphs. First of all, Hindi does not use upper-case letters (nor italic or bold distinctions). Secondly, the main punctuation mark is a vertical bar (the danda, ।), which serves as a full stop. Commas are used, but often sparingly. The inability to deal with these characteristics must surely contribute to the peculiar look of the translations above, with a capital letter at the beginning of every line.
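The following minimal Python sketch makes the two script features just described concrete. It assumes plain Unicode input. It checks whether a piece of text has letter case at all (Devanagari does not, so upper-casing it changes nothing). It then splits Hindi text into sentences on the danda (U+0964) rather than on line breaks or capital letters. The function names are invented for illustration, and this is not a description of how Google or BING actually segment text.

```python
import re

# U+0964 DEVANAGARI DANDA (।), the Hindi full stop.
DANDA = "\u0964"

# Sentence-final marks: the danda, plus ? and !, which modern Hindi also uses.
SENTENCE_END = re.compile("[" + DANDA + "?!]\\s*")


def has_case(text: str) -> bool:
    """True if the text contains letters with distinct upper- and lower-case forms."""
    return text.upper() != text.lower()


def split_hindi_sentences(text: str) -> list[str]:
    """Split at sentence-final marks only; line breaks are not treated as boundaries."""
    parts = SENTENCE_END.split(text)
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    print(has_case("Arundhati Roy"))  # True: Latin script has case
    print(has_case("अरुंधति राय"))      # False: Devanagari has no case
```

A system that relied on capitals to find sentence boundaries would find none at all in Hindi. That is consistent with, though no proof of, the line-by-line look of the output shown above.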

Finally, let us look at the first sentence of the Hindi Wikipedia original (in transliterated form) to get a further glimpse of what can go wrong.

Amreekee saMraajyavaad se lekar, parmaanu hathiyaaroN ki hor, Narmada par baaNdh nirmaan aadi kaee sthaaneeye – antarrashtreeya mudhoN ke khilaaf avaaz bulaNd kartee rahee haiN arundhati raay. 

(my rough translation:)
From American imperialism, the nuclear arms race, to the construction of the Narmada Dam, etc., Arundhati Roy is raising her voice loudly on many local and international issues.

In the Hindi word order, a list of nominal groups is followed by “etc.” and then (literally) “several local-international issues against”. This is an example of the numerous Hindi “postpositions”, which are very basic and frequent sentence elements. Finally come the sentence’s Verb and Subject (Arundhati Roy). Compare Google’s “From U.S. imperialism, nuclear arms race, building dams on the Narmada, etc. Many Local – international issues have been a voice against Arundhati Roy.” and BING’s “US imperialism from the nuclear arms race, the Narmada dam, construction etc Local-international issues that Arundhati Roy-voice ascendant are.”

I gave both systems a second chance by submitting the last part of that first sentence on its own. Without the cumbersome word order, Google did better but BING did not.

के ख़िलाफ़ आवाज़ बुलंद करती रही हैं अरुंधति राय
ke khilaaf aavaaz bulaNd kartee rahee haiN aruNdhati raay [roy]

Google: Arundhati Roy has been a voice against
BING: Is Arundhati Roy of that lofty-sounds
*

We must be grateful to Google and Microsoft for their valuable work on Hindi. We must also hope that the massive problems briefly signposted in the above exercise can be solved in the not too distant future. The same goes for other problem languages.

The next logical step would be to examine the quality of Google and BING translation from English into other languages. I will do my best at a later date, using the same four languages.

Au revoir. Hasta luego. Do sveedanya. Phir milenge.

Translation 33. Free Internet Translation Software: The Contest between Google Translate and Microsoft’s BING Translator

24 November 2011

Machine Translation (MT) software comes in many forms and in two broad categories: commercial and free of charge. At the top end of the commercial offerings are sophisticated and expensive software tools used by professional freelance translators and translation companies to ease and speed up their laborious tasks. TRADOS is one of the best-known names; it offers packages costing from 600 to 2,500 Euros. At the lower commercial level there are many products costing between $60 and $120. These help with translating between the major European languages, or at least between English and those languages. Among free Internet translation services, the current leader is Google Translate, closely followed by its recent challenger, Microsoft’s BING Translator. Both produce fast but basic translations of all sorts of Internet material, in a very wide range of languages and language pairs. For a number of years the earlier Internet leader was AltaVista’s Babel Fish. Under the Yahoo label this free programme is still available and widely used. However, its two younger competitors are making fast progress with more effective MT formulas, and it is showing its age.

As a preliminary sample of MT, take the following absurdly easy test used by blog researchers comparing and rating ten budget software translation packages. From this site, the references lead to this basic test, from Spanish to English.

“Abuela, ¿por qué tienes los ojos tan grandes?” Caperucita Roja preguntó. “Para que yo pueda ver mejor,” Dijo la abuela. “¡Oh, abuelita, ¿por qué tienes la boca tan grande?” “Para poder comerte mejor!” Entonces, la abuela salta de la cama.

They offer the following as a “Correct Translation” against which to compare the ten commercial contenders:
“Grandma, why do you have such big eyes?” Little Red Riding Hood asked. “So that I can see better.” the grandma said. “Oh, Grandma, why do you have such a big mouth?” “So I can eat better!” Then, the grandma jumps out of the bed.

For a description of the three major free Internet MT systems listed above and a judgement on their relative qualities, see John Yunker’s articles on the work of Ethan Shen, starting with this one and following the links. (Shen pronounces Google Translate the overall winner.)

Another strong endorsement of Google’s quality and breadth of coverage, together with a clear explanation of the Google method, is to be found in Chapter 23 of David Bellos’s recent wide-ranging book on translation, Is That a Fish in Your Ear?, by now a runaway bestseller.

The chapter offers a potted history of MT and expresses Bellos’s very positive view of the advances in MT achieved by Google, emphasising its novel approach to the task of MT. In a recent article, Bellos offers an edited version of pages 263-266 of that chapter (‘The Adventure of Automated Language Translation Machines’) in which, in characteristic manner, he succinctly explains the complex Google system to us:

“Using software originally developed in the 1980s by researchers at IBM, Google has created an automatic translation tool that is unlike all others. It is not based on the intellectual presuppositions of early machine translation efforts – it isn’t an algorithm designed only to extract the meaning of an expression from its syntax and vocabulary.
“In fact, at bottom, it doesn’t deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.
“It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation.
“The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.
“Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what’s been submitted to it.”
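Bellos’s account comes down to a simple statistical idea: of all the translations a phrase has received in paired documents, prefer the most probable one. The toy Python sketch below illustrates only that idea. It assumes a handful of invented aligned phrase pairs (they are not real corpus data) and estimates probabilities by relative frequency. It is not Google’s system. Real statistical systems of the kind Bellos describes typically also have to align words across the paired texts, reorder phrases and score the fluency of the output, none of which is shown here.

```python
from collections import Counter, defaultdict

# Invented aligned phrase pairs (French -> English), standing in for what a
# system might harvest from paired documents. Illustrative only.
ALIGNED_PAIRS = [
    ("crise financière", "financial crisis"),
    ("crise financière", "financial crisis"),
    ("crise financière", "crisis financial"),
    ("dette publique", "public debt"),
    ("dette publique", "national debt"),
    ("dette publique", "public debt"),
]


def build_phrase_table(pairs):
    """Estimate P(english | french) as the relative frequency of each aligned pair."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        table[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return table


def most_probable(table, phrase):
    """Return the highest-probability translation, or None if the phrase was never seen."""
    options = table.get(phrase)
    if not options:
        return None
    return max(options, key=options.get)


if __name__ == "__main__":
    table = build_phrase_table(ALIGNED_PAIRS)
    print(most_probable(table, "dette publique"))  # public debt
```

The rarer rendering (“national debt”, which Babel Fish produced for the same French phrase) survives in the table but loses on frequency. A phrase never seen in the corpus gets no translation at all. This hints at why languages with less paired material may fare worse.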

Although he admits that Google Translate’s results are not always satisfactory, Bellos forecasts a rosy future for MT, and for Google in particular, as it improves and adds to its fabulous corpora in 58 languages.

To give an idea of the standard of translation achieved by Google, and a glimpse of what Professor Bellos’s enthusiasm is founded on, I propose to offer and examine samples of translations into English from four languages. The additional factor is that BING will be subjected to the same tests, as evidence of this battle of the Free to Ether Translation Titans. BING offers two-way translation to and from 37 languages, compared with the 58 languages covered by Google according to Professor Bellos. (Results from Yahoo’s Babel Fish are offered at the end of the piece.)

Firstly (in the current article) I present and compare translations from French and Spanish into English. In a later blog article I hope to offer similar material from Russian and Hindi (probably transliterated to fit in the WordPress system). From these disparate examples, we may be able to discern the strengths of the two software programmes and some of the problems which still remain to be overcome in the search for workable and useful translations into and out of all printed languages.

By way of Prologue to the proposed comparisons, if we try the ‘Little Red Riding Hood’ test sample on Google and BING, we get the following results.

Google Translate:
“Grandma, why are your eyes so big?” Little Red Riding Hood said. “So that I can see better,” said the grandmother. “Oh, Grandma, why your mouth is so big?” “To eat better!” Then the grandmother jumps out of bed.

There are two unsatisfactory translations here:
“why your mouth is so big?” and “To eat better!”

BING Translator:
“Grandmother, why you have such large eyes?” Little Red Riding Hood asked. “So that I can see better,” said the grandmother. “Oh, grandmother, why have the big mouth?!” “To be able to eat better!” Then Grandma jumps out of bed.

Again, two unsatisfactory translations, and for the same segment:
“why have the big mouth?!” and “To be able to eat better!”

Both Google and BING completely miss the Spanish pronoun agglutinated in “comerte” (“comer” + “-te”, “to eat YOU better”). IMHO, however, Google is marginally ahead of BING on the second infelicity listed.
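The missed pronoun can be illustrated with a small Python sketch. It assumes a tiny hypothetical lexicon of infinitives and checks whether a word is a known infinitive with an unstressed object pronoun (an enclitic) attached to its end. If so, it separates the two, so that “comerte” becomes “comer” + “te”. The lexicon and function name are invented for illustration. A real system would need a full dictionary, and would also have to handle accented forms such as “comérselo” and double pronouns, which this sketch ignores.

```python
from typing import Optional

# Hypothetical mini-lexicon; a real system would use a full Spanish dictionary.
KNOWN_INFINITIVES = {"comer", "ver", "poder", "tener", "decir"}

# Unstressed object pronouns that Spanish can attach to the end of an infinitive.
ENCLITICS = ("nos", "los", "las", "les", "me", "te", "se", "lo", "la", "le", "os")


def split_enclitic(word: str) -> tuple[str, Optional[str]]:
    """Return (infinitive, pronoun) for forms like 'comerte', else (lower-cased word, None)."""
    lower = word.lower()
    for pronoun in ENCLITICS:
        if lower.endswith(pronoun):
            stem = lower[: -len(pronoun)]
            # Only split when what remains is a known infinitive, so that
            # nouns such as 'muerte' are not mistaken for 'muer' + 'te'.
            if stem in KNOWN_INFINITIVES:
                return stem, pronoun
    return lower, None


if __name__ == "__main__":
    for w in "Para poder comerte mejor".split():
        print(w, "->", split_enclitic(w))
```

Run on the test sentence “Para poder comerte mejor”, only “comerte” is split. That recovers the “te” (“you”) that both translations dropped.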

Now let us move on to a more challenging test of MT ability. For this I have chosen short segments from French and Spanish Wikipedia on a topic of recent interest.

1. French Wikipedia: ‘Crise financière mondiale débutant en 2007’ (‘Global financial crisis beginning in 2007’)
“La crise financière mondiale qui a commencé en 2007 est une crise financière marquée par une crise de liquidité et parfois par des crise [sic: = crises] de solvabilité tant au niveau des banques que des Etats, et une raréfaction du crédit au niveau des entreprises. Amorcée en juillet 2007, elle trouve son origine dans le dégonflement de bulles de prix (dont la bulle immobilière américaine des années 2000) et les pertes importantes des établissements financiers provoquées par la crise des subprimes. C’est la crise la plus grave de l’histoire des bourses de valeurs, après celle de 1873, découlant de la crise bancaire de mai 1873.
“La crise financière de l’automne 2008 amplifie le mouvement et provoque une chute des cours des marchés boursiers et la faillite de plusieurs établissements financiers. Pour éviter une crise systémique, les Etats doivent intervenir et sauver de nombreuses banques ce qui provoquera une crise de la dette publique en Islande d’abord puis en Irlande. Par ailleurs, elle provoque une récession touchant l’ensemble de la planète. Les finances publiques ont été lourdement sollicitées pour résoudre cette crise. Le déficit public s’est creusé dans de nombreux pays, après un recul du produit intérieur brut mondial de 2,2% en 2009.”
(200 words)

(Points for attention are italicised.)

Google:
“The global financial crisis that began in 2007 is a financial crisis marked by a liquidity crisis and sometimes by solvency crisis at both banks of the States, and a credit crunch at the company level. Began in July 2007, it has its origins in the bursting of price bubbles (including the U.S. housing bubble of the 2000s) and losses of financial institutions caused by the subprime crisis. This is the worst crisis in the history of stock exchanges, after that of 1873, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a fall in equity markets and the collapse of several financial institutions. To avoid a systemic crisis, governments must step in and save many banks which will cause a crisis of public debt in Iceland first and then in Ireland. Moreover, it causes a recession in the entire world. Public finances were heavily used to solve this crisis. The deficit has widened in many countries, after a decline in global GDP by 2.2% in 2009.” (177 words)

Microsoft:

“The global financial crisis that began in 2007 is a financial crisis marked by a crisis of liquidity and solvency crisis sometimes both at the level of the banks of the States, and a scarcity of credit at the level of enterprises. Started in July 2007, it finds its origin in the bursting of bubbles of awards (including the the 2000 US housing bubble) and losses of financial institutions caused by the subprime crisis. It is the most serious crisis in the history of stock exchanges, after 1873, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a collapse in stock market prices and the bankruptcy of several financial institutions. To prevent a systemic crisis, States should intervene and save many banks which will cause a crisis of public debt in Iceland first and then in Ireland. In addition, it causes a recession affecting the entire planet. Public finances were heavily sought to resolve this crisis. The public deficit widened in many countries, after a decline of 2.2% in 2009 world gross domestic product.” (169 words)

These are worthy attempts, useful to the general reader looking for a gist, and produced, on demand, in a few seconds. All that is needed to make them more reliable is shown below (in bold type).

Google, improved:

“The global financial crisis that began in 2007 is a financial crisis marked by a liquidity crisis and sometimes by solvency crises for both banks and States, and a credit crunch at the company level. Beginning in July 2007, it has its origins in the bursting of price bubbles (including the U.S. housing bubble of the 2000s) and serious losses by financial institutions caused by the subprime crisis. This is the worst crisis in the history of stock exchanges, after that of 1873, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a fall in equity markets and the collapse of several financial institutions. To avoid a systemic crisis, governments had to step in and save many banks, which was to cause a crisis of public debt first in Iceland and then in Ireland. Moreover, it caused a recession in the entire world. Public finances were heavily used to solve this crisis. The deficit has widened in many countries, after a decline in global GDP of 2.2% in 2009.” (177 words)

BING, improved:

“The global financial crisis that began in 2007 is a financial crisis marked by a crisis of liquidity and sometimes by solvency crises both at the level of the banks and of the States, and by a scarcity of credit at the company level. Commencing in July 2007, it has its origin in the bursting of price bubbles (including the 2000 US housing bubble) and the serious losses of financial institutions caused by the subprime crisis. It is the most serious crisis in the history of stock exchanges, after the 1873 crisis, arising from the banking crisis of May 1873.
The financial crisis of autumn 2008 amplifies the movement and causes a collapse in stock market prices and the bankruptcy of several financial institutions. To prevent a systemic crisis, States had to intervene and save many banks, which was to cause a crisis of public debt first in Iceland and then in Ireland. In addition, it caused a recession affecting the entire planet. Public finances were heavily drawn on to resolve this crisis. The public deficit widened in many countries, after a decline of 2.2% in world gross domestic product in 2009.” (169 words)

2. Spanish Wikipedia: ‘Crisis económica de 2008-2011’ (‘Economic crisis of 2008-2011’)

“Por crisis económica de 2008 a 2011 se conoce a la crisis económica mundial que comenzó ese año, originada en los Estados Unidos. Entre los principales factores causantes de la crisis estarían los altos precios de las materias primas, la sobrevalorización del producto, una crisis alimentaria mundial y energética, una elevada inflación planetaria y la amenaza de una recesión en todo el mundo, así como una crisis crediticia, hipotecaria y de confianza en los mercados. La causa raíz de toda crisis según la Teoría austríaca del ciclo económico es una expansión artificial del crédito. En palabras de Jesús Huerta de Soto «esta crisis surge de la expansión crediticia ficticia orquestada por los bancos centrales, y que ha motivado que los empresarios invirtieran donde no debían”.
“La crisis iniciada en el 2008 ha sido señalada por muchos especialistas internacionales como la “crisis de los países desarrollados”, ya que sus consecuencias se observan fundamentalmente en los países más ricos del mundo.” (159 words)

(Points for attention are italicised.)

Google

In economic crisis from 2008 to 2011 is known to the world economic crisis that began that year, which originated in the United States. Among the main factors causing the crisis would be the high prices of raw materials, the overvaluation of the product, a global food and energy crisis, high inflation and the threat of global recession around the world and a credit crisis trust and mortgage markets. The root cause of all crises as the Austrian theory of business cycle is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis arises from the fictitious credit expansion orchestrated by central banks, and has motivated entrepreneurs to invest where there were”.
The crisis that began in 2008 has been noted by many international experts as the “crisis of the developed countries”, since its effects are observed mainly in the richer countries of the world. (150 words)

Microsoft

The global economic crisis that began that year, originating in the United States is known by economic crisis of 2008 to 2011. Among the major causative factors of the crisis would be high prices of raw materials, the sobrevalorización of the product, energy and global food crisis, high global inflation and the threat of a recession around the world, as well as a loan, mortgage crisis and confidence in the markets. Caused by following every crisis according to the Austrian business cycle theory is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis stems from the fictional credit expansion orchestrated by central banks, and that has motivated entrepreneurs to invest where wrong”.
The crisis which began in 2008 has been brought by many international experts as the ‘crisis of developed countries’, already that its consequences are observed mainly in countries richest in the world. (150 words)

Google, improved

The economic crisis of 2008 to 2011 is the title given to the world economic crisis that began that year and originated in the United States. Among the main factors causing the crisis would be the high prices of raw materials, the overvaluation of the product, a global food and energy crisis, high inflation and the threat of global recession around the world and a crisis in credit, mortgages and market confidence. The root cause of all crises, according to the Austrian theory of the business cycle, is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis arises from the fictitious credit expansion orchestrated by central banks, and has motivated entrepreneurs to invest where they should not have done”.
The crisis that began in 2008 has been noted by many international experts as the “crisis of the developed countries”, since its effects are observed mainly in the richer countries of the world. (158 words)

BING, improved

The global economic crisis that began in 2008, originating in the United States, is known as the economic crisis of 2008 to 2011. Among the major causative factors of the crisis would be high prices of raw materials, the overvaluation of the product, a global food and energy crisis, high global inflation and the threat of a recession around the world, as well as a loan crisis, a mortgage crisis and loss of confidence in the markets. The root cause of every crisis, according to the Austrian business cycle theory, is an artificial expansion of credit. In the words of Jesus Huerta de Soto “this crisis stems from the fictional credit expansion orchestrated by central banks, and that has motivated entrepreneurs to invest where they should not have done.”
The crisis which began in 2008 has been labelled by many international experts as the ‘crisis of developed countries’, since its consequences are observed mainly in the richest countries in the world. (161 words)

So, on the above evidence, both of these translation tools, Google and BING, offer millions of Internet users a very useful BASIC, lightning-fast and FREE service from French and Spanish into English. (Translation from English INTO French and Spanish needs separate attention and may be dealt with in a future blog.)

For comparison, here are the results obtained from Yahoo’s Babel Fish, with its updated but ageing technology. Note the higher number of italicised items, and their nature.

1. French to English:
“The world financial crisis which started in 2007 is a financial crisis marked by a crisis of liquidity and sometimes by crisis of solvency as well on the level of the banks as of the States, and a rarefaction of the credit on the level of the companies. Started in July 2007, it finds its origin in the deflation of bubbles of price (of which the American real estate bubble of the years 2000) and important losses of the financial institutions caused by the crisis of the subprimes. C’ is the most serious crisis of l’ history of the purses of values, after that of 1873, rising from the banking crisis of May 1873. The financial crisis of l’ autumn 2008 amplifies the movement and causes a fall of the courses of the stockmarkets and the bankruptcy of several financial institutions. To avoid a systemic crisis, the States must intervene and save many banks what will cause a crisis of the national debt in Iceland d’ access then in Ireland.
In addition, it causes a recession concerning l’ together of planet. Public finances were heavily requested to solve this crisis. The public deficit s’ is dug in many countries, after a retreat of the world gross domestic product of 2,2% in 2009.”

2. Spanish to English

By economic crisis from 2008 to 2011 it is known world-wide the economic crisis that began that year, originated in the United States. Between the main factors causes of the crisis they would be the high prices of the raw materials, the sobrevaluation of the product, world-wide an nourishing crisis and energetics, a high planetary inflation and the threat of a recession anywhere in the world, as well as a credit, hypothecating crisis and of confidence in the markets. The root cause of all crisis according to the Austrian Theory of the economic cycle is an artificial expansion of the credit. In words of Jesus Kitchen garden of Grove “this crisis arises from the fictitious credit expansion orchestrated by the central banks, and that have motivated that the industralists invested where they did not have”. The crisis initiated in the 2008 has been indicated by many international specialists like the “crisis of the developed countries”, since their consequences are observed essentially in the richest countries of the world.”
*

In a later blog, passages will be selected from two languages which are “more foreign” to English speakers. Less raw material has been available in these languages to the colossal Internet data banks on which Google Translate and Microsoft Translator rely for their lightning-fast searches. The samples will be taken from Russian and Hindi, languages whose structures differ more fundamentally from English than do its familiar French and Spanish cousins.

Da svidanya. Phir milenge. (Russian and Hindi for “Goodbye” and “We’ll meet again”.)

(For a lighter and enlightening finish to this long essay, Google’s own explanation of its system is to be found here.)

Quantity versus Quality on the Internet

14 April 2008

Is the frenetic Internet party coming to an end? It is becoming less user-friendly than it was. As it has morphed into a huge hustling bazaar, the Web (2.0) has become more bewildering and intrusive. Most of the alluring and very useful products on offer are free (of monetary cost, at least). So many goodies: email, web browsers, information, chat groups and all sorts of forums, social network membership, video entertainment (YouTube), maps, blogsites, ping sites and myriad other facilities to spread knowledge about millions and millions of blogsites (like this one), and so on, and on, and on. Free services are being thrown at us from all sides. Since it’s Christmas every day in cyberspace, wannabe party poopers can hardly yell “Caveat emptor!” (“Buyer beware!”), but for many services, especially the free search engines and the well-populated social networking empires, “User, take care!” may be a timely warning.

The sheer quantity of information, growing exponentially as the ‘bots’ dig deeper, is so overwhelming and of such varying quality that new initiatives are sorely needed to sort it all in a more qualitative way, rather than offering the spectacular choice of 1 million superficially sorted search results in 0.15 seconds.

Google’s search results are highly praised and in the past I have found them useful, but, in my recent experience, the ranking system seems increasingly crude and uniquely vulnerable to clever manipulation by hundreds (thousands?) of specialised “Search Engine Optimisation” (SEO) companies and smart or unscrupulous individuals who can ‘play’ the vital backlinks game and skew the top-ranking search results (especially the first page), often on the basis of a popularity count almost as artificially manipulated as an election in a dictatorship. (To get an idea of how big this booming SEO business is becoming, search for “SEO Services”. Yahoo served up 109 million results in 0.30 seconds.)

Although the Google Search engine is still well in front of chief rival Yahoo in numerical terms (5.6 billion searches in December 2007, compared with Yahoo’s 2.2 billion), Yahoo (which is possibly about to fall into the hands of salivating Google rival, Microsoft) has the edge (IMO) on Google in ranking quality and in its greater resistance to the arcane, industry-condoned massage techniques which lead to highly profitable Search Engine Optimisation. In my recent experience, Microsoft, with a mere 940 million searches in December 2007, also often comes out ahead of Google in the quality of its rankings. In fact, by using Yahoo or Microsoft plus the metasearch engine Dogpile (which includes Google results), one could probably bypass Google’s current offerings with little noticeable loss, at least in the wide areas over which I hunt and gather.

Google hit the headlines when it announced that it regarded its countless billions of individual Searches as valuable archival material, to be carefully stored, sifted and marketed as it sees fit. The mind boggles over what they might do with all that fascinating and basically private information. Although I do not lose any sleep worrying about my Search habits becoming known, I recalled this company attitude when I heard of Google’s free Gmail service with its virtually infinite storage space (their storage space), free organisation and search facilities for members’ emails. It seemed such a generous, almost philanthropic, gift to humankind in return for the money Google has made and rightfully expects to make from its advertising. Nevertheless, before leaping in to claim my 6 Gigabytes of storage space, and a Search facility on all my emails from cyberNanny Google (who adds the further bizarre promise that “you’ll never need to delete another message”), I actually READ the Terms of Service and was deterred by the image conjured up by Condition 4.4 (as well as being utterly bewildered by Conditions 11 and 13):

4.4: “You acknowledge and agree that if Google disables access to your account, you may be prevented from accessing the Services, your account details or any files or other content which is contained in your account.” (All my emails! All six Gigabytes? What would Google do with them? What would I do without them?) So I decided to continue to trust my own Hard Disk and my friendly Mozilla, and to carry on deleting to my heart’s content for a while longer.

So much for the tentacular multi-billion dollar Google information empire. They have made it – BIG-time! Others are still starting up with similarly ambitious plans to attain algorithmically-derived financial nirvana. Take, for example, the Dredgers of Information on People (for People Profiles). This new set of cyber entrepreneurs are now following the Internet money trail, inspired by Google’s spectacular success in the mass farming and marketing of information. This sub-group of new IT companies are hoping to make rich pickings out of the simple automated process of massive robotic web searches for all scraps of information about people (and companies). Whatever their robots dredge up from cyberspace is then presented in a bundle under a person’s name. As simple (and as primitive) as that.

If you think about the ongoing problems that even highly experienced Google has in ranking its data results in a really helpful way, you may begin to get an idea of the potential for glitches and weirdness in the resulting mixtures of personal profiles. How many other people have YOUR name? Profiles for the Bill Smiths, even perhaps the George Bushes, of this world may become hopelessly mixed up in the ‘hands’ of the bots. As an experiment, try the two people-search companies named below. Feed in any names you know and see how much of the resulting profile is relevant to that specific person. Check also whether the robot profile gives a reasonably balanced portrait. Don’t forget to try your own name, because that is where you may get the biggest surprise, perhaps seeing yourself ‘profiled’ along with “body parts” from others with the same name. Well, with only the robots to sort out the information and no human intervention (which is what the companies at present offer), what do you expect? Very scary!
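To make the name-collision problem concrete, here is a minimal Python sketch of the kind of naive aggregation described above. It assumes, purely hypothetically, that a people-search bot files every scraped snippet under the name string alone. The snippets and the `build_profiles` helper are invented for illustration and do not describe how Zoominfo, Pipl or any other real company actually works.

```python
from collections import defaultdict

# Hypothetical scraped snippets: (name as found on a web page, text snippet).
# The three "Bill Smith" entries are meant to be three different people.
snippets = [
    ("Bill Smith", "Bill Smith, accountant, Leeds"),
    ("Bill Smith", "Bill Smith scores twice for the under-12s"),
    ("bill  smith", "Bill Smith retires after 40 years at the post office"),
    ("Jane Doe", "Jane Doe elected to the parish council"),
]


def build_profiles(items):
    """Group every snippet under a normalised name, with no check at all
    that the snippets actually refer to the same person (hypothetical)."""
    profiles = defaultdict(list)
    for name, text in items:
        # Only case and whitespace are normalised; nothing else identifies a person.
        key = " ".join(name.lower().split())
        profiles[key].append(text)
    return dict(profiles)


profiles = build_profiles(snippets)
for name, facts in profiles.items():
    print(f"{name}: {len(facts)} snippet(s)")
```

Run as written, it prints `bill smith: 3 snippet(s)` and `jane doe: 1 snippet(s)`: the accountant, the schoolboy footballer and the retired postman all end up in a single “profile”. Telling them apart would need extra evidence (location, dates, employer) or exactly the human intervention that is missing.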

1. Zoominfo.com

Zoominfo offers a subscription service to companies interested in other companies. It also offers free information on “People”. The People profiles offered are not restricted to businesspeople. The basis of Zoominfo’s presentation of its personal data farming operation is explained in a separate disclaimer statement, titled ‘How did Zoominfo get this information?’ One of the paragraphs is the following:

“Please note: ZoomInfo does not fact check its profiles and aggregation errors are possible. Additionally, ZoomInfo does not verify user-submitted information. Errors to your own profile can be corrected by updating your information. [They mean “correcting”.] Other errors or inappropriate content can be reported to ZoomInfo using our support form.” [There is something Orwellian about this “support form”, which is surely a reference to a complaint about the personal information found against a correspondent’s name.]

The nature and extent of the additional “user-submitted information” is nowhere explained. When I questioned Zoominfo on this, they repeatedly declined to explain. And when questioned on how robots had managed to compose several unbalanced profiles containing negative or disparaging references, the Support team was evasive and dogmatic. On most of these profiles, Zoominfo repeats the blanket disclaimer that it does not verify the information presented.

All in all, this seems an unsatisfactory offering in its present stage. When you check them out, if you don’t like what you see about yourself, or others, let them know. They may resist at first, or wave their disclaimer and robots in your face, but if you insist, they will remove any garbled or unwelcome information.

2. Pipl.com

This apparently newer kid on the block offers a similar service, but seems to cater for a younger demographic, targeting the social networks with “deep searches” and specialising in phone numbers, emails and information on old schoolmates. As with Zoominfo, I found the results mixed. There is no disclaimer. According to one IT blogger: “Pipl, a people search engine mentioned here a few days ago, also searches social networking sites. The usual cautions apply: You can’t assume anything you find is true, and you’ll have to find verification elsewhere.” (Mark Schaver, at http://www.depthreporting.com/2007_05_01_archive.html)

Although Pipl’s deep searches tempt us with “most of the higher-quality information about people” which “is simply ‘invisible’ to a regular search engine”, I also found oddly unbalanced (lower-quality) negative profiles about a number of older people.

The initial “Quick Facts” did not always refer to facts, and the remainder of Pipl’s information on the names that I looked up was simply the Google result for that name. The Support team’s responses to my queries were slow, evasive, and basically unfriendly, but I know of one or two colleagues who have had their profiles removed after lengthy exchanges of correspondence. Again, there was no satisfactory answer to the question of how the one-sided information was obtained in these selected cases. Another cyber mystery to be solved.

Perhaps the promised Web 3.0 will be able to offer us some overdue improvements in Internet quality, accuracy, and privacy protection.