Posted tagged ‘Personal Data Mining’

Quantity versus Quality on the Internet

14 April 2008

Is the frenetic Internet party coming to an end? It is becoming less user-friendly than it was. As it has morphed into a huge hustling bazaar, the Web (2.0) has become more bewildering and intrusive. Most of the alluring and very useful products on offer are free (of monetary cost, at least). So many goodies: email, web browsers, information, chat groups and all sorts of forums, social network membership, video entertainment (YouTube), maps, blogsites, ping sites and myriad other facilities to spread knowledge about millions and millions of blogsites (like this one), and so on, and on, and on. Free services are being thrown at us from all sides. Since it’s Christmas every day in cyberspace, wannabe party poopers can hardly yell “Caveat Emptor!” or “Buyer beware!”, but perhaps for many services, especially the free search engines and the well populated social networking empires, “User, take care!” may be a timely warning.

The sheer quantity of information becoming available at an exponential rate as the ‘bots’ dig deeper is so overwhelming, and of such varying quality, that new initiatives are sorely needed to sort it all in a more qualitative way, rather than offering the spectacular choice of 1 million superficially sorted search results in 0.15 seconds.

Google’s search results are highly praised and in the past I have found them useful, but, in my recent experience, the ranking system seems increasingly crude and uniquely vulnerable to clever manipulation by hundreds (thousands?) of specialised “Search Engine Optimisation” (SEO) companies and smart or unscrupulous individuals who can ‘play’ the vital backlinks game and skew the top ranking search results (especially the first page), often on the basis of a popularity count almost as artificially manipulated as an election in a dictatorship. (To get an idea of how big this booming SEO business is becoming, search for “SEO Services”. Yahoo served up 109 million results in 0.30 seconds.)

Although the Google Search engine is still well in front of chief rival Yahoo in numerical terms (5.6 billion searches in December 2007, compared with Yahoo’s 2.2 billion), Yahoo (which is possibly about to fall into the hands of salivating Google rival, Microsoft) has the edge (IMO) on Google in ranking quality and greater resistance to the arcane industry-condoned massage techniques which lead to highly profitable Search Engine Optimisation. In my recent experience, Microsoft, with a mere 940 million searches in December 2007, also often comes higher than Google in the quality of its rankings. In fact, between Yahoo or Microsoft plus the Metasearch engine Dogpile (which includes Google results), one could probably bypass Google’s current offerings with little noticeable loss, at least in the wide areas over which I hunt and gather.

Google hit the headlines when it announced that it regarded its countless billions of individual Searches as valuable archival material, to be carefully stored, sifted and marketed as it sees fit. The mind boggles over what they might do with all that fascinating and basically private information. Although I do not lose any sleep worrying about my Search habits becoming known, I did recall the company attitude when hearing of Google’s free Gmail service with its virtually infinite storage space (their storage space), free organisation and search facilities for members’ emails. It seemed such a generous, almost philanthropic, gift to humankind in return for the money Google has made and rightfully expects to make from its advertising. Nevertheless, before leaping in to claim my 6 Gigabytes of storage space, and a Search facility on all my emails from cyberNanny Google (who adds the further bizarre promise that “you’ll never need to delete another message”), I actually READ the Terms of Service and was deterred by the image conjured up by Condition 4.4 (as well as utterly bewildered by Conditions 11 and 13):

4.4.: “You acknowledge and agree that if Google disables access to your account, you may be prevented from accessing the Services, your account details or any files or other content which is contained in your account.” (All my emails! All six Gigabytes? What would Google do with them? What would I do without them?) So I decided to continue to trust my own Hard Disk, my friendly Mozilla, and carry on deleting to my heart’s content for a while longer.

So much for the tentacular multi-billion dollar Google information empire. They have made it – BIG-time! Others are still starting up with similarly ambitious plans to attain algorithmically-derived financial nirvana. Take, for example, the Dredgers of Information on People (for People Profiles). This new set of cyber entrepreneurs are now following the Internet money trail, inspired by Google’s spectacular success in the mass farming and marketing of information. This sub-group of new IT companies are hoping to make rich pickings out of the simple automated process of massive robotic web searches for all scraps of information about people (and companies). Whatever their robots dredge up from cyberspace is then presented in a bundle under a person’s name. As simple (and as primitive) as that.

If you think about the ongoing problems of highly experienced Google in ranking their data results in a really helpful way, you may begin to get an idea of the potential for glitches and weirdness in the resulting mixtures of personal profiles. How many other people have YOUR name? Profiles for the Bill Smiths, even perhaps the George Bushes, of this world may become hopelessly mixed up in the ‘hands’ of the bots. As an experiment, try the two people search companies named below. Feed in any names you know and see how much of the resulting profile is relevant to that specific person. Check also to see if the robot profile gives a reasonably balanced portrait. Don’t forget to try your own name because that is where you may get the biggest surprise, perhaps seeing yourself ‘profiled’ along with “body parts” from others with the same name. Well, with only the robots to sort out the information and no human intervention (which is what the companies at present offer), what do you expect? Very scary!


Zoominfo offers a subscription service to companies interested in other companies. It also offers free information on “People”. The People profiles offered are not restricted to businesspeople. The basis of Zoominfo’s presentation of its personal data farming operation is explained in a separate disclaimer statement, titled ‘How did Zoominfo get this information?’ One of the paragraphs is the following:

“Please note: ZoomInfo does not fact check its profiles and aggregation errors are possible. Additionally, ZoomInfo does not verify user-submitted information. Errors to your own profile can be corrected by updating your information. [They mean “correcting”.] Other errors or inappropriate content can be reported to ZoomInfo using our support form.” [There is something Orwellian about this “support form”, which is surely a reference to a complaint about the personal information found against a correspondent’s name.]

The nature and extent of the additional “user-submitted information” is nowhere explained. When I questioned Zoominfo on this, they declined to explain, several times. And when questioned on how robots had managed to compose several unbalanced profiles containing negative or disparaging references, the Support team was evasive and dogmatic. On most of these profiles, Zoominfo repeats the blanket disclaimer that it does not verify the information presented.

All in all, this seems an unsatisfactory offering in its present stage. When you check them out, if you don’t like what you see about yourself, or others, let them know. They may resist at first, or wave their disclaimer and robots in your face, but if you insist, they will remove any garbled or unwelcome information.


Pipl, apparently a newer kid on the block, offers a similar service with the same sort of mixed results, but seems to cater for a younger demographic, especially targeting the social networks with “deep searches” and specialising in phone numbers, emails and information on old schoolmates. As with Zoominfo, I found the results mixed. There is no disclaimer. According to one IT blogger: “Pipl, a people search engine mentioned here a few days ago, also searches social networking sites. The usual cautions apply: You can’t assume anything you find is true, and you’ll have to find verification elsewhere.” (Mark Schaver, at

Although Pipl’s deep searches tempt us with “most of the higher-quality information about people” which “is simply ‘invisible’ to a regular search engine”, I also found oddly unbalanced (lower-quality) negative profiles about a number of older people.

The initial “Quick Facts” did not always refer to facts and the remainder of Pipl’s information on the names that I looked up was simply the Google result for that name. The Support team responses to my queries were slow, evasive, and basically unfriendly but I know of one or two colleagues who have had their profiles removed after lengthy exchanges of correspondence. Again, there was no satisfactory answer to the question of how the one-sided information was obtained in these selected cases. Another cyber mystery to be solved.

Perhaps the promised Web 3.0 will be able to offer us some overdue improvements in Internet quality, accuracy, and privacy protection.