Surprise, surprise - a Google backlash is going on. At GavinsBlog.com you can find some really stupid, paranoid thinking about Google.
To the question 'Can we escape Google's crawlers?': I am sure they respect the robots exclusion standard. To the 'eerie 2038 cookie': you're still being tracked even if your cookies expire in 2 years. To the 'no privacy policy': plain untrue - if you install the Google Toolbar you are made very aware of the privacy implications. And they don't sell your data. To the 'eerie tracking of IP addresses': Classy.dk logs all the same information, as practically all web servers do.
Is the problem, quite simply, that Google works - and that technology that works so well at information gathering is scary? Efficient search clearly has privacy implications, but surely the real concern here is the publication of data, not the fact that the public data is utilized.
Some people are shifting to AllTheWeb in response. That's just great - AllTheWeb recently was purchased by the most evil company in search, Overture - the paid listing company, whose business is to make ads and content indistinguishable.
Gavin makes reference to Google Watch, a watchdog site. But the criticism there is just as inept, and largely unrelated to Google, which makes it hard to see why Google in particular should suffer it:
Again, web server tracking is brought up - a legitimate issue, but hardly Google-specific. The use of search terms in referral URLs means I know what you're looking for when you reach my site - again, hardly a Google problem. Supposedly some guy who used to work for the NSA now works for Google (Google Watch is here employing the fine tactic of guilt by association that I'm quite sure they would like Google and the government to refrain from). Did it occur to the whistleblowers that Google may need engineers with clearance to sell Google search to secured government intranets?
The most ridiculous criticism is that of PageRank as a monopolizing feature harming the openness of the internet. It's not that there isn't a problem; it's just that the problem is not the one being discussed.
Here then is the real problem: if you're looking for something, you have to read one text first. That's how attention works, not how Google works. It means that some ranking algorithm will always apply.
The mode of your search then plays an important role in whether or not the dominance of PageRank is a good thing. If your mode is 'search for something specific' - the increasingly popular 'Google as DNS' mode of search, where you know the content of your destination but not its exact address - then it doesn't matter what the ranking algorithm is, as long as it works. Then there is what you might call 'auction search', where you are looking for something which has a large number of equally qualified providers - or, to be more precise, a large number of providers you have no information about.
For that kind of search you might say that you want to stratify the usable results into equally qualified strata and then choose randomly among the results within each stratum.
You could accomplish this by adding a small random number to the PageRank and ordering by this new rank. You would still get a usable overall ranking, but it would be 'fair'. It remains to be seen whether a real rank difference of, say, 0.10 has qualitative meaning or not, and whether this adds any value for the user of a search is doubtful.
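The jittered ranking described above can be sketched in a few lines. This is a minimal illustration, not Google's actual algorithm; the page names and rank values are invented, and the jitter width `epsilon` plays the role of the 'small random number':

```python
# Sketch of the 'fair stratified' ranking idea: add a small random jitter
# to each page's rank before sorting, so pages with (nearly) equal rank
# appear in random order while the overall ranking stays usable.
import random

def jittered_ranking(pages, epsilon=0.1, seed=None):
    """Sort (name, rank) pairs by rank plus uniform noise in [0, epsilon)."""
    rng = random.Random(seed)
    return sorted(pages, key=lambda p: p[1] + rng.uniform(0, epsilon),
                  reverse=True)

pages = [("site-a", 7.0), ("site-b", 7.0), ("site-c", 8.0), ("site-d", 7.0)]
# site-c (rank 8) always stays on top, since the jitter is smaller than the
# rank gap; the three rank-7 sites come out in a random 'fair' order.
print(jittered_ranking(pages, epsilon=0.1, seed=42))
```

Note that the jitter must stay smaller than any rank difference you consider meaningful, otherwise it starts reordering genuinely better pages below worse ones.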
Generally speaking, attention monopolizes. The complaints against Google are almost always 'supply-side'. The criticism leveled at Google is of the 'I have a rank of 7 - and these 50 sites with a better rank are in the way' kind. Well, a rank of 8 will - for most searchers - translate to a more relevant site. And what's more: having a site with a rank of 7 puts you in a much larger group than the sites with rank 8, so a fair distribution of hits among the rank-7 sites would not necessarily generate a lot of traffic for you. Your contribution would drown in an overwhelming supply of information.
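The supply-side point is easy to illustrate with arithmetic. All the numbers below are invented for illustration; the only thing that matters is that lower-rank strata are much larger than higher-rank ones:

```python
# Even a perfectly fair split of traffic within each rank stratum leaves
# low-rank sites with little traffic, because lower strata hold many more
# sites. Hypothetical numbers: attention per stratum and stratum sizes.
hits_for_rank = {8: 100_000, 7: 50_000}  # total searcher attention per stratum
sites_in_rank = {8: 50, 7: 5_000}        # strata grow as rank falls

for rank in (8, 7):
    per_site = hits_for_rank[rank] / sites_in_rank[rank]
    print(f"rank {rank}: {per_site:.0f} hits per site")
# rank 8: 2000 hits per site
# rank 7: 10 hits per site
```

So even if the rank-7 stratum as a whole got half the attention of the rank-8 stratum, each individual rank-7 site would still see a tiny fraction of the traffic.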
Posted by Claus at April 15, 2003 04:47 PM
Hmm. I have received hundreds of criticisms from people over my position on Google.
What strikes me is that I am labelled as some kind of paranoid freak who doesn't know that cookies are used by most top websites.
Quite the opposite. I am not under any illusions. I am aware of all the tracking abilities of websites, and I'm amazed even at my own ability to track referrers - which is how I am able to reply.
But I am not under any illusions about Google either. Any company that is in a position like Google will use that to their own advantage, it is only natural.
Any company that has the advantages Google has will use them - I'm not against Google because it's Google. I am simply concerned at their practices, in a monopoly position, as a company.
Quite the same as I would be concerned about any company in a monopoly position.
The backlash against me has been mostly negative - rarely do people agree.
I just want to err on the side of caution until such time as internet search engines are regulated - yes, by the state - because I believe there is a conflict of interest.
Regards
Posted by: Gavin on April 17, 2003 12:49 AM
And you are of course free to criticize. However, when none of your criticisms can stand scrutiny, that is quite a problem. While your 'evil by default' approach to Google might be justified as a general approach to power, you have failed to come up with one case of abuse of power. There are some problematic cases - like giving in to threats of litigation or political pressure - but you make no mention of them.
Inasmuch as Google is turning into the public space for information, it is of course problematic that this is a commercially controlled entity and a near monopoly, but it is plainly untrue that Google somehow bars your access to 'critical' or 'controversial' or 'competing' knowledge. The concept of PageRank presents an illusion that Google has ordered all webpages into a single hierarchy in which only the strong survive. But searching for words is much more specific than you might think. Let's assume there are something like 40,000 'meaningful' words in English (the real number is much greater) and let's assume that searches are frequently for 2 words or more. That gives over 1 billion different subsets of Google's page database that occur as result sets for searches. Most of these subsets share very few pages, so it is of no consequence that there is only one relevance algorithm. The result spaces for different searches are sufficiently disjoint that highly popular irrelevant pages do not obscure the result set; that is why search works, and why a search engine monopoly is less of a problem. If you want to find and read some Google-critical material, your best bet is to search for it using Google.
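A back-of-the-envelope check of the subset count: with the assumed vocabulary of 40,000 meaningful words (the post's assumption, not a measured figure), the number of distinct two-word queries alone is C(40000, 2):

```python
# Count the distinct unordered two-word queries over a 40,000-word
# vocabulary. Each query selects its own subset of the page database.
from math import comb

two_word_queries = comb(40_000, 2)
print(two_word_queries)  # 799,980,000 two-word queries alone;
                         # counting 3+ word queries pushes the total
                         # well past a billion
```

Each of those queries carves out its own, mostly non-overlapping slice of the index, which is the disjointness point made above.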
It would be interesting to know the distribution of maximum PageRank among these subsets. Most likely very few of them have a max PageRank of 10, and most have a max PageRank significantly lower.
No comments have been removed from this post.