Found this delightful blast from the past - back when Yahoo was a link collection and search engines did not yet rule the world. This was also way back when Tim Bray wasn't yet known as XML specification editor (XML didn't yet exist) or blogger extraordinaire.
The WWW5 conference had a panel on Internet Indexing that a former incarnation of Bray, then cofounder and VP of technology of document search and indexing company OpenText, was on:
- Tim Bray, OpenText: servers are getting free benefit from being crawled (exposure), yet crawlers do all the work! You servers should do your share of work! (metadata, update protocols, duplicate detection, canonical names, etc)
I don't see how a nonprofit org. whose content has no direct commercial value is going to be convinced of this, but Tim seemed arrogant, er, confident that even non-commercial servers would see the value in being crawled and make the extra effort. I agree that metadata and canonical names are a good thing anyway, but I didn't like the way he framed the argument.
- Tim also said OpenText was considering taking money to allow a particular site to be at the top of a results list for particular queries, "as long as they were clearly marked as such". I thought this was appalling and would undermine any credibility in search engines. I pointed out that there were already advertisement-filtering proxies, and it couldn't be very productive for anyone to just go down the path of users and advertisers fighting each other to see who can be the more clever. The reply was that in the long run this would produce more robust software, just as the "fight" between the cryptographers and the crypto-breakers has produced stronger algorithms. I think there is absolutely no parallel here, but I thank Tim Bray for helping me choose my future search engine.
- There was a call for a "crawling consortium", to
- develop crawling standards and eliminate redundant crawls
- establish metadata standards and solve the "text-inside-gifs problem"
- establish authenticated-crawler standards to address copyright protection
- Tim Bray thinks a crawling consortium won't work, since current services regard their crawled data as their primary advantage, not a commodity. Of course nobody said the crawlers had to give data away...Tim's justification was also that "brute force" still works just fine for crawling the whole web and doesn't waste a significant amount of bandwidth.
How long ago that seems in so many ways. Nobody believes in metadata anymore. Everybody believes in search engines nonetheless. Advertising in search engines is what it is. And sadly the only thing still going on is the copywars.
(meta ironic end commentary: Yes, I found this using absolutely no metadata, searching for "tim bray opentext" in a search engine that did not exist then. There were no ads in the search result. Don't know what that last fact means.)
Posted by Claus at August 15, 2006 12:37 PM