Standout phrases on amazon - Notes from Classy's Kitchen

April 10, 2005

Standout phrases on amazon

Amazon has an interesting new feature based on the full-text data they have because of the Search Inside feature. They show you the phrases from a particular book that are statistically improbable, i.e. standout phrases, phrases that are unique to a particular book. I'm not sure it's surprisingly useful, but it's certainly useful.
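To make the idea concrete, here is a minimal sketch of one way standout phrases could be scored. Amazon has not said how its feature works, so everything here is an assumption: the function names `bigrams` and `standout_phrases` are made up, phrases are limited to two-word pairs, and the score is a simple smoothed log ratio of how often a phrase appears in the book versus in a background collection of text.

```python
from collections import Counter
import math
import re


def bigrams(text):
    """Lowercase the text and return its adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def standout_phrases(book, background, top_n=3):
    """Rank the book's bigrams by how much more frequent they are in the
    book than in the background text (hypothetical scoring, not Amazon's)."""
    book_counts = Counter(bigrams(book))
    bg_counts = Counter(bigrams(background))
    book_total = sum(book_counts.values())
    bg_total = sum(bg_counts.values())
    # Number of distinct bigrams seen anywhere, used for add-one smoothing.
    vocab = len(set(book_counts) | set(bg_counts))

    scores = {}
    for phrase, count in book_counts.items():
        if count < 2:
            continue  # ignore phrases that occur only once in the book
        p_book = count / book_total
        # Add-one smoothing so phrases unseen in the background still get
        # a small nonzero probability.
        p_bg = (bg_counts[phrase] + 1) / (bg_total + vocab)
        scores[phrase] = count * math.log(p_book / p_bg)

    ranked = sorted(scores, key=scores.get, reverse=True)
    return [" ".join(pair) for pair in ranked[:top_n]]


book = ("The white whale swam. The white whale dove. "
        "The sea was calm and the sea was cold.")
background = ("The sea was calm. The sea was wide. "
              "The sea was deep. The ship was old.")
top = standout_phrases(book, background, top_n=2)
```

In this toy example "white whale" ranks above "the sea", because "the sea" is just as common in the background text as in the book. A real system would presumably work from a far larger background collection and longer phrases.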
I am reminded of an IBM research paper on hierarchical Bayesian categorization, which used similar ideas to obtain useful hierarchical categories of documents. Since I read that paper, I've been wondering when we would see this applied in the real world, but no search engine seems to have emerged from the IBM project.
Oddly related projects: Technorati "related" tags and, by extension, applications of Yahoo's term extraction service - this is like open sourcing the context algorithms underlying e.g. AdSense.

Posted by Claus at April 10, 2005 07:32 PM
