I subscribe to about 1000 RSS feeds. It would be nice if my reader could search through the feeds and sort posts by my probable level of interest. It would also categorize those posts into groups of similar content. Currently, most readers provide a search feature that filters posts by a particular keyword to help manage noise.
The reader would analyze the content of a new post and use Bayesian techniques to estimate the probability that I will find it interesting, based on whether similar posts with similar keywords and properties, from the same feed and from other feeds, have been interesting in the past. There would need to be a way to rate each post. Rating could be done explicitly by clicking a set of buttons, though I am not in favor of this approach. Ideally, the reader would infer a rating implicitly from the user's actions on a post, such as deleting it quickly or saving it, in the same way that WinFS automatically paints metadata into documents through file operations (copying a document to a folder, for example, automatically generates keywords from the names of all the parent folders). With the same technique, I could also see which of my feeds tend to generate mostly or entirely noise, and delete those underperforming feeds.
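A minimal sketch of the scoring idea, assuming a naive Bayes model over the words of each post. The class name `InterestModel`, the action names `saved` and `deleted_quickly`, and the sample post titles are all hypothetical; a real reader would have its own events and would likely weigh feed identity and other properties as well.

```python
import math
import re
from collections import Counter

# Hypothetical mapping from reader actions to implicit ratings (an assumption).
ACTION_LABELS = {"saved": "interesting", "deleted_quickly": "boring"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


class InterestModel:
    """Naive Bayes over post words, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"interesting": Counter(), "boring": Counter()}
        self.post_counts = Counter()

    def observe(self, text, action):
        """Record one post together with the action the user took on it."""
        label = ACTION_LABELS[action]
        self.post_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def probability_interesting(self, text):
        """Estimate P(interesting | words), treating words as independent."""
        vocab = set(self.word_counts["interesting"]) | set(self.word_counts["boring"])
        total_posts = sum(self.post_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            # Smoothed prior so a class with no posts does not produce log(0).
            score = math.log((self.post_counts[label] + 1) / (total_posts + 2))
            # The extra 1 reserves probability mass for words never seen before.
            denom = sum(counts.values()) + len(vocab) + 1
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Convert the two log scores into a normalized probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["interesting"] / sum(weights.values())


# Made-up history: two saved posts, two quickly deleted posts.
model = InterestModel()
model.observe("Bayesian filtering for RSS readers", "saved")
model.observe("Clustering similar blog posts", "saved")
model.observe("Celebrity gossip roundup", "deleted_quickly")
model.observe("Weekly celebrity photos", "deleted_quickly")

likely = model.probability_interesting("Bayesian clustering of posts")
unlikely = model.probability_interesting("Celebrity gossip photos")
print(f"{likely:.2f} {unlikely:.2f}")
```

Sorting new posts by `probability_interesting` would give the interest ordering described above. Averaging the same score over all posts from one feed is one possible way to spot feeds that produce mostly noise.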
Categorizing similar posts could be done explicitly, by setting up individual groups that match on certain keywords, or implicitly, by using one of a number of simple clustering algorithms to collect similar posts.
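As one illustration of the implicit approach, here is a sketch of a greedy single-pass clustering that groups post titles by keyword overlap (Jaccard similarity). The function names, the 0.25 threshold, and the sample titles are assumptions chosen for the example, not a recommendation of a particular algorithm.

```python
import re


def keywords(text):
    """Return the set of lowercase words longer than three letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def jaccard(a, b):
    """Similarity of two keyword sets: size of overlap over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_posts(titles, threshold=0.25):
    """Greedy single pass: join the first cluster whose seed post is similar enough."""
    clusters = []  # each entry is (seed keyword set, list of member titles)
    for title in titles:
        words = keywords(title)
        for seed, members in clusters:
            if jaccard(seed, words) >= threshold:
                members.append(title)
                break
        else:
            # No existing cluster matched, so this post starts a new one.
            clusters.append((words, [title]))
    return [members for _, members in clusters]


# Made-up titles for illustration.
titles = [
    "Microsoft ships Windows XP Service Pack 2",
    "Windows XP Service Pack 2 released by Microsoft",
    "Olympic Games open in Athens",
    "Athens Olympic Games opening ceremony",
]
groups = cluster_posts(titles)
for group in groups:
    print(group)
```

The explicit approach is the same loop with fixed, hand-written keyword sets as the seeds instead of seeds discovered from the posts themselves.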
Another possible mechanism would use collaborative filtering, as when Amazon examines a book that you order and automatically makes additional recommendations based on what other buyers have bought. There would need to be a centralized site on the web that collects aggregate statistics from multiple users; this is probably more relevant for web-based readers like BlogLines than for desktop readers.
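A rough sketch of how a centralized site might do this, assuming it stores the set of posts each user has saved. The function `recommend_posts`, the user names, and the post identifiers are hypothetical, and the overlap-count scoring is only one simple choice among many.

```python
from collections import Counter


def recommend_posts(saved_by_user, user, top_n=3):
    """Suggest posts saved by users whose saved posts overlap with this user's."""
    mine = saved_by_user[user]
    scores = Counter()
    for other, posts in saved_by_user.items():
        if other == user:
            continue
        overlap = len(mine & posts)
        if overlap == 0:
            continue  # this user shares no taste with us, so ignore them
        for post in posts - mine:
            # Weight each candidate by how much the other user's taste overlaps.
            scores[post] += overlap
    return [post for post, _ in scores.most_common(top_n)]


# Made-up aggregate data that the central site would collect.
saved_by_user = {
    "alice": {"post-a", "post-b"},
    "bob": {"post-a", "post-b", "post-c"},
    "carol": {"post-a", "post-c", "post-d"},
    "dave": {"post-e"},
}
suggestions = recommend_posts(saved_by_user, "alice")
print(suggestions)
```

Here `post-c` ranks first because both users who share posts with alice saved it, while `post-e` is never suggested because dave has nothing in common with her.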
i've been doing this for over a year now, and it works. for my world news aggregation, i find similar posts and clump them together, reducing the redundancy. for other sites, it's a matter of discovering topics and ideas that interest me. i know of people who read usenet and/or RSS using a bayesian filter, too, and it works well. finally, you can have a feedback mechanism find you "more posts like this one" as a way to surf through posts. in short, it's been done, and yes, it works. it's pretty much the only way you can scale beyond a few dozen highly active feeds a day without getting caught in a huge time sink.
Posted by: jose | August 18, 2004 at 06:23 AM
A number of us talked about this at the Blogging BoF during the 2003 PDC. I think it could be done with a Basian filter built in to the aggregator. Kind of like how TiVo has "thumbs up" and "thumbs down"... Let the Basian filter find things that I'm likely to like and push down those things I'm not likely to like. :)
Posted by: Peter Provost | August 18, 2004 at 09:16 AM
Oops... Bayesian not Basian.
Posted by: Peter Provost | August 18, 2004 at 09:17 AM
While not as sophisticated as using Bayesian filters, I get by pretty well using Newsgator in Outlook 2003. I use Outlook's Search Folders to pull together posts from Newsgator's folder structure, pull the search folder to favorites, then do all the "browsing" in the search folder.
Posted by: Mark | August 21, 2004 at 09:20 AM