Bayesian filtering of RSS feeds – can you automatically find interesting journal articles?


In Aggregating sources for academic research in a web 2.0 world, I wrote about keeping up with your research using RSS feeds from

traditional databases (citation alerts, tables of contents of favourite journals), library OPAC feeds of searches and new additions, book vendor sites (e.g. Amazon), book sharing sites (e.g. LibraryThing), social bookmarking sites both generic (e.g. Delicious) and research 2.0 (e.g. CiteULike), Google Alerts and more.

The main problem with this, of course, is that you quickly get overwhelmed with results. In many cases you can’t create a custom RSS feed (e.g. many libraries only provide RSS feeds of “new additions” in broad subject areas like Economics), and even where you can, say an EBSCOhost database search as RSS, even the most finely tuned search query can bring up quite a lot of irrelevant results.

The answer is of course filtering. Bayesian filtering has proven very successful in categorizing mail into good mail and spam, but it can be generalized to classify text into an arbitrary number of categories of any kind.

Can one do the same on RSS feeds? In particular, RSS feeds of tables of contents from journals? The idea is for the Bayesian filter to learn which words tend to occur in the articles (or rather, abstracts) you find interesting, and then classify new items as “interesting” or “not interesting”.
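To make the idea concrete, here is a minimal sketch of a naive Bayes text classifier of the kind a spam filter uses, applied to article titles. It is only an illustration under my own assumptions: the class name, the two labels and the four training titles are made up, and I am not claiming that any of the services mentioned in this post work this way.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training items

    def train(self, text, label):
        """Record the words of one item under the given label."""
        self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # Start from the prior: how common this label was in training.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for word in tokenize(text):
                # Smoothed likelihood of the word given the label.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: titles a librarian marked by hand.
nb = NaiveBayes()
nb.train("Bayesian text classification for library catalogue search", "interesting")
nb.train("Information retrieval and relevance ranking in digital libraries", "interesting")
nb.train("Crop yield response to nitrogen fertiliser in wheat", "not interesting")
nb.train("Soil moisture and irrigation scheduling for maize", "not interesting")

print(nb.classify("Relevance ranking for catalogue search"))
```

In practice you would train on the abstracts of items you have starred or skipped in your feed reader, and the filter would need far more than four examples before its guesses become useful.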

I’m aware of three services that do Bayesian filtering of RSS feeds. Two are commercial web services (FeedZero and Feedscrub) and one is an open source project (SuxOr).

For a longer, more rambling treatment, see my more detailed blog post here.

Google Wave – First thoughts

It has been a crazy week. I was stressing out over having to give my first ever presentation at the Libraries of the future seminar (with the new presentation tool Prezi!).

Google decided to make things more complicated by sending me an Invite to Google Wave! I promptly gave it out to librarians I knew on Twitter and settled down to play with it.

First off, it’s supposed to work in Firefox and Google Chrome. But many people have reported that it’s slow and unstable in Firefox, and that has been my experience as well, so I use Google Chrome for now. It’s still slow and not totally stable in Chrome, but it’s far worse in Firefox.

Google Wave as an email/IM hybrid

Google Wave is hard to describe, but it’s basically an email/IM/wiki hybrid.

You “wave” to one or more Google Wave accounts by adding your contacts to a new wave, similar to the way you add email addresses when emailing. If people in your Google contacts have a Wave account, they automatically appear as possible contacts. The image below shows me starting a new wave.

Chances are, though, you will have no one to wave to at first, so you have to figure out what their wave addresses are, or find some public waves to interact in.

What are public waves?

Like email conversations, you can usually only read waves you were added to as a contact. It is however possible to make a wave “public”, so anyone with a wave account can read it (see this on how to make a wave public).

You can run a with:public keyword search in the middle pane to find public waves. I like to search for with:public librarians.

There are waves such as the Librarians wave directory that list librarians on Google Wave, or you can go to any wave, click on the row of accounts listed in the wave, and add them to your contacts.

The interesting part is that if the people you wave to are online, you can see them type their responses in real time, and by real time I mean letter by letter! You can also respond in real time, so you can reply midway, even before the other party has finished answering.

It’s a novel experience, particularly if you have not used real-time collaborative tools like EtherPad or Google Docs before.

Each wave you see would include a threaded history of the conversation so far, and you can add new people to the wave at any time, and they would have access to the whole conversation.

When you view any wave that has changed, any new wavelet (a message in the wave) or changed wavelet (see later) will have a green border around it. You can press the space bar to quickly jump to these wavelets.

There is also a “playback” mode that allows you to see how the wave changed with time, who added new wavelets etc.

At this level of use Google Wave is just an email replacement, with the added advantage of being able to react in real time like IM if your contacts are online. The presentation also reminds me of a threaded web-based bulletin board forum.

Google Wave as an email/IM/wiki hybrid

An interesting twist is that the messages you type, as well as those added by others, can be edited or revised at any time by anybody already on the wave.

This is of course based on the wiki concept, with similar history tracking features.

Embedding widgets, robots

Google Wave also makes it easy to embed iGoogle or OpenSocial gadgets. But I found this really interesting extension that allows you to embed any HTML into the wave! So you can embed anything from SlideShare widgets to search box widgets, or anything else with the correct HTML snippet.

Google Wave also allows you to create robots, which are automated agents that respond to events in the wave to carry out automated tasks. I don’t have the programming chops to work this out yet, but here’s an interesting bot that looks for ISBN-13s and replaces them with a book cover.
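I don’t know how that bot is actually written, but the text-matching half of the job can be sketched in a few lines of Python. This is a rough sketch under my own assumptions: the function names are invented, it does not use the Wave robots API at all, and it only finds and validates ISBN-13s (using the standard check digit rule, where digits are weighted 1, 3, 1, 3, ... and the total must be divisible by 10). Fetching a cover image is left out.

```python
import re

# Candidate: a 978 or 979 prefix followed by 10 more digits, allowing an
# optional hyphen or space before each digit (e.g. 978-0-306-40615-7).
ISBN13_CANDIDATE = re.compile(r"\b97[89](?:[- ]?\d){10}\b", flags=re.ASCII)


def is_valid_isbn13(digits):
    """Check the ISBN-13 check digit: weights alternate 1, 3, 1, 3, ..."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def find_isbn13s(text):
    """Return every valid ISBN-13 in the text as a bare 13-digit string."""
    found = []
    for match in ISBN13_CANDIDATE.finditer(text):
        digits = re.sub(r"[- ]", "", match.group())
        if is_valid_isbn13(digits):
            found.append(digits)
    return found


message = "Have you read 978-0-306-40615-7? Not 9780306406158, that one is a typo."
print(find_isbn13s(message))
```

A real robot would presumably run something like find_isbn13s over each new message and then swap the matched text for a cover image looked up from a book service.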

You can also embed the wave into blogs, webpages etc., but it isn’t as easy as simply copying and pasting HTML. In this area, FriendFeed is much more user-friendly, with similar real-time functions. Mashable has a nice explanation of this and more.

First thoughts

Google Wave adds yet another possible communication tool for libraries. With libraries already struggling with new communication channels such as instant messaging, text messaging, Twitter, Facebook and more, it is an interesting problem to have.

To me the obvious use of Google Wave would be as a replacement for email. Once Google Wave becomes as ubiquitous as email or Gmail, or if institutions implement their own Wave platforms (it’s an open platform), I suspect all libraries would use it routinely to answer queries.

It has all the features of email with the added functions of instant messaging. My experience manning library email accounts is that, more often than not, library users give you insufficient information to help them, and you desperately want to ask them more questions in real time. Currently I either pick up the phone and call them, or possibly invite them to a Meebo chat site, or use services like Tinychat.

Not all of them will respond, and even if they do, moving the conversation to another location means needing to keep track of the transaction on another communication platform (logs etc.).

Google Wave makes all this seamless.

I’ve been racking my brain to see if anything currently done with Wave cannot be done with email, and so far I haven’t come up with much.


Currently there are many issues with Google Wave, which is not surprising given the innovative nature of the service.

I think it’s quite complicated to use, and being in beta, the interface needs tons of work, so much so that many people (almost all of whom are geeky early adopters) are struggling with it. So it definitely won’t be ready for the masses for a long while.

The main issue is that I haven’t found a way to be automatically notified in a popup that someone has waved to me or added me to a wave (the ‘Ping’ mechanism is clumsy). This leads to a strange situation where people coordinate with each other via Twitter or IM first, before going to Google Wave to communicate.

For instance, I have been using Twitter with @digicmb and @mlibrarianus to connect with them first, before going to test Wave.

My first library transaction over wave

Somewhat interestingly, I got a wave from a library user, which would be my very first library transaction conducted via Wave (permission granted by the user).

Nothing particularly interesting: at this level it works just like email, or rather Gmail with threaded conversations, or a normal web-based forum, particularly since we were not both online at the same time. If only Google Wave could send me a popup notification of a reply, so I could respond instantly if desired.

The wiki-like functions of Google Wave aren’t necessarily a boon. In most cases, I don’t really see the need to allow anyone to edit everything. Currently there don’t seem to be any controls allowing you to turn that off, though there seems to be a provision for “read-only” messages that hasn’t been turned on yet.

Also, as @eagledawg on Twitter pointed out to me, while waves are private by default, anyone you include in a wave can invite anyone else to join, and there is no way to control this.

Other use cases

Obviously people are still trying to figure out Google wave. I have written in the past about mashups/services such as

These pieces were written with the full understanding that Google Wave might eventually make all the ideas there irrelevant. In theory Google Wave could be a component of the use cases above (probably as a replacement for Meebo or the FriendFeed real-time widget), or it could be used on its own.

For instance, Rssybot, which allows you to watch RSS feeds in Wave, seems to have a lot of potential.