wesley tanaka

Yahoo Pipes


I use pipes.  A lot.  So Yahoo Pipes, which promises to do for RSS feeds what Unix pipes do for line-delimited text files, sounds like it might be exciting.  But pipes are only as powerful as the tools you string together with them, so I'll postpone my excitement until I try it out.  Which I think I'm going to do now by seeing if I can create a "popular Digg stories which don't suck" RSS feed.  Drinking from the Digg information firehose was proving to be too Herculean a task, so I unsubscribed a few days ago.

Update: I played around for a few minutes.  Here are my initial impressions:

  • Your "pipes" are drawn as directed (acyclic?) graphs.  Edges in the graph are the pipes, and nodes in the graph are commands.  Pipes can be composed.  There are a few different data types that can be passed along the edges, the most interesting being the "RSS feed" datatype.  Input and output parameters are typed, so trying to drag an RSS output onto a node which takes a text input doesn't work.
  • It's painfully slow in Firefox/Linux.  I assume it's faster on Windows, the same way Yahoo Mail Beta is slow in Firefox/Linux but fast in Firefox on Windows.  I wish Firefox or Yahoo would fix whatever causes that.
  • When you create a "pipe", you get an output node automatically (I assume that there is an RSS feed corresponding to the output node of every pipe created).  However, for input you're presented with a menu of: "Yahoo! Search", "Yahoo! Local", "Fetch", "Google Base", "Flickr" in that order.  It turns out that "Fetch" is the one that allows you to grab any RSS feed.  About half of the time I've spent in the interface so far went to figuring that out.  It should be named "Fetch Any Feed" and be at the top of the list (or even in its own section), not in the middle.
  • I would have really liked a "Bayesian N-Gram Boolean Categorizer," or whatever spam filters do these days.  It could even be named "Spam Filter," although it would be useful for so much more than that.  It would have two outputs: the left output would be the items which "passed" the filter (the ham), and the right output would be those that "failed" (the spam).  Every item on both outputs would have an extra attribute added which contained two links for training the filter (the "this is not spam" and "this is spam" links).  Alternatively, if there's no way that the architecture could support two outputs, you could have two nodes which referred to the same logical categorizer instance, one node for "spam" and one node for "ham."  I think this would have worked well for filtering out the uninteresting items from the digg feed.
  • However, with that exception, the built-in operators are high-level enough to be interesting... if they work as advertised.  I tried the "content analysis" operator, which appears to try to extract "interesting" keywords from each item in an RSS feed.  Maybe good for autogenerating tags or clustering items into groups.  Probably not good enough for filtering.
  • They call the boxes "pipes", when in the UNIX analogy, the links between the boxes should be called "pipes".  Probably the marketing department trying to ameliorate their product's painfully geeky name.
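
To make the typed-edges point concrete, here is a minimal Python sketch of a graph that refuses to connect an output to an input of a different type.  The class names, the "feed" and "text" type labels, and the node definitions are all made up for illustration; this is not how Yahoo Pipes is actually implemented, just one way the behavior I saw could be modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A command box with one typed output and named, typed inputs (hypothetical model)."""
    name: str
    output_type: str
    input_types: dict = field(default_factory=dict)  # input name -> type label


class Graph:
    def __init__(self):
        # Each edge is (source node name, target node name, target input name).
        self.edges = []

    def connect(self, source, target, input_name):
        """Add an edge, refusing it when the output and input types differ."""
        expected = target.input_types[input_name]
        if source.output_type != expected:
            raise TypeError(
                f"{source.name} outputs {source.output_type}, "
                f"but {target.name}.{input_name} expects {expected}"
            )
        self.edges.append((source.name, target.name, input_name))


# Assumed node shapes: "Fetch" turns a URL (text) into a feed,
# "Filter" takes a feed plus a text pattern and emits a feed.
fetch = Node("Fetch", output_type="feed", input_types={"url": "text"})
flt = Node("Filter", output_type="feed",
           input_types={"items": "feed", "pattern": "text"})

graph = Graph()
graph.connect(fetch, flt, "items")        # feed into a feed input: accepted

try:
    graph.connect(fetch, flt, "pattern")  # feed into a text input: rejected
except TypeError as err:
    rejected = str(err)
```

The second `connect` call is the code equivalent of dragging an RSS output onto a text input and having the editor refuse the drop.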

In terms of my original goal, it seems possible to subscribe to all Digg popular items, and then filter out categories that aren't interesting (rather than needing to subscribe additively to each and every Digg category feed).  But I guess I was really hoping for some kind of machine learning node.  Maybe by the time it's out of beta...
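
For what it's worth, the kind of machine learning node I have in mind doesn't need to be fancy.  Below is a rough Python sketch of a word-level naive Bayes categorizer with add-one smoothing and two outputs (ham and spam).  It is a deliberately simplified stand-in for whatever real spam filters do, the `HamSpamFilter` name and its methods are my own invention, and the `train` method plays the role of the "this is spam" / "this is not spam" links described above.

```python
import math
from collections import Counter


class HamSpamFilter:
    """Hypothetical two-output categorizer: word-level naive Bayes."""

    def __init__(self):
        self.words = {"ham": Counter(), "spam": Counter()}
        self.docs = {"ham": 0, "spam": 0}

    def train(self, text, label):
        """Record one labeled item; label is "ham" or "spam"."""
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def _score(self, text, label):
        """Log-probability of the label given the text, up to a constant."""
        counts = self.words[label]
        total = sum(counts.values())
        vocab = len(set(self.words["ham"]) | set(self.words["spam"]))
        # Smoothed prior, then one smoothed likelihood term per word.
        score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total + vocab + 1))
        return score

    def split(self, items):
        """Return (ham, spam): the "left" and "right" outputs of the node."""
        ham, spam = [], []
        for item in items:
            if self._score(item, "spam") > self._score(item, "ham"):
                spam.append(item)
            else:
                ham.append(item)  # ties and untrained filters pass items through
        return ham, spam


# Made-up training data standing in for clicks on the training links.
f = HamSpamFilter()
f.train("linux kernel release", "ham")
f.train("celebrity gossip video", "spam")

ham, spam = f.split(["new linux kernel", "celebrity video leaked"])
```

An untrained filter passes everything through on the ham side, which seems like the right default for a feed: you start with the firehose and train it down.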

Another update: I'll play around with the filter module for a few days and see if it can do enough of what I want to make it worthwhile subscribing to Digg again.  It will be here, but I don't know if I'll end up using it yet.  And one more impression from the first time I looked at the Yahoo Pipes page a few days back:

  • I thought it was cute how they integrated the Yahoo avatar.  Yahoo seems like it's always been good about integrating its products together.

Actually, I'll just keep adding impressions as they come to me.

  • Instead of a "publish" button, there should be a checkbox or something which toggles whether or not the pipe is public.  It's not clear if the publish button needs to be pressed every time you save, or just the first time.  The more logical copies of something exist ("the version on the screen", "the saved version", "the most recently published version"), the more confusing a piece of software gets.
  • There doesn't seem to be a way to create a pipe which takes an RSS feed as an input.  The closest you can come is to take a URL to an RSS feed as an input.  But that makes it kind of ugly if you want to pipe the output from one pipe in the library into the input of another pipe in the library -- you have to find the feed URL of the first pipe and use that as the input into the second pipe.
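
Here is what I mean by piping one pipe into another, sketched in Python.  If a pipe could take a feed as its input, each pipe would just be a function from a list of items to a list of items, and composing two of them would be trivial.  The function names, the item fields ("title", "category"), and the sample items are all made up for illustration.

```python
def compose(*stages):
    """Chain feed-in, feed-out stages into a single feed-in, feed-out stage."""
    def pipeline(items):
        for stage in stages:
            items = stage(items)
        return items
    return pipeline


def drop_categories(*unwanted):
    """Stage that removes items whose category is in `unwanted`."""
    return lambda items: [i for i in items if i["category"] not in unwanted]


def title_excludes(phrase):
    """Stage that removes items whose title contains `phrase` (case-insensitive)."""
    return lambda items: [
        i for i in items if phrase.lower() not in i["title"].lower()
    ]


# Made-up sample items standing in for a parsed "popular stories" feed.
popular = [
    {"title": "New kernel released", "category": "linux"},
    {"title": "Top 10 celebrity photos", "category": "celebrity"},
    {"title": "Top 10 text editors", "category": "software"},
]

# Two small pipes joined directly, with no feed URL in between.
not_sucky = compose(drop_categories("celebrity"), title_excludes("top 10"))
result = not_sucky(popular)
```

With URL-only inputs, the equivalent of `compose` means publishing the first pipe, copying its feed URL, and pasting that into the second pipe.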
