Planet Woes

Planet is an RSS aggregator written in Python that we use over at Planet Inkscape. But maybe I should start at the beginning.

Lots of weird requests which resulted in 404s showed up in the log. Things like:

XX at http:/kaioa.com
XX at
XX
XXathttp:/kaioa.com

Here "XX" stands for the node ID, e.g. "36" for this blog post.

After investigating it for a bit I found the shocking reason behind this. Well, not really that shocking... it's more on the silly side, really. ;)

My RSS 2.0 feeds look like this:

[...]
<link>http://kaioa.com/node/36</link>
[...]
<guid isPermaLink="false">36 at http://kaioa.com</guid>
[...]

The RSS 2.0 feeds from Planet look like this, however:

[...]
<guid>http://kaioa.com/36 at http://kaioa.com</guid>
<link>http://kaioa.com/node/36</link>
[...]

As you can see, the isPermaLink attribute is missing. If it's missing, it defaults to true, which in turn causes other readers/aggregators to treat that guid as a URL. Ironically, Planet does interpret the attribute itself when reading feeds, but strips it from the feeds it generates.
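To make the default concrete, here is a minimal Python sketch of how a downstream reader might decide whether a guid is a URL. It is only an illustration, not code from Planet or from any particular reader; the function name permalink_from_guid and the two trimmed-down item snippets are made up for this example.

```python
import xml.etree.ElementTree as ET


def permalink_from_guid(item_xml):
    """Return the guid text if a reader would treat it as a URL, else None.

    Hypothetical helper for illustration only.
    """
    item = ET.fromstring(item_xml)
    guid = item.find("guid")
    if guid is None or guid.text is None:
        return None
    # A missing isPermaLink attribute is treated as "true" in RSS 2.0.
    is_permalink = guid.get("isPermaLink", "true").strip().lower() == "true"
    return guid.text.strip() if is_permalink else None


# Item as my own feed emits it (trimmed to the relevant elements).
original = (
    "<item>"
    "<link>http://kaioa.com/node/36</link>"
    '<guid isPermaLink="false">36 at http://kaioa.com</guid>'
    "</item>"
)

# The same item after passing through Planet: the attribute is gone.
via_planet = (
    "<item>"
    "<guid>http://kaioa.com/36 at http://kaioa.com</guid>"
    "<link>http://kaioa.com/node/36</link>"
    "</item>"
)

print(permalink_from_guid(original))    # None
print(permalink_from_guid(via_planet))  # http://kaioa.com/36 at http://kaioa.com
```

With the attribute in place the guid is left alone as an opaque identifier. Without it, the reader ends up with a "URL" that doesn't exist, which is where the 404s come from.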

isPermaLink="false" is used by Drupal and WordPress. However, the missing attribute only hurts Drupal's feeds, because WordPress's feeds happen to use guids that are identical to the link. That isn't a given, though, and may change at some point in the future (well, it's unlikely).

Either way it's totally Planet's fault. I tried to track down the issue, but Planet's source is pretty hard to follow. Additionally "rss20.xml.tmpl" and the template stuff in general lack support for the isPermaLink attribute, which means that fixing it won't be that easy.
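For reference, what the output side would have to do is small: carry the flag along with each entry and write it back out. The following Python sketch shows the idea with the standard library's ElementTree; it is not Planet's actual code or template machinery, and build_guid is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def build_guid(guid_text, is_permalink):
    """Serialize a <guid> element that states isPermaLink explicitly.

    Hypothetical helper; Planet itself renders feeds through templates.
    """
    guid = ET.Element("guid", {"isPermaLink": "true" if is_permalink else "false"})
    guid.text = guid_text
    return ET.tostring(guid, encoding="unicode")


print(build_guid("36 at http://kaioa.com", False))
# <guid isPermaLink="false">36 at http://kaioa.com</guid>
```

Writing the attribute out explicitly in both cases means no reader has to fall back on the default.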

If you're wondering why I'm blogging about this instead of posting it on Planet's bug tracker... well, they don't have one. D'oh. I already contacted one of the authors, but so far I haven't received a reply.

Comments

move planet to drupal

and get a maintained aggregator that is easy to extend. Cheers!

Actually, Planet is maintained...

...but it's maintained by someone else and it's called Venus. You should see if Venus has the same problems, and if so, report it to the mailing list. Venus's author is pretty responsive.

(As an aside, Venus is also easy to extend. At one point I had even written a plugin for it so that it could run PHP code to filter the feeds.)

Also: Atom is a good alternative

The Atom Syndication format is also a good alternative to the XML jumble that is RSS. It's an IETF standard with a clear spec (and a large number of testcases at the Feed Validator). There should be a template for that as well in Venus. I don't remember exactly if one is in Planet 2.0, though.

Eventually...

@#1

There are some vague plans to move the complete site over to Drupal at some point, but it doesn't look like it will happen soon enough. As far as I know I'm the most experienced person of the team when it comes to Drupal, which isn't all that obvious given that I'm using a stock theme here (it's even worse... it's the default theme).

Unfortunately, all my spare time is allocated to more interesting (and hopefully more useful) things. And of course I don't really like spending my spare time on the kind of things I'm already doing all the time. It's sorta like doing overtime for a break. ;)

Even if we moved right away, the problem would persist. There are lots of other Planet planets (another name would have been nice) that will continue to produce crippled RSS 2.0 feeds.

IMO it's sorta fun how Planet relays the bug over to the next layer. I initially thought it was a bug in the other aggregators.

@#2

Thanks for the info. I'll take a look. :)

Ah... it appears to be fixed in Venus

rss20.xml.tmpl line 13:

<guid isPermaLink="<TMPL_VAR guid_isPermaLink>"><TMPL_VAR id ESCAPE="HTML"></guid>

Neat. :)

edit: bug tracker - replace planet with venus
edit#2: Venus does indeed work very well. I just tried it myself to be sure.
