Twitter: over one billion tweets served. Actually, it’s probably more than that, since the count is from GigaTweet, an external service and not an official count. If we do the math, that comes out to:
140 chars per tweet x 1 byte per char x 10^9 tweets = 140 billion bytes = 130.4 GB worth of data
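As a quick sanity check on that arithmetic, here is a minimal sketch in Python. It takes the figures above at face value (one byte per character, every tweet at the full 140 characters), and it assumes "GB" here means binary gigabytes of 2^30 bytes, which is what makes the 130.4 figure come out.

```python
# Back-of-the-envelope estimate using the figures from the post.
TWEETS = 10**9          # one billion tweets (GigaTweet's count)
CHARS_PER_TWEET = 140   # upper bound: every tweet at full length
BYTES_PER_CHAR = 1      # assumption: plain single-byte characters

total_bytes = TWEETS * CHARS_PER_TWEET * BYTES_PER_CHAR
binary_gb = total_bytes / 2**30  # assumption: "GB" means 2**30 bytes

print(f"{total_bytes:,} bytes is about {binary_gb:.1f} GB")
```

Since most tweets are shorter than 140 characters, this is an upper bound on the raw text; it ignores metadata, indexes, and replication entirely.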
The 1 billion tweet mark took Twitter just over two years to achieve. Even assuming exponential growth, it’s hard to see Twitter’s raw tweet storage needs exceeding a terabyte ($109 and falling) in the next five years.
Of course, raw storage alone isn’t the whole story, since unlike the gigabytes of data on our home computers, the data on Twitter needs to be actively accessed and queried from the databases, which is a non-trivial task (as any regular user of Twitter over the past year can attest to). This is probably why Twitter has been enforcing a limit of 3200 tweets on users’ archives. The overhead on maintaining the archives beyond that is probably a threat to Twitter’s ability to maintain uptime and service reliability. The limit seems reasonable, since only the heaviest users will have reached that limit by now – I’ve been on twitter longer than most of the A-listers, and I tweet every blog entry I make from 5-6 different blogs, but I’m still only around 1200 tweets. Also, with far fewer followers (several hundred instead of thousands), I have only a handful of @replies compared to the firehose that folks like Darren (@problogger) or Scoble (@Scobleizer) see on their real-time clients. As a result, Twitter is more akin to an email/private messaging system for users like myself, rather than a real-time chatroom for the big users.
Still, even a casual Twitter user should be at least partially alarmed at the thought that their entire Twitter history is subject to arbitrary limits and no real guarantee of backup. As usual, it’s up to us to protect our own data, especially data in a walled garden (albeit one with handy RSS and API gates). Good user practices are the same whether we are using an online service or word processing at home, after all.
Here are just a few ways in which you can back up your tweets. I am sure there are more, so if you have any ideas I’ve not listed here, please share in comments!
Tweetake.com – This service lets you enter your username and download a CSV-format file of your followers, favorites, friends, and tweets. Unfortunately, @replies are not available for backup. It doesn’t save direct messages, either, but if you configure your Twitter account to send you notification emails of direct messages, you can at least archive those separately. The CSV format is useful for archiving but not very user-friendly, though you could in principle import the data again into some other form.
Alex King’s Twitter Tools – This is a WordPress plugin that lets you manage your Twitter account directly from your WordPress blog. The plugin lets you blog each tweet and/or tweet each blog post, and you can also generate a daily tweet digest as a blog post if you choose (and assign it to an arbitrary category). There’s no way to archive replies, DMs, or follower relationships.
Twitter itself supports RSS feeds, so you could slurp your own feed of replies and tweets using a feedreader and periodically back those up or even write them to disk. Users of third-party services like Socialthing, Friendfeed, or Ping.fm also have an alternate interface to Twitter that could potentially be used for backup. However, none of these provide comprehensive tweet archives either, only real-time mirroring.
Finally, Dave Winer has proposed a service/API that Twitter clients can use to back up the RSS feed of a Twitter account, but this is more of a technical solution of interest to Twitter developers rather than end users.
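For the do-it-yourself RSS route, a rough sketch of the "write them to disk" step might look like the following. It uses only Python's standard library and assumes the feed is ordinary RSS 2.0 with `item`, `guid`, `title`, and `pubDate` elements. The sample feed, the `example_user` name, and the example.com URLs are made up for illustration; in practice you would fetch the XML from your own feed URL first (for example with `urllib.request.urlopen`).

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample in RSS 2.0 shape; a real feed would be downloaded instead.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>example_user</title>
    <item>
      <title>example_user: first tweet</title>
      <guid>http://example.com/statuses/1</guid>
      <pubDate>Mon, 01 Dec 2008 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>example_user: second tweet</title>
      <guid>http://example.com/statuses/2</guid>
      <pubDate>Mon, 01 Dec 2008 11:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "guid": item.findtext("guid", default=""),
            "title": item.findtext("title", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


def save_items(items, path):
    """Write the parsed items to a local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, indent=2)


items = parse_items(SAMPLE_FEED)
print(len(items), "items parsed")
```

Calling `save_items(items, "tweets.json")` would then write the snapshot locally. Keep in mind that a feed only carries the most recent entries, so this mirrors what is currently in the feed rather than a full archive.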
UPDATE: Johann Burkard has written a great little tool in Java called (appropriately) TwitterBackup. It is a very simple piece of freeware that downloads all your tweets to an XML-format file saved locally. You specify whatever filename you like, and the tool is smart enough that if you give it the name of a file that already exists, it will only download newer tweets and append them to it rather than do a full download again. This incremental backup of tweets is ideal behavior – the only thing that this tool doesn’t do is preserve your follower/following relationships.
To be honest, none of these solutions are perfect, though Tweetake and TwitterBackup come closest. What would the ideal Twitter backup tool look like? A few thoughts:
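The incremental idea is simple enough to sketch. To be clear, this is not how TwitterBackup itself is implemented (it is a Java tool that writes XML); it is only an illustration of the same append-only-what-is-new behavior, using a JSON-lines file. The `append_new` helper and the `id`/`text` fields are assumptions made up for the example.

```python
import json
import os
import tempfile


def load_seen_ids(path):
    """Collect the ids already stored in the archive (empty set if no file)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)["id"] for line in f if line.strip()}


def append_new(path, tweets):
    """Append only tweets whose id is not yet archived; return how many were added."""
    seen = load_seen_ids(path)
    added = 0
    with open(path, "a", encoding="utf-8") as f:
        for tweet in tweets:
            if tweet["id"] in seen:
                continue  # already backed up on an earlier run
            f.write(json.dumps(tweet) + "\n")
            seen.add(tweet["id"])
            added += 1
    return added


# Demo: two "downloads" that overlap by one tweet.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "tweets.jsonl")
    first = append_new(archive, [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}])
    second = append_new(archive, [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}])
    with open(archive, encoding="utf-8") as f:
        stored = [json.loads(line) for line in f]

print(first, second, len(stored))
```

The second run adds only the tweet that was not already in the file, so re-running the backup never duplicates entries.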
- It should be available as a desktop client or Adobe AIR application rather than yet another online service asking for your Twitter password. (Twitter’s implementation of OAuth or OpenID or some other authorization system is long overdue, by the way.)
- At first run, it should allow you to retrieve your entire (available) twitter history, including tweets, replies, and DMs.
- After the initial import, it should provide for periodic incremental backups of your tweets/replies/DMs, at an interval you specify (ideally, with a minimum interval of five minutes).
- It should preserve your friend/follower relationships, and let you import everyone you follow onto any new twitter account or export all their RSS feeds as an OPML file.
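The OPML export in that last item is probably the easiest piece to build. Here is a minimal sketch, assuming you already have a list of (name, feed URL) pairs for the people you follow; the `build_opml` helper, the names, and the example.com URLs are all hypothetical. It emits a basic OPML 2.0 subscription list with one `outline` element per feed.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Build an OPML 2.0 subscription list from (name, feed_url) pairs."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # One outline per followed account; type="rss" marks it as a feed.
        ET.SubElement(body, "outline", {"type": "rss", "text": name, "xmlUrl": url})
    return ET.tostring(opml, encoding="unicode")


# Hypothetical data; a real tool would pull this list from your account.
following = [
    ("alice", "http://example.com/feeds/alice.rss"),
    ("bob", "http://example.com/feeds/bob.rss"),
]
opml_text = build_opml("People I follow", following)
print(opml_text)
```

Most feedreaders can import a file like this directly, which is what makes OPML a handy escape hatch for follower lists.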
What else? There’s definitely a niche out there for an enterprising developer to take Twitter’s API and create a tool focused on backup rather than yet another twitter client. Hopefully before I reach the 3200 tweet limit myself!