Recently in Blog & Tweet I explained why I wanted to make my Twitter history a part of the publication you are now reading. Along the same lines, read Dave Winer on the importance of the historic record and the general goodness of static files behind an Apache server. This post outlines how it works, with source code, and draws a conclusion.

But First, the Conclusion

There's one tweets-blob a week, posted early Monday and covering the period from the previous Monday morning to midnight on the just-ended Sunday. The title is always "Short-form Fragments" because you know, microblogging might not always mean just Twitter. Also, they have their own category, which you might find worth a glance.

Each tweet is flagged by weekday and time, and gets its own fragment identifier thus URI thus is a Web Resource, which was necessary for my happiness. The shortened URIs have been lengthened whenever possible, and an attempt made to use the results both as title and link.
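To make the fragment-identifier idea concrete, here is a minimal sketch in Ruby. It is not taken from the actual script: the helper names `tweet_anchor` and `tweet_uri`, the `t` + timestamp id scheme, and the example page URI are all assumptions made up for illustration.

```ruby
# Hypothetical helper: given a tweet's timestamp, build the weekday/time
# label shown to the reader and a fragment identifier for the tweet.
# The id scheme ("t" + UTC timestamp) is an assumption for illustration.
def tweet_anchor(time)
  utc = time.getutc
  {
    label: utc.strftime('%a %H:%M'),        # e.g. "Mon 14:05"
    id: utc.strftime('t%Y%m%d-%H%M%S')      # unique as long as no two tweets share a second
  }
end

# Hypothetical helper: page URI plus fragment identifier gives each
# tweet a URI of its own.
def tweet_uri(page_uri, time)
  page_uri + '#' + tweet_anchor(time)[:id]
end
```

Calling `tweet_uri('http://example.com/week', Time.utc(2009, 3, 2, 14, 5, 9))` would yield a URI ending in `#t20090302-140509`, which is all it takes for a single tweet to be linkable.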

I'm happy I did this. Clearly it's irritated some people, and at least one has actually unsubscribed. To those people, sorry. I think by batching 'em up so that there's only one intrusion a week, and giving them a distinctive title & category, I've made the task of ignoring them acceptably lightweight. So, once again my apologies to those who are offended, but I'm not going to take them out; it's important to me to have my contributions to the Net here under my own control, to the extent possible.

Step One: Backup and Dump

The Twitter API is straightforward, and I was getting ready to write the code to pull out my Tweet history when I ran across BackupMyTweets, which does just what it says.

I downloaded their XML dump and wrote a script to transmogrify that into the Monday-to-Sunday batches. I didn't even use an XML reader; the format was regular enough to be line-oriented. The only nit was that angle-brackets were double-escaped. Stomp that roach, Keith!
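For the curious, the line-oriented approach and the double-escaping fix might look something like the sketch below. The `<text>` element name is a guess at what such a dump could contain, not the real BackupMyTweets format, and both helper names are invented here.

```ruby
# Undo only the double-escaping of angle brackets, so "&amp;lt;" becomes
# "&lt;". Everything else is left alone, which keeps the text correctly
# single-escaped for whatever XML it lands in next.
def fix_double_escaped_brackets(text)
  text.gsub('&amp;lt;', '&lt;').gsub('&amp;gt;', '&gt;')
end

# Line-oriented extraction without an XML reader. Assumes (hypothetically)
# that each tweet's body sits in a <text> element on a single line.
# Returns nil for lines that carry no tweet text.
def tweet_text(line)
  m = line.match(%r{<text>(.*)</text>})
  m && fix_double_escaped_brackets(m[1])
end
```

Fixing only the two bracket entities, rather than unescaping the whole string twice, avoids mangling an ampersand that was escaped correctly in the first place.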

Step Two: Archaeology

The XML backup reached back to Valentine's day, 2008. But I was pretty sure I'd started tweeting before then.

So I spelunked around the Twitter API and, being too lazy to code, just used curl(1) to reach back in time. This got me to January 30. Since Twitter's own XML is different from BackupMyTweets', I had to write another script to generate a few more weekly blobs.

There are more; my first Tweet was in March 2007. But I can't get the Twitter API to show 'em to me.

Google Reader's view of my tweet stream's Atom feed includes them all, presumably because it started caching them way back when. While I can see them there in the browser, I can't manage to save them; "View Source" shows a nauseating tangle of JavaScript. Someone suggested visiting Yahoo! and using YQL to plumb that Atom feed, but I couldn't get that to work either.

The correct answer, of course, is for Twitter to just bloody well give me a copy of my own damn written words which I donated to their app. Since they seem to be reasonable people, I'm assuming they agree in principle and it should become possible.

Step Three: Production

As of this week, there's a script that runs early every Monday, to pull the previous Monday-to-Sunday span out and turn it into the input form for my publishing system.
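The date arithmetic behind "the previous Monday-to-Sunday span" is small enough to sketch. What follows is an illustration, not an excerpt from the script: `previous_week` and `in_week?` are invented names, and time zones are ignored entirely, which a real version would have to deal with.

```ruby
require 'date'

# Given the day the job runs, return [first, limit]: the Monday that began
# the most recent complete week, and the Monday after it. The span is
# half-open, so `limit` itself is excluded and Sunday is the last day in.
def previous_week(today)
  # Date#wday is 0 for Sunday and 1 for Monday, so this is days since Monday.
  days_since_monday = (today.wday + 6) % 7
  this_monday = today - days_since_monday
  [this_monday - 7, this_monday]
end

# True if a tweet's date falls inside the span returned by previous_week.
def in_week?(tweet_date, week)
  first, limit = week
  tweet_date >= first && tweet_date < limit
end
```

Because the span is worked out from the current week's Monday rather than from "today minus seven", a run that slips to Tuesday or Wednesday would still pick up the same week.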

Here it is. It's under 200 lines of Ruby and I assert no copyright nor do I claim any rights, and if you ask questions or for other support, I may just laugh cruelly. There are no unit tests and the error-handling is laughable. It is not an example of excellence in XML processing or HTTP wrangling or Ruby engineering or anything else. Well, there are a couple of things about sunday.rb that might bring a smile to the odd geek face.
