The other day, I got an email from Virgin Media stating that my connection had been "upgraded to 100Mb/s". I went to a bunch of speed-testing websites, and reported speeds were indeed much higher than in the past. I was tempted to brag about it on Facebook, but then I remembered that, last time I did something similar, I was humbled by a bunch of Dutch friends with "big pipes". I wondered what sort of speed they had reported back then, so I went to Facebook to search for that old status. And that's where my problems started.
The standard FB search UI failed to return anything even vaguely related, as usual. So I started googling for apps that would allow me to search my previous posts, and found a few which just wanted to gather all my personal data (on FB -- you don't say!). Then I found that you can actually request a complete download of all your data from FB (under Settings) and launched the request, but it looked like it would take a long time (for the record, I finally got it about 24 hours later). So I thought "hey, surely I can work with the FB API". How naive of me!
There is, in fact, a straightforward API call to get your statuses: `/me/statuses`. By default, it will return some 25 records, and links to further paginated results. Except pagination is ridiculously buggy: after the first 50 records, it will just return a blank page. If you try to use the `limit` parameter, it will return a maximum of 100 records per page, and again it will stop after the second page (i.e. max 200 results, which is actually 199 because everybody knows "there are only two hard things in computer science"). Time-based parameters (`until`, `since`) didn't seem to work at all. Using wrappers rather than direct calls didn't seem to make any difference. Since it was very late, I gave up and went to sleep.
A day later, still incredulous and obviously fairly frustrated, I googled harder and finally found a relevant question on StackOverflow, which pointed to a Facebook bug about pagination. As the bug says, you can work around the problem by always using `offset` rather than relying on 'next' and 'previous' links returned in JSON responses. I verified and that's actually the case. By now, my export was available for download anyway. You can imagine how happy I am (not).
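For the record, the workaround can be sketched roughly like this. It is a minimal sketch and not what facebook-sdk does: `fetch_page` and `iter_statuses` are names I made up, the `graph.facebook.com` base URL and the `access_token` parameter are assumptions about the Graph API as it was at the time, and stepping the offset by `limit` (rather than by the number of records actually returned) is also an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Graph API base URL plus the /me/statuses call.
GRAPH_URL = "https://graph.facebook.com/me/statuses"


def fetch_page(access_token, limit, offset):
    """Fetch one page of statuses and return the list under the 'data' key."""
    query = urlencode(
        {"access_token": access_token, "limit": limit, "offset": offset}
    )
    with urlopen(f"{GRAPH_URL}?{query}") as response:
        # Decode explicitly so json.loads always receives a str.
        payload = json.loads(response.read().decode("utf-8"))
    return payload.get("data", [])


def iter_statuses(fetch, limit=100):
    """Yield every status by advancing offset ourselves.

    `fetch` is any callable taking (limit, offset) and returning a list.
    The 'next' and 'previous' links in the response are ignored on purpose,
    since those are the part that breaks after a couple of pages.
    """
    offset = 0
    while True:
        page = fetch(limit, offset)
        if not page:
            # An empty page is treated as the end of the data.
            break
        yield from page
        offset += limit
```

To run it against the real thing you would pass something like `functools.partial(fetch_page, my_token)` as `fetch`, where `my_token` is your own access token. Keeping the fetching separate from the looping also means the pagination logic can be checked against a plain list, without touching Facebook at all.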
Lessons learnt from this whole debacle:
- The unofficial facebook-sdk for Python doesn't work with Python 3. There is an experimental fork with very limited testing (i.e. it passes 2to3 and that's it).
- The `json` module in the Python 3 Standard Library, as used by facebook-sdk, chokes on Facebook data. Don't even ask me how I found out. Trying with a more up-to-date version from the original upstream doesn't help. There is a Python 3 fork which didn't help. Juggling between json.load and json.loads didn't seem to help, and I didn't want to rip the guts out of facebook-sdk for fear of dropping compatibility with 2.x (although I cringed at times: using "file" as variable name? Really?). No wonder @kennethreitz rolled his own JSON parser in Requests.
- facebook-sdk should probably be rewritten from scratch in Python 3 using Requests. Not that I'll ever do it.
- After so many years and botched revamps, the Facebook API is still terrible. For something reportedly so essential to "2.0" internet infrastructure, and with so many uber-smart people on their payroll, the whole thing still feels incredibly hackish.