17 December 2012

Why I've never really liked the Facebook API

The other day, I got an email from Virgin Media stating that my connection had been "upgraded to 100Mb/s". I went to a bunch of speed-testing websites, and reported speeds were indeed much higher than in the past. I was tempted to brag about it on Facebook, but then I remembered that, last time I did something similar, I was humbled by a bunch of Dutch friends with "big pipes". I wondered what sort of speeds they had reported back then, so I went to Facebook to search for that old status. And that's where my problems started.

The standard FB search UI failed to return anything even vaguely related, as usual. So I started googling for apps that would allow me to search my previous posts, and found a few which just wanted to gather all my personal data (on FB -- you don't say!). Then I found that you can actually request a complete download of all your data from FB (under Settings) and launched the request, but it looked like it would take a long time (for the record, I finally got it about 24 hours later). So I thought "hey, surely I can work with the FB API". How naive of me!

There is, in fact, a straightforward API call to get your statuses: /me/statuses. By default, it will return some 25 records, plus links to further paginated results. Except pagination is ridiculously buggy: after the first 50 records, it will just return a blank page. If you try to use the limit parameter, it will return a maximum of 100 records per page, and again it will stop after the second page (i.e. a maximum of 200 results, which is actually 199, because everybody knows "there are only two hard things in computer science"). Time-based parameters (until, since) didn't seem to work at all. Using wrappers rather than direct calls didn't seem to make any difference. It being very late, I gave up and went to sleep.
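For reference, the call above boils down to something like the sketch below. The token is a placeholder (you'd obtain a real one from your app's OAuth flow or the Graph API Explorer), and the parameter names (limit, offset) are the ones the Graph API documented at the time:

```python
# Minimal sketch of one page of /me/statuses.
# ACCESS_TOKEN is a placeholder; limit/offset are the documented paging knobs.
GRAPH_URL = "https://graph.facebook.com/me/statuses"

def build_params(access_token, limit=100, offset=0):
    """Assemble the query parameters for one page of /me/statuses."""
    return {"access_token": access_token, "limit": limit, "offset": offset}

def fetch_statuses_page(access_token, limit=100, offset=0):
    import requests  # third-party; imported lazily, the rest is stdlib-only
    resp = requests.get(GRAPH_URL, params=build_params(access_token, limit, offset))
    resp.raise_for_status()
    return resp.json().get("data", [])
```

In theory, repeating that call with increasing offsets should walk through everything; in practice, see below.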

A day later, still incredulous and obviously fairly frustrated, I googled harder and finally found a relevant question on StackOverflow, which pointed to a Facebook bug about pagination. As the bug report says, you can work around the problem by always using the offset parameter rather than relying on the 'next' and 'previous' links returned in JSON responses. I verified this, and it's indeed the case. By that point, my export was available for download anyway. You can imagine how happy I am (not).
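The workaround from the bug report amounts to ignoring the 'next'/'previous' links entirely and stepping the offset yourself until an empty page comes back. A sketch, with the page-fetching function injected so the loop can be shown without hitting the live API (in real use it would wrap the Graph call):

```python
# Offset-based pagination workaround: step the offset manually instead of
# following the 'next' links, stopping when a page comes back empty.
def all_statuses(fetch_page, page_size=100):
    results, offset = [], 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        if not page:
            break
        results.extend(page)
        offset += page_size
    return results

# Demo against a fake backend standing in for the API:
fake_data = [{"message": "status %d" % i} for i in range(250)]
fake_fetch = lambda limit, offset: fake_data[offset:offset + limit]
print(len(all_statuses(fake_fetch)))  # 250
```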

Lessons learnt from this whole debacle:

  • The unofficial facebook-sdk for Python doesn't work with Python 3. There is an experimental fork with very limited testing (i.e. it passes 2to3 and that's it).
  • The json module in the Python 3 standard library, as used by facebook-sdk, chokes on Facebook data. Don't even ask me how I found out. Trying a more up-to-date version from the original upstream doesn't help. There is a Python 3 fork, which didn't help either. Juggling json.load and json.loads didn't seem to help, and I didn't want to rip the guts out of facebook-sdk for fear of breaking compatibility with 2.x (although I cringed at times: using "file" as a variable name? Really?). No wonder @kennethreitz rolled his own JSON parser in Requests.
  • facebook-sdk should probably be rewritten from scratch in Python 3 using Requests. Not that I'll ever do it.
  • After so many years and botched revamps, the Facebook API is still terrible. For something reportedly so essential to "web 2.0" infrastructure, and with so many uber-smart people on the payroll, the whole thing still feels incredibly hackish.
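For the record, the json.load/json.loads juggling mentioned above comes down to a simple difference in input type, which I'm confident of; the payload below is just a made-up miniature of a Graph API response, not the actual data that choked:

```python
# json.loads parses a string; json.load reads from a file-like object.
# The payload is an invented, simplified stand-in for a real API response.
import io
import json

payload = '{"data": [{"message": "hello"}], "paging": {"next": null}}'

from_string = json.loads(payload)            # str -> dict
from_file = json.load(io.StringIO(payload))  # file-like -> dict

assert from_string == from_file
print(from_string["data"][0]["message"])  # hello
```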


Giulio Piancastelli said...

A couple of additional, well, hints, as I'd call them, rather than lessons, that I got from your post:

1. Python 3 is still not widely adopted - too many holes in the field - and if you are going to build something that uses more than a handful of selected libraries, you are going to face at least one potential show-stopper: the availability of a given tool in a Py3k-compatible version.

1a. No one will ever do it - rewrite facebook-sdk from scratch in Python 3, I mean - not just you. I wonder whether frustration will kick in at some point and you will return to programming in Python 2.x.

2. Your favorite social network of the day privileges here-and-now data over the historical record of what has been posted, in a way that's not easy to work around - yeah, you got your Facebook posts exported, but you still needed to search for the relevant entry with your own tools. I believe Twitter has similar issues - I think even a simple search is unable to go back farther than a certain time span. Tumblr probably suffers from the same exporting issues that have plagued it since its birth. And so on and so forth. I don't think the blame is entirely on the technical side, but it's a shame nonetheless.

Giacomo Lacava said...

Yes, the ecosystem around Python 3 is still sub-optimal, and it will likely be for another year or two. However, its benefits are huge, and as soon as popular high-productivity webdev frameworks start supporting it, the rest of the web ecosystem will probably catch up very quickly. For example, facebook-sdk is basically just the library backing the relevant Django app; as soon as people start asking for the app to support Py3k, the library will have to follow suit. When that happens, I hope they just chuck out the existing codebase and rewrite it; after all, compatibility with 2.6/2.7 (as well as 3.2.x+) is relatively easy to maintain for such a small lib.

Anyway, I'll keep using my "3-first" approach ;)

I completely agree that social networks are even more ephemeral than the average website -- the "web" being incredibly ephemeral already on its own.