07 June 2009

The pains of backward-compatibility

Problem:
  • you have several ZIP and TAR archives
  • you have to replace ONE FILE in each of them
  • you only have Python 2.5

From what I can see, the only solution in this situation is to branch the two cases completely, because the relevant Python modules (tarfile and zipfile) have such completely different interfaces.

Neither of them can simply replace or delete a single file, so you have to unpack the entire archive, edit the file, and repack. An inefficient but consistent approach.
Then, ZipFile objects read bytes, whereas TarFile objects extract files.
Finally, ZipFile doesn't feature a method to extract all files in one go, like TarFile has. To be honest, zipfile sucks in pre-2.6 VMs.
This means that the only things you can really do (more or less) in the same way (with some essential metaprogramming) are opening and closing archives, listing the contained files, and adding new files.
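
To make the unpack/edit/repack routine concrete, here is a minimal sketch of the branching described above. The function names (unpack, repack, replace_file) are made up for illustration, and the code is written for a current Python rather than 2.5. The ZIP branch deliberately sticks to namelist(), read() and write(), since those are what a pre-2.6 zipfile offers, while the TAR branch uses extractall(). It assumes trusted archives with plain relative member names; the rewritten TAR is uncompressed and empty directories are dropped.

```python
import os
import shutil
import tarfile
import tempfile
import zipfile


def unpack(archive_path, dest_dir, is_zip):
    """Unpack every member of a ZIP or TAR archive into dest_dir."""
    if is_zip:
        zf = zipfile.ZipFile(archive_path, "r")
        try:
            # No extractall() assumed for ZIP: copy each member's bytes by hand.
            for name in zf.namelist():
                target = os.path.join(dest_dir, name)
                if name.endswith("/"):  # directory entry
                    if not os.path.isdir(target):
                        os.makedirs(target)
                    continue
                parent = os.path.dirname(target)
                if parent and not os.path.isdir(parent):
                    os.makedirs(parent)
                out = open(target, "wb")
                try:
                    out.write(zf.read(name))
                finally:
                    out.close()
        finally:
            zf.close()
    else:
        tf = tarfile.open(archive_path, "r")
        try:
            tf.extractall(dest_dir)
        finally:
            tf.close()


def repack(src_dir, archive_path, is_zip):
    """Write every regular file under src_dir into a fresh archive."""
    members = []
    for root, _dirs, files in os.walk(src_dir):
        for filename in files:
            full = os.path.join(root, filename)
            members.append((full, os.path.relpath(full, src_dir)))
    if is_zip:
        archive = zipfile.ZipFile(archive_path, "w")
        add = archive.write   # write(filename, arcname)
    else:
        archive = tarfile.open(archive_path, "w")  # plain, uncompressed tar
        add = archive.add     # add(name, arcname)
    try:
        for full, rel in members:
            add(full, rel)
    finally:
        archive.close()


def replace_file(archive_path, member_name, new_content):
    """Replace one member by unpacking, overwriting it and repacking."""
    is_zip = zipfile.is_zipfile(archive_path)
    work_dir = tempfile.mkdtemp()
    try:
        unpack(archive_path, work_dir, is_zip)
        out = open(os.path.join(work_dir, member_name), "wb")
        try:
            out.write(new_content)
        finally:
            out.close()
        repack(work_dir, archive_path, is_zip)
    finally:
        shutil.rmtree(work_dir)
```

Calling replace_file("some.zip", "sub/target.txt", b"new") (hypothetical file names) overwrites the archive in place, so work on a copy if the original matters.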

Things are much, much better in 2.6 and 3.0, where both interfaces are almost the same, but if you are stuck with 2.5 (like me) then you'll have to make do with inelegant solutions. And if you are reading this, maybe you'll waste less time.

(Memo to self: always, ALWAYS do the easiest thing that could possibly work, no matter how inelegant it is. Premature optimization really is the root of all evil.)

2 comments:

Dave Fried said...

You're using Python. It seems like the ZIP stuff is a strictly lower-level interface than the tar stuff. Why don't you wrap the ZIP stuff in a class that mimics the Tar interface (or at least the subset you need), doing the file stuff under the hood? Then you can use the same code to work on both.

It's not ideal, but at least you're less likely to introduce bugs due to duplicated logic.

GiacomoL said...

That's a very good suggestion. I guess it would also make it easier to change things once I move to 2.6 or 3.0.
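
For anyone landing here later, this is roughly what such a wrapper could look like. It is only a sketch under a few assumptions: ZipAsTar and open_archive are names invented for this example, the adapter covers just getnames(), extractall(), add() and close(), it picks the format from the file extension, and it is written for a current Python. Unlike TarFile.add(), this add() does not recurse into directories.

```python
import os
import tarfile
import zipfile


class ZipAsTar(object):
    """Hypothetical adapter: a small TarFile-like subset on top of ZipFile."""

    def __init__(self, path, mode="r"):
        self._zip = zipfile.ZipFile(path, mode)

    def getnames(self):
        # TarFile.getnames() equivalent.
        return self._zip.namelist()

    def extractall(self, path="."):
        # Written by hand with namelist()/read(), as a pre-2.6 zipfile requires.
        for name in self._zip.namelist():
            target = os.path.join(path, name)
            if name.endswith("/"):  # directory entry
                if not os.path.isdir(target):
                    os.makedirs(target)
                continue
            parent = os.path.dirname(target)
            if parent and not os.path.isdir(parent):
                os.makedirs(parent)
            out = open(target, "wb")
            try:
                out.write(self._zip.read(name))
            finally:
                out.close()

    def add(self, name, arcname=None):
        # Single files only; TarFile.add() would also recurse into directories.
        self._zip.write(name, arcname)

    def close(self):
        self._zip.close()


def open_archive(path, mode="r"):
    """Return an object with the same small interface for ZIP and TAR files."""
    if path.lower().endswith(".zip"):
        return ZipAsTar(path, mode)
    return tarfile.open(path, mode)
```

The calling code then only ever sees getnames(), extractall(), add() and close(), whichever kind of archive open_archive() was given.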