08 December 2016

Best way to get locale info and localized strings in Python

One area where Windows often beats the Unix tradition is internationalisation. Traditional POSIX interfaces assume there will be One And Only One set of internationalised conventions (display language, date format etc) at runtime, and are mostly concerned with displaying data formatted for that One True Set. When you want something "international" on POSIX, you switch your locale with setlocale() and do your business. This approach unfortunately percolates into Unix-born tools and languages, in this case Python. The standard library has no equivalent of good ol' GetLocaleInfo, so there is no way to retrieve, say, the French name for January without switching the whole process locale with setlocale(). This is pretty insane and likely dangerous: the locale is process-wide state, and setlocale() is not thread-safe on most systems, so every other thread sees the change while it lasts.
Most libraries hack their way around this by packing an arbitrary subset of i18n strings, or just go full-YOLO by switching process locale back and forth. It's a sad state of affairs.
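For illustration, here is roughly what the back-and-forth hack looks like using only the standard library. This is a sketch, not any particular library's code: the helper names temporary_time_locale and month_name_via_setlocale are made up for this example, and the locale name "fr_FR.UTF-8" is an assumption that only works if that locale is installed on the machine.

```python
import calendar
import locale
from contextlib import contextmanager


@contextmanager
def temporary_time_locale(name):
    """Switch LC_TIME for the WHOLE process, then restore it (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    try:
        locale.setlocale(locale.LC_TIME, name)  # raises locale.Error if not installed
        yield
    finally:
        # Always put the previous setting back, even on error.
        locale.setlocale(locale.LC_TIME, saved)


def month_name_via_setlocale(month, locale_name):
    """Return the full month name (1 = January) under the given locale."""
    with temporary_time_locale(locale_name):
        # calendar.month_name is computed lazily from the current LC_TIME.
        return calendar.month_name[month]


try:
    # "fr_FR.UTF-8" is an assumption: the name and availability vary by system.
    print(month_name_via_setlocale(1, "fr_FR.UTF-8"))
except locale.Error:
    print("locale fr_FR.UTF-8 is not installed here")
```

Note that while the with block is active, any other thread formatting a date gets French output too, and the result depends entirely on which locales the operating system happens to ship.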
However, there is a better way. The Unicode Consortium, in its infinite wisdom, maintains a big database of localisation metadata, the CLDR (Common Locale Data Repository). You can either download the full set yourself and parse a bunch of XML files, or you can use the Babel library, which basically does it for you.
$> pip install babel
...
$> python
>>> import babel
>>> locale = babel.Locale('fr')
>>> locale.months['format']['wide'][1]
'janvier'
Et voilà.
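The same lookup works for several languages side by side, with no process-wide state involved. A minimal sketch, assuming Babel is installed; the month_name helper is a made-up name for this example, while Locale.parse and babel.dates.format_date are part of Babel itself.

```python
from datetime import date

from babel import Locale
from babel.dates import format_date


def month_name(month, locale_code, width="wide"):
    """Look up a month name (1 = January) from CLDR data (hypothetical helper)."""
    # Locale.parse accepts identifiers such as "fr" or "fr_CA".
    return Locale.parse(locale_code).months["format"][width][month]


d = date(2016, 12, 8)
for code in ("fr", "de", "es"):
    # Each call carries its own locale; the process locale is never touched.
    print(code, month_name(1, code), format_date(d, format="long", locale=code))
```

Because the locale is just an argument, this is safe to call from several threads at once, and it behaves the same on Windows and Unix regardless of which system locales are installed.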
