27 April 2008

MySQL character set ("charset") options for UTF8

While dabbling with PhPeace, I found out a few things about how MySQL handles charset options.

There are several levels at which you can specify a charset, from top to bottom: server instance, database/schema, table/column and, finally, connection.
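The top-to-bottom defaulting can be sketched in a few lines of Python. This is a hypothetical model, not MySQL code: the function `effective_column_charset` and its arguments are made up for illustration. It encodes the documented rule that a column without an explicit CHARACTER SET takes the table default, a table takes the database default, and a database takes the server default (character-set-server). MySQL copies these defaults when each object is created, so changing the server default later does not touch existing databases. The connection level is deliberately left out, because it governs how text is exchanged with the client rather than how a column is stored.

```python
from typing import Optional


def effective_column_charset(
    server: str,
    database: Optional[str] = None,
    table: Optional[str] = None,
    column: Optional[str] = None,
) -> str:
    """Hypothetical model of MySQL's charset defaulting for a new column.

    Each level is None when it was not specified explicitly, in which
    case the level above it supplies the value.
    """
    # Walk from the most specific level to the most general one.
    for level in (column, table, database):
        if level is not None:
            return level
    # Nothing more specific was given: the server default applies.
    return server


# Server default latin1, database created with utf8, nothing else set:
print(effective_column_charset("latin1", database="utf8"))  # utf8
# An explicit column charset wins over every default above it:
print(effective_column_charset("latin1", "utf8", "utf8", "latin1"))  # latin1
```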
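Usually, a more specific setting overrides a more general one, so if you specify the encoding at connection level (with a query like "SET CHARSET 'utf8'; SET NAMES 'utf8' COLLATE 'utf8_general_ci';") you should be safe, at least as far as the data exchanged between your application and the server is concerned. However, if you want to fix things "once and for all", you can add these options to your /etc/mysql/my.cnf file (or, on Debian, put them in a separate file under /etc/mysql/conf.d/ with a .cnf extension, such as utf8.cnf, since only files ending in .cnf are read from that directory):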

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
[client]
default-character-set=utf8

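The *-server options (character-set-server and collation-server) are instance-specific, while default-character-set in the [client] group sets the connection charset for client programs that read that group, such as the mysql command-line client. If you wonder what the "collation" stuff is, it's the set of rules used to compare and sort strings. I'm not sure whether, as a guideline, you should use utf8_unicode_ci rather than the default utf8_general_ci (utf8_unicode_ci follows the Unicode Collation Algorithm more closely, while utf8_general_ci is faster but less accurate), but I couldn't find a way to set the connection collation in the .cnf file, so I'll stick to utf8_general_ci to keep the environment consistent.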
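To show what the two connection-level statements in the query above actually change, here is a small Python model of the session variables involved. It is a sketch, not MySQL code: the `Session` class, its method names and the trimmed `DEFAULT_COLLATION` table are hypothetical, and the latin1 starting values are an assumption (latin1 with latin1_swedish_ci was MySQL's compiled-in default at the time). It follows the documented behaviour: SET NAMES sets character_set_client, character_set_connection and character_set_results to the given charset, plus collation_connection from COLLATE (or the charset's default collation), while SET CHARACTER SET (alias SET CHARSET) sets the client and results charsets but makes the connection charset and collation follow the current database defaults.

```python
from dataclasses import dataclass
from typing import Optional

# Default collation per charset, trimmed to the two charsets used here
# (hypothetical lookup table for the sketch).
DEFAULT_COLLATION = {"utf8": "utf8_general_ci", "latin1": "latin1_swedish_ci"}


@dataclass
class Session:
    """Hypothetical model of a MySQL session's charset variables."""

    character_set_database: str
    collation_database: str
    # Assumed starting values: MySQL's old compiled-in latin1 defaults.
    character_set_client: str = "latin1"
    character_set_connection: str = "latin1"
    character_set_results: str = "latin1"
    collation_connection: str = "latin1_swedish_ci"

    def set_character_set(self, cs: str) -> None:
        # SET CHARACTER SET / SET CHARSET: client and results follow `cs`,
        # but the connection charset and collation follow the database.
        self.character_set_client = cs
        self.character_set_results = cs
        self.character_set_connection = self.character_set_database
        self.collation_connection = self.collation_database

    def set_names(self, cs: str, collate: Optional[str] = None) -> None:
        # SET NAMES: all three charset variables become `cs`; the collation
        # comes from COLLATE if given, else it is the default for `cs`.
        self.character_set_client = cs
        self.character_set_connection = cs
        self.character_set_results = cs
        self.collation_connection = collate or DEFAULT_COLLATION[cs]


s = Session(character_set_database="latin1", collation_database="latin1_swedish_ci")
s.set_character_set("utf8")
print(s.character_set_connection)  # latin1
s.set_names("utf8", "utf8_general_ci")
print(s.character_set_connection, s.collation_connection)  # utf8 utf8_general_ci
```

In this model the SET NAMES call overwrites everything SET CHARSET changed, so in the query above SET NAMES alone would have been enough. Keep in mind that collation_connection only matters for comparing literal strings; comparisons involving columns use the column's own collation.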
