Unicode support in MySQL is ... ๐Ÿ‘Ž

Posted by

For the last few days, I've been getting some strange error reports from the War Worlds server. Messages like this:

java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\xB8. ...' for column 'message' at row 1
   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4120)
   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4052)
   at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2503)
   at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2664)
   at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2815)
   at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
   at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2458)
   at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2375)
   at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2359)
   at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:203)
   at au.com.codeka.warworlds.server.data.SqlStmt.update(SqlStmt.java:117)
   at au.com.codeka.warworlds.server.ctrl.ChatController.postMessage(ChatController.java:120)
   . . .

Now, that string which MySQL complains is "incorrect" is actually the Unicode codepoint U+1F638 GRINNING CAT FACE WITH SMILING EYES, aka ๐Ÿ˜ธ -- a perfectly valid Emoji character. Why was MySQL rejecting it? All my columns are defined to accept UTF-8, so there should not be a problem, right?

When is UTF-8 not UTF-8?

When it's used in MySQL, apparently.

For reasons that completely escape me, MySQL 5.x limits UTF-8 strings to U+FFFF and smaller. That is, the "BMP". Why they call this encoding "UTF-8" is beyond me, it most definitely is not UTF-8.

The trick, apparently, is to use a slightly different encoding which MySQL calls "utf8mb4" which supports up to 4-byte UTF-8 characters.

So the "fix" was simple, just run:

ALTER TABLE chat_messages
   MODIFY message TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,

And so on, on basically every column in the database which could possibly include characters outside the BMP. But that's not enough! You also need to tell the server to use "utf8mb4" internally as well, by including the following line in your my.cnf:

[mysqld]
โ€‹character-set-server            = utf8mb4

Now presumably there is some drawback from doing this, otherwise "utf8mb4" would be the default (right?) but I'll be damned if I can figure out what the drawback is. I guess will just moniter things and see where it takes us. But as of now, War Worlds support Emoji emoticons in chat messages, yay!

 

blog comments powered by Disqus