This epic ticket of the day is brought to you by Joe Hopkinson.
#7940: Default charset should be utf8mb4
------------------------------------------------------------------------
The RFC for UTF-8 states, AND I QUOTE:
> In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16
accessible range) are encoded using sequences of 1 to 4 octets.
What's that? You don't believe me?! Well, you can read it for yourself
here!
What is an octet, you ask? It's a unit of digital information in computing
and telecommunications that consists of eight bits. (Hence, __oct__et.)
"So what?", said the neck bearded MySQL developer dressed as Neo from the
Matrix, as he smugly quaffed a Surge and settled down to play Virtua
Fighter 4 on his dusty PS2.
So, if you recall from your Pre-Intro to Programming, 8 bits = 1 byte.
Thus, the RFC states that the maximum storage requirement for a
multibyte character must be 4 bytes.
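For anyone who still doesn't believe the RFC, here is a minimal Python sketch that counts the octets for one character from each length class. The sample characters and the `utf8_length` helper are illustrative choices, not anything from the ticket.

```python
# One sample character per UTF-8 length class (illustrative picks):
# U+0041, U+00E9, U+20AC, U+1F600
samples = ["A", "é", "€", "😀"]


def utf8_length(ch: str) -> int:
    """Return the number of octets (bytes) UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    # Prints the code point and how many bytes its UTF-8 encoding takes.
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

The last sample is the interesting one: it sits outside the Basic Multilingual Plane, so it needs all four octets.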
I know that RFCs are more of a GUIDELINE, right? It's not like they could be
considered a standard or anything! It's not like there should be an
implicit contract when an implementor decides to use a label like "UTF-8",
right?
Because of you, we have to strip our readers' carefully crafted emoji.
Because of you, our search term data will never be exact. Because of you,
we have to spend COUNTLESS HOURS altering every table that we have (which
is a lot, by the way) to make sure that we can support a standard that was
written in 2003!
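To make the emoji stripping concrete: MySQL's 3-byte `utf8` charset can only store characters up to U+FFFF, so anything above that has to go before it hits the table. A rough sketch of that kind of filter might look like the following; the function names and the sample comment are hypothetical, not the code actually in use.

```python
def needs_four_bytes(ch: str) -> bool:
    """True if the character lies above U+FFFF and so needs 4 bytes in UTF-8."""
    return ord(ch) > 0xFFFF


def strip_four_byte_chars(text: str) -> str:
    """Drop every character that a 3-byte-per-character column cannot hold."""
    return "".join(ch for ch in text if not needs_four_bytes(ch))


# Hypothetical reader comment; the thumbs-up (U+1F44D) is silently removed.
comment = "Great post 👍"
print(strip_four_byte_chars(comment))
```

Three-byte characters such as "€" survive the filter; only the four-byte ones are lost, which is exactly why the search term data can never be exact.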
A cursory search shows that shortly after 2003, MySQL release quality
started to tank. I can only assume that was because of you.
Jerk.
* The default charset should be utf8mb4.
* Alter and test critical business processes.
* Change OrderedFunctionSet to generate the appropriate tables.
* Generate ptosc or propagator scripts to update everything else, as needed.
* Curse the MySQL developer who caused this.
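As a rough idea of what "update everything else" involves, the sketch below builds one `ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4` statement per table. The table names, the `conversion_statement` helper, and the choice of `utf8mb4_unicode_ci` as the collation are assumptions for illustration; it only produces SQL text and does not touch a database or stand in for the ptosc or propagator scripts mentioned in the list.

```python
def conversion_statement(table: str, collation: str = "utf8mb4_unicode_ci") -> str:
    """Build the SQL text that converts one table to utf8mb4."""
    # Quote the identifier with backticks, doubling any embedded backtick.
    quoted = "`" + table.replace("`", "``") + "`"
    return (
        f"ALTER TABLE {quoted} "
        f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
    )


# Hypothetical table names; a real list would come from the schema itself.
tables = ["orders", "search_terms"]

for table in tables:
    print(conversion_statement(table))
```

On large tables a plain `ALTER TABLE` like this can block writes for a long time, which is presumably why the list reaches for online schema change tooling instead.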
Justin Swanhart Says:
For technical performance reasons the default should be latin1. Utf8mb4 uses four bytes per character in sorting and grouping, regardless of the character. If you want a default of utf8mb4, then create your database with that character set.