MySQL Character Set
A character set in MySQL is a set of symbols and their encodings that are legal in a string. This article explains how to list all character sets available in MySQL, how to convert strings between character sets, and how to configure the proper character set for client connections.
Let us understand this with an imaginary character set. Suppose we have an alphabet of just the letters A to B, or a to b. Next, we assign a number to each letter: A = 0, B = 1. Here, the letter A or B is a symbol, and the number 0 or 1 is its encoding. The combination of these letters (A to B, or a to b) and their associated encodings is called a character set.
Now suppose we want to compare two string values, for example, A and B. The simplest way to do this is to compare their encodings: 0 for A and 1 for B. Since 0 is less than 1, we can say that A is less than B.
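The imaginary character set and the comparison rule above can be sketched in a few lines of Python. This is only an illustration: the dictionary `TOY_CHARSET` and the functions `encode` and `less_than` are hypothetical names, not part of MySQL.

```python
# Hypothetical toy character set from the text: each symbol maps to a number (its encoding).
TOY_CHARSET = {"A": 0, "B": 1}

def encode(symbol: str) -> int:
    """Return the encoding of a symbol in the toy character set."""
    return TOY_CHARSET[symbol]

def less_than(x: str, y: str) -> bool:
    """Compare two symbols by comparing their encodings."""
    return encode(x) < encode(y)

print(encode("A"), encode("B"))  # 0 1
print(less_than("A", "B"))       # True
```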
MySQL supports various character sets, and each character set has at least one collation. A collation defines the set of rules used to compare characters in a character set. The character set of a string determines which characters the string can hold. We can use the following statement to see all character sets available on the MySQL database server:
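To see how one character set can have more than one collation, here is a small Python sketch that extends the imaginary character set with hypothetical encodings for a and b and then sorts letters under two different rules. The names `binary_key` and `case_insensitive_key` are made up for illustration; they only mimic the idea of a binary versus a case-insensitive collation and are not MySQL collation names.

```python
# Hypothetical extension of the toy character set: lowercase letters get their own encodings.
TOY_CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

def binary_key(ch: str) -> int:
    # Binary rule: compare raw encodings, so uppercase letters sort before lowercase ones.
    return TOY_CHARSET[ch]

def case_insensitive_key(ch: str) -> int:
    # Case-insensitive rule: a lowercase letter compares equal to its uppercase form.
    return TOY_CHARSET[ch.upper()]

letters = ["b", "A", "a", "B"]
print(sorted(letters, key=binary_key))            # ['A', 'B', 'a', 'b']
print(sorted(letters, key=case_insensitive_key))  # ['A', 'a', 'b', 'B']
```

The symbols and their numbers are identical in both cases; only the comparison rule, that is, the collation, changes the result.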
It will return the following output:
By default, the above statement lists all character sets available in MySQL. To get only the character sets of a particular family, we can add an optional LIKE or WHERE clause that returns the matching character set names. The following statement displays some of the Unicode character sets based on UTF (Unicode Transformation Format):
We can see the output as follows:
In this output, the values in the Maxlen column represent the maximum number of bytes needed to hold one character in that character set. Each character set in MySQL contains either only single-byte characters, such as latin1, latin2, and cp850, or multi-byte characters.
latin1 is the default character set in MySQL 5.7 and earlier (MySQL 8.0 changed the default to utf8mb4). If we store characters or symbols from various languages in one column, we should use a Unicode character set such as utf8 or ucs2.
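The Maxlen idea can be approximated in Python by encoding sample characters and counting bytes. This is a sketch under the assumption that Python's `latin-1` codec stands in for a single-byte character set and Python's `utf-8` codec stands in for a multi-byte Unicode character set; the helper `max_bytes_per_char` is hypothetical.

```python
def max_bytes_per_char(text: str, codec: str) -> int:
    """Return the largest number of bytes any single character of `text` needs in `codec`."""
    return max(len(ch.encode(codec)) for ch in text)

# Sample characters: 'A' (ASCII), 'é' (Latin letter), '€' (euro sign), and an emoji.
sample = "A\u00e9\u20ac\U0001F600"

print(max_bytes_per_char("A\u00e9", "latin-1"))    # 1: latin-1 uses one byte per character
print([len(ch.encode("utf-8")) for ch in sample])  # [1, 2, 3, 4]
print(max_bytes_per_char(sample, "utf-8"))         # 4
```

Python's utf-8 codec can produce 4-byte sequences, which corresponds to MySQL's utf8mb4 (Maxlen 4); MySQL's utf8 (utf8mb3) has a Maxlen of 3 and cannot store 4-byte characters such as emoji.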
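The following Python sketch shows why a single-byte character set is not enough for mixed-language text: the `latin-1` codec fails on characters outside its 256-symbol repertoire, while `utf-8` accepts them. Python's `latin-1` is used here only as a rough stand-in (MySQL's latin1 is actually closer to Windows cp1252), and the helper `can_store` is hypothetical.

```python
def can_store(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

mixed = "café, 日本語, Привет"  # French, Japanese and Russian in one value

print(can_store("café", "latin-1"))  # True
print(can_store(mixed, "latin-1"))   # False
print(can_store(mixed, "utf-8"))     # True
```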
The MySQL LENGTH() function returns the length of a string in bytes, while the CHAR_LENGTH() function returns its length in characters. If a string contains multi-byte characters, LENGTH() returns a higher value than CHAR_LENGTH(). The following example, which uses the ucs2 character set, illustrates this:
The output shows that the LENGTH() function returns a greater value than CHAR_LENGTH():
The CONVERT() function converts a string into a particular character set. The example above converts the string 'MySQL Character Set' into the ucs2 character set. Since ucs2 uses two bytes for every character, the length of @str in bytes is greater than its length in characters.
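The byte-versus-character difference for ucs2 can be reproduced outside MySQL. In this sketch we assume that, for characters in the Basic Multilingual Plane, MySQL's ucs2 produces the same two big-endian bytes per character as Python's `utf-16-be` codec; `len()` on the bytes then plays the role of LENGTH(), and `len()` on the string plays the role of CHAR_LENGTH().

```python
s = "MySQL Character Set"

# Assumed stand-in for ucs2: two bytes per BMP character, big-endian, no byte-order mark.
ucs2_bytes = s.encode("utf-16-be")

length = len(ucs2_bytes)    # analogue of LENGTH(): size in bytes
char_length = len(s)        # analogue of CHAR_LENGTH(): number of characters
print(length, char_length)  # 38 19
```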
Let us look at another example. Many character sets, such as utf8, support multi-byte characters, yet a string that contains only single-byte (ASCII) characters still uses one byte per character, so its length in bytes equals its length in characters, as the following statements show:
We can see the output as follows:
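The same check for an ASCII-only string, using Python's `utf-8` codec as an assumed stand-in for MySQL's utf8, shows the two lengths agreeing:

```python
s = "MySQL Character Set"       # contains only ASCII characters
utf8_bytes = s.encode("utf-8")  # each ASCII character becomes exactly one byte

print(s.isascii())              # True
print(len(utf8_bytes), len(s))  # 19 19
```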
However, if a utf8 string contains a special (non-ASCII) character, its length in bytes is greater than its length in characters. See the example below:
We will get the output as below:
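A slightly larger sketch makes the difference visible for several words. The functions `length` and `char_length` are hypothetical Python analogues named after the MySQL functions; they assume utf-8 encoding and are not MySQL code.

```python
def length(text: str) -> int:
    """Rough analogue of MySQL LENGTH() for a utf-8 string: size in bytes."""
    return len(text.encode("utf-8"))

def char_length(text: str) -> int:
    """Rough analogue of MySQL CHAR_LENGTH(): number of characters."""
    return len(text)

for word in ["Set", "Café", "Straße", "€10"]:
    print(word, length(word), char_length(word))
# Set 3 3
# Café 5 4
# Straße 7 6
# €10 5 3
```

Only the non-ASCII characters (é and ß take two bytes, € takes three) make the byte count exceed the character count.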
Character Set Conversion
MySQL provides two main functions for converting strings between character sets: CONVERT() and CAST(). We have already seen CONVERT() at work in the examples above. Here, we focus on the CAST() function, which works similarly to CONVERT().
The following is the basic syntax of the CONVERT() function:
The following is the basic syntax of the CAST() function:
The following example uses the CAST() function to convert a string between character sets:
It will return the output as follows:
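Conceptually, converting a string between character sets means decoding its bytes with the source character set and re-encoding the characters with the target one. The Python sketch below illustrates that idea; the `convert` helper is hypothetical and only mimics what CONVERT() and CAST() do on the server. The `errors="replace"` argument is a Python choice that turns characters the target character set cannot represent into `?`.

```python
def convert(raw: bytes, from_charset: str, to_charset: str) -> bytes:
    """Hypothetical helper: reinterpret bytes stored in one character set as another."""
    text = raw.decode(from_charset)                    # bytes -> characters (source charset)
    return text.encode(to_charset, errors="replace")   # characters -> bytes (target charset)

latin1_bytes = "Café".encode("latin-1")
utf8_bytes = convert(latin1_bytes, "latin-1", "utf-8")

print(latin1_bytes, len(latin1_bytes))  # b'Caf\xe9' 4
print(utf8_bytes, len(utf8_bytes))      # b'Caf\xc3\xa9' 5
```

The same text takes 4 bytes in latin-1 and 5 bytes in utf-8, because é needs two bytes in utf-8.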
Configure Character Sets for Client Connections
When a database application exchanges data with a MySQL server, the connection uses the default character set, which is latin1 in MySQL 5.7 and earlier. If the application stores Unicode strings in the utf8 character set, the default latin1 character set may not be sufficient. Therefore, we need to specify a proper character set when our application connects to the MySQL database server.
We can configure the character set for client connections in one of the following ways:
Whichever way we use, the character set used by our database application must match the character set of the data stored in the MySQL server.
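The effect of a mismatch can be simulated without a server. In this Python sketch (an illustration, not MySQL client code), the bytes of a utf-8 string are decoded once as latin-1, as a client assuming the old default might do, and once as utf-8:

```python
# What the server sends: text stored and transmitted as utf-8 bytes.
sent = "Café".encode("utf-8")  # b'Caf\xc3\xa9'

# Client assuming latin1: every byte is read as one character, producing garbled text.
wrong = sent.decode("latin-1")
# Client whose character set matches the data: the text round-trips correctly.
right = sent.decode("utf-8")

print(wrong)  # CafÃ©
print(right)  # Café
```

Each of the two bytes that encode é in utf-8 is read as a separate latin-1 character, which is the typical garbled output caused by mismatched character sets.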