SQL Server Collation
Collation in SQL Server is a predefined set of rules that determine how data is saved, accessed, and compared. In other words, it's a configuration setting that indicates how the database engine should handle character data. SQL Server has a vast number of collations for dealing with the language and regional differences that come with supporting users and applications worldwide. This article will discuss collations and show a few examples to deal with collations.
Where do I find collations?
SQL collation can be found at the server, database, and column levels. It should be noted that there is no need to be the same collation settings on the server, database, and column. We can also use certain collations by updating our queries. If the collation is not consistent, we will appreciate the necessity of defining the correct collation across our environment as there is a high risk of unforeseen errors.
What are the different types of collations in SQL Server?
SQL Server provides the below system function to get the full list of available collations:
Here are some collation name and their descriptions returned by the above function:
If we want to get any specific collations by language, SQL Server provides the like clause in the WHERE clause to filter the name. For example, we are looking for the Maori language supported collation. We can get this information using the below statement:
It returns the below collation name:
What are the different options in the collation name?
The collation name contains the following options. These options perform differently to deal with character data for sorting and searching operations:
KS: kana type-sensitive
SC: supplementary characters
UTF8: Encoding standard
Suppose we are using the case-sensitive option in SQL collation. The database engine will behave differently when the query operation is looking for "Andrew" or "andrew". If the query does not find the row with the first name "Andrew," the query will return no results. It is due to the collation's "CS-Case sensitive" option.
Collations can be specified at several levels of the database engine in SQL Server, but by default, every level inherits the parent level's collation settings. The following are a list of collation supported in SQL Server in descending order:
Let us discuss each in detail.
SQL Server instance level collation
The default collation for system databases and user databases is a SQL Server instance collation. It is set during the SQL Server start-up and consists of three parts: master, tempdb, and model. It supports all collations except Unicode-only collations. When it comes to textual data comparison, if a user's database collation differs from one of the SQL Server instance collations, a comparison between tables in that database and tempdb will result in an error. It is because the instance level default collation will always be used by tempdb.
Suppose the temp table is used in any SQL statement or function inside a user database that compares text-based columns using WHERE clause or JOIN conditions, and one of the user databases is restored with a collation other than tempdb. In that case, the query will return an error because the tables do not have the same collation.
We can use the below query to get the SQL Server instance level collation:
It shows the below output:
Alternatively, we can also get the collation in SQL Server Management Studio by right-clicking the SQL instance, selecting "Properties," and checking the "General" tab. This collation is chosen by default when SQL Server is installed.
SQL Server database collation
It is inherited from SQL Server instance collation, which is used for database restoration. When we create a backup, the collation is replicated into the backup. It is the default for all string columns, temporary objects, variable names, and other strings, as this collation is used for all metadata within the database. The default server collation is used if we have not specified when the database is created.
We can use the below SQL statement to get the current database collation:
Alternatively, we can also get the database collation in SSMS by right-clicking the SQL database and go to the "Properties" that shows the below screen:
We can also use the CREATE DATABASE statement to create a database with collation:
Here javatpoint is the database name, and SQL_Latin1_General_CP1_CI_AS is the collation type. If we want to change the user database collation, we can use an ALTER DATABASE statement:
NOTE: We cannot change the system database collation until the change of collation for the server. In addition, altering collation at the database level does not affect the collation at the column or expression level.
The SQL Server will not change the collation of the existing user-defined table if a user changes the default collation of a database. However, new tables will be distinguished by a different database collation. But, if the database contains tables with different collations, issues may occur while attempting to join them. If we want a new collation to existing database tables, we can set a new collation on TEXT columns of tables.
SQL Server column level collation
The database specified collation is used as the default collation for SQL Server user-defined columns. Existing columns will keep their original collation if the database collation is changed, but new columns will be generated with the current database collation.
We can use the below syntax to change the collation of the column:
We can use the below command to find the collation for particular column:
Alternatively, we can also get the column collation in SSMS by navigating to the table, then the columns, and then right-clicking the particular columns to see the context menu. Now, choose the "Properties" option for the collation type. We will see collation details only when the collation the column is of a character data type. Otherwise, it shows an empty field.
SQL Server query level collation
We use the query level collation when a statement is run, and they have an impact on how the output is returned. Due to this, the ORDER BY sort results can now be more specific.
SQL Server does not support automated collation casting when we execute a query that compares two columns with different collations. We have also stated previously that the collation follows parent levels. If we have multiple UNICODE characters, we should use the COLLATE casting. We can use the COLLATE keyword to collate SQL by adding a collation name to the query. COLLATE is a column collation casting keyword that performs the collate operation. It can also be used with the input character string and column name.
We can use the below syntax for the query level collation:
Let us understand the use of the query level collation through example. Suppose we have two tables with similar collation and want to compare their columns using join or subquery. Data with the lower-case string exists in the chosen columns of those tables, and the selected columns of the tables stored data with the lower-case string. We have a requirement for comparing both columns in case-sensitive terms, but the collation of both columns is not the same. In such a case, when executing a query, query-level collation allows us to override database or column-level collation.
SQL Server provides supports for several collations. However, it cannot handle collation mismatches. SQL Server will throw an error with collation names when we try to compare fields with different collations. Here is the error message:
Cannot resolve the collation conflict between "Latin1_General_CI_AI" and "SQL_Latin1_General_CP1_CI_AS" in the like operation.
If we want to avoid these conflicts, we need to add a default collation setting in the SQL statement:
This issue can cause problems while programming for SQL Server. As explained above, we can fix this problem by using the COLLATE clause with a SQL Server column compare clause. When SQL Server is installed for the first time, and an incorrect collation option is discovered, it can cause problems with tempdb usage. When the SQL Server service is restarted, tempdb is created with the default server-level collation.
Is it possible to change the collation of the system database?
It is not possible to change the collation of system databases. We will get an error message if we try to modify the collation of the system databases, such as master, model, msdb, or tempdb.
Here is the error message:
Cannot alter the database 'master' because it is a system database.
This article will explain the complete overview of the collation concept in SQL Server. We also learned about the various collation options, how to find collation details for any SQL instance, database, or column, several examples of using collation options in SQL queries, how to change collations at the instance, database, and column levels, how to change collations of system databases, and how to resolve a collation conflict.