Collation: Difference between revisions
incorporate new material from the 'MySQL' page |
→Changing Character Sets and Collations: update config example; remove duplicate |
||
| Line 16: | Line 16: | ||
===Changing Character Sets and Collations=== | ===Changing Character Sets and Collations=== | ||
Character sets and collations can be set from the server level right down to the column level, as well as for client-server communication. For example, the <code>[https://mariadb.com/kb/en/server-system-variables/#collation_connection collation_connection]</code> '''server system variable'''. There is another KB page for [https://mariadb.com/kb/en/setting-character-sets-and-collations/ Setting Character Sets and Collations]. Keep in mind that there a many possible collations per character set, but only one default per character set. So, if you specify a character set, but are silent on the collation, you will receive the default. Character sets and collations always cascade down, so a column without a specified collation will look for the table default, the table for the database, and the database for the server. It's therefore possible to have extremely fine-grained control over all the character sets and collations used in your data. Collation names always start with the character set they correspond to | Character sets and collations can be set from the server level right down to the column level, as well as for client-server communication. For example, the <code>[https://mariadb.com/kb/en/server-system-variables/#collation_connection collation_connection]</code> '''server system variable'''. There is another KB page for [https://mariadb.com/kb/en/setting-character-sets-and-collations/ Setting Character Sets and Collations]. Keep in mind that there a many possible collations per character set, but only one default per character set. So, if you specify a character set, but are silent on the collation, you will receive the default. Character sets and collations always cascade down, so a column without a specified collation will look for the table default, the table for the database, and the database for the server. It's therefore possible to have extremely fine-grained control over all the character sets and collations used in your data. Collation names always start with the character set they correspond to, e.g. '''latin1''' charset and '''latin1'''_bin collation (the bin suffix stands for 'binary'). | ||
<source lang="mysql"> | <source lang="mysql"> | ||
show GLOBAL VARIABLES LIKE 'character_set_%'; | show GLOBAL VARIABLES LIKE 'character_set_%'; | ||
show VARIABLES LIKE 'collation%'; | show VARIABLES LIKE 'collation%'; | ||
</source> | </source>This is the mess that I wound up with somehow: | ||
This is the mess that I wound up with somehow: | |||
<source lang="mysql"> | <source lang="mysql"> | ||
SET @DATABASE_NAME = 'wiki_freephile'; | SET @DATABASE_NAME = 'wiki_freephile'; | ||
| Line 67: | Line 57: | ||
</source> | </source> | ||
To set the mysql server to | To set the mysql server default character set to binary <ref>https://mariadb.com/kb/en/library/setting-character-sets-and-collations/ https://scottlinux.com/2017/03/04/mysql-mariadb-set-character-set-and-collation-to-utf8/</ref>, add the following to <code>/etc/mysql/my.cnf</code> (Debian/Ubuntu) or <code>/etc/my.cnf</code> (Fedora/Centos/RHEL) under the '''<nowiki>[mysqld]</nowiki>''' section: | ||
<pre> | <pre> | ||
character-set-server = | # character set and collation | ||
collation-server = | character-set-server = binary | ||
skip-character-set-client-handshake | collation-server = BINARY | ||
# ignore client side character set value sent during handshake | |||
skip-character-set-client-handshake = 1 | |||
</pre> | </pre> | ||
Then restart mysql/mariadb: | Then restart mysql/mariadb: | ||
| Line 95: | Line 87: | ||
== Understand Encoding and Collation == | ==Understand Encoding and Collation== | ||
http://kunststube.net/encoding/ | http://kunststube.net/encoding/ | ||
== 2016 problems with MediaWiki == | ==2016 problems with MediaWiki== | ||
Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error. | Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error. | ||
| Line 119: | Line 111: | ||
{| class="wikitable" | {| class="wikitable" | ||
| Schema: || mediawiki | |Schema:||mediawiki | ||
|- | |- | ||
| Default collation: || latin1_swedish_ci | |Default collation:||latin1_swedish_ci | ||
|- | |- | ||
| Default characterset: || latin1 | |Default characterset:||latin1 | ||
|} | |} | ||
I also have <code>$wgDBmysql5 = true;</code> in LocalSettings.php <ref>https://www.mediawiki.org/wiki/Manual:$wgDBmysql5</ref> | I also have <code>$wgDBmysql5 = true;</code> in LocalSettings.php <ref>https://www.mediawiki.org/wiki/Manual:$wgDBmysql5</ref> | ||
== Default MediaWiki SQL schema setup == | ==Default MediaWiki SQL schema setup== | ||
When MediaWiki is first installed, [https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql this is the SQL that is run]. | When MediaWiki is first installed, [https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql this is the SQL that is run]. | ||
| Line 141: | Line 133: | ||
</source> | </source> | ||
== DB engine Conversion == | ==DB engine Conversion== | ||
The table type has nothing to do with Collation, but while updating my database, I wanted to switch the engine from MyISAM to InnoDB which is now the default. | The table type has nothing to do with Collation, but while updating my database, I wanted to switch the engine from MyISAM to InnoDB which is now the default. | ||
| Line 188: | Line 180: | ||
So then to change the charset/collation: | So then to change the charset/collation: | ||
* UTF8 is a very bad choice for that. | |||
* If you ever need to put the DB data into something Unicode, then be sure to use the utf8mb4 (Multi-byte 4) variety. | *UTF8 is a very bad choice for that. | ||
*If you ever need to put the DB data into something Unicode, then be sure to use the utf8mb4 (Multi-byte 4) variety. | |||
The *recommended* charset/collation is 'binary BINARY'. | The *recommended* charset/collation is 'binary BINARY'. | ||
| Line 213: | Line 206: | ||
</source> | </source> | ||
== Table or column collation == | ==Table or column collation== | ||
You can go further than setting the database, or table defaults; you can look at collation and character sets for individual columns. Individual columns probably don't need to be (and shouldn't be) converted to a new character set or collation unless you are certain you know what you're doing. MySQL has this [https://dev.mysql.com/doc/refman/5.5/en/charset-conversion.html reference on charset conversion] | You can go further than setting the database, or table defaults; you can look at collation and character sets for individual columns. Individual columns probably don't need to be (and shouldn't be) converted to a new character set or collation unless you are certain you know what you're doing. MySQL has this [https://dev.mysql.com/doc/refman/5.5/en/charset-conversion.html reference on charset conversion] | ||
| Line 228: | Line 221: | ||
{| class="wikitable" | {| class="wikitable" | ||
|- | |- | ||
| TABLE_SCHEMA || count || COLLATION_NAME | |TABLE_SCHEMA||count||COLLATION_NAME | ||
|- | |- | ||
| wiki_demo | |wiki_demo||0||NULL | ||
|- | |- | ||
| wiki_demo | |wiki_demo||36||latin1_bin | ||
|- | |- | ||
| wiki_demo | |wiki_demo||37||latin1_swedish_ci | ||
|- | |- | ||
| wiki_demo | |wiki_demo||3||utf8_general_ci | ||
|} | |} | ||
| Line 253: | Line 246: | ||
[[Category:QualityBox]] | [[Category:QualityBox]] | ||
[[Category:Wiki]] | [[Category:Wiki]] | ||
<references /> | |||