Collation: Difference between revisions

incorporate new material from the 'MySQL' page
Changing Character Sets and Collations: update config example; remove duplicate
Line 16: Line 16:


===Changing Character Sets and Collations===
===Changing Character Sets and Collations===
Character sets and collations can be set from the server level right down to the column level, as well as for client-server communication. For example, the <code>[https://mariadb.com/kb/en/server-system-variables/#collation_connection collation_connection]</code> '''server system variable'''. There is another KB page for [https://mariadb.com/kb/en/setting-character-sets-and-collations/ Setting Character Sets and Collations]. Keep in mind that there a many possible collations per character set, but only one default per character set. So, if you specify a character set, but are silent on the collation, you will receive the default. Character sets and collations always cascade down, so a column without a specified collation will look for the table default, the table for the database, and the database for the server. It's therefore possible to have extremely fine-grained control over all the character sets and collations used in your data. Collation names always start with the character set they correspond to (e.g. latin1 charset and latin1_bin collation).
Character sets and collations can be set from the server level right down to the column level, as well as for client-server communication. For example, the <code>[https://mariadb.com/kb/en/server-system-variables/#collation_connection collation_connection]</code> '''server system variable'''. There is another KB page for [https://mariadb.com/kb/en/setting-character-sets-and-collations/ Setting Character Sets and Collations]. Keep in mind that there a many possible collations per character set, but only one default per character set. So, if you specify a character set, but are silent on the collation, you will receive the default. Character sets and collations always cascade down, so a column without a specified collation will look for the table default, the table for the database, and the database for the server. It's therefore possible to have extremely fine-grained control over all the character sets and collations used in your data. Collation names always start with the character set they correspond to, e.g. '''latin1''' charset and '''latin1'''_bin collation (the bin suffix stands for 'binary').


If your collation and character sets are all <code>latin1_swedish</code>, but you want them to be <code>UTF-8</code>, then you can set it in the configuration file, and restart the database.
<source lang="mysql">
<source lang="mysql">
show GLOBAL VARIABLES LIKE 'character_set_%';
show GLOBAL VARIABLES LIKE 'character_set_%';
show VARIABLES LIKE 'collation%';
show VARIABLES LIKE 'collation%';
</source>
</source>This is the mess that I wound up with somehow:
<pre>
# Variable_name, Value
'collation_connection', 'utf8_general_ci'
'collation_database', 'latin1_swedish_ci'
'collation_server', 'latin1_swedish_ci'
</pre>
 
This is the mess that I wound up with somehow:
<source lang="mysql">
<source lang="mysql">
SET @DATABASE_NAME = 'wiki_freephile';
SET @DATABASE_NAME = 'wiki_freephile';
Line 67: Line 57:
</source>
</source>


To set the mysql server to utf8 <ref>https://mariadb.com/kb/en/library/setting-character-sets-and-collations/ https://scottlinux.com/2017/03/04/mysql-mariadb-set-character-set-and-collation-to-utf8/</ref>, add the following to <code>/etc/mysql/my.cnf</code> (Debian/Ubuntu) or <code>/etc/my.cnf</code> (Fedora/Centos/RHEL) under the '''<nowiki>[mysqld]</nowiki>''' section:
To set the mysql server default character set to binary <ref>https://mariadb.com/kb/en/library/setting-character-sets-and-collations/ https://scottlinux.com/2017/03/04/mysql-mariadb-set-character-set-and-collation-to-utf8/</ref>, add the following to <code>/etc/mysql/my.cnf</code> (Debian/Ubuntu) or <code>/etc/my.cnf</code> (Fedora/Centos/RHEL) under the '''<nowiki>[mysqld]</nowiki>''' section:
<pre>
<pre>
character-set-server = utf8
# character set and collation
collation-server = utf8_general_ci
character-set-server = binary
skip-character-set-client-handshake
collation-server = BINARY
#  ignore client side character set value sent during handshake
skip-character-set-client-handshake = 1
</pre>
</pre>
Then restart mysql/mariadb:
Then restart mysql/mariadb:
Line 95: Line 87:




== Understand Encoding and Collation ==
==Understand Encoding and Collation==
http://kunststube.net/encoding/
http://kunststube.net/encoding/




== 2016 problems with MediaWiki ==
==2016 problems with MediaWiki==


Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error.
Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error.
Line 119: Line 111:


{| class="wikitable"  
{| class="wikitable"  
| Schema: || mediawiki
|Schema:||mediawiki
|-
|-
| Default collation: || latin1_swedish_ci
|Default collation:||latin1_swedish_ci
|-
|-
| Default characterset: || latin1
|Default characterset:||latin1
|}
|}


I also have <code>$wgDBmysql5 = true;</code> in LocalSettings.php <ref>https://www.mediawiki.org/wiki/Manual:$wgDBmysql5</ref>
I also have <code>$wgDBmysql5 = true;</code> in LocalSettings.php <ref>https://www.mediawiki.org/wiki/Manual:$wgDBmysql5</ref>


== Default MediaWiki SQL schema setup ==
==Default MediaWiki SQL schema setup==
When MediaWiki is first installed, [https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql this is the SQL that is run].
When MediaWiki is first installed, [https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql this is the SQL that is run].


Line 141: Line 133:
</source>
</source>


== DB engine Conversion ==
==DB engine Conversion==
The table type has nothing to do with Collation, but while updating my database, I wanted to switch the engine from MyISAM to InnoDB which is now the default.
The table type has nothing to do with Collation, but while updating my database, I wanted to switch the engine from MyISAM to InnoDB which is now the default.


Line 188: Line 180:


So then to change the charset/collation:
So then to change the charset/collation:
* UTF8 is a very bad choice for that.
 
* If you ever need to put the DB data into something Unicode, then be sure to use the utf8mb4 (Multi-byte 4) variety.
*UTF8 is a very bad choice for that.
*If you ever need to put the DB data into something Unicode, then be sure to use the utf8mb4 (Multi-byte 4) variety.


The *recommended* charset/collation is 'binary BINARY'.
The *recommended* charset/collation is 'binary BINARY'.
Line 213: Line 206:
</source>
</source>


== Table or column collation ==
==Table or column collation==
You can go further than setting the database, or table defaults; you can look at collation and character sets for individual columns.  Individual columns probably don't need to be (and shouldn't be) converted to a new character set or collation unless you are certain you know what you're doing.  MySQL has this [https://dev.mysql.com/doc/refman/5.5/en/charset-conversion.html reference on charset conversion]  
You can go further than setting the database, or table defaults; you can look at collation and character sets for individual columns.  Individual columns probably don't need to be (and shouldn't be) converted to a new character set or collation unless you are certain you know what you're doing.  MySQL has this [https://dev.mysql.com/doc/refman/5.5/en/charset-conversion.html reference on charset conversion]  


Line 228: Line 221:
{| class="wikitable"
{| class="wikitable"
|-
|-
| TABLE_SCHEMA || count || COLLATION_NAME  
|TABLE_SCHEMA||count||COLLATION_NAME
|-
|-
| wiki_demo   ||     0 || NULL            
|wiki_demo||0||NULL
|-
|-
| wiki_demo   ||   36 || latin1_bin        
|wiki_demo||36||latin1_bin
|-
|-
| wiki_demo   ||   37 || latin1_swedish_ci  
|wiki_demo||37||latin1_swedish_ci
|-
|-
| wiki_demo   ||     3 || utf8_general_ci  
|wiki_demo||3||utf8_general_ci
|}
|}


Line 253: Line 246:
[[Category:QualityBox]]
[[Category:QualityBox]]
[[Category:Wiki]]
[[Category:Wiki]]
<references />