Open main menu

Changes

no edit summary
===Changing Character Sets and Collations===
Character sets and collations can be set from the server level right down to the column level, as well as for client-server communication. For example, the <code>[https://mariadb.com/kb/en/server-system-variables/#collation_connection collation_connection]</code> '''server system variable'''. There is another KB page for [https://mariadb.com/kb/en/setting-character-sets-and-collations/ Setting Character Sets and Collations]. Keep in mind that there a many possible collations per character set, but only one default per character set. So, if you specify a character set, but are silent on the collation, you will receive the default. Character sets and collations always cascade down, so a column without a specified collation will look for the table default, the table for the database, and the database for the server. It's therefore possible to have extremely fine-grained control over all the character sets and collations used in your data. Collation names always start with the character set they correspond to (, e.g. '''latin1 ''' charset and latin1_bin '''latin1'''_bin collation(the bin suffix stands for 'binary').
 
If your collation and character sets are all <code>latin1_swedish</code>, but you want them to be <code>UTF-8</code>, then you can set it in the configuration file, and restart the database.
<source lang="mysql">
show GLOBAL VARIABLES LIKE 'character_set_%';
show VARIABLES LIKE 'collation%';
</source><pre># Variable_name, Value'collation_connection', 'utf8_general_ci''collation_database', 'latin1_swedish_ci''collation_server', 'latin1_swedish_ci'</pre> This is the mess that I wound up with somehow:
<source lang="mysql">
SET @DATABASE_NAME = 'wiki_freephile';
wiki_freephile, 1, utf8_general_ci
</pre>
 Note that this query is counting the '''columns''' which have an undesirable collation. (Recall that encoding/collation can be set at the server, database, table, and column levels.)   So, I fixed start off by setting the defaults for each database:
<source lang="mysql">
ALTER DATABASE wiki_freephile CHARACTER SET = 'binary' COLLATE = 'BINARY';
</source>
You can check the status of each '''table ''' too:
<source lang="mysql">
SET @DATABASE_NAME = 'wiki_freephile';
</source>
Then, down in the section below on changing engines, there's an example SQL query for generating '''<code>ALTER TABLE</code>''' statements to fix the table defaults<ref>https://mariadb.com/kb/en/setting-character-sets-and-collations/#table-level</ref>. These statements would be in the form:<source lang="mysql">ALTER TABLE table_name CONVERT TO CHARACTER SET charset_name [COLLATE collation_name];</source> In my case, there are 90-100 columns that have old latin1 character sets.<source lang="mysql">SET @DATABASE_NAME = 'wiki_greg';SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAMEFROM information_schema.columns AS colsWHERE table_schema = @DATABASE_NAME AND COLLATION_NAME NOT IN ('binary');</source>We can use the information schema to list them all out. And you can also inspect each TABLE, with e.g. <code>SHOW CREATE TABLE brokenlinks \G</code>   To set the mysql server default character set to utf8 binary <ref>https://mariadb.com/kb/en/library/setting-character-sets-and-collations/ https://scottlinux.com/2017/03/04/mysql-mariadb-set-character-set-and-collation-to-utf8/</ref>, add the following to <code>/etc/mysql/my.cnf</code> (Debian/Ubuntu) or <code>/etc/my.cnf</code> (Fedora/Centos/RHEL) under the '''<nowiki>[mysqld]</nowiki>''' section:
<pre>
# character set and collationcharacter-set-server = utf8binarycollation-server = utf8_general_ciBINARY# ignore client side character set value sent during handshakeskip-character-set-client-handshake= 1
</pre>
Then restart mysql/mariadb:
== Understand Encoding and Collation ==
http://kunststube.net/encoding/
== 2016 problems with MediaWiki ==
Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error.
{| class="wikitable"
| Schema: || mediawiki
|-
| Default collation: || latin1_swedish_ci
|-
| Default characterset: || latin1
|}
I also have <code>$wgDBmysql5 = true;</code> in LocalSettings.php <ref>https://www.mediawiki.org/wiki/Manual:$wgDBmysql5</ref>
== Default MediaWiki SQL schema setup ==
When MediaWiki is first installed, [https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql this is the SQL that is run].
</source>
== DB engine Conversion ==
The table type has nothing to do with Collation, but while updating my database, I wanted to switch the engine from MyISAM to InnoDB which is now the default.
So then to change the charset/collation:
* UTF8 is a very bad choice for that.
* If you ever need to put the DB data into something Unicode, then be sure to use the utf8mb4 (Multi-byte 4) variety.
*UTF8 is a very bad choice for that.*If you ever need to put the DB data into something Unicode, then be sure to use the utf8mb4 (Multi-byte 4) variety. The *'''recommended* ''' charset/collation is 'binary BINARY'.
<source lang="mysql">
ALTER table <tablename> CHARACTER SET = 'binary' COLLATE = 'BINARY';
</source>
== Table or column collation ==
You can go further than setting the database, or table defaults; you can look at collation and character sets for individual columns. Individual columns probably don't need to be (and shouldn't be) converted to a new character set or collation unless you are certain you know what you're doing. MySQL has this [https://dev.mysql.com/doc/refman/5.5/en/charset-conversion.html reference on charset conversion]
{| class="wikitable"
|-
| TABLE_SCHEMA || count || COLLATION_NAME
|-
| wiki_demo || 0 || NULL
|-
| wiki_demo || 36 || latin1_bin
|-
| wiki_demo || 37 || latin1_swedish_ci
|-
| wiki_demo || 3 || utf8_general_ci
|}
WHERE table_schema = "wiki_demo";
</source>
 
 
 
 
 
 
 
<references />
[[Category:MediaWiki]]