Open main menu

Changes

incorporate new material from the 'MySQL' page
Note: When dealing with MariaDB, they have this [https://mariadb.com/kb/en/character-set-and-collation-overview/ KB Overview]. For MySQL, there is a dedicated [https://forums.mysql.com/list.php?103 forum for Character Sets, Collation, Unicode] Collation is a big word that we don't use everyday. It helps to remember that it basically means 'sorting'. Throwback: when printing documents in the office was a big deal (before the Internet), you'd have to pay extra to get huge sorting trays attached to the printer so that you could print '20 copies of this 5-page document' and each set of 5 pages would be printed into it's own tray. Now collating copiers are the norm and they do it without the massive tray finishers. == Reference =OS-dependent defaults===MariaDB is different in Ubuntu vs. Debian<ref>https://mariadb.com/kb/en/library/differences-in-mariadb-in-debian-and-ubuntu/</ref> (and CentOS / Rocky Linux seem more like Ubuntu than Debian). Debian does the 'right thing' and sets defaults to be UTF-8 compatible, whereas others are still lagging with defaults of <code>latin1</code> for character set (and <code>latin1_swedish_ci</code> collation)!! Debian also enables the <code>unix_socket</code> authentication plugin by default for MariaDB <= 10.4.This allows passwordless login. See the reference for more details. ===Best practice==={{Ambox|text=Get into the habit of specifying CHARACTER SET and COLLATION on all connections and CREATE TABLEs. MySQL and MariaDB are gradually changing from latin1_swedish_ci to utf8mb4_0900_ai_ci.  "900" is probably not the last Unicode standard. By explicitly specifying the charset and collation, you maintain control and consistency, even if it is an out-dated pair.}} ===Changing Character Sets and Collations===Character sets and collations can be set from the server level right down to the column level, as well as for client-server communication. For example, the <code>[https://mariadb.com/kb/en/server-system-variables/#collation_connection collation_connection]</code> '''server system variable'''. There is another KB page for [https://mariadb.com/kb/en/setting-character-sets-and-collations/ Setting Character Sets and Collations]. Keep in mind that there a many possible collations per character set, but only one default per character set. So, if you specify a character set, but are silent on the collation, you will receive the default. Character sets and collations always cascade down, so a column without a specified collation will look for the table default, the table for the database, and the database for the server. It's therefore possible to have extremely fine-grained control over all the character sets and collations used in your data. Collation names always start with the character set they correspond to (e.g. latin1 charset and latin1_bin collation).  If your collation and character sets are all <code>latin1_swedish</code>, but you want them to be <code>UTF-8</code>, then you can set it in the configuration file, and restart the database.<source lang="mysql">show GLOBAL VARIABLES LIKE 'character_set_%';show VARIABLES LIKE 'collation%';</source><pre># Variable_name, Value'collation_connection', 'utf8_general_ci''collation_database', 'latin1_swedish_ci''collation_server', 'latin1_swedish_ci'</pre> This is the mess that I wound up with somehow:<source lang="mysql">SET @DATABASE_NAME = 'wiki_freephile';SELECT TABLE_SCHEMA, COUNT(COLLATION_NAME) AS 'count', COLLATION_NAMEFROM information_schema.columns AS colsWHERE table_schema = @DATABASE_NAMEGROUP BY COLLATION_NAME;</source><pre>wiki_freephile, 0, wiki_freephile, 11, binarywiki_freephile, 73, latin1_binwiki_freephile, 69, latin1_swedish_ciwiki_freephile, 1, utf8_general_ci</pre>So, I fixed the defaults for each database:<source lang="mysql">ALTER DATABASE wiki_freephile CHARACTER SET = 'binary' COLLATE = 'BINARY';ALTER DATABASE wiki_meta CHARACTER SET = 'binary' COLLATE = 'BINARY';ALTER DATABASE wiki_wiki CHARACTER SET = 'binary' COLLATE = 'BINARY';</source> You can check the status of each table too:<source lang="mysql">SET @DATABASE_NAME = 'wiki_freephile';SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATIONFROM information_schema.tablesWHERE TABLE_SCHEMA = @DATABASE_NAMEAND TABLE_COLLATION not in ('binary');</source> To set the mysql server to utf8 <ref>https://mariadb.com/kb/en/library/setting-character-sets-and-collations/ https://scottlinux.com/2017/03/04/mysql-mariadb-set-character-set-and-collation-to-utf8/</ref>, add the following to <code>/etc/mysql/my.cnf</code> (Debian/Ubuntu) or <code>/etc/my.cnf</code> (Fedora/Centos/RHEL) under the '''<nowiki>[mysqld]</nowiki>''' section:<pre>character-set-server = utf8collation-server = utf8_general_ciskip-character-set-client-handshake</pre>Then restart mysql/mariadb: *<code>service mariadb restart</code> for Debian/Ubuntu;*<code>/bin/systemctl restart mariadb.service</code> for Fedora/Centos/RHEL <br /> ===Changing defaults===Depending on what version of MySQL or MariaDB you are using, it can be difficult at best to change the default collation '''for a given character set'''. <ref>https://dba.stackexchange.com/questions/239975/change-default-collation-for-character-set-utf8mb4-to-utf8mb4-unicode-ci</ref> The default relationship between character set and the collation for it will affect you when you use 'CREATE' or 'ALTER' with a 'CHARACTER SET' clause and the collation clause is left out. The admonition is to always specify both the character set and the collation in Connections as well as CREATE statements.  Not to confuse things, but the default server-wide character set changed in MariaDB >= 10.6. The <code>character_set_server</code> changed from <code>utf8</code> to <code>utf8mb3</code> <ref>https://mariadb.com/kb/en/server-system-variables/#character_set_server</ref>  Show the default collation '''for a given character set''': <code>SELECT * FROM information_schema.COLLATIONS WHERE CHARACTER_SET_NAME = 'utf8mb4' ;</code> On my CentOS server, the value is <code>utf8mb4_general_ci</code> and the same for Rocky Linux 8.9 For <code>CHARACTER_SET_NAME = 'binary'</code>, the collation is <code>binary</code> as no other collation is available, nor would make sense.  == Understand Encoding and Collation ==
http://kunststube.net/encoding/
== 2016 problems with MediaWiki ==
Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error.