Talk:Collation: Difference between revisions

(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
This page is all about Collation in MediaWiki - and should probably become a new article to supplement the general topic page.
== See Also ==
https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation and https://github.com/wikimedia/mediawiki/blob/master/maintenance/updateCollation.php
Other maintenance scripts such as https://github.com/wikimedia/mediawiki/blob/master/maintenance/uppercaseTitlesForUnicodeTransition.php or other extensions etc. may be helpful if you need to "fix" things.
In MWStake General, I [https://matrix.to/#/!NGZmJSwAAwbGRxhWwH:matrix.org/$HObrGFUPO_XBbhS3STTjQ5R-i-uV3cRjVIpPuAhP0i4?via=matrix.org&via=marijn.it&via=converser.eu said]
Short answer:
Line 4: Line 12:
1. Yes, you should change <code>$wgDBTableOptions</code> (see below) to the new default.
However, this setting is only used when new tables are created, so it doesn't affect your existing database(s) or the tables in them.
2. Do NOT <code>CONVERT TO</code> all your existing tables. You have to be precise about which columns you change, if you change anything at all. Conversion is only really warranted if you have an encoding problem (aka [[wp:Mojibake|Mojibake]]).
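
Before touching anything, it may help to see which character set and collation each existing column actually has. This is only a sketch: <code>my_wiki</code> is a made-up database name, so substitute your own. It reads from the standard <code>information_schema.COLUMNS</code> view and changes nothing.

<source lang="mysql">
-- List every column that has a character set (binary and numeric columns are skipped).
-- 'my_wiki' is a placeholder schema name, not something taken from this page.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'my_wiki'
  AND CHARACTER_SET_NAME IS NOT NULL
ORDER BY TABLE_NAME, ORDINAL_POSITION;
</source>

If this returns few or no rows, most of your text is probably stored in binary columns, and a blanket <code>CONVERT TO</code> would be changing things that don't need changing.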


Using

<source lang="mysql">
SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';
</source>


Line 16: Line 24:
Here's a [https://stackoverflow.com/questions/54885178/whats-the-difference-between-utf8-unicode-ci-and-utf8mb4-0900-ai-ci Stack Overflow question + answer] that explains the different encodings (byte mappings) and collations (sorting rules) and why it ''might'' matter to you. The newest collations only really matter if you're trying to chase the long tail of performance optimizations.


BTW, the answer is by Rick James (not the singer - the [[MySQL#Experts|database expert]])
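
To see which of the collations discussed in that answer your own server offers, something like the following should work. It is a read-only sketch. Note that <code>utf8mb4_0900_ai_ci</code> exists only in MySQL 8.0 and later, so a MariaDB server will not list it.

<source lang="mysql">
-- Show every collation available for the utf8mb4 character set on this server.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Narrow the list to the two collations compared in the Stack Overflow question.
SHOW COLLATION WHERE Collation IN ('utf8mb4_unicode_ci', 'utf8mb4_0900_ai_ci');
</source>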


From WMF, I see [https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+blame/0adab4be323405345c3c63428747c12e3ad4aea2/modules/galera/templates/server.cnf.erb this puppet template for the galera configuration], updated on 2020-06-11 specifying
Line 58: Line 66:
* https://www.mediawiki.org/wiki/Manual:Installation_requirements#Database_server '''Seems like a 'bug' to have a minimum DB requirement for a DB that is out of support'''
* https://www.mediawiki.org/wiki/Manual:MysqlUpdater.php - a class called by the Installer that performs database updates.
== Conversion ==
If you do need to convert, check out the [https://dev.mysql.com/doc/refman/8.3/en/charset-conversion.html MySQL docs v8.3] and [https://dev.mysql.com/doc/refman/5.7/en/charset-conversion.html MySQL docs v5.7].
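
As a rough illustration of the two per-column approaches those MySQL pages describe, here is a sketch. The table and column names (<code>my_table</code>, <code>col_declared_ok</code>, <code>col_mojibake</code>) and the <code>VARCHAR(255)</code> definitions are invented for the example. Take a backup first, and remember that <code>MODIFY</code> needs the full column definition (NULL/NOT NULL, defaults) restated, or those attributes are lost.

<source lang="mysql">
-- Case 1: the declared character set matches the stored data,
-- and you simply want a different character set / collation.
-- MySQL re-encodes the stored values during the ALTER.
ALTER TABLE my_table
  MODIFY col_declared_ok VARCHAR(255)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Case 2 (Mojibake): the bytes are really UTF-8, but the column is declared
-- as something else (e.g. latin1). Go through a binary type so the bytes
-- are NOT re-encoded, then declare the character set they really are.
ALTER TABLE my_table MODIFY col_mojibake VARBINARY(255);
ALTER TABLE my_table MODIFY col_mojibake VARCHAR(255) CHARACTER SET utf8mb4;
</source>

Picking the wrong case is how data gets double-encoded, which is why converting one column at a time is safer than a blanket <code>CONVERT TO</code>.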


== checkStorage.php ==
What does the [https://doc.wikimedia.org/mediawiki-core/master/php/classCheckStorage.html CheckStorage] class ([https://www.mediawiki.org/wiki/Manual:CheckStorage.php Manual], [https://doc.wikimedia.org/mediawiki-core/master/php/checkStorage_8php_source.html source]) actually do? It is used in conjunction with the [[mw:Manual:External_storage|Manual:External_storage]] ([https://wikitech.wikimedia.org/wiki/External_storage Wikitech]) capability of MediaWiki, so it is irrelevant here because we're not using External Storage. Still, here's the output of running the script, for information's sake:


<pre>
Line 70: Line 82:
Local object statistics:
</pre>


== Tables ==