Open main menu

Changes

5,404 bytes added ,  11:50, 27 October 2017
document how to inspect and change collations in your MediaWiki
Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error.

<blockquote>
Import failed: A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script Query: SELECT page_id,page_len,page_is_redirect,page_latest,page_content_model FROM `page` WHERE page_namespace = '6' AND page_title = 'Fantomo_ru?a.svg' LIMIT 1 Function: LinkCache::addLinkObj Error: 1267 Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=' (localhost)
</blockquote>

----

This is a mediawiki that is "current" but was first established a long time ago (2004).

MediaWiki 1.27.1 (64f7660) 2016-11-25T23:52:22
PHP 5.5.9-1ubuntu4.22 (apache2handler)
MySQL 5.5.57-0ubuntu0.14.04.1
ICU 52.1
Elasticsearch 2.4.4
Lua 5.1.5

{| class="wikitable"
| Schema: || mediawiki
|-
| Default collation: || latin1_swedish_ci
|-
| Default characterset: || latin1
|}

I also have <code>$wgDBmysql5 = true;</code> in LocalSettings.php <ref>https://www.mediawiki.org/wiki/Manual:$wgDBmysql5</ref>

== Default MediaWiki SQL schema setup ==
When MediaWiki is first installed, [https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql this is the SQL that is run].

The import failed, and reveals that my database has a problem handling UTF8 characters.

MediaWiki (the application) uses UTF8 throughout. What your database and data use is another story. Since my database was setup a long time ago, I've got old defaults, and a mixture of engines.

You can see what are the defaults for your databases with

<source lang="bash">
mysql -e 'SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;'
</source>

== DB engine Conversion ==
The table type has nothing to do with Collation, but while updating my database, I wanted to switch the engine from MyISAM to InnoDB which is now the default.

This SQL worked flawlessly to switch any MyISAM tables to INNODB; without touching the *searchindex table(s) because that uses MySQL fulltext search which doesn't come into INNODB until MySQL v. 5.6
<source lang="mysql">
/* Find and convert your MediaWiki tables to use INNODB */

SET @DATABASE_NAME = 'mediawiki';

SELECT CONCAT('ALTER TABLE `', table_name, '` ENGINE=InnoDB;') AS sql_statements
FROM information_schema.tables AS tb
WHERE table_schema = @DATABASE_NAME
AND `ENGINE` = 'MyISAM'
AND `TABLE_TYPE` = 'BASE TABLE'
AND `tb`.table_name NOT REGEXP '.*searchindex'
ORDER BY table_name DESC;

/* Then copy the output and execute that. E.g. : */
ALTER TABLE `watchlist` ENGINE=InnoDB;
ALTER TABLE `user_newtalk` ENGINE=InnoDB;
ALTER TABLE `user` ENGINE=InnoDB;
ALTER TABLE `transcache` ENGINE=InnoDB;
ALTER TABLE `trackbacks` ENGINE=InnoDB;
ALTER TABLE `text` ENGINE=InnoDB;
ALTER TABLE `site_stats` ENGINE=InnoDB;
ALTER TABLE `recentchanges` ENGINE=InnoDB;
ALTER TABLE `querycache` ENGINE=InnoDB;
ALTER TABLE `oldimage` ENGINE=InnoDB;
ALTER TABLE `objectcache` ENGINE=InnoDB;
ALTER TABLE `math` ENGINE=InnoDB;
ALTER TABLE `linkscc` ENGINE=InnoDB;
ALTER TABLE `links` ENGINE=InnoDB;
ALTER TABLE `ipblocks_old` ENGINE=InnoDB;
ALTER TABLE `interwiki` ENGINE=InnoDB;
ALTER TABLE `imagelinks` ENGINE=InnoDB;
ALTER TABLE `image` ENGINE=InnoDB;
ALTER TABLE `cur` ENGINE=InnoDB;
ALTER TABLE `categorylinks` ENGINE=InnoDB;
ALTER TABLE `brokenlinks` ENGINE=InnoDB;
ALTER TABLE `blobs` ENGINE=InnoDB;
ALTER TABLE `archive` ENGINE=InnoDB;
</source>


The fact that I use [[ElasticSearch]] makes native full-text search irrelevant, but still I want my MediaWiki to be setup correctly.

So then to change the charset/collation:
* UTF8 is a very bad choice for that.
* If you ever need to put the DB data into something Unicode, then be sure to use the utf8mb4 (Multi-byte 4) variety.

The *recommended* charset/collation is 'binary BINARY'.
<source lang="mysql">
ALTER table <tablename> CHARACTER SET binary COLLATE BINARY;
</source>

== Table or column collation ==
You can go further than setting the database, or table defaults; you can look at collation and character sets for individual columns. Individual columns probably don't need to be (and shouldn't be) converted to a new character set or collation unless you are certain you know what you're doing. MySQL has this [https://dev.mysql.com/doc/refman/5.5/en/charset-conversion.html reference on charset conversion]

You can look at the distribution of column collations with
<source lang="mysql">
SET @DATABASE_NAME = 'mediawiki';
SELECT TABLE_SCHEMA, count(COLLATION_NAME) AS 'count', COLLATION_NAME
FROM information_schema.columns AS cols
WHERE table_schema = @DATABASE_NAME
GROUP BY COLLATION_NAME;
</source>

For a default demo database, you might see something like this
{| class="wikitable"
|-
| TABLE_SCHEMA || count || COLLATION_NAME
|-
| wiki_demo || 0 || NULL
|-
| wiki_demo || 36 || latin1_bin
|-
| wiki_demo || 37 || latin1_swedish_ci
|-
| wiki_demo || 3 || utf8_general_ci
|}

or the full detail at
<source lang="mysql">
SELECT
TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.columns AS cols
WHERE table_schema = "wiki_demo";
</source>

[[Category:MediaWiki]]
[[Category:MySQL]]
[[Category:Data]]
[[Category:Database]]
[[Category:QualityBox]]
4,558

edits