5,404 bytes added
, 10:50, 27 October 2017
Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error.
<blockquote>
Import failed: A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script Query: SELECT page_id,page_len,page_is_redirect,page_latest,page_content_model FROM `page` WHERE page_namespace = '6' AND page_title = 'Fantomo_ru?a.svg' LIMIT 1 Function: LinkCache::addLinkObj Error: 1267 Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=' (localhost)
</blockquote>
----
This is a mediawiki that is "current" but was first established a long time ago (2004).
MediaWiki 1.27.1 (64f7660) 2016-11-25T23:52:22
PHP 5.5.9-1ubuntu4.22 (apache2handler)
MySQL 5.5.57-0ubuntu0.14.04.1
ICU 52.1
Elasticsearch 2.4.4
Lua 5.1.5
{| class="wikitable"
| Schema: || mediawiki
|-
| Default collation: || latin1_swedish_ci
|-
| Default characterset: || latin1
|}
I also have <code>$wgDBmysql5 = true;</code> in LocalSettings.php <ref>https://www.mediawiki.org/wiki/Manual:$wgDBmysql5</ref>
== Default MediaWiki SQL schema setup ==
When MediaWiki is first installed, [https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql this is the SQL that is run].
The import failed, and reveals that my database has a problem handling UTF8 characters.
MediaWiki (the application) uses UTF8 throughout. What your database and data use is another story. Since my database was setup a long time ago, I've got old defaults, and a mixture of engines.
You can see what are the defaults for your databases with
<source lang="bash">
mysql -e 'SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;'
</source>
== DB engine Conversion ==
The table type has nothing to do with Collation, but while updating my database, I wanted to switch the engine from MyISAM to InnoDB which is now the default.
This SQL worked flawlessly to switch any MyISAM tables to INNODB; without touching the *searchindex table(s) because that uses MySQL fulltext search which doesn't come into INNODB until MySQL v. 5.6
<source lang="mysql">
/* Find and convert your MediaWiki tables to use INNODB */
SET @DATABASE_NAME = 'mediawiki';
SELECT CONCAT('ALTER TABLE `', table_name, '` ENGINE=InnoDB;') AS sql_statements
FROM information_schema.tables AS tb
WHERE table_schema = @DATABASE_NAME
AND `ENGINE` = 'MyISAM'
AND `TABLE_TYPE` = 'BASE TABLE'
AND `tb`.table_name NOT REGEXP '.*searchindex'
ORDER BY table_name DESC;
/* Then copy the output and execute that. E.g. : */
ALTER TABLE `watchlist` ENGINE=InnoDB;
ALTER TABLE `user_newtalk` ENGINE=InnoDB;
ALTER TABLE `user` ENGINE=InnoDB;
ALTER TABLE `transcache` ENGINE=InnoDB;
ALTER TABLE `trackbacks` ENGINE=InnoDB;
ALTER TABLE `text` ENGINE=InnoDB;
ALTER TABLE `site_stats` ENGINE=InnoDB;
ALTER TABLE `recentchanges` ENGINE=InnoDB;
ALTER TABLE `querycache` ENGINE=InnoDB;
ALTER TABLE `oldimage` ENGINE=InnoDB;
ALTER TABLE `objectcache` ENGINE=InnoDB;
ALTER TABLE `math` ENGINE=InnoDB;
ALTER TABLE `linkscc` ENGINE=InnoDB;
ALTER TABLE `links` ENGINE=InnoDB;
ALTER TABLE `ipblocks_old` ENGINE=InnoDB;
ALTER TABLE `interwiki` ENGINE=InnoDB;
ALTER TABLE `imagelinks` ENGINE=InnoDB;
ALTER TABLE `image` ENGINE=InnoDB;
ALTER TABLE `cur` ENGINE=InnoDB;
ALTER TABLE `categorylinks` ENGINE=InnoDB;
ALTER TABLE `brokenlinks` ENGINE=InnoDB;
ALTER TABLE `blobs` ENGINE=InnoDB;
ALTER TABLE `archive` ENGINE=InnoDB;
</source>
The fact that I use [[ElasticSearch]] makes native full-text search irrelevant, but still I want my MediaWiki to be setup correctly.
So then to change the charset/collation:
* UTF8 is a very bad choice for that.
* If you ever need to put the DB data into something Unicode, then be sure to use the utf8mb4 (Multi-byte 4) variety.
The *recommended* charset/collation is 'binary BINARY'.
<source lang="mysql">
ALTER table <tablename> CHARACTER SET binary COLLATE BINARY;
</source>
== Table or column collation ==
You can go further than setting the database, or table defaults; you can look at collation and character sets for individual columns. Individual columns probably don't need to be (and shouldn't be) converted to a new character set or collation unless you are certain you know what you're doing. MySQL has this [https://dev.mysql.com/doc/refman/5.5/en/charset-conversion.html reference on charset conversion]
You can look at the distribution of column collations with
<source lang="mysql">
SET @DATABASE_NAME = 'mediawiki';
SELECT TABLE_SCHEMA, count(COLLATION_NAME) AS 'count', COLLATION_NAME
FROM information_schema.columns AS cols
WHERE table_schema = @DATABASE_NAME
GROUP BY COLLATION_NAME;
</source>
For a default demo database, you might see something like this
{| class="wikitable"
|-
| TABLE_SCHEMA || count || COLLATION_NAME
|-
| wiki_demo || 0 || NULL
|-
| wiki_demo || 36 || latin1_bin
|-
| wiki_demo || 37 || latin1_swedish_ci
|-
| wiki_demo || 3 || utf8_general_ci
|}
or the full detail at
<source lang="mysql">
SELECT
TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.columns AS cols
WHERE table_schema = "wiki_demo";
</source>
[[Category:MediaWiki]]
[[Category:MySQL]]
[[Category:Data]]
[[Category:Database]]
[[Category:QualityBox]]