Difference between revisions of "Collation"
(document how to inspect and change collations in your MediaWiki) |
m (added Category:Wiki using HotCat) |
||
Line 133: | Line 133: | ||
[[Category:Database]] | [[Category:Database]] | ||
[[Category:QualityBox]] | [[Category:QualityBox]] | ||
+ | [[Category:Wiki]] |
Revision as of 12:23, 1 November 2017
Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error.
Import failed: A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script Query: SELECT page_id,page_len,page_is_redirect,page_latest,page_content_model FROM `page` WHERE page_namespace = '6' AND page_title = 'Fantomo_ru?a.svg' LIMIT 1 Function: LinkCache::addLinkObj Error: 1267 Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=' (localhost)
This is a mediawiki that is "current" but was first established a long time ago (2004).
MediaWiki 1.27.1 (64f7660) 2016-11-25T23:52:22
PHP 5.5.9-1ubuntu4.22 (apache2handler)
MySQL 5.5.57-0ubuntu0.14.04.1
ICU 52.1
Elasticsearch 2.4.4
Lua 5.1.5
Schema: | mediawiki |
Default collation: | latin1_swedish_ci |
Default characterset: | latin1 |
I also have $wgDBmysql5 = true;
in LocalSettings.php [1]
Default MediaWiki SQL schema setup[edit | edit source]
When MediaWiki is first installed, this is the SQL that is run.
The import failed, and reveals that my database has a problem handling UTF8 characters.
MediaWiki (the application) uses UTF8 throughout. What your database and data use is another story. Since my database was setup a long time ago, I've got old defaults, and a mixture of engines.
You can see what are the defaults for your databases with
mysql -e 'SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;'
DB engine Conversion[edit | edit source]
The table type has nothing to do with Collation, but while updating my database, I wanted to switch the engine from MyISAM to InnoDB which is now the default.
This SQL worked flawlessly to switch any MyISAM tables to INNODB; without touching the *searchindex table(s) because that uses MySQL fulltext search which doesn't come into INNODB until MySQL v. 5.6
/* Find and convert your MediaWiki tables to use INNODB */
SET @DATABASE_NAME = 'mediawiki';
SELECT CONCAT('ALTER TABLE `', table_name, '` ENGINE=InnoDB;') AS sql_statements
FROM information_schema.tables AS tb
WHERE table_schema = @DATABASE_NAME
AND `ENGINE` = 'MyISAM'
AND `TABLE_TYPE` = 'BASE TABLE'
AND `tb`.table_name NOT REGEXP '.*searchindex'
ORDER BY table_name DESC;
/* Then copy the output and execute that. E.g. : */
ALTER TABLE `watchlist` ENGINE=InnoDB;
ALTER TABLE `user_newtalk` ENGINE=InnoDB;
ALTER TABLE `user` ENGINE=InnoDB;
ALTER TABLE `transcache` ENGINE=InnoDB;
ALTER TABLE `trackbacks` ENGINE=InnoDB;
ALTER TABLE `text` ENGINE=InnoDB;
ALTER TABLE `site_stats` ENGINE=InnoDB;
ALTER TABLE `recentchanges` ENGINE=InnoDB;
ALTER TABLE `querycache` ENGINE=InnoDB;
ALTER TABLE `oldimage` ENGINE=InnoDB;
ALTER TABLE `objectcache` ENGINE=InnoDB;
ALTER TABLE `math` ENGINE=InnoDB;
ALTER TABLE `linkscc` ENGINE=InnoDB;
ALTER TABLE `links` ENGINE=InnoDB;
ALTER TABLE `ipblocks_old` ENGINE=InnoDB;
ALTER TABLE `interwiki` ENGINE=InnoDB;
ALTER TABLE `imagelinks` ENGINE=InnoDB;
ALTER TABLE `image` ENGINE=InnoDB;
ALTER TABLE `cur` ENGINE=InnoDB;
ALTER TABLE `categorylinks` ENGINE=InnoDB;
ALTER TABLE `brokenlinks` ENGINE=InnoDB;
ALTER TABLE `blobs` ENGINE=InnoDB;
ALTER TABLE `archive` ENGINE=InnoDB;
The fact that I use Elasticsearch makes native full-text search irrelevant, but still I want my MediaWiki to be setup correctly.
So then to change the charset/collation:
- UTF8 is a very bad choice for that.
- If you ever need to put the DB data into something Unicode, then be sure to use the utf8mb4 (Multi-byte 4) variety.
The *recommended* charset/collation is 'binary BINARY'.
ALTER table <tablename> CHARACTER SET binary COLLATE BINARY;
Table or column collation[edit | edit source]
You can go further than setting the database, or table defaults; you can look at collation and character sets for individual columns. Individual columns probably don't need to be (and shouldn't be) converted to a new character set or collation unless you are certain you know what you're doing. MySQL has this reference on charset conversion
You can look at the distribution of column collations with
SET @DATABASE_NAME = 'mediawiki';
SELECT TABLE_SCHEMA, count(COLLATION_NAME) AS 'count', COLLATION_NAME
FROM information_schema.columns AS cols
WHERE table_schema = @DATABASE_NAME
GROUP BY COLLATION_NAME;
For a default demo database, you might see something like this
TABLE_SCHEMA | count | COLLATION_NAME |
wiki_demo | 0 | NULL |
wiki_demo | 36 | latin1_bin |
wiki_demo | 37 | latin1_swedish_ci |
wiki_demo | 3 | utf8_general_ci |
or the full detail at
SELECT
TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.columns AS cols
WHERE table_schema = "wiki_demo";