Collation


Reference

http://kunststube.net/encoding/


MediaWiki

Trying to Special:Import an XML dump from Special:Export of Commons, I get an "illegal mix of collations" error.

Import failed: A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script Query: SELECT page_id,page_len,page_is_redirect,page_latest,page_content_model FROM `page` WHERE page_namespace = '6' AND page_title = 'Fantomo_ru?a.svg' LIMIT 1 Function: LinkCache::addLinkObj Error: 1267 Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=' (localhost)
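
The two sides of that '=' comparison carry different collations: the page_title column is latin1_bin, while the quoted title arrives in the connection's utf8 charset and is compared as utf8_general_ci. You can confirm the column side directly (a diagnostic sketch; it assumes the schema name 'mediawiki' and the default, unprefixed page table):

/* Which character set and collation does page.page_title actually use? */
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM   information_schema.columns
WHERE  table_schema = 'mediawiki'
AND    table_name   = 'page'
AND    column_name  = 'page_title';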


This is a MediaWiki installation that is "current" but was first established a long time ago (2004).

MediaWiki	1.27.1 (64f7660) 2016-11-25T23:52:22
PHP	5.5.9-1ubuntu4.22 (apache2handler)
MySQL	5.5.57-0ubuntu0.14.04.1
ICU	52.1
Elasticsearch	2.4.4
Lua	5.1.5
Schema: mediawiki
Default collation: latin1_swedish_ci
Default characterset: latin1

I also have $wgDBmysql5 = true; in LocalSettings.php [1]

Default MediaWiki SQL schema setup

When MediaWiki is first installed, the schema is created from the SQL in maintenance/tables.sql.
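
To see how one of your existing tables compares with what a fresh install would create, you can dump its live definition (this assumes the default, unprefixed table names):

/* Full DDL for the page table: engine, per-column charsets, and the table default collation. */
SHOW CREATE TABLE `page`;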

The import failed, revealing that my database has a problem handling UTF-8 characters.

MediaWiki (the application) uses UTF-8 throughout. What your database and data use is another story. Since my database was set up a long time ago, I've got old defaults and a mixture of engines.

You can see the defaults for your databases with:

mysql -e 'SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;'
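
The SCHEMATA view only shows per-database defaults. To see what each table actually uses, the tables view lists engine and default collation per table (this assumes the schema name 'mediawiki' shown above):

SELECT TABLE_NAME, `ENGINE`, TABLE_COLLATION
FROM   information_schema.tables
WHERE  table_schema = 'mediawiki'
ORDER BY TABLE_NAME;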

DB engine Conversion

The table engine has nothing to do with collation, but while updating my database I also wanted to switch the engine from MyISAM to InnoDB, which is now the default.

This SQL worked flawlessly to switch any MyISAM tables to InnoDB, without touching the *searchindex table(s), because those rely on MySQL full-text search, which InnoDB does not support until MySQL 5.6.

/* Find and convert your MediaWiki tables to use INNODB */

SET @DATABASE_NAME = 'mediawiki';

SELECT  CONCAT('ALTER TABLE `', table_name, '` ENGINE=InnoDB;') AS sql_statements
FROM    information_schema.tables AS tb
WHERE   table_schema = @DATABASE_NAME
AND     `ENGINE` = 'MyISAM'
AND     `TABLE_TYPE` = 'BASE TABLE'
AND     `tb`.table_name NOT REGEXP '.*searchindex'
ORDER BY table_name DESC;

/* Then copy the output and execute that. E.g. : */
ALTER TABLE `watchlist` ENGINE=InnoDB;
ALTER TABLE `user_newtalk` ENGINE=InnoDB;
ALTER TABLE `user` ENGINE=InnoDB;
ALTER TABLE `transcache` ENGINE=InnoDB;
ALTER TABLE `trackbacks` ENGINE=InnoDB;
ALTER TABLE `text` ENGINE=InnoDB;
ALTER TABLE `site_stats` ENGINE=InnoDB;
ALTER TABLE `recentchanges` ENGINE=InnoDB;
ALTER TABLE `querycache` ENGINE=InnoDB;
ALTER TABLE `oldimage` ENGINE=InnoDB;
ALTER TABLE `objectcache` ENGINE=InnoDB;
ALTER TABLE `math` ENGINE=InnoDB;
ALTER TABLE `linkscc` ENGINE=InnoDB;
ALTER TABLE `links` ENGINE=InnoDB;
ALTER TABLE `ipblocks_old` ENGINE=InnoDB;
ALTER TABLE `interwiki` ENGINE=InnoDB;
ALTER TABLE `imagelinks` ENGINE=InnoDB;
ALTER TABLE `image` ENGINE=InnoDB;
ALTER TABLE `cur` ENGINE=InnoDB;
ALTER TABLE `categorylinks` ENGINE=InnoDB;
ALTER TABLE `brokenlinks` ENGINE=InnoDB;
ALTER TABLE `blobs` ENGINE=InnoDB;
ALTER TABLE `archive` ENGINE=InnoDB;
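
After executing the generated statements, re-running the filter is a quick sanity check; it should come back empty, since the searchindex table(s) were excluded on purpose (the session variable is re-set here so the snippet stands alone):

/* Anything still on MyISAM, other than the searchindex table(s)? */
SET @DATABASE_NAME = 'mediawiki';
SELECT table_name, `ENGINE`
FROM   information_schema.tables
WHERE  table_schema = @DATABASE_NAME
AND    `ENGINE` = 'MyISAM'
AND    table_name NOT REGEXP '.*searchindex';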


The fact that I use Elasticsearch makes native full-text search irrelevant, but I still want my MediaWiki to be set up correctly.

So then to change the charset/collation:

  • MySQL's utf8 charset is a very bad choice for that: it stores at most 3 bytes per character (it is really utf8mb3), so it cannot represent all of Unicode.
  • If you ever need to put the DB data into a Unicode charset, be sure to use the utf8mb4 (4-byte) variety, as sketched below.
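
If you do go the Unicode route, the conversion would look roughly like this (an illustration only, not what this page ends up doing; utf8mb4_unicode_ci is one reasonable collation choice, and on older MySQL/InnoDB an indexed varchar(255) column can hit the 767-byte key-length limit at 4 bytes per character):

/* Illustration: convert the table default and all existing text columns to utf8mb4. */
ALTER TABLE `page` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;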

The *recommended* charset/collation is the binary charset with its binary collation:

ALTER TABLE <tablename> CHARACTER SET = 'binary' COLLATE = 'binary';


-- You can generate the above SQL commands with the following:

SET @DATABASE_NAME = 'wiki_freephile';
SELECT 
    CONCAT("ALTER TABLE `",
            table_name,
            "` CHARACTER SET = 'binary' COLLATE = 'BINARY';") AS sql_statements
FROM
    information_schema.tables AS tb
WHERE
    table_schema = @DATABASE_NAME
        AND `TABLE_COLLATION` = 'latin1_swedish_ci'
        AND `TABLE_TYPE` = 'BASE TABLE'
        AND `tb`.table_name NOT REGEXP '.*searchindex'
ORDER BY table_name DESC;
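
Note that ALTER TABLE ... CHARACTER SET = ... only changes the table's default charset, which applies to columns added later; existing columns keep whatever they already have. To rewrite the existing columns as well, MySQL has a CONVERT TO form, which for the binary charset turns textual columns into their binary counterparts (VARCHAR into VARBINARY, TEXT into BLOB). A sketch, again using the page table as the example:

/* Convert existing columns, not just the table default, to the binary charset. */
ALTER TABLE `page` CONVERT TO CHARACTER SET binary;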

Table or column collation

You can go further than setting the database or table defaults: you can look at the collation and character set of individual columns. Individual columns probably don't need to be (and shouldn't be) converted to a new character set or collation unless you are certain you know what you're doing. MySQL's documentation includes a reference on charset conversion.

You can look at the distribution of column collations with:

SET @DATABASE_NAME = 'mediawiki';
SELECT TABLE_SCHEMA, COUNT(COLLATION_NAME) AS 'count', COLLATION_NAME
FROM    information_schema.columns AS cols
WHERE   table_schema = @DATABASE_NAME
GROUP BY TABLE_SCHEMA, COLLATION_NAME;

For a default demo database, you might see something like this:

TABLE_SCHEMA  count  COLLATION_NAME
wiki_demo      0     NULL
wiki_demo     36     latin1_bin
wiki_demo     37     latin1_swedish_ci
wiki_demo      3     utf8_general_ci

or the full detail with:

SELECT 
  TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME 
FROM information_schema.columns AS cols 
WHERE table_schema = "wiki_demo";
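
If one specific column really does need a different collation, MODIFY is the per-column form. The whole column definition has to be restated, so check it with SHOW CREATE TABLE first. A purely hypothetical example (the table name example_table, the column example_col, and its VARCHAR(255) NOT NULL definition are stand-ins for your real ones):

/* Per-column change: repeat the full existing column definition,
   altering only the CHARACTER SET / COLLATE clause. */
ALTER TABLE `example_table`
  MODIFY `example_col` VARCHAR(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL;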