
Transforming HTML to WikiText

Importing HTML content into a wiki requires transforming the (hopefully well-formed) HTML into WikiText. Some editors handle this well; perhaps the best is VisualEditor, which is now the default editor for the MediaWiki project.

Parsoid

The mw:Parsoid[1] project can parse HTML and convert it to WikiText; however, examine the results carefully to judge whether they work for your content. See the mw:Parsoid/MediaWiki DOM spec for its capabilities.

Example usage

cat uvm.html | webapps/wiki/extensions/parsoid/tests/parse.js --html2wt

Other Classes or Libraries

The 'Wikilog' extension [2] adds "blogging" features [3] to MediaWiki (e.g. http://laussy.org/wiki/Blog). One part of the project is a PHP class that transforms HTML to WikiText; see https://github.com/mediawiki4intranet/Wikilog/blob/master/HtmlToMediaWiki.php. The Wikilog extension also uses namespaces to create multiple blogs.
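
To illustrate the kind of mapping such a converter performs, here is a minimal Python sketch. It is not the Wikilog code and not Parsoid; the class name HtmlToWikiText and the function html_to_wikitext are hypothetical, and it uses only Python's standard html.parser module. It assumes well-formed input and handles just a few tags (bold, italics, headings, paragraphs, lists, and links with absolute URLs); anything else passes through as plain text.

```python
import re
from html.parser import HTMLParser

# Inline tags mapped to the wiki markup that wraps their content.
INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}
# Heading tags mapped to the run of "=" signs placed on each side of the title.
HEADINGS = {"h1": "=", "h2": "==", "h3": "===", "h4": "===="}


class HtmlToWikiText(HTMLParser):
    """Hypothetical converter for a small, well-formed subset of HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # output fragments, joined at the end
        self.lists = []    # stack of "*" (ul) and "#" (ol) markers
        self.href = None   # target of the currently open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in HEADINGS:
            self.out.append("\n" + HEADINGS[tag] + " ")
        elif tag == "ul":
            self.lists.append("*")
        elif tag == "ol":
            self.lists.append("#")
        elif tag == "li":
            # Nesting depth is expressed by repeating the list markers.
            self.out.append("\n" + "".join(self.lists) + " ")
        elif tag == "a":
            # Assumption: href is an absolute URL, so external-link syntax applies.
            self.href = dict(attrs).get("href")
            if self.href:
                self.out.append("[" + self.href + " ")
        elif tag == "p":
            self.out.append("\n\n")

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in HEADINGS:
            self.out.append(" " + HEADINGS[tag] + "\n")
        elif tag in ("ul", "ol"):
            if self.lists:
                self.lists.pop()
            if not self.lists:
                self.out.append("\n")
        elif tag == "a":
            if self.href:
                self.out.append("]")
            self.href = None

    def handle_data(self, data):
        # Collapse whitespace runs, roughly as a browser does when rendering.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_wikitext(html):
    """Return WikiText for the supported subset of HTML."""
    parser = HtmlToWikiText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)  # trim spaces around line breaks
    text = re.sub(r" {2,}", " ", text)            # collapse repeated spaces
    text = re.sub(r"\n{3,}", "\n\n", text)        # at most one blank line
    return text.strip()


if __name__ == "__main__":
    print(html_to_wikitext(
        '<h2>Title</h2><p>Some <b>bold</b> text and a '
        '<a href="http://example.org">link</a>.</p>'
    ))
```

A real converter also has to deal with tables, images, nested templates, and malformed markup, which is why the results of any tool should be reviewed before import.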

Other Transformations

For converting Microsoft Word documents to WikiText, see http://en.wikipedia.org/wiki/Help:WordToWiki

References

  1. Git repo https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FParsoid
  2. See also MediaWiki/Bundles
  3. Wikilog is unmaintained, however, so see the modified version on GitHub