5,995 bytes added ,  18:01, 5 December 2008
This page is about tools for converting content from one document/file format to another.

It covers '''all''' document format conversions, including useful tools and discussions about converting between [[MediaWiki]] wikitext and other [[wiki]] syntaxes, [[XML]], XHTML, DocBook, [[OpenDocument]], Portable Document Format (PDF), and more.

==Office Suites and Formats==

=== Converting Microsoft formats ===
The [[OpenOffice]] suite has built-in conversion tools that read and write a wide variety of Microsoft's proprietary file formats. Because those formats are proprietary, even the best conversion routines will fail somewhere. This is usually not a problem for everyday files, but conversion becomes more error-prone with advanced document features such as special effects, transitions, macros, or advanced Object Linking and Embedding (OLE). Still, the built-in conversion abilities of OpenOffice handle the majority of cases well enough to be practical, and OpenOffice users will usually be able to read documents sent to them by users of Microsoft Office products.

Conversion is routinely done by the OpenOffice suite when opening or saving a document. You can use the "File -> Save As" menu option to convert your document to the format of your choice.

==== Programmatic / Automated Conversion ====

You can do bulk conversions of entire collections of documents by hooking into the conversion capabilities of the OpenOffice suite. In this way, you could produce PDF output for your entire collection of marketing materials.

Following the approach described at http://www.xml.com/lpt/a/1638, create a local macro and run it from the command line:
<source lang="bash">
ooffice2 -invisible "macro:///Standard.MyConverters.SaveAsOOO(/home/greg/projects/slides/executive.ppt)"
</source>
(OOO is shorthand for OpenOffice.org.)
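
The macro URL embeds the file path between the parentheses. As a minimal sketch, a small helper can build that string from a path. The function name is hypothetical, and it assumes the macro wants an absolute path, as in the example above.

<source lang="bash">
# Build the macro URL for one input file.
# Assumption: the local macro Standard.MyConverters.SaveAsOOO exists and
# expects an absolute path, as in the example above.
macro_url() {
    local path=$1
    # Make a relative path absolute without requiring the file to exist.
    case $path in
        /*) ;;
        *)  path="$PWD/$path" ;;
    esac
    printf 'macro:///Standard.MyConverters.SaveAsOOO(%s)\n' "$path"
}

# Usage:
#   ooffice2 -invisible "$(macro_url slides/executive.ppt)"
</source>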

However, that conversion did not produce a usable document. When I tried to open the result, OpenOffice hung. After I killed and restarted OpenOffice, it tried to recover the document but failed to bring it up after the recovery. According to a web search, exporting the following environment variable fixes Impress. The snippet below adds it to your bash configuration file for future logins and reloads that file so that it also applies to your current environment:
<source lang="bash">
echo 'export MALLOC_CHECK_=2' >> ~/.bashrc && source ~/.bashrc
</source>
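
Running the snippet above more than once appends the same line to the configuration file each time. A minimal sketch of an append that only happens once could look like this. The function name and its arguments are illustrative.

<source lang="bash">
# Append a line to a shell startup file only if it is not already there.
ensure_line() {
    local line=$1 file=$2
    touch "$file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi
    # If the file does not end in a newline, add one so the new line stands alone.
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage:
#   ensure_line 'export MALLOC_CHECK_=2' ~/.bashrc && export MALLOC_CHECK_=2
</source>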

The following bash script performs an interactive batch conversion of a given directory, finding all Microsoft PowerPoint, Word and Excel files.

<source lang="bash">
#!/bin/bash
# Interactively convert every PowerPoint, Word and Excel file under a directory.
# Usage: pass the directory as $1 (optional) and "dry-run" as $2 (optional).

# Typing this word at the prompt aborts the run.
END_CONDITION=quit

# Defaults to the current working directory if not otherwise specified.
directory=${1:-$(pwd)}

# use an override during development
# directory='/home/greg/Documents'

echo "Using converter to process $directory"

# The parentheses make -type f apply to all three name tests. File names are
# read NUL-delimited on descriptor 3, so names containing spaces survive and
# stdin stays free for the prompt below.
while IFS= read -r -d '' file <&3
do
    if [ "$2" = "dry-run" ]
    then
        echo "found: $file"
        continue
    fi

    echo "Do you want to convert?"
    echo "$file"
    echo "(type '$END_CONDITION' to abort processing; press [enter] to continue)"
    read -r var1
    if [ "$var1" = "$END_CONDITION" ]
    then
        break
    fi

    # TODO: check for the existence of the target file and skip processing
    echo "processing... "
    ooffice2 -invisible "macro:///Standard.MyConverters.SaveAsOOO($file)"
done 3< <(find "$directory" -type f \( -name '*ppt' -o -name '*doc' -o -name '*xls' \) -print0)

exit 0
</source>
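
The to-do item in the script (skipping files whose target already exists) could be sketched as follows. Both function names are hypothetical. The sketch assumes that the macro writes an OpenDocument file next to the source, with the same base name and the matching extension. That depends on how your SaveAsOOO macro is written, so adjust the mapping as needed.

<source lang="bash">
# Map a Microsoft Office file name to the OpenDocument name the conversion is
# ASSUMED to produce (same directory, same base name, new extension).
target_for() {
    local src=$1
    case $src in
        *.ppt) printf '%s\n' "${src%.ppt}.odp" ;;
        *.doc) printf '%s\n' "${src%.doc}.odt" ;;
        *.xls) printf '%s\n' "${src%.xls}.ods" ;;
        *)     return 1 ;;   # not a format this sketch knows about
    esac
}

# Succeed only when the file has a known format and no target exists yet.
needs_conversion() {
    local target
    target=$(target_for "$1") || return 1
    [ ! -e "$target" ]
}

# Usage inside the loop, before calling the macro:
#   needs_conversion "$file" || { echo "skipping: $file"; continue; }
</source>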

I believe the converter will also work in tandem with a PHP script that I'm developing, although there is no guarantee that the resulting file will be usable or faithful to the original. To get the PHP script to work you need to run OpenOffice as a daemon, which means giving it a virtual frame buffer. You also need the Python interpreter and the Python UNO bridge installed:
<source lang="bash">
sudo apt-get update && sudo apt-get install xvfb python python-uno
</source>
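
Once those packages are installed, starting OpenOffice as a background listener might look like the sketch below. It is not a tested setup. The function names are hypothetical, port 8100 is only a common convention, and the sketch assumes that the xvfb-run wrapper from the xvfb package is available and that the office binary is called ooffice2, as elsewhere on this page.

<source lang="bash">
# Build the -accept argument that tells OpenOffice to listen for UNO
# connections on a socket. Host and port default to localhost:8100.
uno_accept_arg() {
    local host=${1:-localhost} port=${2:-8100}
    printf -- '-accept=socket,host=%s,port=%s;urp;\n' "$host" "$port"
}

# Start OpenOffice without a GUI inside a virtual frame buffer.
# Assumptions: xvfb-run is on the PATH and the binary is named ooffice2.
start_office_daemon() {
    xvfb-run --auto-servernum ooffice2 -headless "$(uno_accept_arg "$@")" &
}

# Usage:
#   start_office_daemon                 # listens on localhost:8100
#   start_office_daemon localhost 2002  # listens on another port
</source>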


==Wiki formats==


=== MediaWiki DTD ===
http://meta.wikimedia.org/wiki/Wikipedia_DTD

=== Wiki To XML ===

There is a tool created by Magnus Manske (lead/core developer of MediaWiki) that converts MediaWiki documents into XML and '''a variety of file formats'''. Since MediaWiki has an XML DTD, the conversion may well prove to be 100% XML based. The supported output formats are:

* XML
* Plain text (options: use *_/ markup; put ? before internal links)
* Plain text, Google-translated to another language (works only for wikipedia/wikibooks; probably depends on Google [[API]] key)
* XHTML
* DocBook XML
* DocBook PDF
* DocBook HTML
* OpenOffice XML
* OpenOffice ODT

[http://tools.wikimedia.de/~magnus/wiki2xml/README Developer info] is available. The tool is one of many wiki worker tools that Magnus provides, so if you plan to author, or already author, in a wiki environment, you might want to check them out at http://tools.wikimedia.de/~magnus/

The converter itself is at [[Special:Wiki2XML]]


=== DocBook to MediaWiki ===
Apparently the Blender project is converting its internal documentation to the MediaWiki format and has developed some useful PHP and Python scripts for doing so. The Python script seemed more polished than the PHP version when I looked at it. I reference it here out of curiosity more than anything; I do not know of a current need for this particular conversion.
http://mediawiki.blender.org/index.php/Meta/DocBook_to_Wiki

See also http://meta.wikimedia.org/wiki/DocBook_XML_export

=== Resources and external efforts ===
The Hula project has a lot of information on wiki format conversion:
http://www.hula-project.org/Wiki_Conversion

Various people are coordinating an effort to enable PDF and ODF export from wikis:
http://wikimediafoundation.org/wiki/Wikis_Go_Printable

OpenOffice Writer has an export filter that lets you author in OpenOffice and then save your document in wiki format.

[[Category:Wiki]]
[[Category:Development]]
[[Category:Applications]]
[[Category:Howto]]
[[Category:Formats]]