Format conversion


Following the information at http://www.xml.com/lpt/a/1638, create a local macro and run it from the command line:
<syntaxhighlight lang="bash">
ooffice2 -invisible "macro:///Standard.MyConverters.SaveAsOOO(/home/greg/projects/slides/executive.ppt)"
</syntaxhighlight>
(OOO is shorthand for OpenOffice.org.)
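
If you have several files to convert, a small wrapper saves retyping the macro URL. This is only a sketch: the function name ooo_save_as is hypothetical, and it assumes the SaveAsOOO macro from the article is installed, ooffice2 is on your PATH, and readlink -f (GNU coreutils) is available to turn relative paths into the absolute form used in the example above.

<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical wrapper around the SaveAsOOO macro call shown above.
# Assumes the Standard.MyConverters.SaveAsOOO macro is installed and
# that ooffice2 is on the PATH.
ooo_save_as() {
    local file abs
    for file in "$@"; do
        # Resolve to an absolute path, as in the hand-written example.
        abs=$(readlink -f -- "$file") || return 1
        ooffice2 -invisible "macro:///Standard.MyConverters.SaveAsOOO(${abs})"
    done
}

# Usage (hypothetical file names):
#   ooo_save_as slides/executive.ppt slides/budget.ppt
</syntaxhighlight>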


However, that process did not produce a usable document. When I tried to open it, OpenOffice hung. After I killed and restarted OpenOffice, it tried to recover the document but still failed to display it after the recovery. According to what I found searching the web, exporting the following environment variable fixes Impress. The snippet below adds it to your current environment and to your bash configuration file for future logins:
<syntaxhighlight lang="bash">
echo 'export MALLOC_CHECK_=2' >> ~/.bashrc && source ~/.bashrc
</syntaxhighlight>
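
Appending the variable to ~/.bashrc affects every program started from your shell. If you would rather limit it to OpenOffice, you can set it for a single command instead. The helper name with_malloc_check is hypothetical; it simply prefixes the command with the variable assignment, which bash places in that command's environment only.

<syntaxhighlight lang="bash">
#!/bin/bash
# Run one command with MALLOC_CHECK_=2 in its environment only,
# instead of exporting the variable for every login shell.
with_malloc_check() {
    MALLOC_CHECK_=2 "$@"
}

# Usage:
#   with_malloc_check ooffice2 -invisible "macro:///Standard.MyConverters.SaveAsOOO(/home/greg/projects/slides/executive.ppt)"
</syntaxhighlight>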


The following bash script performs an interactive batch conversion of a given directory, finding all Microsoft PowerPoint, Word (.doc) and Excel files.


<syntaxhighlight lang="bash">
#!/bin/bash

# …

exit 0
</syntaxhighlight>
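
As a rough illustration of the file-finding step only, the hypothetical function below lists PowerPoint, Word and Excel files under a directory with find. It assumes the legacy .ppt, .doc and .xls extensions, matches them case-insensitively with -iname (supported by GNU and BSD find, though not required by POSIX), and does not convert anything itself.

<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical helper: print every legacy Microsoft Office file
# (.ppt, .doc, .xls, any letter case) below the given directory,
# one path per line, in sorted order.
find_office_files() {
    local dir="${1:-.}"
    find "$dir" -type f \( -iname '*.ppt' -o -iname '*.doc' -o -iname '*.xls' \) | sort
}

# Usage: feed each path to the converter, e.g. the ooffice2 macro call.
#   find_office_files ~/projects/slides | while IFS= read -r f; do
#       echo "would convert: $f"
#   done
</syntaxhighlight>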


I believe the converter can work in tandem with a PHP script that I'm developing, although there is no guarantee that the resulting file will be usable or faithful to the original. To get the PHP script working, you need to run OpenOffice as a daemon, which means giving it a virtual frame buffer. You also need the Python interpreter and the Python UNO bridge:
<syntaxhighlight lang="bash">
sudo apt-get update && sudo apt-get install xvfb python python-uno
</syntaxhighlight>
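
A minimal sketch of running OpenOffice as a daemon: start an X virtual frame buffer, then launch OpenOffice headless with a UNO listener on a local socket that a client script can connect to. The display :99, port 2002, the binary name soffice and the function name start_headless_ooo are illustrative assumptions. The single-dash options follow the OpenOffice.org 2.x/3.x style; LibreOffice documents double-dash forms such as --headless and --accept.

<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: run OpenOffice headless against a virtual X display so a
# client (e.g. a PHP or Python UNO script) can connect to it.
# Assumptions: display :99 and port 2002 are free; soffice is on PATH.
OOO_DISPLAY="${OOO_DISPLAY:-:99}"   # hypothetical default display
OOO_PORT="${OOO_PORT:-2002}"        # hypothetical default UNO port

start_headless_ooo() {
    # Virtual frame buffer with one 1024x768, 24-bit screen.
    Xvfb "$OOO_DISPLAY" -screen 0 1024x768x24 &
    # Give Xvfb a moment to come up before OpenOffice connects to it.
    sleep 1
    # Point OpenOffice at the virtual display and listen for UNO
    # connections on a local TCP socket.
    DISPLAY="$OOO_DISPLAY" soffice -headless -nofirststartwizard \
        -accept="socket,host=127.0.0.1,port=${OOO_PORT};urp;" &
}
</syntaxhighlight>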




==Wiki formats==

=== Mediawiki DTD ===
http://meta.wikimedia.org/wiki/Wikipedia_DTD
=== pandoc ===
[http://code.google.com/p/pandoc/ Pandoc] is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read markdown and (subsets of) reStructuredText, HTML, and LaTeX, and it can write markdown, reStructuredText, HTML, LaTeX, ConTeXt, Docbook XML, OpenDocument XML, GNU Texinfo, RTF, ODT, MediaWiki markup, groff man pages, and [[Presentation|S5 HTML slide shows]].
Or, more simply: Pandoc rocks the free world! Because Pandoc writes MediaWiki markup, we used it in the [[Html2Wiki]] extension.
To convert an HTML document to MediaWiki syntax, issue a command like:
<syntaxhighlight lang="bash">
pandoc --from html --to mediawiki foo.html --output foo.wiki.txt
</syntaxhighlight>
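
To convert a whole directory rather than a single file, you can loop over it with the same options. The function name html_dir_to_wiki is hypothetical; it writes foo.wiki.txt next to each foo.html, mirroring the naming in the command above, and does nothing when the directory has no .html files.

<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical helper: convert every *.html file in a directory to
# MediaWiki markup with pandoc, writing foo.html -> foo.wiki.txt
# next to the original.
html_dir_to_wiki() {
    local dir="${1:-.}" f
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue   # no match: the glob stays literal
        pandoc --from html --to mediawiki "$f" --output "${f%.html}.wiki.txt"
    done
}

# Usage:
#   html_dir_to_wiki ~/exported-pages
</syntaxhighlight>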


=== Wiki To PDF ===
…
OpenOffice Writer has an export filter that lets you author in OpenOffice and then save your document in wiki format.


== Other ==

=== html to pdf ===

==== wkhtmltopdf ====
[http://wkhtmltopdf.org/index.html wkhtmltopdf] is an LGPLv3 tool that renders HTML into PDF and various image formats using the Qt WebKit rendering engine.
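
Basic usage takes the input page (a file or URL) followed by the output file name, for example wkhtmltopdf page.html page.pdf; the project also ships a companion wkhtmltoimage command for the image formats. The hypothetical wrapper html_to_pdf below only adds a default: it derives report.pdf from report.html when no output name is given.

<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical wrapper: render an HTML file to PDF with wkhtmltopdf.
# With one argument, foo.html is written to foo.pdf alongside it.
html_to_pdf() {
    local in="$1"
    local out="${2:-${in%.html}.pdf}"
    wkhtmltopdf "$in" "$out"
}

# Usage:
#   html_to_pdf report.html             # writes report.pdf
#   html_to_pdf report.html out/r.pdf
</syntaxhighlight>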