Tidy
Revision as of 14:36, 7 May 2009
See also: Using Tidy with PHP
Syntax Checking and Cleanup with Tidy
Tidy is a command-line tool and library (with GUI interfaces available) that is also integrated into the Quanta+ editor. Using Tidy, you can clean up and correct the syntax of XML, XHTML, and HTML documents, especially those found in the wild or produced by other applications such as OpenOffice or Microsoft Office.
Content authors should be familiar with at least one way of running Tidy. In Quanta, it is available as the integrated option 'Tools > HTML Tidy syntax checking' or via Ctrl + Alt + T. It can also be run from the command line, optionally driven by a configuration file. On Microsoft Windows, you can use HTML-Kit (http://www.chami.com/html-kit/).
The quick reference is helpful for looking up Tidy configuration options: http://tidy.sourceforge.net/docs/quickref.html
In a team environment, you can share a common tidy.rc so that everyone uses the same configuration. The following combination of command-line options is a good starting point:
tidy -m -i -wrap 120 -clean -bare -asxhtml myfile.html
- -m: modifies the file in place (to keep the original, omit -m and specify an output file with -o myfile-Tidied.html)
- -i: indents the element content
- -wrap 120: wraps lines at 120 characters
- -clean: replaces FONT, NOBR and CENTER tags with CSS styles
- -bare: strips out smart quotes and em dashes
- -asxhtml: converts the HTML to well-formed XHTML
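The same settings can also be kept in a shared configuration file. The following is a minimal sketch, assuming the file is called tidy.rc and the document is called myfile.html (both names are placeholders). The option names come from the tidy -help-config listing; mapping -i to "indent: auto" is an assumption about how Tidy treats that switch, so use "indent: yes" if you want every element indented.

```shell
# Write a shared configuration file (the name tidy.rc is only an example).
# Each line mirrors one of the command-line switches described above.
cat > tidy.rc <<'EOF'
indent: auto
wrap: 120
clean: yes
bare: yes
output-xhtml: yes
EOF

# Apply the shared settings to one file, modifying it in place (-m).
# Guarded so the snippet does nothing when tidy or the file is missing.
if command -v tidy >/dev/null 2>&1 && [ -f myfile.html ]; then
    tidy -config tidy.rc -m myfile.html ||
        echo "tidy reported warnings or errors (exit status $?)" >&2
fi
```

Keeping -m on the command line rather than in the file means the shared configuration never overwrites anyone's input unless they ask for it.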
To thoroughly clean up content coming from Microsoft Word, try the following:
tidy -i -wrap 120 -bare -asxhtml -drop-empty-paras -drop-font-tags -drop-proprietary-attributes -word-2000 myfile.html
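The usage text printed by tidy -h documents configuration options in the form "--name value". As a hedged sketch, the Word cleanup could be wrapped in a small shell function that uses that form; the function name tidy_word and the "-tidied" output suffix are made up for illustration.

```shell
# Hypothetical helper: clean up one Word-exported HTML file.
# Configuration options are passed in the "--name value" form that
# tidy -h documents; the output file name is derived from the input.
tidy_word() {
    src=$1
    dst=${src%.html}-tidied.html
    tidy -i -wrap 120 -bare -asxhtml \
        --drop-empty-paras yes \
        --drop-font-tags yes \
        --drop-proprietary-attributes yes \
        --word-2000 yes \
        -o "$dst" "$src"
}
```

Calling tidy_word myfile.html would leave the original untouched and write the result to myfile-tidied.html.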
Help
The command tidy -h prints the release information and the usage hints shown below:
tidy -h
tidy [option...] [file...] [option...] [file...]
Utility to clean up and pretty print HTML/XHTML/XML
see http://tidy.sourceforge.net/

Options for HTML Tidy for Linux/x86 released on 12 April 2005:

File manipulation
-----------------
 -output <file>, -o <file>    write output to the specified <file>
 -config <file>               set configuration options from the specified <file>
 -file <file>, -f <file>      write errors to the specified <file>
 -modify, -m                  modify the original input files

Processing directives
---------------------
 -indent, -i                  indent element content
 -wrap <column>, -w <column>  wrap text at the specified <column> (default is 68)
 -upper, -u                   force tags to upper case (default is lower case)
 -clean, -c                   replace FONT, NOBR and CENTER tags by CSS
 -bare, -b                    strip out smart quotes and em dashes, etc.
 -numeric, -n                 output numeric rather than named entities
 -errors, -e                  only show errors
 -quiet, -q                   suppress nonessential output
 -omit                        omit optional end tags
 -xml                         specify the input is well formed XML
 -asxml, -asxhtml             convert HTML to well formed XHTML
 -ashtml                      force XHTML to well formed HTML
 -access <level>              do additional accessibility checks (<level> = 1, 2, 3)

Character encodings
-------------------
 -raw                         output values above 127 without conversion to entities
 -ascii                       use ISO-8859-1 for input, US-ASCII for output
 -latin0                      use ISO-8859-15 for input, US-ASCII for output
 -latin1                      use ISO-8859-1 for both input and output
 -iso2022                     use ISO-2022 for both input and output
 -utf8                        use UTF-8 for both input and output
 -mac                         use MacRoman for input, US-ASCII for output
 -win1252                     use Windows-1252 for input, US-ASCII for output
 -ibm858                      use IBM-858 (CP850+Euro) for input, US-ASCII for output
 -utf16le                     use UTF-16LE for both input and output
 -utf16be                     use UTF-16BE for both input and output
 -utf16                       use UTF-16 for both input and output
 -big5                        use Big5 for both input and output
 -shiftjis                    use Shift_JIS for both input and output
 -language <lang>             set the two-letter language code <lang> (for future use)

Miscellaneous
-------------
 -version, -v                 show the version of Tidy
 -help, -h, -?                list the command line options
 -xml-help                    list the command line options in XML format
 -help-config                 list all configuration options
 -xml-config                  list all configuration options in XML format
 -show-config                 list the current configuration settings

Use --blah blarg for any configuration option "blah" with argument "blarg"

Input/Output default to stdin/stdout respectively
Single letter options apart from -f may be combined
as in:  tidy -f errs.txt -imu foo.html
For further info on HTML see http://www.w3.org/MarkUp
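For syntax checking alone, the -e, -q and -f switches from the listing above can be combined. The sketch below is one possible wrapper; the function name check_html and the ".errs.txt" suffix are hypothetical, and the meaning of the exit status (0 for success, 1 for warnings, 2 for errors) is taken from the Tidy manual page rather than from this listing.

```shell
# Hypothetical helper: report problems in a file without cleaning it up.
# -e only shows errors, -q suppresses nonessential output, and
# -f sends the diagnostics to a file (all three are listed in tidy -h).
check_html() {
    tidy -e -q -f "$1.errs.txt" "$1"
    status=$?
    # Exit status per the Tidy manual page:
    # 0 = no problems, 1 = warnings, 2 = errors.
    case $status in
        0) echo "$1: ok" ;;
        1) echo "$1: warnings, see $1.errs.txt" ;;
        *) echo "$1: errors, see $1.errs.txt" ;;
    esac
    return "$status"
}
```

Because the function returns Tidy's own exit status, it can be used directly in a condition, for example check_html myfile.html || echo "needs attention".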
More Info
See the project page at http://tidy.sourceforge.net/. For further information on HTML, see http://www.w3.org/MarkUp
Configuring Tidy
The command tidy -help-config prints a list of all the settings that you can put into a Tidy configuration file (e.g. ~/.tidy.rc):
$ tidy -help-config
HTML Tidy Configuration Settings
Within a file, use the form:
wrap: 72
indent: no
When specified on the command line, use the form:
--wrap 72 --indent no
Name Type Allowable values
==================================== ========================================
accessibility-check enum 0 (Tidy Classic), 1 (Priority 1 Checks), 2 (Priority 2 Checks), 3 (Priority 3 Checks)
add-xml-decl Boolean y/n, yes/no, t/f, true/false, 1/0
add-xml-space Boolean y/n, yes/no, t/f, true/false, 1/0
alt-text String -
ascii-chars Boolean y/n, yes/no, t/f, true/false, 1/0
assume-xml-procins Boolean y/n, yes/no, t/f, true/false, 1/0
bare Boolean y/n, yes/no, t/f, true/false, 1/0
break-before-br Boolean y/n, yes/no, t/f, true/false, 1/0
char-encoding Encoding raw, ascii, latin0, latin1, utf8, iso2022, mac, win1252, ibm858, utf16le, utf16be, utf16, big5, shiftjis
clean Boolean y/n, yes/no, t/f, true/false, 1/0
css-prefix String -
doctype DocType omit, auto, strict, transitional, user
drop-empty-paras Boolean y/n, yes/no, t/f, true/false, 1/0
drop-font-tags Boolean y/n, yes/no, t/f, true/false, 1/0
drop-proprietary-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
enclose-block-text Boolean y/n, yes/no, t/f, true/false, 1/0
enclose-text Boolean y/n, yes/no, t/f, true/false, 1/0
error-file String -
escape-cdata Boolean y/n, yes/no, t/f, true/false, 1/0
fix-backslash Boolean y/n, yes/no, t/f, true/false, 1/0
fix-bad-comments Boolean y/n, yes/no, t/f, true/false, 1/0
fix-uri Boolean y/n, yes/no, t/f, true/false, 1/0
force-output Boolean y/n, yes/no, t/f, true/false, 1/0
gnu-emacs Boolean y/n, yes/no, t/f, true/false, 1/0
gnu-emacs-file String -
hide-comments Boolean y/n, yes/no, t/f, true/false, 1/0
hide-endtags Boolean y/n, yes/no, t/f, true/false, 1/0
indent AutoBool auto, y/n, yes/no, t/f, true/false, 1/0
indent-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
indent-cdata Boolean y/n, yes/no, t/f, true/false, 1/0
indent-spaces Integer 0, 1, 2, ...
input-encoding Encoding raw, ascii, latin0, latin1, utf8, iso2022, mac, win1252, ibm858, utf16le, utf16be, utf16, big5, shiftjis
input-xml Boolean y/n, yes/no, t/f, true/false, 1/0
join-classes Boolean y/n, yes/no, t/f, true/false, 1/0
join-styles Boolean y/n, yes/no, t/f, true/false, 1/0
keep-time Boolean y/n, yes/no, t/f, true/false, 1/0
language String -
literal-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
logical-emphasis Boolean y/n, yes/no, t/f, true/false, 1/0
lower-literals Boolean y/n, yes/no, t/f, true/false, 1/0
markup Boolean y/n, yes/no, t/f, true/false, 1/0
merge-divs AutoBool auto, y/n, yes/no, t/f, true/false, 1/0
ncr Boolean y/n, yes/no, t/f, true/false, 1/0
new-blocklevel-tags Tag names tagX, tagY, ...
new-empty-tags Tag names tagX, tagY, ...
new-inline-tags Tag names tagX, tagY, ...
new-pre-tags Tag names tagX, tagY, ...
newline enum LF, CRLF, CR
numeric-entities Boolean y/n, yes/no, t/f, true/false, 1/0
output-bom AutoBool auto, y/n, yes/no, t/f, true/false, 1/0
output-encoding Encoding raw, ascii, latin0, latin1, utf8, iso2022, mac, win1252, ibm858, utf16le, utf16be, utf16, big5, shiftjis
output-file String -
output-html Boolean y/n, yes/no, t/f, true/false, 1/0
output-xhtml Boolean y/n, yes/no, t/f, true/false, 1/0
output-xml Boolean y/n, yes/no, t/f, true/false, 1/0
punctuation-wrap Boolean y/n, yes/no, t/f, true/false, 1/0
quiet Boolean y/n, yes/no, t/f, true/false, 1/0
quote-ampersand Boolean y/n, yes/no, t/f, true/false, 1/0
quote-marks Boolean y/n, yes/no, t/f, true/false, 1/0
quote-nbsp Boolean y/n, yes/no, t/f, true/false, 1/0
repeated-attributes enum keep-first, keep-last
replace-color Boolean y/n, yes/no, t/f, true/false, 1/0
show-body-only Boolean y/n, yes/no, t/f, true/false, 1/0
show-errors Integer 0, 1, 2, ...
show-warnings Boolean y/n, yes/no, t/f, true/false, 1/0
slide-style String -
split Boolean y/n, yes/no, t/f, true/false, 1/0
tab-size Integer 0, 1, 2, ...
tidy-mark Boolean y/n, yes/no, t/f, true/false, 1/0
uppercase-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
uppercase-tags Boolean y/n, yes/no, t/f, true/false, 1/0
vertical-space Boolean y/n, yes/no, t/f, true/false, 1/0
word-2000 Boolean y/n, yes/no, t/f, true/false, 1/0
wrap Integer 0 (no wrapping), 1, 2, ...
wrap-asp Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-jste Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-php Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-script-literals Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-sections Boolean y/n, yes/no, t/f, true/false, 1/0
write-back Boolean y/n, yes/no, t/f, true/false, 1/0
Defaults
To see which configuration values are currently in effect (these will be the defaults if you are not using a configuration file), run:
tidy -show-config
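To confirm that a configuration file has the intended effect, the -show-config output can be filtered for the settings of interest. This is a rough sketch: the function name show_settings is hypothetical, it assumes that the setting name is the first column of each output row, and it assumes that Tidy reads its arguments in order, so -config is placed before -show-config.

```shell
# Hypothetical helper: print the effective value of selected settings.
# Usage: show_settings CONFIG_FILE NAME...
show_settings() {
    cfg=$1
    shift
    for name in "$@"; do
        # -config comes first so the file is loaded before the settings
        # are listed (assumption: arguments are processed in order).
        # awk keeps only the row whose first column is exactly the name.
        tidy -config "$cfg" -show-config | awk -v n="$name" '$1 == n'
    done
}
```

For example, show_settings tidy.rc wrap indent would print the rows for wrap and indent as Tidy sees them after reading tidy.rc.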