Open main menu

Changes

11,322 bytes added ,  14:36, 7 May 2009
New page: See Also Using Tidy with PHP == Syntax Checking and Cleanup with Tidy == Tidy is a command-line tool and library (with GUI interfaces available) that is also integrated into Quanta+ e...
See Also [[Using Tidy with PHP]]

== Syntax Checking and Cleanup with Tidy ==
Tidy is a command-line tool and library (with GUI interfaces available) that is also integrated into Quanta+ editor. Using tidy, you can clean and correct the syntax of XML, XHTML, and HTML documents -- especially those found in the wild or authored by other applications such as [[Open Office]] or Microsoft Office.

Content authors should be familiar with using Tidy in one or more methods. Tidy can be the integrated option provided with Quanta 'Tools > HTML Tidy syntax checking' or Ctrl + Alt + T. It can be command-line driven by the configuration file. For Microsoft Windows, you can use HTML Tidy kit (http://www.chami.com/html-kit/)

This quick-ref is helpful to lookup tidy configuration options http://tidy.sourceforge.net/docs/quickref.html

In a team environment, you can publish/share a common tidy.rc so that everyone is using the same configuration. This combination of command-line options is pretty good:
<source lang="bash">
tidy -m -i -wrap 120 -clean -bare -asxhtml myfile.html
</source>
;-m
: causes the file to be modified in place (if you don't want this, then omit the -m and specify an output file with "-o myfile-Tidied.html"
;-i
: creates an indented structure
; -wrap 120
: Lines will get wrapped at 120 characters
; -clean
: font tags will be replaced with styles
; -bare
: smart quotes and em dashes will get stripped
; -asxhtml
: the output will be XHTML compliant

To really clean up content coming from Microsoft Word, try the following
<source lang="bash">
tidy -i -wrap 120 -bare -asxhtml -drop-empty-paras -drop-font-tags -drop-proprietary-attributes -word-2000 myfile.html
</source>

== Help ==
The command tidy -h will print out the version, and usage hints below:

<pre>
tidy -h
tidy [option...] [file...] [option...] [file...]
Utility to clean up and pretty print HTML/XHTML/XML
see http://tidy.sourceforge.net/

Options for HTML Tidy for Linux/x86 released on 12 April 2005:

File manipulation
-----------------
-output <file>, -o write output to the specified <file>
<file>
-config <file> set configuration options from the specified <file>
-file <file>, -f write errors to the specified <file>
<file>
-modify, -m modify the original input files

Processing directives
---------------------
-indent, -i indent element content
-wrap <column>, -w wrap text at the specified <column> (default is 68)
<column>
-upper, -u force tags to upper case (default is lower case)
-clean, -c replace FONT, NOBR and CENTER tags by CSS
-bare, -b strip out smart quotes and em dashes, etc.
-numeric, -n output numeric rather than named entities
-errors, -e only show errors
-quiet, -q suppress nonessential output
-omit omit optional end tags
-xml specify the input is well formed XML
-asxml, -asxhtml convert HTML to well formed XHTML
-ashtml force XHTML to well formed HTML
-access <level> do additional accessibility checks (<level> =1, 2, 3)

Character encodings
-------------------
-raw output values above 127 without conversion to entities
-ascii use ISO-8859-1 for input, US-ASCII for output
-latin0 use ISO-8859-15 for input, US-ASCII for output
-latin1 use ISO-8859-1 for both input and output
-iso2022 use ISO-2022 for both input and output
-utf8 use UTF-8 for both input and output
-mac use MacRoman for input, US-ASCII for output
-win1252 use Windows-1252 for input, US-ASCII for output
-ibm858 use IBM-858 (CP850+Euro) for input, US-ASCII for output
-utf16le use UTF-16LE for both input and output
-utf16be use UTF-16BE for both input and output
-utf16 use UTF-16 for both input and output
-big5 use Big5 for both input and output
-shiftjis use Shift_JIS for both input and output
-language <lang> set the two-letter language code <lang> (for future use)

Miscellaneous
-------------
-version, -v show the version of Tidy
-help, -h, -? list the command line options
-xml-help list the command line options in XML format
-help-config list all configuration options
-xml-config list all configuration options in XML format
-show-config list the current configuration settings

Use --blah blarg for any configuration option "blah" with argument "blarg"

Input/Output default to stdin/stdout respectively
Single letter options apart from -f may be combined
as in: tidy -f errs.txt -imu foo.html
For further info on HTML see http://www.w3.org/MarkUp
</pre>


== More More Info==
See the project page at http://www.w3.org/MarkUp

==Configuring Tidy==
The command tidy -help-config will output a list of all the settings that you can put into a tidy resource configuration file (e.g. ~/.tidy.rc)

<pre>
$ tidy -help-config

HTML Tidy Configuration Settings

Within a file, use the form:

wrap: 72
indent: no

When specified on the command line, use the form:

--wrap 72 --indent no

Name Type Allowable values
==================================== ========================================
accessibility-check enum 0 (Tidy Classic), 1 (Priority 1 Checks),
2 (Priority 2 Checks), 3 (Priority 3
Checks)
add-xml-decl Boolean y/n, yes/no, t/f, true/false, 1/0
add-xml-space Boolean y/n, yes/no, t/f, true/false, 1/0
alt-text String -
ascii-chars Boolean y/n, yes/no, t/f, true/false, 1/0
assume-xml-procins Boolean y/n, yes/no, t/f, true/false, 1/0
bare Boolean y/n, yes/no, t/f, true/false, 1/0
break-before-br Boolean y/n, yes/no, t/f, true/false, 1/0
char-encoding Encoding raw, ascii, latin0, latin1, utf8,
iso2022, mac, win1252, ibm858, utf16le,
utf16be, utf16, big5, shiftjis
clean Boolean y/n, yes/no, t/f, true/false, 1/0
css-prefix String -
doctype DocType omit, auto, strict, transitional, user
drop-empty-paras Boolean y/n, yes/no, t/f, true/false, 1/0
drop-font-tags Boolean y/n, yes/no, t/f, true/false, 1/0
drop-proprietary-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
enclose-block-text Boolean y/n, yes/no, t/f, true/false, 1/0
enclose-text Boolean y/n, yes/no, t/f, true/false, 1/0
error-file String -
escape-cdata Boolean y/n, yes/no, t/f, true/false, 1/0
fix-backslash Boolean y/n, yes/no, t/f, true/false, 1/0
fix-bad-comments Boolean y/n, yes/no, t/f, true/false, 1/0
fix-uri Boolean y/n, yes/no, t/f, true/false, 1/0
force-output Boolean y/n, yes/no, t/f, true/false, 1/0
gnu-emacs Boolean y/n, yes/no, t/f, true/false, 1/0
gnu-emacs-file String -
hide-comments Boolean y/n, yes/no, t/f, true/false, 1/0
hide-endtags Boolean y/n, yes/no, t/f, true/false, 1/0
indent AutoBool auto, y/n, yes/no, t/f, true/false, 1/0
indent-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
indent-cdata Boolean y/n, yes/no, t/f, true/false, 1/0
indent-spaces Integer 0, 1, 2, ...
input-encoding Encoding raw, ascii, latin0, latin1, utf8,
iso2022, mac, win1252, ibm858, utf16le,
utf16be, utf16, big5, shiftjis
input-xml Boolean y/n, yes/no, t/f, true/false, 1/0
join-classes Boolean y/n, yes/no, t/f, true/false, 1/0
join-styles Boolean y/n, yes/no, t/f, true/false, 1/0
keep-time Boolean y/n, yes/no, t/f, true/false, 1/0
language String -
literal-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
logical-emphasis Boolean y/n, yes/no, t/f, true/false, 1/0
lower-literals Boolean y/n, yes/no, t/f, true/false, 1/0
markup Boolean y/n, yes/no, t/f, true/false, 1/0
merge-divs AutoBool auto, y/n, yes/no, t/f, true/false, 1/0
ncr Boolean y/n, yes/no, t/f, true/false, 1/0
new-blocklevel-tags Tag names tagX, tagY, ...
new-empty-tags Tag names tagX, tagY, ...
new-inline-tags Tag names tagX, tagY, ...
new-pre-tags Tag names tagX, tagY, ...
newline enum LF, CRLF, CR
numeric-entities Boolean y/n, yes/no, t/f, true/false, 1/0
output-bom AutoBool auto, y/n, yes/no, t/f, true/false, 1/0
output-encoding Encoding raw, ascii, latin0, latin1, utf8,
iso2022, mac, win1252, ibm858, utf16le,
utf16be, utf16, big5, shiftjis
output-file String -
output-html Boolean y/n, yes/no, t/f, true/false, 1/0
output-xhtml Boolean y/n, yes/no, t/f, true/false, 1/0
output-xml Boolean y/n, yes/no, t/f, true/false, 1/0
punctuation-wrap Boolean y/n, yes/no, t/f, true/false, 1/0
quiet Boolean y/n, yes/no, t/f, true/false, 1/0
quote-ampersand Boolean y/n, yes/no, t/f, true/false, 1/0
quote-marks Boolean y/n, yes/no, t/f, true/false, 1/0
quote-nbsp Boolean y/n, yes/no, t/f, true/false, 1/0
repeated-attributes enum keep-first, keep-last
replace-color Boolean y/n, yes/no, t/f, true/false, 1/0
show-body-only Boolean y/n, yes/no, t/f, true/false, 1/0
show-errors Integer 0, 1, 2, ...
show-warnings Boolean y/n, yes/no, t/f, true/false, 1/0
slide-style String -
split Boolean y/n, yes/no, t/f, true/false, 1/0
tab-size Integer 0, 1, 2, ...
tidy-mark Boolean y/n, yes/no, t/f, true/false, 1/0
uppercase-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
uppercase-tags Boolean y/n, yes/no, t/f, true/false, 1/0
vertical-space Boolean y/n, yes/no, t/f, true/false, 1/0
word-2000 Boolean y/n, yes/no, t/f, true/false, 1/0
wrap Integer 0 (no wrapping), 1, 2, ...
wrap-asp Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-attributes Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-jste Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-php Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-script-literals Boolean y/n, yes/no, t/f, true/false, 1/0
wrap-sections Boolean y/n, yes/no, t/f, true/false, 1/0
write-back Boolean y/n, yes/no, t/f, true/false, 1/0
</pre>


==Defaults==
To see what configuration values are presently set (could well be the defaults if you're not using a configuration file)
<source lang="bash">
tidy -show-config
</source>

[[Category:Best Practices]]
[[Category:Tools]]
4,558

edits