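While learning Python today I came across a fun little tool called tidy. Installation is simple: yaourt -S tidyhtml. After that it can be used directly on the command line: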

tidy file

Python code:

$ cat chapters-15-tidy.py

from subprocess import Popen, PIPE

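# Read the messy HTML that tidy should repair
text = open('./messy.html').read()
tidy = Popen('tidy', stdin=PIPE, stdout=PIPE, stderr=PIPE, text=True)

# Send the HTML to tidy on stdin; collect the repaired markup and the report
output, report = tidy.communicate(text)

print(output)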
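
The same pipe-through-tidy idea can also be wrapped in a small helper that saves the repaired markup to a file, roughly what `tidy messy.html > fixed.html` does in the shell. This is only a sketch: the names `tidy_html` and `tidy_to_file` are made up for illustration, and it assumes Python 3.7+ and a `tidy` executable on the PATH.

```python
import subprocess


def tidy_html(text, command=("tidy",)):
    """Pipe `text` through `command`; return (repaired_markup, report)."""
    # No check=True here: tidy is documented to exit non-zero when it
    # only found warnings, which is the normal case for messy input.
    result = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        text=True,
    )
    # tidy writes the repaired markup to stdout and its diagnostics to stderr
    return result.stdout, result.stderr


def tidy_to_file(src_path, dst_path, command=("tidy",)):
    """Rough Python equivalent of `tidy src > dst`; returns the report."""
    with open(src_path, encoding="utf-8") as src:
        markup, report = tidy_html(src.read(), command)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(markup)
    return report
```

Calling `tidy_to_file('messy.html', 'fixed.html')` would then leave the repaired page in fixed.html and hand back the list of warnings as a string.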

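tidy then prints the repaired file automatically. You can likewise pair it with **>** to redirect the output to a file. By default the command also reports what it fixed. For example, the original file: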

$ cat messy.html

Pet Shop



There is no way at all we can accetp returned

Dead pets

Our pets may tend to rest at times. but rarely die within the
warranty period.


We have just received a really nice parrot.

It’s really nice

The Norwegian Blue

Plumage and

pining behavior

More information

Features:

* Beautiful plumage


$ tidy messy.html
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: inserting implicit 
line 1 column 1 - Warning: missing  before 
line 2 column 1 - Warning: missing 
line 2 column 15 - Warning: discarding unexpected 
line 4 column 19 - Warning: replacing unexpected b by </b>
line 4 column 29 - Warning: inserting implicit <i>
line 7 column 5 - Warning: missing </i> before 
line 9 column 4 - Warning: inserting implicit <i>
line 12 column 1 - Warning: missing </i> before 
line 9 column 4 - Warning: missing </i> before 
line 12 column 8 - Warning: inserting implicit <i>
line 12 column 17 - Warning: discarding unexpected </i>
line 14 column 26 - Warning: missing  before 
line 16 column 4 - Warning: inserting implicit <b>
line 18 column 5 - Warning: <hr> isn't allowed in <h3> elements
line 18 column 1 - Info: <h3> previously mentioned
line 20 column 17 - Warning: <hr> isn't allowed in <h4> elements
line 20 column 1 - Info: <h4> previously mentioned
line 21 column 1 - Warning:  is probably intended as 
line 24 column 1 - Warning: discarding unexpected 
line 25 column 1 - Warning: <li> isn't allowed in  elements
line 1 column 1 - Info:  previously mentioned
line 25 column 1 - Warning: inserting implicit 
line 25 column 1 - Warning: missing 
line 1 column 1 - Warning: inserting missing 'title' element
line 12 column 1 - Warning: trimming empty <i>
Info: Document content looks like HTML 3.2
24 warnings, 0 errors were found!

    Pet Shop
    There is **no _way_** _at all_ we can accetp
    returned parrots.
    _Dead pets_
    _Our pets may tend to rest at times. but rarely die within the
    warranty period._ 
    We have just received **a really nice parrot.**
    **It's really nice**
    * * *
    ### The Norwegian Blue
    #### Plumage and
    * * *
    #### pining behavior
    [More information](#norwegian-blue)
    *   Beautiful plumage
    To learn more about HTML Tidy see http://tidy.sourceforge.net
    Please fill bug reports and queries using the "tracker" on the Tidy web site.
    Additionally, questions can be sent to html-tidy@w3.org
    HTML and CSS specifications are available from http://www.w3.org/
    Lobby your company to join W3C, see http://www.w3.org/Consortium
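
The report lines above follow a regular "line N column M - Level: message" shape, so they are easy to turn into structured records, for example to count warnings or jump to the offending position. The sketch below is only an illustration: `parse_report` is a hypothetical helper, and the pattern is inferred from the sample output shown here rather than from any official description of tidy's report format.

```python
import re

# Matches report lines such as
#   "line 1 column 1 - Warning: inserting missing 'title' element"
# The shape is an assumption based on the sample output in this post.
REPORT_LINE = re.compile(
    r"^line (?P<line>\d+) column (?P<column>\d+) - (?P<level>\w+):\s*(?P<message>.*)$"
)


def parse_report(report):
    """Return one dict per positional report line; other lines are skipped."""
    entries = []
    for raw in report.splitlines():
        match = REPORT_LINE.match(raw.strip())
        if match:
            entries.append({
                "line": int(match["line"]),
                "column": int(match["column"]),
                "level": match["level"],
                "message": match["message"],
            })
    return entries
```

Summary lines such as "24 warnings, 0 errors were found!" do not match the pattern and are simply ignored, so the result only contains entries that point at a position in the file.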