While learning Python today I came across a fun tool called tidy. Installing it is easy: `yaourt -S tidyhtml`
It can be used directly on the command line: `tidy file`
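Python code (`chapters-15-tidy.py`):

```python
#!/usr/bin/env python3
# chapters-15-tidy.py: pipe messy.html through the tidy command-line tool
import sys
from subprocess import Popen, PIPE

with open('./messy.html', 'rb') as f:  # raw bytes are passed to tidy unchanged
    text = f.read()

tidy = Popen(['tidy'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
# communicate() writes text to stdin, closes it, and reads stdout and stderr
# together; writing, closing, then reading only stdout can hang once tidy
# fills the unread stderr pipe with warnings.
fixed, report = tidy.communicate(text)  # report holds the warnings (unused)
sys.stdout.buffer.write(fixed)
```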
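The script above drives tidy through `Popen`. A minimal Python 3 sketch of the same pipeline with `subprocess.run` is below; it keeps tidy's repaired HTML (stdout) and its warnings (stderr) apart and also returns the exit status, which tidy sets to 0 when nothing was wrong, 1 for warnings only and 2 for errors. `run_tidy` and `TidyResult` are hypothetical names introduced for illustration, and the default command, a plain `tidy` on PATH, is an assumption; the `cmd` parameter exists so options can be added or a stand-in program used.

```python
import subprocess
from typing import NamedTuple, Sequence


class TidyResult(NamedTuple):
    html: str      # repaired document, read from tidy's stdout
    warnings: str  # diagnostics, read from tidy's stderr
    status: int    # tidy's exit code: 0 clean, 1 warnings only, 2 errors


def run_tidy(text: str, cmd: Sequence[str] = ("tidy",)) -> TidyResult:
    """Feed `text` to tidy on stdin and collect both output streams.

    `cmd` defaults to a plain `tidy` found on PATH (an assumption); extra
    options can be appended, e.g. ("tidy", "-q").
    """
    # subprocess.run writes the input and drains stdout and stderr together,
    # so a long warning list cannot stall the child process.
    proc = subprocess.run(list(cmd), input=text, capture_output=True, text=True)
    return TidyResult(proc.stdout, proc.stderr, proc.returncode)
```

For the file in this post, `run_tidy(open('messy.html').read()).status` should come out as 1, since tidy reports warnings but no errors. If tidy is not installed, `subprocess.run` raises `FileNotFoundError`.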
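tidy then prints the repaired document, which can be redirected to a file with `>` as usual. By default it also reports which problems it fixed; that report goes to standard error, so it still shows up in the terminal when standard output is redirected. For example, the original file: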
`$ cat messy.html`
Pet Shop
=========
Complaints
There is no way at all we can accetp returned
parrots.
Dead pets
Our pets may tend to rest at times. but rarely die within the
warranty period.
News
We have just received a really nice parrot.
It’s really nice
The Norwegian Blue
Plumage and
pining behavior
More information
Features:
* Beautiful plumage

After repair:
```
$ tidy messy.html
line 1 column 1 - Warning: missing declaration
line 1 column 1 - Warning: inserting implicit
line 1 column 1 - Warning: missing before
line 2 column 1 - Warning: missing before
line 2 column 15 - Warning: discarding unexpected
line 4 column 19 - Warning: replacing unexpected b by
line 4 column 29 - Warning: inserting implicit
line 7 column 5 - Warning: missing before
line 9 column 4 - Warning: inserting implicit
line 12 column 1 - Warning: missing before
line 9 column 4 - Warning: missing before
line 12 column 8 - Warning: inserting implicit
line 12 column 17 - Warning: discarding unexpected
line 14 column 26 - Warning: missing before
line 16 column 4 - Warning: inserting implicit
line 18 column 5 - Warning: <hr> isn't allowed in <h3> elements
line 18 column 1 - Info: <h3> previously mentioned
line 20 column 17 - Warning: <hr> isn't allowed in <h4> elements
line 20 column 1 - Info: <h4> previously mentioned
line 21 column 1 - Warning: is probably intended as
line 24 column 1 - Warning: discarding unexpected
line 25 column 1 - Warning: <li> isn't allowed in elements
line 1 column 1 - Info: previously mentioned
line 25 column 1 - Warning: inserting implicit
line 25 column 1 - Warning: missing
line 1 column 1 - Warning: inserting missing 'title' element
line 12 column 1 - Warning: trimming empty
Info: Document content looks like HTML 3.2
24 warnings, 0 errors were found!
```
After the warnings comes the repaired document itself:

Pet Shop
========
Complaints
----------
There is **no _way_** _at all_ we can accetp
returned parrots.
_Dead pets_
===========
_Our pets may tend to rest at times. but rarely die within the
warranty period._
_News_
------
We have just received **a really nice parrot.**
**It's really nice**
* * *
### The Norwegian Blue
#### Plumage and
* * *
#### pining behavior
[More information](#norwegian-blue)
Features:
* Beautiful plumage
```
To learn more about HTML Tidy see http://tidy.sourceforge.net
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to html-tidy@w3.org
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium
```
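The warning lines in tidy's report follow a regular `line N column M - Kind: message` shape, so the report is easy to post-process. Below is a minimal sketch that assumes exactly the format shown in the output above; the summary wording in particular may differ between tidy versions, and `Error` is included on the assumption that errors use the same shape. `Diagnostic`, `parse_diagnostics` and `parse_summary` are hypothetical helpers, not part of tidy.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumed line shapes, copied from the tidy output above:
#   "line <N> column <M> - <Kind>: <message>"
#   "<W> warnings, <E> errors were found!"
DIAG_RE = re.compile(r"line (\d+) column (\d+) - (Warning|Info|Error): ?(.*)")
SUMMARY_RE = re.compile(r"(\d+) warnings?, (\d+) errors? were found!")


class Diagnostic(NamedTuple):
    line: int
    column: int
    kind: str
    message: str


def parse_diagnostics(report: str) -> List[Diagnostic]:
    """Return every positioned diagnostic in tidy's stderr text, in order."""
    found = []
    for raw in report.splitlines():
        m = DIAG_RE.fullmatch(raw.strip())
        if m:  # lines without a position (e.g. the HTML 3.2 note) are skipped
            found.append(Diagnostic(int(m[1]), int(m[2]), m[3], m[4]))
    return found


def parse_summary(report: str) -> Optional[Tuple[int, int]]:
    """Return (warnings, errors) from the summary line, or None if absent."""
    for raw in report.splitlines():
        m = SUMMARY_RE.match(raw.strip())
        if m:
            return int(m[1]), int(m[2])
    return None
```

Applied to the text tidy writes to standard error for the run above, `parse_summary` should return `(24, 0)`, and `parse_diagnostics` lets you, for example, group the warnings by source line.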