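Today a friend told me he had finally mailed me the phone, but he sent it by USPS First Class. Good grief, now it all comes down to how lucky I am. So I wrote a script to scrape the tracking progress. It is very simple. I ran into a few problems along the way. At first I looped over the two lists like this: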

for info,date in infos, dates:
    print "%s @ %s" % (info.text, ' '.join(date.text.split()))

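This fails with ValueError: too many values to unpack. The expression infos, dates is a tuple of two lists, so the loop runs over that tuple and tries to unpack each whole list into info and date. Then it occurred to me that I could simply use an index to pick what to print! (Silly me, it took a while to think of that.) So I changed it to this: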

for i in range(len(track_date)):
    print "%s @ %s" % (track_info[i].text,track_date[i])

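The important part here is the .text in track_info[i].text: it extracts only the text content. Without it the HTML tags get printed along with the text, which is not very friendly to read.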
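The first loop can also be fixed without indexes. A minimal sketch, assuming plain strings in place of the BeautifulSoup tags: the built-in zip pairs the two lists item by item, so each iteration gets exactly two values to unpack. The helper name pair_up is made up for this illustration, and the sample values are copied from the script's output.

```python
def pair_up(infos, dates):
    """Return 'info @ date' strings, pairing the two lists by position."""
    # zip yields (info, date) tuples and stops at the shorter list,
    # so a length mismatch cannot raise an IndexError.
    return ["%s @ %s" % (info, date) for info, date in zip(infos, dates)]


infos = ["Dispatched to Sort Facility", "Acceptance"]
dates = ["November 20, 2013 , 6:14 pm", "November 20, 2013 , 4:07 pm"]

for line in pair_up(infos, dates):
    print(line)
```

With real tags the loop body would still read info.text, as in the indexed version.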

#!/usr/bin/python2
# auth: codewalker
# mail: 001@codewalker.me
# date: 2013-11-21

from bs4 import BeautifulSoup
import urllib2
import sys

def main():
    # Expect exactly one argument: the tracking number.
    if len(sys.argv) != 2:
        print "Need a track number"
        sys.exit(1)
    track_number = sys.argv[1]
    track_date = []
    # Send a browser-like User-Agent with the request.
    headers = {'User-Agent': 'Mozilla/5.0'}
    url = "https://tools.usps.com/go/TrackConfirmAction!input.action?tRef=qt&tLc=1&tLabels=" + track_number
    req = urllib2.Request(url, None, headers)
    htmltext = urllib2.urlopen(req).read()
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(htmltext, "html.parser")
    track_info = soup.find_all("span", {"class": "info-text"})
    dates = soup.find_all("td", {"class": "date-time"})
    # Collapse the whitespace inside each date cell into single spaces.
    for date in dates:
        d = ' '.join(date.p.text.split())
        track_date.append(d)
    # Print each status next to its date; .text drops the HTML tags.
    for i in range(len(track_date)):
        print "%s @ %s" % (track_info[i].text, track_date[i])

if __name__ == "__main__":
    main()

#INPUT:
# ./track.package.py LC945862504US
#
#

#OUTPUT:
# Processed at USPS Origin Sort Facility @ November 21, 2013 , 12:04 am
# Dispatched to Sort Facility @ November 20, 2013 , 6:14 pm
# Acceptance @ November 20, 2013 , 4:07 pm
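The stray space before the comma in the dates above comes from the ' '.join(date.p.text.split()) step. A small sketch of that idiom on its own: split() with no arguments cuts on any run of whitespace, and join puts the pieces back with single spaces. The raw string is an assumed stand-in for a date cell, not the actual USPS markup, and normalize_ws is a name made up for this example.

```python
def normalize_ws(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())


# Assumed shape of a date cell: the date, a comma, and the time on separate lines.
raw = "\n    November 21, 2013\n    ,\n    12:04 am\n"
print(normalize_ws(raw))
```

Because the comma sits on its own line in this assumed input, it becomes a separate piece and ends up with a space on each side.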