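I have a task: write a small tool that prints the HTTP headers of any website. I'm new to Python, and I'd like to know whether there is anything besides the coding itself that I should take into account while developing the tool. I have a rough draft of the code, shown below. Any suggestions from Python programmers?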
#!/usr/bin/python
import sys, urllib

if len(sys.argv) == 2:
    # Python 2: urllib.urlopen follows redirects and returns the final response
    website = urllib.urlopen(sys.argv[1])
    if website.code != 200:
        print "Something went wrong here"
        print website.code
        exit(0)
    print 'Printing the headers'
    print '-----------------------------------------'
    for header, value in website.headers.items():
        print header + ' : ' + value

Posted on 2014-04-25 15:34:32
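This looks like a fairly simple script (though the question seems better suited to Stack Overflow). A few remarks. First, curl -I is a useful command-line tool to compare your output against. Second, even when you don't get a 200 status, the response often still carries content or headers you may want to display. For example: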
$ curl -I http://security.stackexchange.com/asdf
HTTP/1.1 404 Not Found
Cache-Control: private
Content-Length: 24068
Content-Type: text/html; charset=utf-8
X-Frame-Options: SAMEORIGIN
Set-Cookie: prov=678b5b9c-0130-4398-9834-673475961dc6; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
Date: Fri, 25 Apr 2014 07:24:00 GMT

Also note that urllib follows redirects automatically. For example, with curl you would see:
$ curl -I http://www.security.stackexchange.com
HTTP/1.1 301 Moved Permanently
Content-Length: 157
Content-Type: text/html; charset=UTF-8
Location: http://security.stackexchange.com/
Date: Fri, 25 Apr 2014 07:26:52 GMT

whereas your tool only shows the headers of the final response, after the redirect has been followed:
$ python user3567119.py http://www.security.stackexchange.com
Printing the headers
-----------------------------------------
content-length : 68639
set-cookie : prov=9bf4f3d4-e3ae-4161-8e34-9aaa83f0aa4b; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
expires : Fri, 25 Apr 2014 07:29:32 GMT
vary : *
last-modified : Fri, 25 Apr 2014 07:28:32 GMT
connection : close
cache-control : public, no-cache="Set-Cookie", max-age=60
date : Fri, 25 Apr 2014 07:28:31 GMT
x-frame-options : SAMEORIGIN
content-type : text/html; charset=utf-8

Third, if you keep working with HTTP requests in Python, I strongly recommend the requests library. With requests, you can see the 301 like this:
In [1]: import requests
In [2]: r=requests.get('http://www.security.stackexchange.com')
In [3]: r
Out[3]: <Response [200]>
In [4]: r.history
Out[4]: (<Response [301]>,)

It's also worth trying a few HTTP requests over plain telnet. For example, run telnet security.stackexchange.com 80 and then quickly type:
GET / HTTP/1.1
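Host: security.stackexchange.com

followed by a blank line. You will then see the actual HTTP response as it travels over the wire (rather than a version reconstructed after urllib has processed it):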
HTTP/1.1 200 OK
Cache-Control: public, no-cache="Set-Cookie", max-age=60
Content-Type: text/html; charset=utf-8
Expires: Fri, 25 Apr 2014 07:38:37 GMT
Last-Modified: Fri, 25 Apr 2014 07:37:37 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Set-Cookie: prov=a75de1f2-678b-4a9d-bbfd-39e933e60237; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
Date: Fri, 25 Apr 2014 07:37:36 GMT
Content-Length: 68849
<!DOCTYPE html>

https://stackoverflow.com/questions/23292024
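As a rough illustration of the first two points in the answer (show the headers whatever the status code is, and don't silently follow redirects), here is a minimal Python 3 sketch built on the standard-library http.client module, which sends a single request and does not follow redirects. It is not the asker's code: it assumes Python 3 rather than the Python 2 used in the question, and the names fetch_headers and format_response are hypothetical helpers made up for this example.

```python
import http.client
from urllib.parse import urlsplit


def format_response(status, reason, headers):
    """Render a status line and (name, value) header pairs as text."""
    lines = ["HTTP %s %s" % (status, reason)]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    return "\n".join(lines)


def fetch_headers(url, timeout=10):
    """Send one HEAD request and return (status, reason, headers).

    http.client does not follow redirects, so a 301 is reported as a 301,
    and a 404 is returned like any other response instead of being hidden.
    Assumes the URL includes a scheme such as http:// or https://.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.getheaders()
    finally:
        conn.close()
```

Calling print(format_response(*fetch_headers("http://www.security.stackexchange.com"))) should then print the redirect response itself, much like curl -I, rather than the headers of the page it redirects to.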
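The telnet experiment described in the answer can also be approximated from Python with a plain TCP socket, which likewise shows the response bytes before any HTTP library has interpreted them. The sketch below assumes Python 3; build_request, split_response and raw_get are hypothetical names, and the Connection: close header is an addition that is not part of the telnet example, included only so that the server closes the connection and the read loop terminates.

```python
import socket


def build_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Connection: close",  # assumption: ask the server to close when done
        "",                   # the blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split raw response bytes into (status_line, header_lines, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    return lines[0], lines[1:], body


def raw_get(host, port=80, path="/", timeout=10):
    """Send the request over a TCP socket and return every byte received."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For instance, status, headers, body = split_response(raw_get("security.stackexchange.com")) separates the status line and header lines from the body, without urllib or requests in between.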