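I was given a task: take a list of 25k websites and remove the ones that are shut down or unresponsive. I figured the simplest approach would be:

    import contextlib
    from urllib.request import Request, urlopen

    new_list = set()  # final URLs of the sites that responded
    for website in websites:
        try:
            req = Request(website, headers={"User-Agent": "Mozilla/5.0 (Linux i686)"})
            with contextlib.closing(urlopen(req)) as response:
                new_list.add(response.geturl())
        except Exception:
            pass  # no usable response: leave the site out of new_list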
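With 25k sites, a single hung connection can stall the whole loop, because `urlopen` does not apply a timeout unless one is passed. The following is a minimal sketch of the same check with a timeout and narrower exception handling. The names `filter_live`, `timeout`, and `opener` are introduced here for illustration and are not part of the original code, and the 10-second default is an arbitrary assumption.

```python
import contextlib
import http.client
from urllib.request import Request, urlopen

HEADERS = {"User-Agent": "Mozilla/5.0 (Linux i686)"}


def filter_live(websites, timeout=10, opener=urlopen):
    """Return the set of final URLs for sites that answered within `timeout` seconds.

    `opener` is any callable with the signature of urllib.request.urlopen;
    it is a parameter only so the function can be exercised without network access.
    """
    live = set()
    for website in websites:
        try:
            # Request() raises ValueError for a URL without a scheme.
            req = Request(website, headers=HEADERS)
            with contextlib.closing(opener(req, timeout=timeout)) as response:
                live.add(response.geturl())
        except (OSError, ValueError, http.client.HTTPException):
            # OSError covers URLError, HTTPError (4xx/5xx) and socket timeouts;
            # HTTPException covers malformed HTTP responses.
            continue
    return live
```

Calling `filter_live(websites)` returns the set of sites to keep. Note that this treats any HTTP error status (for example 403 or 500) as "not live", which may or may not be the desired rule.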
I tried setting up an anonymous proxy (CentOS + Squid), but some sites still show my external IP rather than the proxy's IP.
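One thing worth ruling out on the client side is whether the Python requests go through the proxy at all. Below is a minimal sketch that routes `urllib` traffic through a proxy explicitly. The address `http://127.0.0.1:3128` is a hypothetical placeholder (3128 is Squid's default port), and `make_proxy_opener` is a helper name introduced here, not something from the original setup.

```python
from urllib.request import ProxyHandler, build_opener

# Hypothetical proxy address; replace with the real Squid host and port.
PROXY_URL = "http://127.0.0.1:3128"


def make_proxy_opener(proxy_url=PROXY_URL):
    """Build an opener that sends both HTTP and HTTPS requests via `proxy_url`."""
    handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our own ProxyHandler replaces the default one that
    # build_opener would otherwise add.
    return build_opener(handler)
```

The returned object's `open(req, timeout=...)` method can be used in place of `urlopen(req)`. This only shows how the client is pointed at the proxy; whether a site still sees the original IP also depends on the headers the proxy itself adds.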
The Squid configuration is as follows:
server_persistent_connections off
forwarded_for off
request_header_access From deny all
request_header_access Referer deny all
request_header_access Proxy-Connection deny all
request_header_access Server deny all
request_header_acces