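urllib2 and Network Requests
urllib2 is the core module for making HTTP requests in the Python 2 standard library. It supports GET and POST requests, custom request headers, cookies, proxies, and more. In Python 3, urllib2 was split into urllib.request and urllib.error.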
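Code that has to run on both Python versions commonly imports the names it needs behind a try/except fallback. The snippet below is a minimal sketch of that convention; the fallback pattern is a widespread idiom, not something urllib2 itself provides.

```python
# Import the same names on Python 2 and Python 3.
try:
    # Python 2: everything lives in urllib2.
    from urllib2 import urlopen, Request, HTTPError, URLError
except ImportError:
    # Python 3: functions moved to urllib.request, exceptions to urllib.error.
    from urllib.request import urlopen, Request
    from urllib.error import HTTPError, URLError

# The class hierarchy is the same on both versions.
same_hierarchy = issubclass(HTTPError, URLError)
```

After this import, urlopen, Request, HTTPError, and URLError can be used without the urllib2. prefix on either interpreter.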
Basic Requests
GET request:
import urllib2
response = urllib2.urlopen("http://example.com")
print response.read()     # HTML content
print response.getcode()  # 200
print response.geturl()   # final URL (may differ from the request URL after redirects)
print response.info()     # response headers
urlopen returns a file-like object that supports read(), readline(), readlines(), and close().
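The file-like interface can be tried without a network connection, because urlopen also accepts file: URLs. The sketch below assumes a temporary local file stands in for a web page; the file contents and variable names are illustrative only.

```python
import os
import tempfile
from contextlib import closing

try:  # Python 2
    from urllib import pathname2url
    from urllib2 import urlopen
except ImportError:  # Python 3
    from urllib.request import pathname2url, urlopen

# Create a small local file so the example needs no network access.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"first line\nsecond line\nthird line\n")

url = "file:" + pathname2url(path)

# closing() calls response.close() even if reading raises.
with closing(urlopen(url)) as response:
    first = response.readline()   # one line, including the newline
    rest = response.readlines()   # the remaining lines as a list

os.remove(path)
```

The same readline() and readlines() calls work on the object returned for an http: URL.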
POST request:
import urllib
import urllib2
data = urllib.urlencode({
    "username": "alice",
    "password": "secret"
})
# Passing data makes urllib2 send a POST instead of a GET.
request = urllib2.Request(
    "http://example.com/login",
    data=data,
    headers={"Content-Type": "application/x-www-form-urlencoded"}
)
response = urllib2.urlopen(request)
print response.read()
urllib.urlencode encodes a dictionary in application/x-www-form-urlencoded format.
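To see what that encoding looks like, the following sketch encodes a few illustrative fields (the field names and values are made up for the example). It uses a list of pairs instead of a dictionary so the output order is fixed.

```python
try:  # Python 2
    from urllib import urlencode
except ImportError:  # Python 3
    from urllib.parse import urlencode

# Spaces become "+", reserved characters are percent-encoded.
query = urlencode([("q", "hello world"), ("page", 2), ("redirect", "/a?b=c")])
# query == "q=hello+world&page=2&redirect=%2Fa%3Fb%3Dc"

# doseq=True expands a sequence value into one key=value pair per item.
multi = urlencode([("tag", ["python", "http"])], doseq=True)
# multi == "tag=python&tag=http"
```

The resulting string can be passed as the data argument of a POST request or appended to a URL after "?" for a GET request.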
Custom Request Headers
import urllib2
request = urllib2.Request("http://api.example.com/data")
request.add_header("User-Agent", "MyApp/1.0")
request.add_header("Accept", "application/json")
request.add_header("Authorization", "Bearer token123")
response = urllib2.urlopen(request)
print response.read()
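Headers can be checked before a request is sent. The sketch below builds a Request without opening it and reads the headers back; note that the Request object stores header names in capitalized form ("User-agent"), which matters when looking them up.

```python
try:  # Python 2
    from urllib2 import Request
except ImportError:  # Python 3
    from urllib.request import Request

request = Request("http://api.example.com/data",
                  headers={"Accept": "application/json"})
request.add_header("User-Agent", "MyApp/1.0")

# Names are stored via str.capitalize(), so look them up in that form.
has_agent = request.has_header("User-agent")
agent = request.get_header("User-agent")
all_headers = dict(request.header_items())
```

Header names are case-insensitive on the wire, so the server treats "User-agent" and "User-Agent" the same way.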
Imitating a browser:
import urllib2
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    # "Accept-Encoding: gzip, deflate" is left out: urllib2 does not decompress
    # responses, so read() would return the compressed bytes.
    # "Connection: keep-alive" is left out: urllib2 always sends "Connection: close".
}
request = urllib2.Request("http://example.com", headers=headers)
response = urllib2.urlopen(request)
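Handling Exceptions
import urllib2
try:
    response = urllib2.urlopen("http://example.com/notfound")
except urllib2.HTTPError as e:
    # The server answered, but with an error status.
    print "HTTP Error:", e.code  # e.g. 404
    print "URL:", e.url
    print "Message:", e.read()   # body of the error response
except urllib2.URLError as e:
    print "URL Error:", e.reason  # network-level failure (DNS, refused connection, etc.)
HTTPError is a subclass of URLError that represents HTTP error status codes, so its except clause must come before the one for URLError.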
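The effect of the subclass relationship can be checked offline. In the sketch below, describe_failure is a hypothetical helper (not part of urllib2), and the two exceptions are constructed by hand instead of being raised by a real request.

```python
import io

try:  # Python 2
    from urllib2 import HTTPError, URLError
except ImportError:  # Python 3
    from urllib.error import HTTPError, URLError

def describe_failure(exc):
    """Return a short label for an exception raised by urlopen."""
    # Test the subclass first; an HTTPError would also match URLError.
    if isinstance(exc, HTTPError):
        return "http %d" % exc.code
    if isinstance(exc, URLError):
        return "network: %s" % exc.reason
    return "other"

# Build the exceptions by hand so the example runs without a network.
not_found = HTTPError("http://example.com/notfound", 404, "Not Found",
                      {}, io.BytesIO(b""))
refused = URLError("connection refused")
```

Calling describe_failure(not_found) gives "http 404" and describe_failure(refused) gives "network: connection refused". Swapping the two isinstance checks would label every HTTP error as a network failure.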
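Handling Redirects
By default, urllib2 follows HTTP redirects (status codes 301, 302, 303, and 307) automatically:
import urllib2
response = urllib2.urlopen("http://bit.ly/xxx")  # short link
print response.geturl()  # URL after the final redirect
Disabling redirects:
import urllib2
class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Return the redirect response itself instead of following it.
        return fp
    # The base class maps 301, 303 and 307 to its own http_error_302,
    # so they have to be pointed at the override as well.
    http_error_301 = http_error_303 = http_error_307 = http_error_302
opener = urllib2.build_opener(NoRedirectHandler())
# install_opener affects every later urllib2.urlopen call in this process.
urllib2.install_opener(opener)
response = urllib2.urlopen("http://bit.ly/xxx")
print response.getcode()  # the 3xx status, e.g. 301 or 302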
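Both behaviours can be compared against a throwaway server on localhost. The sketch below is only an illustration: DemoHandler, the /old and /new paths, and the response body are invented for the example, and the openers are used directly instead of being installed globally.

```python
import threading

try:  # Python 2
    import urllib2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3
    import urllib.request as urllib2
    from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new."""
    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"arrived"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        return fp  # hand back the 3xx response unchanged
    http_error_301 = http_error_303 = http_error_307 = http_error_302

server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()
base_url = "http://127.0.0.1:%d" % server.server_address[1]

no_proxy = urllib2.ProxyHandler({})  # talk to localhost directly

try:
    # Default behaviour: the redirect is followed.
    followed = urllib2.build_opener(no_proxy).open(base_url + "/old", timeout=5)
    followed_url = followed.geturl()
    followed_body = followed.read()
    followed.close()

    # With NoRedirectHandler: the 302 response itself is returned.
    blocked = urllib2.build_opener(no_proxy, NoRedirectHandler()).open(
        base_url + "/old", timeout=5)
    blocked_code = blocked.getcode()
    blocked_location = blocked.info().get("Location")
    blocked.close()
finally:
    server.shutdown()
    server.server_close()
```

With the default opener, followed_url ends in /new; with NoRedirectHandler, blocked_code is 302 and the Location header is still available for the caller to inspect.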
Handling Cookies
import urllib
import urllib2
import cookielib
# Create a cookie container
cookie_jar = cookielib.CookieJar()
# Create an opener with cookie support
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
# Log in (the server sets a cookie)
login_data = urllib.urlencode({"user": "alice", "pass": "secret"})
opener.open("http://example.com/login", login_data)
# Later requests through the same opener send the cookie automatically
response = opener.open("http://example.com/profile")
print response.read()
# Inspect the stored cookies
for cookie in cookie_jar:
    print cookie.name, cookie.value
Proxy Settings
import urllib2
# Keys are the scheme of the request; values are the proxy used for that scheme.
proxy_handler = urllib2.ProxyHandler({
    "http": "http://proxy.example.com:8080",
    "https": "https://proxy.example.com:8080",
})
opener = urllib2.build_opener(proxy_handler)
response = opener.open("http://example.com")
print response.read()
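When no ProxyHandler is passed, urllib2 picks up proxy settings on its own, typically from environment variables such as http_proxy. The sketch below shows how to inspect those settings, override them, or switch proxies off; the proxy address is the placeholder used in this section, not a real server.

```python
try:  # Python 2
    import urllib2
    from urllib import getproxies
except ImportError:  # Python 3
    import urllib.request as urllib2
    from urllib.request import getproxies

# Proxy settings that would be used by default (may be an empty dict).
detected = getproxies()

# An explicit mapping replaces the detected settings for this opener.
explicit = urllib2.ProxyHandler({"http": "http://proxy.example.com:8080"})
via_proxy = urllib2.build_opener(explicit)

# An empty mapping disables proxies, including the detected ones.
disabled = urllib2.ProxyHandler({})
direct = urllib2.build_opener(disabled)
```

Requests made with direct.open(url) bypass any proxy, which is useful for talking to localhost or internal hosts.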
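Timeouts
import socket
import urllib2
# Give up after 10 seconds
try:
    response = urllib2.urlopen("http://slow.example.com", timeout=10)
except urllib2.URLError as e:
    # A timeout while connecting is wrapped in URLError.
    print "Timeout or error:", e
except socket.timeout as e:
    # A timeout while waiting for the response can be raised directly.
    print "Timed out:", e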
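When the timeout argument is omitted, urlopen falls back to the process-wide socket default. The sketch below shows how that default can be set and restored; it only touches the socket module, so it runs without any network access.

```python
import socket

previous = socket.getdefaulttimeout()  # None means "wait indefinitely"

# Applies to sockets created from now on, including those opened by urlopen.
socket.setdefaulttimeout(10)
current = socket.getdefaulttimeout()

# urlopen(url) calls made here without timeout= would use the 10-second default.

socket.setdefaulttimeout(previous)     # restore the earlier setting
restored = socket.getdefaulttimeout()
```

Because the default is global, passing timeout= on each urlopen call is usually the less surprising option in library code.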
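Practical Examples
Downloading a file:
import urllib2
def download_file(url, filename):
    # Reads the whole response into memory, so use this for small files only.
    response = urllib2.urlopen(url)
    try:
        with open(filename, "wb") as f:
            f.write(response.read())
    finally:
        response.close()
    print "Downloaded:", filename
download_file("http://example.com/data.csv", "data.csv")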
Streaming download (large files):
import urllib2
def download_large_file(url, filename, chunk_size=8192):
    response = urllib2.urlopen(url)
    with open(filename, "wb") as f:
        while True:
            # Read at most chunk_size bytes at a time to keep memory use flat.
            chunk = response.read(chunk_size)
            if not chunk:  # empty result: end of the response
                break
            f.write(chunk)
    print "Downloaded:", filename
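The chunked loop works on any file-like object, so it can be exercised without a server. In the sketch below, copy_in_chunks is a hypothetical helper and an in-memory buffer stands in for the response body; shutil.copyfileobj from the standard library performs the same kind of read/write loop.

```python
import io
import shutil

def copy_in_chunks(source, target, chunk_size=8192):
    """Copy source to target chunk by chunk; return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty result: nothing left to read
            break
        target.write(chunk)
        total += len(chunk)
    return total

payload = b"x" * 20000  # stands in for a response body

manual = io.BytesIO()
copied = copy_in_chunks(io.BytesIO(payload), manual)

# The standard-library equivalent of the loop above.
builtin = io.BytesIO()
shutil.copyfileobj(io.BytesIO(payload), builtin, 8192)
```

In a real download, source would be the object returned by urlopen and target a file opened in "wb" mode.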
Calling a REST API:
import urllib2
import json
def api_call(url, data=None, headers=None):
    # Copy the headers so the caller's dict is not modified.
    headers = dict(headers or {})
    if data is not None:
        # A request with a body is sent as POST.
        headers["Content-Type"] = "application/json"
        data = json.dumps(data)
    request = urllib2.Request(url, data=data, headers=headers)
    response = urllib2.urlopen(request)
    try:
        return json.loads(response.read())
    finally:
        response.close()
# GET
users = api_call("http://api.example.com/users")
# POST
new_user = api_call(
    "http://api.example.com/users",
    data={"name": "Alice", "email": "alice@example.com"}
)
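urllib2 chooses the HTTP method from the presence of a body: GET without data, POST with data. For PUT or DELETE, a common workaround in Python 2 is to override get_method on the Request instance. The sketch below builds requests without sending them; build_json_request and its parameters are hypothetical names for this example.

```python
import json

try:  # Python 2
    from urllib2 import Request
except ImportError:  # Python 3
    from urllib.request import Request

def build_json_request(url, payload=None, method=None):
    """Build (but do not send) a JSON request."""
    body = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        body = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = Request(url, data=body, headers=headers)
    if method is not None:
        # Python 2's Request has no method argument, so the instance's
        # get_method is replaced instead.
        request.get_method = lambda: method
    return request

get_req = build_json_request("http://api.example.com/users")
post_req = build_json_request("http://api.example.com/users", {"name": "Alice"})
put_req = build_json_request("http://api.example.com/users/1",
                             {"name": "Alice"}, method="PUT")
```

Here get_req.get_method() returns "GET", post_req.get_method() returns "POST", and put_req.get_method() returns "PUT". Any of the three can then be passed to urlopen.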
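Choosing Between urllib2 and Third-Party Libraries
| Need | Recommendation |
|---|---|
| Simple requests | urllib2 (standard library) |
| Complex HTTP | requests (third-party) |
| Asynchronous requests | aiohttp (Python 3 only; urllib3 is synchronous and does not cover this case) |
The requests library (installed with pip install requests) has a more concise API: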
# urllib2 version
import json
import urllib2
request = urllib2.Request("http://api.example.com")
request.add_header("Authorization", "Bearer token")
response = urllib2.urlopen(request)
data = json.loads(response.read())
# requests version (more concise)
import requests
response = requests.get("http://api.example.com", headers={"Authorization": "Bearer token"})
data = response.json()
In a Python 2.7 project, requests is the better choice when third-party libraries are allowed. When only the standard library is available, urllib2 is fully adequate.