This tutorial uses Python, so install pip3 (or pip) in advance. On Linux-like systems, enter this in the terminal:
sudo apt install python3-pip
A single command (temporary mirror switch):
sudo pip install requests -i https://mirrors.aliyun.com/pypi/simple/
Official PyPI page: Requests. Everything about this third-party library can be found there.
The other option is installing from the command line with pip, which is simple: one command. Since the default package index is hosted overseas, download speeds can be painful; you can switch pip to a mirror first (covered below) and try again.
pip install requests
Speed it up:
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
I can't afford a Mac, so I don't have the purchasing power to cover macOS in this tutorial. Any well-off readers willing to chip in?
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}  # request headers sent to the server
response = requests.get('https://bing.com', headers=headers).text  # the .text attribute: the page source as a string
print(response)
The User-Agent is like a browser's ID card. If you keep Requests' default UA, the server may reject your scraper's request; in short, you get no data.
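As a quick sanity check (using httpbin.org, which echoes back the headers it receives), you can compare the default UA with a custom one:

```python
import requests

# Default UA: something like "python-requests/2.x", easy for servers to spot and block
default_ua = requests.get("http://httpbin.org/get").json()["headers"]["User-Agent"]
print(default_ua)

# Custom UA: the server now sees an ordinary browser string
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
custom_ua = requests.get("http://httpbin.org/get", headers=headers).json()["headers"]["User-Agent"]
print(custom_ua)
```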
# -*- coding:utf-8 -*-
import requests

host = "http://httpbin.org/"
endpoint = "post"
url = ''.join([host, endpoint])
data = {'key1': 'value1', 'key2': 'value2'}
r = requests.post(url, data=data)
print(r.text)
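Besides form data via data=, Requests can serialize a JSON body for you: pass a dict with the json= keyword and it sets the Content-Type: application/json header automatically. A minimal sketch against the same httpbin endpoint:

```python
import requests

# json= serializes the dict to a JSON body and sets Content-Type for you
r = requests.post("http://httpbin.org/post", json={"key1": "value1"})
echoed = r.json()
print(echoed["json"])                     # httpbin echoes back the parsed JSON body
print(echoed["headers"]["Content-Type"])  # application/json
```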
# -*- coding:utf-8 -*-
import requests

host = "http://httpbin.org/"
endpoint = "post"
url = ''.join([host, endpoint])
# multi-file upload
files = [
    ('file1', ('test.txt', open('test.txt', 'rb'))),
    ('file2', ('test.png', open('test.png', 'rb')))
]
r = requests.post(url, files=files)
print(r.text)
import requests
import json

url_put = "http://127.0.0.1:8080/"
headers_put = {
    'Content-Type': "application/json"
}
param = {
    'myObjectField': 'hello'
}
payload = json.dumps(param)
response_put = requests.put(url_put, data=payload, headers=headers_put)
#!/usr/bin/python3
# -*- coding:utf-8 -*-
"""
@author:
@file: 梨视频爬虫.py
@time: 2021/7/11 21:48
@desc:
"""
import requests
url = "https://www.pearvideo.com/video_1731260"
headers = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
"referer": url,
}
contId = url.split("_")[1]
videoStatusUrl = f"https://video.pearvideo.com/mp4/third/20210603/cont-{contId}-15316010-202041-hd.mp4"
# 'https://video.pearvideo.com/mp4/third/20210603/cont-1731260-15316010-202041-hd.mp4'
resp = requests.get(videoStatusUrl, headers=headers).content
with open("video.mp4", mode="wb") as file:
file.write(resp)
Anti-hotlinking: this is what the referer key in the headers dict is for; without it, the video data cannot be fetched. .content returns the response body as raw bytes, which is what you want for a binary file like a video. This library really is convenient, far easier to use than Python's standard-library urllib. For everyday scraping, grabbing some movie data for practice is fine; just don't overdo it.
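Reading the whole video with .content holds everything in memory at once. For large files, a more memory-friendly sketch streams the response and writes it in chunks; here httpbin's /bytes endpoint stands in for the video URL:

```python
import requests

# stream=True defers downloading the body; iter_content yields it chunk by chunk,
# so memory use stays bounded no matter how big the file is.
url = "http://httpbin.org/bytes/10240"  # returns 10240 random bytes, a stand-in for a video URL
total = 0
with requests.get(url, stream=True) as resp:
    with open("sample.bin", mode="wb") as file:
        for chunk in resp.iter_content(chunk_size=4096):
            file.write(chunk)
            total += len(chunk)
print(total)
```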