Nginx Reverse Proxy + Python Backend (Recommended)
The following is a complete solution for fetching Baidu hot search titles with Nginx combined with other technologies:
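# nginx.conf configuration
# proxy_cache my_cache assumes a cache zone named my_cache has been
# declared with proxy_cache_path in the http context.
server {
    listen 80;
    server_name your-domain.com;

    location /baidu-hot {
        # No URI part in proxy_pass, so /baidu-hot is forwarded unchanged
        proxy_pass http://localhost:5000;
        proxy_set_header Host $host;
        proxy_cache my_cache;
        proxy_cache_valid 200 5m;  # cache for 5 minutes
    }
}

Python crawler script (baidu_hot.py):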
import requests
from bs4 import BeautifulSoup


def get_baidu_hot():
    url = 'https://top.baidu.com/board?tab=realtime'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # treat HTTP error statuses as failures
        soup = BeautifulSoup(response.text, 'html.parser')
        # Extract the hot search titles (the class names follow the page markup and may change)
        titles = [item.get_text(strip=True) for item in
                  soup.select('.c-single-text-ellipsis')][:20]
        # Extract the hot search index values
        indices = [item.get_text(strip=True) for item in
                   soup.select('.hot-index_1Bl1a')][:20]
        return [{"title": t, "index": i} for t, i in zip(titles, indices)]
    except Exception as e:
        return [{"error": str(e)}]

Flask app (app.py):
from datetime import datetime

from flask import Flask, jsonify

from baidu_hot import get_baidu_hot

app = Flask(__name__)


# Nginx forwards the original URI, so the app answers /baidu-hot as well as /
@app.route('/')
@app.route('/baidu-hot')
def hot_list():
    return jsonify({
        "source": "百度热搜",
        "data": get_baidu_hot(),
        "update_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    })


if __name__ == '__main__':
    app.run(port=5000)

# Requires OpenResty
http {
    lua_package_path "/path/to/lua-resty-http/lib/?.lua;;";
    # Cosockets need a resolver to look up top.baidu.com.
    # The DNS server address below is an assumption; use your own.
    resolver 8.8.8.8;

    server {
        listen 80;

        location /baidu-hot {
            content_by_lua_block {
                local http = require "resty.http"
                local cjson = require "cjson"
                local httpc = http.new()

                -- Set the header before any output is sent
                ngx.header['Content-Type'] = 'application/json'

                -- Request the Baidu hot search page
                local res, err = httpc:request_uri("https://top.baidu.com/board", {
                    method = "GET",
                    headers = {
                        ["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
                    }
                })
                if not res then
                    -- Encode the error with cjson so the message is escaped correctly
                    ngx.say(cjson.encode({ error = err }))
                    return
                end

                -- Parse the HTML
                local titles = {}
                for title in res.body:gmatch('class="c%-single%-text%-ellipsis">([^<]+)') do
                    -- gsub returns the string and a count; the parentheses keep only the string
                    table.insert(titles, (title:gsub("^%s*(.-)%s*$", "%1")))
                end

                -- Output as JSON
                ngx.say('{"baidu_hot": ', cjson.encode(titles), '}')
            }
        }
    }
}

Example response:
{
    "source": "百度热搜",
    "update_time": "2025-08-12 14:30:45",
    "data": [
        {"title": "神舟十八号载人飞船返回舱成功着陆", "index": "485万"},
        {"title": "中国科学家发现新型超导材料", "index": "432万"},
        {"title": "2025年新能源汽车补贴政策公布", "index": "398万"},
        {"title": "全球首条量子通信干线正式商用", "index": "376万"},
        {"title": "国际油价突破100美元大关", "index": "354万"},
        {"title": "某明星演唱会门票3秒售罄", "index": "321万"},
        {"title": "教育部推出AI教育新课程标准", "index": "298万"},
        {"title": "新冠新型变异株引发全球关注", "index": "276万"},
        {"title": "亚运会筹备进入最后冲刺阶段", "index": "254万"},
        {"title": "某科技公司发布革命性AR眼镜", "index": "231万"}
    ]
}

Start the Flask app:
python app.py

Reload Nginx:
sudo nginx -s reload

Access the endpoint:
http://your-server/baidu-hot

Tip: for production environments, add authentication and rate limiting, for example:
location /baidu-hot {
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
    # The zone "one" must be declared with limit_req_zone in the http context
    limit_req zone=one burst=10;
    # ... other configuration
}
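To make the effect of burst=10 more concrete, here is a minimal Python sketch of the leaky-bucket accounting that limit_req is based on. It is a simplified model and not Nginx's actual implementation: the class name LeakyBucket and the rate of 1 request per second are assumptions for illustration (the real rate comes from the limit_req_zone declaration, which the article does not show), and the sketch ignores the delay Nginx applies to burst requests when nodelay is not set.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Simplified, hypothetical model of limit_req's leaky-bucket accounting."""
    rate: float          # allowed requests per second (assumed value, set by limit_req_zone)
    burst: int           # requests tolerated above the rate (burst=10 in the tip)
    excess: float = 0.0  # requests currently counted above the rate
    last: float = 0.0    # time of the last accepted request, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is accepted."""
        # The bucket drains at `rate` requests per second since the last accepted request.
        drained = max(self.excess - self.rate * (now - self.last), 0.0)
        if drained > self.burst:
            # Over the burst allowance: Nginx would reject the request (503 by default).
            return False
        # Accept the request and count it against the bucket.
        self.excess = drained + 1.0
        self.last = now
        return True


if __name__ == "__main__":
    bucket = LeakyBucket(rate=1.0, burst=10)
    # Fifteen requests arriving at the same instant
    results = [bucket.allow(0.0) for _ in range(15)]
    print("accepted:", sum(results), "rejected:", len(results) - sum(results))
```

Under these assumptions, a sudden spike is accepted up to one request plus the burst allowance, and further requests are rejected until the bucket has drained.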
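Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, please contact cloudcommunity@tencent.com for removal.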