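Ctrip, one of China's leading online travel platforms, offers an extensive flight booking service. Its international fares vary with many factors, including season, public holidays, and flight times. Scraping Ctrip's international fare data makes it possible to analyze price trends, assess value for money, and inform travel planning.
The goals of this project are: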
Ctrip's international flight pages (e.g. `flights.ctrip.com`) are usually loaded dynamically, and the data may be returned as JSON by AJAX requests. We need to:
If Ctrip's flight data can be read directly from the HTML (this applies to some older pages), you can use `requests` + `BeautifulSoup`:
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_ctrip_flights(departure, arrival, date):
    url = f"https://flights.ctrip.com/international/{departure}-{arrival}?depdate={date}"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    # Use a timeout so a stalled connection cannot hang the script,
    # and fail early on HTTP error status codes
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    flights = []
    # The selectors must match the actual page markup; adjust them if the page differs
    for flight in soup.select('.flight-item'):
        airline = flight.select_one('.airline-name').text.strip()
        departure_time = flight.select_one('.depart-time').text.strip()
        arrival_time = flight.select_one('.arrival-time').text.strip()
        price = flight.select_one('.price').text.strip()
        flights.append({
            'Airline': airline,
            'DepartureTime': departure_time,
            'ArrivalTime': arrival_time,
            'Price': price
        })
    return pd.DataFrame(flights)

# Example: fetch flights from Shanghai to Tokyo on 2023-12-01
df = scrape_ctrip_flights('SHA', 'TYO', '2023-12-01')
print(df.head())
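Both scrapers in this article build the search URL by interpolating raw arguments, so a mistyped code or date only shows up as an empty result. The following is a minimal sketch of validating the inputs first. The helper name `build_search_url` is hypothetical, the URL pattern is copied from the scraper above, and the three-letter code rule is an assumption based on the `SHA` and `TYO` examples.

```python
from datetime import datetime

# URL pattern copied from the scraper above; the site may change it at any time.
BASE_URL = "https://flights.ctrip.com/international/{dep}-{arr}?depdate={date}"


def build_search_url(departure, arrival, date):
    """Validate the inputs and return the search URL (hypothetical helper)."""
    dep = departure.strip().upper()
    arr = arrival.strip().upper()
    for code in (dep, arr):
        # Assumption: city/airport codes are three ASCII letters, e.g. SHA, TYO.
        if not (len(code) == 3 and code.isascii() and code.isalpha()):
            raise ValueError(f"invalid city code: {code!r}")
    # strptime raises ValueError for malformed or impossible dates.
    parsed = datetime.strptime(date, "%Y-%m-%d")
    return BASE_URL.format(dep=dep, arr=arr, date=parsed.strftime("%Y-%m-%d"))


print(build_search_url("sha", "tyo", "2023-12-01"))
```

Failing early with a `ValueError` keeps bad input from turning into a wasted request.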
If the data is loaded dynamically, use `Selenium` to simulate browser operations:
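from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

def scrape_ctrip_dynamic(departure, arrival, date):
    driver = webdriver.Chrome()  # requires ChromeDriver to be installed
    url = f"https://flights.ctrip.com/international/{departure}-{arrival}?depdate={date}"
    flights = []
    try:
        driver.get(url)
        # Wait up to 15 seconds for the flight list to render instead of sleeping
        # for a fixed time; raises TimeoutException if no item ever appears
        WebDriverWait(driver, 15).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.flight-item'))
        )
        for flight in driver.find_elements(By.CSS_SELECTOR, '.flight-item'):
            airline = flight.find_element(By.CSS_SELECTOR, '.airline-name').text
            departure_time = flight.find_element(By.CSS_SELECTOR, '.depart-time').text
            arrival_time = flight.find_element(By.CSS_SELECTOR, '.arrival-time').text
            price = flight.find_element(By.CSS_SELECTOR, '.price').text
            flights.append({
                'Airline': airline,
                'DepartureTime': departure_time,
                'ArrivalTime': arrival_time,
                'Price': price
            })
    finally:
        # Always close the browser, even if scraping fails
        driver.quit()
    return pd.DataFrame(flights)

# Example: scrape the data dynamically
df = scrape_ctrip_dynamic('SHA', 'TYO', '2023-12-01')
print(df.head())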
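The page analysis earlier notes that the data may come back as JSON through AJAX requests. If such a response has been captured (for example from the browser's developer tools), flattening it into rows could look like the sketch below. The payload and its keys (`data`, `flights`, `airline`, `depart`, `arrive`, `price`) are hypothetical and are not Ctrip's real schema; the function name `flatten_flights` is also illustrative.

```python
import json

# Hypothetical payload: these keys are illustrative, not Ctrip's real schema.
SAMPLE_RESPONSE = """
{
  "data": {
    "flights": [
      {"airline": "Airline A", "depart": "08:30", "arrive": "12:15", "price": 2500},
      {"airline": "Airline B", "depart": "13:05", "arrive": "16:50", "price": 3180}
    ]
  }
}
"""


def flatten_flights(raw_json):
    """Turn the nested JSON payload into a flat list of row dictionaries."""
    payload = json.loads(raw_json)
    # Fall back to an empty list when the expected keys are missing.
    flights = payload.get("data", {}).get("flights", [])
    rows = []
    for item in flights:
        rows.append({
            "Airline": item.get("airline"),
            "DepartureTime": item.get("depart"),
            "ArrivalTime": item.get("arrive"),
            "Price": item.get("price"),
        })
    return rows


rows = flatten_flights(SAMPLE_RESPONSE)
print(rows)
```

The rows use the same column names as the scrapers above, so `pd.DataFrame(rows)` would feed the later cleaning and plotting steps unchanged.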
Ctrip may employ anti-scraping mechanisms, so the following measures are needed:
Example (using `fake_useragent` and a proxy):
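from fake_useragent import UserAgent
import requests

# Initialize the UserAgent object
ua = UserAgent()

# Set the request headers with a randomly chosen User-Agent
headers = {
    "User-Agent": ua.random,
    "Accept-Language": "en-US,en;q=0.9"
}

# Proxy settings (sample values; replace with your own proxy account)
proxyHost = "www.16yun.cn"
proxyPort = "5445"
proxyUser = "16QMSOML"
proxyPass = "280651"

# Build the proxy authentication string
proxy_auth = f"{proxyUser}:{proxyPass}"

# Build the proxy URLs. The proxy itself is normally reached over plain HTTP,
# even for HTTPS targets; use an https:// scheme only if your proxy supports TLS
proxies = {
    "http": f"http://{proxy_auth}@{proxyHost}:{proxyPort}",
    "https": f"http://{proxy_auth}@{proxyHost}:{proxyPort}"
}

# Target URL
url = "https://example.com"  # replace with your target URL

# Send the request (with a timeout so it cannot hang indefinitely)
response = requests.get(url, headers=headers, proxies=proxies, timeout=10)

# Print the response body
print(response.text)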
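Rotating the User-Agent and using a proxy are often combined with spacing requests out and retrying failures. The sketch below shows one way to do that with exponential backoff and random jitter. The helper `fetch_with_retry` and its parameters are hypothetical, and the delay values are arbitrary assumptions rather than limits known to suit Ctrip.

```python
import random
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:  # in real code, catch requests.RequestException
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff (1x, 2x, 4x, ...) plus random jitter so
                # retries do not arrive at a fixed, easily detected rhythm.
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                sleep(delay)
    # Every attempt failed: re-raise the last error for the caller to handle.
    raise last_error
```

With the request from the example above, a call might look like `fetch_with_retry(lambda: requests.get(url, headers=headers, proxies=proxies, timeout=10))`. The `sleep` parameter exists only so the delay can be replaced in tests.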
# Convert the price format (e.g. "¥2,500" → 2500)
df['Price'] = df['Price'].str.replace('¥', '').str.replace(',', '').astype(float)

# Sort by price
df_sorted = df.sort_values('Price')
print(df_sorted.head())
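The chained `str.replace` calls above assume every price looks exactly like "¥2,500"; the `astype(float)` step raises an error as soon as one cell carries extra text or is empty. A more tolerant parser, sketched here with a hypothetical helper `parse_price`, extracts the first number it finds and returns `None` otherwise. The sample strings are made up for illustration.

```python
import re


def parse_price(text):
    """Return the first number found in text as a float, or None if there is none."""
    if text is None:
        return None
    # Digits with optional thousands separators and an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(text))
    if match is None:
        return None
    return float(match.group().replace(",", ""))


# Made-up sample values covering clean, noisy, and missing prices.
samples = ["¥2,500", "¥ 3,180起", "2500.50", "sold out", None]
print([parse_price(s) for s in samples])
```

Applied to the DataFrame, this could be used as `df['Price'] = df['Price'].map(parse_price)`, followed by `df.dropna(subset=['Price'])` to discard rows without a usable price.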
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 6))
sns.histplot(df['Price'], bins=20, kde=True)
plt.title('International Flight Price Distribution (Shanghai to Tokyo)')
plt.xlabel('Price (¥)')
plt.ylabel('Frequency')
plt.show()
plt.figure(figsize=(12, 6))
sns.boxplot(x='Airline', y='Price', data=df)
plt.xticks(rotation=45)
plt.title('Flight Price Comparison by Airline')
plt.show()
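The box plot compares airlines visually; a small numeric summary can back it up. The sketch below computes the count, minimum, and median price per airline using only the standard library. The helper `summarize_by_airline` is hypothetical and the sample rows are made up; the column names match the DataFrame built earlier.

```python
from collections import defaultdict
from statistics import median


def summarize_by_airline(rows):
    """Group rows by 'Airline' and report count, min, and median of 'Price'."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["Airline"]].append(float(row["Price"]))
    summary = {}
    for airline, values in prices.items():
        summary[airline] = {
            "count": len(values),
            "min": min(values),
            "median": median(values),
        }
    return summary


# Made-up sample rows with the same columns as the scraped DataFrame.
sample_rows = [
    {"Airline": "Airline A", "Price": 2500},
    {"Airline": "Airline A", "Price": 2900},
    {"Airline": "Airline B", "Price": 3180},
]
summary = summarize_by_airline(sample_rows)
print(summary)
```

With the cleaned DataFrame, the input could be `df.to_dict('records')`; the pandas equivalent is `df.groupby('Airline')['Price'].agg(['count', 'min', 'median'])`.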
This article showed how to scrape Ctrip international flight data with Python and then analyze and visualize it. Key points include scraping the data with `Requests` or `Selenium`, and analyzing price trends with `Pandas` and `Matplotlib`.
Copyright © 2013 - 2025 Tencent Cloud. All Rights Reserved.