Reading embedded code with Python, extracting the URLs, and writing each URL's title to a new CSV file can be done as follows:
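import csv
import re

import requests
from bs4 import BeautifulSoup

# Match http(s) URLs up to the first whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r'''https?://[^\s"'<>]+''')


def extract_url_title(embedded_code):
    """Return the URLs found in embedded_code and the page title of each."""
    urls = URL_PATTERN.findall(embedded_code)
    titles = []
    for url in urls:
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
        except requests.RequestException:
            # Keep the row but leave the title empty if the page cannot be fetched.
            titles.append('')
            continue
        soup = BeautifulSoup(response.text, 'html.parser')
        # soup.title.string can be None (for example, an empty <title>).
        title = soup.title.string if soup.title else None
        titles.append(title.strip() if title else '')
    return urls, titles


embedded_code_file = 'embedded_code.txt'
output_file = 'output.csv'

with open(embedded_code_file, 'r', encoding='utf-8') as file:
    embedded_code = file.read()

urls, titles = extract_url_title(embedded_code)

with open(output_file, 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['URL', 'Title'])
    for url, title in zip(urls, titles):
        writer.writerow([url, title])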
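The URL-matching step can be tried on its own, without any network access. The following is a minimal sketch, not part of the original answer: the helper name `extract_urls`, the sample snippet, and the simplified pattern are assumptions for illustration. It also drops repeated URLs so that each page would be fetched only once.

```python
import re

# Assumed, simplified pattern: an http(s) URL runs until whitespace,
# a quote, or an angle bracket.
URL_PATTERN = re.compile(r'''https?://[^\s"'<>]+''')


def extract_urls(embedded_code):
    """Return the unique URLs in embedded_code, in order of first appearance."""
    # dict.fromkeys keeps insertion order, so duplicates are dropped stably.
    return list(dict.fromkeys(URL_PATTERN.findall(embedded_code)))


# Hypothetical embed snippet used only for illustration.
sample = (
    '<iframe src="https://example.com/video?id=1"></iframe>\n'
    "<a href='https://example.org/page'>link</a>\n"
    '<script src="https://example.com/video?id=1"></script>'
)

print(extract_urls(sample))
# → ['https://example.com/video?id=1', 'https://example.org/page']
```

Stopping at quotes and angle brackets matters for embed code, where URLs usually sit inside HTML attributes.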
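This code uses a regular expression to extract the URLs from the embedded code, then sends an HTTP request to each URL with the requests library to fetch the page. It parses each page with BeautifulSoup and extracts the title. Finally, it writes the URLs and titles to a CSV file.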
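Page titles often contain commas or quotation marks, so it can help to see how the CSV step handles them. The sketch below is an illustration under assumptions: the helper `rows_to_csv` and the sample rows are hypothetical, and it writes to an in-memory buffer rather than `output.csv` so that it runs without touching the file system.

```python
import csv
import io


def rows_to_csv(urls, titles):
    """Return CSV text with a header row and one row per (url, title) pair."""
    # newline='' leaves the writer's own line endings untouched.
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerow(['URL', 'Title'])
    for url, title in zip(urls, titles):
        writer.writerow([url, title])
    return buffer.getvalue()


# Hypothetical sample data used only for illustration.
csv_text = rows_to_csv(
    ['https://example.com/a', 'https://example.com/b'],
    ['Hello, world', 'She said "hi"'],
)
print(csv_text)
```

With its default settings, `csv.writer` quotes fields that contain the delimiter or a quote character, so `csv.reader` reads such titles back unchanged.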