This is my first time using CSS selectors to scrape data.
I'm having trouble scraping the anchor content.
Here is my code:
import requests
from bs4 import BeautifulSoup

url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})

for post in posts:
    title = post.find("span", {"class": "title"}).get_text()
    company = post.find("span", {"class": "company"}).get_text()
    location = post.find("span", {"class": "region company"}).get_text()
    link = post.select("#category-2 > article > ul > li:nth-child(1) > a[href]")
    print({"title": title, "company": company, "location": location, "link": f"https://weworkremotely.com/{link}"})

I want to scrape the anchor content to build the link for each post, so I used a[href].
It doesn't work, though: the content of every sub-category gets scraped instead.
How can I scrape just the anchor content?
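Posted on 2022-02-01 05:48:41
Assuming you are correctly selecting the jobs of interest from all the jobs listed, you need a loop, and inside it you extract the first href attribute that contains the substring -jobs, i.e. post.select_one('[href*=-jobs]'):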
import requests
from bs4 import BeautifulSoup
url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})
for post in posts:
    print('https://weworkremotely.com' + post.select_one('a[href*=-jobs]')['href'])

To switch to all the listings on the page, use:

posts = wwr_soup.select('li:has(.tooltip)')

https://stackoverflow.com/questions/70934334
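
The approach above can be tried offline with a small sketch. The HTML below is hypothetical markup loosely modelled on a job listing page, not copied from the real site, and extract_links is an illustrative helper name rather than anything from the answer. It uses urljoin from the standard library to build the absolute URL and skips posts that have no matching anchor:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not the real page structure).
SAMPLE_HTML = """
<ul>
  <li class="feature">
    <a href="/company/acme">Acme</a>
    <a href="/remote-jobs/acme-ruby-developer"><span class="title">Ruby Developer</span></a>
  </li>
  <li class="feature">
    <a href="/remote-jobs/globex-rails-engineer"><span class="title">Rails Engineer</span></a>
  </li>
  <li class="view-all"><a href="/categories/remote-programming-jobs">View all</a></li>
</ul>
"""

BASE_URL = "https://weworkremotely.com"


def extract_links(html, base_url=BASE_URL):
    """Return one absolute job URL per li.feature element."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for post in soup.find_all("li", {"class": "feature"}):
        # select_one() returns the first matching Tag, or None if nothing matches.
        anchor = post.select_one('a[href*="-jobs"]')
        if anchor is None:
            continue
        # Index the Tag with ["href"] to get the attribute string itself.
        links.append(urljoin(base_url, anchor["href"]))
    return links


links = extract_links(SAMPLE_HTML)
print(links)
```

Note that select() returns a list of Tag objects while select_one() returns a single Tag, so interpolating the result of select() into an f-string prints whole elements rather than a URL. That is likely why the original code showed far more than the link.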