Good day! I only need the href="this-value" inside the h4 block. Unfortunately, this href has no class or id. In the HTML, the block looks like this:

<h4 class="article_title_list" itemprop="name">
<a href="10-deutsche-pokemon-karten-sparpack">10 deutsche Pokemon Karten - mit Rare oder Holo/EX/GX - wie ein Booster!</a></h4>

Python code:
page = requests.get(product_fetch_url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
product_fetch_url_class = "article_title_list"
product_fetch_url_html = "h4"
find_urls = soup.find_all('{0}'.format(product_fetch_url_html), class_='{0}'.format(product_fetch_url_class))
for row in find_urls:
    string = row
    print("Produkt: {0}".format(string))
    html = BeautifulSoup(string, "html.parser")
    for a in html.find('a', href=True):
        print("Produkt URL-Slug: {0}".format(a['href']))

Output:
Produkt: <h4 class="article_title_list" itemprop="name">
<a href="10-deutsche-pokemon-karten-sparpack">10 deutsche Pokemon Karten - mit Rare oder Holo/EX/GX - wie ein Booster!</a></h4>
Traceback (most recent call last):
  File "/usr/share/nginx/html/mp-masterdb/pokefri.de/scraper.py", line 45, in <module>
    fetch_urls()
  File "/usr/share/nginx/html/mp-masterdb/pokefri.de/scraper.py", line 38, in fetch_urls
    html = BeautifulSoup(string, "html.parser")
  File "/usr/lib/python3.10/site-packages/bs4/__init__.py", line 312, in __init__
    markup = markup.read()
TypeError: 'NoneType' object is not callable

Expected output:
Produkt: <h4 class="article_title_list" itemprop="name"><a href="10-deutsche-pokemon-karten-sparpack">10 deutsche Pokemon Karten - mit Rare oder Holo/EX/GX - wie ein Booster!</a></h4>
Produkt Url-slug: 10-deutsche-pokemon-karten-sparpack

Is there a way to solve this with BeautifulSoup rather than with regex?
Posted on 2022-10-07 10:47:27
If you are only trying to get the links, select the elements more specifically.
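For context, each `row` returned by `find_all` in the question is already a `Tag`, so it can be searched directly and does not need to be passed through `BeautifulSoup(...)` a second time. The following is a minimal sketch of that idea, using the HTML fragment from the question inlined as a string so it runs without network access; the helper name `extract_slugs` is hypothetical and not part of the original code.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question, inlined so the sketch runs offline.
html = """
<h4 class="article_title_list" itemprop="name">
<a href="10-deutsche-pokemon-karten-sparpack">10 deutsche Pokemon Karten - mit Rare oder Holo/EX/GX - wie ein Booster!</a></h4>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_slugs(soup):
    """Return the href of the first <a href> found inside each h4.article_title_list.

    Hypothetical helper for illustration only.
    """
    slugs = []
    for row in soup.find_all("h4", class_="article_title_list"):
        # row is already a Tag: search it directly instead of re-parsing it.
        link = row.find("a", href=True)
        if link is not None:
            slugs.append(link["href"])
    return slugs


slugs = extract_slugs(soup)
for slug in slugs:
    print("Produkt URL-Slug: {0}".format(slug))
```

With CSS selectors, the same lookup becomes more compact: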
for a in soup.select('h4>a'):
    print(a.get('href'))

Or, if you prefer to go row by row:
for e in soup.select('#product-list > div'):
    print(e.h4.a.get('href'))

Example
import requests
from bs4 import BeautifulSoup
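# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(requests.get('https://www.lotticards.de/pokemon-sammelkarten').text, 'html.parser')
for e in soup.select('#product-list > div'):
    print(e.h4.a.get('href'))

Output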
10-deutsche-pokemon-karten-sparpack
Glaenzendes-Schicksal-Booster-Deutsch
Pokemon-Celebrations-Booster-Packung-Deutsch
Pikachu-V-Kollektion-Glaenzendes-Schicksal-Deutsch
Verborgenes-Schicksal-Top-Trainer-Box
Sun-Moon-Tag-Team-All-Stars-GX-High-Class-Pack-SM12a-Display-Japanisch
Champions-Path-Elite-Trainer-Box-Englisch
Glaenzendes-Schicksal-Mini-Tin-Set-Alle-5-Motive-Deutsch
...

Or as a list comprehension, based on itemprop="url":
[a.get('content') for a in soup.select('#product-list [itemprop="url"]')]

Output:
['https://www.lotticards.de10-deutsche-pokemon-karten-sparpack',
'https://www.lotticards.deGlaenzendes-Schicksal-Booster-Deutsch',
'https://www.lotticards.dePokemon-Celebrations-Booster-Packung-Deutsch',
'https://www.lotticards.dePikachu-V-Kollektion-Glaenzendes-Schicksal-Deutsch',
'https://www.lotticards.deVerborgenes-Schicksal-Top-Trainer-Box',
'https://www.lotticards.deSun-Moon-Tag-Team-All-Stars-GX-High-Class-Pack-SM12a-Display-Japanisch',
'https://www.lotticards.deChampions-Path-Elite-Trainer-Box-Englisch',
'https://www.lotticards.deGlaenzendes-Schicksal-Mini-Tin-Set-Alle-5-Motive-Deutsch',
'https://www.lotticards.deShining-Fates-Elite-Trainer-Box-Englisch',
'https://www.lotticards.deHidden-Fates-Elite-Trainer-Box-Reprint-Januar-2021',
'https://www.lotticards.deVMAX-Climax-s8b-Display-Japanisch',
'https://www.lotticards.deSonne-Mond-Ultra-Prisma-Booster-Deutsch',
'https://www.lotticards.desonne-mond-2-stunde-der-waechter-booster-deutsch-kaufen',
'https://www.lotticards.deSchwert-Schild-Kampfstile-Display-Deutsch',
'https://www.lotticards.dePokemon-Celebrations-Booster-Pack-Englisch',...]

https://stackoverflow.com/questions/73985884
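
The `content` values in the list above, at least as printed here, run the host and the slug together without a separator. If absolute URLs are needed, one option is to take the relative `href` slugs instead and resolve them against the listing page URL with `urllib.parse.urljoin` from the standard library. This is only a sketch: it assumes the page declares no `<base>` element and that the slugs are meant to be resolved relative to the page, and the helper name `absolute_urls` is hypothetical.

```python
from urllib.parse import urljoin

# Listing page used in the example; relative hrefs are resolved against it.
# Assumption: the page has no <base> element that would change the base URL.
PAGE_URL = "https://www.lotticards.de/pokemon-sammelkarten"


def absolute_urls(slugs, base=PAGE_URL):
    """Resolve relative href values against the page URL (hypothetical helper)."""
    return [urljoin(base, slug) for slug in slugs]


urls = absolute_urls([
    "10-deutsche-pokemon-karten-sparpack",
    "Glaenzendes-Schicksal-Booster-Deutsch",
])
for url in urls:
    print(url)
```

`urljoin` replaces the last path segment of the base with the relative slug, which mirrors how a browser resolves a relative link on that page.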