I want this code to find all the links in the file "input.html", but it only finds and displays the first one. Here is the code:
import codecs
from bs4 import BeautifulSoup
fd = codecs.open('input.html', 'r')
def clean(html):
    soup = BeautifulSoup(html, "lxml")
    for link in soup.find_all('a'):
        link.extract()
        text = link.get('href')
        return text
Posted on 2021-11-01 23:34:07
Perhaps something like this:
import codecs
from bs4 import BeautifulSoup
fd = codecs.open('input.html', 'r')
text = []
def clean(html):
    soup = BeautifulSoup(html, "lxml")
    for link in soup.find_all('a'):
        link.extract()
        text.append(link.get('href'))
    return text
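Posted on 2021-11-01 23:29:44
You return text inside the loop body, so the function exits during the first iteration. Collect the values in a list and return it after the loop: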
def clean(html):
    soup = BeautifulSoup(html, "lxml")
    links = []
    for link in soup.find_all('a'):
        link.extract()
        text = link.get('href')
        links.append(text)
    return links
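To see the difference in isolation, here is a minimal sketch that needs no HTML at all. The function names first_only and collect_all and the sample list are hypothetical, chosen only to illustrate the control flow; they are not part of the original code.

```python
def first_only(items):
    # Mirrors the bug: `return` sits inside the loop body,
    # so the function exits during the first iteration.
    for item in items:
        return item


def collect_all(items):
    # Accumulate inside the loop, return once after it has finished.
    results = []
    for item in items:
        results.append(item)
    return results


hrefs = ["a.html", "b.html", "c.html"]  # hypothetical stand-in values
print(first_only(hrefs))   # a.html
print(collect_all(hrefs))  # ['a.html', 'b.html', 'c.html']
```

The only change between the two functions is where the return statement sits relative to the loop.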
Alternatively, you can use a simple list comprehension instead of the function:
soup = BeautifulSoup(html, "lxml")
links = [link.extract().get('href') for link in soup.find_all('a')]
Posted on 2021-11-02 00:06:35
It looks like you are returning a single link at the end of the loop. You can use the following:
def clean(html):
    soup = BeautifulSoup(html, 'html.parser')
    hrefs = soup.find_all('a')
    links = []
    if hrefs:
        for href in hrefs:
            href.extract()
            link = href.get('href')
            links.append(link)
    return links
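None of the snippets in this thread actually reads the file or calls clean, so here is one possible end-to-end sketch. The names extract_links and links_from_file, the utf-8 default, and the sample markup are assumptions made for illustration. It also leaves out link.extract(), which removes the tag from the tree but is not needed just to read the attribute, and it passes href=True to find_all so that anchors without an href are skipped instead of yielding None.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only anchors that carry an href attribute
    # (an added assumption; the answers above would store None for the rest).
    return [a["href"] for a in soup.find_all("a", href=True)]


def links_from_file(path, encoding="utf-8"):
    """Read an HTML file and return its links (the encoding is an assumed default)."""
    with open(path, "r", encoding=encoding) as fd:
        return extract_links(fd.read())


# Hypothetical sample markup, used here in place of input.html.
sample = '<p><a href="a.html">A</a> <a name="top">no href</a> <a href="b.html">B</a></p>'
print(extract_links(sample))  # ['a.html', 'b.html']
```

For the file from the question, the call would be links_from_file('input.html').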
https://stackoverflow.com/questions/69803828