
Retrieve each <div> of an HTML file on a single line

Stack Overflow user
Asked 2020-04-16 06:07:48
Answers: 2 | Views: 61 | Followers: 0 | Votes: 2

I have an HTML file like this:

<div id="div_1">
    <p class="keywords">
        <strong> Those are the main keywords </strong>
        <ol>
            <li>Decentralization</li>
            <li>Planning</li>
        </ol>
    </p> 
</div>
<div id="div_2">
<p class="keywords">
    <strong>This is the first paragraph of the second div </strong>
    <strong>This is the second paragraph of the second div </strong>
</p> 
</div>
<div id="div_3">
<p> This is the first paragraph of the second div </p> 
</div>

I would like to get the text of each <div> on a single line, like this:

Those are the main keywords Decentralization Planning
This is the first paragraph of the second div This is the second paragraph of the second div
This is the first paragraph of the third div

This is my code:

import re
from bs4 import BeautifulSoup

# `document` holds the path to the HTML file
with open(document, encoding="utf8") as f:
    soup = BeautifulSoup(f, "html.parser")
my_divs = soup.find_all("div", id=re.compile("^div_"))
for div in my_divs:
    txt = div.text + "\n"
    print(txt)

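This returns the text of each <div>, but every tag inside it (<p> and so on) ends up on its own line instead of the whole <div> being on one line.

Do you know how I can do this?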
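
A small sketch of why the output breaks across lines, assuming the same `html.parser` backend as in the question. The `sample` string is a shortened, hypothetical version of the first <div>; it only illustrates that `.text` keeps the whitespace between tags.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample based on the first <div> in the question.
sample = """<div id="div_1">
    <p class="keywords">
        <strong> Those are the main keywords </strong>
    </p>
</div>"""

div = BeautifulSoup(sample, "html.parser").find("div")

# Every text node under the div, including the whitespace-only ones
# that come from the line breaks and indentation between tags.
nodes = list(div.strings)
print(nodes)

# .text concatenates those nodes unchanged, so the newlines survive.
print(repr(div.text))
```

The line breaks in the printed result come from the source file's formatting, not from the tags themselves, so any fix has to strip or collapse that whitespace.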

2 Answers

Stack Overflow user

Answered 2020-04-16 07:01:17

Yes, run a for loop over div > p:

<html>
	<head></head>
		<body>
			<div id="div_1">
				<p class="keywords">
					<strong> Those are the main keywords </strong>
					<ol>
						<li>Decentralization</li>
						<li>Planning</li>
					</ol>
				</p> 
			</div>
			
			
			<div id="div_2">
				<p class="keywords">
					<strong>This is the first paragraph of the second div </strong>
					<strong>This is the second paragraph of the second div </strong>
				</p> 
			</div>
			
			<div id="div_3">
				<p> This is the first paragraph of the second div </p> 
			</div>
		</body>
</html>

from bs4 import BeautifulSoup

# Path to the HTML file shown above
path = r"D:\Temp\example.html"

with open(path, "r", encoding="utf8") as page:
    html = BeautifulSoup(page.read(), "html.parser")

# Loop over every <div> and collect the text of its <p> elements
for div in html.find_all("div"):
    paragraphs = div.find_all("p")
    text = [p.text for p in paragraphs]
    print(text)  # prints one list per <div>
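
The loop above prints a Python list per <div>, with the original line breaks still inside each string. As a rough sketch of how the same div > p loop could produce one clean line per <div>, the hypothetical helper `paragraphs_per_div` below normalizes whitespace with `str.split()` and joins the pieces. It uses an inline `sample` string instead of a file, which is an assumption made only to keep the example self-contained.

```python
from bs4 import BeautifulSoup

def paragraphs_per_div(html):
    """Return one whitespace-normalized line of <p> text per <div>."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for div in soup.find_all("div"):
        # recursive=False restricts the search to direct children (div > p).
        paragraphs = div.find_all("p", recursive=False)
        # str.split() with no arguments drops newlines and repeated spaces.
        parts = [" ".join(p.get_text().split()) for p in paragraphs]
        lines.append(" ".join(parts))
    return lines

sample = """
<div id="div_2">
<p class="keywords">
    <strong>This is the first paragraph of the second div </strong>
    <strong>This is the second paragraph of the second div </strong>
</p>
</div>
<div id="div_3">
<p> This is the first paragraph of the second div </p>
</div>
"""

for line in paragraphs_per_div(sample):
    print(line)
```

Text that sits directly inside a <div> but outside any <p> is ignored by this approach, which may or may not be what is wanted.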

Votes: 0

Stack Overflow user

Answered 2020-04-16 07:10:27

import re
from bs4 import BeautifulSoup

html = """
<div id="div_1">
    <p class="keywords">
        <strong> Those are the main keywords </strong>
        <ol>
            <li>Decentralization</li>
            <li>Planning</li>
        </ol>
    </p> 
</div>
<div id="div_2">
<p class="keywords">
    <strong>This is the first paragraph of the second div </strong>
    <strong>This is the second paragraph of the second div </strong>
</p> 
</div>
<div id="div_3">
<p> This is the first paragraph of the second div </p> 
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all is the current name; findAll is a deprecated alias
for item in soup.find_all("div", id=re.compile("^div_")):
    target = [a.get_text(strip=True, separator=" ") for a in item.find_all("p")]
    print(*target)

Output:

Those are the main keywords Decentralization Planning
This is the first paragraph of the second div This is the second paragraph of the second div
This is the first paragraph of the second div
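
A possible variation, sketched under the assumption that all text inside the <div> is wanted and not only the text inside <p>: join `stripped_strings` on the <div> itself. This does not depend on the <ol> being nested inside the <p>, which matters because a stricter parser may close the <p> before the <ol>. The helper name `div_lines` and its `id_pattern` parameter are hypothetical, and the `sample` string is a shortened copy of the question's HTML.

```python
import re
from bs4 import BeautifulSoup

def div_lines(html, id_pattern="^div_"):
    """Return the text of each matching <div> as a single line."""
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", id=re.compile(id_pattern))
    # stripped_strings yields each non-empty text node with surrounding
    # whitespace removed, so joining with a space gives one line per div.
    return [" ".join(div.stripped_strings) for div in divs]

sample = """
<div id="div_1">
    <p class="keywords">
        <strong> Those are the main keywords </strong>
        <ol>
            <li>Decentralization</li>
            <li>Planning</li>
        </ol>
    </p>
</div>
<div id="div_3">
<p> This is the first paragraph of the second div </p>
</div>
"""

for line in div_lines(sample):
    print(line)
```

Returning a list of strings instead of printing directly makes it easy to write the lines to a file or process them further.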
Votes: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/61239444
