
Extracting text from a prettified lxml file

Stack Overflow user
Asked on 2020-10-01 18:26:23
2 answers, 36 views, 0 followers, 2 votes

How can I extract the text from this lxml document, starting at <div class="ember-view" id="ember760">? Please help. I tried the code below, but the text is not captured.

Code I tried

# soup is a BeautifulSoup object

exp = soup.find('header', {'class': 'pv-profile-section__card-header'})
exp

lxml file

<div class="pv-recommendation-entity__highlights">
<blockquote class="pv-recommendation-entity__text relative">
<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line">Abc
is an enthusiastic candidature in training sessions. He is an</span>
<span class="lt-line-clamp__line">extremely capable and dedicated entry-level Data Science Analyst.</span>
<span class="lt-line-clamp__line">He is enhancing Analytics skills by his enthusiasm for learning new</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
      things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
            <a aria-expanded="false" class="lt-line-clamp__more" data-test-line-clamp-show-more-button="true" href="#" id="line-clamp-show-more-button" role="button">See more</a>
</span></span>
<!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>
</blockquote>
</div>
</li>
</ul>
<!-- --></div>
</div></div>

Expected output

I know Abc from Data Analysis training sessions with abc,
is an enthusiastic candidature in training sessions. He is an
extremely capable and dedicated entry-level Data Science Analyst.
He is enhancing Analytics skills by his enthusiasm for learning new
      things, and has learnt new tools like R, SPSS, and Pytho

2 Answers

Stack Overflow user

Accepted answer

Posted on 2020-10-01 18:56:50

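from bs4 import BeautifulSoup
# html holds the markup from the question
soup = BeautifulSoup(html, 'lxml')
lines = soup.select('div.ember-view > span.lt-line-clamp__line')
# First direct text node of each span, so the nested "See more" spans are skipped
text = ''.join([line.find(string=True, recursive=False) for line in lines])
print(text)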

This gives the text:

I know Abc from Data Analysis training sessions with abc,Abc
is an enthusiastic candidature in training sessions. He is anextremely capable and dedicated entry-level Data Science Analyst.He is enhancing Analytics skills by his enthusiasm for learning new
      things, and has learnt new tools like R, SPSS, and Pytho

The "See more" links are ignored.
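
If one line per span is wanted, as in the expected output, a possible variation is to join the direct text nodes with newlines and collapse the whitespace inside each one. This is only a sketch: the `html` string below is a shortened stand-in for the question's markup, and `html.parser` is used just to avoid the lxml dependency (`'lxml'` should behave the same here).

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (assumption: same class names).
html = '''
<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line">Abc
is an enthusiastic candidature in training sessions. He is an</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
      things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
            <a class="lt-line-clamp__more" href="#" role="button">See more</a>
</span></span>
<!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>
'''

soup = BeautifulSoup(html, 'html.parser')

lines = []
for span in soup.select('div.ember-view > span.lt-line-clamp__line'):
    # Only the span's own first text node; nested ellipsis spans are not visited.
    first = span.find(string=True, recursive=False)
    if first is not None:
        # Collapse line breaks and indentation inside the text node.
        lines.append(' '.join(first.split()))

text = '\n'.join(lines)
print(text)
```

Each matched span contributes exactly one output line, and the "See more" text never appears because it lives in nested elements rather than in the span's direct text.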

Votes: 1

Stack Overflow user

Posted on 2020-10-01 18:54:35

You can use the CSS selector div#ember760 to select <div class="ember-view" id="ember760"> and then call the .get_text() method:

from bs4 import BeautifulSoup


txt = '''
<div class="pv-recommendation-entity__highlights">
<blockquote class="pv-recommendation-entity__text relative">
<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line">Abc
is an enthusiastic candidature in training sessions. He is an</span>
<span class="lt-line-clamp__line">extremely capable and dedicated entry-level Data Science Analyst.</span>
<span class="lt-line-clamp__line">He is enhancing Analytics skills by his enthusiasm for learning new</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
      things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
            <a aria-expanded="false" class="lt-line-clamp__more" data-test-line-clamp-show-more-button="true" href="#" id="line-clamp-show-more-button" role="button">See more</a>
</span></span>
<!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>
</blockquote>
</div>
</li>
</ul>
<!-- --></div>
</div></div>'''

soup = BeautifulSoup(txt, 'lxml')

print(soup.select_one('div#ember760').get_text(strip=True, separator='\n'))

Prints:

I know Abc from Data Analysis training sessions with abc,
Abc
is an enthusiastic candidature in training sessions. He is an
extremely capable and dedicated entry-level Data Science Analyst.
He is enhancing Analytics skills by his enthusiasm for learning new
things, and has learnt new tools like R, SPSS, and Pytho
...
See more
...
See more
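
The output above still contains the "..." and "See more" fragments. One way to drop them, sketched here under the assumption that they always sit inside `span.lt-line-clamp__ellipsis` elements as in the question's markup, is to remove those spans before calling `.get_text()`. The `html` string is a shortened stand-in, and `html.parser` is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (assumption: same class names).
html = '''
<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
      things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
            <a class="lt-line-clamp__more" href="#" role="button">See more</a>
</span></span>
<!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>
'''

soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one('div#ember760')

# Remove the truncation widgets ("... See more") from the tree first.
for widget in div.select('span.lt-line-clamp__ellipsis'):
    widget.decompose()

# Same call as in the answer, now without the widget text.
cleaned = div.get_text(strip=True, separator='\n')
print(cleaned)
```

Note that `decompose()` modifies the parsed tree in place, so parse the document again if the removed elements are needed later.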
Votes: 2
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/64153485
