I want to scrape the following HTML and, using Beautiful Soup, capture everything from each h2
up to the next h2. Is this possible?
<hr /><h2>California</h2>
<p><strong>Term 1:</strong> (Eastern division): Tuesday 29 January —
Friday
12 April</p>
<p><strong>Term 1:</strong> (Western division): Tuesday 5 February —
Friday
12 April</p>
<p><strong>Term 2</strong><strong>:</strong> Monday 29 April — Friday 5
July</p>
<p><strong>Term 3:</strong> Monday 22 July — Friday 27 September</p>
<p><strong>Term 4:</strong> Monday 14 October — Friday 20 December</p>
<hr /><h2>New York</h2>
<p><strong>Term 1</strong>: Tuesday 29 January — Friday 12 April</p>
<p><strong>Term 2:</strong> Monday 29 April — Friday 5 July</p>
<p><strong>Term 3</strong>: Monday 22 July — Friday 27 September</p>
<p><strong>Term 4</strong>: Monday 14 October — Friday 13 December</p>
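from bs4 import BeautifulSoup

# page is assumed to be the fetched HTTP response (e.g. from requests.get)
soup = BeautifulSoup(page.text, 'html.parser')
for each_div in soup.find_all(['h2', 'p']):
    myval = each_div.prettify()  # prettify() without an encoding returns str, not bytes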
For each state, I would like to get the following result:
Posted on 2019-04-04 03:20:02
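Here is something I think you should be able to use. The list capture
keeps track of the elements you want for each heading. The code uses the find_next_siblings
method to get all of the heading's following siblings in the tree and iterates over them. When it reaches another h2
tag, it breaks out of the loop.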
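from bs4 import BeautifulSoup

soup = BeautifulSoup(content, 'html.parser')
for head in soup.find_all('h2'):
    capture = [head]  # this heading plus everything up to the next h2
    for sibling in head.find_next_siblings():
        if sibling.name == 'h2':
            break  # the next section starts here
        # note: the <hr /> just before the next h2 is captured too
        capture.append(sibling)
    # capture is rebuilt on every pass, so store it before the next heading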
I'll leave it to you to change how the captured tags are stored.
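One way to store the captured tags is to build a dict keyed by the h2 text, with a list of paragraph texts for each state. The following is a minimal sketch that assumes only the <p> text matters, so the <hr /> separators are dropped. The helper name terms_by_heading is hypothetical, and the HTML is a trimmed copy of the question's sample.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the HTML from the question (two terms per state).
content = """
<hr /><h2>California</h2>
<p><strong>Term 1:</strong> (Eastern division): Tuesday 29 January —
Friday
12 April</p>
<p><strong>Term 2</strong><strong>:</strong> Monday 29 April — Friday 5
July</p>
<hr /><h2>New York</h2>
<p><strong>Term 1</strong>: Tuesday 29 January — Friday 12 April</p>
<p><strong>Term 2:</strong> Monday 29 April — Friday 5 July</p>
"""


def terms_by_heading(html):
    """Hypothetical helper: map each <h2> text to the texts of the <p> siblings after it."""
    soup = BeautifulSoup(html, 'html.parser')
    result = {}
    for head in soup.find_all('h2'):
        paragraphs = []
        for sibling in head.find_next_siblings():
            if sibling.name == 'h2':
                break  # the next state's section starts here
            if sibling.name == 'p':  # skip the <hr /> separators
                # collapse the line breaks inside the <p> into single spaces
                paragraphs.append(' '.join(sibling.get_text().split()))
        result[head.get_text(strip=True)] = paragraphs
    return result


for state, terms in terms_by_heading(content).items():
    print(state)
    for term in terms:
        print('   ', term)
```

get_text() with no separator is used here on purpose. With a separator, markup like <strong>Term 2</strong><strong>:</strong> would come out as "Term 2 :".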
Edit: I forgot to mention that content
is the HTML string provided in your question.
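As an alternative, the find_all(['h2', 'p']) loop from the question can do the grouping in a single pass. Because find_all returns tags in document order, each <p> can be filed under the most recent h2 seen. The following is a sketch, not the answer's method. The names group_by_h2 and sample_html are hypothetical.

```python
from bs4 import BeautifulSoup


def group_by_h2(html):
    """Hypothetical helper: file each <p> under the most recent <h2> before it."""
    soup = BeautifulSoup(html, 'html.parser')
    groups = {}
    current = None  # text of the last <h2> seen; None before the first one
    # find_all returns matches in document order, so headings and paragraphs interleave
    for tag in soup.find_all(['h2', 'p']):
        if tag.name == 'h2':
            current = tag.get_text(strip=True)
            groups.setdefault(current, [])  # a repeated heading merges rather than resets
        elif current is not None:  # ignore any <p> that precedes the first <h2>
            groups[current].append(' '.join(tag.get_text().split()))
    return groups


sample_html = """
<p>Intro text before any heading</p>
<hr /><h2>New York</h2>
<p><strong>Term 1</strong>: Tuesday 29 January — Friday 12 April</p>
<p><strong>Term 4</strong>: Monday 14 October — Friday 13 December</p>
"""
print(group_by_h2(sample_html))
```

Unlike the sibling walk, this version does not require the <p> tags to be siblings of their h2. It still works if the page wraps sections in extra containers, as long as the document order is heading first, then its paragraphs.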
https://stackoverflow.com/questions/55507002