BeautifulSoup: Is there a way to set the starting point for find_all()?

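BeautifulSoup is a Python library for parsing HTML and XML documents. It offers a simple way to extract data from a page. find_all() is one of its most frequently used methods: it searches the descendants of the object it is called on and returns every tag that matches the given filters.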

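Basic concepts

The signature of find_all() is:

soup.find_all(name, attrs, recursive, string, limit, **kwargs)

name: the tag name to match, such as 'div' or 'a'.
attrs: a dictionary of attribute names and values that a tag must have.
recursive: a boolean. When True (the default), all descendants are searched; when False, only direct children are searched.
string: filters on text content (this argument was called text before Beautiful Soup 4.4.0). Used alone it returns the matching strings; combined with name it returns tags whose .string matches.
limit: the maximum number of results to return; the search stops once that many are found.
**kwargs: attribute filters passed as keyword arguments, such as class_ or id.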
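To make the parameters above concrete, here is a minimal sketch. The HTML snippet and the variable names are made up for illustration, and it assumes Beautiful Soup 4.4.0 or later so that the string argument is available.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from any real page.
html = """
<div id="outer">
  <p class="note">top</p>
  <section>
    <p class="note">nested</p>
    <p>plain</p>
  </section>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("div", id="outer")

# recursive=True (the default): every descendant <p> is considered.
all_p = outer.find_all("p")

# recursive=False: only direct children of `outer` are considered.
direct_p = outer.find_all("p", recursive=False)

# attrs as a dictionary, with limit=1 to stop after the first match.
first_note = outer.find_all("p", attrs={"class": "note"}, limit=1)

# string= combined with a tag name returns tags whose .string matches.
plain_p = outer.find_all("p", string="plain")

print(len(all_p), len(direct_p), len(first_note), len(plain_p))
```

Note that the calls are made on outer, not on soup, so the search is already confined to that one div.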

Types of search

Search by tag name.
Search by attribute.
Search by text content.
Combine several conditions.
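The four kinds of search listed above could look roughly like this. The markup and variable names are hypothetical, and the string argument again assumes Beautiful Soup 4.4.0 or later.

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup only.
html = """
<ul>
  <li class="item"><a href="/a">Alpha</a></li>
  <li class="item hot"><a href="/b">Beta</a></li>
  <li><span>Gamma</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# By tag name.
by_name = soup.find_all("a")

# By attribute: class_ matches any one of a tag's CSS classes.
by_attr = soup.find_all(class_="hot")

# By text: with string= alone, the results are strings, not tags.
by_text = soup.find_all(string=re.compile("^Ga"))

# Combined: tag name plus attribute.
combined = soup.find_all("li", class_="item")

print(len(by_name), len(by_attr), len(by_text), len(combined))
```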

Use cases

Web scraping.
Data cleaning and analysis.
Automated testing.

Setting the starting point for `find_all()`

find_all() has no parameter for setting a starting point. You can get the same effect indirectly, though, in the following ways:

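1. Use `find()` to get the starting tag

First call find() to locate the starting tag, then call find_all() on that tag. The search then covers only that tag's descendants. Keep in mind that find() returns None when nothing matches, so check the result before using it.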

from bs4 import BeautifulSoup

html = """
<html>
<head><title>Example</title></head>
<body>
    <div class="container">
        <p>Paragraph 1</p>
        <p>Paragraph 2</p>
    </div>
    <div class="container">
        <p>Paragraph 3</p>
        <p>Paragraph 4</p>
    </div>
</body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')
start_div = soup.find('div', class_='container')

# Find all <p> tags inside start_div (its descendants only)
paragraphs = start_div.find_all('p')
for p in paragraphs:
    print(p.text)
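If "starting point" means "everything after this tag" rather than "everything inside this tag", Beautiful Soup's find_all_next() and find_next_siblings() methods may be a closer fit. The following is a minimal sketch using a shortened version of the same markup; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

html = """
<body>
  <div class="container"><p>Paragraph 1</p><p>Paragraph 2</p></div>
  <div class="container"><p>Paragraph 3</p><p>Paragraph 4</p></div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")
start_div = soup.find("div", class_="container")

# find_all_next() walks forward in document order from the start tag.
# That walk includes the start tag's own descendants.
after_start = [p.get_text() for p in start_div.find_all_next("p")]

# find_next_siblings() returns only later siblings at the same level;
# searching inside each one skips the start tag's own content.
later = [
    p.get_text()
    for sibling in start_div.find_next_siblings("div")
    for p in sibling.find_all("p")
]

print(after_start)
print(later)
```

Here after_start holds all four paragraphs, while later holds only the paragraphs of the second container.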

2. Use a CSS selector

BeautifulSoup supports CSS selectors through select(), and a selector can narrow the search to one part of the document.

from bs4 import BeautifulSoup

html = """
<html>
<head><title>Example</title></head>
<body>
    <div class="container">
        <p>Paragraph 1</p>
        <p>Paragraph 2</p>
    </div>
    <div class="container">
        <p>Paragraph 3</p>
        <p>Paragraph 4</p>
    </div>
</body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

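# :first-child matches a .container that is the first element child of its parent.
# Here that is the first <div> under <body>, so this collects the <p> tags inside it.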
paragraphs = soup.select('.container:first-child p')
for p in paragraphs:
    print(p.text)
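One caveat about the selector above: :first-child tests an element's position among its siblings, not whether it is the first match for .container. As a hedged alternative, the sketch below uses select_one() to pick the starting tag and then calls select() on that tag. The extra h1 in the markup is an assumption added to show the difference, and select() assumes a Beautiful Soup version with CSS selector support installed (4.7.0 or later with soupsieve).

```python
from bs4 import BeautifulSoup

# Same structure as before, plus a hypothetical <h1> ahead of the containers.
html = """
<body>
  <h1>Title</h1>
  <div class="container"><p>Paragraph 1</p><p>Paragraph 2</p></div>
  <div class="container"><p>Paragraph 3</p><p>Paragraph 4</p></div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# No .container is the first child of <body> any more, so nothing matches.
via_first_child = soup.select(".container:first-child p")

# select_one() returns the first match in document order;
# select() called on that tag is scoped to its subtree.
first = soup.select_one("div.container")
scoped = [p.get_text() for p in first.select("p")]

# To start from a different container, index into the full match list.
second = soup.select("div.container")[1]
from_second = [p.get_text() for p in second.select("p")]

print(via_first_child, scoped, from_second)
```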