Python regex: finding HTML tags with a specific class

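Python regular expressions (regex) are a powerful tool for finding, matching, and manipulating strings that follow a particular pattern. They can be used to find HTML tags that carry a specific class.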

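Answer: A regular expression is a pattern used to match and manipulate strings, and in Python regular expressions are available through the re module. To find HTML tags with a specific class, you can use a regular expression like the one in the following code: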

import re

html = """
<html>
<head>
<title>Example</title>
</head>
<body>
<div class="class1">Content 1</div>
<div class="class2">Content 2</div>
<div class="class1 class2">Content 3</div>
<div class="class3">Content 4</div>
</body>
</html>
"""

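# Group 1 captures the whole class attribute, which must contain "class1"
# delimited by \b word boundaries. Group 2 captures the text inside the div.
pattern = r'<div\s+class="([^"]*\bclass1\b[^"]*)"[^>]*>(.*?)</div>'
# findall returns one (class attribute, content) tuple per matching div.
matches = re.findall(pattern, html)

for class_attr, content in matches:
    print(f"Class attribute: {class_attr}")
    print(f"Content: {content}")
    print()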
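
The same approach can be wrapped in a small helper so that the class name is not hard-coded. The sketch below is only an illustration: the function name find_divs_with_class is hypothetical, and the lookarounds are a stricter substitute for the \b boundaries used above, since \b alone would also accept a name such as "class1-wide". Like the pattern above, it assumes class is the first attribute of the tag, that it is double-quoted, and that the divs are not nested.

```python
import re

def find_divs_with_class(html, class_name):
    """Return (class attribute, content) tuples for divs whose class list contains class_name.

    Hypothetical helper for illustration; not part of the re module.
    """
    # re.escape protects against regex metacharacters in the class name.
    # The lookarounds require that the name is not glued to a word character or a hyphen.
    token = rf'(?<![\w-]){re.escape(class_name)}(?![\w-])'
    pattern = rf'<div\s+class="([^"]*{token}[^"]*)"[^>]*>(.*?)</div>'
    # re.DOTALL lets the captured content span several lines.
    return re.findall(pattern, html, flags=re.DOTALL)

sample = (
    '<div class="class1">A</div>'
    '<div class="class1-wide">B</div>'
    '<div class="x class1">C</div>'
)
print(find_divs_with_class(sample, "class1"))
# [('class1', 'A'), ('x class1', 'C')]
```

The second div is skipped because "class1-wide" is a different class name, which is the case the plain \b version would get wrong.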

The example code uses the regular expression <div\s+class="([^"]*\bclass1\b[^"]*)"[^>]*>(.*?)</div> to match div tags that have the class "class1". The parts of this regular expression mean the following: