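There are several ways to convert HTML to JSON. One common approach is to parse the HTML into a tree and recursively turn each node into a dictionary.
Here is a sample implementation using Python and the BeautifulSoup library: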
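from bs4 import BeautifulSoup
import json


def html_to_json(html):
    """Parse an HTML string and return its element tree as a JSON string."""
    soup = BeautifulSoup(html, 'html.parser')
    # Start from the first real element. The soup object itself is named
    # '[document]' and would otherwise show up as the root of the output.
    root = soup.find(True)
    json_data = parse_node(root) if root is not None else None
    return json.dumps(json_data, indent=2, ensure_ascii=False)


def parse_node(node):
    """Convert a tag into a dict, or a text node into a stripped string."""
    if node.name is None:
        # Text nodes have no tag name.
        return node.string.strip()
    else:
        data = {}
        data['tag'] = node.name
        if node.attrs:
            data['attrs'] = node.attrs
        if node.contents:
            # Keep child tags and non-blank text; drop whitespace-only text.
            data['children'] = [parse_node(child) for child in node.contents if child.name is not None or (child.string and child.string.strip())]
        return data

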
# Example HTML
html = '''
<html>
<head>
<title>Example</title>
</head>
<body>
<h1>Hello, World!</h1>
<p>This is an example HTML document.</p>
</body>
</html>
'''
json_data = html_to_json(html)
print(json_data)
The output is:
{
  "tag": "html",
  "children": [
    {
      "tag": "head",
      "children": [
        {
          "tag": "title",
          "children": [
            "Example"
          ]
        }
      ]
    },
    {
      "tag": "body",
      "children": [
        {
          "tag": "h1",
          "children": [
            "Hello, World!"
          ]
        },
        {
          "tag": "p",
          "children": [
            "This is an example HTML document."
          ]
        }
      ]
    }
  ]
}
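This sample converts the HTML document into a corresponding JSON object (nested tags, attributes, and children) and serializes it to a JSON string. You can modify and extend it to suit your needs.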
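As one possible extension, the sketch below skips HTML comments and doctype declarations, which the name-based check would otherwise treat as ordinary text. The names `node_to_dict` and `html_to_json_filtered` and the `sample` markup are hypothetical, introduced only for illustration. Dropping comments and doctypes is an assumption about what you want, not a requirement of the approach.

```python
import json

from bs4 import BeautifulSoup
from bs4.element import Comment, Doctype, NavigableString


def node_to_dict(node):
    """Return a dict for a tag, a stripped string for text, or None to skip."""
    if isinstance(node, (Comment, Doctype)):
        return None  # assumption: comments and doctypes are not wanted
    if isinstance(node, NavigableString):
        text = str(node).strip()
        return text or None  # drop whitespace-only text
    data = {'tag': node.name}
    if node.attrs:
        data['attrs'] = dict(node.attrs)
    # Convert children first, then discard the ones marked as skipped.
    children = [c for c in map(node_to_dict, node.contents) if c is not None]
    if children:
        data['children'] = children
    return data


def html_to_json_filtered(html):
    """Serialize the first element of the document, ignoring comments."""
    soup = BeautifulSoup(html, 'html.parser')
    root = soup.find(True)  # first tag in the document, or None
    data = node_to_dict(root) if root is not None else None
    return json.dumps(data, ensure_ascii=False)


# Hypothetical sample input with a doctype, a comment, and attributes.
sample = '<!DOCTYPE html><div id="main" class="a b"><!-- note --><p>Hi</p></div>'
result = html_to_json_filtered(sample)
print(result)
```

Note that BeautifulSoup treats `class` as a multi-valued attribute, so it appears in the JSON as a list of class names, not a single string.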