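Since the string is not in a standard format but is the output of a UDF, I need some help parsing it into a dict.
The value returned by the PySpark UDF looks like this:
"{list=[{a=1}, {a=2}, {a=3}]}"
I need to convert it into a Python dictionary with the following structure: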
{
    "list": [
        {"a": 1},
        {"a": 2},
        {"a": 3}
    ]
}
so that I can access its values like this:
dict["list"][1]["a"]
I have already tried using:
Can anyone help me?
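As an example of how the unparsed string is generated:
from pyspark.sql.functions import udf

# No return type is given, so the UDF produces a string column;
# the dict comes back in the "{list=[...]}" form shown above.
@udf()
def execute_method():
    return {"list": [{"a": 1}, {"b": 1}, {"c": 1}]}

df_result = df_source.withColumn("result", execute_method())
Answered on 2021-03-19 20:59:52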
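If the UDF itself can be changed, one option is to sidestep the custom format by having it return JSON text, which `json.loads` can read back directly. The sketch below is plain Python so it runs without a Spark session. It assumes the function body would be wrapped with `@udf()` as in the question, and the payload is only illustrative.

```python
import json


def execute_method():
    # Stand-in for the UDF body from the question (assumption: in PySpark
    # this function would carry the @udf() decorator and return a string).
    payload = {"list": [{"a": 1}, {"a": 2}, {"a": 3}]}
    # Serialize to JSON instead of returning the dict itself.
    return json.dumps(payload)


raw = execute_method()
parsed = json.loads(raw)
print(parsed["list"][1]["a"])
```

With this approach the column holds valid JSON, so no regex clean-up is needed on the reading side.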
At a minimum, you need to replace = with : and wrap the keys in double quotes:
import json
import re
string = "{list=[{a=1}, {a=2}, {a=3}]}"
fixed_string = re.sub(r'(\w+)=', r'"\1":', string)
print(type(fixed_string), fixed_string)
parsed = json.loads(fixed_string)
print(type(parsed), parsed)
Output:
<class 'str'> {"list":[{"a":1}, {"a":2}, {"a":3}]}
<class 'dict'> {'list': [{'a': 1}, {'a': 2}, {'a': 3}]}
Answered on 2021-03-19 21:32:19
Try this:
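import json
import re

data = "{list=[{a=1}, {a=2}, {a=3}]}"
# Turn the key=value separators into JSON's key:value form.
data = data.replace('=', ':')
# Collect every alphabetic token; in this string those are exactly the keys.
pattern = [e.group() for e in re.finditer('[a-z]+', data, flags=re.IGNORECASE)]
# Caveat: this breaks if one key is a substring of another.
for e in set(pattern):
    # Wrap each distinct key in double quotes.
    data = data.replace(e, "\"" + e + "\"")
print(json.loads(data))
https://stackoverflow.com/questions/66715415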
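Both answers work by rewriting the string into JSON before parsing it. A minimal sketch of that idea as a reusable helper follows. It assumes keys are simple word tokens and that values are numbers, nested maps or lists, or bare words directly after an `=`. The name `parse_map_string` is hypothetical and not part of PySpark.

```python
import json
import re

# Bare words that are already valid JSON and must not be quoted.
_JSON_LITERALS = {"true", "false", "null"}


def parse_map_string(text):
    """Parse a '{key=value, ...}' style string into Python objects."""
    # Step 1: quote every key and turn '=' into ':' in a single pass.
    with_keys = re.sub(r'(\w+)=', r'"\1":', text)

    # Step 2: quote bare-word values such as name=bob, leaving JSON
    # literals (true, false, null) untouched.
    def quote_value(match):
        word = match.group(1)
        if word in _JSON_LITERALS:
            return match.group(0)
        return f':"{word}"'

    as_json = re.sub(r':([A-Za-z_]\w*)', quote_value, with_keys)
    return json.loads(as_json)


result = parse_map_string("{list=[{a=1}, {a=2}, {a=3}]}")
print(result["list"][1]["a"])
```

Because the keys are quoted by one regex substitution rather than repeated `str.replace` calls, a key that happens to be a substring of another key is not rewritten twice. Values containing spaces, commas, or `=` are outside what this sketch handles.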