我有一个数据帧,其中有2个json列。我需要更新基于j1列的j2列。如果j2列的元素为空,则从j1列中选取元素值。j1为JSON字符串,j2为JSON数组。
输入数据帧,
+---------------------------+---------------------------------------+
| j1 | j2 |
+---------------------------+---------------------------------------+
|{"A": "50", "B": "30"} | [{"A": null}, {"A": "20", "B": null}] |
+---------------------------+---------------------------------------+
输出
+------------------------+---------------------------------------+
| j1 | j2 |
+------------------------+---------------------------------------+
|{"A": "50", "B": "30"} | [{"A":"50"}, {"A": "20", "B": "30"}] |
+------------------------+---------------------------------------+
spark版本: 2.4.0语言: scala
发布于 2021-08-04 13:39:35
您可以使用regexp_replace函数来获取所需的结果。
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._
spark.sparkContext.setLogLevel("ERROR")
val inDF = Seq(("""{"A": "50", "B": "30"} """, """[{"A": null}, {"A": "20", "B": null}]""")).toDF("j1", "j2")
inDF.withColumn("A", get_json_object('j1, "$.A"))
.withColumn("B", get_json_object('j1, "$.B"))
.withColumn("j2", regexp_replace('j2, lit("\"A\": null"),
concat(lit("\"A\": "), lit("\""), 'A, lit("\""))))
.withColumn("j2", regexp_replace('j2, lit("\"B\": null"),
concat(lit("\"B\": "), lit("\""), 'B, lit("\""))))
.drop("A", "B")
.show(false)
+-----------------------+-------------------------------------+
|j1 |j2 |
+-----------------------+-------------------------------------+
|{"A": "50", "B": "30"} |[{"A": "50"}, {"A": "20", "B": "30"}]|
+-----------------------+-------------------------------------+
https://stackoverflow.com/questions/68649073
复制相似问题