How can I merge all unique values of a Spark DataFrame column into a single row per id, and convert that column to JSON format?
Example input:
+---+------+-----------+
|id |gender|banner_desc|
+---+------+-----------+
|123|male  |banner1    |
|123|male  |banner2    |
|123|male  |banner3    |
|124|female|banner1    |
|124|female|banner2    |
|125|male  |banner1    |
|126|female|banner3    |
+---+------+-----------+
Example output:
+---+------+-------------------------------------------------------------+
|id |gender|banner_desc                                                  | 
+---+------+-------------------------------------------------------------+
|123|male  |[{"name":"banner1"}, {"name":"banner2"}, {"name":"banner3"}] |
|124|female|[{"name":"banner1"}, {"name":"banner2"}]                     |
|125|male  |[{"name":"banner1"}]                                         |
|126|female|[{"name":"banner3"}]                                         |
+---+------+-------------------------------------------------------------+
Posted on 2021-04-08 22:26:40
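You can apply to_json to collect_list(struct()) to get a JSON string. Note that collect_list keeps duplicate values; if the same banner can appear more than once for an id, collect_set drops the duplicates.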
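import org.apache.spark.sql.functions.{col, collect_list, struct, to_json}

// Group by id and gender, wrap each banner_desc value in a struct with a
// single field "name", collect the structs into an array, and serialize it.
val result = df.groupBy(
    "id","gender"
).agg(
    to_json(
        collect_list(
            struct(col("banner_desc").as("name"))
        )
    ).as("banner_desc")
)
// Pass false so that long JSON strings are not truncated.
result.show(false)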
+---+------+----------------------------------------------------------+
|id |gender|banner_desc                                               |
+---+------+----------------------------------------------------------+
|124|female|[{"name":"banner1"},{"name":"banner2"}]                   |
|126|female|[{"name":"banner3"}]                                      |
|125|male  |[{"name":"banner1"}]                                      |
|123|male  |[{"name":"banner1"},{"name":"banner2"},{"name":"banner3"}]|
+---+------+----------------------------------------------------------+
https://stackoverflow.com/questions/67006015
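Since the question asks for unique values, here is a minimal sketch of a variant that deduplicates and fixes the element order. It swaps collect_list for collect_set and wraps the result in sort_array. This is not part of the answer above. The object name DistinctBannersToJson, the helper distinctBannersJson, the local SparkSession, and the sample rows (including the deliberate duplicate) are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set, sort_array, struct, to_json}

// Hypothetical helper object, not from the original answer.
object DistinctBannersToJson {

  // Collects the distinct banner_desc values per (id, gender) into a JSON array.
  // collect_set drops duplicates; sort_array makes the element order deterministic,
  // because neither collect_list nor collect_set guarantees an order.
  def distinctBannersJson(df: DataFrame): DataFrame =
    df.groupBy("id", "gender").agg(
      to_json(
        sort_array(
          collect_set(struct(col("banner_desc").as("name")))
        )
      ).as("banner_desc")
    )

  def main(args: Array[String]): Unit = {
    // Assumed local session, for trying the sketch outside a cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-banners-to-json")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data; the second row is a deliberate duplicate.
    val df = Seq(
      ("123", "male", "banner1"),
      ("123", "male", "banner1"),
      ("123", "male", "banner2"),
      ("124", "female", "banner1")
    ).toDF("id", "gender", "banner_desc")

    distinctBannersJson(df).orderBy("id").show(false)
    spark.stop()
  }
}
```

If duplicates cannot occur in your data, collect_list as in the answer is enough; sort_array can still be worth keeping when downstream code compares the JSON strings, since the collection order may otherwise vary between runs.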