Splitting a JSON array into multiple JSON records with Scala Spark can be done in the following steps:
1. Import the required classes and create a SparkSession:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .appName("JSON Array Split")
  .getOrCreate()

2. Define the schema of the array and of each element in it:

val schema = ArrayType(StructType(Seq(
  StructField("id", StringType),
  StructField("name", StringType),
  StructField("age", IntegerType)
)))

3. Read the file as plain text and parse each line into an array with from_json (this assumes each line contains one complete JSON array):

val jsonFile = spark.read.text("path/to/json/file.json")
val jsonArray = jsonFile.select(from_json(col("value"), schema).as("jsonArray"))

4. Use explode to turn each array element into its own row:

val explodedDF = jsonArray.select(explode(col("jsonArray")).as("json"))

5. Extract the fields of each JSON object into top-level columns:

val resultDF = explodedDF.select(
  col("json.id").as("id"),
  col("json.name").as("name"),
  col("json.age").as("age")
)

6. Write the result out as JSON (note that write.json produces a directory of part files in JSON Lines format, not a single file):

resultDF.write.json("path/to/output/file.json")
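For example, given an input file where one line holds the array below (the records are made-up sample data matching the schema above):

```json
[{"id": "1", "name": "Alice", "age": 30}, {"id": "2", "name": "Bob", "age": 25}]
```

after explode and field extraction, write.json emits one JSON object per line:

```json
{"id":"1","name":"Alice","age":30}
{"id":"2","name":"Bob","age":25}
```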
Complete code example:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .appName("JSON Array Split")
  .getOrCreate()

// Schema: an array whose elements each have id, name, and age fields
val schema = ArrayType(StructType(Seq(
  StructField("id", StringType),
  StructField("name", StringType),
  StructField("age", IntegerType)
)))

// Each input line is expected to hold one complete JSON array
val jsonFile = spark.read.text("path/to/json/file.json")
val jsonArray = jsonFile.select(from_json(col("value"), schema).as("jsonArray"))

// One output row per array element
val explodedDF = jsonArray.select(explode(col("jsonArray")).as("json"))

// Promote the struct fields to top-level columns
val resultDF = explodedDF.select(
  col("json.id").as("id"),
  col("json.name").as("name"),
  col("json.age").as("age")
)

// Writes a directory of part files in JSON Lines format
resultDF.write.json("path/to/output/file.json")
This approach splits a file containing a JSON array into multiple JSON records and extracts the fields of each one. It is useful whenever the elements of a JSON array need further processing or analysis.
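One caveat: if the input file is a single pretty-printed JSON array spanning multiple lines, spark.read.text will never see a complete array on any one line and from_json will return null. A minimal sketch of an alternative, assuming Spark 2.2 or later where the JSON source's multiLine option is available, lets Spark parse the whole file as one document; a top-level array then yields one row per element without from_json or explode:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("JSON Array Split")
  .getOrCreate()

// multiLine makes the JSON reader treat each file as one JSON document;
// a top-level array is automatically unrolled into one row per element,
// with the schema inferred from the data.
val resultDF = spark.read
  .option("multiLine", true)
  .json("path/to/json/file.json")

// Same JSON Lines output as before
resultDF.write.json("path/to/output/file.json")
```

The trade-off is that multiLine parsing cannot be split across workers within a file, so the explicit from_json approach above remains preferable for large line-delimited inputs.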