错误类型:
CSV data source does not support struct<type:tinyint,size:int,indices:array<int>,values:array<double>> data type.
predictPredict.select("user_id", "probability", "label").coalesce(1)
.write.format("com.databricks.spark.csv").mode("overwrite")
.option("header", "true").option("delimiter","\t").option("nullValue", Const.NULL)
.save(fileName.predictResultFile + day)
predictPredict选择probability列保存会出现'`probability`' is of struct<type:tinyint,size:int,indices:array<int>,values:array<double>> type 这个错误, 因为是DenseVector不可以直接报保存到csv文件, 可以有下面两种解决方法: (主要思想是选择DenseVector中预测为1的那一列,类型为double)
/*
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().config("spark.debug.maxToStringFields", 500).enableHiveSupport.appName("QDSpark Pipeline").getOrCreate()
import spark.implicits._
val probabilityDataFrame = predictPredict.select("user_id", "probability", "label").rdd.map( row => (row.getInt(0), row.getAs[DenseVector](1)(1), row.getDouble(2)) ).toDF
probabilityDataFrame.select("_1", "_2", "_3").coalesce(1)
.write.format("com.databricks.spark.csv").mode("overwrite")
.option("header", "true").option("delimiter","\t").option("nullValue", Const.NULL)
.save(fileName.predictResultFile + day)
*/
val stages = new ArrayBuffer[StructField]()
stages += StructField("user_id", IntegerType, true)
stages += StructField("probability", DoubleType, true)
stages += StructField("label", DoubleType, true)
val schema = new StructType( stages.toArray )
val probabilityNewRDD = predictPredict.select("user_id", "probability", "label").rdd.map( row => Row(row.getInt(0), row.getAs[DenseVector](1)(1), row.getDouble(2)) )
val probabilityDataFrame = SparkConfTrait.spark.createDataFrame(probabilityNewRDD, schema)
probabilityDataFrame.select("user_id", "probability", "label").coalesce(1)
.write.format("com.databricks.spark.csv").mode("overwrite")
.option("header", "true").option("delimiter","\t").option("nullValue", Const.NULL)
.save(fileName.predictResultFile + day)
扫码关注腾讯云开发者
领取腾讯云代金券
Copyright © 2013 - 2025 Tencent Cloud. All Rights Reserved. 腾讯云 版权所有
深圳市腾讯计算机系统有限公司 ICP备案/许可证号:粤B2-20090059 深公网安备号 44030502008569
腾讯云计算(北京)有限责任公司 京ICP证150476号 | 京ICP备11018762号 | 京公网安备号11010802020287
Copyright © 2013 - 2025 Tencent Cloud.
All Rights Reserved. 腾讯云 版权所有