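Removing null and NaN
To remove null and NaN values from a DataFrame, use drop. dataframe.na gives access to the missing-value functions (DataFrameNaFunctions), and its drop method removes every row that contains a null or NaN: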
import org.apache.spark...sentenceDataFrame = spark.createDataFrame(Seq(
(1, "asf"),
(2, "2143"),
(3, "rfds"),
(4, null..."label", "sentence")
sentenceDataFrame.show()
sentenceDataFrame.na.drop().show()
}
}
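The no-argument na.drop() above drops a row if any column is null or NaN. DataFrameNaFunctions.drop also has overloads that change when a row counts as missing: how ("any" or "all"), a subset of columns, and a minNonNulls threshold. The sketch below is illustrative only. It assumes a small made-up DataFrame with an extra Double column named score (not in the original) so that NaN actually appears. The object name NaDropVariants and the local master setting are also hypothetical. For floating-point columns, na.drop treats NaN the same way as null.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NaDropVariants {
  // Hypothetical sample data: a nullable String column plus a Double column
  // containing NaN, so both kinds of "missing" value are present.
  def sample(spark: SparkSession): DataFrame =
    spark.createDataFrame(Seq(
      (1, "asf", 0.5),
      (2, null, 1.0),
      (3, "rfds", Double.NaN),
      (4, null, Double.NaN)
    )).toDF("label", "sentence", "score")

  // Row count left by each na.drop variant, keyed by a short description.
  def counts(df: DataFrame): Seq[(String, Long)] = Seq(
    // Default ("any"): drop a row if any column is null or NaN.
    "any" -> df.na.drop().count(),
    // "all": drop a row only if every column is null or NaN.
    "all" -> df.na.drop("all").count(),
    // Only the listed column(s) are checked when deciding what to drop.
    "sentence only" -> df.na.drop(Seq("sentence")).count(),
    // Drop only when both sentence and score are missing.
    "all of sentence, score" -> df.na.drop("all", Seq("sentence", "score")).count(),
    // Keep rows that have at least 2 non-null, non-NaN values.
    "minNonNulls = 2" -> df.na.drop(2).count()
  )

  def main(args: Array[String]): Unit = {
    // local[*] is for experimenting on one machine; omit .master(...) under spark-submit.
    val spark = SparkSession.builder()
      .appName("NaDropVariants")
      .master("local[*]")
      .getOrCreate()
    counts(sample(spark)).foreach { case (name, n) => println(s"$name: $n rows") }
    spark.stop()
  }
}
```

With this sample data, na.drop() keeps only label 1. na.drop("all") keeps all four rows because label is never null. Restricting the check to sentence keeps labels 1 and 3, and both the "all of sentence, score" and minNonNulls = 2 variants drop only label 4.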
Removing empty strings
... To remove empty strings, use dataframe.where to keep only the rows whose value is not an empty string:
import org.apache.spark....sentenceDataFrame = spark.createDataFrame(Seq(
(1, "asf"),
(2, "2143"),
(3, "rfds"),
(4, null