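I understand that the DataFrame is being encoded in the code below, but I don't understand why StringIndexer is used. Is StringIndexer supposed to be used together with OneHotEncoderEstimator?
import org.apache.spark.ml.feature.{OneHotEncoderEstimator, StringIndexer}
val si = new StringIndexer()
  .setHandleInvalid("keep")
  .setInputCol(ProcuctTypeCol)
  .setOutputCol(ProcuctTypeSIOutCol)
val ohe = new OneHotEncoderEstimator()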
  .setOutputCol("features")
val labelIndexer = new StringIndexer
...
Array(tokenizer, countVectorizer, labelIndexer, labelEncoder, logisticRegression))
cv: org.apache.spark.ml.tuning.CrossValidator
java.lang.IllegalArgumentE
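On why StringIndexer comes first: in Spark 2.3/2.4, OneHotEncoderEstimator takes columns of numeric category indices, not raw strings, so a string column is usually mapped to indices with StringIndexer before one-hot encoding. The plain-Python sketch below only mimics those two steps to show the idea; it is not Spark's API. The function names and sample values are hypothetical. Breaking frequency ties alphabetically is an assumption made for determinism. The dense lists stand in for the sparse vectors Spark actually produces.

```python
from collections import Counter


def fit_string_indexer(values):
    """Mimic StringIndexer's default 'frequencyDesc' ordering: the most
    frequent string gets index 0.0. Ties are broken alphabetically here
    (an assumption of this sketch)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform_string_indexer(mapping, values, handle_invalid="error"):
    """Map strings to indices. With handle_invalid="keep", unseen strings get
    the extra index len(mapping), similar in spirit to setHandleInvalid("keep")."""
    out = []
    for v in values:
        if v in mapping:
            out.append(mapping[v])
        elif handle_invalid == "keep":
            out.append(float(len(mapping)))
        else:
            raise ValueError(f"Unseen label: {v!r}")
    return out


def one_hot(index, num_categories, drop_last=True):
    """Turn a category index into a dense 0/1 list. With drop_last=True
    (OneHotEncoderEstimator's default), the last category encodes as all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    i = int(index)
    if i < size:
        vec[i] = 1.0
    return vec


# Hypothetical training values for a product-type column.
train = ["book", "toy", "book", "food", "book", "toy"]
mapping = fit_string_indexer(train)  # {'book': 0.0, 'toy': 1.0, 'food': 2.0}

# "garden" was never seen during fitting, so "keep" sends it to index 3.0.
indices = transform_string_indexer(mapping, ["toy", "food", "garden"], handle_invalid="keep")

# One extra slot for the "keep" bucket; drop_last then makes it all zeros.
num_categories = len(mapping) + 1
vectors = [one_hot(i, num_categories) for i in indices]
# vectors == [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

The point of the two stages is that the encoder only needs to know how many categories there are and which index each row holds. Keeping string handling (ordering, unseen values) inside the indexer lets the encoder work purely on numbers.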
I am trying to run a random forest classifier with pyspark ml (Spark 2.4.0), using OHE to encode the target label.
#%%
import pyspark.sql.functions as F
from pyspark.ml.feature import StringIndexer, OneHotEncoderEstimator
...
'buy',1000,100)],schema=("id","date","transaction",...
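On encoding the target: pyspark.ml's RandomForestClassifier reads `labelCol` as a single numeric column of class indices (0.0, 1.0, ...), so a one-hot vector is generally not what it expects as a label. A common pattern is to apply StringIndexer to the label column and, after prediction, map indices back to strings with IndexToString using the fitted indexer's labels. The plain-Python sketch below only mimics that round trip. The function names and the "sell"/"hold" values are hypothetical; only "buy" comes from the snippet. Alphabetical tie-breaking is an assumption of this sketch.

```python
from collections import Counter


def fit_label_indexer(labels):
    """Return the labels ordered most-frequent-first; a label's position in
    this list is its class index. Ties are broken alphabetically (assumption)."""
    counts = Counter(labels)
    return sorted(counts, key=lambda lab: (-counts[lab], lab))


def encode_labels(ordered, labels):
    """Replace each string label with its numeric class index (a float),
    which is the shape a classifier's label column expects."""
    lookup = {lab: float(i) for i, lab in enumerate(ordered)}
    return [lookup[lab] for lab in labels]


def decode_predictions(ordered, predictions):
    """Inverse mapping from predicted indices back to strings, analogous to
    what IndexToString does for a prediction column."""
    return [ordered[int(p)] for p in predictions]


# Hypothetical transaction labels.
train_labels = ["buy", "sell", "buy", "hold"]
ordered = fit_label_indexer(train_labels)      # ['buy', 'hold', 'sell']
y = encode_labels(ordered, train_labels)       # [0.0, 2.0, 0.0, 1.0]
num_classes = len(ordered)                     # 3

# Suppose a model predicted these class indices.
decoded = decode_predictions(ordered, [2.0, 0.0])  # ['sell', 'buy']
```

One-hot encoding remains useful for categorical feature columns. For the label, a single index per row is enough, because the classifier derives the number of classes from it.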