Spark - Frequent Pattern Mining
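Official documentation: https://spark.apache.org/docs/2.2.0/ml-frequent-pattern-mining.html ..., subsequences, or other substructures is usually the first step in analyzing a large-scale dataset, and it has been an active research topic in data mining in recent years;
Contents: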
FP-Growth
FP-Growth
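The FP-Growth algorithm is based on this paper; "FP" stands for frequent pattern...;
associationRules: the generated association rules whose confidence is at least minConfidence, also returned as a DataFrame;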
transform;
from pyspark.ml.fpm import FPGrowth
...
df = spark.createDataFrame([
(0, [1, 2, 5]),
(1, [1, 2, 3, 5]),
(2, [1, 2])
], ["id", "items"])
fpGrowth = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6)
model = fpGrowth.fit(df)
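
To make the minSupport and minConfidence thresholds concrete, here is a minimal plain-Python sketch that brute-forces frequent itemsets and rules for the same three transactions. It is not Spark's FP-Growth implementation (there is no FP-tree and nothing is distributed), and the helper names `support`, `frequent_itemsets`, and `association_rules` are hypothetical, introduced only for illustration. It assumes rules with a single consequent item and confidence defined as support(antecedent plus consequent) divided by support(antecedent).

```python
from itertools import combinations

# The same three transactions as in the example above.
transactions = [
    {1, 2, 5},
    {1, 2, 3, 5},
    {1, 2},
]

MIN_SUPPORT = 0.5     # mirrors minSupport in the example above
MIN_CONFIDENCE = 0.6  # mirrors minConfidence in the example above


def support(itemset, rows):
    """Fraction of rows that contain every item of `itemset`."""
    return sum(1 for row in rows if itemset <= row) / len(rows)


def frequent_itemsets(rows, min_support):
    """Brute-force enumeration: fine for toy data, exponential in general."""
    items = sorted(set().union(*rows))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), rows)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


def association_rules(freq, min_confidence):
    """Build rules `antecedent -> consequent` with one consequent item each."""
    rules = []
    for itemset, s in freq.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Every subset of a frequent itemset is itself frequent,
            # so the antecedent is guaranteed to be a key of `freq`.
            confidence = s / freq[antecedent]
            if confidence >= min_confidence:
                rules.append((sorted(antecedent), consequent, confidence))
    return rules


freq = frequent_itemsets(transactions, MIN_SUPPORT)
rules = association_rules(freq, MIN_CONFIDENCE)
for antecedent, consequent, confidence in rules:
    print(antecedent, "->", consequent, round(confidence, 2))
```

Item 3 appears in only one of the three transactions (support of about 0.33), so it falls below the 0.5 threshold and takes no part in any frequent itemset or rule. FP-Growth avoids this kind of exhaustive candidate enumeration, which is why the brute-force version is only suitable for toy inputs.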