FPGrowth algorithm in Spark

2023-09-11 05:52:29 Author: 小怪兽爱上了奥特曼


I am trying to run an example of the FPGrowth algorithm in Spark; however, I am running into an error. This is my code:

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.fpm.{FPGrowth, FPGrowthModel}

val transactions: RDD[Array[String]] = sc.textFile("path/transations.txt").map(_.split(" ")).cache()

val fpg = new FPGrowth().setMinSupport(0.2).setNumPartitions(10)

val model = fpg.run(transactions)

model.freqItemsets.collect().foreach { itemset => println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)}


The code works up until the last line, where I get the error:

WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 16, ip-10-0-0-###.us-west-1.compute.internal): 
com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Can not set 
final scala.collection.mutable.ListBuffer field org.apache.spark.mllib.fpm.FPTree$Summary.nodes to scala.collection.mutable.ArrayBuffer
Serialization trace:
nodes (org.apache.spark.mllib.fpm.FPTree$Summary)


I have even tried to use the solution that was proposed here: SPARK-7483


I haven't had any luck with that either. Has anyone found a solution to this? Or does anyone know of a way to just view the results, or save them to a text file?


Any help would be greatly appreciated!


I also found the full source code for this algorithm - http://mail-archives.apache.org/mod_mbox/spark-commits/201502.mbox/%3C1cfe817dfdbf47e3bbb657ab343dcf82@git.apache.org%3E

Recommended answer


I got the same error. It is caused by the Spark version: this is fixed in Spark 1.5.2, but I was using 1.3. I worked around it by doing the following:


I switched from using spark-shell to spark-submit and then changed the Kryo serializer configuration. Here is my code:


import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.fpm.FPGrowth
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.ListBuffer


object fpgrowth {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark FPGrowth")
    // Register the buffer classes involved in the serialization error with Kryo
    conf.registerKryoClasses(Array(classOf[ArrayBuffer[String]], classOf[ListBuffer[String]]))

    val sc = new SparkContext(conf)

    val data = sc.textFile("<path to file.txt>")

    val transactions: RDD[Array[String]] = data.map(s => s.trim.split(' '))

    val fpg = new FPGrowth()
      .setMinSupport(0.2)
      .setNumPartitions(10)
    val model = fpg.run(transactions)

    model.freqItemsets.collect().foreach { itemset =>
      println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
    }
  }
}
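On the question's other ask, viewing or saving the results: the frequent itemsets can be formatted as strings and written out. Here is a minimal sketch of the formatting step; the `Itemset` case class and the sample data are stand-ins for `model.freqItemsets.collect()` so the logic can be shown without a running SparkContext, and the output path in the final comment is only an example.

```scala
// Stand-in for org.apache.spark.mllib.fpm.FPGrowth.FreqItemset,
// used here so the formatting can run without Spark.
case class Itemset(items: Array[String], freq: Long)

// Hypothetical sample itemsets, in place of model.freqItemsets.collect()
val itemsets = Seq(Itemset(Array("a", "b"), 3L), Itemset(Array("c"), 5L))

// Same formatting the answer prints to the console, e.g. "[a,b], 3"
val lines = itemsets.map(i => i.items.mkString("[", ",", "]") + ", " + i.freq)
lines.foreach(println)

// With a real model, the same mapping can be applied to the RDD and
// written to disk directly instead of collecting to the driver:
// model.freqItemsets
//   .map(i => i.items.mkString("[", ",", "]") + ", " + i.freq)
//   .saveAsTextFile("output/freq-itemsets")
```

Mapping the RDD and calling `saveAsTextFile` avoids pulling every itemset onto the driver, which matters once the set of frequent itemsets gets large.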