I am new to Spark and I am trying to run it on EC2. I followed the tutorial on the Spark webpage, using spark-ec2 to launch a Spark cluster. Then I tried to use spark-submit
to submit the application to the cluster. The command looks like this:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100
However, I got the following error:
ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
Please let me know how to fix it. Thanks.
You're seeing this issue because the master node of your spark-standalone cluster can't open a TCP connection back to the driver (on your machine). The default deploy mode of spark-submit
is client, which runs the driver on the machine that submitted the job.
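For reference, the default behaviour is equivalent to passing `--deploy-mode client` explicitly. A minimal sketch, reusing the master URL and example jar from the question:

```shell
# Client deploy mode (the default): the driver runs here, on the
# submitting machine, and the cluster must be able to reach it.
./bin/spark-submit \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 \
  ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100
```

This only succeeds if the master and workers can open connections back to your machine, which is exactly what fails across the EC2 boundary here.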
A new cluster deploy mode was added to spark-submit that hands the job to the master, which then runs the driver inside the cluster, removing the need for a direct connection back to your machine. Unfortunately, this mode is not supported for standalone clusters.
You can vote for the JIRA issue here: https://issues.apache.org/jira/browse/SPARK-2260
Tunneling your connection via SSH is possible but latency would be a big issue since the driver would be running locally on your machine.
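A simpler workaround, if you have SSH access to the master, is to copy the jar there and run spark-submit on the master itself, so the driver lives inside the cluster's network. A sketch, where the key file name and remote Spark install path are assumptions (adjust to your spark-ec2 setup):

```shell
# Copy the application jar to the master node (key file is an assumption).
scp -i your-key.pem \
  ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar \
  root@ec2-54-88-9-74.compute-1.amazonaws.com:~/

# Submit from the master, so the driver runs inside the cluster's
# network and no connection back to your laptop is needed.
ssh -i your-key.pem root@ec2-54-88-9-74.compute-1.amazonaws.com \
  './spark/bin/spark-submit \
     --class org.apache.spark.examples.SparkPi \
     --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 \
     ~/spark-examples_2.10-1.0.0.jar 100'
```

This sidesteps both the connectivity problem and the latency cost of tunneling, at the price of having to ship the jar to the master first.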