Submitting an application to an EC2 cluster with spark-submit

2023-09-11 08:42:34 Author: 男人就该有国际范

I am new to Spark and I am trying to run it on EC2. I followed the tutorial on the Spark website and used spark-ec2 to launch a Spark cluster. Then I tried to use spark-submit to submit the application to the cluster. The command looks like this:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100

However, I got the following error:

ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.

Please let me know how to fix it. Thanks.

Recommended answer

You're seeing this issue because the master node of your Spark standalone cluster can't open a TCP connection back to the driver (on your machine). The default mode of spark-submit is client, which runs the driver on the machine that submitted the job.
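One way to narrow this down is to test TCP reachability in each direction. The following is only a minimal sketch, not part of Spark: the helper name can_connect is hypothetical, and it checks nothing beyond whether a TCP connection can be opened. Run from your machine against the master's host and port 7077, it shows whether you can reach the master; run from the master node against your machine's address and the driver's port, it shows whether the connection back to the driver is blocked. It assumes you know which port the driver listens on (Spark picks a random one unless spark.driver.port is set).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only; it is not a Spark API.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example (not executed here): from your machine, check the master.
# can_connect("ec2-54-88-9-74.compute-1.amazonaws.com", 7077)
```

If the first check succeeds but the check from the master back to your machine fails, that is consistent with the explanation above: a firewall, NAT, or security group is blocking the return path to the driver.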

A new cluster deploy mode was added to spark-submit that submits the job to the master, where the driver is then run inside the cluster instead of on your machine, removing the need for a direct connection. Unfortunately, this mode is not supported in standalone mode.

You can vote for the JIRA issue here: https://issues.apache.org/jira/browse/SPARK-2260

Tunneling your connection via SSH is possible, but latency would be a big issue since the driver would be running locally on your machine.