Amazon Elastic MapReduce bootstrap actions not working

2023-09-11 10:57:48 · Author: 烟雨


I have tried the following combinations of bootstrap actions to increase the heap size of my job but none of them seem to work:

--mapred-key-value mapred.child.java.opts=-Xmx1024m 
--mapred-key-value mapred.child.ulimit=unlimited

--mapred-key-value mapred.map.child.java.opts=-Xmx1024m 
--mapred-key-value mapred.map.child.ulimit=unlimited

-m mapred.map.child.java.opts=-Xmx1024m
-m mapred.map.child.ulimit=unlimited 

-m mapred.child.java.opts=-Xmx1024m 
-m mapred.child.ulimit=unlimited 


What is the right syntax?

Recommended answer


You have two options to achieve this:


In order to apply custom settings, you might want to have a look at the Bootstrap Actions documentation for Amazon Elastic MapReduce (Amazon EMR), specifically the Configure Daemons action:


This predefined bootstrap action lets you specify the heap size or other Java Virtual Machine (JVM) options for the Hadoop daemons. You can use this bootstrap action to configure Hadoop for large jobs that require more memory than Hadoop allocates by default. You can also use this bootstrap action to modify advanced JVM options, such as garbage collection behavior.


An example is provided as well, which sets the heap size to 2048 and configures the Java namenode option:

$ ./elastic-mapreduce --create --alive \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons \
  --args --namenode-heap-size=2048,--namenode-opts=-XX:GCTimeRatio=19   
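Note that Configure Daemons only adjusts the JVMs of the Hadoop daemons themselves. To raise the heap of the task (child) JVMs, which is what the question's `mapred.child.java.opts` attempts target, the separate configure-hadoop bootstrap action accepts the `-m`/`--mapred-key-value` pairs, but as comma-separated `--args` rather than as bare CLI flags. A hedged sketch, assuming the configure-hadoop action path and `-m` flag as documented for the legacy `elastic-mapreduce` CLI:

```shell
# Sketch (untested): pass site overrides to the configure-hadoop bootstrap
# action via --args; each element is comma-separated, so "-m" and its
# key=value pair are separate list items rather than shell arguments.
$ ./elastic-mapreduce --create --alive \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-m,mapred.child.java.opts=-Xmx1024m,-m,mapred.child.ulimit=unlimited"
```

This is likely why the combinations in the question had no effect: the `-m` options are arguments to the bootstrap action, not options of `elastic-mapreduce` itself.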



Predefined JVM Settings

Alternatively, as per the FAQ "How do I configure Hadoop settings for my job flow?", if your job flow tasks are memory-intensive, you may choose to use fewer tasks per core and reduce your job tracker heap size. For this situation, a predefined bootstrap action is available to configure your job flow on startup; this refers to the Configure Memory-Intensive Workloads action, which allows you to set cluster-wide Hadoop settings to values appropriate for job flows with memory-intensive workloads, for example:

$ ./elastic-mapreduce --create \
--bootstrap-action \
  s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive
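The two options can also be combined, since a single `--create` call accepts multiple `--bootstrap-action` flags. A hedged sketch, assuming the legacy CLI's action paths and ordering semantics (later actions override earlier ones; verify against the current docs):

```shell
# Sketch (untested): apply the memory-intensive preset first, then override
# one individual setting with a second configure-hadoop bootstrap action.
$ ./elastic-mapreduce --create \
  --bootstrap-action \
    s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-m,mapred.child.java.opts=-Xmx1024m"
```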

The specific configuration settings applied by this predefined bootstrap action are listed under Hadoop Memory-Intensive Configuration Settings.

Good luck!