Where is my AWS EMR reducer output for my completed job (should be on S3, but nothing is there)?

2023-09-11 08:40:00 by 旧梦南苑


I'm having an issue where the output of my Hadoop job on AWS's EMR is not being saved to S3. When I run the job on a smaller sample, it stores the output just fine. When I run the same command on my full dataset, the job again completes, but nothing exists on S3 at the location I specified for my output.
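For anyone wanting to double-check the same thing, a minimal sketch along these lines (using Hadoop's FileSystem API; the bucket and directory names are just the placeholders from my logs below) lists whatever actually landed at the output path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckS3Output {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path output = new Path("s3://myS3Bucket/output/myOutputDirFinal/");
        FileSystem fs = output.getFileSystem(conf);  // resolves the s3:// scheme
        if (!fs.exists(output)) {
            System.out.println("Output directory does not exist at all.");
            return;
        }
        for (FileStatus status : fs.listStatus(output)) {
            // Each reducer should have left one part-XXXXX file here.
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
    }
}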


Apparently there was a bug with AWS EMR in 2009, but it was "fixed".


Anyone else ever have this problem? I still have my cluster online, hoping that the data is buried on the servers somewhere. If anyone has an idea where I can find this data, please let me know!


Update: When I look at the logs from one of the reducers, everything looks fine:

2012-06-23 11:09:04,437 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Creating new file 's3://myS3Bucket/output/myOutputDirFinal/part-00000' in S3
2012-06-23 11:09:04,439 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Outputstream for key 'output/myOutputDirFinal/part-00000' writing to tempfile '/mnt1/var/lib/hadoop/s3/output-3834156726628058755.tmp'
2012-06-23 11:50:26,706 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Outputstream for key 'output/myOutputDirFinal/part-00000' is being closed, beginning upload.
2012-06-23 11:50:26,958 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Outputstream for key 'output/myOutputDirFinal/part-00000' upload complete
2012-06-23 11:50:27,328 INFO org.apache.hadoop.mapred.Task (main): Task:attempt_201206230638_0001_r_000000_0 is done. And is in the process of commiting
2012-06-23 11:50:29,927 INFO org.apache.hadoop.mapred.Task (main): Task 'attempt_201206230638_0001_r_000000_0' done.


When I connect to this task's node, the temp directory mentioned is empty.


Update 2: After reading Difference between Amazon S3 and S3n in Hadoop, I'm wondering if my problem is using "s3://" instead of "s3n://" as my output path. In both my small sample (which stores fine) and my full job, I used "s3://". Any thoughts on whether this could be my problem?
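For context, the scheme only shows up in one place in a classic MapReduce driver. This isn't my actual code, just a sketch of where one would swap s3:// for s3n:// (class and path names are illustrative, Hadoop 0.20/1.x-style setup as on 2012-era EMR):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "my-job");
        job.setJarByClass(Driver.class);
        // Mapper/reducer setup omitted; the point is only the output scheme below.
        FileInputFormat.addInputPath(job, new Path("s3n://myS3Bucket/input/"));
        // s3n:// instead of s3:// -- though per Update 3, on EMR both resolve
        // to the same native S3 filesystem anyway.
        FileOutputFormat.setOutputPath(job, new Path("s3n://myS3Bucket/output/myOutputDirFinal/"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}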


Update 3: I see now that on AWS's EMR, s3:// and s3n:// both map to the S3 native file system (AWS EMR documentation).


Update 4: I re-ran this job two more times, each time increasing the number of servers and reducers. The first of these two runs finished with 89 of 90 reducer outputs copied to S3. The 90th said it copied successfully according to the logs, but AWS Support says the file is not there. They've escalated this problem to their engineering team. My second run, with even more reducers and servers, actually finished with all data copied to S3 (thankfully!). One oddity, though, is that some reducers take FOREVER to copy their data to S3: in both of these new runs, there was a reducer whose output took 1 or 2 hours to copy to S3, whereas the other reducers took 10 minutes at most (the files are 3GB or so). I think this relates to something wrong with the S3NativeFileSystem used by EMR (e.g. the long hanging, which I'm getting billed for, of course, and the alleged successful uploads that never actually upload). I'd upload to local HDFS first and then to S3, but I was having issues on that front as well (pending the AWS engineering team's review).
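For the HDFS-first workaround I mentioned, the follow-up copy step would look roughly like this. A sketch with placeholder paths; FileUtil.copy keeps the example self-contained, though a DistCp step would be the usual choice for outputs this large:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class HdfsToS3 {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path hdfsOut = new Path("hdfs:///user/hadoop/myOutputDirFinal");
        Path s3Out = new Path("s3://myS3Bucket/output/myOutputDirFinal");

        FileSystem hdfs = hdfsOut.getFileSystem(conf);
        FileSystem s3 = s3Out.getFileSystem(conf);

        // Recursively copies the directory; 'false' keeps the HDFS copy so the
        // step can be retried if S3 silently drops a part file again.
        boolean ok = FileUtil.copy(hdfs, hdfsOut, s3, s3Out, false, conf);
        System.out.println(ok ? "copy complete" : "copy failed");
    }
}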


TLDR; Using AWS EMR to store directly on S3 seems buggy; their engineering team is looking into it.

Accepted Answer


This turned out to be a bug on AWS's part, and they've fixed it in the latest AMI version 2.2.1, briefly described in these release notes.
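If you launch job flows programmatically with the AWS SDK for Java, you can pin the AMI version so the fix is actually picked up. A rough sketch, not my actual configuration (the name, instance types, and counts below are just placeholders):

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

public class LaunchFixedAmi {
    public static void main(String[] args) {
        AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient();
        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("my-job-flow")
                .withAmiVersion("2.2.1")  // AMI carrying the S3 upload fix
                .withInstances(new JobFlowInstancesConfig()
                        .withInstanceCount(10)
                        .withMasterInstanceType("m1.large")
                        .withSlaveInstanceType("m1.large"));
        RunJobFlowResult result = emr.runJobFlow(request);
        System.out.println("Started job flow: " + result.getJobFlowId());
    }
}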


The long explanation I got from AWS is that when the reducer files exceed the block limit for S3 (i.e. 5GB?), multipart upload is used, but there was no proper error-checking going on, which is why it would sometimes work and other times not.
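Relatedly, EMR documented a Hadoop property for disabling multipart uploads outright, which sidesteps this code path as long as each output object stays under the single-PUT size limit. A sketch, assuming the property name from the EMR docs of that era (verify it against your AMI before relying on it):

import org.apache.hadoop.conf.Configuration;

public class DisableMultipart {
    public static Configuration configure() {
        Configuration conf = new Configuration();
        // Forces single-PUT uploads to S3; only viable while each output file
        // stays under the 5GB limit that triggers multipart in the first place.
        conf.setBoolean("fs.s3n.multipart.uploads.enabled", false);
        return conf;
    }
}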


In case this continues for anyone else, refer to my case number, 62849531.