R segue createCluster() problem

2023-09-12 21:27:37 Author: 谁惊艳了谁的时光

I'm trying to create a cluster on EC2. I have an account set up and validated with AWS. I have successfully downloaded and installed the segue package and its related packages, and set my AWS credentials. My problem starts when I try to create a cluster; I get the following:

> library(segue)
Loading required package: rJava
Loading required package: caTools
Loading required package: bitops
Segue did not find your AWS credentials. Please run the setCredentials() function.
> setCredentials('', '') #keys hidden

> myCluster <- createCluster(numInstances=5)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  com.amazonaws.AmazonClientException: Can't turn bucket name into a URI: Illegal character in authority at index 8: https://c:\users\backup~1\appdata\local\temp\rtmp4u0n8yqaaoducils-segue.s3.amazonaws.com

Any ideas?

Recommended answer

acesnap, I'm the author of Segue and I can say with confidence that the issue you're running into is that the Segue package has not been implemented to run on the Windows platform. I suspect the cause is that Windows does funny things with file paths, temp files, and the like. The server side of the Segue package is always the Amazon Elastic MapReduce service, which runs Linux, but temporary files are built on the client machine, so Segue must talk nice with the local operating system.
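To make the file-path point concrete, here is a minimal R sketch of how a temp path can leak into a bucket name. This is not Segue's actual source. It assumes, based only on the error message in the question, that the bucket name is taken from the last component of a local temp path; the helper names `last_component_slash_only`, `last_component_portable`, and `is_hostname_safe`, and the Unix example path, are hypothetical.

```r
# Hypothetical illustration only; NOT taken from the Segue source code.
# Assumption: the bucket name is the last component of a local temp path.

# Split on "/" only, as code written for Linux or Mac might do.
last_component_slash_only <- function(path) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  tolower(parts[length(parts)])
}

# Split on both "/" and "\" so Windows paths are handled too.
last_component_portable <- function(path) {
  parts <- strsplit(path, "[/\\\\]")[[1]]
  tolower(parts[length(parts)])
}

# Rough check that a name can be used as the host part of a URL:
# lowercase letters, digits, dots and hyphens only.
is_hostname_safe <- function(name) {
  grepl("^[a-z0-9][a-z0-9.-]*[a-z0-9]$", name)
}

# The Windows path is the one visible in the error message above.
windows_path <- "c:\\users\\backup~1\\appdata\\local\\temp\\rtmp4u0n8yqaaoducils-segue"
# A made-up Unix equivalent for comparison.
unix_path <- "/tmp/rtmp4u0n8yqaaoducils-segue"

print(is_hostname_safe(last_component_slash_only(unix_path)))
print(is_hostname_safe(last_component_slash_only(windows_path)))
print(is_hostname_safe(last_component_portable(windows_path)))
```

With the slash-only split, the whole Windows path survives as the "last component", which matches the shape of the URL in the error message. Splitting on both separators leaves only the final directory name.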

There are several work-arounds I can think of:

1. Set up VirtualBox on your local machine and get Ubuntu and R installed.

2. Set up an EC2 machine, install R and Segue on it, and then use that machine to fire off Segue jobs.

3. Buy a Mac or install Linux on a desktop machine (kinda obvious, I guess).

Even though my desktop machines are Mac and Linux, I use #2 above frequently. I do this because it speeds up the communication between the machine running Segue and the backend cluster. It also reduces the probability that the Segue main machine will lose connectivity to the EMR backend. This is valuable because if communication is lost between Segue and the Amazon cloud while a job is running, the job will still run on the cloud cluster but will have no way of returning results to the Segue main machine (the machine you submit jobs from).