Ridiculously slow writes to Amazon DynamoDB (PHP API)

2023-09-11 23:55:27 Author: 我是一个祸害者

This question has already been posted on the AWS forums but remains unanswered: https://forums.aws.amazon.com/thread.jspa?threadID=94589

I'm trying to perform an initial upload of a long list of short items (about 120 million of them), to retrieve them later by unique key, and it seems like a perfect case for DynamoDB.

However, my current write speed is very slow (roughly 8-9 seconds per 100 writes), which makes the initial upload almost impossible: at roughly 12 items per second, 120 million items work out to about 10 million seconds, i.e. roughly 3-4 months at the current pace.

I have read the AWS forums looking for an answer and have already tried the following things:

I switched from single "put_item" calls to batch writes of 25 items (the recommended maximum batch write size), and each of my items is smaller than 1 KB (which is also recommended). It is quite typical for even 25 of my items together to be under 1 KB, but that is not guaranteed (and it shouldn't matter anyway, since as I understand it only the size of a single item is important for DynamoDB).
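
For reference, this is roughly what one of those batch calls looks like (a minimal sketch in the 1.x sdk.class.php style; the include path, the table name "my_items" and the attributes "id" / "data" are placeholders, not my real ones):

    <?php
    // Minimal sketch of one 25-item batch write with the 1.x PHP SDK.
    // Attribute values use the raw 'S' (string) wire format.
    require_once 'AWSSDKforPHP/sdk.class.php'; // include path may differ

    $dynamodb = new AmazonDynamoDB(); // credentials from the SDK config file;
                                      // region / SSL tweaks are shown further below

    $requests = array();
    for ($i = 0; $i < 25; $i++) {
        $requests[] = array(
            'PutRequest' => array(
                'Item' => array(
                    'id'   => array('S' => 'item-' . $i),    // unique key
                    'data' => array('S' => 'short payload'), // well under 1 KB
                ),
            ),
        );
    }

    $response = $dynamodb->batch_write_item(array(
        'RequestItems' => array('my_items' => $requests),
    ));

    // Anything the service could not store comes back in UnprocessedItems
    // and would have to be retried.
    var_dump($response->isOK());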

I use the recently introduced EU region (I'm in the UK), specifying its endpoint directly by calling set_region('dynamodb.eu-west-1.amazonaws.com'), as there is apparently no other way to do that in the PHP API. The AWS console shows that the table is in the proper region, so that works.

I have disabled SSL by calling disable_ssl() (gaining about 1 second per 100 records).
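
So the client setup currently amounts to something like this (again a sketch; the include path and credential handling are assumptions):

    <?php
    require_once 'AWSSDKforPHP/sdk.class.php'; // include path may differ

    $dynamodb = new AmazonDynamoDB(); // credentials from the SDK config file

    // The SDK apparently offers no other way to select the EU region, so the
    // endpoint is passed directly; disabling SSL saves roughly 1 second per
    // 100 records for me.
    $dynamodb->set_region('dynamodb.eu-west-1.amazonaws.com');
    $dynamodb->disable_ssl();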

Still, a test set of 100 items (4 batch write calls of 25 items each) never takes less than 8 seconds to index. Every batch write request takes about 2 seconds, so it's not as if the first one is instant and the subsequent requests are then slow.
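
For what it's worth, this is how I measure it (a sketch reusing the placeholder client and item layout from above):

    <?php
    // Write 100 placeholder items as 4 batches of 25 and time each call;
    // for me every batch takes about 2 seconds, including the very first one.
    $overall = microtime(true);

    for ($batch = 0; $batch < 4; $batch++) {
        $requests = array();
        for ($i = 0; $i < 25; $i++) {
            $requests[] = array('PutRequest' => array('Item' => array(
                'id'   => array('S' => sprintf('item-%03d', $batch * 25 + $i)),
                'data' => array('S' => 'short payload'),
            )));
        }

        $start = microtime(true);
        $response = $dynamodb->batch_write_item(array(
            'RequestItems' => array('my_items' => $requests),
        ));
        printf("batch %d: %.2f s (ok=%d)\n", $batch, microtime(true) - $start, $response->isOK());
    }

    printf("100 items total: %.2f s\n", microtime(true) - $overall);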

My table's provisioned throughput is 100 write and 100 read units, which should be plenty so far (with items under 1 KB, 100 write units allow roughly 100 writes per second, so a 100-item test should finish in about a second if throughput were the limit). I tried higher limits as well just in case, with no effect.

I also know that there is some overhead in request serialisation, so I could probably use a queue to "accumulate" my requests, but does that really matter that much for batch_writes? And I don't think that is the problem, because even a single request takes too long.

I found that some people modify the cURL headers (the "Expect:" header in particular) in the API to speed up requests, but I don't think that is a proper way to do it, and the API has also been updated since that advice was posted.
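
For completeness, what those people seem to be doing boils down to suppressing cURL's default "Expect: 100-continue" handshake on POSTs, which costs an extra round trip per request. On a bare cURL handle that looks like the sketch below (illustration only: the URL is a placeholder, and a real DynamoDB call would still need the signed headers the SDK normally adds, so in practice the header would have to be overridden inside the SDK's request layer):

    <?php
    // Illustration only: send a POST without the "Expect: 100-continue" handshake.
    $ch = curl_init('https://dynamodb.eu-west-1.amazonaws.com/'); // placeholder URL
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, '{}');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Expect:')); // empty value disables 100-continue
    $body = curl_exec($ch);
    curl_close($ch);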

The server my application is running on is fine as well - I've read that sometimes the CPU load goes through the roof, but in my case everything is fine; it's just the network request that takes too long.

I'm stuck now - is there anything else I can try? Please feel free to ask for more information if I haven't provided enough.

There are other recent threads, apparently on the same problem, here (no answer so far though).

This service is supposed to be ultra-fast, so I'm really puzzled to be hitting this problem at the very beginning.

Recommended answer

If you're uploading from your local machine, the speed will be impacted by all sorts of traffic / firewalls etc. between you and the servers. When I call DynamoDB, each request takes 0.3 of a second simply because of the time it takes to travel to/from Australia.

My suggestion would be to create yourself an EC2 instance (server) with PHP, upload the script and all files to the EC2 server as a block, and then do the dump from there. The EC2 server should have blistering speed to the DynamoDB server.

If you're not confident about setting up EC2 with LAMP yourself, they have a new service, "Elastic Beanstalk", that can do it all for you. When you've completed the upload, simply burn the server - and hopefully you can do all that within their "free tier" pricing structure :)

It doesn't solve the long-term connectivity issues, but it will shorten the three-month upload!

 