How to efficiently import many large JSON files directly from S3 into MongoDB

2023-09-11 23:43:24 Author: 僞裝旳代名詞

I have compressed JSON files in S3 and I would like to set up MongoDB on EC2 to serve the JSON documents contained in these files. The compressed files are >100M and there are 1000s of them. Each file contains 100000s of small documents.

What is the best way to get this data into Mongo? It would be nicest if there were a way to give Mongo the S3 paths and have it retrieve them itself. Is there anything better than downloading the data to the server and doing mongoimport?

Also, how well does Mongo handle this amount of data?

Recommended answer

You don't need to store intermediate files: you can pipe the contents of the S3 file to stdout and have mongoimport read its input from stdin.

Your full command would look something like:

s3cmd get s3://<yourFilename> - | mongoimport -d <dbName> -c <collectionName>

Note the -, which tells s3cmd to write the file to stdout rather than to a filename.
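
Since the files in the question are compressed, you would likely want to decompress inside the same pipeline, and since there are 1000s of them, a simple loop covers them all. This is a minimal sketch under two assumptions: the files are gzip-compressed, and <bucket>/<prefix> stand in for your actual S3 location.

# Import every gzip-compressed JSON file under a prefix, one at a time.
# s3cmd ls prints one object per line; the fourth field is the s3:// URL.
for f in $(s3cmd ls s3://<bucket>/<prefix>/ | awk '{print $4}'); do
    # Stream the object to stdout, decompress it, and feed mongoimport via stdin
    s3cmd get "$f" - | gunzip | mongoimport -d <dbName> -c <collectionName>
done

Each file is streamed end to end, so nothing is ever written to local disk.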