Exporting a Hive table to an S3 bucket

2023-09-11 08:19:53 Author: 彼岸见花不见叶╮


I've created a Hive Table through an Elastic MapReduce interactive session and populated it from a CSV file like this:

CREATE TABLE csvimport(id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/home/hadoop/file.csv' OVERWRITE INTO TABLE csvimport;
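A quick sanity check after the load (a sketch, using the `csvimport` table defined above) is to sample a few rows and count them:

```sql
-- Preview the first few rows of the newly loaded table
SELECT * FROM csvimport LIMIT 5;

-- Confirm the row count matches expectations for file.csv
SELECT COUNT(*) FROM csvimport;
```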


I now want to store the Hive table in an S3 bucket so the table is preserved once I terminate the MapReduce instance.


Does anyone know how to do this?

Answer


Yes, you have to export and import your data at the start and end of your Hive session.


To do this, you need to create a table that is mapped onto an S3 bucket and directory:

CREATE TABLE csvexport (
  id BIGINT, time STRING, log STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';
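One variant worth knowing (not part of the original answer, but standard Hive behavior): declaring the table `EXTERNAL` means that dropping it in Hive removes only the metadata and leaves the files in S3 untouched, which fits the goal of preserving data across clusters:

```sql
-- EXTERNAL: DROP TABLE removes only the table definition;
-- the data files in S3 are left in place
CREATE EXTERNAL TABLE csvexport (
  id BIGINT, time STRING, log STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';
```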


Insert data into the S3 table; when the insert is complete, the directory will contain a CSV file:

INSERT OVERWRITE TABLE csvexport
SELECT id, time, log
FROM csvimport;


Your table is now preserved, and when you create a new Hive instance you can reimport your data.
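The reimport on a fresh cluster can be sketched like this (same placeholder bucket path as above): recreating the table definition over the existing S3 location makes Hive pick up the files already there, and you can then copy the rows back into a local table if you want faster queries:

```sql
-- On the new cluster: recreate the definition pointing at the same
-- S3 location; Hive reads the data files already stored there
CREATE TABLE csvexport (
  id BIGINT, time STRING, log STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

-- Optionally copy the data back into an HDFS-backed table
CREATE TABLE csvimport (id BIGINT, time STRING, log STRING);
INSERT OVERWRITE TABLE csvimport
SELECT id, time, log FROM csvexport;
```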


Your table can be stored in a few different formats, depending on where you want to use it.
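For instance, the `STORED AS` clause selects the file format; `SEQUENCEFILE` is one of Hive's built-in options alongside `TEXTFILE` (the table name and S3 path below are hypothetical, for illustration only):

```sql
-- Same column layout, but stored as a binary SequenceFile
-- instead of delimited text
CREATE TABLE csvexport_seq (
  id BIGINT, time STRING, log STRING
)
STORED AS SEQUENCEFILE
LOCATION 's3n://bucket/directory_seq/';
```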