Amazon Redshift: how to COPY from S3 and set a job_id on each row

2023-09-11 11:12:56 Author: 剑啸九天

Amazon Redshift provides the ability to load table data from S3 objects using the COPY command. Is there a way to use the COPY command but also set an additional "col=CONSTANT" for each inserted row?

I want to set a job_id (which is not in the source data) on each copied row, and I think it would be a shame to have to execute a few million INSERTs just so each row has a job attribute, when COPY gets me 99% of the way there with much better performance.
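
For concreteness, here is a minimal sketch of the trade-off being described; the events table, its payload column, and the S3 path and IAM role are hypothetical placeholders, not details from the question:

-- Per-row INSERTs: each row can carry job_id, but it takes millions of statements.
INSERT INTO events (payload, job_id) VALUES ('first row', 42);
INSERT INTO events (payload, job_id) VALUES ('second row', 42);
-- ...repeated a few million times.

-- A single COPY: one fast, parallel bulk load, but nowhere to pass job_id.
COPY events (payload)
FROM 's3://example-bucket/data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role';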

Maybe there is a more clever solution?

Recommended answer

If you want all the rows added in a single COPY command to have the same value of job_id, then you can COPY the data into a staging table, add a job_id column to that table, and then insert all the data from the staging table into the final table, like:

-- Clone the destination's schema, then drop the column COPY cannot fill.
CREATE TABLE destination_staging (LIKE destination);
ALTER TABLE destination_staging DROP COLUMN job_id;
-- Bulk-load the S3 data into the staging table.
COPY destination_staging FROM 's3://data/destination/(...)' (...);
-- Append job_id; the DEFAULT populates every already-loaded row with the constant.
ALTER TABLE destination_staging ADD COLUMN job_id INT DEFAULT 42;
-- Move everything into the final table in sort-key order, then clean up.
INSERT INTO destination SELECT * FROM destination_staging ORDER BY sortkey_column;
DROP TABLE destination_staging;
ANALYZE destination;
VACUUM destination;
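
A variant of the same idea skips the ALTER TABLE ... ADD COLUMN step by supplying the constant directly in the SELECT; like the answer's SELECT *, this sketch assumes job_id is the last column of destination:

CREATE TABLE destination_staging (LIKE destination);
ALTER TABLE destination_staging DROP COLUMN job_id;
COPY destination_staging FROM 's3://data/destination/(...)' (...);
-- Append the constant inline instead of altering the staging table.
INSERT INTO destination
SELECT s.*, 42 AS job_id
FROM destination_staging s
ORDER BY sortkey_column;
DROP TABLE destination_staging;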

ANALYZE and VACUUM are not strictly necessary, but they are highly recommended: ANALYZE refreshes the statistics the query planner relies on, and VACUUM re-sorts the newly inserted rows into the table's sort order.
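
If you want to check whether either is actually needed, Redshift's SVV_TABLE_INFO system view reports how stale a table's statistics are (stats_off) and what percentage of its rows are unsorted (unsorted); this query is a sketch against the destination table from the answer:

SELECT "table", stats_off, unsorted
FROM svv_table_info
WHERE "table" = 'destination';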