对于AWS室壁运动ShardIteratorType TRIM_HORIZON预期的行为行为、AWS、ShardIteratorType、TRIM_HORIZON

2023-09-11 11:12:06 作者:花漫九州

上下文:我不一定是指一个KCL的应用程序,只是单纯的Kinesis API调用

是否使用 TRIM_HORIZON 碎片迭代器类型,马上给你流中的最早发表的记录(即最早可在室壁运动内置的24小时窗口),或者干脆一个迭代器/多达24小时前光标一段时间内,你必须再使用沿着小溪前进,直到你击中最早发表记录?

换句话说,如果这不是很清楚......

在使用 TRIM_HORIZON 的碎片迭代器类型,是预期的行为,这将首先返回在24小时前可用的记录,但如果零记录被公布正好是24小时前,而是只有3个小时前,你的应用程序将需要通过previous21小时迭代查询才达到公布的4小时前?记录

时间轴例如:

在9月29日上午5:00 - 1分片创建一个流福 在9月29日上午5时02分 - 发布的一条记录,项目= A,以富流 在9月29日上午05时03分 - 第一个 GetShardIterator TRIM_HORIZON 调用作为碎片迭代器类型,那么问题一个 GetRecords 与碎片迭代器调用和接收记录项= A 在9月30日7:02 - 发布第二记录,项目= B,以富流 在9月30日上午7点03分 - 第一个 GetShardIterator TRIM_HORIZON 调用作为碎片迭代器类型,那么问题一个 GetRecords 与碎片迭代器调用。 我应该期望从这个调用的结果 (注:我们不记得/从第3步重新使用碎片迭代器)?的

有关上述步骤5中,它是自档案= B的出版已经超过24小时自项= A的消息发布在流和仅一分钟。将新鲜的碎片迭代器 TRIM_HORIZON 马上给你最早的可用记录,或者你需要需要不断迭代,直到你打一个时间段的东西已经发布

我一直在尝试用室壁运动和一切工作正常昨天或前两天(即我的发布和使用没有任何问题)。我做了一些其他修改我的code,并开始今天再次发布。当我启动了我的消费,什么也没有,甚至让它运行几分钟后出来的。我尝试发布和使用完全相同的时间,仍然一无所获。经过与 AFTER_SEQUENCE_NUMBER 迭代器类型手动播放,并使用一些序列号从我的消费记录来自前几天,我能达到我的近期发布的消息。不过,如果我回去使用 TRIM_HORIZON 键入,我看不出有什么消息都没有。

我看了看文档,但大部分的文档,我发现假设您正在使用的氯化钾(其实我是用KCL开始,但是当它开始失败我下降到原始API调用),并提,你必须有一个应用程序名称和DynamoDB表用于跟踪状态。这是最好的,我可以告诉是不正确的,如果你使用的是纯的Kinesis API调用或室壁运动CLI,这两个我终于尝试。我终于写了一个纯API脚本启动与 TRIM_HORIZON 和poll无限并最终创下新记录(带〜600次迭代,开始了14小时的背后现在,发现记录约5小时后方现在)。如果这是预期的行为,这似乎是在措辞的文档只是有点混乱/误导性:

  

TRIM_HORIZON - 开始读数的碎片最后修剪记录   在系统中,这是在分片的最旧的数据记录。

我认为(现在似乎不正确地),术语最早的数据记录的意思,我已经发布到流记录,不是简单的一个时间段的激流。

这将会是巨大的,如果有人能帮助确认/解释一下我看到的行为。

谢谢!

解决方案

这是在TRIM天涯,或在流修剪发生在地平线上。

在碎片迭代器可能会得到0的记录叫的时候,所以你需要不断迭代到达那里的最早的记录是该地区(如果你不经常推到流或者有时间的差距)。该getRecords会给你下一个碎片迭代器,你可以用它来迭代。

从文档: http://docs.aws.amazon.com/kinesis/latest/APIReference /API_GetRecords.html

  クラウド利用で最も心配な セキュリティ AWS Well Architectedを更に掘り下げた オージス総研独自の さらに使える フレームワーク活用事例とは

如果没有在碎片的部分可用记录的   迭代器指向,GetRecords返回一个空列表。需要注意的是它   可能需要多次调用才能到碎片的部分,其   包含的记录。

Context: I'm not necessarily referring to a KCL-based application, just pure Kinesis API calls.

Does the using the TRIM_HORIZON shard iterator type immediately give you the earliest published record in the stream (ie earliest available within Kinesis' built-in 24hr window), or simply an iterator/cursor for some time period as much as 24 hours ago, that you must then use to advance along the stream until you hit the earliest published record?

Put another way, in case that's not quite clear....

When using the shard iterator type of TRIM_HORIZON, is the expected behavior that it will begin with returning the records that were available 24 hours ago, BUT if zero records were published exactly 24 hours ago, and instead only 3 hours ago, that your application will need to iteratively poll through the previous 21 hours before it reaches the records published 3 hours ago?

Timeline example:

Sept 29 5:00 am - Create a stream "foo" with 1 shard Sept 29 5:02 am - Publish a single record, "Item=A", to the "foo" stream Sept 29 5:03 am - Issue a GetShardIterator call with TRIM_HORIZON as your shard iterator type, then issue a GetRecords call with that shard iterator and receive the record "Item=A" Sept 30 7:02 am - Publish a second record, "Item=B", to the "foo" stream Sept 30 7:03 am - Issue a GetShardIterator call with TRIM_HORIZON as your shard iterator type, then issue a GetRecords call with that shard iterator. What should be expected as the result from this call? (Note: we did not remember/re-use the shard iterator from step 3)

For Step 5 above, it's been more than 24 hours since the "Item=A" message was published on the stream and only a minute since "Item=B" was published. Will a fresh shard iterator with TRIM_HORIZON immediately give you the earliest available record, or do you need to need to keep iterating until you hit a time period when something has been published?

I'd been experimenting with Kinesis and everything was working fine yesterday or two days ago (ie. I was publishing AND consuming without any issues). I made some additional modifications to my code and began publishing again today. When I fired up my consumer, nothing was coming out at all even after letting it run for a few minutes. I tried publishing and consuming at exactly the same time, and still nothing. After manually playing with the AFTER_SEQUENCE_NUMBER iterator type, and using some sequence numbers from my consumer logs from a few days ago, I was able to reach my recently published messages. But then if I go back to using the TRIM_HORIZON type, I see no messages at all.

I've looked at the docs, but most of docs I found assume you are using the KCL (I actually was using KCL initially, but when it started failing I dropped down to raw API calls) and mention that you must have an application name and that DynamoDB tables are used for tracking state. Which as best I can tell is not true if you're using pure Kinesis API calls or the Kinesis CLI, both of which I eventually tried. I finally wrote a pure API script to start with TRIM_HORIZON and poll infinitely and eventually it hit new records (took ~600 iterations; started out 14hrs behind "now" and found records at about 5 hours behind "now"). If this is expected behavior, it seems like the wording in the docs is just a little confusing/misleading:

TRIM_HORIZON - Start reading at the last untrimmed record in the shard in the system, which is the oldest data record in the shard.

I assumed (now seemingly incorrectly) that the terms "oldest data record" meant record that I've published into the stream, not simply a time period in the stream.

It'd be great if someone can help confirm/explain the behavior I'm seeing.

Thanks!

解决方案

it's at the TRIM HORIZON, or the HORIZON where the stream TRIMming happens.

the shard iterator may get 0 records when called, so you'll need to keep iterating to reach the area where the oldest record is (if you push infrequently to the stream or have time gaps). the getRecords will give you the next shard iterator you can use to iterate.

from doc: http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html

If there are no records available in the portion of the shard that the iterator points to, GetRecords returns an empty list. Note that it might take multiple calls to get to a portion of the shard that contains records.