Iterating through all items in a DynamoDB table

I'm trying to iterate through all items in my DynamoDB table. (I understand this is an inefficient process but am doing this one-time to build an index table.)

I understand that DynamoDB's scan() function returns the lesser of 1MB or a supplied limit. To compensate for this, I wrote a function that looks for the "LastEvaluatedKey" result and re-queries starting from the LastEvaluatedKey to get all the results.

Unfortunately, it seems like every time my function loops, every single key in the entire database is scanned, quickly eating up my allocated read units. It's extremely slow.

Here's my code:

def search(self, table, scan_filter=None, range_key=None,
           attributes_to_get=None,
           limit=None):
    """ Scan a table for values and return
        a list of items.
    """

    start_key = None
    num_results = 0
    total_results = []
    loop_iterations = 0
    request_limit = limit

    while num_results < limit:
        results = self.conn.layer1.scan(table_name=table,
                                        attributes_to_get=attributes_to_get,
                                        exclusive_start_key=start_key,
                                        limit=request_limit)
        num_results = num_results + len(results['Items'])
        # Note: raises KeyError on the final page, when DynamoDB
        # omits LastEvaluatedKey from the response.
        start_key = results['LastEvaluatedKey']
        total_results = total_results + results['Items']
        loop_iterations = loop_iterations + 1
        request_limit = request_limit - results['Count']

        print "Count: " + str(results['Count'])
        print "Scanned Count: " + str(results['ScannedCount'])
        print "Last Evaluated Key: " + str(results['LastEvaluatedKey']['HashKeyElement']['S'])
        print "Capacity: " + str(results['ConsumedCapacityUnits'])
        print "Loop Iterations: " + str(loop_iterations)

    return total_results

Calling the function:

db = DB()
results = db.search(table='media',limit=500,attributes_to_get=['id'])

And my output:

Count: 96
Scanned Count: 96
Last Evaluated Key: kBR23QJNAwYZZxF4E3N1crQuaTwjIeFfjIv8NyimI9o
Capacity: 517.5
Loop Iterations: 1
Count: 109
Scanned Count: 109
Last Evaluated Key: ATcJFKfY62NIjTYY24Z95Bd7xgeA1PLXAw3gH0KvUjY
Capacity: 516.5
Loop Iterations: 2
Count: 104
Scanned Count: 104
Last Evaluated Key: Lm3nHyW1KMXtMXNtOSpAi654DSpdwV7dnzezAxApAJg
Capacity: 516.0
Loop Iterations: 3
Count: 104
Scanned Count: 104
Last Evaluated Key: iirRBTPv9xDcqUVOAbntrmYB0PDRmn5MCDxdA6Nlpds
Capacity: 513.0
Loop Iterations: 4
Count: 100
Scanned Count: 100
Last Evaluated Key: nBUc1LHlPPELGifGuTSqPNfBxF9umymKjCCp7A7XWXY
Capacity: 516.5
Loop Iterations: 5

Is this expected behavior? Or, what am I doing wrong?

Accepted answer

You are not doing anything wrong.

This is closely related to the way Amazon computes the capacity unit. First, it is extremely important to understand that:

capacity units == reserved computational units
capacity units != reserved network transit

Well, even that is not strictly exact, but it is quite close, especially when it comes to Scan.

During a Scan operation, there are:

- scanned Items: the cumulated size is at most 1MB; it may be below that size if the limit is already reached
- returned Items: all the matching Items among the scanned Items

As the capacity unit is a compute unit, you pay for the scanned Items. Well, actually, you pay for the cumulated size of the scanned Items. Beware that this size includes all the storage and index overhead: 0.5 capacity units per cumulated KB.

The scanned size does not depend on any filter, be it a field selector or a result filter.
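
One way to see this is to scan the same page twice, once plain and once with a field selector, and compare the reported capacity. This is a minimal sketch, assuming conn is the same boto layer1-style connection object the question's code wraps in self.conn:

full = conn.layer1.scan(table_name='media', limit=100)
thin = conn.layer1.scan(table_name='media', limit=100,
                        attributes_to_get=['id'])

# You pay for what is scanned, not for what is returned, so both
# calls should report (roughly) the same ConsumedCapacityUnits.
print "Full scan capacity: " + str(full['ConsumedCapacityUnits'])
print "Thin scan capacity: " + str(thin['ConsumedCapacityUnits'])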

From your results, I would guess that your Items require ~10KB each, which your comment on their actual payload size tends to confirm.
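
Reversing the 0.5 capacity / KB rule against the first loop iteration of your output gives the same estimate; a back-of-the-envelope sketch, with the numbers copied from the output above:

consumed_capacity = 517.5              # ConsumedCapacityUnits for the page
scanned_count = 96                     # ScannedCount for the same page

cumulated_kb = consumed_capacity / 0.5        # ~1035 KB actually scanned
kb_per_item = cumulated_kb / scanned_count    # ~10.8 KB per item

print "Cumulated KB scanned: " + str(cumulated_kb)
print "Estimated KB per item: " + str(kb_per_item)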

I have a test table which contains only very small elements. A Scan consumes only 1.0 capacity unit to retrieve 100 Items because the cumulated size is < 2KB.
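
For completeness, here is a leaner version of the question's loop. It stops as soon as the response omits LastEvaluatedKey (DynamoDB drops that key on the last page) rather than relying on an item count. This is a sketch against the same boto layer1-style API as the question's code; scan_all and its parameters are illustrative names:

def scan_all(conn, table, attributes_to_get=None, page_limit=None):
    """Walk an entire table page by page until DynamoDB stops
    returning a LastEvaluatedKey."""
    items = []
    start_key = None
    while True:
        results = conn.layer1.scan(table_name=table,
                                   attributes_to_get=attributes_to_get,
                                   exclusive_start_key=start_key,
                                   limit=page_limit)
        items += results['Items']
        # Absent once the scan has covered the whole table.
        start_key = results.get('LastEvaluatedKey')
        if start_key is None:
            break
    return items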