集和解算法的实现算法

2023-09-11 22:52:00 作者:低头看月影

我在找的一套和解算法实现。问题是以下几点:有两套与确定一些相对紧凑的值(例如UUID或MD5 / SHA1 /不管哈希)坐在不同的机器上的元素。这些集合的不同之处较少的因素,我想同步这些集在传输的数据量极少。大多数谷歌搜索线索这里的。这是GPL的实施似乎是最先进的方法来工作。问题是,我不能在我的应用程序使用GPL下的code。最有可能我会使用类似nzmath重新实现它自己,但也许还有其他的实现(preferably Python或C / C ++),或者有其他更好的算法?

I'm looking for implementations of set reconciliation algorithm. The problem is following: there are two sets with elements identified by some relatively compact value (e.g. UUID or MD5/SHA1/whatever hash) sitting on different machines. These sets differ in relatively few elements and I want to synchronize these sets while transferring minimal amount of data. Most of googling leads here. This is GPL'd implementation of what seems to be the state-of-art approach to the task. The problem is that I can't use GPL'd code in my app. Most likely I'll have to reimplement it myself using something like nzmath, but maybe there are other implementations (preferably Python or C/C++), or maybe there are other nicer algorithms?

推荐答案

这code是从我的头,从而覆盖任何许可证本网站适用于code样本。

This code is out of my head, and thus covered by whatever license applies for code samples in this site.

# given two finite sequences of unique and hashable data,
# return needed opcodes and data needed for reconciliation

def set_reconcile(src_seq, dst_seq):
    "Return required operations to mutate src_seq into dst_seq"
    src_set= set(src_seq) # no-op if already of type set
    dst_set= set(dst_seq) # ditto

    for item in src_set - dst_set:
        yield 'delete', item

    for item in dst_set - src_set:
        yield 'create', item

使用方法如下:

Use as follows:

for opcode, datum in set_reconcile(machine1_stuff, machine2_stuff):
    if opcode == 'create':
        # act accordingly
    elif opcode == 'delete':
        # likewise
    else:
        raise RuntimeError, 'unexpected opcode'