高效的算法找出两套最大公共子集?子集、高效、两套、算法

2023-09-11 02:20:25 作者:爱傻笑的菇凉°

每个集合包含一堆校验和。例如: 集A: {  4445968d0e100ad08323df8c895cea15  a67f8052594d6ba3f75502c0b91b868f  07736dde2f8484a4a3af463e05f039e3  5b1e374ff2ba949ab49870ca24d3163a }

Each set contains bunch of checksums. For example: Set A: { 4445968d0e100ad08323df8c895cea15 a67f8052594d6ba3f75502c0b91b868f 07736dde2f8484a4a3af463e05f039e3 5b1e374ff2ba949ab49870ca24d3163a }

B组: {  6639e1da308fd7b04b7635a17450df7c  4445968d0e100ad08323df8c895cea15  a67f8052594d6ba3f75502c0b91b868f }

Set B: { 6639e1da308fd7b04b7635a17450df7c 4445968d0e100ad08323df8c895cea15 a67f8052594d6ba3f75502c0b91b868f }

A和B的最大公共子集: {  4445968d0e100ad08323df8c895cea15  a67f8052594d6ba3f75502c0b91b868f }

The maximum common subset of A and B is: { 4445968d0e100ad08323df8c895cea15 a67f8052594d6ba3f75502c0b91b868f }

很多这些操作都将被执行,所以我在寻找一个有效的算法来做到这一点。 感谢您的帮助。

A lot of these operations will be performed, so I'm looking for an efficient algorithm to do so. Thanks for your help.

推荐答案

将组之一在哈希表和遍历其他,丢弃不在散列元素。另外,无论排序并通过他们迭代同时,作为归并排序。

Put one of the sets in a hash table and iterate through the other, discarding elements that aren't in the hash. Alternatively, sort both and iterate through them simultaneously, as in merge sort.

编辑:后一种方法创建一个排序结果。我要补充一点,如果集广泛不同大小的,他们是presorted(说是因为你做了一堆交点),那么你就可以实现大的性能提升用无界二分查找跳过极目大名单。

The latter method creates a sorted result. I should add that if the sets are of widely disparate sizes and they're presorted (say because you're doing a bunch of intersections), then you can realize a large performance improvement by using "unbounded" binary search to skip ahead in the large list.