Fuse tuples to find equivalence classes

2023-09-11 03:38:53 Author: 不挽离人

Suppose we have a finite domain D = {d1, ..., dk} containing k elements.

We consider S, a subset of D^n, i.e. a set of tuples of the form <a1, ..., an> with each ai in D.

We want to represent it (compactly) using S', a subset of (2^D)^n, i.e. a set of tuples of the form <A1, ..., An> with each Ai a subset of D. The implication is that for any tuple s' = <A1, ..., An> in S', every element of the cross product A1 × ... × An exists in S, and every tuple of S lies in at least one such product.

For instance, consider D = {a, b, c}, so k = 3 and n = 2, and the tuples S = <a,b> + <a,c> + <b,b> + <b,c>.

We can use S'=<{a,b},{b,c}> to represent S.

This singleton solution is also minimal. S' = <{a},{b,c}> + <{b},{b,c}> is also a solution, but it is larger and therefore less desirable.
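
To make the definition concrete, here is a small JavaScript sketch. It assumes tuples of S are arrays of domain values and tuples of S' are arrays of subsets (plain arrays); `expand` and `represents` are hypothetical helper names, not from the question. `expand` computes the cross product A1 × ... × An of one tuple of S', and `represents` checks that every tuple in those products is in S and that together they cover all of S. Both candidate encodings of the example pass; the singleton one just has fewer tuples.

```javascript
// A tuple of S is an array of domain values, e.g. ["a", "b"].
// A tuple of S' is an array of subsets, e.g. [["a", "b"], ["b", "c"]].

// Cross product A1 x ... x An of one S' tuple, as a list of S tuples.
function expand(subsetTuple) {
  let result = [[]];
  for (const component of subsetTuple) {
    const next = [];
    for (const prefix of result) {
      for (const value of component) next.push([...prefix, value]);
    }
    result = next;
  }
  return result;
}

// True if the products of the S' tuples stay inside S and cover all of S.
function represents(sPrime, s) {
  const key = (t) => JSON.stringify(t);
  const target = new Set(s.map(key));
  const covered = new Set();
  for (const st of sPrime) {
    for (const t of expand(st)) {
      if (!target.has(key(t))) return false; // product contains a tuple not in S
      covered.add(key(t));
    }
  }
  return covered.size === target.size; // covered is a subset of S, so equal size means equal
}

const S = [["a", "b"], ["a", "c"], ["b", "b"], ["b", "c"]];
const single = [[["a", "b"], ["b", "c"]]];
const pair = [[["a"], ["b", "c"]], [["b"], ["b", "c"]]];
console.log(represents(single, S), single.length); // true 1
console.log(represents(pair, S), pair.length);     // true 2
```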

Some sizes we need to handle in concrete instances: k ~ 1000 elements in the domain D, n <= 10 (relatively small, but the main source of complexity), and |S| ranging to large values (> 10^6).

A naïve approach consists in first plunging S into the domain of S', (2^D)^n, by turning each tuple <a1, ..., an> into the tuple of singletons <{a1}, ..., {an}>, and then applying the following test pairwise: two tuples s1, s2 in S' can be fused into a single tuple of S' iff they differ in exactly one component.

e.g. <a,b> + <a,c> -> <{a},{b,c}> (differ in the second component)

<b,b> + <b,c> -> <{b},{b,c}> (differ in the second component)

<{a},{b,c}> + <{b},{b,c}> -> <{a,b},{b,c}> (differ in the first component)
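
The fusion test and the chain of fusions in these examples can be sketched as follows. This assumes each component of an S' tuple is kept as a sorted array of domain values, so equality is an element-by-element comparison; `fusePosition`, `fuse` and `lift` are illustrative names, not from the question. `fusePosition` returns the single component in which two tuples differ (or -1), and `fuse` replaces that component by the union.

```javascript
// Components are sorted arrays of domain values, so equality is elementwise.
function sameComponent(x, y) {
  return x.length === y.length && x.every((v, i) => v === y[i]);
}

// Index of the only differing component, or -1 when the tuples
// differ in zero components or in more than one.
function fusePosition(t1, t2) {
  let pos = -1;
  for (let i = 0; i < t1.length; i++) {
    if (!sameComponent(t1[i], t2[i])) {
      if (pos !== -1) return -1; // second difference: not fusable
      pos = i;
    }
  }
  return pos;
}

// Fuse two tuples at `pos` (caller checks pos !== -1) by taking the union there.
function fuse(t1, t2, pos) {
  return t1.map((comp, i) =>
    i === pos ? [...new Set([...comp, ...t2[i]])].sort() : comp);
}

// Plunge a tuple of S into S' by turning each value into a singleton.
const lift = (tuple) => tuple.map((v) => [v]);

const ab = lift(["a", "b"]), ac = lift(["a", "c"]);
const bb = lift(["b", "b"]), bc = lift(["b", "c"]);
const left = fuse(ab, ac, fusePosition(ab, ac));  // <{a},{b,c}>
const right = fuse(bb, bc, fusePosition(bb, bc)); // <{b},{b,c}>
const whole = fuse(left, right, fusePosition(left, right));
console.log(JSON.stringify(whole)); // [["a","b"],["b","c"]]
```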

Now there could be several minimal S'; we are interested in finding any one of them. Approximate minimisation of some kind is also fine, provided it doesn't give wrong results (i.e. S' may be larger than it could be, as long as we get results very fast).

The naive algorithm has to deal with the fact that any newly introduced "fused" tuple could match some other tuple, so it scales really badly on large input sets, even with n remaining low. You need |S'|^2 comparisons to ensure convergence, and every time I fuse two elements I currently retest every pair (how can I improve that?).
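
One possible way to avoid retesting every pair after a fusion (a sketch, not taken from the original post) is a worklist: every tuple, including each newly fused one, is compared against the current set once, when it is dequeued. Two tuples that both survive are never compared twice, because whichever was dequeued later was checked against the other at that point. Each dequeue still costs O(|S'|) comparisons, so the worst case remains quadratic, and as with any greedy fusion order the result is not guaranteed to be minimal. `fuseAll` is a hypothetical name; components are assumed to be sorted arrays of strings.

```javascript
// Index of the only differing component, or -1 (components are sorted arrays).
function fusePosition(t1, t2) {
  let pos = -1;
  for (let i = 0; i < t1.length; i++) {
    const x = t1[i], y = t2[i];
    const same = x.length === y.length && x.every((v, j) => v === y[j]);
    if (!same) {
      if (pos !== -1) return -1;
      pos = i;
    }
  }
  return pos;
}

// Union of the differing component; the other components are shared.
function fuse(t1, t2, pos) {
  return t1.map((c, i) => (i === pos ? [...new Set([...c, ...t2[i]])].sort() : c));
}

// Worklist fusion: a tuple is tested once when dequeued; a fused tuple is
// enqueued for its own single pass instead of rescanning all pairs.
function fuseAll(s) {
  const alive = new Set(s.map((t) => t.map((v) => [v]))); // plunge S into S'
  const queue = [...alive];
  while (queue.length > 0) {
    const t = queue.pop();
    if (!alive.has(t)) continue; // consumed by an earlier fusion
    for (const u of alive) {
      if (u === t) continue;
      const pos = fusePosition(t, u);
      if (pos !== -1) {
        alive.delete(t);
        alive.delete(u);
        const f = fuse(t, u, pos);
        alive.add(f);
        queue.push(f);
        break; // stop iterating: the set was just modified
      }
    }
  }
  return [...alive];
}

const result = fuseAll([["a", "b"], ["a", "c"], ["b", "b"], ["b", "c"]]);
console.log(JSON.stringify(result)); // [[["a","b"],["b","c"]]]
```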

A lot of the efficiency depends on iteration order, so sorting the set in some way(s) could be an option, or perhaps indexing with hashes, but I'm not sure how to do it.
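
As one illustration of the hash-indexing idea (a hypothetical sketch, not a known-optimal algorithm): for a fixed position i, two distinct tuples can be fused at i exactly when all their other components are equal. Hashing each tuple on its other components therefore puts every fusable family into one bucket, and each bucket can be merged in one step with no pairwise comparisons. Sweeping over the n positions until a full sweep changes nothing gives a greedy approximation whose result can depend on the order of positions, which the question says is acceptable. The code assumes components are sorted arrays of strings, so `JSON.stringify` yields a canonical key; `keyWithout` and `fuseByHashing` are illustrative names.

```javascript
// Hash key of a tuple with component `skip` masked out; tuples sharing
// a key can differ only in that component.
function keyWithout(tuple, skip) {
  return JSON.stringify(tuple.map((c, i) => (i === skip ? null : c)));
}

// Bucket tuples by the key of their other components, then merge each
// bucket into one tuple; repeat over all positions until nothing changes.
function fuseByHashing(s) {
  let tuples = s.map((t) => t.map((v) => [v])); // plunge S into S'
  const n = tuples.length > 0 ? tuples[0].length : 0;
  let changed = true;
  while (changed) {
    changed = false;
    for (let pos = 0; pos < n; pos++) {
      const buckets = new Map();
      for (const t of tuples) {
        const k = keyWithout(t, pos);
        if (!buckets.has(k)) buckets.set(k, []);
        buckets.get(k).push(t);
      }
      const next = [];
      for (const group of buckets.values()) {
        if (group.length > 1) changed = true;
        const union = new Set(group.flatMap((t) => t[pos]));
        next.push(group[0].map((c, i) => (i === pos ? [...union].sort() : c)));
      }
      tuples = next;
    }
  }
  return tuples;
}

const S = [["a", "b"], ["a", "c"], ["b", "b"], ["b", "c"]];
console.log(JSON.stringify(fuseByHashing(S))); // [[["a","b"],["b","c"]]]
```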

Imperative pseudocode would be ideal, but pointers to a reformulation of the problem into something I can run a solver on would also really help.

Recommended answer

Here's some pseudocode (C# that I haven't tested) that demonstrates your S'=<{a},{b,c}>+<{b},{b,c}> method. Apart from the space requirements, which are negligible when you use an integer index for each element, adding and testing tuples should be extremely fast overall. If you want a practical solution, you already have one; you just have to use the right ADTs.

// ElementType is a placeholder for your element type; it must be comparable (IComparable<ElementType>)
ElementType[] domain = new ElementType[K]; // a simple array of the K domain elements
FillDomain(domain); // insert all domain elements
Array.Sort(domain); // sort the domain elements: K log K time
// int keys/values are indices into domain; the key is the first component of a pair
var subsets = new SortedDictionary<int, HashSet<int>>();
//
void AddTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, ElementType second) {
    // assumes first and second are in domain (otherwise BinarySearch returns a negative value)
    int a = Array.BinarySearch(domain, first); // log K time (binary search)
    int b = Array.BinarySearch(domain, second); // log K time (binary search)
    if (!tuples.TryGetValue(a, out HashSet<int> seconds)) { // log N time (balanced-tree lookup)
        seconds = new HashSet<int>();
        tuples[a] = seconds; // log N time (tree insert)
    }
    seconds.Add(b); // constant time (hash add); no-op if b is already present
}
//
bool ContainsTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, ElementType second) {
    int a = Array.BinarySearch(domain, first); // log K time (binary search)
    int b = Array.BinarySearch(domain, second); // log K time (binary search)
    // log N time (tree lookup), then constant time (hash test)
    return tuples.TryGetValue(a, out HashSet<int> seconds) && seconds.Contains(b);
}

The space saved by optimizing your tuple subset S' won't outweigh the slowdown of the optimization process itself. For size optimization, if you know that k will be less than 65536, you could use unsigned 16-bit integers (ushort) instead of ints in the SortedDictionary and HashSet. But even 50 million integers only take up 4 bytes per 32-bit integer × 50 million ≈ 200 MB.

EDIT: Here's another approach. By encoding/mapping your tuples to strings, you can take advantage of binary string comparison and the fact that UTF-16 / UTF-8 encoding is very size-efficient. Again, this still doesn't do the merging optimization you want, but speed and efficiency would be pretty good.

Here's some quick pseudocode in JavaScript.

// Binary search on a sorted array: returns the index of elm if present,
// otherwise -(insertionPoint + 1), so that ~result is the insertion point.
Array.prototype.binarySearch = function(elm) {
  var l = 0, h = this.length - 1, i;
  while (l <= h) {
    i = (l + h) >> 1;
    if (this[i] < elm) l = i + 1;
    else if (this[i] > elm) h = i - 1;
    else return i;
  }
  return -(l + 1);
};
// map your ordered domain elements to characters
// (JavaScript strings are UTF-16; UTF-8 would work as well)
var domain = {
  "a": String.fromCharCode(1),
  "b": String.fromCharCode(2),
  "c": String.fromCharCode(3),
  "d": String.fromCharCode(4)
};
var tupleStrings = []; // kept sorted so binarySearch works
// map your tuple to its string encoding
function map(tuple) {
  var str = "";
  for (var i = 0; i < tuple.length; i++) {
    str += domain[tuple[i]];
  }
  return str;
}
function add(tuple) {
  var str = map(tuple);
  var index = tupleStrings.binarySearch(str);
  if (index >= 0) return; // already stored; don't insert a duplicate
  index = ~index; // insertion point that keeps the array sorted
  // splice is O(n) on a plain array; the cost depends on tupleStrings' implementation
  tupleStrings.splice(index, 0, str);
}
function contains(tuple) {
  var str = map(tuple);
  return tupleStrings.binarySearch(str) >= 0;
}

add(["a","b"]);
add(["a","c"]);
add(["b","b"]);
add(["b","c"]);
add(["c","c"]);
add(["d","a"]);
console.log(contains(["a","a"])); // false
console.log(contains(["d","a"])); // true
console.log(JSON.stringify(tupleStrings));
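
The hand-written `domain` object above does not scale to k ~ 1000 elements. A possible way to generate the mapping, assuming the domain is available as a sorted array, is to assign code unit i + 1 to the i-th element. Keeping k below 0xD800 avoids the UTF-16 surrogate range (lone surrogates cannot be encoded as valid UTF-8), so every element is one code unit and an n-tuple encodes to a string of length n. `buildEncoding` and `encode` are hypothetical helper names.

```javascript
// Build the element -> character map from a sorted domain array.
// Code units 1..k are used; k must stay below 0xD800 (surrogate range).
function buildEncoding(sortedDomain) {
  if (sortedDomain.length >= 0xD800) {
    throw new RangeError("domain too large for one code unit per element");
  }
  const enc = new Map();
  sortedDomain.forEach((elm, i) => enc.set(elm, String.fromCharCode(i + 1)));
  return enc;
}

// Encode one tuple as a string with one code unit per component.
function encode(enc, tuple) {
  return tuple.map((v) => enc.get(v)).join("");
}

// Example with a generated domain of 1000 elements ("e0", "e1", ...).
const dom = Array.from({ length: 1000 }, (_, i) => "e" + i).sort();
const enc = buildEncoding(dom);
const s = encode(enc, ["e0", "e999", "e42"]);
console.log(s.length); // 3
```

The encoded strings compare with plain `<` / `>` in code-unit order, so they can go straight into the sorted `tupleStrings` array.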