Efficiently eliminating common sub-expressions in a .NET Expression Tree

2023-09-11 02:11:01 Author: 风月不如你

I've written a DSL and a compiler that generates a .NET expression tree from it. All expressions within the tree are side-effect free, and the expression is guaranteed to be a "non-statement" expression (no locals, loops, blocks, etc.). (Edit: The tree may include literals, property accesses, standard operators and function calls, which may do fancy things like memoization internally but are externally side-effect free.)

Now I would like to perform the "Common sub-expression elimination" optimization on it.

For example, given a tree corresponding to the C# lambda:

foo =>      (foo.Bar * 5 + foo.Baz * 2 > 7) 
         || (foo.Bar * 5 + foo.Baz * 2 < 3)  
         || (foo.Bar * 5 + 3 == foo.Xyz)

...I would like to generate the tree equivalent of the following (ignoring the fact that some of the short-circuiting semantics are lost):

foo =>
{
     var local1 = foo.Bar * 5;

     // Notice that this local depends on the first one.        
     var local2 = local1 + foo.Baz * 2; 

     // Notice that no unnecessary locals have been generated.
     return local2 > 7 || local2 < 3 || (local1 + 3 == foo.Xyz);
}

I'm familiar with writing expression visitors, but the algorithm for this optimization isn't immediately obvious to me. I could of course find "duplicates" within a tree, but there's obviously some trick to analyzing the dependencies within and between sub-trees to eliminate sub-expressions efficiently and correctly.

I searched Google for algorithms, but they seem quite complicated to implement quickly. They also seem very "general" and don't necessarily take the simplicity of my trees into account.

Recommended answer

You're correct in noting this is not a trivial problem.

The classical way compilers handle this is to represent the expression as a Directed Acyclic Graph (DAG). The DAG is built the same way as the abstract syntax tree (and can be built by traversing the AST, perhaps a job for the expression visitor; I don't know the C# libraries well), except that a dictionary of previously emitted subgraphs is maintained. Before generating a node of a given type with given children, the dictionary is consulted to see whether such a node already exists. Only if this check fails is a new node created and added to the dictionary.
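A minimal sketch of that dictionary-based construction (often called hash-consing), written in Java like the answer's own example rather than against the .NET `Expression` API. The `Node` and `DagBuilder` classes and the operator names are hypothetical. Because children are interned before their parent, a node's dictionary key can use its children's ids instead of comparing whole subtrees.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical DAG node: a leaf (identifier or number) or a binary operator.
final class Node {
    final int id;          // unique id, assigned in creation order
    final String op;       // "Id", "Number", or an operator name such as "MULTIPLY"
    final String text;     // leaf payload ("Bar", "5"); empty for operators
    final Node lhs, rhs;   // null for leaves

    Node(int id, String op, String text, Node lhs, Node rhs) {
        this.id = id;
        this.op = op;
        this.text = text;
        this.lhs = lhs;
        this.rhs = rhs;
    }
}

// Builds a DAG rather than a tree: before creating a node, it looks up an
// existing node with the same operator, payload and children.
final class DagBuilder {
    private final Map<List<Object>, Node> table = new HashMap<>();
    private int nextId = 1;

    private Node intern(String op, String text, Node lhs, Node rhs) {
        // Children were interned first, so equal subtrees are the same object
        // and their ids are enough to identify them.
        List<Object> key = List.of(op, text,
                lhs == null ? -1 : lhs.id,
                rhs == null ? -1 : rhs.id);
        Node existing = table.get(key);
        if (existing != null) {
            return existing;                       // reuse the shared subgraph
        }
        Node created = new Node(nextId++, op, text, lhs, rhs);
        table.put(key, created);
        return created;
    }

    Node ident(String name)                { return intern("Id", name, null, null); }
    Node number(String value)              { return intern("Number", value, null, null); }
    Node binary(String op, Node l, Node r) { return intern(op, "", l, r); }
}
```

Building `Bar * 5` twice through the same builder returns the same `Node` object. This sketch treats `a * b` and `b * a` as different keys; normalizing the operand order of commutative operators before the lookup would catch more duplicates.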

Since now a node may descend from multiple parents, the result is a DAG.
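Code generation later needs to know how many parents each node has. A sketch of one way to count them, assuming a hypothetical `Node` with a list of children: walk the DAG depth first, expand each node only once (tracked by object identity), and add one to a child's count for every edge that points at it. In this sketch the root's count is 0.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical DAG node: a label plus zero or more children. A child object
// may be shared, i.e. appear under several parents.
final class Node {
    final String label;
    final List<Node> children;

    Node(String label, Node... children) {
        this.label = label;
        this.children = List.of(children);
    }
}

final class ParentCounts {
    // Returns, for every node reachable from root, the number of parent
    // edges pointing at it. Nodes are keyed by identity, not by label.
    static Map<Node, Integer> compute(Node root) {
        Map<Node, Integer> counts = new IdentityHashMap<>();
        counts.put(root, 0);                          // the root has no parent
        visit(root, counts, new IdentityHashMap<>());
        return counts;
    }

    private static void visit(Node n, Map<Node, Integer> counts, Map<Node, Boolean> expanded) {
        // Expand each node once; otherwise edges below a shared node would be
        // counted once per path that reaches it.
        if (expanded.put(n, Boolean.TRUE) != null) {
            return;
        }
        for (Node child : n.children) {
            counts.merge(child, 1, Integer::sum);
            visit(child, counts, expanded);
        }
    }
}
```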

Then the DAG is traversed depth first to generate code. Since each common sub-expression is now represented by a single node, its value is computed only once and stored in a temp that expressions emitted later during code generation can use. If the original code contains assignments, this phase gets complicated. Since your trees are side-effect free, the DAG ought to be the most straightforward way to solve your problem.
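The depth-first generation step can be sketched as follows (Java, hypothetical classes, not the answer's Gist). Each non-leaf node with more than one parent is computed once into a temp named after its id; every other node is inlined into its parent's expression. Leaves are never given temps, since reading an identifier or literal is as cheap as reading a temp. Here the hypothetical `Node` constructor maintains parent counts by incrementing the count of each child it is given.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical DAG node. Creating a binary node increments the parent count of
// each child, so a shared child ends up with parents > 1.
final class Node {
    final int id;
    final String symbol;   // operator ("+", ">", "||") or leaf text ("Bar", "5")
    final Node lhs, rhs;   // both null for leaves
    int parents;

    Node(int id, String leaf) {
        this(id, leaf, null, null);
    }

    Node(int id, String symbol, Node lhs, Node rhs) {
        this.id = id;
        this.symbol = symbol;
        this.lhs = lhs;
        this.rhs = rhs;
        if (lhs != null) lhs.parents++;
        if (rhs != null) rhs.parents++;
    }

    boolean isLeaf() {
        return lhs == null;
    }
}

final class CodeGen {
    private final List<String> statements = new ArrayList<>();
    private final Map<Node, String> temps = new IdentityHashMap<>();

    // Depth-first, post-order: both operands are generated before the node
    // itself, so any temp a node refers to has already been assigned.
    private String gen(Node n) {
        String temp = temps.get(n);
        if (temp != null) return temp;          // shared node already computed
        if (n.isLeaf()) return n.symbol;        // leaves are inlined, never cached
        String expr = "(" + gen(n.lhs) + ") " + n.symbol + " (" + gen(n.rhs) + ")";
        if (n.parents > 1) {                    // used more than once: compute it once
            temp = "t" + n.id;
            statements.add(temp + " = " + expr + ";");
            temps.put(n, temp);
            return temp;
        }
        return expr;                            // used once: inline it in the parent
    }

    static String generate(Node root) {
        CodeGen g = new CodeGen();
        String result = g.gen(root);
        g.statements.add("return " + result + ";");
        return String.join("\n", g.statements);
    }
}
```

Building the question's DAG by hand with the ids from the answer's DAG printout and passing its root to `CodeGen.generate` yields the same three lines as the answer's generated code. Because every temp is assigned before the `return`, the `||` operands are no longer short-circuited, which the question explicitly accepts.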

As I recall, the Dragon Book's coverage of DAGs is particularly good.

As others have noted, if your trees will ultimately be compiled by an existing compiler, it's kind of futile to redo what's already there.

Addition

I had some Java code lying around from a student project (I teach), so I hacked up a little example of how this works. It's too long to post, but see the Gist here.

Running it on your input prints the DAG below. The numbers in parentheses are (unique id, DAG parent count). The parent count is needed to decide when to compute a node into a local temp variable and when to just use the node's expression inline.

Binary OR (27,1)
  lhs:
    Binary OR (19,1)
      lhs:
        Binary GREATER (9,1)
          lhs:
            Binary ADD (7,2)
              lhs:
                Binary MULTIPLY (3,2)
                  lhs:
                    Id 'Bar' (1,1)
                  rhs:
                    Number 5 (2,1)
              rhs:
                Binary MULTIPLY (6,1)
                  lhs:
                    Id 'Baz' (4,1)
                  rhs:
                    Number 2 (5,1)
          rhs:
            Number 7 (8,1)
      rhs:
        Binary LESS (18,1)
          lhs:
            ref to Binary ADD (7,2)
          rhs:
            Number 3 (17,2)
  rhs:
    Binary EQUALS (26,1)
      lhs:
        Binary ADD (24,1)
          lhs:
            ref to Binary MULTIPLY (3,2)
          rhs:
            ref to Number 3 (17,2)
      rhs:
        Id 'Xyz' (25,1)
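The `ref to` lines in a printout like this fall out of a depth-first walk that prints each node in full the first time it is reached and only as a reference afterwards. A sketch of such a printer, assuming a hypothetical `Node` that carries a display string, an id and a parent count, and whose binary nodes always have both children. The count is incremented by the constructor, so in this sketch the root's count is 0 rather than the 1 shown above.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical node mirroring the printout: a description such as
// "Binary ADD", "Id 'Bar'" or "Number 5", plus an id and a parent count.
final class Node {
    final int id;
    final String description;
    final Node lhs, rhs;   // null for leaves
    int parents;

    Node(int id, String description, Node lhs, Node rhs) {
        this.id = id;
        this.description = description;
        this.lhs = lhs;
        this.rhs = rhs;
        if (lhs != null) lhs.parents++;
        if (rhs != null) rhs.parents++;
    }
}

final class DagPrinter {
    // Prints each node in full the first time it is reached and as
    // "ref to ..." afterwards, so a shared subgraph is expanded only once.
    static String print(Node root) {
        StringBuilder out = new StringBuilder();
        Set<Node> printed = Collections.newSetFromMap(new IdentityHashMap<>());
        print(root, 0, printed, out);
        return out.toString();
    }

    private static void print(Node n, int indent, Set<Node> printed, StringBuilder out) {
        String pad = " ".repeat(indent);
        String label = n.description + " (" + n.id + "," + n.parents + ")";
        if (!printed.add(n)) {                     // seen before: reference only
            out.append(pad).append("ref to ").append(label).append('\n');
            return;
        }
        out.append(pad).append(label).append('\n');
        if (n.lhs != null) {                       // binary node: print both operands
            out.append(pad).append("  lhs:\n");
            print(n.lhs, indent + 4, printed, out);
            out.append(pad).append("  rhs:\n");
            print(n.rhs, indent + 4, printed, out);
        }
    }
}
```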

Then it generates this code:

t3 = (Bar) * (5);
t7 = (t3) + ((Baz) * (2));
return (((t7) > (7)) || ((t7) < (3))) || (((t3) + (3)) == (Xyz));

You can see that the temp variable numbers correspond to DAG node ids. You could make the code generator more complex to get rid of the unnecessary parentheses, but I'll leave that to others.
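One common way to drop the redundant parentheses (a sketch, not the answer's code) is to parenthesize a child only when it binds more loosely than its parent, or, as the right operand of a left-associative operator, equally loosely. The `Node` and `Printer` classes are hypothetical, and the precedence table follows the relative C#/Java ordering of these operators.

```java
import java.util.Map;

// Hypothetical expression node: a leaf (op == null) or a binary operator.
final class Node {
    final String op;     // operator symbol, or null for a leaf
    final String text;   // identifier, literal or temp name for a leaf
    final Node lhs, rhs;

    Node(String text) {
        this.op = null;
        this.text = text;
        this.lhs = null;
        this.rhs = null;
    }

    Node(String op, Node lhs, Node rhs) {
        this.op = op;
        this.text = null;
        this.lhs = lhs;
        this.rhs = rhs;
    }
}

final class Printer {
    // Relative precedence of C#/Java binary operators (higher binds tighter).
    private static final Map<String, Integer> PRECEDENCE = Map.of(
            "||", 1, "&&", 2, "==", 3, "<", 4, ">", 4,
            "+", 5, "-", 5, "*", 6, "/", 6);

    private static int precedence(Node n) {
        return n.op == null ? Integer.MAX_VALUE : PRECEDENCE.get(n.op);
    }

    static String print(Node n) {
        if (n.op == null) return n.text;
        int p = precedence(n);
        String left = print(n.lhs);
        String right = print(n.rhs);
        // A child needs parentheses only if it binds more loosely than n. All
        // operators here are left-associative, so a right child at the same
        // level also needs them: a - (b - c) differs from a - b - c.
        if (precedence(n.lhs) < p) left = "(" + left + ")";
        if (precedence(n.rhs) <= p) right = "(" + right + ")";
        return left + " " + n.op + " " + right;
    }
}
```

Printing the `return` expression above this way gives `t7 > 7 || t7 < 3 || t3 + 3 == Xyz`, while `a - (b - c)` keeps its parentheses.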