Hashing a tree structure

2023-09-10 23:00:03 Author: ↗殇↖


I've just come across a scenario in my project where I need to compare different tree objects for equality with already known instances, and I think some sort of hashing algorithm that operates on an arbitrary tree would be very useful.

Take for example the following tree:

        O
       / \
      /   \
     O     O
    /|\    |
   / | \   |
  O  O  O  O
          / \
         /   \
        O     O

Where each O represents a node of the tree, is an arbitrary object, and has an associated hash function. So the problem reduces to: given the hash codes of the nodes of a tree structure, and a known structure, what is a decent algorithm for computing a (relatively) collision-free hash code for the entire tree?

A few notes on the properties of the hash function:

The hash function should depend on the hash code of every node within the tree as well as its position. Reordering the children of a node should distinctly change the resulting hash code. Reflecting any part of the tree should distinctly change the resulting hash code.

If it helps, I'm using C# 4.0 here in my project, though I'm primarily looking for a theoretical solution, so pseudo-code, a description, or code in another imperative language would be fine.

UPDATE

Well, here's my own proposed solution. Several of the answers here helped a great deal.

Each node (sub-tree/leaf node) has the following hash function:

public override int GetHashCode()
{
    int hashCode = unchecked((this.Symbol.GetHashCode() * 31 +
        this.Value.GetHashCode()));
    for (int i = 0; i < this.Children.Count; i++)
        hashCode = unchecked(hashCode * 31 + this.Children[i].GetHashCode());
    return hashCode;
}

The nice thing about this method, as I see it, is that hash codes can be cached and only recalculated when the node or one of its descendants changes. (Thanks to vatine and Jason Orendorff for pointing this out).

Anyway, I would be grateful if people could comment on my suggested solution here - if it does the job well, then great, otherwise any possible improvements would be welcome.

Solution

If I were to do this, I'd probably do something like the following:

For each leaf node, compute the concatenation of 0 and the hash of the node data.

For each internal node, compute the concatenation of 1, the hash of any local data (NB: may not be applicable), and the hashes of the children from left to right.
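The two rules above can be sketched roughly as follows. This is only an illustration under some assumptions of mine: a node is represented as a hypothetical `(data, children)` tuple, SHA-256 from the standard library is the hash, and the 0/1 tag is fed into the digest as a single byte.

```python
import hashlib


def tree_hash(node):
    """Return a hex digest for a tree given as (data, children).

    `data` is a str and `children` is a list of nodes of the same shape;
    this tuple layout is a hypothetical stand-in for a real node class.
    """
    data, children = node
    digester = hashlib.sha256()
    # Tag byte: 0 for a leaf, 1 for an internal node, so the two kinds
    # never feed identical input to the hash.
    digester.update(b"\x01" if children else b"\x00")
    # Hash the local data first so it always contributes a fixed 32 bytes.
    digester.update(hashlib.sha256(data.encode("utf-8")).digest())
    for child in children:  # left to right, so child order matters
        digester.update(bytes.fromhex(tree_hash(child)))
    return digester.hexdigest()


leaf_a = ("a", [])
leaf_b = ("b", [])
# Swapping the two children feeds different bytes to the digest.
print(tree_hash(("root", [leaf_a, leaf_b])) != tree_hash(("root", [leaf_b, leaf_a])))
```

Because every child contributes a fixed-length digest, the concatenation is unambiguous, which is what makes reordering or reflecting children show up in the result.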

This will lead to a cascade up the tree every time you change anything, but that MAY be low enough of an overhead to be worthwhile. If changes are relatively infrequent compared to the number of times the hash is needed, it may even make sense to go for a cryptographically secure hash.

Edit1: There is also the possibility of adding a "hash valid" flag to each node and simply propagating a "false" up the tree (or a "hash invalid" flag and propagating "true") on a node change. That way, it may be possible to avoid a complete recalculation when the tree hash is needed, and possibly to avoid multiple hash calculations that are never used, at the risk of a slightly less predictable time to get a hash when needed.

Edit3: The hash code suggested by Noldorin in the question looks like it would have a chance of collisions if the result of GetHashCode can ever be 0. Essentially, there is no way of distinguishing a tree composed of a single node, with a "symbol hash" of 30 and a "value hash" of 25, from a two-node tree where the root has a "symbol hash" of 0 and a "value hash" of 30 and the child node has a total hash of 25. The examples are entirely invented; I don't know what the expected hash ranges are, so I can only comment on what I see in the presented code.
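The arithmetic behind that collision can be checked with a small Python port of the combining step from the question. This is a sketch, not the original C#: the hash codes are passed in directly as plain integers, and the result is wrapped to a signed 32-bit value to mimic `unchecked` int arithmetic.

```python
def combine(symbol_hash, value_hash, child_hashes=()):
    """Python port of the question's GetHashCode combining step.

    Hash codes are passed in directly as ints (a simplification for this
    sketch); the result is wrapped to a signed 32-bit value to mimic
    C#'s unchecked int arithmetic.
    """
    h = symbol_hash * 31 + value_hash
    for child in child_hashes:
        h = h * 31 + child
    h &= 0xFFFFFFFF  # keep the low 32 bits
    return h - 0x100000000 if h >= 0x80000000 else h


single = combine(30, 25)         # 30 * 31 + 25
two_node = combine(0, 30, [25])  # (0 * 31 + 30) * 31 + 25
print(single, two_node)  # 955 955
```

Both trees come out as 955 because a root whose symbol hash is 0 contributes nothing before its value hash is folded in, so the value hash takes the place the symbol hash had in the single-node tree.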

Using 31 as the multiplicative constant is good, in that it will cause any overflow to happen on a non-bit boundary, although I am thinking that, with sufficient children and possibly adversarial content in the tree, the hash contribution from items hashed early MAY be dominated by later hashed items.

However, if the hash performs decently on expected data, it looks as if it will do the job. It's certainly faster than using a cryptographic hash (as done in the example code listed below).

Edit2: As for a specific algorithm and the minimum data structure needed, something like the following would do (Python; translating to any other language should be relatively easy).

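#!/usr/bin/env python3

import hashlib  # standard-library SHA-1, standing in for Crypto.Hash.SHA


class Node:
    def __init__(self, parent=None, contents="", children=None):
        self.valid = False  # True while self.hash is up to date
        self.hash = None
        self.parent = parent
        self.contents = contents
        # Avoid a shared mutable default and adopt any children passed in.
        self.children = list(children) if children is not None else []
        for child in self.children:
            child.parent = self

    def append_child(self, child):
        child.parent = self
        self.children.append(child)
        self.invalidate()

    def invalidate(self):
        # Mark this node and every ancestor as needing a rehash.
        self.valid = False
        if self.parent is not None:
            self.parent.invalidate()

    def gethash(self):
        if self.valid:
            return self.hash

        digester = hashlib.sha1()
        digester.update(self.contents.encode("utf-8"))

        if self.children:
            for child in self.children:
                digester.update(child.gethash().encode("ascii"))
            self.hash = "1" + digester.hexdigest()
        else:
            self.hash = "0" + digester.hexdigest()

        self.valid = True  # cache the result until the next invalidate()
        return self.hash

    def setcontents(self, contents):
        self.contents = contents
        self.invalidate()
        return self.contents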

 