一个良好的64位哈希函数在Java中的文本字符串是什么?字符串、函数、文本、良好

2023-09-07 17:56:54 作者:温眸。

我正在寻找一个散列函数:

在哈希表的文本字符串以及(如一些碰撞) 是用Java编写的,并广泛应用于 奖励:(串联他们将散列的连接字符串,而不是我)在几个领域的作品 奖励:拥有128位的变体 奖励:无CPU密集型 解决方案

你为什么不使用默认的字符串。哈希code()(其中一些很聪明的家伙肯定会把精力使其有效 - 不提数以千计的开发者的目光已经看了这个code)

  //改编自String.hash code()
公共静态长哈希(字符​​串字符串){
  长H = 1125899906842597L; //黄金
  INT LEN = string.length减();

  的for(int i = 0; I< LEN;我++){
    H = 31 * H + string.charAt(ⅰ);
  }
  返回H;
}
 

如果您正在寻找更多的位,你很可能使用的BigInteger 编辑:

正如我在一个注释@brianegge的答复中提到,有没有太多的用例的哈希有超过32位,最有可能不是一个单一的哈希有超过64位:

  

我能想象一个巨大的哈希表分布在几十个服务器,也许存储数百亿映射。对于这样的情况下,@brianegge仍然有一个有效的点这里:32位允许2 ^ 32(约4.3十亿)不同的哈希键。假设一个强大的算法,你应该还是有比较少的碰撞。随着64位(18446744073十亿不同的密钥)你肯定节省,无论任何疯狂的情况下,您需要它。用例的128位密钥(340,282,366,920,938,463,463,374,607,431十亿可能的密钥)不可能尽管有很多思考是pretty的。

要结合哈希几个领域,简单地做一个异或乘以一个具有黄金,并将它们添加:

 长哈希= MyHash.hash(字符串1)* 31 + MyHash.hash(字符串2);
 
什么是哈希函数

小的素数是在那里,以避免相同的散列code为交换价值,即{'富','酒吧'}和{'巴','富'}是不相等的,应该有不同的哈希code。 XOR是坏的,因为它返回0,如果这两个值相等。因此,{'富','富'}和{'巴','酒吧'}将具有相同的哈希code。

I'm looking for a hash function that:

Hashes textual strings well (e.g. few collisions) Is written in Java, and widely used Bonus: works on several fields (instead of me concatenating them and applying the hash on the concatenated string) Bonus: Has a 128-bit variant. Bonus: Not CPU intensive.

解决方案

Why don't you use a long variant of the default String.hashCode() (where some really smart guys certainly put effort into making it efficient - not mentioning the thousands of developer eyes that already looked at this code)?

// adapted from String.hashCode()
public static long hash(String string) {
  long h = 1125899906842597L; // prime
  int len = string.length();

  for (int i = 0; i < len; i++) {
    h = 31*h + string.charAt(i);
  }
  return h;
}

If you're looking for even more bits, you could probably use a BigInteger Edit:

As I mentioned in a comment to the answer of @brianegge, there are not much usecases for hashes with more than 32 bits and most likely not a single one for hashes with more than 64 bits:

I could imagine a huge hashtable distributed across dozens of servers, maybe storing tens of billions of mappings. For such a scenario, @brianegge still has a valid point here: 32 bit allow for 2^32 (ca. 4.3 billion) different hash keys. Assuming a strong algorithm, you should still have quite few collisions. With 64 bit (18,446,744,073 billion different keys) your certainly save, regardless of whatever crazy scenario you need it for. Thinking of usecases for 128 bit keys (340,282,366,920,938,463,463,374,607,431 billion possible keys) is pretty much impossible though.

To combine the hash for several fields, simply do an XOR multiply one with a prime and add them:

long hash = MyHash.hash(string1) * 31 + MyHash.hash(string2);

The small prime is in there to avoid equal hash code for switched values, i.e. {'foo','bar'} and {'bar','foo'} aren't equal and should have a different hash code. XOR is bad as it returns 0 if both values are equal. Therefore, {'foo','foo'} and {'bar','bar'} would have the same hash code.