Why does String.GetHashCode only process every fourth character?

2023-09-11 05:55:11 Author: 软兔酱



I've been reading this article because it was linked by Jon Skeet on this answer. I'm trying to really get an understanding of how hashing works and why Jon likes the algorithm he provided so much. I'm not claiming to have an answer to that yet, but I do have a specific question about the base System.String implementation of GetHashCode.

Consider the code, focusing on the annotated <<<<<========== line:

public override unsafe int GetHashCode()
{
  if (HashHelpers.s_UseRandomizedStringHashing)
    return string.InternalMarvin32HashString(this, this.Length, 0L);
  fixed (char* chPtr = this)
  {
    int num1 = 352654597;
    int num2 = num1;
    int* numPtr = (int*) chPtr;
    int length = this.Length;
    while (length > 2)
    {
      num1 = (num1 << 5) + num1 + (num1 >> 27) ^ *numPtr;
      num2 = (num2 << 5) + num2 + (num2 >> 27) ^ numPtr[1];
      numPtr += 2;
      length -= 4;   // <<<<<==========
    }
    if (length > 0)
      num1 = (num1 << 5) + num1 + (num1 >> 27) ^ *numPtr;
    return num1 + num2 * 1566083941;
  }
}
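For experimenting with this loop without `unsafe` code, here is a safe C# sketch of the non-randomized branch above. `LegacyStringHash` and `ReadInt` are hypothetical names. The sketch assumes a little-endian layout (as on x86/x64) and treats the position just past the last char as '\0', standing in for the terminator the runtime stores after string data; the Marvin32 branch is left out. It is not guaranteed to match `GetHashCode()` on any particular runtime, since modern .NET randomizes string hashes per process.

```csharp
using System;

// Returns the int that *(int*)(chPtr + i) would read on a little-endian
// machine: char i in the low 16 bits, char i + 1 in the high 16 bits.
// Indices past the end read as '\0', standing in for the string terminator.
static int ReadInt(string s, int i)
{
    char lo = i < s.Length ? s[i] : '\0';
    char hi = i + 1 < s.Length ? s[i + 1] : '\0';
    return lo | (hi << 16);
}

// Safe mirror of the non-randomized branch of the posted GetHashCode.
static int LegacyStringHash(string s)
{
    unchecked
    {
        int num1 = 352654597;      // equals (5381 << 16) + 5381
        int num2 = num1;
        int pos = 0;               // char index standing in for numPtr
        int length = s.Length;     // chars still left to process
        while (length > 2)
        {
            num1 = ((num1 << 5) + num1 + (num1 >> 27)) ^ ReadInt(s, pos);     // *numPtr
            num2 = ((num2 << 5) + num2 + (num2 >> 27)) ^ ReadInt(s, pos + 2); // numPtr[1]
            pos += 4;              // numPtr += 2 skips two ints, i.e. four chars
            length -= 4;
        }
        if (length > 0)            // 1 or 2 chars remain: one more int read
            num1 = ((num1 << 5) + num1 + (num1 >> 27)) ^ ReadInt(s, pos);
        return num1 + num2 * 1566083941;
    }
}

Console.WriteLine(LegacyStringHash("hello"));
```

The explicit parentheses spell out how C# already groups the original expression: `+` binds tighter than `^`, so the shift-and-add mix is computed first and then XORed with the next int.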

Why do they only process every fourth character? And, if you're willing enough, why do they process it from right to left?

Solution


They're not doing either. They're processing the characters as pairs of integer values (note that they use *numPtr and numPtr[1] in the while loop). Each Int32 holds two 16-bit chars, so two Int32 values take the same space as 4 characters, which is why they subtract 4 from the length each time.
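A small sketch of what a single `int*` read sees, assuming a little-endian machine such as x86/x64: the first char lands in the low 16 bits and the second in the high 16 bits. `PackPair` is a hypothetical helper for illustration, not part of .NET.

```csharp
using System;

// Mimics what *(int*)ptr sees on a little-endian machine: the first char
// lands in the low 16 bits, the second char in the high 16 bits.
// PackPair is a hypothetical helper for illustration, not a .NET API.
static int PackPair(char first, char second) => first | (second << 16);

string s = "abcd";
int w0 = PackPair(s[0], s[1]); // what *numPtr would read
int w1 = PackPair(s[2], s[3]); // what numPtr[1] would read

// Two Int32 reads cover 2 * 4 bytes = 8 bytes = four 2-byte chars.
Console.WriteLine($"{sizeof(int) * 2 / sizeof(char)} chars per iteration"); // 4 chars per iteration
Console.WriteLine($"*numPtr   = 0x{w0:X8}");                                // *numPtr   = 0x00620061
Console.WriteLine($"numPtr[1] = 0x{w1:X8}");                                // numPtr[1] = 0x00640063
```

So one pass through the loop consumes 'a' and 'b' via `*numPtr` and 'c' and 'd' via `numPtr[1]`: every character is used, four per iteration.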

The string is processed from front to back (in array order); length is decremented because it represents the number of characters still left to process. This means they're processing from left to right, in blocks of 4 characters at a time while possible, and the final if folds in the 1 or 2 characters that may remain.
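The order of reads can be traced with a safe sketch that replaces the pointer with a char index. `ReadOrder` is a hypothetical helper: it only records where each Int32 read starts and how much of the string is left, and does not compute a hash.

```csharp
using System;
using System.Collections.Generic;

// Returns the char index where each Int32 read starts, in the order the
// posted loop performs them. Hypothetical helper for illustration only.
static List<int> ReadOrder(string s)
{
    var starts = new List<int>();
    int pos = 0;              // numPtr, measured in chars
    int length = s.Length;    // chars still left to process
    while (length > 2)
    {
        starts.Add(pos);      // *numPtr   -> chars pos, pos + 1
        starts.Add(pos + 2);  // numPtr[1] -> chars pos + 2, pos + 3
        pos += 4;
        length -= 4;          // four chars consumed; length counts what is left
        Console.WriteLine($"after block at {pos - 4}: {length} chars left");
    }
    if (length > 0)
        starts.Add(pos);      // one trailing int for the last 1 or 2 chars
    return starts;
}

var order = ReadOrder("abcdefghij");
Console.WriteLine(string.Join(", ", order));
// after block at 0: 6 chars left
// after block at 4: 2 chars left
// 0, 2, 4, 6, 8
```

The start indices only ever increase, which is the left-to-right order; only `length` counts down.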