Way to compute the mode (most frequent element) of a set in linear time?

In the book "The Algorithm Design Manual" by Skiena, computing the mode (most frequent element) of a set is said to have an Ω(n log n) lower bound (this puzzles me), and also (correctly, I guess) it is said that no faster worst-case algorithm exists for computing the mode. I'm only puzzled by the lower bound being Ω(n log n).

See the page of the book on Google Books

But surely this could in some cases be computed in linear time (best case), e.g. by Java code like the snippet below (which finds the most frequent character in a string), the "trick" being to count occurrences using a hashtable. This seems obvious.

So, what am I missing in my understanding of the problem?

EDIT: (Mystery solved) As StriplingWarrior points out, the lower bound holds if only comparisons are used, i.e. no indexing of memory. The connection is that the elements are all distinct exactly when the mode occurs only once, so a faster comparison-based mode algorithm would contradict the Ω(n log n) comparison lower bound for the element distinctness problem (a small sketch of this reduction follows the code below). See also: http://en.wikipedia.org/wiki/Element_distinctness_problem

// Linear time
char computeMode(String input) {
  // initialize currentMode to first char
  char[] chars = input.toCharArray();
  char currentMode = chars[0];
  int currentModeCount = 0;
  HashMap<Character, Integer> counts = new HashMap<Character, Integer>();
  for(char character : chars) {
    int count = putget(counts, character); // occurrences so far
    // test whether character should be the new currentMode
    if(count > currentModeCount) {
      currentMode = character;
      currentModeCount = count; // also save the count
    }
  }
  return currentMode;
}

// Constant time
int putget(HashMap<Character, Integer> map, char character) {
  if(!map.containsKey(character)) {
    // if character not seen before, initialize to zero
    map.put(character, 0);
  }
  // increment
  int newValue = map.get(character) + 1;
  map.put(character, newValue);
  return newValue;
}
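
A minimal sketch of the reduction behind the EDIT above (not part of the original question; allDistinct and countOfMode are names added for illustration): deciding element distinctness amounts to asking whether the mode occurs exactly once, so computing the mode can be no easier than element distinctness in the comparison model.

boolean allDistinct(String input) {
  // all characters are distinct exactly when the most frequent one occurs once
  return countOfMode(input) == 1;
}

// multiplicity of the mode, reusing the hash-counting idea from computeMode above
int countOfMode(String input) {
  HashMap<Character, Integer> counts = new HashMap<Character, Integer>();
  int best = 0;
  for (char character : input.toCharArray()) {
    int count = counts.containsKey(character) ? counts.get(character) + 1 : 1;
    counts.put(character, count);
    if (count > best) {
      best = count;
    }
  }
  return best;
}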

Solution

The author seems to be basing his logic on the assumption that comparison is the only operation available to you. Using a hash-based data structure sort of gets around this by reducing the need for comparisons in most cases, to the point where each lookup and update is basically constant time (in expectation).

However, if the values were hand-picked to always produce hash collisions, you would end up effectively turning your hash table into a list, which would make your algorithm O(n²). As the author points out, simply sorting the values first provides the best guaranteed algorithm (O(n log n)), even though in most cases a hash-based approach would be preferable.
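
For reference, a minimal sketch of the sort-then-scan approach alluded to here (not part of the original answer; computeModeBySorting is a name added for illustration): after sorting, equal characters are adjacent, so the mode is simply the longest run, which gives a guaranteed O(n log n) bound regardless of the input.

// Guaranteed O(n log n): sort, then scan for the longest run of equal characters
char computeModeBySorting(String input) {
  char[] chars = input.toCharArray();
  java.util.Arrays.sort(chars); // O(n log n)
  char currentMode = chars[0];
  int currentModeCount = 0;
  int runLength = 0;
  for (int i = 0; i < chars.length; i++) {
    // equal characters are adjacent after sorting
    runLength = (i > 0 && chars[i] == chars[i - 1]) ? runLength + 1 : 1;
    if (runLength > currentModeCount) {
      currentMode = chars[i];
      currentModeCount = runLength;
    }
  }
  return currentMode;
}

Unlike the hash-based version, this bound does not depend on the quality of the hash function or the distribution of the input.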