面试问题:寻找中位数从兆数的整数中位数、整数、问题

2023-09-11 02:50:17 作者:浅色、记忆づ

,其中包含有整数10G(十亿)数的文件时,请找到这些整数的位数。您将得到2G的内存来做到这一点。任何人都可以拿出一个合理的方式?谢谢!

There is a file that contains 10G(1000000000) number of integers, please find the Median of these integers. you are given 2G memory to do this. Can anyone come up with an reasonable way? thanks!

推荐答案

创建了8个字节的多头有2 ^ 16个条目的数组。把你输入的号码,转移断底十六位,并创建一个直方图。

Create an array of 8-byte longs that has 2^16 entries. Take your input numbers, shift off the bottom sixteen bits, and create a histogram.

现在你算在了直方图,直到你达到这个覆盖值的中点垃圾桶。

Now you count up in that histogram until you reach the bin that covers the midpoint of the values.

穿越再次,忽略不具有相同的机顶的比特的所有数字,和使底部比特的直方图。

Pass through again, ignoring all numbers that don't have that same set of top bits, and make a histogram of the bottom bits.

向上计数通过直方图,直到你达到覆盖值(的整个列表)的中点垃圾桶。

Count up through that histogram until you reach the bin that covers the midpoint of the (entire list of) values.

现在你知道中位数,在 O(N)时间和 O(1)空间(在实践中根据1 MB)。

Now you know the median, in O(n) time and O(1) space (in practice, under 1 MB).

下面是一些示例斯卡拉code,做这样的:

Here's some sample Scala code that does this:

def medianFinder(numbers: Iterable[Int]) = {
  def midArgMid(a: Array[Long], mid: Long) = {
    val cuml = a.scanLeft(0L)(_ + _).drop(1)
    cuml.zipWithIndex.dropWhile(_._1 < mid).head
  }
  val topHistogram = new Array[Long](65536)
  var count = 0L
  numbers.foreach(number => {
    count += 1
    topHistogram(number>>>16) += 1
  })
  val (topCount,topIndex) = midArgMid(topHistogram, (count+1)/2)
  val botHistogram = new Array[Long](65536)
  numbers.foreach(number => {
    if ((number>>>16) == topIndex) botHistogram(number & 0xFFFF) += 1
  })
  val (botCount,botIndex) =
    midArgMid(botHistogram, (count+1)/2 - (topCount-topHistogram(topIndex)))
  (topIndex<<16) + botIndex
}

和这里是工作的一个小的输入数据集:

and here it is working on a small set of input data:

scala> medianFinder(List(1,123,12345,1234567,123456789))
res18: Int = 12345

如果您有64位整数存储,您可以使用相同的策略,在4传递来代替。

If you have 64 bit integers stored, you can use the same strategy in 4 passes instead.