算法面试算法

2023-09-11 02:48:58 作者:爱做梦dě年纪

我今天去面试,被问到这样一个问题:

I went to an interview today and was asked this question:

假设你有1个十亿的整数这是排序的一个磁盘上的文件。你将如何确定最大的100号码?

Suppose you have 1 billion integers which are unsorted on one disk file. How would you determine the largest 100 numbers?

任何人谁都有想法,欢迎交流!

Anyone who has ideas, please share!

推荐答案

这是我最初的算法:

create array of size 100 [0..99].
read first 100 numbers and put into array.
sort array in ascending order.
while more numbers in file:
    get next number N.
    if N > array[0]:
        if N > array[99]:
            shift array[1..99] to array[0..98].
            set array[99] to N.
        else
            find, using binary search, first index i where N <= array[i].
            shift array[1..i-1] to array[0..i-2].
            set array[i-1] to N.
        endif
    endif
endwhile

这有(很轻微的)好处是,有没有为O(n ^ 2)重排为第100个元素,只是一个为O​​(n log n)的排序,并且您很快识别并扔掉那些太小。它还采用了二进制搜索(7比较最大值)找到正确的插入点而不是50(平均)为一个简单的线性搜索(不,我建议任何人递上这样的解决方案,只是它可以即时$ P PSS面试官$)。

This has the (very slight) advantage is that there's no O(n^2) shuffling for the first 100 elements, just an O(n log n) sort and that you very quickly identify and throw away those that are too small. It also uses a binary search (7 comparisons max) to find the correct insertion point rather than 50 (on average) for a simplistic linear search (not that I'm suggesting anyone else proffered such a solution, just that it may impress the interviewer).

您甚至可以得到加分情况下,建议使用优化的memcpy C语言提供了可以操作一定的重叠是没有问题的。

You may even get bonus points for suggesting the use of optimised shift operations like memcpy in C provided you can be sure the overlap isn't a problem.

你可能要考虑的一个另一种可能性是保持三个列表(每个高达100整数):

One other possibility you may want to consider is to maintain three lists (of up to 100 integers each):

read first hundred numbers into array 1 and sort them descending.
while more numbers:
    read up to next hundred numbers into array 2 and sort them descending.
    merge-sort lists 1 and 2 into list 3 (only first (largest) 100 numbers).
    if more numbers:
        read up to next hundred numbers into array 2 and sort them descending.
        merge-sort lists 3 and 2 into list 1 (only first (largest) 100 numbers).
    else
        copy list 3 to list 1.
    endif
endwhile

我不知道,但是这可能最终会被比的不断洗牌更有效率。

I'm not sure, but that may end up being more efficient than the continual shuffling.

合并排序是沿着线条简单的选择(对于合并排序表1和2到3):

The merge-sort is a simple selection along the lines of (for merge-sorting lists 1 and 2 into 3):

list3.clear()
while list3.size() < 100:
    while list1.peek() >= list2.peek():
        list3.add(list1.pop())
    endwhile
    while list2.peek() >= list1.peek():
        list3.add(list2.pop())
    endwhile
endwhile

简单地说,凭借他们已经按降序排列的事实拉动百强值出组合列表中。我没有详细检查是否会更有效,我只是提供它作为一种可能性。

Simply put, pulling the top 100 values out of the combined list by virtue of the fact they're already sorted in descending order. I haven't checked in detail whether that would be more efficient, I'm just offering it as a possibility.

我怀疑面试官会pssed与潜在的IM $ P $开箱即用的思想,而且你会说,它应该是业绩评估的事实。

I suspect the interviewers would be impressed with the potential for "out of the box" thinking and the fact that you'd stated that it should be evaluated for performance.

与大多数的访谈,技术技能的之一,他们正在寻找的东西的

As with most interviews, technical skill is one of the the things they're looking at.