What is the quickest way to find an element where a[i] == i in a sorted array?

2023-09-11 04:54:52 Author: 浮梦

Given an array a[], what would be the most efficient way to determine whether or not at least one element i satisfies the condition a[i] == i?

All the elements in the array are sorted and distinct, but they aren't necessarily integer types (i.e. they might be floating point types).

Recommended Answer

Several people have made claims about the relevance of "sorted", "distinct" and "aren't necessarily integers". In fact, proper selection of an efficient algorithm to solve this problem hinges on these characteristics. A more efficient algorithm would be possible if we could know that the values in the array were both distinct and integral, while a less efficient algorithm would be required if the values might be non-distinct, whether or not they were integral. And of course, if the array was not already sorted, you could sort it first (at average complexity O(n log n)) and then use the more efficient pre-sorted algorithm (i.e. for a sorted array), but in the unsorted case it would be more efficient to simply leave the array unsorted and run through it directly comparing the values in linear time (O(n)). Note that regardless of the algorithm chosen, best-case performance is O(1) (when the first element examined contains its index value); at any point during execution of any algorithm we might come across an element where a[i] == i at which point we return true; what actually matters in terms of algorithm performance in this problem is how quickly we can exclude all elements and declare that there is no such element a[i] where a[i] == i.

The problem does not state the sort order of a[], which is a pretty critical piece of missing information. If it’s ascending, the worst-case complexity will always be O(n); there’s nothing we can do to make the worst-case complexity better. But if the sort order is descending, even the worst-case complexity is O(log n): since values in the array are distinct and descending, there is only one possible index where a[i] could equal i, and basically all you have to do is a binary search to find the crossover point (where the ascending index values cross over the descending element values, if there even is such a crossover), and determine if a[c] == c at the crossover point index value c. Since that’s pretty trivial, I’ll proceed assuming that the sort order is ascending. Interestingly, if the elements were integers, even in the ascending case there is a similar "crossover-like" situation (though in the ascending case there could be more than one a[i] == i match), so if the elements were integers, a binary search would also be applicable in the ascending case, in which case even the worst-case performance would be O(log n) (see Interview question - Search in sorted array X for index i such that X[i] = i). But we aren’t given that luxury in this version of the problem.
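
For the descending case described above, that crossover binary search might be sketched as follows (crossoverSearch is my own illustrative helper, not code from the original answer; it relies on a[i] - i being strictly decreasing when the values are distinct and descending):

#include <cstddef>

// For a strictly descending array of distinct values, index values ascend
// while element values descend, so a[i] - i is strictly decreasing.
// Binary-search for the point where it crosses zero and check whether it
// lands exactly on zero.
bool crossoverSearch(const double* a, int len)
{
    int lo = 0, hi = len - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == mid)
            return true;      // found the (unique) match
        if (a[mid] > mid)
            lo = mid + 1;     // crossover lies to the right
        else
            hi = mid - 1;     // crossover lies to the left
    }
    return false;             // no index with a[i] == i
}

For example, on the descending array {5.5, 4.0, 2.0, 1.5, 0.0} it finds a[2] == 2 in a single probe, and on {5.0, 3.0, 1.0, -2.0} it reports no match after O(log n) probes.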

Here is how we might solve this problem:

Begin with the first element, a[0]. If its value is == 0, you’ve found an element which satisfies a[i] == i so return true. If its value is < 1, the next element (a[1]) could possibly contain the value 1, so you proceed to the next index. If, however, a[0] >= 1, you know (because the values are distinct) that the condition a[1] == 1 cannot possibly be true, so you can safely skip index 1. But you can even do better than that: For example, if a[0] == 12, you know (because the values are sorted in ascending order) that there cannot possibly be any elements that satisfy a[i] == i prior to element a[13]. Because the values in the array can be non-integral, we cannot make any further assumptions at this point, so the next element we can safely skip to directly is a[13] (e.g. a[1] through a[12] may all contain values between 12.000... and 13.000... such that a[13] could still equal exactly 13, so we have to check it).

Continuing that process yields an algorithm as follows:

// Algorithm 1
#include <cmath> // for std::floor

bool algorithm1(double* a, int len)
{
    for (int i = 0; i < len; ++i) // worst case is O(n)
    {
        if (a[i] == i)
            return true; // of course we could also return i here (as an int)...
        if (a[i] > i)    // skip ahead: ++i then lands just past floor(a[i])
            i = static_cast<int>(std::floor(a[i])); // assumes values fit in int range
    }
    return false; // ......in which case we’d want to return -1 here (an int)
}

This has pretty good performance if many of the values in a[] are greater than their index value, and has excellent performance if all values in a[] are greater than n (it returns false after only one iteration!), but it has dismal performance if all values are less than their index value (it will return false after n iterations). So we return to the drawing board... but all we need is a slight tweak. Consider that the algorithm could have been written to scan backwards from n down to 0 just as easily as it can scan forward from 0 to n. If we combine the logic of iterating from both ends toward the middle, we get an algorithm as follows:

// Algorithm 2
#include <cmath> // for std::floor and std::ceil

bool algorithm2(double* a, int len)
{
    // Note: i <= j (not i < j) so the middle element is still examined
    // when i and j meet in an odd-length array.
    for (int i = 0, j = len - 1; i <= j; ++i, --j) // worst case is still O(n)
    {
        if (a[i] == i || a[j] == j)
            return true;
        if (a[i] > i)
            i = static_cast<int>(std::floor(a[i]));
        if (a[j] < j)
            j = static_cast<int>(std::ceil(a[j]));
    }
    return false;
}

This has excellent performance in both of the extreme cases (all values are less than 0 or greater than n), and has pretty good performance with pretty much any other distribution of values. The worst case is if all of the values in the lower half of the array are less than their index and all of the values in the upper half are greater than their index, in which case the performance degrades to the worst-case of O(n). Best case (either extreme case) is O(1), while average case is probably O(log n) but I’m deferring to someone with a math major to determine that with certainty.
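
To make those extremes concrete, here is a small instrumented copy of algorithm 2 (algorithm2Counted is my own illustration, with an added iteration counter; it is not part of the original answer) that lets you observe the O(1) and O(n) cases directly:

#include <cmath>

// Instrumented variant of algorithm 2 (illustration only): identical logic,
// but it also counts loop iterations so the extreme cases can be observed.
bool algorithm2Counted(const double* a, int len, int* iters)
{
    *iters = 0;
    for (int i = 0, j = len - 1; i <= j; ++i, --j)
    {
        ++*iters;
        if (a[i] == i || a[j] == j)
            return true;
        if (a[i] > i)
            i = static_cast<int>(std::floor(a[i]));
        if (a[j] < j)
            j = static_cast<int>(std::ceil(a[j]));
    }
    return false;
}

Feeding it {10.5, 11.5, 12.5, 13.5} (every value greater than n) excludes the whole array after a single iteration, while the half-and-half worst case described above ({-4, -3, -2, -1, 5, 6, 7, 8}) forces roughly n/2 iterations before returning false.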

Several people have suggested a "divide and conquer" approach to the problem, without specifying how the problem could be divided and what one would do with the recursively divided sub-problems. Of course such an incomplete answer would probably not satisfy the interviewer. The naïve linear algorithm and the worst-case performance of algorithm 2 above are both O(n), while algorithm 2 improves the average-case performance to (probably) O(log n) by skipping (not examining) elements whenever it can. The divide-and-conquer approach can only outperform algorithm 2 if, in the average case, it is somehow able to skip more elements than algorithm 2 can skip. Let’s assume we divide the problem by splitting the array into two (nearly) equal contiguous halves, recursively, and decide whether, with the resulting sub-problems, we are likely to be able to skip more elements than algorithm 2 could skip, especially in algorithm 2’s worst case. For the remainder of this discussion, let’s assume an input that would be worst-case for algorithm 2. After the first split, we can check both halves’ top and bottom elements for the same extreme case that results in O(1) performance for algorithm 2, yet results in O(n) performance with both halves combined. This would be the case if all elements in the bottom half are less than 0 and all elements in the upper half are greater than n-1. In those cases, we can immediately exclude the bottom and/or top half with O(1) performance for whichever half we can exclude. Of course the performance of any half that cannot be excluded by that test remains to be determined after recursing further, dividing that half in half again until we find any segment whose top or bottom element contains its index value. That’s a reasonably nice performance improvement over algorithm 2, but it occurs in only certain special cases of algorithm 2’s worst case. All we’ve done with divide-and-conquer is decrease (slightly) the proportion of the problem space that evokes worst-case behavior. There are still worst-case scenarios for divide-and-conquer, and they exactly match most of the problem space that evokes worst-case behavior for algorithm 2.
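
As a concrete sketch of the scheme just described (dncSearch is my own illustration of the approach, not code from the original answer; the exclusion test generalizes the "all below / all above" check to each segment), the recursive endpoint-exclusion search might look like:

// Recursive divide-and-conquer sketch: check a contiguous segment [lo, hi]
// of an ascending array of distinct values for any index i with a[i] == i.
bool dncSearch(const double* a, int lo, int hi)
{
    if (lo > hi)
        return false;
    if (a[lo] == lo || a[hi] == hi)
        return true;               // an endpoint already matches
    // Exclusion tests: since the values are distinct and ascending, if the
    // whole segment lies strictly above its index range (a[lo] > hi) or
    // strictly below it (a[hi] < lo), no match is possible inside it.
    if (a[lo] > hi || a[hi] < lo)
        return false;
    int mid = lo + (hi - lo) / 2;
    return dncSearch(a, lo, mid) || dncSearch(a, mid + 1, hi);
}

On algorithm 2's worst-case half-and-half input this recursion still visits O(n) segments in total, which is the point of the paragraph above: the exclusion test only wins in special cases.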

So, given that the divide-and-conquer algorithm has fewer worst-case scenarios, doesn’t it make sense to go ahead and use a divide-and-conquer approach?

In a word, no. Well, maybe. If you know up front that about half of your data is less than 0 and half is greater than n, this special case would generally fare better with the divide-and-conquer approach. Or, if your system is multicore and your ‘n’ is large, it might be helpful to split the problem evenly between all of your cores, but once it’s split between them, I maintain that the sub-problems on each core are probably best solved with algorithm 2 above, avoiding further division of the problem and certainly avoiding recursion, as I argue below....
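
A rough sketch of that per-core split (parallelSearch and the chunking scheme are my own illustration, assuming std::thread is available; each worker runs algorithm 2's skip-scan over one contiguous chunk, comparing values against their global indices):

#include <algorithm>
#include <atomic>
#include <cmath>
#include <thread>
#include <vector>

// Split the array into one contiguous chunk per worker, and run algorithm 2's
// logic independently on each chunk. Any worker finding a[i] == i sets a
// shared flag that lets the others stop early.
bool parallelSearch(const double* a, int len, int workers)
{
    std::atomic<bool> found(false);
    std::vector<std::thread> pool;
    int chunk = (len + workers - 1) / workers;
    for (int w = 0; w < workers; ++w)
    {
        int lo = w * chunk;
        int hi = std::min(len - 1, lo + chunk - 1);
        if (lo > hi)
            break;
        pool.emplace_back([a, lo, hi, &found]() {
            for (int i = lo, j = hi; i <= j && !found.load(); ++i, --j)
            {
                if (a[i] == i || a[j] == j) { found.store(true); return; }
                if (a[i] > i) i = static_cast<int>(std::floor(a[i]));
                if (a[j] < j) j = static_cast<int>(std::ceil(a[j]));
            }
        });
    }
    for (auto& t : pool)
        t.join();
    return found.load();
}

Each chunk is scanned independently, so a skip that jumps past a chunk boundary simply ends that worker's scan; matches in other chunks are still found by their own workers.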

At each recursion level of a recursive divide-and-conquer approach, the algorithm needs some way to remember the as-yet-unsolved 2nd half of the problem while it recurses into the 1st half. Often this is done by having the algorithm recursively call itself first for one half and then for the other, a design which maintains this information implicitly on the runtime stack. Another implementation might avoid recursive function calls by maintaining essentially this same information on an explicit stack. In terms of space growth, algorithm 2 is O(1), but any recursive implementation is unavoidably O(log n) due to having to maintain this information on some sort of stack. But aside from the space issue, a recursive implementation has extra runtime overhead of remembering the state of as-yet-unrecursed-into subproblem halves until such time as they can be recursed into. This runtime overhead is not free, and given the simplicity of algorithm 2’s implementation above, I posit that such overhead is proportionally significant. Therefore I suggest that algorithm 2 above will roundly spank any recursive implementation for the vast majority of cases.