第n个最小的数间大小两个数据库n分别采用分而治之分而治之、最小、大小、两个

2023-09-11 00:03:59 作者:朕笶憶葒顏

我们有一个包含数字不重复大小为n的两个数据库。因此,在总我们有2n个元素。它们可以通过一个查询访问一个数据库的时间。查询是这样的,你给它AK和返回第k个在该数据库中最小的条目。我们需要找到所有的2n个元素中的O(LOGN)查询第n个最小项。这个想法是利用分而治之,但我需要一些帮助,通过这个想法。谢谢!

we have two databases of size n containing numbers without repeats. So, in total we have 2n elements. They can be accessed through a query to one database at a time. The query is such that you give it a k and it returns kth smallest entry in that database. we need to find nth smallest entry among all the 2n elements in O(logn) queries. the idea is to use divide and conquer but i need some help thinking through this. thanks!

推荐答案

下面是我如何通过认为这。因为它是一个教育的问题,我建议你停止阅读,如果任何一个点让你觉得,啊哈,我没有想到的是,我还可以想沿着这些线路由我自己。

Here's how I thought this through. Since it's an educational problem, I suggest you stop reading if any of the points makes you think, "aha, I hadn't thought of that, I can think further along those lines by myself".

1)同样的观察为sdcwc:有可能略有纠缠于它是否开始于0或1,数据库可以被认为是一个排序的数组。我问元素0,我得到了最小。我问12,我得到的第13届最小。等等。我觉得这是更容易显现。

1) Same observation as sdcwc: with a possible slight quibble over whether it starts at 0 or 1, the database can be thought of as a sorted array. I ask for element 0, I get the smallest. I ask for 12, I get the 13th smallest. And so on. I find this easier to visualise.

2)我们知道,我们正在寻找一个O(log n)的算法。这意味着,粗略地说,我们正在寻找一两件事情:

2) We know we're looking for an O(log n) algorithm. That means that roughly speaking we're looking for one of two things:

要么我们开始了通过计算最小的元素,然后计算最小的2号,4号最小,8,等等,直到我们达到我们想要的大小,在固定时间内的每一步。这听起来是不是有希望在我身上。

Either we start out by calculating the smallest element, then calculate the 2nd smallest, 4th smallest, 8th, etc, until we're up to the size we want, each step in constant time. This doesn't sound promising to me.

或者我们开始时规模为n的问题,我们还进行了一些固定时间的操作,它可以让我们通过解决大小为N / 2的问题,解决了原来的问题。很显然,我们就可以解决问题了N = 1在固定时间,以完成该算法。这听起来更可信一点。

Or we start out with a problem of size n then we perform some constant-time operation which allows us to solve the original problem by solving a problem of size n/2. Obviously we can solve the problem for n=1 in constant time, to complete the algorithm. This sounds a bit more plausible.

实际上它并不一定必须是π/ 2各一次。它可以为n / 3,或999 * N / 1000,其结果仍然是O(log n)的。但是,没有任何坏处找N / 2首。

Actually it doesn't necessarily have to be n/2 each time. It could be n/3, or 999*n/1000, and the result will still be O(log n). But there's no harm in looking for n/2 first.

3)我们该如何减少这样的问题呢?那么,如果我们可以折价/从一个阵列的开始或其他为比第k个最小的小删除m个元素,那么我们就可以找到(公里)所产生的对阵列的次最小的元素,这将是第k个最小的原阵列。

3) How are we going to reduce the problem like that? Well, if we can discount/remove m elements from the start of one array or the other as being smaller than the kth smallest, then we can find the (k-m)th smallest element of the resulting pair of arrays, and it will be the kth smallest of the original arrays.

4)最后,突破观察是,如果阵列A的第m个最小的元件比阵列B的第m个最小的元件越小,则A的第m个元素不可能是第(2m)两者的个最小的元素阵列组合。它的小于(等于价值或:我不知道是否不重复的意思是没有重复每个数据库中,或数据库组合之间没有重复),因为最多有2 *(M- 1)元素严格小于它在两个阵列组合。

4) Finally, the breakthrough observation is that if the mth smallest element of array A is smaller than the mth smallest element of array B, then that mth element of A cannot possibly be the (2m)th smallest element of the two arrays combined. It's smaller than that (or of equal value: I'm not sure whether "no repeats" means "no repeats in each database", or "no repeats between the databases combined"), since there are at most 2*(m-1) elements strictly smaller than it in both arrays combined.

除非我犯了一个错误,剩下的就是编码。随着一个小的额外参数占的off-by-1时,k为奇数,这种解决方案实际上是为O(log k)的,这是O(log n)的,因为K< = 2 * N

Unless I've made an error, the rest is coding. With a small extra argument to account for the off-by-1 when k is odd, this solution is actually O(log k), which is O(log n) since k <= 2*n.