Select k random elements from a list whose elements have weights

2023-09-10 22:25:47 Author: 只為尔一笑

Selecting without any weights (equal probabilities) is beautifully described here.

I was wondering if there is a way to convert this approach to a weighted one.

I am also interested in other approaches.

Update: the sampling is without replacement.

Recommended answer

I know this is a very old question, but I think there's a neat trick to do this in O(n) time if you apply a little math!

The exponential distribution has two very useful properties.

1. Given n samples from exponential distributions with different rate parameters, the probability that a given sample is the minimum equals its rate parameter divided by the sum of all rate parameters.

2. It is "memoryless". If you already know the minimum, the probability that any given remaining element is the second smallest is the same as the probability that it would have been the minimum had the true minimum been removed (and never generated). This seems obvious, but I think that, because of some conditional probability issues, it might not hold for other distributions.
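As a sketch of why the first property holds, assume the samples are independent and write $\lambda_i$ for the rate of sample $X_i$ and $\Lambda$ for the sum of all rates (these symbols are introduced here for illustration and are not in the original answer). Integrating over the value taken by $X_i$ and requiring every other sample to exceed it gives:

```latex
\begin{aligned}
P\bigl(X_i = \min_j X_j\bigr)
  &= \int_0^\infty \lambda_i e^{-\lambda_i x} \prod_{j \ne i} P(X_j > x)\, dx \\
  &= \int_0^\infty \lambda_i e^{-\lambda_i x} \prod_{j \ne i} e^{-\lambda_j x}\, dx \\
  &= \lambda_i \int_0^\infty e^{-\Lambda x}\, dx
   = \frac{\lambda_i}{\Lambda},
  \qquad \Lambda = \sum_{j=1}^{n} \lambda_j .
\end{aligned}
```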

Using fact 1, we know that choosing a single element can be done by generating these exponential distribution samples with rate parameter equal to the weight, and then choosing the one with minimum value.
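A minimal Python sketch of this single-element selection, assuming all weights are strictly positive. The function name `weighted_choice` and the `rng` parameter are hypothetical names chosen for illustration; `random.expovariate` is the standard-library exponential sampler.

```python
import random

def weighted_choice(items, weights, rng=random):
    """Pick one item with probability proportional to its (positive) weight."""
    # One exponential sample per item, with rate equal to the item's weight.
    samples = [rng.expovariate(w) for w in weights]
    # The index holding the smallest sample is the chosen item.
    winner = min(range(len(items)), key=samples.__getitem__)
    return items[winner]
```

For example, `weighted_choice(["a", "b"], [1.0, 3.0])` should return `"b"` about three times out of four over many calls.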

Using fact 2, we know that we don't have to re-generate the exponential samples. Instead, just generate one for each element, and take the k elements with lowest samples.
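The idea can be sketched in Python as follows, again assuming strictly positive weights. The name `weighted_sample` is hypothetical, and this version sorts the samples for brevity, which costs O(n log n) rather than O(n).

```python
import random

def weighted_sample(items, weights, k, rng=random):
    """Sample k distinct items without replacement, favouring heavier weights."""
    if not 0 <= k <= len(items):
        raise ValueError("k must be between 0 and len(items)")
    # Generate exactly one exponential sample per item (rate = weight).
    keyed = [(rng.expovariate(w), i) for i, w in enumerate(weights)]
    # Sorting is used here only for brevity; it is not the linear-time step.
    keyed.sort()
    # The k items with the lowest samples form the result.
    return [items[i] for _, i in keyed[:k]]
```

Because each item gets a single sample and appears once in the sorted list, no item can be selected twice.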

Finding the lowest k can be done in O(n) expected time. Use the Quickselect algorithm to find the k-th smallest sample, then take another pass through all elements and output every element whose sample is at or below the k-th smallest.
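A rough Python sketch of that selection step. The helper names `quickselect` and `k_lowest_indices` are hypothetical; this variant builds new lists instead of partitioning in place, which keeps it short at the cost of extra memory, and it runs in linear time only in expectation.

```python
import random

def quickselect(values, k, rng=random):
    """Return the k-th smallest value (1-based) in expected O(n) time."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    pool = list(values)
    while True:
        pivot = rng.choice(pool)
        lower = [v for v in pool if v < pivot]
        equal = [v for v in pool if v == pivot]
        if k <= len(lower):
            # The answer lies strictly below the pivot.
            pool = lower
        elif k <= len(lower) + len(equal):
            return pivot
        else:
            # Discard everything at or below the pivot and adjust the rank.
            k -= len(lower) + len(equal)
            pool = [v for v in pool if v > pivot]

def k_lowest_indices(keys, k, rng=random):
    """Indices of the k smallest keys: one selection plus one extra pass."""
    if k <= 0:
        return []
    threshold = quickselect(keys, k, rng)
    below = [i for i, v in enumerate(keys) if v < threshold]
    ties = [i for i, v in enumerate(keys) if v == threshold]
    # Ties at the threshold are trimmed so exactly k indices come back.
    return below + ties[: k - len(below)]
```

Feeding the per-element exponential samples in as `keys` yields the positions of the chosen elements.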

A useful note: if you don't have immediate access to a library that generates exponential distribution samples, you can compute one as -ln(rand())/weight, where rand() returns a uniform random number strictly greater than 0 and at most 1.
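In Python this formula might look like the sketch below. The name `exponential_sample` is hypothetical. Since `random.random()` returns values in [0.0, 1.0), the sketch uses `1 - u` so that the logarithm never receives zero; the standard library also offers `random.expovariate` directly.

```python
import math
import random

def exponential_sample(rate, rng=random):
    """Draw one exponential sample with the given rate by inverting the CDF."""
    # rng.random() lies in [0.0, 1.0), so 1 - u lies in (0.0, 1.0].
    u = 1.0 - rng.random()
    return -math.log(u) / rate
```

Passing an element's weight as `rate` gives the sample used to rank that element.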