Best way to efficiently find high-density regions

2023-09-11 03:43:14 Author: 少爷

Over the course of my coding, I have come across the following problem: find the region of fixed size in a 2D space that has the highest density of particles. The particles can be considered to be distributed more or less randomly over the entire space, but in theory there should be some areas that have a higher density.

For example, 100 particles are placed randomly in a 500x500 2D grid, and I need to find the 50x50 region with the most particles (highest density).

Is there some way to calculate the best region other than brute-force testing every possible region (in this case, just over 200,000 regions)? That would scale as O(n^2) for an n-length axis.

Recommended answer

Algorithm 1

Create a 500x500 2D array, where each cell contains the count of the number of particles in that cell. Then convolve that array with a 50x50 kernel. Each cell of the resulting array will hold the count of particles in a 50x50 region. Then find the cell with the largest value.
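As an illustration of what the convolved array contains, here is a minimal sketch that bins particles into a count grid and sums every window directly. The function names and the convention that each output cell is indexed by the window's top-left corner are assumptions made for this example, not part of the answer. This direct version costs O(n^2 * d^2) and is only meant to show the definition.

```python
def count_grid(particles, size):
    """Bin integer (x, y) particle positions into a size x size grid of counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in particles:
        grid[y][x] += 1
    return grid


def window_counts_direct(grid, d):
    """Direct box convolution: out[r][c] is the number of particles in the
    d x d window whose top-left cell is (r, c)."""
    n = len(grid)
    m = n - d + 1  # number of window positions along each axis
    out = [[0] * m for _ in range(m)]
    for r in range(m):
        for c in range(m):
            out[r][c] = sum(
                grid[rr][cc]
                for rr in range(r, r + d)
                for cc in range(c, c + d)
            )
    return out


def best_window(counts):
    """Return (count, row, col) for a cell holding the largest count."""
    return max(
        (value, r, c)
        for r, row in enumerate(counts)
        for c, value in enumerate(row)
    )


# Small hypothetical example: an 8x8 grid searched with 3x3 windows.
particles = [(1, 1), (2, 1), (2, 2), (7, 7)]
counts = window_counts_direct(count_grid(particles, 8), 3)
best_count, best_row, best_col = best_window(counts)
```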

If you are using a 50x50 box as the region, the kernel can be decomposed into two separate one-dimensional convolutions, one for each axis. The resulting algorithm is O(n^2) in space and time, where n is the width and height of the 2D space you are searching.

As a reminder, a one-dimensional convolution with a boxcar function can be completed in O(n) time and space, and it can be done in place. Let x(t) be the input for t = 1..n, and let y(t) be the output. Define x(t) = 0 and y(t) = 0 for t < 1 or t > n. Define the kernel f(t) to be 1 for t = 0..d-1 and 0 elsewhere. The definition of convolution gives us the following formula:

y(t) = sum_i x(t-i) * f(i) = sum_{i=0..d-1} x(t-i)

This looks like it takes O(n*d) time, but we can rewrite it as a recurrence:

y(t) = y(t-1) + x(t) - x(t-d)
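A minimal sketch of this recurrence, assuming a plain Python list as input and zero-based indices (the text above counts from 1). The name `boxcar_1d` is chosen here for illustration only.

```python
def boxcar_1d(x, d):
    """Running sum of the last d samples:
    y[t] = x[t] + x[t-1] + ... + x[t-d+1], with x treated as 0 before index 0."""
    y = [0] * len(x)
    running = 0
    for t, value in enumerate(x):
        running += value          # the "+ x(t)" term
        if t >= d:
            running -= x[t - d]   # the "- x(t-d)" term
        y[t] = running
    return y


result = boxcar_1d([1, 0, 2, 0, 0, 3], 3)
# result == [1, 1, 3, 2, 2, 3]
```

Each element is touched once when it enters the window and once when it leaves, so the work per output value does not depend on d.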

This shows that the one-dimensional convolution is O(n), independent of d. To perform the two-dimensional convolution, you simply perform the one-dimensional convolution along each axis in turn. This works because the boxcar kernel can be decomposed; in general, most kernels cannot be. The Gaussian kernel is another kernel that can be decomposed, which is why Gaussian blur in an image editing program is so fast.
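The two-pass approach could be sketched as follows, assuming a list-of-lists count grid and d no larger than the grid. The names `boxcar_1d` and `densest_region` are illustrative. Because the recurrence sums the d samples ending at each index, a raw output cell corresponds to the window whose bottom-right corner is that cell; the sketch converts this to a top-left corner and clamps it so the reported window lies inside the grid.

```python
def boxcar_1d(x, d):
    """Running sum of the last d samples, using y(t) = y(t-1) + x(t) - x(t-d)."""
    y = [0] * len(x)
    running = 0
    for t, value in enumerate(x):
        running += value
        if t >= d:
            running -= x[t - d]
        y[t] = running
    return y


def densest_region(grid, d):
    """Return (count, top, left) for a d x d window with the most particles."""
    height, width = len(grid), len(grid[0])
    # Pass 1: sum along each row.
    row_sums = [boxcar_1d(row, d) for row in grid]
    # Pass 2: sum along each column of the row sums; col_sums[c][r].
    col_sums = [boxcar_1d(list(col), d) for col in zip(*row_sums)]
    count, r, c = max(
        (col_sums[c][r], r, c) for c in range(width) for r in range(height)
    )
    # (r, c) is the bottom-right cell of the window. A window that hangs off
    # the top or left edge is contained in the clamped window, which therefore
    # holds the same (maximal) count.
    return count, max(r - d + 1, 0), max(c - d + 1, 0)


# Hypothetical 8x8 example with four particles, searched with 3x3 windows.
grid = [[0] * 8 for _ in range(8)]
for row, col in [(1, 1), (1, 2), (2, 2), (7, 7)]:
    grid[row][col] += 1
count, top, left = densest_region(grid, 3)
```

Both passes visit every cell once, so the total cost is O(n^2) regardless of the window size.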

For the kind of numbers you specify, this will be extremely fast. 500x500 is an extremely small data set, and your computer can check 202,500 regions in a few milliseconds at most. You will have to ask yourself whether it is worth the extra hours, days, or weeks of time it will take you to optimize further.

This is the same as justhalf's solution, except that, thanks to the decomposed convolution, the region size does not affect the algorithm's speed.

Algorithm 2

Assume there is at least one point. Without loss of generality, consider the 2D space to be the entire plane. Let d be the width and height of the region. Let N be the number of points.

Lemma: There exists a region of maximum density which has a point on its left edge.

Proof: Let R be a region of maximum density. Let R' be R translated right by the distance between the left edge of R and the leftmost point in R, so that this point lies on the left edge of R'. Every point in R also lies in R', so R' contains at least as many points as R and is therefore also a region of maximum density.

Add all the points to a K-D tree. This can be done in O(N log^2 N) time.

For each point, consider the region of width d and height 2d where the point is centered on the left edge of the region. Call this region R. Every d x d region that has the point on its left edge lies inside R.

Query the K-D tree for the points in region R. Call this set S. This can be done in O(N^(1/2) + |S|) time.

Find the d x d subregion of R containing the largest number of points in S. This can be done in O(|S| log |S|) time by sorting S by y-coordinate and then performing a linear scan.
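The sort-and-scan step might look like the sketch below. It assumes S is already available as a list of (x, y) tuples inside the strip R, and that region boundaries are inclusive; the function name is hypothetical, and the K-D tree query that would produce S is not shown. By the same sliding argument as the lemma, applied vertically, it is enough to try windows whose lower edge passes through a point of S.

```python
def best_subregion(points, d):
    """Given the points of S inside the d-wide, 2d-tall strip R, return
    (count, y_low) for a window [y_low, y_low + d] holding the most points."""
    ys = sorted(y for _, y in points)   # O(|S| log |S|)
    best_count, best_y = 0, None
    j = 0                               # one past the last point inside the window
    for i, y_low in enumerate(ys):      # linear two-pointer scan
        while j < len(ys) and ys[j] <= y_low + d:
            j += 1
        if j - i > best_count:
            best_count, best_y = j - i, y_low
    return best_count, best_y


# Hypothetical strip contents with d = 3.
strip_points = [(0, 0), (1, 1), (2, 4), (3, 5), (1, 6), (0, 12)]
count, y_low = best_subregion(strip_points, 3)
# count == 3, y_low == 4: the window [4, 7] holds the points at y = 4, 5, 6
```

The pointer `j` only moves forward, so the scan after sorting is linear in |S|.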

The resulting algorithm runs in O(N^(3/2) + N |S| log |S|) time.

Algorithm #1 is superior to algorithm #2 when the density is high. Algorithm #2 is superior only when the density of particles is very low, and the density below which algorithm #2 is superior decreases as the total board size increases.

Note that the continuous case can be considered to have zero density, in which case only algorithm #2 works.

 