All k nearest neighbors in 2D, C++

2023-09-10 23:33:21 Author: ✎ℳ生人๓勿进d

For each point in a data set, I need to find all of its nearest neighbors. The data set contains approx. 10 million 2D points. The data are close to a grid, but do not form a precise grid...

This property rules out (in my opinion) the use of KD trees, where the basic assumption is that no points share the same x coordinate and y coordinate.

I need a fast algorithm, O(n) or better (but not too difficult to implement :-)), to solve this problem... Since Boost is not standardized, I do not want to use it...

Thanks for your answers or code samples...

Recommended answer

I would do the following:

Create a larger grid on top of the points.

Go through the points linearly, and for each one of them, figure out which large "cell" it belongs to (and add the point to the list associated with that cell).

(This can be done in constant time per point: just do an integer division of the point's coordinates by the cell size.)
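
The bucketing step could look roughly like the sketch below. The names `Point`, `Grid` and `buildGrid` are made up for illustration, and the coordinates are assumed to be `double`, so the "integer division" becomes a floating-point division followed by truncation. It uses only the standard library, no Boost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Hypothetical container: a uniform grid whose buckets hold point indices.
struct Grid {
    double minX = 0, minY = 0;  // lower-left corner of the bounding box
    double cellSize = 1;        // edge length of one large cell
    int cols = 0, rows = 0;
    std::vector<std::vector<std::size_t>> cells;  // row-major buckets

    // Cell coordinates via division; truncation equals floor because
    // every stored point satisfies x >= minX and y >= minY.
    int col(const Point& p) const { return static_cast<int>((p.x - minX) / cellSize); }
    int row(const Point& p) const { return static_cast<int>((p.y - minY) / cellSize); }
    std::size_t index(const Point& p) const {
        return static_cast<std::size_t>(row(p)) * static_cast<std::size_t>(cols)
             + static_cast<std::size_t>(col(p));
    }
};

Grid buildGrid(const std::vector<Point>& pts, double cellSize) {
    assert(!pts.empty() && cellSize > 0.0);
    Grid g;
    g.cellSize = cellSize;
    g.minX = pts[0].x;
    g.minY = pts[0].y;
    double maxX = pts[0].x, maxY = pts[0].y;
    for (const Point& p : pts) {  // first pass: bounding box
        g.minX = std::min(g.minX, p.x);  maxX = std::max(maxX, p.x);
        g.minY = std::min(g.minY, p.y);  maxY = std::max(maxY, p.y);
    }
    g.cols = static_cast<int>((maxX - g.minX) / cellSize) + 1;
    g.rows = static_cast<int>((maxY - g.minY) / cellSize) + 1;
    g.cells.resize(static_cast<std::size_t>(g.cols) * static_cast<std::size_t>(g.rows));
    for (std::size_t i = 0; i < pts.size(); ++i)  // second pass: O(1) per point
        g.cells[g.index(pts[i])].push_back(i);
    return g;
}
```

Storing indices rather than copies of the points keeps the buckets small. For 10 million points, a flat array filled by a counting sort over cell indices would probably use less memory than a vector of vectors, but that is a refinement of the same idea.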

Now go through the points linearly again. To find the 10 nearest neighbors of a point, you only need to look at the points in its own large cell and in the adjacent large cells.

Since your points are fairly evenly scattered, you can do this in time proportional to the number of points in each (large) cell.
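
A minimal sketch of that second pass, under some assumptions: `Pt` and `kNearestAdjacent` are hypothetical names, the buckets are rebuilt inside the function so the example stands on its own, and only the point's own cell plus the 8 cells around it are searched. The result is therefore exact only when the k-th neighbor is no farther away than one cell edge. With cells that are too small, a point may get fewer than k neighbors or miss a closer point two cells away.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Pt { double x, y; };

// For every point, return the indices of up to k nearest other points,
// looking only at the 3x3 block of cells centered on the point's cell.
std::vector<std::vector<std::size_t>> kNearestAdjacent(
        const std::vector<Pt>& pts, double cellSize, std::size_t k) {
    assert(!pts.empty() && cellSize > 0.0);
    double minX = pts[0].x, minY = pts[0].y, maxX = minX, maxY = minY;
    for (const Pt& p : pts) {
        minX = std::min(minX, p.x);  maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);  maxY = std::max(maxY, p.y);
    }
    const int cols = static_cast<int>((maxX - minX) / cellSize) + 1;
    const int rows = static_cast<int>((maxY - minY) / cellSize) + 1;
    auto cx = [&](const Pt& p) { return static_cast<int>((p.x - minX) / cellSize); };
    auto cy = [&](const Pt& p) { return static_cast<int>((p.y - minY) / cellSize); };
    auto cell = [&](int x, int y) {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(cols)
             + static_cast<std::size_t>(x);
    };

    // Bucket point indices by cell (row-major).
    std::vector<std::vector<std::size_t>> cells(cell(cols - 1, rows - 1) + 1);
    for (std::size_t i = 0; i < pts.size(); ++i)
        cells[cell(cx(pts[i]), cy(pts[i]))].push_back(i);

    std::vector<std::vector<std::size_t>> result(pts.size());
    std::vector<std::pair<double, std::size_t>> cand;  // (squared distance, index)
    for (std::size_t i = 0; i < pts.size(); ++i) {
        cand.clear();
        const int px = cx(pts[i]), py = cy(pts[i]);
        for (int y = std::max(0, py - 1); y <= std::min(rows - 1, py + 1); ++y)
            for (int x = std::max(0, px - 1); x <= std::min(cols - 1, px + 1); ++x)
                for (std::size_t j : cells[cell(x, y)]) {
                    if (j == i) continue;  // a point is not its own neighbor
                    const double dx = pts[j].x - pts[i].x;
                    const double dy = pts[j].y - pts[i].y;
                    cand.emplace_back(dx * dx + dy * dy, j);
                }
        // Only the k smallest distances need to be ordered.
        const std::size_t m = std::min(k, cand.size());
        std::partial_sort(cand.begin(),
                          cand.begin() + static_cast<std::ptrdiff_t>(m), cand.end());
        for (std::size_t n = 0; n < m; ++n) result[i].push_back(cand[n].second);
    }
    return result;
}
```

Comparing squared distances avoids a `sqrt` per candidate, and `std::partial_sort` keeps the per-point cost tied to the number of candidates in the 3x3 block rather than to the whole data set.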

Here is an (ugly) pic describing the situation: [figure not available]

The cells must be large enough for the center cell and its adjacent cells to contain the closest 10 points, but small enough to speed up the computation. You could see it as a "hash function" where you'll find the closest points in the same bucket.
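
One rough way to pick a starting cell size, offered here as a heuristic assumption rather than as part of the answer: if n points are spread about evenly over a width × height bounding box, a 3x3 block of cells with edge s holds roughly 9·s²·n / (width·height) points. Setting that equal to `safety * k` and solving for s gives the hypothetical helper below. The `safety` factor is a made-up tuning knob.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Heuristic (an assumption, not from the answer): choose the cell edge so
// that a 3x3 block of cells holds about safety * k points when the points
// are uniformly distributed over the bounding box.
double suggestCellSize(double width, double height,
                       std::size_t n, std::size_t k, double safety = 3.0) {
    assert(width > 0.0 && height > 0.0 && n > 0 && k > 0 && safety >= 1.0);
    const double areaPerPoint = (width * height) / static_cast<double>(n);
    return std::sqrt(safety * static_cast<double>(k) * areaPerPoint / 9.0);
}
```

A larger `safety` makes it less likely that a true neighbor falls outside the 3x3 block, at the cost of more distance computations per point. The right value depends on how irregular the data really is, so it is worth measuring on a sample.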

(Note that strictly speaking it's not O(n), but by tweaking the size of the larger cells, you should get close enough. :-)