Find the index of the nearest point in numpy arrays of x and y coordinates

2023-09-10 23:28:08 Author: 姿态动人

I have two 2d numpy arrays: x_array contains positional information in the x-direction, y_array contains positions in the y-direction.

I then have a long list of x,y points.

For each point in the list, I need to find the array index of the location (specified in the arrays) which is closest to that point.

I have naively produced some code which works, based on this question: find nearest value in numpy array

import time
import numpy

def find_index_of_nearest_xy(y_array, x_array, y_point, x_point):
    distance = (y_array-y_point)**2 + (x_array-x_point)**2
    idy,idx = numpy.where(distance==distance.min())
    return idy[0],idx[0]

def do_all(y_array, x_array, points):
    store = []
    for i in range(points.shape[1]):
        store.append(find_index_of_nearest_xy(y_array,x_array,points[0,i],points[1,i]))
    return store


# Create some dummy data
y_array = numpy.random.random(10000).reshape(100,100)
x_array = numpy.random.random(10000).reshape(100,100)

points = numpy.random.random(10000).reshape(2,5000)

# Time how long it takes to run
start = time.time()
results = do_all(y_array, x_array, points)
end = time.time()
print('Completed in: ', end-start)

I'm doing this over a large dataset and would really like to speed it up a bit. Can anyone optimize this?

Thanks.

UPDATE: SOLUTION following suggestions by @silvado and @justin (below)

import scipy.spatial

# Shoe-horn existing data for entry into KDTree routines
combined_x_y_arrays = numpy.dstack([y_array.ravel(),x_array.ravel()])[0]
points_list = list(points.transpose())


def do_kdtree(combined_x_y_arrays,points):
    mytree = scipy.spatial.cKDTree(combined_x_y_arrays)
    dist, indexes = mytree.query(points)
    return indexes

start = time.time()
results2 = do_kdtree(combined_x_y_arrays,points_list)
end = time.time()
print('Completed in: ', end-start)

This code above sped up my code (searching for 5000 points in 100x100 matrices) by 100 times. Interestingly, using scipy.spatial.KDTree (instead of scipy.spatial.cKDTree) gave comparable timings to my naive solution, so it is definitely worth using the cKDTree version...
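One difference worth noting: the naive function returns (row, column) pairs, while the tree query returns indices into the flattened arrays. A minimal sketch of mapping them back, assuming the tree was built from `ravel()`-ed arrays as above (the seeded generator and the name `flat_indexes` are illustrative, not from the original post):

```python
import numpy
import scipy.spatial

# Dummy data with the same shapes as in the question
rng = numpy.random.default_rng(0)
y_array = rng.random((100, 100))
x_array = rng.random((100, 100))
points = rng.random((2, 5000))  # row 0 holds y values, row 1 holds x values

# One (y, x) row per grid location, in ravel() order
combined = numpy.column_stack([y_array.ravel(), x_array.ravel()])
tree = scipy.spatial.cKDTree(combined)

# flat_indexes refers to positions in the flattened arrays
dist, flat_indexes = tree.query(points.T)

# Convert flat positions back to (row, column) indices of the 2d arrays
idy, idx = numpy.unravel_index(flat_indexes, y_array.shape)
```

After this, `y_array[idy[i], idx[i]]` and `x_array[idy[i], idx[i]]` give the grid location nearest to the i-th point.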

Recommended answer

scipy.spatial also has a k-d tree implementation: scipy.spatial.KDTree.

The approach is generally to first use the point data to build up a k-d tree. The computational complexity of that is on the order of N log N, where N is the number of data points. Range queries and nearest neighbour searches can then be done with log N complexity. This is much more efficient than simply cycling through all points (complexity N).
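The build-once, query-many pattern described here might look like the sketch below. It uses `scipy.spatial.cKDTree` with its `query` and `query_ball_point` methods, and compares the result against a plain linear scan. The data sizes, the radius, and the variable names are arbitrary choices for illustration.

```python
import numpy
from scipy.spatial import cKDTree

rng = numpy.random.default_rng(42)
data = rng.random((1000, 2))    # N data points
queries = rng.random((50, 2))   # points to look up

# Build the tree once from the data points
tree = cKDTree(data)

# Nearest-neighbour search: one index into data per query point
kd_dist, kd_idx = tree.query(queries)

# Range query: indices of all data points within radius r of one point
radius = 0.05
neighbours = tree.query_ball_point(queries[0], r=radius)

# Linear scan for comparison: squared distance from every query to every data point
diff = queries[:, None, :] - data[None, :, :]
brute_idx = (diff ** 2).sum(axis=2).argmin(axis=1)
```

The linear scan touches all N data points for every query, which is the cost the tree avoids once it has been built.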

Thus, if you have repeated range or nearest neighbor queries, a k-d tree is highly recommended.