Labeling 3D points connected by a minimum Euclidean distance

2023-09-11 04:16:57 Author: 绕过山河错落

Currently, I am working on a project that is trying to group 3d points from a dataset by specifying connectivity as a minimum euclidean distance. My algorithm right now is simply a 3d adaptation of the naive flood fill.

size_t PointSegmenter::growRegion(size_t & seed, size_t segNumber) {
    size_t numPointsLabeled = 0;

    //alias for points to avoid retyping
    vector<Point3d> & points = _img.points;
    deque<size_t> ptQueue;
    ptQueue.push_back(seed);
    points[seed].setLabel(segNumber);
    while (!ptQueue.empty()) {
        size_t currentIdx = ptQueue.front();
        ptQueue.pop_front();
        points[currentIdx].setLabel(segNumber);
        numPointsLabeled++;
        vector<int> newPoints = _img.queryRadius(currentIdx, SEGMENT_MAX_DISTANCE, MATCH_ACCURACY);
        for (int i = 0; i < (int)newPoints.size(); i++) {
            int newIdx = newPoints[i];
            Point3d &newPoint = points[newIdx];
            if(!newPoint.labeled()) {
                newPoint.setLabel(segNumber);
                ptQueue.push_back(newIdx);
            }
        }
    }

    //NOTE to whoever wrote the other code, the compiler optimizes i++ 
    //to ++i in cases like these, so please don't change them just for speed :)
    for (size_t i = seed; i < points.size(); i++) {
        if(!points[i].labeled()) {
            //search for an unlabeled point to serve as the next seed.
            seed = i;
            return numPointsLabeled;
        }
    }
    return numPointsLabeled;
}

This code snippet is then run again for each new seed, and _img.queryRadius() is a fixed-radius search using the ANN library:

vector<int> Image::queryRadius(size_t index, double range, double epsilon) {
    int k = kdTree->annkFRSearch(dataPts[index], range*range, 0);
    ANNidxArray nnIdx = new ANNidx[k];
    kdTree->annkFRSearch(dataPts[index], range*range, k, nnIdx);
    vector<int> outPoints;
    outPoints.reserve(k);
    for(int i = 0; i < k; i++) {
        outPoints.push_back(nnIdx[i]);
    }
    delete[] nnIdx;
    return outPoints;
}

My problem with this code is that it runs way too slow for large datasets. If I'm not mistaken, this code does a kd-tree search for every single point; each search is O(log N), so one pass is O(N log N), but with each point being re-examined by every nearby neighbor the total cost balloons toward O(N^2 * log(N)).

In addition, if I remember right, deletions from kd-trees are relatively expensive; but not deleting points creates its own problem, in that each point can be searched hundreds of times, once by every neighbor close to it.

So my question is: is there a better way to do this? Especially one whose cost grows linearly with the dataset?

Thanks for any help you may be able to provide.

EDIT I have tried using a simple sorted list as dash-tom-bang suggested, but the result was even slower than what I was using before. I'm not sure if it was my implementation, or if it is just simply too slow to iterate through every point and check the Euclidean distance (even when using only the squared distance).

Are there any other ideas people may have? I'm honestly stumped right now.

Solution

I propose the following algorithm:

Compute the 3D Delaunay triangulation of your data points.

Remove all edges that are longer than your threshold distance; this is O(N) when combined with step 3.

Find the connected components of the resulting graph, which is O(N) in size; with a union-find structure this is done in O(N α(N)).

The bottleneck is step 1, which can be done in O(N²) or even O(N log N), according to this page: http://www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/sf_papers/lattuada_roberto/paper.html. However, it is definitely not a 100-line algorithm.