查找重叠的区间重叠时,是罕见的区间、罕见

2023-09-11 04:54:29 作者:劳资就是王道 @

我有一个庞大的数据库表中的 N 的整数区间(例如{1-5},{4-16},{6434-114343}),并需要找出这间隔相互重叠。有丰富的similar在SO 问题,但不同的是,我需要回来,每个间隔分别设置重叠的区间。

I have a huge database table with n integer intervals (for instance {1-5}, {4-16}, {6434-114343}) and need to find out which intervals overlap each other. There's a wealth of similar questions on SO, but the difference is I need returned, for each interval respectively, the set of overlapping intervals.

      ------------------ A -------------------
    ------ B -------               ----- D -----
          --------- C --------- 

在这个例子中,输出将 A:{B,C,D} B:{A,C} C:{A,B} D:{A}

最坏情况下,所有的间隔可以彼此重叠,产生的O号的输出(的 N 的 2 )。这只不过是天真的解决方案(比较每对间隔)更好。然而,在实践中,我知道我的几个区间将覆盖其它的间隔,而在这个时候,最多只有5等间隔。

Worst case, all intervals could overlap each other, producing an output of size O(n2). This is no better than the naïve solution (compare every pair of intervals). In practice however, I know few of my intervals will overlap other intervals, and when they do, only up to 5 other intervals.

鉴于这一信息,我应该怎么解决这个问题? (理想情况下,我想一个SQL查询解决方案,因为该数据是在数据库中,但我认为只是一个普通的算法解决方案是可能的。)

Given this info, how should I solve the problem? (Optimally, I would like a SQL query solution since the data is in the database, but I think only a regular algorithmic solution is possible.)

推荐答案

建立的区间开始和结束的一个排序序列,然后遍历它,每一个更新当前区间的名单时,报告任何新发现的交集。

Build a sorted sequence of interval starts and ends, then traverse it, every time updating list of current intervals, report any new found intersection.

事情是这样的:

std::vector<TBorder> borders;
for(auto i=intervals.begin();i!=intervals.end();++i)
{
    borders.push_back(TBorder(i.Start(),Start));
    borders.push_back(TBorder(i.End(),End));
}
std::sort(borders.begin(),borders.end());
std::set<int> currentIntervals;
for(auto b=borders.begin();b!=borders.end();++b)
{
    if(b.IsEnd())
        currentIntervals.erase(b.IntervalIndex());
    else
    {
        currentIntervals.insert(b.IntervalIndex());
        if(currentIntervals.size()>1)
            ReportIntersection(currentIntervals);
    }
}

通常这是为O(n * log n)的(假设数字交叉点是O(1))。

Generally it's O(n*log n) (assuming number of intersections is O(1) ).

但如果你已经有间隔由如排序开始,有可能排序可以在O(N)来完成(再次假设路口数为O(1))。

But if you already have intervals sorted by e.g. start, likely sorting can be done in O(n) (again assuming number of intersection is O(1)).