事件的两个列表近似匹配(有时间)近似、两个、事件、时间

2023-09-11 04:19:05 作者:画迟

我有一个黑盒子算法,分析时间序列和检测系列中的某些事件。它返回事件列表,每个都包含一个开始时间和结束时间。的事件不重叠。 我也有真实事件的列表,再次开始时间和结束时间为每个事件,而不是重叠的。

I have a black box algorithm that analyses a time series and "detects" certain events in the series. It returns a list of events, each containing a start time and end time. The events do not overlap. I also have a list of the "true" events, again with start time and end time for each event, not overlapping.

我要比较的两个列表和匹配检测和真实事件下降在一定时间内的公差(真阳性)的。复杂之处,该算法可以检测到是不是真的有(误判)事件或可能会错过了在那里的事件(假阴性)。

I want to compare the two lists and match detected and true events that fall within a certain time tolerance (True Positives). The complication is that the algorithm may detect events that are not really there (False Positives) or might miss events that were there (False Negatives).

什么是算法优化,从两个列表对事件并离开了正确的事件未配对?我是pretty的肯定,我不是第一个来解决这个问题,而且这样的方法存在,但我一直没能找到它,也许是因为我不知道正确的术语。

What is an algorithm that optimally pairs events from the two lists and leaves the proper events unpaired? I am pretty sure I am not the first one to tackle this problem and that such a method exists, but I haven't been able to find it, perhaps because I do not know the right terminology.

速度的要求: 列表将包含不超过几百项以上,且速度是不是一个主要因素。精度更重要。任何服用不到几秒钟,一个普通的计算机上就可以了。

Speed requirement: The lists will contain no more than a few hundred entries, and speed is not a major factor. Accuracy is more important. Anything taking less than a few seconds on an ordinary computer will be fine.

推荐答案

下面是一个二次时间的算法,给出了一个最大似然估计相对于下面的模型。让A1中...<上午是真正的间隔,让B1< ...<国阵在报告的时间间隔。量子(I,J)是对数似然艾变得BJ。量德尔(i)是对数似然艾被删除。量项(j)是对数似然该BJ被插入。让独立的假设无处不在!我要选择子,德尔和插件等等,对于每一个I<我和每一个J< J',我们有

Here's a quadratic-time algorithm that gives a maximum likelihood estimate with respect to the following model. Let A1 < ... < Am be the true intervals and let B1 < ... < Bn be the reported intervals. The quantity sub(i, j) is the log-likelihood that Ai becomes Bj. The quantity del(i) is the log-likelihood that Ai is deleted. The quantity ins(j) is the log-likelihood that Bj is inserted. Make independence assumptions everywhere! I'm going to choose sub, del, and ins so that, for every i < i' and every j < j', we have

sub(i, j') + sub(i', j) <= max {sub(i, j )       + sub(i', j')
                               ,del(i) + ins(j') + sub(i', j )
                               ,sub(i, j')       + del(i') + ins(j)
                               }.

这确保了间隔之间的最佳匹配noncrossing,因此,我们可以使用下面的莱文斯坦般的动态程序。

This ensures that the optimal matching between intervals is noncrossing and thus that we can use the following Levenshtein-like dynamic program.

动态程序psented作为memoized递归函数,评分(I,J),即计算匹配的最佳得分$ P $ A1,...,艾与B1,...,BJ。调用树的根是分数(M,N)。可以对其进行修改,以返回子(I,J)操作中的最优解的序列

The dynamic program is presented as a memoized recursive function, score(i, j), that computes the optimal score of matching A1, ..., Ai with B1, ..., Bj. The root of the call tree is score(m, n). It can be modified to return the sequence of sub(i, j) operations in the optimal solution.

score(i, j) | i == 0 && j == 0 =      0
            | i >  0 && j == 0 =      del(i)    + score(i - 1, 0    )
            | i == 0 && j >  0 =      ins(j)    + score(0    , j - 1)
            | i >  0 && j >  0 = max {sub(i, j) + score(i - 1, j - 1)
                                     ,del(i)    + score(i - 1, j    )
                                     ,ins(j)    + score(i    , j - 1)
                                     }

下面是子,德尔和插件的一些可能的定义。我不知道他们会得到什么好处;您可能希望通过常量或使用权力比2。其他繁殖它们的值如果爱= [S,T]和Bj = [U,V],然后定义

Here are some possible definitions for sub, del, and ins. I'm not sure if they will be any good; you may want to multiply their values by constants or use powers other than 2. If Ai = [s, t] and Bj = [u, v], then define

sub(i, j) = -(|u - s|^2 + |v - t|^2)
del(i) = -(t - s)^2
ins(j) = -(v - u)^2.

(对不住谁发表这样的事情在生物信息学文献几十年前的无疑是现存的学术。)

(Apologies to the undoubtedly extant academic who published something like this in the bioinformatics literature many decades ago.)

 
精彩推荐
图片推荐