滚动中位数算法的C中位数、算法

2023-09-10 22:40:13 作者:陌染

我目前工作的一个算法,实现滚动中值滤波器(类似于滚动平均筛选器)的C.从我的搜索文献,似乎有两个合理有效的方式来做到这一点。第一是值的初始窗口进行排序,则执行二进制搜索插入新值,并除去离开一个在每次迭代

I am currently working on an algorithm to implement a rolling median filter (analogous to a rolling mean filter) in C. From my search of the literature, there appear to be two reasonably efficient ways to do it. The first is to sort the initial window of values, then perform a binary search to insert the new value and remove the exiting one at each iteration.

(距离Hardle和Steiger的,1995年,JRSS-C,算法296)第二构建了一个双端堆结构,具有maxheap一端,另一方面一minheap,并在中间的中位数。这将产生一个线性时间算法,而不是一个是O(n log n)的。

The second (from Hardle and Steiger, 1995, JRSS-C, Algorithm 296) builds a double-ended heap structure, with a maxheap on one end, a minheap on the other, and the median in the middle. This yields a linear-time algorithm instead of one that is O(n log n).

下面是我的问题:实现前者是可行的,但我需要上百万的时间序列运行它,所以效率是相当重要的。后者被证明是非常难以实施。我发现code中的code的统计包R的Trunmed.c文件,但比较难以辨认。

Here is my problem: implementing the former is doable, but I need to run this on millions of time series, so efficiency matters a lot. The latter is proving very difficult to implement. I found code in the Trunmed.c file of the code for the stats package of R, but it is rather indecipherable.

有谁知道一个精心编写的C实现的线性时间滚动中位数算法?

Does anyone know of a well-written C implementation for the linear time rolling median algorithm?

编辑:链接到Trunmed.c code http://google.com/$c$csearch/p?hl=en&sa=N&cd=1&ct=rc#mYw3h%5FLb%5Fe0/R-2.2.0/src/library/stats/src/Trunmed.c

Link to Trunmed.c code http://google.com/codesearch/p?hl=en&sa=N&cd=1&ct=rc#mYw3h%5FLb%5Fe0/R-2.2.0/src/library/stats/src/Trunmed.c

推荐答案

我看过的r 的src /库/统计/ src目录/ Trunmed.c 几倍我想也是在一个独立的C ++类/ C子程序类似的东西。请注意,这实际上是两种实现方式于一体,请参阅的src /库/统计/人/ runmed.Rd (帮助文件的源代码),它说

I have looked at R's src/library/stats/src/Trunmed.c a few times as I wanted something similar too in a standalone C++ class / C subroutine. Note that this are actually two implementations in one, see src/library/stats/man/runmed.Rd (the source of the help file) which says

\details{
  Apart from the end values, the result \code{y = runmed(x, k)} simply has
  \code{y[j] = median(x[(j-k2):(j+k2)])} (k = 2*k2+1), computed very
  efficiently.

  The two algorithms are internally entirely different:
  \describe{
    \item{"Turlach"}{is the Härdle-Steiger
      algorithm (see Ref.) as implemented by Berwin Turlach.
      A tree algorithm is used, ensuring performance \eqn{O(n \log
        k)}{O(n * log(k))} where \code{n <- length(x)} which is
      asymptotically optimal.}
    \item{"Stuetzle"}{is the (older) Stuetzle-Friedman implementation
      which makes use of median \emph{updating} when one observation
      enters and one leaves the smoothing window.  While this performs as
      \eqn{O(n \times k)}{O(n * k)} which is slower asymptotically, it is
      considerably faster for small \eqn{k} or \eqn{n}.}
  }
}

这将是很高兴看到这种重新用在更独立的方式。你在义务?我可以帮助一些中的R位。

It would be nice to see this re-used in a more standalone fashion. Are you volunteering? I can help with some of the R bits.

编辑1 的:除了链接Trunmed.c以上的旧版本,这里是目前SVN副本

Edit 1: Besides the link to the older version of Trunmed.c above, here are current SVN copies of

Srunmed。 ç (为Stuetzle版本) Trunmed。 ç (为Turlach版本) runmed。 - [R 为R函数调用这些

编辑2 的:瑞安Tibshirani对的快速中值分档这可能是一个合适的起点,一个窗口的做法。

Edit 2: Ryan Tibshirani has some C and Fortran code on fast median binning which may be a suitable starting point for a windowed approach.