数学快速的2D分级算法算法、快速、数学

2023-09-11 00:00:09 作者：爷那叫范儿

我有一些麻烦，发展数学中的适当快速分选算法。我有一个大的（〜100,000元），数据的形式集 T = {{X1，Y1，Z1}，{X2，Y2，Z2}，....} 我想斌成大约100×100箱一个二维数组，同斌值被给出落入每个仓的Z值的总和。

I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large (~100k elements) data set of the form T={{x1,y1,z1},{x2,y2,z2},....} and I want to bin it into a 2D array of around 100x100 bins, with the bin value being given by the sum of the Z values that fall into each bin.

目前，我通过表中的每个元素进行迭代，用选择挑出它应该有bin是基于箱边界的名单，并增加了Z值值占该仓的列表。最后我总映射到垃圾箱的清单，总结它们的内容（我这样做是因为我有时会想去做其他的事情，比如最大化）。

Currently I am iterating through each element of the table, using Select to pick out which bin it is supposed to be in based on lists of bin boundaries, and adding the z value to a list of values occupying that bin. At the end I map Total onto the list of bins, summing their contents (I do this because I sometimes want to do other things, like maximize).

我用收集等这些功能做到这一点，但上面的方法是快的离谱都试过，虽然也许我使用收集不佳。反正它仍然需要几分钟的时间通过我的方法来进行排序，我觉得数学可以做的更好。有没有人有一个很好的有效的算法方便？

I have tried using Gather and other such functions to do this but the above method was ridiculously faster, though perhaps I am using Gather poorly. Anyway It still takes a few minutes to do the sorting by my method and I feel like Mathematica can do better. Does anyone have a nice efficient algorithm handy?

推荐答案

下面是一个基于索博尔奇的帖子的方法也差不多约幅度快一个数量级。

Here is a method based on Szabolcs's post that is about about an order of magnitude faster.

data = RandomReal[5, {500000, 3}];
(*500k values*)
zvalues = data[[All, 3]];

epsilon = 1*^-10;(*prevent 101 index*)
(*rescale and round (x,y) coordinates to index pairs in the 1..100 range*)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];

res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]}, 
    SparseArray[
     gb[[All, 1, 1]] -> 
      Total[gb[[All, All, 2]], {2}]]]; // AbsoluteTiming

给出关于{2.012217，空}

Gives about {2.012217, Null}

AbsoluteTiming[
 System`SetSystemOptions[ 
  "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}];
 res3 = SparseArray[indexes -> zvalues];
 System`SetSystemOptions[ 
  "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];
 ]

给出关于{0.195228，空}

Gives about {0.195228, Null}