Mathematica快速2D分箱算法

问题内容：

在Mathematica中开发适当的快速装箱算法时遇到了一些麻烦。我有一个大的（〜100k个元素）数据集，形式为T =
{{x1，y1，z1}，{x2，y2，z2}，....}，我想将其分类为2D数组100x100个bin，bin值由落入每个bin的Z值之和得出。

目前，我正在遍历表中的每个元素，使用“选择”基于bin边界列表挑选出它应该在哪个bin中，然后将z值添加到占用该bin的值列表中。最后，我将Total映射到垃圾箱列表，对垃圾箱的内容求和（之所以这样做，是因为有时我想做其他事情，例如最大化）。

我曾尝试使用Gather和其他此类函数来执行此操作，但是上述方法可笑得更快，尽管也许我使用Gather的效率很差。无论如何，按照我的方法进行排序仍然需要几分钟，我觉得Mathematica可以做得更好。有没有人有一个很好的高效算法方便吗？

问题答案：

这是一种基于Szabolcs帖子的方法，速度大约快一个数量级。

data = RandomReal[5, {500000, 3}];
(*500k values*)
zvalues = data[[All, 3]];

epsilon = 1*^-10;(*prevent 101 index*)
(*rescale and round (x,y) coordinates to index pairs in the 1..100 range*)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];

res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]}, 
    SparseArray[
     gb[[All, 1, 1]] -> 
      Total[gb[[All, All, 2]], {2}]]]; // AbsoluteTiming

给出{2.012217，Null}

AbsoluteTiming[
 System`SetSystemOptions[ 
  "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}];
 res3 = SparseArray[indexes -> zvalues];
 System`SetSystemOptions[ 
  "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];
 ]

给出{0.195228，空}

res3 == res2
True

“ TreatRepeatedEntries”-> 1将重复的位置加起来。

Mathematica快速2D分箱算法

微信关注