Using a smoother with the L-method to determine the number of k-means clusters

2023-09-11 03:01:00 Author: 书生倦气

Has anyone tried applying a smoother to the evaluation metric before applying the L-method to determine the number of k-means clusters in a dataset? If so, did it improve the results, or allow fewer k-means trials and hence a much greater increase in speed? Which smoothing algorithm or method did you use?

The "L-method" is described in: Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms, Salvador & Chan.

The method calculates the evaluation metric for a range of trial cluster counts. Then, to find the knee (which occurs at the optimum number of clusters), two lines are fitted using linear regression. A simple iterative process is applied to improve the knee fit; it reuses the existing evaluation metric calculations and does not require any re-runs of k-means.
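The two-line fit described above can be sketched roughly as follows. This is a minimal illustration, not the paper's reference implementation: the function names `fit_line_sse` and `l_method_knee` are made up for this example, the error weighting (each side's RMSE weighted by its share of the points) is my reading of the approach, and the iterative refinement step is left out.

```python
def fit_line_sse(xs, ys):
    """Fit a least-squares line to (xs, ys) and return the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def l_method_knee(ks, metric):
    """Return the cluster count at the knee of the metric-vs-k curve.

    ks     -- trial cluster counts, in increasing order
    metric -- evaluation metric value for each entry of ks
    """
    n = len(ks)
    if n < 4:
        raise ValueError("need at least 4 points to fit two lines")
    best_k, best_err = None, float("inf")
    # c is the number of points given to the left line; each line needs >= 2 points.
    for c in range(2, n - 1):
        left_sse = fit_line_sse(ks[:c], metric[:c])
        right_sse = fit_line_sse(ks[c:], metric[c:])
        # Assumed weighting: RMSE of each side, weighted by its share of the points.
        err = (c / n) * (left_sse / c) ** 0.5 \
            + ((n - c) / n) * (right_sse / (n - c)) ** 0.5
        if err < best_err:
            best_k, best_err = ks[c - 1], err
    return best_k


if __name__ == "__main__":
    ks = list(range(1, 11))
    metric = [10, 7, 4, 1, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]  # made-up values
    print(l_method_knee(ks, metric))
```

Because the search only reads the stored metric values, a smoother could be applied to `metric` before calling `l_method_knee` without re-running k-means.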

For the evaluation metric, I am using the reciprocal of a simplified version of the Dunn index. It is simplified for speed (basically, my diameter and inter-cluster calculations are simplified). The reciprocal makes the index work in the right direction (i.e. lower is generally better).
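The exact simplifications are not spelled out above, so the following is only one plausible sketch of a reciprocal Dunn-style index. Both shortcuts are assumptions made for this example: the cluster diameter is approximated as twice the largest centroid-to-member distance, and the inter-cluster distance as the distance between centroids. The name `inverse_dunn` is hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean of a non-empty list of equal-length point tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def inverse_dunn(clusters):
    """Reciprocal of a simplified Dunn index; lower is better.

    clusters -- list of clusters, each a non-empty list of point tuples
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Assumed shortcut: diameter ~ 2 * largest centroid-to-member distance.
    max_diameter = max(
        2 * max(math.dist(center, p) for p in cluster)
        for center, cluster in zip(centers, clusters)
    )
    # Assumed shortcut: separation ~ distance between cluster centroids.
    min_separation = min(math.dist(a, b) for a, b in combinations(centers, 2))
    if min_separation == 0:
        return float("inf")  # coincident centroids: worst possible score
    return max_diameter / min_separation
```

Both shortcuts avoid the all-pairs point distances of the full Dunn index, which is where most of the cost usually lies.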

K-means is a stochastic algorithm, so it is typically run multiple times and the best fit chosen. This works pretty well, but when you do this for 1..N clusters the time quickly adds up, so it is in my interest to keep the number of runs in check. Overall processing time may determine whether my implementation is practical; I may ditch this functionality if I cannot speed it up.

Recommended answer

I asked a similar question here on SO in the past. My question was about coming up with a consistent way of finding the knee of the L-shape you described. The curves in question represented the trade-off between complexity and a fit measure of the model.

The best solution was to find the point with the maximum distance d, as shown in the figure (not reproduced here).
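Since the figure is missing, the sketch below assumes that d is the perpendicular distance from each curve point to the straight line joining the first and last points of the curve. That is a common way to locate a knee, but it is an assumption here, and `knee_by_max_distance` is a name invented for this example.

```python
import math


def knee_by_max_distance(xs, ys):
    """Return the index of the point farthest from the chord joining the end points."""
    x0, y0 = xs[0], ys[0]
    x1, y1 = xs[-1], ys[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0:
        raise ValueError("first and last points coincide")

    def distance(x, y):
        # Perpendicular distance from (x, y) to the line through the end points.
        return abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / chord

    return max(range(len(xs)), key=lambda i: distance(xs[i], ys[i]))


if __name__ == "__main__":
    ks = list(range(1, 11))
    metric = [10, 7, 4, 1, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]  # made-up values
    print(ks[knee_by_max_distance(ks, metric)])
```

The result depends on the relative scale of the two axes, so it may be worth normalising both to a common range first.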

Note: I haven't read the paper you linked to yet.

 