K-means clustering of more than 5 million vectors

2023-09-11 02:28:01 Author: 吔許，伱卟属於硪

I have hit a real problem. I need to run k-means clustering on 5 million vectors, each with about 32 columns. I tried Mahout, but it requires Linux and I am on Windows; I am not allowed to use a Linux OS or any kind of emulator.

Can anyone suggest a k-means clustering algorithm that scales up to 5 million vectors and converges quickly?

I have tested a few, but they do not scale: they are slow and take forever to complete.

Thanks

Recommended answer

OK, so for anyone who wants to cluster large-scale datasets: the only way I found to do it was Mahout, and it requires a Linux platform. So I had to install VirtualBox, put Ubuntu on it, and run Mahout there. Setting up Mahout is a lengthy procedure; the two links I used are as follows:

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
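
For a rough sense of scale: 5 million vectors × 32 columns × 8 bytes per double is about 1.28 GB, which may fit in memory on a single machine. Under that assumption, one commonly used alternative to a cluster setup is mini-batch k-means, which updates the centers from small random samples instead of scanning all points on every iteration. The sketch below is not what Mahout runs and is not the method used in this answer; it is a minimal pure-Python illustration, and the function names and parameters (`mini_batch_kmeans`, `batch_size`, `iterations`, `seed`) are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, batch_size=100, iterations=200, seed=0):
    """Mini-batch k-means sketch (hypothetical helper, not a library API).

    points: list of equal-length lists of floats.
    Returns a list of k centers.
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far

    for _ in range(iterations):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, using the centers as they stand
        # at the start of the batch.
        assignments = [nearest(p, centers) for p in batch]
        # Then move each center toward its assigned points with a
        # per-center step size of 1 / (points seen), i.e. a running mean.
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers
```

Each iteration costs `batch_size × k` distance computations regardless of the dataset size, which is the reason this variant tends to scale better than a full pass per iteration. Pure Python would still be slow at 5 million × 32; in practice the same loop would likely need a vectorized numeric library.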
