Clustering algorithm where documents can belong to multiple clusters

2023-09-11 05:47:02 Author: 拉钩之后为什么要上吊

I'm looking for a clustering algorithm that allows each document to belong to more than one cluster (e.g., to at least K clusters).

All the clustering algorithms I have studied create a partition of the dataset, which means that every document ends up in exactly one cluster.

Any ideas?

Recommended answer

Use a soft, probabilistic clustering algorithm like a Gaussian Mixture Model. For each instance, this gives you the probability of belonging to each of the clusters. To allow multiple membership, just pick the top N clusters, or every cluster above a certain probability threshold, or use some other scheme.
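
As a rough sketch of the selection step only: assume the mixture model has already been fitted and you have a matrix of membership probabilities, one row per document and one column per cluster. The function name `multi_membership` and the numbers in `posteriors` are made up for illustration; they are not from any library or real dataset.

```python
from typing import List, Optional, Sequence


def multi_membership(
    probs: Sequence[Sequence[float]],
    top_n: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[List[int]]:
    """Turn soft cluster probabilities into multi-cluster assignments.

    probs[i][k] is the probability that document i belongs to cluster k.
    Pass exactly one of:
      top_n     -- keep the top_n most probable clusters per document
      threshold -- keep every cluster with probability >= threshold
    Returns, per document, a sorted list of cluster indices.
    """
    if (top_n is None) == (threshold is None):
        raise ValueError("pass exactly one of top_n or threshold")

    assignments = []
    for row in probs:
        # Cluster indices ordered from most to least probable.
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        if top_n is not None:
            chosen = ranked[:top_n]
        else:
            chosen = [k for k in ranked if row[k] >= threshold]
        assignments.append(sorted(chosen))
    return assignments


# Hypothetical posteriors for three documents over four clusters.
posteriors = [
    [0.70, 0.20, 0.05, 0.05],
    [0.40, 0.35, 0.20, 0.05],
    [0.05, 0.15, 0.10, 0.70],
]

top2 = multi_membership(posteriors, top_n=2)
above = multi_membership(posteriors, threshold=0.2)
print(top2)
print(above)
```

With scikit-learn, a matrix of this shape can be obtained from `GaussianMixture(n_components=...).fit(X).predict_proba(X)`. Note that a threshold can leave a document with no cluster at all if it is set too high, whereas top-N always assigns exactly N, so which rule fits depends on whether every document must land somewhere.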