I'm looking for a clustering algorithm that allows each document to belong to more than one cluster (e.g. to at least K clusters).
All the clustering algorithms I have studied create a partition of the dataset, which means that every document ends up in exactly one cluster.
Any ideas?
Use a soft, probabilistic clustering algorithm such as a Gaussian Mixture Model. It gives you, for each instance, the probability of belonging to each possible cluster. To allow multiple membership, just pick the top-N clusters, or every cluster above a certain probability threshold, or use some other scheme.
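To make that concrete, here is a minimal sketch using scikit-learn's GaussianMixture. The toy 2-D points stand in for document vectors (for real documents you would first need some dense numeric representation, which is not covered here), and the helper name soft_memberships, the number of clusters, and the 0.2 threshold are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def soft_memberships(X, n_clusters, top_n=None, threshold=None, seed=0):
    """Fit a GMM and return (probs, memberships).

    memberships[i] lists the clusters assigned to row i, most probable first,
    chosen either as the top_n clusters or as all clusters with
    probability >= threshold. (Hypothetical helper, not a library function.)
    """
    if top_n is None and threshold is None:
        raise ValueError("give either top_n or threshold")
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(X)
    probs = gmm.predict_proba(X)  # shape (n_samples, n_clusters); rows sum to 1
    memberships = []
    for row in probs:
        order = np.argsort(row)[::-1]  # cluster indices by decreasing probability
        if top_n is not None:
            chosen = order[:top_n]
        else:
            chosen = [k for k in order if row[k] >= threshold]
        memberships.append([int(k) for k in chosen])
    return probs, memberships


# Toy data: three overlapping 2-D blobs standing in for document vectors.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in centers])

probs, top2 = soft_memberships(X, n_clusters=3, top_n=2)      # exactly 2 clusters each
_, above = soft_memberships(X, n_clusters=3, threshold=0.2)   # 1 to 3 clusters each
```

With top_n every document gets the same number of clusters, which is the closest match to "at least K". With a threshold the number varies per document, and a threshold above 1/n_clusters can leave some documents with no cluster at all, so you may want to fall back to the single most probable cluster in that case.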