Cluster words based on their char set

2023-09-11 05:41:32 Author: 我们配吗

Say there is a set of words and I would like to cluster them based on their char bag (multiset). For example:

{tea, eat, abba, aabb, hello}

will be clustered into

{{tea, eat}, {abba, aabb}, {hello}}.

abba and aabb are clustered together because they have the same char bag, i.e. two a's and two b's.

To make it efficient, a naive way I can think of is to convert each word into a char-count series. For example, abba and aabb will both be converted to a2b2, and tea/eat will be converted to a1e1t1. That way I can build a dictionary and group the words that share the same key.
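
The naive approach described above could be sketched roughly as follows. This is only an illustration in Python (the question does not name a language), and the helper names `char_bag_key` and `cluster_by_char_bag` are hypothetical. It also assumes the words contain no digit characters, since digits would make a key such as a2b2 ambiguous.

```python
from collections import Counter, defaultdict


def char_bag_key(word):
    """Build a key like 'a2b2' from the character counts of a word."""
    counts = Counter(word)
    # Sorting by character makes the key independent of letter order.
    return "".join(f"{ch}{n}" for ch, n in sorted(counts.items()))


def cluster_by_char_bag(words):
    """Group words that share the same char-bag key."""
    groups = defaultdict(list)
    for word in words:
        groups[char_bag_key(word)].append(word)
    return list(groups.values())


clusters = cluster_by_char_bag(["tea", "eat", "abba", "aabb", "hello"])
print(clusters)
# [['tea', 'eat'], ['abba', 'aabb'], ['hello']]
```

Note that the sort happens over the distinct characters of each word, not over the whole word, which is the sorting cost raised in the first issue below.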

Two issues here: first, I have to sort the chars to build the key; second, the string key looks awkward, and its performance is not as good as that of char/int keys.

Is there a more efficient way to solve the problem?

Recommended answer