查找给出一个集合的最长的单词单词、最长

2023-09-11 02:52:26 作者:冲刺 最后的希望

这是一个谷歌的面试问题,我在网上找到的使用HashMap的或类似的数据结构最答案。我试图找到利用特里如果可能的解决方案。任何人都可以给我一些提示?

It is a google interview question and I find most answers online using HashMap or similar data structure. I am trying to find a solution using Trie if possible. Anybody could give me some hints?

下面的问题: 给出一个词典,在每行包含一个字的文件的形式。例如,

Here is the question: You are given a dictionary, in the form of a file that contains one word per line. E.g.,

abacus 
deltoid 
gaff 
giraffe 
microphone 
reef 
qar 

您也给出字母的集合。例如,

You are also given a collection of letters. E.g.,

{a, e, f, f, g, i, r, q}. 

的任务是找到最长的字,它可与收集拼写为字典 字母。例如,对于该示例值正确答案以上是长颈鹿。 (注意 礁是不是一个可能的答案,因为该组信件中只包含一个E。)

The task is to find the longest word in the dictionary that can be spelled with the collection of letters. For example, the correct answer for the example values above is "giraffe". (Note that "reef" is not a possible answer, because the set of letters contains only one "e".)

Java的实现将是preferred。

Java implementation would be preferred.

推荐答案

没有Java的code。你可以明白这一点了吧。

No Java code. You can figure that out for yourself.

假设我们需要做到这一点很多次,这里就是我想要做的:

Assuming that we need to do this lots of times, here's what I'd do:

我通过在字典中的每个字包括26位,其中位[信]设置了打造签名开始当且仅当该字包含一个(或多个)信的实例。这些签名可以连接codeD作为一个Java INT

然后创建一个映射签名来一大堆单词与签名的映射。

Then create a mapping that maps signatures to lists of words with that signature.

要使用precomputed地图进行搜索:

To do a search using the precomputed map:

创建签名的一套,你要查找的单词的字母。

Create the signature for the set of letters you want to find the words for.

然后遍历映射的键寻找键,其中(按键及(〜签名)== 0)。这就给了你候选条件不包含任何信件,是不是在为所需字母集的短名单。

Then iterate over the keys of the mapping looking for keys where (key & (~signature) == 0). That gives you a short list of "possibles" that don't contain any letter that is not in the required letter set.

迭代短名单寻找与右边的数字的话的每一个所需的信件,记录时间最长的打击。

Iterate over the short list looking for words with the right number of each of the required letters, recording the longest hit.

注:

虽然主要的搜索大致 O(N)字典中的单词数,测试是非常便宜。

While the primary search is roughly O(N) on the number of words in the dictionary, the test is extremely cheap.

此方法具有需要相对小的存储器内数据结构,即(最有可能的)有良好的局部性的优点。那可能是有利于更快的搜索。

This approach has the advantage of requiring a relatively small in-memory data structure, that (most likely) has good locality. That is likely to be conducive to faster searches.

下面有一个想法,对于加快 O(N)以上搜索步骤。

为什么你读了很多书,却感觉没什么收获

Here's an idea for speeding up the O(N) search step above.

与签名图上面开始,创建(precompute)衍生的地图为的所有的话就的包含特定的双字母;即,一个含有AB,为AC,BC,文字......和YZ。然后,如果你正在寻找含(说)P和Q的话,你可以扫描PQ衍生物地图。这将减少 O(N)一步大约 26 ^ 2 ...在更多的内存为代价额外的地图。

Starting with the signature map above, create (precompute) derivative maps for all words that do contain specific pairs letters; i.e. one for words containing AB, for AC, BC, ... and for YZ. Then if you are looking for words containing (say) P and Q, you can just scan the PQ derivative map. That will reduce O(N) step by roughly 26^2 ... at the cost of more memory for the extra maps.

这可以扩展到3个或多个字母,但不足之处是在内存使用爆炸。

That can be extended to 3 or more letters, but the downside is the explosion in memory usage.

另外一个潜在的调整是(在某种程度上)偏差的首字母对朝信件/是很少出现对的选择。但是,这增加了前期的开销可能比(平均)大于节省您从搜索更短的清单得到。

Another potential tweak is to (somehow) bias the selection of the initial letter pair towards letters/pairs that occur infrequently. But that adds an up-front overhead which could be greater than the (average) saving you get from searching a shorter list.