Algorithm for autocomplete? (algorithm, autocomplete)

2023-09-10 22:46:28 Author: 撕心不用裂肺

I am referring to the algorithm used to offer query suggestions when a user types a search term in Google.

I am mainly interested in how Google's algorithm is able to show:

1. The most important results (the most likely queries rather than anything that matches)
2. Substring matches
3. Fuzzy matches

I know you could use a trie or a generalized trie to find matches, but that wouldn't meet the above requirements...

A similar question was asked earlier here

Recommended answer

For (heh) awesome fuzzy/partial string matching algorithms, check out Damn Cool Algorithms:

http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees
http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata
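As a rough illustration of the first link's idea, here is a minimal BK-tree sketch in Python. It is not taken from the linked posts: the names `levenshtein`, `BKTree`, `add`, and `search` are made up for this example, and the edit distance is the plain dynamic-programming version rather than a Levenshtein automaton. Each child edge is labeled with its distance to the parent word, so a search with tolerance `max_dist` only needs to follow edges within `max_dist` of the query's distance to the current node.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


class BKTree:
    """Hypothetical minimal BK-tree; a node is (word, {distance: child})."""

    def __init__(self) -> None:
        self.root: Optional[Tuple[str, Dict[int, tuple]]] = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            distance = levenshtein(word, node[0])
            if distance == 0:
                return  # word is already stored
            child = node[1].get(distance)
            if child is None:
                node[1][distance] = (word, {})
                return
            node = child

    def search(self, query: str, max_dist: int) -> List[Tuple[int, str]]:
        """Return (distance, word) pairs within max_dist, closest first."""
        results: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            distance = levenshtein(query, word)
            if distance <= max_dist:
                results.append((distance, word))
            # Triangle inequality: only these edges can lead to a match.
            for edge, child in children.items():
                if distance - max_dist <= edge <= distance + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["apple", "apply", "ape", "maple", "banana"]:
    tree.add(w)
print(tree.search("aple", 1))
```

The pruning step is what avoids comparing the query against every stored word; how much it saves depends on the vocabulary and the tolerance chosen.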

These don't replace tries, but rather prevent brute-force lookups in tries - which is still a huge win. Next, you probably want a way to bound the size of the trie:

Keep a trie of the recent/top N words used globally; for each user, keep a trie of the recent/top N words for that user.
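One way to read the bounded-trie idea is sketched below, under assumptions of my own: query frequencies are available as a `Counter`, only the `top_n` most frequent queries are inserted, and completions are ranked by frequency so that the most likely queries come first. `TrieNode`, `build_trie`, and `suggest` are hypothetical names, and the sample counts are invented.

```python
import heapq
from collections import Counter
from typing import Dict, List


class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.count = 0  # frequency of the query ending at this node, 0 if none


def build_trie(query_counts: Counter, top_n: int) -> TrieNode:
    """Insert only the top_n most frequent queries, which bounds trie size."""
    root = TrieNode()
    for query, count in query_counts.most_common(top_n):
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count = count
    return root


def suggest(root: TrieNode, prefix: str, k: int = 5) -> List[str]:
    """Return up to k stored queries starting with prefix, most frequent first."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    found = []
    stack = [(node, prefix)]
    while stack:  # walk the whole subtree below the prefix
        current, text = stack.pop()
        if current.count:
            found.append((current.count, text))
        for ch, child in current.children.items():
            stack.append((child, text + ch))
    best = heapq.nsmallest(k, found, key=lambda item: (-item[0], item[1]))
    return [text for _, text in best]


# Invented sample data standing in for a global query log.
log = Counter({"apple": 50, "banana": 40, "apply": 30,
               "application": 20, "apricot": 5})
global_trie = build_trie(log, top_n=4)  # "apricot" falls outside the top 4
print(suggest(global_trie, "app", k=2))
```

A per-user trie could be built the same way from that user's own counts and consulted before the global one.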

Finally, you want to prevent lookups whenever possible...

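Cache lookup results: if the user clicks through on any search results, you can serve those very quickly and then asynchronously fetch the full partial/fuzzy lookup.
Precompute lookup results: if the user has typed "appl", they are likely to continue with "apple" or "apply".
Prefetch data: for instance, a web app can send a smaller set of results to the browser, small enough to make brute-force searching in JS viable.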
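The caching and precomputing ideas might look roughly like the following sketch. Everything here is an assumption for illustration: `QUERY_COUNTS` is invented data, `precompute_prefixes` builds a table from short prefixes to their most frequent completions, and `slow_lookup` is a stand-in for an expensive partial/fuzzy search whose results are memoized with `functools.lru_cache`. The asynchronous refresh and the browser-side prefetch are not shown.

```python
from collections import Counter, defaultdict
from functools import lru_cache
from typing import Dict, List, Tuple

# Invented sample data standing in for a query log.
QUERY_COUNTS = Counter({"apple": 50, "apply": 30,
                        "application": 20, "pineapple": 10})


def precompute_prefixes(query_counts: Counter, k: int = 3,
                        max_prefix_len: int = 4) -> Dict[str, List[str]]:
    """Map every prefix up to max_prefix_len to its k most frequent queries."""
    table: Dict[str, List[str]] = defaultdict(list)
    ranked = sorted(query_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for query, _ in ranked:  # most frequent first, so buckets stay ranked
        for end in range(1, min(len(query), max_prefix_len) + 1):
            bucket = table[query[:end]]
            if len(bucket) < k:
                bucket.append(query)
    return dict(table)


PRECOMPUTED = precompute_prefixes(QUERY_COUNTS)


@lru_cache(maxsize=10_000)
def slow_lookup(text: str) -> Tuple[str, ...]:
    """Stand-in for the expensive lookup: a brute-force substring scan."""
    matches = [q for q in QUERY_COUNTS if text in q]
    matches.sort(key=lambda q: (-QUERY_COUNTS[q], q))
    return tuple(matches)


def suggest(text: str) -> List[str]:
    """Serve precomputed results when available, else the cached slow path."""
    hit = PRECOMPUTED.get(text)
    if hit is not None:
        return list(hit)
    return list(slow_lookup(text))


print(suggest("appl"))  # answered from the precomputed table
print(suggest("pple"))  # falls back to the cached substring scan
```

The table trades memory for latency, so `k` and `max_prefix_len` would need tuning against the real query distribution.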