Fast approximate string matching algorithm
Tags: approximate, string, algorithm, fast

2023-09-11 04:19:37 Author: Mawkish 自作多情

Given a source string s and n strings of the same length as s, I need a fast algorithm that returns those strings that differ from s in at most k positions, comparing characters position by position.

What is a fast algorithm for this?

PS: I should point out that this is an academic question. I want to find the most efficient algorithm, if possible.

I also left out one very important piece of information: the n equal-length strings form a dictionary, against which many source strings s will be queried. It seems there should be some preprocessing step that makes this more efficient.

Recommended answer

Sedgewick writes in his book "Algorithms" that a ternary search tree allows one "to locate all words within a given Hamming distance of a query word". Article in Dr. Dobb's
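
To make the idea concrete, here is a minimal Python sketch of a ternary search tree with a Hamming-distance search. It is an illustration under stated assumptions, not the code from the book or the article: the names `Node`, `insert`, `build` and `near_search` are made up for this example, and all dictionary words are assumed to be non-empty and of the same length as the query, as in the question. The tree is built once from the dictionary (the preprocessing step) and then reused for every query.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One ternary search tree node (hypothetical layout for this sketch)."""
    ch: str
    lo: Optional["Node"] = None   # words whose character here is smaller than ch
    eq: Optional["Node"] = None   # next character position of the word
    hi: Optional["Node"] = None   # words whose character here is larger than ch
    is_end: bool = False          # True if a dictionary word ends at this node


def insert(node: Optional[Node], word: str, i: int = 0) -> Node:
    """Insert word[i:] below node and return the (possibly new) subtree root."""
    ch = word[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, word, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = insert(node.eq, word, i + 1)
    else:
        node.is_end = True
    return node


def build(words: List[str]) -> Optional[Node]:
    """Preprocessing step: build the tree once from the dictionary."""
    root = None
    for w in words:
        if w:  # this sketch skips empty strings
            root = insert(root, w)
    return root


def near_search(root: Optional[Node], query: str, k: int) -> List[str]:
    """Return all stored words that differ from query in at most k positions."""
    out: List[str] = []

    def visit(node: Optional[Node], i: int, budget: int, prefix: str) -> None:
        if node is None or budget < 0:
            return
        # lo/hi hold alternative characters for the same position i.
        # With no mismatches left, only the side that could still contain
        # query[i] is worth visiting.
        if budget > 0 or query[i] < node.ch:
            visit(node.lo, i, budget, prefix)
        cost = 0 if node.ch == query[i] else 1
        if budget - cost >= 0:
            if i + 1 == len(query):
                if node.is_end:
                    out.append(prefix + node.ch)
            else:
                visit(node.eq, i + 1, budget - cost, prefix + node.ch)
        if budget > 0 or query[i] > node.ch:
            visit(node.hi, i, budget, prefix)

    if query:
        visit(root, 0, k, "")
    return out


if __name__ == "__main__":
    tree = build(["cat", "cot", "dog", "cut", "bat", "dig"])
    print(near_search(tree, "cat", 1))
```

The search abandons a branch as soon as its mismatch count exceeds k, so many queries can skip most of the dictionary instead of comparing against all n strings. The sketch is recursive and does not balance the tree, so inserting a large dictionary in sorted order may produce long `lo`/`hi` chains; inserting the words in random order is a common way to avoid that.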