Levenshtein algorithm - fail fast if the edit distance is greater than a given threshold

2023-09-11 04:46:53 Author: 悲伤的季节

For the Levenshtein algorithm I have found this implementation for Delphi.

I need a version that stops as soon as a maximum distance is hit and returns the distance found so far.

My first idea is to check the current result after every iteration:

for i := 1 to n do
    for j := 1 to m do
    begin
      d[i, j] := Min(Min(d[i-1, j]+1, d[i,j-1]+1), d[i-1,j-1]+Integer(s[i] <> t[j]));

      // check the value just computed (d[n, m] is not filled in yet at this point)
      Result := d[i, j];
      if Result > max then
        Exit;
    end;

Solution

I gather that what you want is to find the Levenshtein distance if it is below MAX, right?

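If so, reaching a value larger than MAX is not enough, since it only means that some path is longer than that, not that no shorter path exists. To make sure no path shorter than MAX can be found, one has to monitor the minimum possible length of a path up to the current point, i.e. the minimum over the row of the distance table currently being filled (all d[i, j] for the current i). That minimum never decreases from one row to the next, because every d[i, j] is at least the minimum of row i-1 (it is built from a cell of row i-1 plus 0 or 1, or from its left neighbour plus 1). So once a row minimum reaches MAX, the final distance d[n, m] cannot be below MAX.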
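To make the point about single cells concrete, here is a minimal Python sketch (Python only so it runs standalone; `levenshtein_table` is a hypothetical helper name). It fills the full table with the same recurrence as the Delphi code, including the row 0 and column 0 initialization that the loop fragments assume is done elsewhere. Python strings are 0-based, hence `s[i - 1]` where the Delphi code has `s[i]`. For s = "abc", t = "abc" and a threshold of 1, the cell d[1][3] is already 2, yet the final distance is 0.

```python
from typing import List


def levenshtein_table(s: str, t: str) -> List[List[int]]:
    """Full (len(s)+1) x (len(t)+1) table; d[i][j] is the distance of s[:i] and t[:j]."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # delete all of s[:i]
    for j in range(m + 1):
        d[0][j] = j                      # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,                            # deletion
                          d[i][j - 1] + 1,                            # insertion
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))   # substitution / match
    return d


d = levenshtein_table("abc", "abc")
print(d[1][3])   # 2: a single cell already above a threshold of 1 ...
print(d[3][3])   # 0: ... yet the strings are identical

# The row minima never decrease, which is what makes a per-row cutoff safe
mins = [min(row) for row in levenshtein_table("kitten", "sitting")]
print(all(a <= b for a, b in zip(mins, mins[1:])))   # True
```

A check on the cell just computed, as in the question, would already bail out at d[1][3] here even though the two strings are equal.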

I'm not good at Delphi, but I think the code should look something like this:

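for i := 1 to n do
begin
    // rowMin, not min: Delphi is case-insensitive, so a variable named min would hide Math.Min
    rowMin := MAX + 1;
    for j := 1 to m do
    begin
      d[i, j] := Min(Min(d[i-1, j]+1, d[i,j-1]+1), d[i-1,j-1]+Integer(s[i] <> t[j]));
      rowMin := Min(rowMin, d[i, j]);
    end;
    // No later row can have a smaller minimum, so the distance is at least MAX
    if rowMin >= MAX then
    begin
      Result := rowMin;  // a lower bound on the distance, not the exact value
      Exit;
    end;
end;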
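The same row-minimum cutoff can be sketched as a self-contained Python function, assuming the goal is the exact distance whenever it is below the threshold. The name `bounded_levenshtein`, the parameter `max_dist`, and returning `None` (rather than a partial value) when the cutoff fires are illustrative choices, not part of the answer above. The sketch keeps only two rows, since each row depends only on the previous one, and it needs one extra check at the end: every row minimum can stay below the threshold while the final cell itself does not.

```python
from typing import Optional


def bounded_levenshtein(s: str, t: str, max_dist: int) -> Optional[int]:
    """Levenshtein distance of s and t if it is below max_dist, otherwise None.

    Fails fast: the minimum of a row never decreases in later rows, so once a
    whole row is >= max_dist the final distance cannot be below max_dist.
    """
    if max_dist <= 0:
        return None                      # no distance is below 0
    m = len(t)
    prev = list(range(m + 1))            # row 0: d[0][j] = j insertions
    for i in range(1, len(s) + 1):
        cur = [i] + [0] * m              # d[i][0] = i deletions
        for j in range(1, m + 1):
            cur[j] = min(prev[j] + 1,                            # deletion
                         cur[j - 1] + 1,                         # insertion
                         prev[j - 1] + (s[i - 1] != t[j - 1]))   # substitution / match
        if min(cur) >= max_dist:         # cutoff on the row minimum
            return None
        prev = cur
    # Every row minimum stayed below max_dist, but the final cell itself may not have
    return prev[m] if prev[m] < max_dist else None


print(bounded_levenshtein("kitten", "sitting", 4))   # 3
print(bounded_levenshtein("kitten", "sitting", 3))   # None
```

Returning `None` keeps "no distance below the threshold" distinct from a real distance; returning the last row minimum instead, as the Delphi version does, would give a lower bound.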