I'm posting this in the spirit of answering your own questions.
The question I had was: How can I implement the Levenshtein algorithm for calculating edit-distance between two strings, as described here, in Delphi?
Just a note on performance: this is very fast. On my desktop (2.33 GHz dual-core, 2 GB RAM, Windows XP), I can run through an array of 100K strings in less than one second.
Solution
function EditDistance(s, t: string): integer;
var
  d: array of array of integer;
  i, j, cost: integer;
begin
  {
    Compute the edit distance between two strings.
    Algorithm and description may be found at either of these two links:
    http://en.wikipedia.org/wiki/Levenshtein_distance
    http://www.google.com/search?q=Levenshtein+distance
  }
  //initialize our cost array: (Length(s)+1) rows by (Length(t)+1) columns
  SetLength(d, Length(s)+1);
  for i := Low(d) to High(d) do begin
    SetLength(d[i], Length(t)+1);
  end;
  //first column: i deletions turn s[1..i] into the empty string
  for i := Low(d) to High(d) do begin
    d[i,0] := i;
  end;
  //first row: j insertions turn the empty string into t[1..j]
  for j := Low(d[0]) to High(d[0]) do begin
    d[0,j] := j;
  end;
  //store our costs in a 2-d grid
  for i := Low(d)+1 to High(d) do begin
    for j := Low(d[i])+1 to High(d[i]) do begin
      if s[i] = t[j] then begin
        cost := 0;
      end
      else begin
        cost := 1;
      end;
      //to use "Min", add "Math" to your uses clause!
      d[i,j] := Min(Min(
        d[i-1,j]+1,       //deletion
        d[i,j-1]+1),      //insertion
        d[i-1,j-1]+cost   //substitution
      );
    end; //for j
  end; //for i
  //now that we've stored the costs, return the final one
  Result := d[Length(s), Length(t)];
  //dynamic arrays are reference counted.
  //no need to deallocate them
end;
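
Since each cell of the grid only depends on the row above it and the cell to its left, the full 2-d array is not strictly needed. Below is a minimal sketch of a two-row variant, wrapped in a small console program so it can be compiled on its own. The names EditDistanceTwoRows, TIntRow and EditDistanceDemo are my own and purely illustrative, and the sketch assumes 1-based string indexing, as the function above does. I have not benchmarked it against the version above.

```pascal
program EditDistanceDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

type
  // Hypothetical helper type, introduced only for this sketch.
  TIntRow = array of Integer;

// Hypothetical two-row variant of EditDistance: keeps only the previous
// and the current row of the cost grid instead of the whole grid.
function EditDistanceTwoRows(const s, t: string): Integer;
var
  prev, curr, tmp: TIntRow;
  i, j, cost: Integer;
begin
  SetLength(prev, Length(t) + 1);
  SetLength(curr, Length(t) + 1);
  // Row 0: j insertions turn the empty string into t[1..j].
  for j := 0 to Length(t) do
    prev[j] := j;
  for i := 1 to Length(s) do
  begin
    // Column 0: i deletions turn s[1..i] into the empty string.
    curr[0] := i;
    for j := 1 to Length(t) do
    begin
      if s[i] = t[j] then
        cost := 0
      else
        cost := 1;
      curr[j] := Min(Min(
        prev[j] + 1,         // deletion
        curr[j - 1] + 1),    // insertion
        prev[j - 1] + cost   // substitution
      );
    end;
    // Swap the row references; the finished row becomes "prev".
    tmp := prev;
    prev := curr;
    curr := tmp;
  end;
  // After the last swap (or if s is empty), the final row is in "prev".
  Result := prev[Length(t)];
end;

begin
  Writeln(EditDistanceTwoRows('kitten', 'sitting')); // prints 3
  Writeln(EditDistanceTwoRows('flaw', 'lawn'));      // prints 2
  Writeln(EditDistanceTwoRows('', 'abc'));           // prints 3
end.
```

The trade-off is that the two-row version only gives you the distance. If you also want to walk back through the grid to recover the actual edits, keep the full 2-d array as in the function above.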