Lesser-known string similarity metrics

2023-09-11 04:50:34 Author: 怂孬

This may be a hard question to answer, but I'm researching something and was wondering whether anyone knows of "lesser-known" string similarity metrics (see this page for examples of well-known ones). I've looked at Wikipedia, and SourceForge hosts a nice library called Simmetrics with a bunch of string metric algorithms. Has anyone done some research, or come across a string algorithm that caught your attention because it is not widely used?

Thank you.

Recommended answer

This page (LingPipe) gives some tips about string comparison. It covers Damerau-Levenshtein distance, the Needleman-Wunsch algorithm, Jaccard distance, Jaro-Winkler distance, and TF/IDF distance. "Distance" here is understood as a measure of how similar two strings are: the smaller the distance, the more alike the strings.
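
To give a feel for two of the metrics named above, here is a minimal sketch in Python. It is not LingPipe's API or code; the function names `jaccard_distance` and `osa_distance` are made up for illustration. It also makes two assumptions: the Jaccard distance is computed over character bigrams (word tokens would be an equally valid choice), and the edit distance is the restricted "optimal string alignment" variant of Damerau-Levenshtein, which counts a swap of two adjacent characters as one edit but never edits the same substring twice.

```python
def jaccard_distance(a: str, b: str, n: int = 2) -> float:
    """Jaccard distance between the sets of character n-grams of a and b.

    Assumption: strings are compared as sets of character n-grams
    (bigrams by default). 0.0 means identical sets, 1.0 means disjoint.
    """
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        # Both strings are shorter than n: treat them as identical.
        return 0.0
    return 1.0 - len(grams_a & grams_b) / len(grams_a | grams_b)


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each as one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ab" -> "ba".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(osa_distance("abcd", "acbd"))       # one adjacent swap
    print(jaccard_distance("night", "nacht"))
```

With these definitions, "abcd" and "acbd" are one edit apart, whereas plain Levenshtein distance would count two substitutions. The unrestricted Damerau-Levenshtein distance can be smaller still on some inputs, so treat this as an approximation if you need the full metric.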

At the end of the page, it gives references and also provides a ready-to-use Java implementation (download & license).