How can I match similar words when searching?

2023-09-11 05:40:11 Author: 萌萌的小丸子

I'm trying to automatically categorize short articles, and I need a way to match similar words, e.g. shelf and shelves, or painting and repaint.

I'm using the Porter stemming algorithm, but it only helps in certain situations and only with the end of the word (neither example above works with it).

Is there an algorithm or a list of related words that would help with something like this (short of making my own)?

(I'm working in PHP, so solutions in that language would be the most helpful.)

Recommended answer

The Levenshtein distance is what you are looking for.

For any two strings, it calculates the minimum number of single-character insertions, substitutions, and deletions needed to change one string into the other.

If the distance is low, the two words are similar.
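As a rough illustration of the idea, here is a minimal PHP sketch built on the built-in levenshtein() function. The helper name isSimilar and the default threshold of 3 are assumptions made for this example, not part of the original answer; a suitable threshold depends on your data and on word length.

```php
<?php
// Hypothetical helper (not from the original answer): treats two words as
// similar when their edit distance is at most $maxDistance.
// The default threshold of 3 is an assumption and should be tuned.
function isSimilar(string $a, string $b, int $maxDistance = 3): bool
{
    // levenshtein() is case-sensitive, so normalize case first.
    return levenshtein(strtolower($a), strtolower($b)) <= $maxDistance;
}

// "shelf" -> "shelves": substitute f with v, then insert "e" and "s".
echo levenshtein('shelf', 'shelves'), "\n"; // 3

var_dump(isSimilar('shelf', 'shelves'));  // bool(true)
var_dump(isSimilar('shelf', 'painting')); // bool(false)
```

A fixed threshold treats short and long words the same, so dividing the distance by the length of the longer word may be a more robust measure for a real categorizer.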

You could also use the Soundex algorithm to determine whether two words sound similar.
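A minimal sketch of the Soundex approach, using PHP's built-in soundex() function. The helper name soundsAlike is hypothetical, and comparing the codes for exact equality is an assumption of this example rather than something the answer prescribes.

```php
<?php
// Hypothetical helper (not from the original answer): two words "sound
// alike" when soundex() gives them the same four-character code.
function soundsAlike(string $a, string $b): bool
{
    return soundex($a) === soundex($b);
}

echo soundex('Robert'), "\n"; // R163
echo soundex('Rupert'), "\n"; // R163

var_dump(soundsAlike('Robert', 'Rupert')); // bool(true)
```

Soundex keys on the first letter and the consonants that follow, so it is aimed at words that are pronounced alike rather than at inflected forms. The question's own pair, shelf and shelves, should come out as S410 and S412, so exact code equality alone would not group them. It is worth testing against your own vocabulary before relying on it.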

See also: the PHP levenshtein() function and the PHP soundex() function.