Python: find closest string (from a list) to another string

2023-09-11 03:38:21 Author: 别等时光非礼了梦想

Let's say I have a string "Hello" and a list:

words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo','question', 'Hallo', 'format']

How can I find the n words in the list words that are closest to "Hello"?

In this case, we would have ['hello', 'hallo', 'Hallo', 'hi', 'format'...]

So the strategy is to sort the list words from the closest word to the furthest.

I thought about something like this:

word = 'Hello'
for i, item in enumerate(words):
    # Python has no free function lower(); call the str method instead
    if item.lower() > word.lower():
        ...

but it's very slow on large lists.

UPDATE: difflib works, but it is also very slow. The words list contains 630000+ words (sorted, one per line), so checking the list takes 5 to 7 seconds for every closest-word search!

Recommended answer

Use difflib.get_close_matches.

>>> import difflib
>>> words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo', 'question', 'format']
>>> difflib.get_close_matches('Hello', words)
['hello', 'Hallo', 'hallo']

Please look at the documentation, because by default the function returns at most 3 closest matches, and only those that score at least 0.6 in similarity.
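To get the n closest words rather than the default three, the n and cutoff parameters of get_close_matches can be adjusted. The following is a minimal sketch, not the answerer's code: the helper name closest_words and the choice of cutoff=0.0 (which lets every word qualify so that n alone limits the result) are assumptions made for illustration.

```python
import difflib

words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo', 'question', 'format']

def closest_words(target, candidates, n=5, cutoff=0.0):
    """Return up to n candidates, ordered from most to least similar to target.

    closest_words is a hypothetical helper name. cutoff=0.0 disables the
    default 0.6 similarity threshold, so only n limits the result size.
    """
    return difflib.get_close_matches(target, candidates, n=n, cutoff=cutoff)

top5 = closest_words('Hello', words, n=5)
print(top5)
```

Note that difflib compares characters case-sensitively, so lowercasing both the target and the candidates first may be closer to what the question asks for.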