Let's say I have a string
"Hello"
and a list
words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo','question', 'Hallo', 'format']
How can I find the n words in the list words that are closest to "Hello"?
In this case, we would have ['hello', 'hallo', 'Hallo', 'hi', 'format'...]
So the strategy is to sort the list words from the closest word to the furthest.
I thought about something like this:
word = 'Hello'
for i, item in enumerate(words):
    if item.lower() > word.lower():
        ...
but it's very slow in large lists.
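For reference, the sort-by-closeness strategy described above could be sketched as follows. This is only an illustration under assumptions: the helper name rank_by_closeness is hypothetical, similarity is measured with difflib.SequenceMatcher.ratio(), and both strings are lowercased because the expected output in the question appears to ignore case. It still scores every word in the list, so it is not expected to be any faster on large lists.

```python
from difflib import SequenceMatcher


def rank_by_closeness(target, candidates):
    """Return candidates sorted from most to least similar to target.

    Hypothetical helper: "closeness" here means SequenceMatcher.ratio()
    on the lowercased strings, which is one possible definition.
    """
    target_lower = target.lower()

    def similarity(candidate):
        # ratio() is in [0, 1]; 1.0 means the lowercased strings are identical.
        return SequenceMatcher(None, target_lower, candidate.lower()).ratio()

    # sorted() is stable, so equally similar words keep their input order.
    return sorted(candidates, key=similarity, reverse=True)


words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo',
         'question', 'Hallo', 'format']
ranked = rank_by_closeness('Hello', words)
print(ranked[:4])  # the n = 4 closest words
```

Slicing the ranked list gives the n closest words. When only a few matches are needed, heapq.nlargest with the same key function would avoid sorting the whole list, though every word is still scored once.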
UPDATE
difflib works, but it is also very slow. The words list has 630000+ words (sorted, one per line), so checking the list takes 5 to 7 seconds for every closest-word search!
Use difflib.get_close_matches.
>>> import difflib
>>> words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo', 'question', 'format']
>>> difflib.get_close_matches('Hello', words)
['hello', 'Hallo', 'hallo']
Please look at the documentation: by default, the function returns at most 3 closest matches (the n and cutoff parameters change this).
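As a small sketch of how those defaults can be changed, get_close_matches accepts an n argument (maximum number of matches, default 3) and a cutoff argument (minimum similarity between 0 and 1, default 0.6). The variable names below are only illustrative.

```python
import difflib

words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo',
         'question', 'format']

# n caps how many matches are returned (best match first).
top_one = difflib.get_close_matches('Hello', words, n=1)

# cutoff is the minimum similarity score a word needs to be included;
# raising it from the default 0.6 to 0.7 drops 'hallo', which scores 0.6.
strict = difflib.get_close_matches('Hello', words, n=5, cutoff=0.7)

print(top_one)  # ['hello']
print(strict)   # ['hello', 'Hallo']
```

Note that the comparison is case-sensitive, which is why 'hello' and 'Hallo' score the same against 'Hello' while 'hallo' scores lower.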