Say I have a set of Strings:
Set<String> things = new HashSet<String>();
things.add("coffee cup");
things.add("smartphone");
things.add("inkjet printer");
// :
// list could be quite large (100K or so, perhaps loaded from a database)
// :
Now I want to check if another string completely contains any of the Strings in the above set. So:
"a coffee cup" - matches
"android smartphone" - matches
"inkjet printer for sale" - matches
"laser printer" - does not match
"printer" - does not match
The only way I can think of is iterating through the set (and break-ing if found). Is there a more efficient and elegant way to do this?
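For reference, the straightforward approach described above might look like the sketch below. The class name `NaiveContains` and the method name `containsAny` are made up for illustration. With k strings in the set, each lookup calls `String.contains` up to k times, which is what becomes expensive at around 100K entries.

```java
import java.util.Set;

public class NaiveContains {
    /** Returns true if text contains at least one of the keywords as a substring. */
    public static boolean containsAny(String text, Set<String> keywords) {
        for (String keyword : keywords) {
            // Stop at the first keyword found inside the text.
            if (text.contains(keyword)) {
                return true;
            }
        }
        return false;
    }
}
```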
You need the Aho-Corasick algorithm. http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
https://github.com/raymanrt/aho-corasick
Time complexity is O(m) for preprocessing (where m is the total length of the strings in the set) and O(n) for matching (where n is the length of the string being searched). So it's asymptotically optimal.
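To make the idea concrete, here is a minimal hand-rolled sketch of Aho-Corasick in Java. It is not the API of the library linked above; the class name `KeywordMatcher` and the method `containsAny` are hypothetical. It only answers "does the text contain any keyword?" and does not report which keyword matched or where. Transitions are stored in a `HashMap`, so the O(m) and O(n) bounds hold in the expected case.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Minimal Aho-Corasick sketch: checks whether a text contains any keyword. */
public class KeywordMatcher {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;        // node for the longest proper suffix that is also a trie prefix
        boolean terminal; // a keyword ends here, directly or via a failure link
    }

    private final Node root = new Node();

    public KeywordMatcher(Collection<String> keywords) {
        // Phase 1: insert every keyword into a trie.
        for (String word : keywords) {
            if (word.isEmpty()) continue; // ignore empty keywords
            Node node = root;
            for (int i = 0; i < word.length(); i++) {
                node = node.next.computeIfAbsent(word.charAt(i), k -> new Node());
            }
            node.terminal = true;
        }
        // Phase 2: compute failure links breadth-first, so shallower nodes are done first.
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                // Follow the parent's failure chain until a node with a transition on c.
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) f = f.fail;
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // A keyword ending at the failure target also ends at this position.
                child.terminal |= child.fail.terminal;
                queue.add(child);
            }
        }
    }

    /** Single left-to-right pass over the text; returns at the first keyword hit. */
    public boolean containsAny(String text) {
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) node = node.fail;
            Node target = node.next.get(c);
            node = (target != null) ? target : root;
            if (node.terminal) return true;
        }
        return false;
    }
}
```

Build the matcher once with `new KeywordMatcher(things)` and reuse it for every lookup: `containsAny("a coffee cup")` returns true, while `containsAny("laser printer")` and `containsAny("printer")` return false. If the set changes, the automaton has to be rebuilt, so this fits best when the set is loaded once and queried many times.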