How can I quickly test which of a large number of regular expressions matches?

2023-09-05 02:18:58 Author: 明媚

I'm writing a program in .NET where the user may provide a large number of regular expressions. For a given string, I need to figure out which regular expression matches that string (if more than one matches, I only need the first one). However, with a large number of regular expressions, testing each one in turn can take a very long time.

I was hoping there would be something similar to flex (the Fast Lexical Analyzer, not Adobe Flex) for .NET that would let me specify a large number of regular expressions and still quickly figure out which one matches (O(n) according to Wikipedia, for n = len(input string)).

Also, I would prefer not to implement my own regular expression engine :).

Recommended answer

Find the biggest chunk of constant text in each regex (if it is above a certain threshold length) and use the Karp-Rabin algorithm to search for all of those strings simultaneously. For each string that is found, run the corresponding regex to see whether the whole pattern matches. Any regex that has no usable constant substring is left out of the multi-string search and has to be run against the input directly.

This should give you good performance for a large number of regular expressions if they have reasonable-length constant substrings, assuming you have preprocessing time available for the regular expressions.
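
The approach described above can be sketched roughly as follows. This is a minimal illustration in Python rather than .NET, chosen only for brevity; the same structure should carry over to System.Text.RegularExpressions. The names `longest_literal`, `RegexSet`, `first_match` and the `key_len` threshold are hypothetical, not part of any library. The literal extractor is deliberately conservative: it gives up on any pattern containing groups, alternation, character classes or counted repetition, so those patterns are simply run directly. Each literal is truncated to `key_len` characters so that a single rolling hash window can cover all of them.

```python
import re

BASE = 257
MOD = (1 << 61) - 1  # large prime modulus for the rolling hash


def longest_literal(pattern, min_len=3):
    """Return the longest run of text every match must contain, or None.

    Conservative by design (an assumption of this sketch): patterns using
    groups, alternation, classes or {m,n} are not analysed at all.
    """
    if any(c in pattern for c in "|()[]{}"):
        return None
    runs, cur, i = [], "", 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            nxt = pattern[i + 1:i + 2]
            # Escapes such as \x41 or \1 would leave stray digits behind.
            if nxt.isalnum() and nxt not in "dDwWsSbB":
                return None
            runs.append(cur)
            cur = ""
            i += 2
            continue
        if c in "*?":          # previous character is optional: drop it
            runs.append(cur[:-1])
            cur = ""
        elif c.isalnum() or c == " ":
            cur += c
        else:                  # '+', '.', '^', '$', other punctuation
            runs.append(cur)
            cur = ""
        i += 1
    runs.append(cur)
    best = max(runs, key=len)
    return best if len(best) >= min_len else None


def _hash(s):
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


class RegexSet:
    def __init__(self, patterns, key_len=3):
        self.key_len = key_len
        self.regexes = [re.compile(p) for p in patterns]
        self.by_hash = {}     # hash of key -> list of (key, regex index)
        self.unfiltered = []  # regexes that must always be run directly
        for i, p in enumerate(patterns):
            lit = longest_literal(p, key_len)
            if lit is None:
                self.unfiltered.append(i)
            else:
                key = lit[:key_len]
                self.by_hash.setdefault(_hash(key), []).append((key, i))
        self.high = pow(BASE, key_len - 1, MOD)

    def _candidates(self, text):
        """Karp-Rabin scan: one pass over text, all keys checked at once."""
        m, found = self.key_len, set()
        if len(text) < m or not self.by_hash:
            return found
        h = _hash(text[:m])
        for start in range(len(text) - m + 1):
            if start > 0:  # slide the window one character to the right
                h = (h - ord(text[start - 1]) * self.high) % MOD
                h = (h * BASE + ord(text[start + m - 1])) % MOD
            for key, i in self.by_hash.get(h, ()):
                if text.startswith(key, start):  # rule out hash collisions
                    found.add(i)
        return found

    def first_match(self, text):
        """Index of the first pattern (in list order) that matches, or None."""
        todo = sorted(self._candidates(text) | set(self.unfiltered))
        for i in todo:
            if self.regexes[i].search(text):
                return i
        return None
```

For example, with `RegexSet([r"\d+ apples", r"error: .*timeout", r"^\w+$", r"warn(ing)?"])`, the first two patterns are only run when the input contains " ap" or "tim" respectively, while the last two have no literal this extractor accepts and are always run. The benefit therefore depends on how many of the user's patterns contain a reasonably long constant substring.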