Detecting whether a regexp is exponential

2023-09-11 04:01:34 Author: 不敢奢望你会被我感动

This article shows that some regexps take O(2^n) time when backtracking. The example is (x+x+)+y. When the engine tries to match a string like xxxx...p, it backtracks for a long time before it figures out that there is no match.
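To make the blow-up concrete, here is a minimal sketch of a hand-written backtracking matcher for (x+x+)+y that counts how often it tries to start an iteration of the group. The function name `count_group_attempts` and the shortest-first order of alternatives are assumptions made for illustration; real engines are greedy and differ in detail, but on a failing input every alternative gets explored either way.

```python
def count_group_attempts(text):
    """Match (x+x+)+y anchored at position 0 by naive backtracking.

    Returns (matched, attempts), where attempts is the number of times
    the matcher tried to start one iteration of the group (x+x+).
    """
    n = len(text)
    attempts = 0

    def group(i):
        nonlocal attempts
        attempts += 1
        j = i
        # The first x+ covers text[i:j]; try every possible length.
        while j < n and text[j] == "x":
            j += 1
            k = j
            # The second x+ covers text[j:k]; try every possible length.
            while k < n and text[k] == "x":
                k += 1
                # After one iteration: either y follows, or the group repeats.
                if (k < n and text[k] == "y") or group(k):
                    return True
        return False

    return group(0), attempts


if __name__ == "__main__":
    for length in (8, 12, 16, 20):
        matched, attempts = count_group_attempts("x" * length + "p")
        print(length, matched, attempts)
```

In this sketch the attempt count on a failing input doubles with every additional x, which is the O(2^n) behavior described above.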

Is there a way to detect such regexps?

Thanks

Recommended answer

If your regexp engine exhibits exponential runtime behavior for (x+x+)+y, then it is broken, because a DFA or NFA can recognize this pattern in linear time:

echo "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" | egrep "(x+x+)+y"
echo "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxy" | egrep "(x+x+)+y"

Both answer immediately.
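As a rough illustration of why an automaton needs only linear time for this pattern, here is a small sketch that simulates an NFA for (x+x+)+y by tracking the set of active states. The state numbering, the `TRANSITIONS` table, and the function name `nfa_search` are hand-derived assumptions for this one pattern, not the internals of egrep or any other engine.

```python
# Hand-built NFA for (x+x+)+y:
#   0 = start, 1 = inside the first x+, 2 = inside the second x+, 3 = accept
TRANSITIONS = {
    (0, "x"): {1},
    (1, "x"): {1, 2},  # stay in the first x+ or move on to the second
    (2, "x"): {1, 2},  # stay in the second x+ or start a new group iteration
    (2, "y"): {3},
}
ACCEPT = 3


def nfa_search(text):
    """Return True if (x+x+)+y matches somewhere in text (grep-like search)."""
    states = {0}
    for ch in text:
        # State 0 is re-added at every step so a match may begin anywhere.
        nxt = {0}
        for s in states:
            nxt |= TRANSITIONS.get((s, ch), set())
        if ACCEPT in nxt:
            return True
        states = nxt
    return False


if __name__ == "__main__":
    print(nfa_search("x" * 44))
    print(nfa_search("x" * 44 + "y"))
```

Each input character is processed once against at most four states, so the work grows linearly with the input length and no backtracking is involved.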

In fact, there are only a few cases (like backreferences) where backtracking is really needed (mainly because a regexp with a backreference is no longer a regular expression in the language-theoretic sense). A capable implementation should switch to backtracking only when these corner cases are present.

In fairness, DFAs have a dark side too, because some regexps have exponential size requirements. But a size constraint is easier to enforce than a time constraint, and even a huge DFA runs in linear time on the input, so it's a better bargain than a small backtracker choking on a couple of x's.
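The size blow-up can be sketched with a textbook language that is not from the answer itself: strings over {a, b} whose n-th symbol from the end is a, i.e. (a|b)*a followed by n-1 arbitrary symbols. The code below runs the subset construction on a small hand-built NFA for that language and counts the reachable DFA states. The function name `count_dfa_states` and the state numbering are assumptions for illustration.

```python
def count_dfa_states(n):
    """Count the DFA states reachable via subset construction for the
    language 'the n-th symbol from the end is a' over the alphabet {a, b}.

    NFA: state 0 loops on a and b and also moves to 1 on a;
    states 1..n-1 advance on any symbol; state n is accepting.
    """
    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for ch in "ab":
            nxt = set()
            for s in current:
                if s == 0:
                    nxt.add(0)
                    if ch == "a":
                        nxt.add(1)
                elif s < n:
                    nxt.add(s + 1)
                # State n has no outgoing transitions.
            subset = frozenset(nxt)
            if subset not in seen:
                seen.add(subset)
                stack.append(subset)
    return len(seen)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        print(n, count_dfa_states(n))
```

The NFA has n + 1 states while the number of reachable DFA states grows as 2^n. An implementation could enforce a size constraint by aborting the construction once `seen` exceeds a chosen cap, which is the kind of limit the answer describes as easier to enforce than a time limit.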

You should really read Russ Cox's excellent series of articles on regexp implementation (and the pathological behavior of backtracking).

 