How do I time out Regex operations to prevent hanging in .NET 4.5?

2023-09-03 02:08:30 Author: 我命由我不由天

There are times when being able to limit the duration of regex pattern matching is useful. In particular, when matching data against user-supplied patterns, a pattern might perform poorly due to nested quantifiers and excessive backtracking (see catastrophic backtracking). One way to apply a timeout is to run the regex asynchronously, but this is tedious and clutters the code.

According to what's new in the .NET Framework 4.5 Developer Preview it looks like there's a new built-in approach to support this:

Ability to limit how long the regular expression engine will attempt to resolve a regular expression before it times out.

How can I use this feature? Also, what do I need to be aware of when using it?

Note: I'm asking and answering this question since it's encouraged.

Recommended answer

I recently researched this topic since it interested me and will cover the main points here. The relevant MSDN documentation is available here and you can check out the Regex class to see the new overloaded constructors and static methods. The code samples can be run with Visual Studio 11 Developer Preview.

The Regex class accepts a TimeSpan to specify the timeout duration. You can specify a timeout at a macro and micro level in your application, and they can be used together:

- Set the "REGEX_DEFAULT_MATCH_TIMEOUT" property using the AppDomain.SetData method (macro, application-wide scope)
- Pass the matchTimeout parameter (micro, localized scope)

When the AppDomain property is set, all Regex operations will use that value as the default timeout. To override the application-wide default, pass a matchTimeout value to the Regex constructor or static method. If an AppDomain default isn't set and matchTimeout isn't specified, pattern matching will not time out (i.e., the original pre-.NET 4.5 behavior).
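
The precedence described above can be observed through the MatchTimeout property of a Regex instance. The following is a minimal sketch, not a definitive implementation; the class and method names (TimeoutPrecedence, EffectiveTimeout) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TimeoutPrecedence
{
    // Returns the timeout a Regex instance actually ends up with.
    // Pass null to rely on the default: the AppDomain value if one was set,
    // otherwise no timeout at all.
    public static TimeSpan EffectiveTimeout(string pattern, TimeSpan? matchTimeout)
    {
        Regex re = matchTimeout.HasValue
            ? new Regex(pattern, RegexOptions.None, matchTimeout.Value)  // micro: explicit value wins
            : new Regex(pattern);                                        // macro: default applies
        return re.MatchTimeout;
    }
}
```

With no AppDomain default set, EffectiveTimeout(@"\d+", null) should equal Regex.InfiniteMatchTimeout, while EffectiveTimeout(@"\d+", TimeSpan.FromSeconds(1)) returns the one-second value that was passed in.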

There are 2 main exceptions to handle:

- RegexMatchTimeoutException: thrown when a timeout occurs.
- ArgumentOutOfRangeException: thrown when "matchTimeout is negative or greater than approximately 24 days." In addition, a TimeSpan value of zero will cause this to be thrown.

Despite negative values not being allowed, there's one exception: a value of -1 ms is accepted. Internally the Regex class accepts -1 ms, which is the value of the Regex.InfiniteMatchTimeout field, to indicate that a match should not time out (i.e., the original pre-.NET 4.5 behavior).
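
A quick way to check these boundaries yourself is to let the Regex constructor act as the judge. This is only a sketch; TimeoutBounds and IsAccepted are hypothetical names, and the trivial pattern "a" is an arbitrary placeholder.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TimeoutBounds
{
    // True if the Regex constructor accepts the value as a matchTimeout,
    // false if it rejects it with ArgumentOutOfRangeException.
    public static bool IsAccepted(TimeSpan matchTimeout)
    {
        try
        {
            var re = new Regex("a", RegexOptions.None, matchTimeout);
            return re.MatchTimeout == matchTimeout;
        }
        catch (ArgumentOutOfRangeException)
        {
            return false;
        }
    }
}
```

IsAccepted returns true for TimeSpan.FromSeconds(4) and for Regex.InfiniteMatchTimeout (-1 ms), and false for TimeSpan.Zero, TimeSpan.FromSeconds(-10), and TimeSpan.FromDays(25).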

In the following example I'll demonstrate both valid and invalid timeout scenarios and how to handle them:

string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";
var timeouts = new[]
{
    TimeSpan.FromSeconds(4),     // valid
    TimeSpan.FromSeconds(-10)    // invalid
};

foreach (var matchTimeout in timeouts)
{
    Console.WriteLine("Timeout: " + matchTimeout);
    try
    {
        bool result = Regex.IsMatch(input, pattern,
                                    RegexOptions.None, matchTimeout);
    }
    catch (RegexMatchTimeoutException ex)
    {
        Console.WriteLine("Match timed out!");
        Console.WriteLine("- Timeout interval specified: " + ex.MatchTimeout);
        Console.WriteLine("- Pattern: " + ex.Pattern);
        Console.WriteLine("- Input: " + ex.Input);
    }
    catch (ArgumentOutOfRangeException ex)
    {
        Console.WriteLine(ex.Message);
    }
    Console.WriteLine();
}
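
If the same try/catch pattern is needed in several places, it could be wrapped in a small helper. The sketch below is one possible shape, not part of the Regex API: MatchOutcome, SafeRegex, and TryIsMatch are hypothetical names. It assumes that an invalid pattern surfaces as an ArgumentException and deliberately lets an out-of-range timeout propagate, since that is a programming error rather than a property of the input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical result type, for illustration only.
public enum MatchOutcome { Matched, NotMatched, TimedOut, InvalidPattern }

// Hypothetical helper, for illustration only.
public static class SafeRegex
{
    public static MatchOutcome TryIsMatch(string input, string pattern, TimeSpan matchTimeout)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));

        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, matchTimeout)
                ? MatchOutcome.Matched
                : MatchOutcome.NotMatched;
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up after matchTimeout elapsed.
            return MatchOutcome.TimedOut;
        }
        catch (ArgumentException ex) when (!(ex is ArgumentOutOfRangeException))
        {
            // The pattern could not be parsed. A bad timeout is not caught here.
            return MatchOutcome.InvalidPattern;
        }
    }
}
```

A caller handling user-supplied patterns can then switch on the outcome instead of repeating the exception handling at every call site.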

When using an instance of the Regex class you have access to the MatchTimeout property:

string input = "The English alphabet has 26 letters";
string pattern = @"\d+";
var matchTimeout = TimeSpan.FromMilliseconds(10);
var sw = Stopwatch.StartNew();
try
{
    var re = new Regex(pattern, RegexOptions.None, matchTimeout);
    bool result = re.IsMatch(input);
    sw.Stop();

    Console.WriteLine("Completed match in: " + sw.Elapsed);
    Console.WriteLine("MatchTimeout specified: " + re.MatchTimeout);
    Console.WriteLine("Matched with {0} to spare!",
                         re.MatchTimeout.Subtract(sw.Elapsed));
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine(ex.Message);
}

Using the AppDomain property

The "REGEX_DEFAULT_MATCH_TIMEOUT" property is used to set an application-wide default:

AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(2));

If this property is set to an invalid TimeSpan value or an invalid object, a TypeInitializationException will be thrown when attempting to use a regex.
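
Because the failure only shows up later as a TypeInitializationException, it may be worth validating the value before storing it. The sketch below is an assumption-laden guard, not an official API: RegexDefaults, IsValidTimeout, and SetDefaultMatchTimeout are hypothetical names, and the upper bound of Int32.MaxValue - 1 milliseconds (roughly 24.8 days) is my understanding of the "approximately 24 days" limit. It compares against Timeout.InfiniteTimeSpan rather than Regex.InfiniteMatchTimeout so that the guard itself does not touch the Regex type before the default is stored.

```csharp
using System;
using System.Threading;

// Hypothetical helper, for illustration only.
public static class RegexDefaults
{
    // Assumed upper bound: Int32.MaxValue - 1 milliseconds (about 24.8 days).
    private static readonly TimeSpan MaxTimeout =
        TimeSpan.FromMilliseconds(int.MaxValue - 1);

    // Mirrors the rules described above: -1 ms means "no timeout",
    // otherwise the value must be positive and at most MaxTimeout.
    public static bool IsValidTimeout(TimeSpan value)
    {
        if (value == Timeout.InfiniteTimeSpan) return true;
        return value > TimeSpan.Zero && value <= MaxTimeout;
    }

    // Fails fast with ArgumentOutOfRangeException instead of a later
    // TypeInitializationException on first Regex use.
    public static void SetDefaultMatchTimeout(TimeSpan value)
    {
        if (!IsValidTimeout(value))
            throw new ArgumentOutOfRangeException(nameof(value), value,
                "Not a valid regex match timeout.");
        AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", value);
    }
}
```

Calling RegexDefaults.SetDefaultMatchTimeout(TimeSpan.FromSeconds(2)) early in application startup stores the default; passing TimeSpan.Zero or a negative value other than -1 ms throws immediately at the call site.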

Example using a valid property value:

// AppDomain default set somewhere in your application
AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(2));

// regex use elsewhere...
string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";

var sw = Stopwatch.StartNew();
try
{
    // no timeout specified, defaults to AppDomain setting
    bool result = Regex.IsMatch(input, pattern);
    sw.Stop();
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine("Match timed out!");
    Console.WriteLine("Applied Default: " + ex.MatchTimeout);
}
catch (ArgumentOutOfRangeException)
{
    // not thrown in this example; listed for completeness
    sw.Stop();
}
catch (TypeInitializationException ex)
{
    sw.Stop();
    Console.WriteLine("TypeInitializationException: " + ex.Message);
    Console.WriteLine("InnerException: {0} - {1}",
        ex.InnerException.GetType().Name, ex.InnerException.Message);
}
Console.WriteLine("AppDomain Default: {0}",
    AppDomain.CurrentDomain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT"));
Console.WriteLine("Stopwatch: " + sw.Elapsed);

Using the above example with an invalid (negative) value would cause the exception to be thrown. The code that handles it writes the following message to the console:

TypeInitializationException: The type initializer for 'System.Text.RegularExpressions.Regex' threw an exception.

InnerException: ArgumentOutOfRangeException - Specified argument was out of the range of valid values. Parameter name: AppDomain data 'REGEX_DEFAULT_MATCH_TIMEOUT' contains an invalid value or object for specifying a default matching timeout for System.Text.RegularExpressions.Regex.

In both examples the ArgumentOutOfRangeException isn't thrown. For completeness the code shows all the exceptions you can handle when working with the new .NET 4.5 Regex timeout feature.

Overriding the AppDomain default is done by specifying a matchTimeout value. In the next example the match times out in 2 seconds instead of the default of 5 seconds.

AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(5));

string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";

var sw = Stopwatch.StartNew();
try
{
    var matchTimeout = TimeSpan.FromSeconds(2);
    bool result = Regex.IsMatch(input, pattern,
                                RegexOptions.None, matchTimeout);
    sw.Stop();
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine("Match timed out!");
    Console.WriteLine("Applied Timeout: " + ex.MatchTimeout);
}

Console.WriteLine("AppDomain Default: {0}",
    AppDomain.CurrentDomain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT"));
Console.WriteLine("Stopwatch: " + sw.Elapsed);

Closing Remarks

MSDN recommends setting a time-out value in all regular expression pattern-matching operations. However, they don't draw your attention to issues to be aware of when doing so. I don't recommend setting an AppDomain default and calling it a day. You need to know your input and know your patterns. If the input is large, or the pattern is complex, an appropriate timeout value should be used. This might also entail measuring your critically performing regex usages to assign sane defaults. Arbitrarily assigning a timeout value to a regex that used to work fine may cause it to break if the value isn't long enough. Measure existing usages before assigning a value if you think it might abort the matching attempt too early.
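
One rough way to measure existing usages is to time a regex over representative inputs and derive a timeout with generous headroom. The sketch below is only a starting point; RegexTiming and SuggestTimeout are hypothetical names, and the safety factor and minimum are values you would choose yourself. The regex passed in should already carry a generous timeout so that the measurement itself cannot hang.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RegexTiming
{
    // Times IsMatch over the samples and returns the slowest observed duration
    // multiplied by safetyFactor, but never less than minimum.
    public static TimeSpan SuggestTimeout(Regex regex, IEnumerable<string> samples,
                                          double safetyFactor, TimeSpan minimum)
    {
        TimeSpan slowest = TimeSpan.Zero;
        foreach (string sample in samples)
        {
            var sw = Stopwatch.StartNew();
            regex.IsMatch(sample);
            sw.Stop();
            if (sw.Elapsed > slowest) slowest = sw.Elapsed;
        }

        TimeSpan scaled = TimeSpan.FromTicks((long)(slowest.Ticks * safetyFactor));
        return scaled > minimum ? scaled : minimum;
    }
}
```

For example, SuggestTimeout(re, samples, 10.0, TimeSpan.FromMilliseconds(50)) returns ten times the slowest observed match, with a floor of 50 ms. The samples should include the largest and most awkward inputs you expect in production, since a timeout tuned only on easy inputs will abort legitimate matches.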

Moreover, this feature is useful when handling user-supplied patterns. Yet learning how to write proper patterns that perform well is important. Slapping a timeout on a regex to make up for a lack of knowledge in proper pattern construction isn't good practice.
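
As an illustration of fixing the pattern rather than relying on the timeout, the nested quantifier in the pattern used throughout this answer can be flattened. This is a sketch under the assumption that only the IsMatch result matters (the capture group is dropped); PatternComparison and its members are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PatternComparison
{
    // Nested quantifiers: prone to catastrophic backtracking when no '!' follows.
    public const string Nested = @"([a-z ]+)*!";

    // Same set of characters without the nesting, so a failed attempt
    // has far fewer ways to backtrack.
    public const string Flat = @"[a-z ]*!";

    public static bool IsMatchFlat(string input, TimeSpan matchTimeout)
    {
        return Regex.IsMatch(input, Flat, RegexOptions.None, matchTimeout);
    }
}
```

With the flat pattern, the sample input "The quick brown fox jumps over the lazy dog." is rejected quickly instead of running into the timeout. The timeout stays in place as a safety net, but it is no longer what makes the code finish.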