Regular expression for a specific tag

2023-09-06 10:11:03 Author: -/不管怎样活着就好

Hello!

I'm working on a regular expression in a .NET project to get a specific tag. I would like to match the entire DIV tag and its contents:

<html>
   <head><title>Test</title></head>
   <body>
     <p>The first paragraph.</p>
     <div id='super_special'>
        <p>The Store paragraph</p>
     </div>
   </body>
</html>

Code:

    Regex re = new Regex("(<div id='super_special'>.*?</div>)", RegexOptions.Multiline);


    if (re.IsMatch(test))
        Console.WriteLine("it matches");
    else
        Console.WriteLine("no match");

I want to match this:

<div id="super_special">
   <p>Anything could go in here...doesn't matter.  Let's get it all</p>
</div>

I thought . was supposed to match any character, but it seems to be having trouble with the carriage returns. What is my regex missing?

Thanks.

Recommended answer

Out of the box, without special modifiers, most regex implementations don't let . match past the end of a line. You should probably look in the documentation of the regex engine you're using for a modifier that changes this.
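For the .NET engine used in the question, the modifier in question is RegexOptions.Singleline. RegexOptions.Multiline only changes what ^ and $ match and leaves . alone. The sketch below is illustrative only: the class name SinglelineDemo and the shortened sample string are assumptions made for the example, and only the pattern comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Shortened sample modeled on the question's HTML; the div spans several lines.
    public const string Html =
        "<body>\n" +
        "  <p>The first paragraph.</p>\n" +
        "  <div id='super_special'>\n" +
        "    <p>The Store paragraph</p>\n" +
        "  </div>\n" +
        "</body>";

    // Pattern taken from the question.
    public const string Pattern = "(<div id='super_special'>.*?</div>)";

    // Multiline only affects ^ and $; '.' still refuses to match '\n'.
    public static bool MatchesWithMultiline() =>
        Regex.IsMatch(Html, Pattern, RegexOptions.Multiline);

    // Singleline lets '.' match every character, including '\n'.
    public static bool MatchesWithSingleline() =>
        Regex.IsMatch(Html, Pattern, RegexOptions.Singleline);

    public static void Main()
    {
        Console.WriteLine(MatchesWithMultiline() ? "it matches" : "no match");   // no match
        Console.WriteLine(MatchesWithSingleline() ? "it matches" : "no match");  // it matches
    }
}
```

The two options can also be combined with | if ^ and $ need to match at line boundaries as well.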

I have one other piece of advice: beware of greed! Traditionally, regex quantifiers are greedy, which means that a pattern using a plain .* would probably match all of this:

<div id="super_special">
  I'm the wanted div!
</div>
<div id="not_special">
  I'm not wanted, but I've been caught too :(
</div>

You should check for a "non-greedy" modifier, so that your regex stops matching at the first occurrence of </div>, not at the last one.
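In .NET the non-greedy form is written by adding ? after the quantifier, which the .*? in the question's pattern already does. The sketch below contrasts the two behaviours on the two-div sample above. The class name GreedyDemo and its method names are made up for the example, and RegexOptions.Singleline is assumed so that . can cross line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // The two-div sample from the answer, written as a C# string.
    public const string Html =
        "<div id=\"super_special\">\n" +
        "  I'm the wanted div!\n" +
        "</div>\n" +
        "<div id=\"not_special\">\n" +
        "  I'm not wanted, but I've been caught too :(\n" +
        "</div>";

    // Greedy: .* runs to the LAST </div> in the input.
    public static string Greedy() =>
        Regex.Match(Html, "<div id=\"super_special\">.*</div>",
                    RegexOptions.Singleline).Value;

    // Lazy: .*? stops at the FIRST </div> after the opening tag.
    public static string Lazy() =>
        Regex.Match(Html, "<div id=\"super_special\">.*?</div>",
                    RegexOptions.Singleline).Value;

    public static void Main()
    {
        Console.WriteLine(Greedy().Contains("not_special")); // True
        Console.WriteLine(Lazy().Contains("not_special"));   // False
    }
}
```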

Also, as others have said, consider using an HTML parser instead of regexes. It will save you a lot of headaches.

Edit: even a non-greedy regex wouldn't work as expected if <div>s are nested! Another reason to consider using an HTML parser.
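To make the nesting problem concrete, here is a minimal sketch comparing the lazy regex with a parser-based lookup. It rests on a loud assumption: the input is well-formed XML, so the built-in System.Xml.Linq classes can stand in for a real HTML parser. Markup found in the wild is often not well-formed and would need a dedicated HTML parsing library instead. The class name NestedDivDemo and the sample string are invented for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class NestedDivDemo
{
    // Hypothetical sample: the wanted div contains another div.
    public const string Html =
        "<body>" +
        "<div id='super_special'>" +
        "<div>inner</div>" +
        "<p>after the inner div</p>" +
        "</div>" +
        "</body>";

    // Lazy regex: stops at the first </div>, which closes the INNER div.
    public static string WithRegex() =>
        Regex.Match(Html, "<div id='super_special'>.*?</div>",
                    RegexOptions.Singleline).Value;

    // Parser: finds the element by its id and returns it whole.
    // Assumption: the input parses as XML; real HTML often does not.
    public static string WithParser()
    {
        XElement div = XDocument.Parse(Html)
            .Descendants("div")
            .First(d => (string)d.Attribute("id") == "super_special");
        return div.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        Console.WriteLine(WithRegex().Contains("after the inner div"));  // False
        Console.WriteLine(WithParser().Contains("after the inner div")); // True
    }
}
```

The regex result is cut off right after the inner div, while the parser returns the outer element with everything inside it.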