Regex to find an anchor tag including a new line in C#.NET

2023-09-06 15:20:27 Author: 最佳搭档

I want to find the href of an anchor tag, so I have used this regex:

 <a\s*[^>]*\s*href\s*\=\s*([^(\s*|\>)]*)\s*[^>]*>\s*Text\s*<\/a>
 Options = RegexOptions.IgnoreCase | RegexOptions.Singleline

Example

    <a href="/abc/xzy/pqr.com" class="m">Text</a>
So Group[1]="/abc/xzy/pqr.com"

But if the content is like this:

     <a href="/abc/xzy/                     //Contains new line
    pqr.com" class="m">Text</a>  


then Group[1]="/abc/xzy/

So I want to know how to get "/abc/xzy/pqr.com" when the content contains a new line (\r\n).
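A minimal C# sketch that reproduces the behaviour described above. The two input strings are assumptions built from the question's example: the multi-line href is assumed to use a Windows-style \r\n break with no indentation on the continuation line. It uses top-level statements, so it needs C# 9 or later. Note that the captured value includes the quote characters, because the original class does not exclude ".

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the question, unchanged.
const string pattern = @"<a\s*[^>]*\s*href\s*\=\s*([^(\s*|\>)]*)\s*[^>]*>\s*Text\s*<\/a>";
var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

// Assumed inputs: the single-line anchor, and the same anchor with a \r\n break inside the href.
string oneLine = "<a href=\"/abc/xzy/pqr.com\" class=\"m\">Text</a>";
string twoLines = "<a href=\"/abc/xzy/\r\npqr.com\" class=\"m\">Text</a>";

string first = Regex.Match(oneLine, pattern, options).Groups[1].Value;
string second = Regex.Match(twoLines, pattern, options).Groups[1].Value;

// The quotes are part of the captured value in both cases.
Console.WriteLine(first);   // prints: "/abc/xzy/pqr.com"
Console.WriteLine(second);  // prints: "/abc/xzy/   (the capture stops at \r)
```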

Recommended answer

Your capture group is a bit odd: [^(\s*|\>)]* is a character class, so it matches any run of characters that are not (, whitespace (\s), *, |, > or ). Because \s includes newlines, the capture stops at the line break.
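To see which characters that class actually rejects, the sketch below tests single characters against it (the probe list is a hypothetical selection; C# 9+ top-level statements). The double quote and / are allowed, which is also why the captured value keeps the opening quote.

```csharp
using System;
using System.Text.RegularExpressions;

// [^(\s*|\>)] is ONE character class, not a group with alternation:
// it excludes the characters ( \s * | > ) individually.
var cls = new Regex(@"^[^(\s*|\>)]$");

// Hypothetical probe characters.
string[] probes = { "(", " ", "\n", "*", "|", ">", ")", "a", "\"", "/" };

foreach (var p in probes)
{
    bool allowed = cls.IsMatch(p);
    // Give whitespace readable names in the output.
    string shown = p switch { " " => "space", "\n" => "\\n", _ => p };
    Console.WriteLine($"{shown,-6} allowed in capture: {allowed}");
}
```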

What you can do instead is put quotes before and after the capture group:

<a\s*[^>]*\s*href\s*\=\s*"([^(\s*|\>)]*)"\s*[^>]*>\s*Text\s*<\/a>
                         ^              ^

Then change the character class to [^"] (any character except a double quote):

<a\s*[^>]*\s*href\s*\=\s*"([^"]*)"\s*[^>]*>\s*Text\s*<\/a>
                           ^^^^

regex101 demo.
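A minimal C# sketch of the double-quoted pattern, using an assumed two-line input built from the question (Windows-style \r\n break, C# 9+ top-level statements). The capture now runs across the line break but still contains the \r\n; stripping whitespace afterwards to get /abc/xzy/pqr.com is an extra step that assumes the break is only line wrapping and not part of the URL.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern: the quotes sit outside the group, and [^"]* accepts
// everything up to the closing quote, including \r and \n.
// In a verbatim string each " is written as "".
const string pattern = @"<a\s*[^>]*\s*href\s*\=\s*""([^""]*)""\s*[^>]*>\s*Text\s*<\/a>";
var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

// Assumed input with a line break inside the href value.
string twoLines = "<a href=\"/abc/xzy/\r\npqr.com\" class=\"m\">Text</a>";

string raw = Regex.Match(twoLines, pattern, options).Groups[1].Value;

// Assumption: whitespace inside the href is only line wrapping, so remove it.
string href = Regex.Replace(raw, @"\s+", "");

Console.WriteLine(raw == "/abc/xzy/\r\npqr.com"); // True
Console.WriteLine(href);                          // /abc/xzy/pqr.com
```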

That said, a proper HTML parser would be a better choice than a regex. Writing a suitable regex is tedious because it is easy to overlook many different cases, but if you know exactly what form your data will take, a regex can be a quick way to get what you need.

If you also need to handle single-quoted or unquoted href values, you might try this instead:

<a\s*[^>]*\s*href\s*=\s*((?:[^ ]|[\n\r])+)\s*[^>]*>\s*Text\s*<\/a>

Updated regex101 demo.

Instead of the character class, this regex uses (?:[^ ]|[\n\r])+, which accepts any non-space character as well as newlines (and carriage returns, just in case). Since [^ ] already matches \n and \r, the [\n\r] alternative is redundant but harmless. Note that \s matches spaces, tabs, newlines and form feeds.
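A minimal C# sketch exercising this pattern on three assumed inputs: double-quoted with a line break, single-quoted, and unquoted (C# 9+ top-level statements). The capture keeps any surrounding quotes, so the sketch trims them and removes whitespace; that post-processing is an addition, not part of the answer's regex.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern for double-quoted, single-quoted or unquoted values.
const string pattern = @"<a\s*[^>]*\s*href\s*=\s*((?:[^ ]|[\n\r])+)\s*[^>]*>\s*Text\s*<\/a>";
var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

// Assumed sample inputs.
string[] inputs =
{
    "<a href=\"/abc/xzy/\r\npqr.com\" class=\"m\">Text</a>", // double quotes, line break
    "<a href='/abc/xzy/pqr.com' class=\"m\">Text</a>",      // single quotes
    "<a href=/abc/xzy/pqr.com class=m>Text</a>",            // no quotes
};

string[] hrefs = new string[inputs.Length];
for (int i = 0; i < inputs.Length; i++)
{
    // Group 1 still contains the quotes, if any.
    string raw = Regex.Match(inputs[i], pattern, options).Groups[1].Value;
    // Extra step: drop surrounding quotes, then line-break whitespace.
    hrefs[i] = Regex.Replace(raw.Trim('"', '\''), @"\s+", "");
    Console.WriteLine(hrefs[i]); // /abc/xzy/pqr.com
}
```

Because the group stops at the first space, an indented continuation line (for example /abc/xzy/\r\n    pqr.com) would still be cut short right after the line break; the double-quoted "([^"]*)" pattern does not have that limitation.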
