While running some tests for this answer, I noticed the following unexpected behavior. This will remove all occurrences of <tag> after the first:
var input = "<text><text>extra<words><text><words><something>";
Regex.Replace(input, @"(<[^>]+>)(?<=\1.*\1)", "");
// <text>extra<words><something>
But this will not:
Regex.Replace(input, @"(?<=\1.*)(<[^>]+>)", "");
// <text><text>extra<words><text><words><something>
Similarly, this will remove all occurrences of <tag> before the last:
Regex.Replace(input, @"(<[^>]+>)(?=.*\1)", "");
// extra<text><words><something>
But this will not:
Regex.Replace(input, @"(?=\1.*\1)(<[^>]+>)", "");
// <text><text>extra<words><text><words><something>
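For anyone who wants to reproduce this, the four calls above can be dropped into one small console program. This is only a sketch: the `Program` class and `Main` layout are my own scaffolding, not part of the original tests. Going by the results reported above, the first two lines printed should be the trimmed strings and the last two should be the input unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var input = "<text><text>extra<words><text><words><something>";

        // Backreference appears after the capturing group:
        // group 1 has already captured when \1 is evaluated.
        Console.WriteLine(Regex.Replace(input, @"(<[^>]+>)(?<=\1.*\1)", ""));
        Console.WriteLine(Regex.Replace(input, @"(<[^>]+>)(?=.*\1)", ""));

        // Backreference appears before the capturing group:
        // these are the two patterns that left the input untouched.
        Console.WriteLine(Regex.Replace(input, @"(?<=\1.*)(<[^>]+>)", ""));
        Console.WriteLine(Regex.Replace(input, @"(?=\1.*\1)(<[^>]+>)", ""));
    }
}
```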
So this got me thinking…
In the .NET regular expression engine, does a backreference need to appear after the group it's referencing? Or is there something else going on with these patterns that's causing them not to work?
Your question got me thinking too, so I ran a few tests with RegexBuddy. To my surprise, the second regex, (?<=\1.*)(<[^>]+>), which you said didn't work, actually worked, and the others behaved exactly as you described. I then tried the same expression (the second one) in C# code, and there it didn't work, just as you observed.
This confused me until I noticed that my RegexBuddy version dates back to 2008, so there must have been some change in how the .NET engine works. It did shed light on something I thought was reasonable: it seems that before 2008, lookbehinds were evaluated after the rest of the expression had matched. That behavior seemed somewhat acceptable for lookbehinds, since you need to match something before you can look behind it.
Nevertheless, the engine these days seems to evaluate lookarounds as it encounters them. I was able to confirm this with the following expression, which is the reverse of your situation:
(?<=(\w))\1
As you can see, I captured a word character inside the lookbehind and referenced it outside the lookbehind. I tested this on the string hello and it matched at the second l character, as expected. This proves that the lookbehind was executed before the engine attempted to match the rest of the expression.
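The same check can be run directly in C# rather than in RegexBuddy. A minimal sketch, with the `Program` scaffolding being my own assumption; it prints whether the match succeeded, its index, and its value:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The lookbehind captures the character just before the current position
        // into group 1; \1 must then match that same character at the current position.
        Match m = Regex.Match("hello", @"(?<=(\w))\1");

        // In "hello" the only doubled character is "l", so the match is the
        // second "l", at zero-based index 3.
        Console.WriteLine($"{m.Success} {m.Index} {m.Value}"); // True 3 l
    }
}
```

Here the group sits inside the lookbehind and the backreference comes after it, so the group has already captured by the time `\1` is tried.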
Conclusion: Yes, a backreference needs to appear after the group it references; otherwise the group has not captured anything when the backreference is evaluated, and it will not match.