Regular expression to capture an optional group in the middle of a block of input

2023-09-03 04:36:00 Author: 幸福圈

I'm stuck on a RegEx problem that's seemingly very simple and yet I can't get it working.

Suppose I have input like this:

Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%
Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%
Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%

There are many repeating blocks in the input, and in each block I want to capture some things that are always there (%interestingbit% and %anotherinterestingbit%). There is also a bit of text that may or may not occur between them (OPTIONAL_THING), and I want to capture it if it's there.

A RegEx like this matches only blocks with OPTIONAL_THING in it (and the named capture works):

%interestingbit%.+?((?<OptionalCapture>OPTIONAL_THING)).+?%anotherinterestingbit%

So it seems like it's just a matter of making the whole group optional, right? That's what I tried:

%interestingbit%.+?((?<OptionalCapture>OPTIONAL_THING))?.+?%anotherinterestingbit%

But I find that although this matches all 3 blocks, the named capture (OptionalCapture) is empty in all of them! How do I get this to work?

Note that there can be a lot of text within each block, including newlines, which is why I put in ".+?" rather than something more specific. I'm using .NET regular expressions, testing with The Regulator.
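For anyone who wants to reproduce this outside The Regulator, here is a minimal sketch in Python rather than .NET (an assumption of this example: Python spells named groups `(?P<name>...)`, and the variable and function names are made up for illustration). It runs both patterns from the question over the sample input. The optional version comes back empty because the first `.+?` is satisfied by a single character, the optional group is tried only at that one position and skipped, and the second `.+?` then consumes OPTIONAL_THING as ordinary text.

```python
import re

# Sample input from the question: three blocks, only the second has OPTIONAL_THING.
TEXT = "\n".join([
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%",
    "Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%",
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%",
])

# First pattern from the question: OPTIONAL_THING is required.
REQUIRED = re.compile(
    r"%interestingbit%.+?((?P<OptionalCapture>OPTIONAL_THING)).+?%anotherinterestingbit%"
)

# Second pattern from the question: the capture group alone is made optional.
ATTEMPT = re.compile(
    r"%interestingbit%.+?((?P<OptionalCapture>OPTIONAL_THING))?.+?%anotherinterestingbit%"
)


def optional_captures(pattern, text):
    """Return the OptionalCapture value (None if the group did not match) per match."""
    return [m.group("OptionalCapture") for m in pattern.finditer(text)]


required_captures = optional_captures(REQUIRED, TEXT)
attempt_captures = optional_captures(ATTEMPT, TEXT)

print(required_captures)  # ['OPTIONAL_THING']  (only the second block matches)
print(attempt_captures)   # [None, None, None]  (three matches, capture never set)
```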

Solution

My thoughts are along similar lines to Niko's idea. However, I would suggest placing the second .+? inside the optional group instead of the first, as follows:

%interestingbit%.+?(?:(?<optionalCapture>OPTIONAL_THING).+?)?%anotherinterestingbit%

This avoids unnecessary backtracking. If the first .+? is inside the optional group and OPTIONAL_THING does not exist in the search string, the regex won't know this until it gets to the end of the string. It will then need to backtrack, perhaps quite a bit, to match %anotherinterestingbit%, which as you said will always exist.
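A rough way to see how far that first `.+?` roams when it sits inside the optional group is sketched below. Assumptions: it is Python rather than .NET (so named groups use `(?P<name>...)`), `re.DOTALL` stands in for `RegexOptions.Singleline` so that `.` can cross the newlines mentioned in the question, and `GROUP_FIRST` is only a reconstruction of the variant described above, not a pattern quoted from the thread. The sketch does not count backtracking steps; it only reports how many lines each match spans.

```python
import re

# Sample input from the question: three blocks, only the second has OPTIONAL_THING.
TEXT = "\n".join([
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%",
    "Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%",
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%",
])

# Assumed reconstruction: the FIRST ".+?" lives inside the optional group.
GROUP_FIRST = re.compile(
    r"%interestingbit%(?:.+?(?P<optionalCapture>OPTIONAL_THING))?.+?%anotherinterestingbit%",
    re.DOTALL,
)

# The suggested pattern: the SECOND ".+?" lives inside the optional group.
GROUP_SECOND = re.compile(
    r"%interestingbit%.+?(?:(?P<optionalCapture>OPTIONAL_THING).+?)?%anotherinterestingbit%",
    re.DOTALL,
)


def line_spans(pattern, text):
    """Return how many lines each match covers (1 means it stayed inside one block)."""
    return [m.group(0).count("\n") + 1 for m in pattern.finditer(text)]


print(line_spans(GROUP_FIRST, TEXT))   # [2, 1]
print(line_spans(GROUP_SECOND, TEXT))  # [1, 1, 1]
```

On this input the group-first variant scans past the end of block 1 until it finds OPTIONAL_THING in block 2, so its first match swallows two blocks. The other variant stops at the first %anotherinterestingbit% it reaches and yields one match per block.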

Also, since OPTIONAL_THING, when it exists, always comes before %anotherinterestingbit%, the text after it is effectively optional as well and fits more naturally into the optional group.
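To check the suggested pattern against the sample input, here is a small Python sketch. It is an illustration under assumptions, not the .NET code itself: the named group is written `(?P<optionalCapture>...)` as Python requires, `re.DOTALL` plays the role of `RegexOptions.Singleline`, and the variable names are hypothetical.

```python
import re

# Sample input from the question: three blocks, only the second has OPTIONAL_THING.
TEXT = "\n".join([
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%",
    "Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%",
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%",
])

# Suggested pattern: OPTIONAL_THING and the text after it form one optional group.
SOLUTION = re.compile(
    r"%interestingbit%.+?(?:(?P<optionalCapture>OPTIONAL_THING).+?)?%anotherinterestingbit%",
    re.DOTALL,  # assumption: lets "." match newlines inside a block
)

# One entry per matched block; None means the optional group did not participate.
solution_captures = [m.group("optionalCapture") for m in SOLUTION.finditer(TEXT)]

print(solution_captures)  # [None, 'OPTIONAL_THING', None]
```

All three blocks match and only the second one fills the capture. In .NET the equivalent check would presumably go through `match.Groups["optionalCapture"].Success`, since a group that did not participate reports an empty value rather than null.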