Regular expressions - capturing a repeated group

2023-09-03 04:31:07 Author: 简单也极端

Alright, I've read the tutorials and scrambled my head too much to be able to see clearly now.

I'm trying to capture parameters and their type info from a function signature. So given a signature like this:

function(/*string*/a,b,c)

I want to get the parts like this:

type: string
param:a
param:b
param:c

This would be even better:

type: string
param:a
type: null (or whitespace)
param:b
type: null (or whitespace)
param:c

So I came up with this regex, which makes the common mistake of repeating the capture (I have explicit capture turned on):

function\(((\/\*(?<type>[a-zA-Z]+)\*\/)?(?<param>[0-9a-zA-Z_$]+),?)*\)

Problem is, I can't correct the mistake. :(. Please help!

Recommended answer

Generally, you'd need two steps to get all data. First, match/validate the whole function:

function\((?<parameters>((\/\*[a-zA-Z]+\*\/)?[0-9a-zA-Z_$]+,?)*)\)

Note that now you have a parameters group holding the whole parameter list. You can run part of the pattern over that group again to get one match per parameter, or in this case, simply split on the comma.
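The two steps above can be sketched roughly as follows. This is only an illustration: the class TwoStepDemo, the helper Parse and the per-parameter pattern in step 2 are assumptions made for the example, not part of the original answer. Step 1 uses the validation pattern exactly as given above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TwoStepDemo
{
    // Hypothetical helper: returns (type, param) pairs; the type is null when absent.
    public static List<KeyValuePair<string, string>> Parse(string signature)
    {
        var result = new List<KeyValuePair<string, string>>();

        // Step 1: validate the whole signature and grab the raw parameter list.
        Match whole = Regex.Match(signature,
            @"function\((?<parameters>((\/\*[a-zA-Z]+\*\/)?[0-9a-zA-Z_$]+,?)*)\)");
        if (!whole.Success)
            return result;

        // Step 2: split on commas and match each piece on its own.
        string[] pieces = whole.Groups["parameters"].Value
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string piece in pieces)
        {
            // Assumed per-parameter pattern: optional /*type*/ followed by the name.
            Match p = Regex.Match(piece,
                @"^(?:/\*(?<type>[a-zA-Z]+)\*/)?(?<param>[0-9a-zA-Z_$]+)$");
            Group type = p.Groups["type"];
            result.Add(new KeyValuePair<string, string>(
                type.Success ? type.Value : null, p.Groups["param"].Value));
        }
        return result;
    }

    static void Main()
    {
        foreach (var pair in Parse("function(/*string*/a,b,c)"))
            Console.WriteLine("type: {0}, param: {1}", pair.Key ?? "null", pair.Value);
    }
}
```

For the signature from the question this prints "type: string, param: a", then "type: null, param: b" and "type: null, param: c". Nothing here relies on capture collections, so the same two-step idea carries over to regex engines that only keep the last capture of a group.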

If you're using .NET, by any chance, you're in luck. .NET keeps a full record of every capture of each group, so you can use the collection:

match.Groups["param"].Captures
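As a small sketch of what that collection holds, the program below runs the regex from the question, unchanged, against the question's sample signature. The class CapturesDemo and the helper Run are names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class CapturesDemo
{
    // The pattern from the question, unchanged.
    public const string Pattern =
        @"function\(((\/\*(?<type>[a-zA-Z]+)\*\/)?(?<param>[0-9a-zA-Z_$]+),?)*\)";

    public static Match Run(string s)
    {
        return Regex.Match(s, Pattern, RegexOptions.ExplicitCapture);
    }

    static void Main()
    {
        Match match = Run("function(/*string*/a,b,c)");

        // Group.Value only reports the last capture of a repeated group.
        Console.WriteLine(match.Groups["param"].Value);           // c

        // Captures keeps every capture, in the order it was made.
        foreach (Capture c in match.Groups["param"].Captures)    // a, b, c
            Console.WriteLine(c.Value);

        // The type group only participated once, so it has a single capture.
        Console.WriteLine(match.Groups["type"].Captures.Count);  // 1
    }
}
```

The last line shows the remaining problem: three parameters but only one type capture, so the two collections cannot be paired up by index.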

Some notes:

If you do want to capture more than one type, you definitely want empty matches, so that each type capture lines up with its parameter (you could sort the captures by position instead, but a 1-to-1 capture is neater). In that case, you want the optional group inside your captured group: (?<type>(\/\*[a-zA-Z]+\*\/)?)

You don't have to escape slashes in .NET patterns - / has no special meaning there (C#/.NET doesn't have regex delimiters).
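A minimal sketch of the first note, assuming the question's regex with the optional comment moved inside the type group. The names EmptyTypeDemo and Run are made up for the example. Note that in this form the capture includes the comment delimiters, which the sketch trims off afterwards.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmptyTypeDemo
{
    public static Match Run(string s)
    {
        // The optional part sits inside (?<type>...), so the group captures
        // an empty string whenever a parameter has no /*type*/ comment.
        string pattern =
            @"function\(((?<type>(/\*[a-zA-Z]+\*/)?)(?<param>[0-9a-zA-Z_$]+),?)*\)";
        return Regex.Match(s, pattern, RegexOptions.ExplicitCapture);
    }

    static void Main()
    {
        Match match = Run("function(/*string*/a,b,c)");
        CaptureCollection types = match.Groups["type"].Captures;
        CaptureCollection parameters = match.Groups["param"].Captures;

        // One type capture per parameter, so the counts agree.
        Console.WriteLine(types.Count == parameters.Count);   // True

        for (int i = 0; i < parameters.Count; i++)
        {
            // The capture is "/*string*/" or "", so strip the delimiters.
            string type = types[i].Value.Trim('/', '*');
            Console.WriteLine("[{0}] {1}", type, parameters[i].Value);
        }
    }
}
```

For the sample signature this prints True, then "[string] a", "[] b" and "[] c".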

Here's an example of using the captures. Again, the main point is maintaining the relation between type and param: you want to capture empty types, so you don't lose count. Pattern:

function
\(
(?:
    (?:
        /\*(?<type>[a-zA-Z]+)\*/    # type within /* */
        |                           # or
        (?<type>)                   # capture an empty type.
    )
    (?<param>
        [0-9a-zA-Z_$]+
    )
    (?:,|(?=\s*\)))     # mandatory comma, unless before the last ')'
)*
\)

Code:

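// s is the input string; pattern is the free-spacing regex shown above.
Match match = Regex.Match(s, pattern, RegexOptions.IgnorePatternWhitespace);

// Both groups capture exactly once per parameter, so index i refers to
// the same parameter in both collections.
CaptureCollection types = match.Groups["type"].Captures;
CaptureCollection parameters = match.Groups["param"].Captures;
for (int i = 0; i < parameters.Count; i++)
{
    string parameter = parameters[i].Value;
    string type = types[i].Value;
    // An empty capture means the parameter had no /*type*/ comment.
    if (String.IsNullOrEmpty(type))
        type = "NO TYPE";
    Console.WriteLine("Parameter: {0}, Type: {1}", parameter, type);
}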