C# extension method: a String.Split variant that also accepts an escape character

2023-09-03 00:17:09 Author: Wry smile .苦笑

I'd like to write an extension method for the .NET String class. I'd like it to be a special variation on the Split method: one that takes an escape character and does not split the string where the escape character appears immediately before the separator.

What's the best way to write this? I'm curious about the best non-regex way to approach it. Something with a signature like:

public static string[] Split(this string input, string separator, char escapeCharacter)
{
   // ...
}

UPDATE: Because it came up in one of the comments, the escaping...

In C#, escaping a non-special character in a string literal gives the compiler error CS1009: Unrecognized escape sequence.
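As a small illustration of the CS1009 point, here is a sketch of how a backslash escape character has to be written in C# source before it ever reaches the split method. The class name `EscapeLiteralDemo` and the sample string are made up for this example.

```csharp
using System;

public static class EscapeLiteralDemo
{
    // string bad = "a\,b";  // does not compile: error CS1009 (unrecognized escape sequence)

    // In a regular literal the backslash itself must be escaped.
    public const string Regular = "a\\,b";

    // In a verbatim literal the backslash is taken as-is.
    public const string Verbatim = @"a\,b";

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim); // True
        Console.WriteLine(Regular.Length);      // 4
    }
}
```

Both literals produce the same four-character string, so the split method only ever sees a single backslash before the comma.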

In IE JScript the escape characters are thrown out, unless you try \u, in which case you get an "Expected hexadecimal digit" error. I tested Firefox and it has the same behavior.

I'd like this method to be pretty forgiving and follow the JavaScript model. If you escape a non-separator, it should just "kindly" remove the escape character.

Recommended answer

How about:

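// Needs "using System;" and "using System.Collections.Generic;",
// and must be declared inside a static class.
public static IEnumerable<string> Split(this string input,
                                        string separator,
                                        char escapeCharacter)
{
    int startOfSegment = 0;
    int index = 0;
    while (index < input.Length)
    {
        // Ordinal search avoids culture-sensitive matching of the separator.
        index = input.IndexOf(separator, index, StringComparison.Ordinal);
        if (index == -1)
        {
            // No more separators: the rest of the input is the last segment.
            break;
        }
        if (index > 0 && input[index - 1] == escapeCharacter)
        {
            // Escaped separator: skip past it so it stays in the current segment.
            index += separator.Length;
            continue;
        }
        yield return input.Substring(startOfSegment, index - startOfSegment);
        index += separator.Length;
        startOfSegment = index;
    }
    // Final segment (the escape characters are left in place).
    yield return input.Substring(startOfSegment);
}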

That seems to work (with a few quick test strings), but it doesn't remove the escape character. Whether that matters will depend on your exact situation, I suspect.
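If the escape characters do need to be removed, as the question asks, one possible approach is to walk the input character by character instead of using IndexOf. The following is only a sketch under some assumptions: the names `EscapedSplitExtensions` and `SplitEscaped` are hypothetical, an escape character before a non-separator is silently dropped (the "forgiving" behavior described in the question), a doubled escape character yields one literal escape character, and a trailing escape character at the very end of the input is kept as-is.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EscapedSplitExtensions
{
    // Hypothetical name, chosen to avoid confusion with the built-in Split overloads.
    public static IEnumerable<string> SplitEscaped(this string input,
                                                   string separator,
                                                   char escapeCharacter)
    {
        // Validate eagerly; the iterator below only runs when enumerated.
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must be non-empty.", nameof(separator));
        return SplitIterator(input, separator, escapeCharacter);
    }

    private static IEnumerable<string> SplitIterator(string input,
                                                     string separator,
                                                     char escapeCharacter)
    {
        var segment = new StringBuilder();
        int i = 0;
        while (i < input.Length)
        {
            if (input[i] == escapeCharacter && i + 1 < input.Length)
            {
                i++; // drop the escape character itself
                if (string.CompareOrdinal(input, i, separator, 0, separator.Length) == 0)
                {
                    // Escaped separator: keep it as literal text.
                    segment.Append(separator);
                    i += separator.Length;
                }
                else
                {
                    // Escaped non-separator: keep just the character.
                    segment.Append(input[i]);
                    i++;
                }
            }
            else if (string.CompareOrdinal(input, i, separator, 0, separator.Length) == 0)
            {
                // Unescaped separator: emit the current segment and start a new one.
                yield return segment.ToString();
                segment.Clear();
                i += separator.Length;
            }
            else
            {
                segment.Append(input[i]);
                i++;
            }
        }
        yield return segment.ToString();
    }
}
```

With this sketch, `@"a\,b,c".SplitEscaped(",", '\\')` yields `a,b` and `c`, and `@"x\yz".SplitEscaped(",", '\\')` yields the single segment `xyz`. It builds each segment in a StringBuilder, so it allocates more than the IndexOf version above; that trade-off may or may not matter for your inputs.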