How do I parse XML with invalid characters in node names?

2023-09-05 02:44:52 Author: 贱然一笑

So I'm trying to parse some XML, the creation of which is not under my control. The trouble is, they've somehow got nodes that look like this:

<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(MORNINGSTAR) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(QUARTERSTAFF) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(SCYTHE) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(TRATNYR) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(TRIPLE-HEADED_FLAIL) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(WARAXE) />

Visual Studio and .NET both feel that the '(' and ')' characters, as used above, are totally invalid. Unfortunately, I need to process these files! Is there any way to get the XmlReader classes to not freak out at seeing these characters, or to dynamically escape them or something? I could do some sort of pre-processing on the whole file, but I DO want the '(' and ')' characters if they appear inside the node in some valid way, so I don't want to just remove them all...

Recommended answer

That simply isn't valid. Pre-processing is your best bet, perhaps with a regex, something like:

// Requires: using System.Text.RegularExpressions;
string output = Regex.Replace(input, @"(<\w+)\((\w+)\)([ >/])", "$1$2$3");

Edit: a bit more complex, to also replace the "-" inside the brackets:

string output = Regex.Replace(input, @"(<\w+)\(([-\w]+)\)([ >/])",
    delegate(Match match) {
        return match.Groups[1].Value + match.Groups[2].Value.Replace('-', '_')
             + match.Groups[3].Value;
    });
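
For completeness, here is a minimal sketch of how the pre-processing step might be wired up before handing the text to a parser. It is not part of the original answer: the class name `XmlNameFixer`, the method `Fix`, and the sample input are made up for illustration, and the pattern adds an optional `/` after `<` on the assumption that non-self-closing elements (with matching `</...>` end tags) might also show up. Because the pattern only matches right after a `<`, parentheses in ordinary text content are left alone, though parentheses inside attribute values or CDATA that happen to follow something tag-like could still be affected.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class XmlNameFixer
{
    // Matches "<NAME(SUFFIX)" or "</NAME(SUFFIX)" followed by a space, '>' or '/'.
    // The optional '/' is an assumption added here to cover end tags.
    private static readonly Regex BadName =
        new Regex(@"(</?\w+)\(([-\w]+)\)([ >/])");

    // Drops the parentheses from the element name and turns '-' into '_'
    // inside the bracketed part, as in the regex from the answer.
    public static string Fix(string input)
    {
        return BadName.Replace(input, m =>
            m.Groups[1].Value
            + m.Groups[2].Value.Replace('-', '_')
            + m.Groups[3].Value);
    }

    public static void Main()
    {
        // Hypothetical sample input; real files will look different.
        string raw =
            "<root><ID_FEAT_(TRIPLE-HEADED_FLAIL) />" +
            "<note>keep (this) text</note></root>";

        string cleaned = Fix(raw);

        // Parsing now succeeds because the element names are valid.
        XDocument doc = XDocument.Parse(cleaned);
        foreach (XElement e in doc.Root.Elements())
        {
            Console.WriteLine(e.Name.LocalName);
        }
    }
}
```

Note that the cleaned names no longer match the names in the source file, so anything that looks elements up by name has to use the rewritten form (for example `ID_FEAT_TRIPLE_HEADED_FLAIL`).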