XML parsing curiosity - "greater than" in attributes (tags: curiosity, attributes, XML, quot)

2023-09-04 01:17:26 Author: 傲似你祖宗

I have some xml that looks like this:

<rootElement attribute=' > '/>

This is accepted as well-formed XML by the parsers I've tried it on, and the relevant part of the XML specification also suggests it is valid, although I personally wasn't convinced until I checked. (Interestingly, it would not be well-formed with an opening angle bracket, but it is with a closing one.)
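
The asymmetry comes from the XML 1.0 grammar for attribute values, which excludes an unescaped `<` or `&` but places no restriction on `>`. A minimal sketch to check this against System.Xml.Linq follows; `IsWellFormed` is a hypothetical helper named for illustration only, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: true when XElement.Parse accepts the text as well-formed XML.
static bool IsWellFormed(string xml)
{
    try
    {
        XElement.Parse(xml);
        return true;
    }
    catch (XmlException)
    {
        // The parser rejects markup that violates the well-formedness rules.
        return false;
    }
}

Console.WriteLine(IsWellFormed("<rootElement attribute=' > '/>")); // True
Console.WriteLine(IsWellFormed("<rootElement attribute=' < '/>")); // False
```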

I have some code that is used to "pretty print" xml - it should only change line-lengths and new lines - it shouldn't change any content. However, no matter how I try to parse this xml, it always ends up being entity replaced:

<rootElement attribute=' &gt; '/>

This isn't entirely unexpected, and any xml parser should treat the two as identical, but for my purposes I don't want this behaviour as this is code meant to change the formatting of an xml file only, not its contents.

It doesn't matter if I load my xml into an XmlDocument:

var xml = "<rootElement attribute=' > '/>";
var doc = new XmlDocument();
doc.LoadXml(xml);
Console.WriteLine(doc.OuterXml);

Or an XElement:

var xElement = XElement.Parse(xml);
xElement.Save(Console.Out);

Or pass it through a reader/writer pair:

using (var ms = new MemoryStream())
using (var streamWriter = new StreamWriter(ms))
{
    streamWriter.Write(xml);
    streamWriter.Flush();
    ms.Position = 0;

    using (var xmlReader = XmlReader.Create(ms))
    {
        xmlReader.Read();
        Console.WriteLine(xmlReader.ReadOuterXml());
    }
}

They all replace the literal > with the entity &gt;, even though the former is acceptable in well-formed XML. I've tried playing with the various XmlReaderSettings, XElement's LoadOptions, etc., but all to no avail.

Does anyone know of any way to prevent this?

This is more of a curiosity than an actual issue, but I am interested to see if anyone has any solutions.

[Edit, to clarify, in light of some of the comments/answers]

I really do realise that this behaviour is expected. In my case, maybe I don't want to use one of the built in xml APIs at all (although whatever I use needs to understand the structure of xml so as not to line break in inappropriate places where it changes the semantic meaning of the document.)
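
As a rough illustration of what "understanding the structure" means when working on the raw text, a formatter that bypasses the built-in APIs would at least have to recognise quoted attribute values, so that a `>` inside one is not mistaken for the end of the tag. The sketch below is only an assumption about how such a scanner might start: `FindTagEnd` is a hypothetical name, it handles only start and empty-element tags (not comments, CDATA sections, processing instructions or DOCTYPE declarations), and the top-level statements assume C# 9 or later.

```csharp
using System;

// Hypothetical helper: returns the index of the '>' that closes the tag whose '<'
// is at 'start', ignoring any '>' inside a quoted attribute value.
// Returns -1 if the tag is never closed.
static int FindTagEnd(string text, int start)
{
    char quote = '\0'; // '\0' means "not inside a quoted value"
    for (int i = start + 1; i < text.Length; i++)
    {
        char c = text[i];
        if (quote != '\0')
        {
            // Inside a quoted value: only the matching quote character ends it.
            if (c == quote) quote = '\0';
        }
        else if (c == '"' || c == '\'')
        {
            quote = c;
        }
        else if (c == '>')
        {
            return i;
        }
    }
    return -1;
}

var xml = "<rootElement attribute=' > '/>";
int end = FindTagEnd(xml, 0);

// The tag is copied from the source text as-is, so the literal '>' is preserved.
Console.WriteLine(xml.Substring(0, end + 1));
```

Because the text is copied rather than parsed and re-serialised, nothing is entity-replaced; the cost is that every other construct in the XML grammar has to be handled by hand.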

I'm really just interested to know if anyone knows of a way to change the behaviour in these parsers (I expect you can't but figured if anyone knew, they'd probably be on SO), or if anyone has any other ideas.

Recommended answer

My guess is that you'll find there isn't a way to change this - as I strongly suspect that the internal representation after loading will be the same whether it's originally > or &gt;.
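
One way to check that guess, at least for XElement, is to parse both spellings and compare the results. This is a small sketch rather than proof of how every parser behaves internally; `SameTree` is a hypothetical helper name and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: true when both texts parse to structurally equal trees.
static bool SameTree(string first, string second) =>
    XNode.DeepEquals(XElement.Parse(first), XElement.Parse(second));

var literal = XElement.Parse("<rootElement attribute=' > '/>");

// After parsing, the attribute holds the decoded characters: space, '>', space.
Console.WriteLine((string)literal.Attribute("attribute") == " > "); // True

// The escaped spelling yields an equal tree, so the original spelling is not retained.
Console.WriteLine(SameTree("<rootElement attribute=' > '/>",
                           "<rootElement attribute=' &gt; '/>")); // True
```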