How to change the character encoding of XmlReader

2023-09-03 03:12:09 Author: 擦不掉的是瘋狂

I have a simple XmlReader:

XmlReader r = XmlReader.Create(fileName);

while (r.Read())
{
    Console.WriteLine(r.Value);
}

The problem is that the XML file contains ISO-8859-9 characters, which makes XmlReader throw an "Invalid character in the given encoding." exception. I can solve this by adding an <?xml version="1.0" encoding="ISO-8859-9" ?> line at the beginning, but I'd like to solve it another way in case I can't modify the source file. How can I change the encoding of XmlReader?

Recommended answer

To force .NET to read the file as ISO-8859-9, use one of the many XmlReader.Create overloads, for example the one that accepts a TextReader, and pass it a StreamReader opened with the encoding you want:

using(XmlReader r = XmlReader.Create(new StreamReader(fileName, Encoding.GetEncoding("ISO-8859-9")))) {
    while(r.Read()) {
        Console.WriteLine(r.Value);
    }
}

However, that may not work. IIRC, the W3C XML standard says that once the XML declaration has been read, a compliant parser should immediately switch to the encoding named in that declaration, regardless of the encoding it was using before. In your case the XML file has no XML declaration, so the parser may still assume the default, UTF-8, and fail in the same way. I may be talking nonsense here, so try it and see. :-)
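
One way to "try it and see" is a small self-contained check, sketched below. It writes a declaration-less XML file encoded as ISO-8859-9 to a temporary path, then reads it back through a StreamReader with the explicit encoding, as in the answer's snippet. The class name Iso88599Demo and the method ReadTextNodes are hypothetical names used only for this illustration. The sketch assumes .NET Core 3.0 or .NET 5+, where ISO-8859-9 is, as far as I know, only available after registering CodePagesEncodingProvider; on .NET Framework the encoding is built in and that line may need to be removed or backed by the System.Text.Encoding.CodePages package.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper for illustration; not part of the original answer.
public static class Iso88599Demo
{
    // Concatenates the text nodes of an XML file, decoding the bytes
    // with the named encoding instead of letting XmlReader guess.
    public static string ReadTextNodes(string path, string encodingName)
    {
        // Assumption: running on .NET Core / .NET 5+, where code-page
        // encodings such as ISO-8859-9 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding encoding = Encoding.GetEncoding(encodingName);

        var text = new StringBuilder();
        using (var streamReader = new StreamReader(path, encoding))
        using (XmlReader r = XmlReader.Create(streamReader))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Text)
                {
                    text.Append(r.Value);
                }
            }
        }
        return text.ToString();
    }

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        string path = Path.GetTempFileName();
        try
        {
            // No XML declaration; the content is the Turkish letters
            // g-breve, dotless i, s-cedilla, written as escapes.
            string xml = "<root>\u011F\u0131\u015F</root>";
            File.WriteAllBytes(path, Encoding.GetEncoding("ISO-8859-9").GetBytes(xml));

            string decoded = ReadTextNodes(path, "ISO-8859-9");
            Console.WriteLine(decoded == "\u011F\u0131\u015F"
                ? "Round trip succeeded"
                : "Round trip failed");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If the round trip succeeds on your runtime, the StreamReader has already turned the bytes into characters before XmlReader sees them, so the missing declaration should not matter for this file.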