I'm writing an RSS reader app for Android, and I need to know the encoding of the XML (windows-1251 or UTF-8) before I start parsing it. The encoding is given in the XML declaration header, i.e. <?xml version="1.0" encoding="UTF-8"?>. How can I get this header before parsing? I use the android.sax implementation of the SAX parser and pass the encoding as a string parameter to InputStreamReader.
I found a related question:
SAX Parser doesn't recognize windows-1255 encoding, but the solution there is to convert cp-1251 to UTF-8, which is too cumbersome and resource-intensive. I think there must be a better solution, since I only need the encoding value from the header <?xml version="1.0" encoding="UTF-8"?>. But I can't manage to get this header from the XML: the parser starts from the <rss> tag. How can I get it?
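One way to read the header before any parser touches the stream is to buffer it, mark it, peek at the first bytes, and reset. The sketch below is a slightly more general variant of that peek-and-reset idea (an illustration, not the accepted code): instead of looking for fixed strings, it extracts the value of the encoding attribute with a regular expression. XmlEncodingSniffer and sniffEncoding are hypothetical names; it assumes an ASCII-compatible encoding such as UTF-8 or windows-1251 (UTF-16 feeds would need BOM handling) and falls back to UTF-8, the XML default, when no encoding is declared.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the encoding named in <?xml ... encoding="..."?> without parsing.
public class XmlEncodingSniffer {
    // Matches encoding="..." or encoding='...' inside the XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    // Peeks at up to 'limit' bytes, then resets the stream so the parser still
    // sees the whole document. Assumes an ASCII-compatible encoding (UTF-8, windows-1251, ...).
    public static String sniffEncoding(BufferedInputStream in, int limit) throws IOException {
        byte[] head = new byte[limit];
        in.mark(limit);
        int total = 0;
        // read() may return fewer bytes than asked for (e.g. from a network stream)
        while (total < limit) {
            int n = in.read(head, total, limit - total);
            if (n < 0) break; // end of stream
            total += n;
        }
        in.reset();
        // ISO-8859-1 maps every byte to one char, so decoding the ASCII declaration cannot fail
        String text = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(text);
        // No declared encoding: fall back to UTF-8, the XML default
        return m.find() ? m.group(1) : "UTF-8";
    }
}
```

A call such as sniffEncoding(new BufferedInputStream(rawStream), 100) would go right before creating the InputStreamReader; the 100-byte limit is an assumption that comfortably covers a typical declaration.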
Well, the question was pretty obvious :) Here is the code that worked, based on Squonk's comment:
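// Requires java.io.BufferedInputStream, java.io.IOException and android.util.Log.
// Peek at the start of the stream (bs is a BufferedInputStream, which supports mark/reset)
// to find the encoding named in the XML declaration.
byte[] data = new byte[50];
try {
    bs.mark(60);
    // read() may return fewer bytes than requested, so only decode what was actually read
    int count = bs.read(data, 0, data.length);
    if (count <= 0)
        return "UTF-8"; // empty stream: fall back to the XML default
    // The declaration is plain ASCII, so decoding it as UTF-8 works for both charsets
    String value = new String(data, 0, count, "UTF-8").toLowerCase();
    if (value.contains("utf-8"))
        return "UTF-8";
    else if (value.contains("1251"))
        return "windows-1251";
    return "UTF-8"; // no known encoding declared: UTF-8 is the XML default
} catch (IOException e) {
    Log.d("debug", "Exception: " + e);
    return "XML not found";
}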
Then just reset bs (the BufferedInputStream) and read it with whichever charset was detected.
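Putting the pieces together, here is a minimal end-to-end sketch, assuming the feed is in one of the two charsets from the question: wrap the raw stream in a BufferedInputStream, detect the charset, reset, and hand the parser an InputStreamReader in that charset. To stay runnable outside Android it uses the standard javax.xml.parsers SAX API instead of android.sax. FeedTitles, detectCharset and readTitles are hypothetical names, and the handler only collects <title> text for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of any library.
public class FeedTitles {
    // Same idea as the answer: peek at the first bytes, then reset before parsing.
    static String detectCharset(BufferedInputStream bs) throws IOException {
        byte[] data = new byte[50];
        bs.mark(60);
        int count = bs.read(data, 0, data.length);
        bs.reset(); // rewind so the parser also sees the XML declaration
        if (count <= 0) return "UTF-8";
        String head = new String(data, 0, count, StandardCharsets.ISO_8859_1).toLowerCase();
        return head.contains("1251") ? "windows-1251" : "UTF-8";
    }

    // Decodes the feed with the detected charset and collects the text of every <title>.
    public static List<String> readTitles(InputStream raw) throws Exception {
        BufferedInputStream bs = new BufferedInputStream(raw);
        Reader reader = new InputStreamReader(bs, detectCharset(bs));
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("title")) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("title") && text != null) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        return titles;
    }
}
```

Because the parser receives characters from a Reader, it does not decode the bytes itself: per the SAX InputSource contract, a character stream is read directly and its encoding declaration is disregarded, so no conversion to UTF-8 is needed.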