I am using this code to fetch the whole HTML of a website, but the non-ASCII characters do not come through: all I get is diamonds with question marks. A character like å appears like this: �. I doubt it is because of the charset, so what else could it be?
Log.e("HTML", "henter htmlen..");
String url = "http://beep.tv2.dk";
HttpClient client = new DefaultHttpClient();
client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
        HttpVersion.HTTP_1_1);
// HTTP_ELEMENT_CHARSET covers protocol elements such as headers, not the response body.
client.getParams().setParameter(CoreProtocolPNames.HTTP_ELEMENT_CHARSET, "UTF-8");
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
// Header h = HeaderValueFormatter
// response.addHeader(header)
String html = "";
InputStream in = response.getEntity().getContent();
// No charset is passed here, so the body is decoded with the platform default.
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    str.append(line);
}
in.close();
//b = false;
html = str.toString();
Solution: use the
new InputStreamReader(in, "UTF-8")
constructor, so the response body is decoded as UTF-8 instead of the platform default charset.
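To illustrate why the charset argument matters, here is a minimal sketch that decodes the same bytes twice. It assumes the page is served as UTF-8. The class name CharsetDecodeDemo, the helper readAll, and the in-memory byte array standing in for response.getEntity().getContent() are all made up for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class CharsetDecodeDemo {

    // Reads the whole stream as text, decoding the bytes with the named charset.
    static String readAll(InputStream in, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charsetName));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n'); // keep the line breaks
            }
        } finally {
            reader.close(); // also closes the underlying stream
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the HTTP response body: the letter a-ring encoded as UTF-8.
        byte[] body = "<p>\u00e5</p>".getBytes("UTF-8");

        String decodedAsUtf8 = readAll(new ByteArrayInputStream(body), "UTF-8");
        String decodedAsAscii = readAll(new ByteArrayInputStream(body), "US-ASCII");

        // With the right charset the character survives.
        System.out.println(decodedAsUtf8.contains("\u00e5"));
        // With the wrong charset the bytes turn into the replacement character.
        System.out.println(decodedAsAscii.contains("\uFFFD"));
    }
}
```

If the server actually sends a different encoding (ISO-8859-1, for instance), the same mismatch happens in the other direction, so the charset name should follow what the server declares.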
Set the Accept-Charset request header to, say, Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
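As a rough sketch of setting that header, the example below uses the JDK's HttpURLConnection rather than the Apache HttpClient from the question, so that it runs without extra libraries. With the question's HttpGet, the equivalent call would be request.setHeader("Accept-Charset", ...). The class name AcceptCharsetDemo, the helper prepare, and the header value passed in main are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class AcceptCharsetDemo {

    // Builds (but does not send) a GET request that advertises the charsets the client accepts.
    static HttpURLConnection prepare(String url, String acceptCharset) throws IOException {
        // openConnection() only creates the connection object; nothing is sent yet.
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept-Charset", acceptCharset);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Assumed preference: UTF-8 first, ISO-8859-1 as a lower-priority fallback.
        HttpURLConnection conn = prepare("http://beep.tv2.dk", "utf-8, iso-8859-1;q=0.8");
        System.out.println(conn.getRequestProperty("Accept-Charset"));
    }
}
```

A server is free to ignore Accept-Charset, so the response should still be decoded according to what the server reports rather than what was requested.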
Make sure the page opens properly in a browser. If it does not, then it might be a server-side issue.
If none of the above works, check the other response headers using Firebug (or a similar tool).
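One header worth checking is Content-Type, since it normally carries the charset the server claims to use. The sketch below pulls that charset out of a header value so it can be passed to InputStreamReader. The class name ContentTypeCharset, the helper charsetOf, and the fallback argument are hypothetical names for this example, and the parsing is deliberately simple rather than a full header parser.

```java
import java.util.Locale;

class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type header value,
    // or the given fallback when the header is missing or has no charset.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", "UTF-8"));
        System.out.println(charsetOf("text/html", "UTF-8"));
    }
}
```

If the header has no charset, the page may still declare one in a meta tag inside the HTML, which this sketch does not look at.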