Displaying non-ASCII characters using HttpClient

2023-09-08 09:10:32 Author: 风流爱人

I am using this code to get the whole HTML of a website, but non-ASCII characters do not come through. All I get is diamonds with a question mark: a character like å appears as �. I doubt it's because of the charset, so what else could it be?

    Log.e("HTML", "henter htmlen..");
    String url = "http://beep.tv2.dk";
    HttpClient client = new DefaultHttpClient();
    client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
            HttpVersion.HTTP_1_1);
    // Affects how protocol elements (such as headers) are encoded, not the response body.
    client.getParams().setParameter(CoreProtocolPNames.HTTP_ELEMENT_CHARSET, "UTF-8");
    HttpGet request = new HttpGet(url);
    HttpResponse response = client.execute(request);
    // Header h = HeaderValueFormatter
    // response.addHeader(header)
    String html = "";
    InputStream in = response.getEntity().getContent();
    // No charset is passed here, so the platform default is used to decode the body.
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    StringBuilder str = new StringBuilder();
    String line = null;
    while ((line = reader.readLine()) != null) {
        str.append(line);
    }
    in.close();
    // b = false;
    html = str.toString();

Solution

- Use the new InputStreamReader(in, "UTF-8") constructor.
- Set the Accept-Charset request header to, say, Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
- Make sure the page opens properly in a browser. If it does not, it might be a server-side issue.
- If none of the above works, check the other headers using Firebug (or a similar tool).
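
To illustrate the first suggestion, here is a minimal, self-contained sketch of why the charset handed to InputStreamReader matters. It does not call HttpClient: a ByteArrayInputStream stands in for response.getEntity().getContent(), and the class name CharsetDemo, the helper readAll, and the sample word are all made up for the illustration. Which encoding the site in the question actually sends is not stated in the thread, so the sketch only shows the general rule that the decoder's charset must match the bytes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Hypothetical helper: reads the whole stream as text, decoding with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n'); // readLine() strips the line terminator
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "bl\u00e5b\u00e6r"; // sample word containing å and æ

        // Stand-ins for a response body, encoded two different ways.
        byte[] utf8Body = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1Body = text.getBytes(StandardCharsets.ISO_8859_1);

        // Matching charset: the text round-trips intact.
        String ok = readAll(new ByteArrayInputStream(utf8Body), StandardCharsets.UTF_8);
        System.out.println(ok.contains("\u00e5")); // true

        // Mismatched charset: ISO-8859-1 bytes decoded as UTF-8 become U+FFFD,
        // the "diamond with a question mark" replacement character.
        String broken = readAll(new ByteArrayInputStream(latin1Body), StandardCharsets.UTF_8);
        System.out.println(broken.contains("\uFFFD")); // true

        // Decoding the same bytes with the charset they were written in fixes it.
        String fixed = readAll(new ByteArrayInputStream(latin1Body), StandardCharsets.ISO_8859_1);
        System.out.println(fixed.contains("\u00e5")); // true
    }
}
```

In the question's code, the equivalent change would be to pass the charset as the second argument of the InputStreamReader constructor. If "UTF-8" still produces �, the server may be sending a different encoding, which is where checking the response headers comes in.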