What is the best way to scrape web pages from an Android application?

2023-09-07 10:42:10 Author: 你知道我长短

I am working on an Android application that needs to fetch an HTML web page, parse it, extract the data it needs, and use that data in the app. I tried Web-Harvest, but it does not seem to be fully compatible with Android. What is the standard, recommended way to scrape HTML pages on Android?

Recommended answer

I've been happy with using TagSoup and XOM to parse webpages on Android. With both in your classpath, you'd do something like:

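import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Nodes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// TagSoup repairs real-world HTML into well-formed SAX events; XOM builds a tree from them.
XMLReader tagsoup = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Builder bob = new Builder(tagsoup);
// build(url) fetches the page over the network, so on Android run it off the main thread.
Document html = bob.build("http://www.yahoo.com");
Nodes images = html.query("//img");

for (int index = 0; index < images.size(); index++) {
    Element image = (Element) images.get(index);
    // getAttributeValue returns null when the attribute is absent, avoiding a NullPointerException.
    String src = image.getAttributeValue("src");
    // do something with it...
}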

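If the elements of the HTML you're scraping are in a namespace (TagSoup commonly reports them in the XHTML namespace), bind a prefix to that namespace and use it in the query instead: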

XPathContext context = new XPathContext("html", "http://www.w3.org/1999/xhtml");
Nodes images = html.query("//html:img", context);
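
To see why the prefix matters, here is a minimal sketch that uses only the JDK's javax.xml.xpath API rather than XOM, so it runs without extra jars. The class name NamespaceQueryDemo, the count helper, and the sample markup are made up for illustration. The same XPath 1.0 rule applies in both libraries: an unprefixed name such as img only matches elements that are in no namespace.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceQueryDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Counts the nodes an XPath expression matches, with the prefix "html" bound to the XHTML namespace. */
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the default parser is not namespace-aware
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                // Reverse lookups are not needed for evaluating queries.
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            // Collapsed into one unchecked exception to keep the sketch short.
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html xmlns=\"" + XHTML_NS + "\"><body>"
                + "<img src=\"a.png\"/><img src=\"b.png\"/></body></html>";
        System.out.println(count(page, "//img"));      // prints 0: unprefixed name, no match
        System.out.println(count(page, "//html:img")); // prints 2: prefix bound to XHTML
    }
}
```

If a query that looks correct returns no nodes, a namespace mismatch like this one is a likely cause.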

Links:

XOM --> http://www.xom.nu

TagSoup --> http://ccil.org/~cowan/XML/tagsoup/

Of course, you'll have to catch the exceptions that can be thrown while building the XML document from the web page.
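
As a rough sketch of that error handling, the example below uses the JDK's DocumentBuilder as a stand-in so it runs without extra jars. The class name SafeParseDemo and the parseOrNull helper are hypothetical. With XOM the shape would be the same, since Builder.build declares nu.xom.ParsingException and java.io.IOException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SafeParseDemo {
    /** Returns the parsed document, or null if the markup could not be read or parsed. */
    static Document parseOrNull(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            // Markup the parser could not handle (XOM's counterpart: ParsingException).
            return null;
        } catch (IOException e) {
            // The input could not be read, e.g. a network failure when parsing from a URL.
            return null;
        } catch (ParserConfigurationException e) {
            // A setup problem rather than bad input, so surface it instead of hiding it.
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("<a><b/></a>") != null);  // well-formed input
        System.out.println(parseOrNull("<a><b></a>") == null);   // unclosed <b> element
    }
}
```

Returning null keeps the sketch short; in an app you would more likely log the failure and show the user an error or retry.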