Crawling a dynamic web page with HtmlUnit

2023-09-10 13:43:21 Author: 泪、最后一次

I am using HtmlUnit to crawl data from a dynamic web page that loads its content through infinite scrolling, much like Facebook's news feed. I used the following code to simulate the scroll-down event:

webclient.setJavaScriptEnabled(true);
webclient.setAjaxController(new NicelyResynchronizingAjaxController());
ScriptResult sr=myHtmlPage.executeJavaScript("window.scrollBy(0,600)");
webclient.waitForBackgroundJavaScript(10000);
myHtmlPage=(HtmlPage)sr.getNewPage();

But myHtmlPage seems to stay the same as before, i.e., the new data is not appended to myHtmlPage, so I can only crawl the first few items on the page. Thanks for your help!

Recommended answer

I had a similar problem where the content was loaded lazily as the page was scrolled. I solved it using:

webClient.getCurrentWindow().setInnerHeight(Integer.MAX_VALUE);
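
If a single change in window height does not pull in everything, another option is to keep scrolling until the number of items on the page stops growing. The following is only a minimal sketch of that loop, not something taken from the question or the answer: the class and method names are made up for illustration, and the scroll step and item counter are passed in as callbacks so the loop itself does not depend on HtmlUnit. The comments show how it might be wired to the calls used in the question; the XPath in them is a hypothetical placeholder.

```java
import java.util.function.IntSupplier;

// Hypothetical helper: repeats a scroll step until no new items appear.
class ScrollUntilStable {

    /**
     * Runs scrollStep repeatedly and stops as soon as itemCount no longer
     * grows, or after maxRounds scrolls. Returns the last observed count.
     *
     * With HtmlUnit the two callbacks might look like this (assumed wiring,
     * the XPath is a placeholder for whatever marks one item on your page):
     *
     *   scrollStep: () -> {
     *       page.executeJavaScript("window.scrollBy(0,600)");
     *       webClient.waitForBackgroundJavaScript(10000);
     *   }
     *   itemCount:  () -> page.getByXPath("//div[@class='item']").size()
     */
    static int scrollUntilStable(Runnable scrollStep, IntSupplier itemCount, int maxRounds) {
        int previous = itemCount.getAsInt();
        for (int round = 0; round < maxRounds; round++) {
            scrollStep.run();
            int current = itemCount.getAsInt();
            if (current <= previous) {
                // Nothing new was loaded; assume the end of the feed.
                return current;
            }
            previous = current;
        }
        return previous;
    }

    public static void main(String[] args) {
        // Stand-in for a real page: a fake feed that starts with 10 items
        // and gains 10 per scroll until it holds 35.
        int[] loaded = {10};
        int total = scrollUntilStable(
                () -> loaded[0] = Math.min(loaded[0] + 10, 35),
                () -> loaded[0],
                100);
        System.out.println(total);
    }
}
```

The maxRounds cap is there so that a feed that never ends cannot keep the crawler looping forever.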