How to scrape a page with multiple tables using Jsoup

2023-09-07 11:15:56 Author: 后悔当初


Any idea how to scrape a web page that contains multiple tables? I am already connecting to the web page.

This is one table, but there are multiple tables on the same web page.

I also can't figure out how to read the table.

HTML:

    <p><a href="/fantasy_news/feature/?ID=49818"><strong>Top 300 Overall Fantasy Rankings</strong></a></p>
    <div class="storyStats">
    <table>
    <thead>
    <tr>
        <th>RANK</th>
        <th>CENTRES</th>
        <th>TEAM</th>
        <th>POS</th>
        <th>GP</th>
        <th>G</th>
        <th>A</th>
        <th>PTS</th>
        <th>+/-</th>
        <th>PIM</th>
        <th>PPP</th>
    </tr>
    </thead>
    <tbody>
    <tr class="bg1">
        <td>1.</td>
        <td><a href="/nhl/teams/players/?name=steven+stamkos">Steven&nbsp;Stamkos</a></td>
        <td>Tampa Bay</td>
        <td>C</td>
        <td align="right">81</td>
        <td align="right">50</td>
        <td align="right">51</td>
        <td align="right">101</td>
        <td align="right">-2</td>
        <td align="right">56</td>
        <td align="right">38</td>
    </tr>


    Iterator<Element> trSIter = doc.select("table").iterator();
    while (trSIter.hasNext()) {
        Element trEl = trSIter.next().child(0);
        Elements tdEls = trEl.children();
        Iterator<Element> tdIter = tdEls.select("tr").iterator();
        System.out.println("><1><><" + tdIter);
        boolean firstRow = true;
        while (tdIter.hasNext()) {
            Element tr = (Element) tdIter.next();

            while (tdIter.hasNext()) {
                int tdCount = 1;
                Element tdEl = tdIter.next();
                //name = tdEl.getElementsByClass("playertablePlayerName").get(0).text();

                Elements tdsEls = tdEl.select("td");
                System.out.println("><2><><" + tdsEls);
                Iterator<Element> columnIt = tdsEls.iterator();

                while (columnIt.hasNext()) {
                    Element column = columnIt.next();
                    switch (tdCount++) {
                    case 1:
                        name = column.select("a").first().text();
                        break;
                    case 2:
                        stat2 = Double.parseDouble(column.text());
                        break;
                    case 3:
                        stat3 = Double.parseDouble(column.text());
                        break;
                    case 4:
                        stat4 = Double.parseDouble(column.text());
                        break;
                    case 5:
                        stat5 = Double.parseDouble(column.text());
                        break;
                    case 6:
                        stat6 = Double.parseDouble(column.text());
                        break;
                    case 7:
                        stat7 = Double.parseDouble(column.text());
                        break;
                    case 8:
                        stat8 = Double.parseDouble(column.text());
                        break;

Solution

This should get you started. Each table has a blank record that you will have to account for. You will also have to decide which stats you want and work out which column each one occupies; you then read a cell by its index with tds.get(index). Let me know how it works for you.

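    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Document doc = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815").get();

    // Visit every table inside each div.storyStats block.
    for (Element table : doc.select("div.storyStats table")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            // Header rows have no <td> cells and blank rows have too few; skip both.
            if (tds.size() > 5) {
                // Index 1 is the player name; index 5 is the G column in the sample table.
                System.out.println(tds.get(1).text() + ":" + tds.get(5).text());
            }
        }
    }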
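
If you would rather not hard-code the column positions, one option is to look up the column by its header text in each table. The following is only a rough sketch, not tested against the live page: the class name `StoryStatsScraper` and the method `extractStat` are made up for illustration, the second table in the sample HTML is invented, and it assumes each table has one `<th>` per column (no colspan) with the player name in the second column, as in the snippet from the question. It parses an inline string so it can run without a network connection.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class StoryStatsScraper {

    /**
     * Returns one "name:value" string per data row, across every table inside
     * div.storyStats. The value comes from the column whose header text
     * equals statHeader (for example "G" or "PTS").
     */
    public static List<String> extractStat(String html, String statHeader) {
        Document doc = Jsoup.parse(html);
        List<String> results = new ArrayList<>();

        for (Element table : doc.select("div.storyStats table")) {
            // Find the wanted column by its header text instead of a fixed index.
            Elements headers = table.select("th");
            int statIndex = -1;
            for (int i = 0; i < headers.size(); i++) {
                if (headers.get(i).text().trim().equalsIgnoreCase(statHeader)) {
                    statIndex = i;
                    break;
                }
            }
            if (statIndex < 0) {
                continue; // this table has no such column
            }

            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                // Header rows have no <td>; blank rows have too few cells. Skip both.
                if (tds.size() <= Math.max(1, statIndex)) {
                    continue;
                }
                // Assumption: the player name is in the second column (index 1).
                results.add(tds.get(1).text() + ":" + tds.get(statIndex).text());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // First table: shortened row from the question. Second table: invented sample data.
        String html =
              "<div class=\"storyStats\"><table>"
            + "<thead><tr><th>RANK</th><th>CENTRES</th><th>TEAM</th><th>POS</th><th>GP</th><th>G</th></tr></thead>"
            + "<tbody><tr><td>1.</td><td>Steven Stamkos</td><td>Tampa Bay</td><td>C</td><td>81</td><td>50</td></tr>"
            + "<tr><td></td></tr></tbody></table></div>"
            + "<div class=\"storyStats\"><table>"
            + "<thead><tr><th>RANK</th><th>WINGERS</th><th>TEAM</th><th>POS</th><th>GP</th><th>G</th></tr></thead>"
            + "<tbody><tr><td>1.</td><td>Example Winger</td><td>Example Team</td><td>LW</td><td>80</td><td>30</td></tr>"
            + "</tbody></table></div>";

        extractStat(html, "G").forEach(System.out::println);
    }
}
```

To run it against the real page, you could pass the result of `Jsoup.connect(url).get().outerHtml()` as the first argument, or change the method to accept a `Document` directly.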

 