How to web scrape an AJAX UpdatePanel with C#? Panel, web page, AJAX

2023-09-11 01:15:53 Author: 九公里浅绿

I am looking to web scrape a site that has an AJAX UpdatePanel. I have been able to log in to the website using properly constructed HTTP requests (HttpWebRequest), and I can send a POST request to get the contents of the UpdatePanel, but the response contains placeholder text rather than the actual data.

Here is the code where I make the request to get the UpdatePanel data:

// Already sent POST request with username and password to get session id, cookie etc
// Create POST data and convert it to a byte array. This includes viewstate, eventvalidation etc.
postData = String.Format("ctl00%24ScriptManager1=ctl00%24uxContentPlaceHolder%24Panel%7Cctl00%24uxContentPlaceHolder%24uxTimer&__EVENTTARGET=ctl00%24uxContentPlaceHolder%24uxTimer");
postData = hiddenFields.Aggregate(postData, (current, field) => current + ("&" + Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value)));

byteArray = Encoding.UTF8.GetBytes(postData);

// Set the ContentType property of the WebRequest.
request.Headers.Add("X-MicrosoftAjax", "Delta=true");
request.ContentType = "application/x-www-form-urlencoded";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36";
request.Referer = "https://www.example.com/Registered/MyAcount.aspx?menu=My%20account";
request.Host = "www.example.com";
// Set the ContentLength property of the WebRequest.
request.ContentLength = byteArray.Length;
// Get the request stream.
dataStream = request.GetRequestStream();
// Write the data to the request stream.
dataStream.Write(byteArray, 0, byteArray.Length);
// Close the Stream object.
dataStream.Close();
// Get the response.

response = (HttpWebResponse)request.GetResponse();
_container.Add(response.Cookies);

using (var reader = new StreamReader(response.GetResponseStream()))
{
    // Read the content.
    responseFromServer = reader.ReadToEnd();
}

response.Close();

Here is a summarised version of the response I get:

6259|updatePanel|ctl00_uxContentPlaceHolder_uxUpdatePnl|
<table cellpadding="0" cellspacing="0" border="0" width="100%" id="transtable">
    <tr>
        <td>
            <p>
                <div id="ctl00_uxContentPlaceHolder_UpdateProgress2" style="display:none;">

                    <div>
                        <img src="../Include/Images/loading.gif" alt="progressImg" />
                        <span id="ProgressMsg" style="font-size: small">Please, wait ... </span>
                    </div>

                </div>
            </p>
        </td>
    </tr>
    <tr>
        <td></td>
    </tr>
    <tr>
        <td></td>
    </tr>
</table>

Here is the expected result:

2577|updatePanel|ctl00_uxContentPlaceHolder_uxUpdatePnl|
<table cellspacing="0" border="0" id="ctl00_uxContentPlaceHolder_uxMyCards" style="width:100%;border-collapse:collapse;">
    <tr>
        <th align="left" scope="col" style="font-size:12px;font-weight:bold;height:40px;">Card number</th>
        <th align="left" scope="col" style="font-size:12px;font-weight:bold;">Account holder</th>
        <th align="left" scope="col" style="font-size:12px;font-weight:bold;">Balance money</th>
        <th align="left" scope="col" style="font-size:12px;font-weight:bold;">Type</th>
    </tr>
    <tr>
        <td valign="top" style="font-size:12px;width:110px;">
            <a id="ctl00_uxContentPlaceHolder_uxMyCards_ctl02_uxManageAccount" href="ManageMyCard.aspx?menu=Manage my card&amp;cno=GgxQxwWICtY4hnlrIZfFzdqc8KMXxVp9" style="font-size:11px;">308425020219083</a>
        </td>
        <td valign="top" style="font-size:12px;width:130px;">
            My Name
        </td>
        <td align="left" valign="top" style="font-size:12px;width:100px;">
            $1.50
        </td>
        <td valign="top" style="font-size:12px;width:110px;"></td>
    </tr>
    <tr>
        <td valign="top" style="font-size:12px;width:110px;">
            <a id="ctl00_uxContentPlaceHolder_uxMyCards_ctl03_uxManageAccount" href="ManageMyCard.aspx?menu=Manage my card&amp;cno=hkbnmVzj%2ftrs%2fVLXK0rBQhB0enOO%7b4Uf" style="font-size:11px;">308425026724813</a>
        </td>
        <td valign="top" style="font-size:12px;width:130px;">
            My Name
        </td>
        <td align="left" valign="top" style="font-size:12px;width:100px;">
            $4.04
        </td>
        <td valign="top" style="font-size:12px;width:110px;"></td>
    </tr>
</table>
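
Both responses above use the ASP.NET AJAX partial-rendering layout, in which each segment is written as `length|type|id|content|`. The following is a minimal sketch of how such a body could be split into segments so the `updatePanel` content can be inspected on its own. `DeltaSegment` and `DeltaParser` are hypothetical names introduced for illustration, and the sketch assumes the length prefix counts characters of the content, which may not hold for every response.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for one "length|type|id|content|" segment.
public sealed class DeltaSegment
{
    public string Type { get; set; }
    public string Id { get; set; }
    public string Content { get; set; }
}

// Hypothetical helper, not part of any framework API.
public static class DeltaParser
{
    // Reads segments until the input is exhausted or a segment is malformed.
    public static List<DeltaSegment> Parse(string body)
    {
        var segments = new List<DeltaSegment>();
        int pos = 0;

        while (pos < body.Length)
        {
            // Locate the three '|' separators that end length, type and id.
            int lenEnd = body.IndexOf('|', pos);
            if (lenEnd < 0) break;
            int typeEnd = body.IndexOf('|', lenEnd + 1);
            if (typeEnd < 0) break;
            int idEnd = body.IndexOf('|', typeEnd + 1);
            if (idEnd < 0) break;

            int length;
            if (!int.TryParse(body.Substring(pos, lenEnd - pos), out length)) break;

            // The content is taken by length, so it may itself contain '|'.
            int contentStart = idEnd + 1;
            if (length < 0 || contentStart + length > body.Length) break;

            segments.Add(new DeltaSegment
            {
                Type = body.Substring(lenEnd + 1, typeEnd - lenEnd - 1),
                Id = body.Substring(typeEnd + 1, idEnd - typeEnd - 1),
                Content = body.Substring(contentStart, length)
            });

            // Skip the content and the '|' that terminates the segment.
            pos = contentStart + length + 1;
        }

        return segments;
    }
}
```

With something like this, `DeltaParser.Parse(responseFromServer)` would return one entry per segment, and the entry whose `Type` is `updatePanel` holds the HTML shown above.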

It looks like the page is requested and the response is sent before the data is actually loaded. Is there any way to make an HttpWebRequest wait until all the data has loaded before the response is sent?

I can post the actual HTTP request if that would help, but it looks pretty much identical to the one made in the browser. And before people jump in and ask: there is no API for what I'm doing, nor is it in any way illegal :)

Edit: I would prefer to stick to HttpWebRequest for this, rather than a third-party tool like Selenium.

Recommended answer

I worked this out: I was sending __EVENTTARGET in the HTTP request twice. With the duplicate removed, the UpdatePanel now loads all the data correctly.
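
One way to guard against this kind of duplicate is to build the POST body from a collection in which each field name can appear only once. The sketch below is an illustration rather than the exact fix used here: `FormBody.Build` is a hypothetical helper, and it assumes the duplicate came from the scraped hidden fields already containing an (empty) `__EVENTTARGET` entry in addition to the one added by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any framework API.
public static class FormBody
{
    // Builds a form-urlencoded body in which each field name appears once.
    // When a name is repeated, the last value wins and the field keeps the
    // position where it was first seen.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var order = new List<string>();
        var values = new Dictionary<string, string>(StringComparer.Ordinal);

        foreach (var field in fields)
        {
            if (!values.ContainsKey(field.Key))
            {
                order.Add(field.Key);
            }
            values[field.Key] = field.Value ?? string.Empty;
        }

        return string.Join("&", order.Select(key =>
            Uri.EscapeDataString(key) + "=" + Uri.EscapeDataString(values[key])));
    }
}
```

Passing the scraped hidden fields first and the explicit values (`ctl00$ScriptManager1`, `__EVENTTARGET`) last means the explicit values override any scraped ones, and `__EVENTTARGET` is sent exactly once.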