WebClient.DownloadString() does not return the exact HTML

2023-09-06 23:59:23 Author: 借我、你的一辈子

So here's the deal. I'm creating a spider bot for a website that scans all the product pages and records the product data. I'm using C# and the WebClient class to download the HTML as a string. The site I'm crawling must be specially built, because the HTML returned by WebClient.DownloadString() differs from the HTML I see when I view the page source in a browser. This seems intentional, because the only piece of information I can't get is the price.

Does anyone know a workaround for this problem, or can anyone explain what is happening? Thanks.

Recommended answer

The site is probably using the user agent string to decide what content to send. The example here shows how to set the user agent header.
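
The example the answer links to is not preserved in this copy, so here is a minimal sketch of the idea, assuming the site really does vary its response by user agent. The class name `PriceSpider`, the helper names, and the particular User-Agent value are all illustrative assumptions, not something taken from the original answer.

```csharp
using System;
using System.Net;

public static class PriceSpider
{
    // Hypothetical browser-like User-Agent value (an assumption for illustration);
    // substitute the string your own browser actually sends.
    public const string BrowserUserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/115.0 Safari/537.36";

    // Creates a WebClient whose requests carry the browser-like User-Agent header.
    public static WebClient CreateBrowserLikeClient()
    {
        var client = new WebClient();
        client.Headers[HttpRequestHeader.UserAgent] = BrowserUserAgent;
        return client;
    }

    // Downloads the page at the given URL as a string, using a fresh client
    // per call so the header is always set before the request is made.
    public static string DownloadPage(string url)
    {
        using (var client = CreateBrowserLikeClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

Calling `PriceSpider.DownloadPage(productUrl)` with a product page URL should then send the same User-Agent a browser would. If the price is still missing, the site may be varying its response on something other than the user agent, for example by filling in the price with JavaScript after the page loads, which a plain HTTP download will not reproduce.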