How can I use the HTML Agility Pack to get img/src or a/href values?

2023-09-03 00:36:54 Author: 枝头花几许

I want to use the HTML Agility Pack to parse image and href links from an HTML page, but I don't know much about XML or XPath. Although I have looked through the help documents on many websites, I still can't solve the problem. I am using C# in Visual Studio 2005. I can't speak English fluently, so I would sincerely thank anyone who can write some helpful code.

Recommended answer

The first example on the home page does something very similar, but consider:

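 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm"); // use doc.LoadHtml(htmlSource) if the HTML is in a string, not a file
 // Current releases expose the root as DocumentNode (older ones called it DocumentElement).
 HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null) // SelectNodes returns null when nothing matches
 {
    foreach (HtmlNode link in links)
    {
       string href = link.Attributes["href"].Value;
       // store href somewhere
    }
 }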

So for img/@src, just replace each a with img, and href with src. You might even be able to simplify to:

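 List<string> list = new List<string>();
 // Select the elements rather than the attributes; the attribute is read from each node below.
 HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href] | //img[@src]");
 if (nodes != null) // SelectNodes returns null when nothing matches
 {
    foreach (HtmlNode node in nodes)
    {
       string name = node.Name == "img" ? "src" : "href";
       list.Add(node.Attributes[name].Value);
    }
 }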

For relative URL handling, look at the Uri class.
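
As a rough illustration of that last point, here is a minimal sketch of resolving an extracted href or src against the address of the page it came from. The class name LinkResolver, the method name Resolve, and the example.com addresses are hypothetical and not part of the original answer; only the System.Uri calls come from the .NET Framework, and the code sticks to C# 2.0 syntax so it should compile in Visual Studio 2005.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class LinkResolver
{
    // Combines the page address with a link value taken from href or src.
    // Returns null when the two cannot be combined into a valid URI.
    public static string Resolve(string pageUrl, string link)
    {
        // Throws UriFormatException if pageUrl is not an absolute URI.
        Uri baseUri = new Uri(pageUrl);

        Uri result;
        if (Uri.TryCreate(baseUri, link, out result))
        {
            return result.AbsoluteUri;
        }
        return null;
    }
}
```

With a page address of "http://example.com/docs/index.html", Resolve turns "images/logo.png" into "http://example.com/docs/images/logo.png" and "/about.html" into "http://example.com/about.html", while a link that is already absolute is returned as it is.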