Currently I use .NET's WebBrowser.Document.Images
to get the image links from an HTML page. It requires the WebBrowser control
to load the document, which is messy and takes up resources.
According to this question, XPath is better than a regex for this.
Anyone know how to do this in C#?
If your input string is valid XHTML, you can treat it as XML, load it into an XmlDocument, and do XPath magic :) But that's not always the case.
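For the well-formed XHTML case, a minimal sketch might look like the following. The class and method names (XhtmlImageLinks, FetchImageSources) are made up for illustration, and the code assumes the input parses as XML without needing an external DTD; anything that is not well-formed will throw an XmlException.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XhtmlImageLinks
{
    // Hypothetical helper (not from the original post): collects the src
    // attribute of every <img> element in a well-formed XHTML string.
    public static List<string> FetchImageSources(string xhtmlSource)
    {
        var doc = new XmlDocument();
        // Assumption: no external DTD has to be resolved for this input.
        doc.XmlResolver = null;
        doc.LoadXml(xhtmlSource); // throws XmlException if not well-formed

        var sources = new List<string>();
        // local-name() matches <img> with or without the XHTML namespace,
        // so no XmlNamespaceManager is needed.
        foreach (XmlNode attr in doc.SelectNodes("//*[local-name()='img']/@src"))
        {
            sources.Add(attr.Value);
        }
        return sources;
    }
}
```

The values come back as plain strings because src is often a relative path; they can be turned into Uri objects afterwards if needed.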
Otherwise you can try this function, which returns all image links from htmlSource:
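using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public List<Uri> FetchLinksFromSource(string htmlSource)
{
    List<Uri> links = new List<Uri>();
    // Group 1 captures the value of the src attribute of each <img> tag.
    string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
    MatchCollection matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    foreach (Match m in matchesImgSrc)
    {
        string href = m.Groups[1].Value;
        // new Uri(href) throws on relative paths such as "images/a.png",
        // so accept both relative and absolute values and skip anything unparsable.
        if (Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out Uri link))
        {
            links.Add(link);
        }
    }
    return links;
}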
And you can use it like this:
// Requires: using System.Collections.Generic; using System.IO; using System.Net;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.example.com");
request.Credentials = System.Net.CredentialCache.DefaultCredentials;
// Dispose the response so the underlying connection is released.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode == HttpStatusCode.OK)
    {
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
            List<Uri> links = FetchLinksFromSource(sr.ReadToEnd());
        }
    }
}