I'm trying to use System.Windows.Forms.HtmlDocument in a console application. First, is this even possible? If so, how can I load a page from the web into it? I was trying to use WebBrowser, but it's telling me:
Unhandled Exception: System.Threading.ThreadStateException: ActiveX control '8856f961-340a-11d0-a96b-00c04fd705a2' cannot be instantiated because the current thread is not in a single-threaded apartment.
There seems to be a severe lack of tutorials on the HtmlDocument object (or Google is just turning up useless results).
Just discovered mshtml.HTMLDocument.createDocumentFromUrl, but that throws:
Unhandled Exception: System.Runtime.InteropServices.COMException (0x80010105): The server threw an exception. (Exception from HRESULT: 0x80010105 (RPC_E_SERVERFAULT))
   at System.RuntimeType.ForwardCallToInvokeMember(String memberName, BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData& msgData)
   at mshtml.HTMLDocumentClass.createDocumentFromUrl(String bstrUrl, String bstrOptions)
   at iget.Program.Main(String[] args)
What the heck? All I want is a list of <a> tags on a page. Why is this so hard?
For those that are curious, here's the solution I came up with, thanks to TrueWill:
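using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;
using HtmlAgilityPack;

namespace iget
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument doc = new HtmlDocument();

            // Download the page and parse the response stream with Html Agility Pack.
            using (WebClient wc = new WebClient())
            using (Stream stream = wc.OpenRead("http://google.com"))
            {
                doc.Load(stream);
            }

            // SelectNodes returns null (not an empty collection) when nothing matches.
            HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
            if (links == null)
            {
                return;
            }

            // Print the href of every <a> element that has one.
            foreach (HtmlNode a in links)
            {
                Console.WriteLine(a.Attributes["href"].Value);
            }
        }
    }
}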
Solution
As an alternative, you could use the free Html Agility Pack library. That can parse HTML and will let you query it with LINQ. I used an older version for a project at home and it worked great.
EDIT: You may also want to use the WebClient or WebRequest classes to download the web page. See my blog post on Web scraping in .NET. (Note that I haven't tried this in a console app.)
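Here is a minimal sketch of the LINQ-style querying mentioned above, assuming a reasonably recent Html Agility Pack release installed from NuGet (one that provides the Descendants method). LinkLister and ExtractHrefs are hypothetical names used only for illustration, and the markup is parsed from an inline string so the example does not depend on the network:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the "HtmlAgilityPack" NuGet package is referenced

static class LinkLister
{
    // Hypothetical helper: returns the href value of every <a> element that has one.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a stream

        return doc.DocumentNode
                  .Descendants("a")                         // every <a> element in the document
                  .Where(a => a.Attributes["href"] != null) // skip anchors without an href
                  .Select(a => a.Attributes["href"].Value)
                  .ToList();
    }

    static void Main()
    {
        // Illustrative sample markup, not taken from a real page.
        const string sample =
            "<html><body>" +
            "<a href=\"http://example.com/\">one</a>" +
            "<a name=\"top\">no href</a>" +
            "<a href=\"/about\">two</a>" +
            "</body></html>";

        foreach (string href in ExtractHrefs(sample))
        {
            Console.WriteLine(href);
        }
    }
}
```

Unlike SelectNodes, which returns null when nothing matches, Descendants yields an empty sequence, so no null check is needed before enumerating. To run this against a live page, the downloaded markup (for example from WebClient.DownloadString) could be passed to ExtractHrefs.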