Using HTMLDocument from a C# .NET console application? (console, .NET, HTMLDocument)

2023-09-03 04:22:01 Author: 是我抚不平的伤

I'm trying to use System.Windows.Forms.HTMLDocument in a console application. First, is this even possible? If so, how can I load up a page from the web into it? I was trying to use WebBrowser, but it's telling me:

Unhandled Exception: System.Threading.ThreadStateException: ActiveX control '8856f961-340a-11d0-a96b-00c04fd705a2' cannot be instantiated because the current thread is not in a single-threaded apartment.
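
A note on this exception: WebBrowser wraps an ActiveX control, and console applications do not start their main thread in a single-threaded apartment by default. The following is a minimal sketch, not a tested solution from this thread, of one way a console program might host the control. It assumes a Windows machine, a project reference to the System.Windows.Forms assembly, and that blocking on a message loop until the page has loaded is acceptable.

```csharp
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread] // put the main thread in a single-threaded apartment, as the ActiveX control requires
    static void Main()
    {
        using (var browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true;

            browser.DocumentCompleted += (sender, e) =>
            {
                // DocumentCompleted can fire once per frame; wait until the whole page is ready.
                if (browser.ReadyState != WebBrowserReadyState.Complete)
                {
                    return;
                }

                foreach (HtmlElement a in browser.Document.GetElementsByTagName("a"))
                {
                    Console.WriteLine(a.GetAttribute("href"));
                }

                Application.ExitThread(); // stop the message loop started below
            };

            browser.Navigate("http://google.com");
            Application.Run(); // pump messages so the navigation events are delivered
        }
    }
}
```

Without the message loop the DocumentCompleted event is never raised, which is why the sketch calls Application.Run even though no form is shown.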

There seems to be a severe lack of tutorials on the HTMLDocument object (or Google is just turning up useless results).

Just discovered mshtml.HTMLDocument.createDocumentFromUrl, but that throws me

Unhandled Exception: System.Runtime.InteropServices.COMException (0x80010105): The server threw an exception. (Exception from HRESULT: 0x80010105 (RPC_E_SERVERFAULT))
   at System.RuntimeType.ForwardCallToInvokeMember(String memberName, BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData& msgData)
   at mshtml.HTMLDocumentClass.createDocumentFromUrl(String bstrUrl, String bstrOptions)
   at iget.Program.Main(String[] args)

What the heck? All I want is a list of <a> tags on a page. Why is this so hard?

For those that are curious, here's the solution I came up with, thanks to TrueWill:

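using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;
using HtmlAgilityPack;

namespace iget
{
    class Program
    {
        static void Main(string[] args)
        {
            // Dispose the client and the response stream once the page has been parsed.
            using (WebClient wc = new WebClient())
            using (Stream stream = wc.OpenRead("http://google.com"))
            {
                HtmlDocument doc = new HtmlDocument();
                doc.Load(stream);

                // SelectNodes returns null, not an empty collection, when nothing matches.
                HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
                if (links == null)
                {
                    return;
                }

                // Print the target of every <a> element that has an href attribute.
                foreach (HtmlNode a in links)
                {
                    Console.WriteLine(a.Attributes["href"].Value);
                }
            }
        }
    }
}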

Solution

As an alternative, you could use the free Html Agility Pack library. That can parse HTML and will let you query it with LINQ. I used an older version for a project at home and it worked great.
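
To illustrate the LINQ-style querying mentioned above, here is a small sketch that parses an inline HTML string, so it needs no network access. It assumes the HtmlAgilityPack NuGet package is referenced; the LinkExtractor class and its ExtractLinks method are hypothetical names chosen for this example, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class LinkExtractor
{
    // Hypothetical helper: returns the href value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants("a")                              // all <a> elements, in document order
                  .Where(a => a.Attributes.Contains("href"))     // skip anchors without an href
                  .Select(a => a.GetAttributeValue("href", string.Empty))
                  .ToList();
    }

    static void Main()
    {
        string html = "<p><a href=\"/one\">One</a> <a name=\"x\">no link</a> <a href=\"/two\">Two</a></p>";

        foreach (string href in ExtractLinks(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

Descendants returns an empty sequence when nothing matches, so this form does not need the null check that an XPath query through SelectNodes does.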

EDIT: You may also want to use the WebClient or WebRequest classes to download the web page. See my blog post on Web scraping in .NET. (Note that I haven't tried this in a console app.)
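
For the WebRequest route, a download helper might look roughly like the sketch below. PageDownloader and DownloadText are hypothetical names introduced for illustration, and the sketch assumes the response body can be read with StreamReader's default encoding handling. Note that WebRequest is marked obsolete in recent .NET versions, though it still works there.

```csharp
using System;
using System.IO;
using System.Net;

static class PageDownloader
{
    // Hypothetical helper: fetches the resource at the given URL and returns its body as text.
    public static string DownloadText(string url)
    {
        WebRequest request = WebRequest.Create(url);

        // Dispose the response, its stream, and the reader as soon as the body has been read.
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string html = DownloadText("http://google.com");
        Console.WriteLine("Downloaded {0} characters", html.Length);
    }
}
```

The returned string can then be handed to an HTML parser, for example through the Html Agility Pack's LoadHtml method.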
