Is there a LINQ to HTML, or some other good .NET HTML manipulation API, for C#?

2023-09-02 01:46:49 Author: 身野战场浪

I have a C# WPF application that needs to consume data that is exposed on a web page as an HTML table.

After getting inspiration from this URL, I tried using LINQ to XML to parse the HTML document, but this only works if the HTML document is extremely well formed (and doesn't have any comments or HTML entities inside it). I have managed to get a working solution using this technique, but it is far from ideal.
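To illustrate the brittleness described above, here is a minimal sketch of how LINQ to XML reacts to ordinary HTML. The sample markup is made up for illustration; it contains an HTML entity (`&nbsp;`) and an unclosed `<br>`, neither of which is valid in a plain XML document.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical markup: valid HTML, but not well-formed XML.
        const string html = "<table><tr><td>Tea&nbsp;time<br></td></tr></table>";

        try
        {
            // XDocument expects well-formed XML, so the undeclared
            // entity is enough to make parsing fail.
            XDocument.Parse(html);
            Console.WriteLine("Parsed successfully.");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("LINQ to XML rejected the markup: " + ex.Message);
        }
    }
}
```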

I am after a solution that is intended for parsing HTML. I have hacked together "solutions" before, but they are brittle. I am after a robust way of parsing and manipulating the document. Ideally I'd like something that makes the task as easy as it would be from JavaScript/jQuery.

Does anyone know of a good .NET library or utility for parsing and manipulating HTML?

Recommended answer

Even though it's not LINQ-based, I suggest looking into the HTML Agility Pack from CodePlex.

Note: the HTML Agility Pack now supports LINQ to Objects (via a LINQ to XML-like interface).
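As a rough illustration of that LINQ-style interface, the sketch below pulls the cell text out of an HTML table. It assumes the HtmlAgilityPack NuGet package is referenced; the sample markup, the `prices` id, and the variable names are invented for the example and are not part of the original question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical markup containing a comment and an HTML entity,
        // the two things that tripped up the LINQ to XML approach.
        const string html = @"<html><body><!-- prices -->
<table id='prices'>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Tea &amp; biscuits</td><td>3.50</td></tr>
  <tr><td>Coffee</td><td>2.75</td></tr>
</table></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Descendants/Elements return IEnumerable<HtmlNode>, so the
        // standard LINQ operators apply directly.
        var rows = doc.DocumentNode
            .Descendants("tr")
            .Select(tr => tr.Elements("td")
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToList())
            .Where(cells => cells.Count > 0); // skip the header row (th only)

        foreach (var cells in rows)
        {
            Console.WriteLine(string.Join(" | ", cells));
        }
    }
}
```

`HtmlEntity.DeEntitize` is used here so that entities such as `&amp;` come back as plain characters; whether that is wanted depends on what the application does with the text.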

From the HTML Agility Pack page:

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).
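The XPath support and the read/write DOM mentioned in that description might be used along these lines. This is only a sketch, again assuming the HtmlAgilityPack NuGet package; the markup, the `prices` id, and the `price` class name are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical table, standing in for the page the application reads.
        const string html = @"<table id='prices'>
  <tr><td>Tea</td><td>3.50</td></tr>
  <tr><td>Coffee</td><td>2.75</td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the second cell of every row in the table with id 'prices'.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection priceCells =
            doc.DocumentNode.SelectNodes("//table[@id='prices']//tr/td[2]");

        if (priceCells == null)
        {
            Console.WriteLine("No price cells found.");
            return;
        }

        foreach (HtmlNode cell in priceCells)
        {
            Console.WriteLine(cell.InnerText.Trim());

            // The DOM is writable: tag each matched cell with a class.
            cell.SetAttributeValue("class", "price");
        }

        // Serialize the modified document back to markup.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

The null check matters in practice: iterating the result of `SelectNodes` without it is a common source of `NullReferenceException` when the page layout changes.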