Load a DOM and execute JavaScript, server side, with .NET (tags: server side, loading, JavaScript, DOM)

2023-09-02 23:59:10 Author: 守護、內紛情

I would like to load a DOM from a document (in string form) or a URL, and then execute JavaScript functions (including jQuery selectors) against it. This would be entirely server side and in process, with no client or browser involved.

Basically, I need to load the DOM and then use jQuery selectors with text()- and val()-type functions to extract strings from it. I don't really need to manipulate the DOM.

I have looked at .NET JavaScript engines such as Jurassic and Jint, but neither supports loading a DOM, so they can't do what I need.

I would be willing to consider non-.NET solutions (node.js, Ruby, etc.) if they exist, but I would really prefer .NET.

Edit: The answer below is a good one, but I'm currently trying a different route: I'm attempting to port envjs to Jurassic. If I can get that working, I think it will do what I want. Stay tuned.

Recommended answer

The answer depends on what you are trying to do. If your goal is basically a complete web browser simulation, or a "headless browser," there are a number of solutions, but none of them (that I know of) exist cleanly in .NET. To mimic a browser, you need a JavaScript engine and a DOM. You've identified a few engines; I've found Jurassic to be both the most robust and the fastest. The Google Chrome V8 engine is also very popular, and the Neosis Javascript.NET project provides a .NET wrapper for it. It's not quite pure .NET, since you have a non-.NET dependency, but it integrates cleanly and is not much trouble to use.

But as you've noted, you still need a DOM. In pure C# there is XBrowser, but it looks a bit stale. There are also JavaScript-based representations of the entire browser DOM, such as jsdom. You could probably run jsdom in Jurassic, giving you a DOM simulation without a browser, all in C# (though likely very slowly!). It would definitely run just fine in V8. If you step outside the .NET realm, there are other, better-supported solutions. A related question discusses HtmlUnit. Then there's Selenium, for automating actual web browsers.

Also, bear in mind that a lot of the work done around these tools is for testing. While that doesn't mean you couldn't use them for something else, they may not perform or integrate well for any kind of stable use in inline production code. If you are basically trying to do real-time HTML manipulation, then a solution that mixes a lot of technologies that aren't widely used except for testing might be a poor choice.

If what you need is actually HTML manipulation, and it doesn't really have to use JavaScript but you are drawn to the wealth of such tools available in JS, then I would look at C# tools designed for this purpose. Examples are HTML Agility Pack, or my own project CsQuery, which is a C# port of jQuery.
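To make the "no JavaScript engine needed" point concrete, here is a minimal sketch of extracting strings from an HTML string with a plain parser. It is written in Python with only the standard library as a language-neutral stand-in; it is not the API of CsQuery or HTML Agility Pack. The names `TextExtractor`, `InputValues`, `select_text`, `input_values` and `SAMPLE` are hypothetical, the "selector" is limited to a tag name plus an optional class, and the sketch assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextExtractor(HTMLParser):
    """Collects the text of each element matching a tag name and optional
    class, roughly what calling text() on each match of "tag.class" gives."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the element being collected
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1  # nested element inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (self.css_class is None or self.css_class in classes):
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:  # the matching element just closed
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


class InputValues(HTMLParser):
    """Maps each named <input> to its value attribute, similar to val()."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            if "name" in attributes:
                self.values[attributes["name"]] = attributes.get("value") or ""


def select_text(html, tag, css_class=None):
    parser = TextExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def input_values(html):
    parser = InputValues()
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical sample document, used only for illustration.
SAMPLE = """
<html><body>
  <div class="price">Total: <b>42</b> USD</div>
  <div class="note">ignored</div>
  <form><input name="user" value="alice"><input name="empty"></form>
</body></html>
"""

if __name__ == "__main__":
    print(select_text(SAMPLE, "div", "price"))
    print(input_values(SAMPLE))
```

A dedicated library with real CSS selector support is the better choice in practice; the sketch only shows that selector-style extraction does not, by itself, require executing any JavaScript.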

If you are basically trying to take some code that was written for the client and run it on a server (for example, for sophisticated or accelerated web scraping), I'd search around using those terms. For example, a related question discusses this, with answers that include PhantomJS, a headless WebKit browser stack, as well as some of the testing tools I have already mentioned. For web scraping, I would imagine you can live without it all being in .NET, and that may be the only reasonable answer anyway.