C# version of HTML Tidy? Tags: tidy, version, html

2023-09-03 23:40:23 Author: 爱人你曾在我心上

I am just looking for a really easy way to clean up some HTML (possibly with embedded JavaScript code). I tried two different HTML Tidy .NET ports and both are throwing exceptions...

Sorry, by "clean" I mean "indent". The HTML is not malformed at all; it's XHTML Strict.
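XHTML Strict is already well-formed XML, so the built-in System.Xml.Linq classes may be enough to re-indent it without any SGML parser. This is an alternative technique and does not come from the original thread. The names `XhtmlIndenter` and `FormatXhtml` are hypothetical. The sketch assumes the input really is well-formed XML, and it has several side effects. Malformed markup throws an `XmlException`. The DOCTYPE is skipped, so no external DTD is fetched, but the DOCTYPE is also dropped from the output. DTD-defined entities such as `&nbsp;` will fail. Whitespace-only text between elements is discarded, which can occasionally matter between inline elements.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: re-indents well-formed XHTML using only System.Xml.
public static class XhtmlIndenter
{
    public static string FormatXhtml(string input)
    {
        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE so the external XHTML DTD is never downloaded.
            // Side effect: the DOCTYPE is not written back out.
            DtdProcessing = DtdProcessing.Ignore,
            // Drop the existing whitespace-only nodes so the output can be re-indented cleanly.
            IgnoreWhitespace = true
        };

        using (var reader = XmlReader.Create(new StringReader(input), settings))
        {
            XDocument doc = XDocument.Load(reader);
            // XDocument.ToString() indents two spaces per level and omits the XML declaration.
            return doc.ToString();
        }
    }
}
```

Inline `<script>` content wrapped in a CDATA section, as is usual in XHTML, is kept as CDATA by `XDocument`. This helps with the embedded JavaScript mentioned above.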

I finally got something working with SGML, but this is seriously the most ridiculous chunk of code ever to indent some HTML.

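using System.IO;
using System.Xml;
using Sgml; // SgmlReader library

internal static class HtmlFormatter
{
    public static string FormatHtml(string input)
    {
        // SgmlReader parses the HTML leniently and exposes it as an XmlReader.
        using (var sgml = new SgmlReader { DocType = "HTML", InputStream = new StringReader(input) })
        using (var sw = new StringWriter())
        {
            // XmlTextWriter re-serializes every node with two-space indentation.
            using (var xw = new XmlTextWriter(sw) { Indentation = 2, Formatting = Formatting.Indented })
            {
                sgml.Read();
                while (!sgml.EOF)
                    xw.WriteNode(sgml, true);
            }
            // sw must still be in scope here; disposing xw has flushed everything into it.
            return sw.ToString();
        }
    }
}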
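If the SGML parser is only there to provide an `XmlReader` over the input, the same read-then-`WriteNode` loop may work with the framework's own `XmlReader`, since the input is XHTML Strict. This is a hedged sketch rather than a drop-in replacement, and the name `XhtmlStreamIndenter` is hypothetical. The usual strict-XML caveats apply: malformed markup throws, the DOCTYPE is skipped and not written back, and DTD entities such as `&nbsp;` are rejected. It uses `XmlWriter.Create` with `XmlWriterSettings` instead of `XmlTextWriter`, which is the construction style the .NET documentation recommends.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: streaming re-indent of well-formed XHTML, no SGML parser.
public static class XhtmlStreamIndenter
{
    public static string FormatXhtml(string input)
    {
        var readerSettings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // don't fetch the external XHTML DTD
            IgnoreWhitespace = true               // discard old indentation so it can be redone
        };
        var writerSettings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "  ",
            OmitXmlDeclaration = true
        };

        using (var sw = new StringWriter())
        {
            using (var reader = XmlReader.Create(new StringReader(input), readerSettings))
            using (var writer = XmlWriter.Create(sw, writerSettings))
            {
                // WriteNode copies the current node (and an element's whole subtree),
                // then advances the reader past it; loop until the document is consumed.
                reader.Read();
                while (!reader.EOF)
                    writer.WriteNode(reader, true);
            }
            // The writer has been disposed (and flushed); sw is still open and in scope.
            return sw.ToString();
        }
    }
}
```

Unlike the `XDocument` approach, this never builds an in-memory tree, so it should scale better to large pages.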

Solution

The latest C# wrapper for HTML Tidy is by Mark Beaton, and it seems considerably more up to date than the ports you linked (from 2003). It's also worth noting that Mark provides prebuilt libtidy binaries to reference, so you don't have to pull them from the official site. That should take care of nicely organising and validating your HTML.

TidyManaged (source), TidyManaged/libtidy builds