How can I strip HTML from text in .NET? Tags: text, .NET, HTML

2023-09-03 00:11:58 Author: 拥之则安伴之则暖

I have an asp.net web page that has a TinyMCE box. Users can format text and send the HTML to be stored in a database.

On the server, I would like to strip the HTML from the text so I can store only the plain text in a full-text indexed column for searching.

It's a breeze to strip the HTML on the client using jQuery's text() function, but I would really rather do this on the server. Are there any existing utilities that I can use for this?

See my answer.

Recommended answer

I downloaded the HtmlAgilityPack and created this function:

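// Required namespaces (HtmlAgilityPack comes from the NuGet package of the same name)
using System.Text.RegularExpressions; // Regex
using System.Web;                     // HttpUtility

static string StripHtml(string html)
{
    // nothing to strip from null or empty input
    if (string.IsNullOrEmpty(html)) return string.Empty;

    // insert a space after every tag, so that words from adjacent elements do not run together
    html = html.Replace(">", "> ");

    // parse the HTML
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);

    // take the text content of the whole document and decode entities such as &amp;
    string text = HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);

    // collapse each run of whitespace into a single space and trim both ends
    return Regex.Replace(text, @"\s+", " ").Trim();
}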
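
One thing to watch with the function above: as far as I know, InnerText on the document node also returns the contents of script and style elements, so any inline JavaScript or CSS in the stored HTML would end up in the search index. The following is only a sketch of one way to handle that, not part of the original answer. The class name HtmlText and the method name StripHtmlAndScripts are made up for illustration, and WebUtility.HtmlDecode is swapped in for HttpUtility.HtmlDecode so that no System.Web reference is needed.

```csharp
using System;
using System.Linq;
using System.Net;                      // WebUtility
using System.Text.RegularExpressions;  // Regex
using HtmlAgilityPack;                 // NuGet package: HtmlAgilityPack

public static class HtmlText
{
    // Hypothetical helper: same approach as StripHtml, but script and style
    // elements are removed before the text is extracted.
    public static string StripHtmlAndScripts(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Space after each tag so neighbouring words do not run together.
        var doc = new HtmlDocument();
        doc.LoadHtml(html.Replace(">", "> "));

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            // Copy to a list first so removal never disturbs the enumeration.
            foreach (var node in hidden.ToList())
                node.Remove();
        }

        // Decode entities, then collapse whitespace runs and trim.
        string text = WebUtility.HtmlDecode(doc.DocumentNode.InnerText);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        // Expected to print: Hello world
        Console.WriteLine(StripHtmlAndScripts(
            "<p>Hello <b>world</b></p><script>alert(1)</script>"));
    }
}
```

Because a space is inserted after every tag, punctuation that directly follows an inline element picks up a stray space: "<b>world</b>!" comes out as "world !". That is usually harmless for a full-text index, but worth knowing if the stripped text is ever shown to users.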