I have an HTML string like this:
<p>First Sentence is this. Second sentence is this.</p>
I can remove the <p> tags from the above string using a regex function.
But how do I remove the &#160;-encoded characters from the above string in WinForms?
I don't want &#160; to be present in the output.
You can use XElement.Parse to get the node value like this:
var htmlString = "<p>First Sentence is this. Second sentence is this.</p>";
var result = System.Xml.Linq.XElement.Parse(htmlString).Value;
If some of the strings are not well-formed XML, or have no tags at all, you can wrap them in a dummy root element like this:
var htmlString = "<p>First Sentence is this. Second sentence is this.</p>";
var result = System.Xml.Linq.XElement.Parse("<root>" + htmlString + "</root>").Value;
Result: the text of the string with the <p> tags removed.
You may want to add error handling, but this is clearly better than using a regex.
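The error handling mentioned above could look roughly like the following. This is only a sketch: the class name HtmlText and the method name StripTags are made up for illustration, and falling back to the unmodified input on a parse failure is an assumption about what you would want, not part of the original answer.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class HtmlText
{
    // Hypothetical helper: returns the text content of an HTML fragment,
    // or the input unchanged if it cannot be parsed as XML.
    public static string StripTags(string html)
    {
        try
        {
            // Wrap in a dummy root so fragments with no tags,
            // or with several top-level tags, still parse.
            return XElement.Parse("<root>" + html + "</root>").Value;
        }
        catch (XmlException)
        {
            // Not well-formed XML, e.g. an unclosed tag or a named
            // HTML entity such as &nbsp; that XML does not define.
            return html;
        }
    }
}
```

Note that the XML parser decodes a numeric reference such as &#160; into the actual non-breaking space character (U+00A0), so the literal entity text disappears but the character itself stays in the result.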
Edit:
If this still does not work and you only want to handle the entities, you can use the System.Web.HttpUtility.HtmlDecode method to replace HTML entities with their literal characters:
var final_result = System.Web.HttpUtility.HtmlDecode(result);
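If the project does not reference System.Web (common in WinForms apps), System.Net.WebUtility.HtmlDecode is an alternative that does the same decoding. The sketch below also swaps the decoded non-breaking space for a regular space, on the assumption that this is what "removing &#160;" should mean here. The names EntityCleaner and DecodeAndNormalize are hypothetical.

```csharp
using System;
using System.Net;

static class EntityCleaner
{
    // Hypothetical helper: decodes HTML entities, then replaces the
    // non-breaking space (U+00A0) with an ordinary space.
    public static string DecodeAndNormalize(string text)
    {
        if (text == null)
        {
            return null;
        }

        // Turns &#160; (and &nbsp;) into the U+00A0 character.
        string decoded = WebUtility.HtmlDecode(text);

        // Assumption: a plain space is the desired replacement.
        return decoded.Replace('\u00A0', ' ');
    }
}
```

For example, EntityCleaner.DecodeAndNormalize(result) could be applied to the value returned by XElement.Parse(...).Value, or directly to text that still contains the raw entities.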