C# HtmlEncode - ISO-8859-1 Entity Names vs. Numbers

2023-09-04 00:14:44 Author: S丶ky丿神之队

According to the following table for the ISO-8859-1 standard, there seems to be an entity name and an entity number associated with each reserved HTML character.

So for example, for the character é :

Entity name: &eacute;

Entity number: &#233;

Similarly, for the character > :

Entity name: &gt;

Entity number: &#62;

For a given string, HttpUtility.HtmlEncode returns an HTML-encoded string, but I can't figure out how it works. Here is what I mean:

Console.WriteLine(HtmlEncode("é>"));
// Outputs &#233;&gt;

It seems to be using the entity number for the é character, but the entity name for the > character.

So does the HtmlEncode method really work with the ISO-8859-1 standard? If it does, is there a reason why it sometimes uses the entity name and other times the entity number? More importantly, can I force it to give me the entity name reliably?

EDIT : Thanks for the answers guys. I cannot decode the string before I perform the search though. Without getting into too many details, the text is stored in a SharePoint List and the "search" is done by SharePoint itself (using a CAML query). So basically, I can't.

I'm trying to think of a way to convert the entity numbers into names. Is there a function in .NET that does that? Or any other idea?

Recommended answer

That's how the method has been implemented. For a few known characters it uses the corresponding named entity, and for characters from U+00A0 to U+00FF it writes the character code as a number (decimal, not hex, since the excerpt calls ToString on the int value). There is not much you can do to modify this behavior. Excerpt from the implementation of System.Net.WebUtility.HtmlEncode (as seen with Reflector):

...
if (ch <= '>')
{
    switch (ch)
    {
        case '&':
        {
            output.Write("&amp;");
            continue;
        }
        case '\'':
        {
            output.Write("&#39;");
            continue;
        }
        case '"':
        {
            output.Write("&quot;");
            continue;
        }
        case '<':
        {
            output.Write("&lt;");
            continue;
        }
        case '>':
        {
            output.Write("&gt;");
            continue;
        }
    }
    output.Write(ch);
    continue;
}
if ((ch >= '\x00a0') && (ch < '\x0100'))
{
    output.Write("&#");
    output.Write(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
    output.Write(';');
}
...
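
To see both branches of the excerpt at work, here is a minimal sketch, assuming a console project where System.Net.WebUtility is available. The names HtmlEncodeDemo and Encode are made up for illustration and are not part of .NET.

```csharp
using System;
using System.Net;

public static class HtmlEncodeDemo
{
    // Hypothetical thin wrapper around the framework encoder,
    // so the behaviour can be checked from Main or from a test.
    public static string Encode(string input)
    {
        return WebUtility.HtmlEncode(input);
    }

    public static void Main()
    {
        // '>' goes through the switch and gets a named entity;
        // 'é' (U+00E9) falls in the 0xA0..0xFF range and gets a decimal number.
        Console.WriteLine(Encode("é>"));   // &#233;&gt;

        // '&' is one of the few characters with a hard-coded named entity.
        Console.WriteLine(Encode("a&b"));  // a&amp;b
    }
}
```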

That being said, you shouldn't need to care, as this method will always produce valid, safe, and correctly encoded HTML.
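
If named entities are still required (for example, to match text that is stored in that form elsewhere), one possible workaround is to post-process the encoder's output. The following is only a sketch under assumptions: NamedEntityEncoder, ToNamedEntities and the Names table are hypothetical, not .NET APIs, and the table lists just a handful of Latin-1 entities that you would have to extend yourself.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class NamedEntityEncoder
{
    // Assumption: a hand-written, deliberately incomplete map from
    // character code to HTML entity name. Extend it as needed.
    private static readonly Dictionary<int, string> Names = new Dictionary<int, string>
    {
        { 160, "nbsp" },   { 169, "copy" },   { 224, "agrave" },
        { 232, "egrave" }, { 233, "eacute" }, { 252, "uuml" },
    };

    // Matches decimal numeric entities such as &#233;
    private static readonly Regex DecimalEntity = new Regex("&#([0-9]+);");

    // Replaces each decimal entity that has a known name; leaves the rest untouched.
    public static string ToNamedEntities(string encoded)
    {
        return DecimalEntity.Replace(encoded, match =>
        {
            bool known = int.TryParse(match.Groups[1].Value, out int code)
                         && Names.ContainsKey(code);
            return known ? "&" + Names[code] + ";" : match.Value;
        });
    }

    // Encodes with the framework method first, then swaps numbers for names.
    public static string Encode(string text)
    {
        return ToNamedEntities(WebUtility.HtmlEncode(text));
    }
}
```

With this table, NamedEntityEncoder.Encode("é>") would give &eacute;&gt;, while an entity that is not listed, such as &#39;, passes through unchanged.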