
Why does .NET use UTF-16 for strings but UTF-8 as the default encoding for saving files?

2023-09-02 11:58:02 Author: 遗失的过去

From here:

Essentially, string uses the UTF-16 character encoding form.

But when saving with StreamWriter:

This constructor creates a StreamWriter with UTF-8 encoding without a Byte-Order Mark (BOM).
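To make "without a Byte-Order Mark" concrete, here is a minimal sketch in Python (not .NET code). It assumes Python's standard `utf-8` and `utf-8-sig` codecs as stand-ins for "UTF-8 without BOM" and "UTF-8 with BOM"; the helper name `encode_for_file` is made up for illustration.

```python
import codecs

def encode_for_file(text, with_bom=False):
    """Encode text as UTF-8, optionally prefixed with the UTF-8 BOM.

    Hypothetical helper: 'utf-8-sig' is Python's codec that emits the BOM.
    """
    codec = "utf-8-sig" if with_bom else "utf-8"
    return text.encode(codec)

plain = encode_for_file("hi")
marked = encode_for_file("hi", with_bom=True)

print(plain)                  # b'hi'
print(marked)                 # b'\xef\xbb\xbfhi'
print(len(codecs.BOM_UTF8))   # 3
```

The BOM adds three bytes (EF BB BF) at the start of the file and nothing else, so a BOM-less file containing only ASCII text is byte-for-byte identical to a plain ASCII file.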

I've seen a sample, and it looks like UTF-8 is smaller for some strings while UTF-16 is smaller for others.
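The size difference is easy to reproduce. The following is a rough sketch in Python rather than .NET; the sample strings and the helper name `encoded_sizes` are assumptions chosen for illustration, not the sample the question refers to.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM in either case."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Illustrative samples: plain ASCII, Latin text with one accent, and CJK text.
samples = {
    "ascii": "hello world",
    "accented": "h\u00e9llo",
    "cjk": "你好世界",
}

for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# ascii: UTF-8 = 11 bytes, UTF-16 = 22 bytes
# accented: UTF-8 = 6 bytes, UTF-16 = 10 bytes
# cjk: UTF-8 = 12 bytes, UTF-16 = 8 bytes
```

ASCII-heavy text is about half the size in UTF-8, while text made mostly of CJK characters (3 bytes each in UTF-8, 2 in UTF-16) comes out smaller in UTF-16.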

So why does .NET use UTF-16 as the default encoding for strings but UTF-8 for saving files?

Thank you.

P.S. I've already read the famous article.

Recommended answer

If you're happy ignoring surrogate pairs (or equivalently, the possibility of your app needing characters outside the Basic Multilingual Plane), UTF-16 has some nice properties, basically because every code unit has the same size. You know how much space to allocate for a given number of code units, and you can index directly into that space to access the nth code unit. Those aren't usually important aspects for a text file - although they certainly are if you want to use random access - but size generally is important for text files.
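The two properties mentioned above (predictable allocation and direct indexing) can be sketched as follows. This is Python working on a raw UTF-16 byte buffer, not how .NET implements strings internally; the function names and the sample text are assumptions for illustration.

```python
import struct

def utf16_buffer(text):
    """Encode text as UTF-16 little-endian bytes, without a BOM."""
    return text.encode("utf-16-le")

def nth_code_unit(buffer, n):
    """Read the n-th 16-bit code unit at a fixed offset of 2 * n bytes."""
    (unit,) = struct.unpack_from("<H", buffer, 2 * n)
    return unit

text = "na\u00efve caf\u00e9"   # BMP-only sample text, 10 characters
buf = utf16_buffer(text)

# Space needed is known up front: 2 bytes per code unit.
print(len(buf), 2 * len(text))        # 20 20

# Direct indexing: no scan of the preceding characters is required.
print(hex(nth_code_unit(buf, 2)))     # 0xef
print(chr(nth_code_unit(buf, 2)))     # ï
```

With UTF-8 the same lookup would need a scan from the start of the buffer, because earlier characters may occupy anywhere from one to four bytes.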

Consider the primitive type char. If we use UTF-8 as the in-memory representation and want to cope with all Unicode characters, how big should that be? It could be up to 4 bytes (6 in the original UTF-8 design)... which means we'd always have to allocate that many bytes. At that point we might as well use UTF-32!
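To see how the per-character size varies, here is a small Python sketch covering the Unicode range in use today (up to U+10FFFF). The helper name `utf8_length` and the chosen code points are assumptions for illustration.

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 needs to encode a single code point."""
    return len(chr(code_point).encode("utf-8"))

# One code point from each UTF-8 length class.
for cp in (0x41, 0xE9, 0x4F60, 0x1F600):
    utf32_len = len(chr(cp).encode("utf-32-le"))
    print(f"U+{cp:04X}: UTF-8 = {utf8_length(cp)} bytes, UTF-32 = {utf32_len} bytes")

# U+0041: UTF-8 = 1 bytes, UTF-32 = 4 bytes
# U+00E9: UTF-8 = 2 bytes, UTF-32 = 4 bytes
# U+4F60: UTF-8 = 3 bytes, UTF-32 = 4 bytes
# U+1F600: UTF-8 = 4 bytes, UTF-32 = 4 bytes

# The highest valid code point still fits in 4 bytes.
print(utf8_length(0x10FFFF))   # 4
```

A fixed-size char type that could hold any single UTF-8 sequence in this range would therefore need 4 bytes, which is exactly the size of a UTF-32 code unit.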

Of course, we could use UTF-32 as the char representation but UTF-8 as the string representation, converting as we go.

Where UTF-16 falls down, of course, is that the number of code units per Unicode character is variable... but in my experience relatively few apps actually handle non-BMP characters correctly anyway.
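The variable length shows up as soon as a character lies outside the BMP. A minimal Python sketch, with the helper name `utf16_unit_count` and the sample characters chosen for illustration:

```python
import struct

def utf16_unit_count(text):
    """Count the 16-bit UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2

bmp_char = "\u4f60"           # inside the Basic Multilingual Plane
non_bmp_char = "\U0001F600"   # outside the BMP: encoded as a surrogate pair

print(utf16_unit_count(bmp_char))       # 1
print(utf16_unit_count(non_bmp_char))   # 2

# The two code units of the pair: a high surrogate followed by a low surrogate.
high, low = struct.unpack("<2H", non_bmp_char.encode("utf-16-le"))
print(hex(high), hex(low))              # 0xd83d 0xde00
```

Code that treats each code unit as one character will report a length of 2 for that single character and can split it in half when indexing or truncating, which is the kind of non-BMP bug the answer alludes to.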

(Additionally, I believe Windows uses UTF-16 for Unicode data, and it makes sense for .NET to follow suit for interop reasons. That just pushes the question back one step, though.)
