Why does .NET use UTF-16 for strings but UTF-8 as the default encoding when saving files?

2023-09-02 11:58:02 Author: 遗失的过去

From here:

Essentially, string uses the UTF-16 character encoding form

But when saving via StreamWriter:

This constructor creates a StreamWriter with UTF-8 encoding without a Byte-Order Mark (BOM),
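To see what "without a BOM" means at the byte level, here is a minimal Python sketch. Python is used only to show the bytes and does not model the StreamWriter API: its "utf-8" codec writes no BOM, while "utf-8-sig" prepends the three-byte UTF-8 BOM EF BB BF. The helper encode_for_file is hypothetical.

```python
import codecs

def encode_for_file(text: str, with_bom: bool) -> bytes:
    """Hypothetical helper: the bytes a UTF-8 text file would contain."""
    # Python's "utf-8" codec never writes a BOM; "utf-8-sig" prepends EF BB BF.
    codec = "utf-8-sig" if with_bom else "utf-8"
    return text.encode(codec)

no_bom = encode_for_file("abc", with_bom=False)
with_bom = encode_for_file("abc", with_bom=True)

print(no_bom.hex(" "))    # 61 62 63
print(with_bom.hex(" "))  # ef bb bf 61 62 63
print(with_bom.startswith(codecs.BOM_UTF8))  # True
```

Apart from that optional three-byte prefix, the encoded text is identical either way.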

I've seen this sample:

And it looks like UTF-8 is smaller for some strings, while UTF-16 is smaller for others.
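The size difference is easy to reproduce. This Python sketch (the helper encoded_sizes is hypothetical) counts the bytes each encoding needs for a few sample strings. It uses the little-endian UTF-16 codec so that no BOM is counted.

```python
from typing import Tuple

def encoded_sizes(text: str) -> Tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, not counting any BOM."""
    utf8 = len(text.encode("utf-8"))
    # "utf-16-le" fixes the byte order, so Python does not prepend a BOM.
    utf16 = len(text.encode("utf-16-le"))
    return utf8, utf16

for sample in ["hello", "h\u00e9llo", "你好世界"]:
    utf8, utf16 = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

ASCII text is half the size in UTF-8 (5 vs 10 bytes for "hello"). An accented Latin letter costs UTF-8 one extra byte (6 vs 10). CJK text is smaller in UTF-16 (12 vs 8 bytes for "你好世界"), because each of those characters takes 3 bytes in UTF-8 but only 2 in UTF-16.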

So why does .NET use UTF-16 as the default encoding for strings, but UTF-8 for saving files?

Thanks.

P.S. I've already read the famous article.

Recommended answer

If you're happy ignoring surrogate pairs (or equivalently, the possibility of your app needing characters outside the Basic Multilingual Plane), UTF-16 has some nice properties. They basically come from every character then taking exactly one code unit of constant size (2 bytes). You know how much space to allocate for a given number of code units, and you can index directly into that space to access the nth code unit. Those aren't usually important aspects of a text file (although they certainly are if you want random access), but size generally is important for text files.
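Here is a minimal Python sketch of the random-access point. It assumes, as the answer does, that the text contains no surrogate pairs, and it also assumes the UTF-8 input is well-formed. The function names are hypothetical. In UTF-16LE the nth code unit always starts at byte offset 2 * n. In UTF-8 you have to walk over the preceding characters, because each one can take 1 to 4 bytes.

```python
def utf8_seq_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2          # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3          # 1110xxxx
    return 4              # 11110xxx (input assumed well-formed)

def nth_code_unit_utf16(data: bytes, n: int) -> int:
    """Constant time: code unit n always starts at byte offset 2 * n."""
    offset = 2 * n
    return int.from_bytes(data[offset:offset + 2], "little")

def nth_char_utf8(data: bytes, n: int) -> str:
    """Linear time: skip n variable-length sequences from the start."""
    pos = 0
    for _ in range(n):
        pos += utf8_seq_len(data[pos])
    length = utf8_seq_len(data[pos])
    return data[pos:pos + length].decode("utf-8")

text = "a你b好"
utf16 = text.encode("utf-16-le")
utf8 = text.encode("utf-8")
print(chr(nth_code_unit_utf16(utf16, 3)))  # 好
print(nth_char_utf8(utf8, 3))              # 好
```

Both calls find the same character. The UTF-16 lookup does one slice, and the UTF-8 lookup has to decode three lead bytes first.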

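Consider the primitive type char. If we used UTF-8 as the in-memory representation and wanted to cope with all Unicode characters, how big would a char have to be? Up to 4 bytes. The original UTF-8 design allowed sequences of up to 6 bytes, but RFC 3629 caps them at 4, matching the U+10FFFF limit of the Unicode code space. That means we'd always have to allocate 4 bytes. At that point we might as well use UTF-32!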
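The 4-byte ceiling can be checked directly. This sketch encodes the code points at each UTF-8 length boundary. The last one, U+10FFFF, is the highest code point Unicode defines, and Python refuses to create anything beyond it. Every one of these code points is exactly 4 bytes in UTF-32, so a fixed-size char wide enough for any UTF-8 sequence would be no smaller than a UTF-32 one. utf8_width and boundaries are illustrative names.

```python
def utf8_width(cp: int) -> int:
    """Number of bytes UTF-8 uses for a single code point."""
    return len(chr(cp).encode("utf-8"))

# Last/first code point of each UTF-8 length class, ending at the top of Unicode.
boundaries = [0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]

for cp in boundaries:
    utf32 = len(chr(cp).encode("utf-32-le"))  # always 4 bytes
    print(f"U+{cp:04X}: UTF-8 {utf8_width(cp)} bytes, UTF-32 {utf32} bytes")

try:
    chr(0x110000)  # one past the end of the Unicode code space
except ValueError:
    print("U+110000 does not exist, so 5- and 6-byte UTF-8 forms are never needed")
```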

Of course, we could use UTF-32 as the char representation, but UTF-8 in the string representation, converting as we go.
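The "UTF-32 char, UTF-8 string" alternative can be sketched as a small hypothetical class. Utf8String is not a real .NET or Python type. It stores UTF-8 bytes and hands out each character as a 32-bit code point while iterating, and it assumes the bytes came from encoding a valid str. The cost shows up in indexing: getting character n means decoding every character before it.

```python
from typing import Iterator

def _seq_len(lead: int) -> int:
    """Byte length of a UTF-8 sequence, read from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4

class Utf8String:
    """Hypothetical string type: UTF-8 in storage, 32-bit code points on access."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")

    def __iter__(self) -> Iterator[int]:
        # Convert "as we go": decode one UTF-8 sequence per character.
        pos = 0
        while pos < len(self._data):
            length = _seq_len(self._data[pos])
            yield ord(self._data[pos:pos + length].decode("utf-8"))
            pos += length

    def __getitem__(self, n: int) -> int:
        # No random access: reaching character n decodes everything before it.
        for i, cp in enumerate(self):
            if i == n:
                return cp
        raise IndexError(n)

    def byte_size(self) -> int:
        return len(self._data)

s = Utf8String("a你\U0001F600")
print([hex(cp) for cp in s])  # ['0x61', '0x4f60', '0x1f600']
print(s[2] == 0x1F600, s.byte_size())
```

For "a你😀" this stores only 8 bytes (1 + 3 + 4). However, s[2] returns 0x1F600 only after the first two characters have been decoded.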

Where UTF-16 falls down, of course, is that the number of code units per Unicode character is variable: one code unit for a BMP character, and a surrogate pair of two for anything above U+FFFF. But in my experience relatively few apps actually handle non-BMP characters correctly anyway.
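The following Python sketch shows how a non-BMP character breaks the one-code-unit-per-character assumption. It splits a string into UTF-16 code units, which are the same 16-bit values a .NET char holds. The helper utf16_code_units is hypothetical. U+1F600 (😀) becomes the surrogate pair D83D DE00. Code that counts or slices by code unit therefore sees two "characters", and cutting between them leaves a lone surrogate that does not decode.

```python
from typing import List

def utf16_code_units(text: str) -> List[int]:
    """Split text into 16-bit UTF-16 code units (what a .NET char holds)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

emoji = "\U0001F600"  # one code point outside the BMP
units = utf16_code_units(emoji)
print([f"{u:04X}" for u in units])  # ['D83D', 'DE00']

# Counting code units (like .NET's String.Length) gives 2 for one character.
print(len(units))  # 2

# Keeping only the first unit leaves a lone high surrogate, which is not valid text.
first_half = units[0].to_bytes(2, "little")
try:
    first_half.decode("utf-16-le")
except UnicodeDecodeError:
    print("lone high surrogate cannot be decoded")
```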

(Additionally, I believe Windows uses UTF-16 for Unicode data, and it makes sense for .NET to follow suit for interop reasons. That just pushes the question on one step though.)

 