What does ".NET Framework uses UTF-16 encoding by default" mean?

2023-09-03 09:35:29 Author: 痞子爱人

My study guide (for the 70-536 exam) says this twice in the text and encoding chapter, which comes right after the IO chapter.

All the examples so far are to do with simple file access using FileStream and StreamWriter.

It also says things like "If you don't know what encoding to use when you create a file, don't specify one and .NET will use UTF16" and "Specify different encodings using Stream constructor overloads".

Never mind the fact that the actual overloads are on the StreamWriter class but hey, whatever.

I am looking at StreamWriter in Reflector right now, and I am certain I can see that the default is actually UTF8NoBOM.

But none of this is listed in the errata. It's an old book (I checked the errata for both editions), so if it were wrong I would have thought someone would have picked up on it...

Makes me think maybe I didn't understand it.

So.....any ideas what it is talking about? Some other place where there is a default?

This just has me completely confused.

Recommended answer

"UTF-16" is an annoying term, as it has two meanings which are easily confused.

The first meaning is a series of 16-bit code units. Most of these correspond directly to the Unicode code point with the same number; characters outside the Basic Multilingual Plane (U+10000 upwards) are stored as two 16-bit code units, a high surrogate followed by a low surrogate.
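To make the surrogate mechanism concrete, here is a minimal sketch in Python (used purely for illustration; the function name `to_utf16_code_units` is hypothetical) that splits one Unicode code point into the 16-bit values a UTF-16 string would store.

```python
def to_utf16_code_units(code_point: int) -> list[int]:
    """Split a Unicode scalar value into one or two 16-bit UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x10000:
        # Inside the BMP: stored as a single unit with the same number
        return [code_point]
    # Outside the BMP: subtract 0x10000, leaving a 20-bit offset
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go into the low surrogate
    return [high, low]


print([hex(u) for u in to_utf16_code_units(0x20AC)])   # ['0x20ac'] (euro sign)
print([hex(u) for u in to_utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00'] (emoji)
```

The arithmetic is the standard UTF-16 surrogate scheme, so the result matches what Python's own `utf-16-be` codec produces for the same character.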

Many languages use UTF-16 in this sense for internal storage, including as their native string type. This is the usual source of phrases like ".NET (or Java) uses UTF-16 as its default encoding". .NET accesses the elements of such a UTF-16 string 16 bits at a time (i.e., at the implementation level, as a uint16).
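One visible consequence is that a string's length in .NET (`String.Length`) counts 16-bit code units, not characters, so a single emoji has length 2. Python's `len` counts code points instead, which makes the difference easy to show. The helper `utf16_length` below is a hypothetical name for this sketch.

```python
def utf16_length(s: str) -> int:
    """Number of 16-bit UTF-16 code units needed to store s."""
    # utf-16-le writes no BOM, so every 2 bytes is exactly one code unit
    return len(s.encode("utf-16-le")) // 2


for text in ["abc", "\u20ac", "\U0001F600"]:
    print(repr(text), "code points:", len(text), "UTF-16 code units:", utf16_length(text))
```

For `"\U0001F600"` this reports 1 code point but 2 code units, the same pair of surrogates a .NET string would hold.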

The next thing to consider is the encoding of such a UTF-16 string into linear bytes, for storage in a file or network stream. As always when you store larger numbers into bytes, there are two possible encodings: little-endian or big-endian. So you can use "UTF-16LE", the little-endian encoding of UTF-16 into bytes, or "UTF-16BE", the big-endian encoding.
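A short Python sketch of the two byte orders: the same two code units (0x0041 and 0x20AC) serialize to different byte sequences, and reading one as the other silently yields different but valid text rather than an error.

```python
text = "A\u20ac"  # "A€": two BMP characters, i.e. code units 0x0041 and 0x20AC

le = text.encode("utf-16-le")  # low byte of each code unit first
be = text.encode("utf-16-be")  # high byte of each code unit first

print(le.hex(" "))  # 41 00 ac 20
print(be.hex(" "))  # 00 41 20 ac

# Decoding big-endian bytes as little-endian raises no error; it just
# produces the wrong characters (U+4100 and U+AC20)
misread = be.decode("utf-16-le")
print(misread == text)  # False
```

That silent misreading is exactly why the byte order has to be known or signalled somehow.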

("UTF-16LE" is the more commonly used. Just to add more confusion to the flames, Windows gives it the deeply misleading and ambiguous encoding name "Unicode". In reality it is almost always better to use UTF-8 for file storage and network streams than either of UTF-16LE/BE.)

But if you don't know whether a bunch of bytes contains "UTF-16LE" or "UTF-16BE", you can use the trick of looking at the first code point to work it out. This code point, the Byte Order Mark (BOM, U+FEFF), is only valid when read one way around: byte-swapped it reads as U+FFFE, a noncharacter that never appears in real text, so you can't mistake one encoding for the other.
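A minimal sketch of that check in Python, using the BOM constants from the standard `codecs` module; the function name `sniff_utf16_order` is hypothetical. It assumes the data is UTF-16 of some order; a real sniffer that also handles UTF-32 would need to test for the UTF-32 LE BOM (`FF FE 00 00`) first, since it begins with the same two bytes.

```python
import codecs
from typing import Optional


def sniff_utf16_order(data: bytes) -> Optional[str]:
    """Return the codec name implied by a leading UTF-16 BOM, or None if absent."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    return None


payload = "\ufeffhi".encode("utf-16-be")  # BOM followed by text, big-endian
order = sniff_utf16_order(payload)
print(order, payload[2:].decode(order))  # utf-16-be hi
```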

This approach, of not caring what byte order you have but using a BOM to signal it, is usually referred to under the encoding name... "UTF-16".
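Python happens to name its codecs the same way, which makes the distinction easy to see: `utf-16-le` and `utf-16-be` are fixed byte orders with no BOM, while plain `utf-16` is the BOM-signalled variant described above. A small sketch:

```python
import codecs

# Encoding with plain "utf-16": Python writes a BOM first, so the bytes
# announce their own order and need no out-of-band label
data = "hi".encode("utf-16")
print(data[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
print(len(data))  # 6: a 2-byte BOM plus two 2-byte code units

# Decoding with plain "utf-16": the BOM selects the byte order and is stripped
from_be = "\ufeffhi".encode("utf-16-be")
from_le = "\ufeffhi".encode("utf-16-le")
print(from_be.decode("utf-16"), from_le.decode("utf-16"))  # hi hi
```

Both byte sequences differ, yet the same `utf-16` codec decodes each correctly because the BOM carries the order.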

So, when someone says "UTF-16", you can't tell whether they mean a sequence of 16-bit (short int) Unicode code units, or a sequence of bytes in unspecified order that will decode to one.

("UTF-32" has the same problem.)

"If you don't know what encoding to use when you create a file, don't specify one and .NET will use UTF16"

If that's the actual direct quote, it is a lie. Constructing a StreamWriter without an encoding argument is explicitly documented to give you UTF-8 (without a byte order mark).
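If you would rather check this empirically than in Reflector, look at the first bytes of a file the writer produced. Below is a minimal Python sketch (the helper `describe_bom` and the `BOMS` table are hypothetical names; Python's own `open()` stands in for a .NET writer in the demo). A file written by `new StreamWriter(path)` should report no BOM. Note that the absence of a BOM is consistent with UTF-8 but does not by itself prove it.

```python
import codecs
import os
import tempfile
from typing import Optional

# Known BOMs. UTF-32 LE is checked before UTF-16 LE because both start with FF FE.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def describe_bom(path: str) -> Optional[str]:
    """Name the BOM at the start of a file, or return None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


# Demo: write "Hi" with a few Python encodings and inspect what landed on disk
results = {}
with tempfile.TemporaryDirectory() as d:
    for enc in ("utf-8", "utf-8-sig", "utf-16"):
        path = os.path.join(d, enc + ".txt")
        with open(path, "w", encoding=enc) as f:
            f.write("Hi")
        results[enc] = describe_bom(path)
print(results)
```

Plain `utf-8` reports no BOM, `utf-8-sig` reports a UTF-8 BOM, and `utf-16` reports a UTF-16 BOM in whichever order Python wrote it.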