为什么BCL GZipStream(与StreamReader的)不可靠的检测数据错误与CRC32?不可靠、错误、数据、GZipStream

2023-09-02 10:29:10 作者:顾你周全;

在有一天,我遇到了这个问题GZipStream没有检测到损坏的数据(甚至CRC32通过)?其中(这很可能是一个重复,我百感交集关于这个问题,我是谁也加入了CRC32的标题之一,但现在回想起来,感觉出来的地方用的后余数)。探索这个问题有点我自己,我认为这个问题是更大的比其他的问题开始描绘。

The the other day I ran into the question GZipStream doesn't detect corrupt data (even CRC32 passes)? (Of which this might very well be a "duplicate", I have mixed feelings on the subject. I was also the one who added the CRC32 to the title, but in retrospect that feels out of place with the remainder of the post). After exploring the problem a bit on my own, I think that the issue is far greater than the other question initially portrays.

我扩大在其他的问题,并提出LINQPad下的测试code可运行,并试图更好地展示 CRC32(循环冗余检查)问题,如果它确实存在。 (由于code是只有一个的稍微修改的基于原可能的是测试设置/方法是有缺陷的或有另一个奇怪夸克/ PEBCAK两者。当)

I expanded upon the other question and made the test code runnable under LINQPad and attempt to better showcase the CRC32 (Cyclic Redundancy Check) issue, if it does indeed exist. (Since the code is only a slight modification based upon the original it is possible that the test setup/methodology is flawed or there is another strange quirk/PEBCAK both.)

结果是奇怪,因为损坏数据的不一定的引起(任意!)异常即可得到提升。注意的只是有时的做了CRC32校验似乎实际上是工作。导致上述指标外的范围/坏头的损坏字节/坏页脚可以忽略不计,因为我们可以假设这些被杀害DECOM pression的之前应用于CRC32校验(这是的完全可以理解的的,即使IndexOutOfRangeException应当可能被一个InvalidDataException包裹),所以,

The results are odd because the corrupt data is not always causing an (any!) Exception to be raised. Note that only sometimes does the CRC32 check seem to actually be "working". The corrupt bytes that cause the index-out-of-range/bad header/bad footer can be ignored because we can assume these are killing the decompression prior to the CRC32 check (which is perfectly understandable, even if the IndexOutOfRangeException should likely be wrapped by a InvalidDataException) so,

为什么CRC32校验显著不太可靠的比它应该是?(为什么如此有无效数据(也不例外)下面?)

Why is the CRC32 check significantly less reliable than it ought to be? (Why is it the case that there is "Invalid data (No Exception)" below?)

由于的GZip页脚包含的两个的的CRC32而 pssed uncom $ P $的数据的是长度似乎错误检测率应该是显著高于 的 - 也就是说,I 不会的期望的单个失败情况下的下方,一个数未被发现损坏流的要少得多。 (当然,这是很好的检测损坏的蒸汽尽快。但最终的安全防护装置的校验似乎是彻头彻尾的忽略的的情况下)

Since the GZip footer contains both the CRC32 and length of the uncompressed data is seems that the error detection rate should be "significantly higher" -- that is, I would not expect a single failing case below, much less a number of undetected corrupted streams. (Of course it's nice to detect a corrupt steam ASAP: but the final safe-guard checksum seems to be downright ignored in cases.)

格式为: CorruptByteIndex + FailedDetections:信息

0+0: System.IO.InvalidDataException:The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.
1+0: System.IO.InvalidDataException:The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.
2+0: System.IO.InvalidDataException:The compression mode specified in GZip header is unknown.
3+0: Good data (No Exception)
4+0: Good data (No Exception)
5+0: Good data (No Exception)
6+0: Good data (No Exception)
7+0: Good data (No Exception)
8+0: Good data (No Exception)
9+0: Good data (No Exception)
10+0: System.IO.InvalidDataException:Unknown block type. Stream might be corrupted.
11+1: Invalid data (No Exception)
12+1: System.IO.InvalidDataException:Found invalid data while decoding.
13+1: System.IO.InvalidDataException:Found invalid data while decoding.
14+1: System.IO.InvalidDataException:Found invalid data while decoding.
15+1: System.IO.InvalidDataException:Found invalid data while decoding.
16+1: System.IO.InvalidDataException:Found invalid data while decoding.
17+2: Invalid data (No Exception)
18+2: System.IO.InvalidDataException:Found invalid data while decoding.
19+2: System.IndexOutOfRangeException:Index was outside the bounds of the array.
20+2: System.IndexOutOfRangeException:Index was outside the bounds of the array.
21+3: Invalid data (No Exception)
22+3: System.IndexOutOfRangeException:Index was outside the bounds of the array.
23+3: System.IndexOutOfRangeException:Index was outside the bounds of the array.
24+4: Invalid data (No Exception)
25+4: System.IndexOutOfRangeException:Index was outside the bounds of the array.
26+4: System.IndexOutOfRangeException:Index was outside the bounds of the array.
27+4: System.IndexOutOfRangeException:Index was outside the bounds of the array.
28+4: System.IndexOutOfRangeException:Index was outside the bounds of the array.
29+5: Invalid data (No Exception)
30+5: System.IndexOutOfRangeException:Index was outside the bounds of the array.
31+6: Invalid data (No Exception)
32+7: Invalid data (No Exception)
33+7: System.IndexOutOfRangeException:Index was outside the bounds of the array.
34+7: System.IndexOutOfRangeException:Index was outside the bounds of the array.
35+7: System.IndexOutOfRangeException:Index was outside the bounds of the array.
36+8: Invalid data (No Exception)
37+8: System.IndexOutOfRangeException:Index was outside the bounds of the array.
38+8: System.IndexOutOfRangeException:Index was outside the bounds of the array.
39+9: Invalid data (No Exception)
40+9: System.IndexOutOfRangeException:Index was outside the bounds of the array.
41+9: System.IndexOutOfRangeException:Index was outside the bounds of the array.
42+10: Invalid data (No Exception)
43+10: System.IO.InvalidDataException:Found invalid data while decoding.
44+10: System.IndexOutOfRangeException:Index was outside the bounds of the array.
45+10: System.IO.InvalidDataException:Found invalid data while decoding.
46+11: Invalid data (No Exception)
47+11: System.IndexOutOfRangeException:Index was outside the bounds of the array.
48+11: System.IndexOutOfRangeException:Index was outside the bounds of the array.
49+11: System.IndexOutOfRangeException:Index was outside the bounds of the array.
50+12: Invalid data (No Exception)
51+12: System.IndexOutOfRangeException:Index was outside the bounds of the array.
52+12: System.IndexOutOfRangeException:Index was outside the bounds of the array.
53+13: Invalid data (No Exception)
54+13: System.IndexOutOfRangeException:Index was outside the bounds of the array.
55+14: Invalid data (No Exception)
56+14: System.IndexOutOfRangeException:Index was outside the bounds of the array.
57+15: Invalid data (No Exception)
58+15: System.IndexOutOfRangeException:Index was outside the bounds of the array.
59+15: System.IndexOutOfRangeException:Index was outside the bounds of the array.
60+16: Invalid data (No Exception)
61+17: Invalid data (No Exception)
62+18: Invalid data (No Exception)
63+19: Invalid data (No Exception)
64+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
65+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
66+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
67+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
68+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
69+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
70+19: System.IO.InvalidDataException:Found invalid data while decoding.
71+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
72+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
73+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
74+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
75+19: System.IO.InvalidDataException:Found invalid data while decoding.
76+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
77+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
78+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
79+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
80+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
81+19: System.IO.InvalidDataException:Found invalid data while decoding.
82+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
83+20: Invalid data (No Exception)
84+21: Invalid data (No Exception)
85+22: Invalid data (No Exception)
86+22: System.IndexOutOfRangeException:Index was outside the bounds of the array.
87+23: Invalid data (No Exception)
88+24: Invalid data (No Exception)
89+25: Invalid data (No Exception)
90+25: System.IndexOutOfRangeException:Index was outside the bounds of the array.
91+26: Invalid data (No Exception)
92+26: System.IO.InvalidDataException:Found invalid data while decoding.
93+26: System.IndexOutOfRangeException:Index was outside the bounds of the array.
94+27: Invalid data (No Exception)
95+27: System.IndexOutOfRangeException:Index was outside the bounds of the array.
96+27: System.IndexOutOfRangeException:Index was outside the bounds of the array.
97+28: Invalid data (No Exception)
98+28: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
99+28: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
100+29: Invalid data (No Exception)
101+30: Invalid data (No Exception)
102+31: Invalid data (No Exception)
103+32: Invalid data (No Exception)
104+32: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
105+33: Invalid data (No Exception)
106+34: Invalid data (No Exception)
107+35: Invalid data (No Exception)
108+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
109+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
110+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
111+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
112+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
113+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
114+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
115+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
116+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
117+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
118+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
119+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
120+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
121+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
122+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
123+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
124+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
125+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
126+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
127+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
128+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
129+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
130+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
131+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
132+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
133+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
134+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
135+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
136+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
137+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
138+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
139+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
140+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
141+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
142+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
143+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
144+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
145+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
146+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
147+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
148+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
149+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
150+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
151+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
152+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
153+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
154+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
155+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
156+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
157+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
158+36: Invalid data (No Exception)
159+36: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
160+36: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
161+37: Invalid data (No Exception)
162+38: Invalid data (No Exception)
163+39: Invalid data (No Exception)
164+40: Invalid data (No Exception)
165+41: Invalid data (No Exception)
166+41: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
167+41: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
168+41: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
169+41: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
170+41: System.IO.InvalidDataException:The stream size in GZip footer does not match the real stream size.
171+41: System.IO.InvalidDataException:The stream size in GZip footer does not match the real stream size.
172+41: System.IO.InvalidDataException:The stream size in GZip footer does not match the real stream size.
173+41: System.IO.InvalidDataException:The stream size in GZip footer does not match the real stream size.

下面是测试这是在LINQPad copy'n'paste可运行(对.NET 3.5和4,使用为C#语句模式):

Here is the test which is copy'n'paste runnable in LINQPad (for .NET 3.5 and 4, use "as C# statements" mode):

   string sample = "This is a compression test of microsoft .net gzip compression method and decompression methods";
   var encoding = new ASCIIEncoding();
   var data = encoding.GetBytes(sample);
   string sampleOut = null;
   byte[] cmpData;

   // Compress 
   using (var cmpStream = new MemoryStream())
   {
      using (var hgs = new System.IO.Compression.GZipStream(cmpStream, System.IO.Compression.CompressionMode.Compress))
      {
         hgs.Write(data, 0, data.Length);
      }
      cmpData = cmpStream.ToArray();
   }

   int corruptBytesNotDetected = 0;

   // corrupt data byte by byte
   for (var byteToCorrupt = 0; byteToCorrupt < cmpData.Length; byteToCorrupt++)
   {
      var corruptData = new List<byte>(cmpData).ToArray();
      // corrupt the data
      corruptData[byteToCorrupt]++;

      using (var decomStream = new MemoryStream(corruptData))
      {
         using (var hgs = new System.IO.Compression.GZipStream(decomStream, System.IO.Compression.CompressionMode.Decompress))
         {
            using (var reader = new StreamReader(hgs))
            {
               string message;
               try
               {
                  sampleOut = reader.ReadToEnd();

                  // if we get here, the corrupt data was not detected by GZipStream
                  // ... okay so long as the correct data is extracted

                  if (!sample.SequenceEqual(sampleOut)) {
                    corruptBytesNotDetected++;
                    message = "Invalid data (No Exception)";
                  } else {
                    message = "Good data (No Exception)";
                  }
               }
               catch(Exception ex)
               {
                    message = (ex.GetType() + ":" + ex.Message);
               }
               string.Format("{0}+{1}: {2}",
                    byteToCorrupt, corruptBytesNotDetected, message).Dump();
            }
         }
      }

   }

下面是COM pressed中的 .NET 3.5 的(GZipStream是出了名的差COM pressing小型有效载荷数据,但它是一个不会解决的问题,因为流在技术上仍有效):

Here is the compressed data in .NET 3.5 (GZipStream is notoriously bad at "compressing" small payloads but it is a "Won't Fix" issue because the stream is still technically valid):

1F 8B 08 00 00 00 00 00 04 00 ED BD 07 60 1C 49 96 25 26 2F
6D CA 7B 7F 4A F5 4A D7 E0 74 A1 08 80 60 13 24 D8 90 40 10
EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A 81 CA 65 56 65 5D
66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B
9D 4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80
AA C8 1F 3F 7E 7C 1F 3F 22 DE CC 8B 26 A5 FF 65 E9 B4 5A AC
EA BC 69 8A 6A 99 B6 79 D3 A6 D5 79 BA 28 A6 75 D5 54 E7 6D
3A 5E E6 6D 7A F1 83 62 15 B4 5B E4 ED BC 9A A5 D9 72 96 CE
F2 FE 17 CD FF 03 5C 51 5E 27 5E 00 00 00

(而且,只是傻笑,在.NET 4中产生一个稍大/不同的COM pressed流。)

(And, just for giggles, in .NET 4 it generates a slightly larger/different compressed stream.)

1F 8B 08 00 00 00 00 00 04 00 EC BD 07 60 1C 49 96 25 26 2F
6D CA 7B 7F 4A F5 4A D7 E0 74 A1 08 80 60 13 24 D8 90 40 10
EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A 81 CA 65 56 65 5D
66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B
9D 4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80
AA C8 1F 3F 7E 7C 1F 3F 22 DE CC 8B 26 A5 FF 65 E9 B4 5A AC
EA BC 69 8A 6A 99 B6 79 D3 A6 D5 79 BA 28 A6 75 D5 54 E7 6D
3A 5E E6 6D 7A F1 83 62 15 B4 5B E4 ED BC 9A A5 D9 72 96 CE
F2 FE 17 CD FF 13 00 00 FF FF 5C 51 5E 27 5E 00 00 00

补充说明:

试验可的微妙的缺陷的这种情况。当GZipStream未能检测腐败(无异常),然后从StreamReader的读出的数据是(一个空字符串):在这种情况下,为什么 ReadToEnd()的没有的引发异常(IOException异常或其他)?

The test may be subtly flawed in this case. When the GZipStream "fails to detect corruption" (no Exception) then the data read from the StreamReader is "" (an empty string): In that case, why does ReadToEnd() not raise an Exception (IOException or otherwise)?

是不是这样的没有的GZipStream而是StreamReader的是古怪这里还是这仍然是一个问题,与GZipStream(不抛出异常)?有一些正确的方法来可靠地处理这个用例? (考虑一下,当从当前位置输入流中的真的是的空。)

Is it thus not GZipStream but rather the StreamReader that is "quirky" here or is this still an issue with the GZipStream (for not throwing an Exception)? Is there some correct way to handle this use-case reliably? (Consider when the input stream from the current position really is empty.)

推荐答案

.NET [4和previous]用户不应该使用Microsoft提供的GZipStream或DeflateStream类在任何情况下,除非微软完全替代他们一些作品。使用DotNetZip库,而不是。

Preface

.NET [4 and previous] users should not use the Microsoft-provided GZipStream or DeflateStream classes under any circumstances, unless Microsoft replaces them completely with something that works. Use the DotNetZip library instead.

在.NET Framework 4.5和更高版本有固定的COM pression问题,GZipStream和DeflateStream使用zlib的那些版本。我不知道下面提到的CRC问题已得到修复。

The .NET Framework 4.5 and later have fixed the compression problem, and GZipStream and DeflateStream use zlib in those versions. I do not know if the CRC problem referenced below has been fixed.

在CRC错误不仅不固定,但微软已经决定,他们won't修复吧!

The CRC bug is not only not fixed, but Microsoft has decided that they won't fix it!

&NBSP;

我在响应为什么我的C#gzip的产生比提琴手或PHP的大文件?表明,这种行为不反映了一个正确实施GZIP腐败的检测。在所有测试的情况下,将通过适当的实施来检测错误。 (这种反应还指出,为什么七个案件产生良好的数据。)

My response in Why does my C# gzip produce a larger file than Fiddler or PHP? shows that this behavior does not reflect a proper implementation of gzip corruption detection. In all of the cases tested, the error would be detected by a proper implementation. (That response also notes why seven of the cases result in good data.)

另一个问题是:它如何COM pression的方法,从一个94字节的字符串产生174个字节的输出呢?尤其是看到如何字符串为com pressible - GZIP它减少到84个字节,即使有一个18字节的头和尾的开销。你能提供的是174个字节的十六进制转储?

Another question is: how does this "compression" method produce 174 bytes of output from a 94-byte string? Especially seeing as how the string is compressible -- gzip reduces it to 84-bytes, even with the overhead of an 18-byte header and trailer. Can you provide a hex dump of that 174 bytes?