I have to sync large files across some machines. The files can be up to 6GB in size. The sync will be done manually every few weeks. I can't take the filename into consideration because it can change at any time.
My plan is to create checksums on the destination PC and on the source PC, and then copy every file whose checksum is not already present at the destination. My first attempt was something like this:
using System;
using System.IO;
using System.Security.Cryptography;

private static string GetChecksum(string file)
{
    using (FileStream stream = File.OpenRead(file))
    {
        SHA256Managed sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(stream);
        // Convert the hash bytes to an uppercase hex string
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}
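The copy-by-checksum plan described above could be sketched roughly like this. This is only an illustration, not part of the original question: the `CopyMissing` name and the directory-walking strategy are assumptions, and it reuses the `GetChecksum` method shown above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch of the sync plan: collect the checksums already present at the
// destination, then copy only those source files whose checksum is missing.
// CopyMissing is a hypothetical helper name; GetChecksum is the method above.
private static void CopyMissing(string sourceDir, string destDir)
{
    // Checksums of everything already at the destination
    var existing = new HashSet<string>(
        Directory.EnumerateFiles(destDir).Select(GetChecksum));

    foreach (var file in Directory.EnumerateFiles(sourceDir))
    {
        if (!existing.Contains(GetChecksum(file)))
        {
            // Filenames can change at any time, so the name carries no
            // meaning here; keeping the source name is just one choice.
            File.Copy(file, Path.Combine(destDir, Path.GetFileName(file)));
        }
    }
}
```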
The problem was the runtime:
- SHA256 with a 1.6 GB file -> 20 minutes
- MD5 with a 1.6 GB file -> 6.15 minutes
Is there a better - faster - way to get the checksum (maybe with a better hash function)?
The problem here is that SHA256Managed reads 4096 bytes at a time (inherit from FileStream and override Read(byte[], int, int) to see how much it reads from the filestream), which is too small a buffer for disk IO.
To speed things up (2 minutes for hashing a 2 GB file on my machine with SHA256, 1 minute for MD5), wrap FileStream in BufferedStream and set a reasonably sized buffer (I tried a ~1 MB buffer):
// Not sure if BufferedStream should be wrapped in using block
using (var stream = new BufferedStream(File.OpenRead(filePath), 1200000))
{
    // The rest remains the same
}
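Putting the answer's advice together, a complete buffered variant of the original method might look like the sketch below. The `GetChecksumBuffered` name is hypothetical, and the 1,200,000-byte buffer size is simply the ~1 MB value the answer mentions trying:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical combined version: same hashing as the question's GetChecksum,
// but with the FileStream wrapped in a BufferedStream as the answer suggests.
private static string GetChecksumBuffered(string filePath)
{
    // The ~1 MB buffer means the hash implementation's small 4096-byte
    // reads are served from memory instead of hitting the disk each time.
    using (var stream = new BufferedStream(File.OpenRead(filePath), 1200000))
    {
        var sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(stream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}
```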