Getting the current file length / FileInfo.Length caching and stale information

2023-09-04 00:38:02 Author: 閉嘴

I am keeping track of a folder of files and their file lengths; at least one of these files is still being written to.

I have to keep a continuously updated record of each file's length, which I use for other purposes.

The Update method is called every 15 seconds and updates the file's properties if the file length differs from the length determined in the previous update.

The update method looks something like this:

var directoryInfo = new DirectoryInfo(archiveFolder);
var archiveFiles = directoryInfo.GetFiles()
                                .OrderByDescending(f => f.CreationTimeUtc);
foreach (FileInfo fi in archiveFiles)
{
    // Check whether the file already existed in the previous update
    var origFileProps = cachedFiles.GetFileByName(fi.FullName);
    if (origFileProps != null && fi.Length == origFileProps.EndOffset)
    {
        // File length is unchanged
    }
    else
    {
        // Update the properties of this file:
        // set EndOffset of the file to the current file length
    }
}

I am aware of the fact that DirectoryInfo.GetFiles() pre-populates many of the FileInfo properties, including Length - and this is fine as long as no caching is done between updates (cached information should not be older than 15 seconds).

I was under the assumption that each DirectoryInfo.GetFiles() call generates a new set of FileInfos, all populated with fresh information right then via the FindFirstFile/FindNextFile Win32 API. But this does not seem to be the case.
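
For reference, here is a minimal sketch of the Win32 enumeration in question (a hand-written illustration, not the BCL's actual implementation): FindFirstFile/FindNextFile fill a WIN32_FIND_DATA structure whose size and timestamp fields come straight from the directory entry, which is exactly where a stale value would originate.

using System;
using System.IO;
using System.Runtime.InteropServices;

static class FindFileSketch
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct WIN32_FIND_DATA
    {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;  // size as recorded in the directory entry...
        public uint nFileSizeLow;   // ...which can lag behind the real file size
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindClose(IntPtr hFindFile);

    static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    // Prints the directory-entry size of every file in a folder.
    public static void ListSizes(string folder)
    {
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(folder, "*"), out data);
        if (handle == INVALID_HANDLE_VALUE) return;
        try
        {
            do
            {
                const uint FILE_ATTRIBUTE_DIRECTORY = 0x10;
                if ((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
                    continue; // skip subdirectories and "."/".."
                long size = ((long)data.nFileSizeHigh << 32) | data.nFileSizeLow;
                Console.WriteLine("{0}: {1} bytes", data.cFileName, size);
            } while (FindNextFile(handle, out data));
        }
        finally { FindClose(handle); }
    }
}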

Very rarely, but eventually for sure, I run into situations where the file length of a file that is being written to is not updated for 5, 10 or even 20 minutes at a time (testing is done on Windows Server 2008 x64, if that matters).
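
For what it's worth, one way to observe the discrepancy when it occurs (a hypothetical check, not part of the original update code) is to compare the enumerated Length against the size reported through an open handle, which queries the file itself rather than the directory entry:

using System;
using System.IO;

static class StaleLengthCheck
{
    // Flags files whose directory-entry size disagrees with the size the
    // file itself reports. FileShare.ReadWrite avoids blocking the writer.
    public static void Check(string folder)
    {
        foreach (FileInfo fi in new DirectoryInfo(folder).GetFiles())
        {
            long enumerated = fi.Length; // pre-populated from the directory entry
            long actual;
            using (var fs = new FileStream(fi.FullName, FileMode.Open,
                                           FileAccess.Read, FileShare.ReadWrite))
            {
                actual = fs.Length;      // read from the file's own metadata
            }
            if (enumerated != actual)
                Console.WriteLine("{0}: directory entry says {1}, file says {2}",
                                  fi.Name, enumerated, actual);
        }
    }
}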

A current workaround is to call fi.Refresh() to force an update of each FileInfo. Internally this seems to delegate to a GetFileAttributesEx Win32 API call to update the file information.
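
Applied to the update loop above, the workaround amounts to one extra call per file (a sketch reusing the archiveFolder and cachedFiles names from the snippet in the question):

var directoryInfo = new DirectoryInfo(archiveFolder);
var archiveFiles = directoryInfo.GetFiles()
                                .OrderByDescending(f => f.CreationTimeUtc);
foreach (FileInfo fi in archiveFiles)
{
    // Discard the pre-populated (possibly stale) data and re-read the
    // metadata via GetFileAttributesEx before touching fi.Length
    fi.Refresh();

    var origFileProps = cachedFiles.GetFileByName(fi.FullName);
    if (origFileProps == null || fi.Length != origFileProps.EndOffset)
    {
        // Update the cached properties; set EndOffset to fi.Length
    }
}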

While the cost of forcing a refresh manually is tolerable, I would rather understand why I am getting stale information in the first place. When is the FileInfo information generated, and how does it relate to the DirectoryInfo.GetFiles() call? Is there a file I/O caching layer underneath that I don't fully grasp?

Recommended Answer

Raymond Chen has now written a very detailed blog post about exactly this issue:

Why is the file size reported incorrectly for files that are still being written to?

In NTFS, file system metadata is a property not of the directory entry but rather of the file, with some of the metadata replicated into the directory entry as a tweak to improve directory enumeration performance. Functions like FindFirstFile report the directory entry, and by putting the metadata that FAT users were accustomed to getting "for free", they could avoid being slower than FAT for directory listings. The directory-enumeration functions report the last-updated metadata, which may not correspond to the actual metadata if the directory entry is stale.

Essentially it comes down to performance: the directory information gathered from DirectoryInfo.GetFiles() and the FindFirstFile/FindNextFile Win32 API underneath is cached for performance reasons, to guarantee that acquiring directory information in NTFS performs better than in the old FAT. Accurate file size information can only be acquired by calling GetFileSize() on a file directly (in .NET, call Refresh() on the FileInfo, or construct a FileInfo from the file name directly) - or by opening and closing a file stream, which causes the updated file information to be propagated to the directory metadata cache. The latter case explains why the file size is updated immediately when the writing process closes the file.
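
A short sketch of those .NET options (path stands in for a file that may still be open for writing; which option fits depends on whether you already hold a FileInfo):

using System;
using System.IO;

static class FreshSizeSketch
{
    public static long GetFreshLength(string path)
    {
        // Option 1: a FileInfo constructed from the name fetches its
        // metadata on first property access (via GetFileAttributesEx),
        // so its Length is fresh. On an instance you already hold,
        // calling fi.Refresh() before reading fi.Length does the same.
        long viaFileInfo = new FileInfo(path).Length;

        // Option 2: ask an open handle, which queries the file itself;
        // per the quote below, closing the handle is also what prompts
        // NTFS to copy the current metadata back to the directory entry.
        long viaHandle;
        using (var fs = new FileStream(path, FileMode.Open,
                                       FileAccess.Read, FileShare.ReadWrite))
        {
            viaHandle = fs.Length;
        }

        // Both values are current; they may differ only if the writer
        // appended data between the two calls.
        Console.WriteLine("FileInfo: {0}, handle: {1}", viaFileInfo, viaHandle);
        return viaHandle;
    }
}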

This also explains why the problem seemingly did not show up on Windows Server 2003 - back then the file info was replicated more often / whenever the cache was flushed - whereas this is no longer the case on Windows Server 2008:

As for how often, the answer is a little more complicated. Starting in Windows Vista (and its corresponding Windows Server version which I don't know but I'm sure you can look up, and by "you" I mean "Yuhong Bao"), the NTFS file system performs this courtesy replication when the last handle to a file object is closed. Earlier versions of NTFS replicated the data while the file was open whenever the cache was flushed, which meant that it happened every so often according to an unpredictable schedule. The result of this change is that the directory entry now gets updated less frequently, and therefore the last-updated file size is more out-of-date than it already was.

The full article is very informative and recommended reading!