Directory traversal with the Task Parallel Library

Tags: traversal, tasks, directory

2023-09-03 05:01:32 Author: 清醒的人最荒唐

I'd like to traverse a directory on my hard drive and search through all the files for a specific search string. This sounds like the perfect candidate for something that could (or should) be done in parallel since the IO is rather slow.

Traditionally, I would write a recursive function that finds and processes all files in the current directory and then recurses into all the directories in that directory. I'm wondering how I can modify this to be more parallel. At first I simply modified:

foreach (string directory in directories) { ... }

to

Parallel.ForEach(directories, (directory) => { ... })

but I feel that this might create too many tasks and get itself into knots, especially when trying to dispatch back onto a UI thread. I also feel that the number of tasks is unpredictable and that this might not be an efficient way to parallize (is that a word?) this task.
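
One way to keep the task count predictable, sketched here under assumptions: flatten the tree into a single sequence of files and run one Parallel.ForEach over it with ParallelOptions.MaxDegreeOfParallelism set. The names ParallelSearch and Find and the default cap of 2 are illustrative and not from the original post. The sketch also assumes every directory under root is readable, since Directory.EnumerateFiles with SearchOption.AllDirectories can throw when it hits one that is not.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ParallelSearch
{
    // Hypothetical helper: returns the paths of files under `root`
    // whose text contains `term`, reading at most `maxParallel` files at once.
    public static List<string> Find(string root, string term, int maxParallel = 2)
    {
        var matches = new ConcurrentBag<string>();

        // A single flat loop, so the cap applies to the whole search
        // rather than to each nested directory level.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        IEnumerable<string> files =
            Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories);

        Parallel.ForEach(files, options, file =>
        {
            try
            {
                if (File.ReadAllText(file).Contains(term))
                {
                    matches.Add(file);
                }
            }
            catch (IOException) { /* skip files that cannot be read */ }
            catch (UnauthorizedAccessException) { /* skip files we may not open */ }
        });

        return new List<string>(matches);
    }
}
```

Whether this is actually faster than a plain loop depends on the storage device, so it is worth timing both before committing to it.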

Has anyone successfully done something like this before? What advice do you have in doing so?

Recommended answer

No, this doesn't sound like a good candidate for parallelism, precisely because the IO is slow. You're going to be disk-bound. Assuming you've only got one disk, you don't really want to be making it seek to multiple different places at the same time.

It's a bit like trying to attach several hoses to the same tap in order to get water out faster - or trying to run 16 CPU-bound threads on a single core :)
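
One reading of this advice is to keep the search single-threaded so the disk serves one file at a time. A minimal sketch, assuming a line-by-line text match is good enough; the names SequentialSearch and Find are hypothetical and not from the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SequentialSearch
{
    // Hypothetical helper: walks the tree under `root` on the calling thread
    // and returns the paths of files that contain `term` on some line.
    public static List<string> Find(string root, string term)
    {
        var matches = new List<string>();

        // EnumerateFiles yields paths lazily, so reading starts before
        // the whole tree has been listed.
        foreach (string file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            try
            {
                // ReadLines streams the file; stop at the first hit.
                foreach (string line in File.ReadLines(file))
                {
                    if (line.Contains(term))
                    {
                        matches.Add(file);
                        break;
                    }
                }
            }
            catch (IOException) { /* skip files that cannot be read */ }
            catch (UnauthorizedAccessException) { /* skip files we may not open */ }
        }

        return matches;
    }
}
```

Calling SequentialSearch.Find(root, term) from a single background task would keep the UI responsive without adding concurrent disk access.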