How do I use Parallel.For / ForEach for maximum performance? (including performance timings)

2023-09-04 00:34:21 Author: 就此别过

I am trying to parallelize my web parsing tool, but the speed gains seem minimal. I have an i7-2600K (4 cores, 8 threads with Hyper-Threading).

Here is some code to illustrate; I only show Parallel.ForEach, but you get the idea:

List<string> AllLinks = this.GetAllLinks();
ConcurrentDictionary<string, Topic> AllTopics = new ConcurrentDictionary<string, Topic>();

int count = 0;
Stopwatch sw = new Stopwatch();
sw.Start();

Parallel.ForEach(AllLinks, currentLink =>
{
    Topic topic = this.ExtractTopicData(currentLink);
    AllTopics.TryAdd(currentLink, topic);

    // ++count is not atomic across threads, so these timing printouts are approximate.
    ++count;

    if (count > 50)
    {
        Console.WriteLine(sw.ElapsedMilliseconds);
        count = 0;
    }
});

I get these timings:

Standard foreach loop:
24582
59234
82800
117786
140315

2 links per second


Parallel.For:

21902
31649
41168
49817
59321


5 links per second

Parallel.ForEach:
10217
20401
39056
49220
58125

5 links per second

Firstly, why is the "startup" timing so much slower for Parallel.For?

Other than that, the parallel loops only give me about a 2.5x speedup over the standard foreach loop. Is this normal?

Is there a setting I can set so that the parallel loops can use all the cores?

EDIT:

Here is pretty much what ExtractTopicData does:

HtmlAgilityPack.HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url);
// SelectNodes returns null (not an empty collection) when nothing matches
IEnumerable<HtmlNode> links = doc.DocumentNode.SelectNodes("//*[@id=\"topicDetails\"]");

var topic = new Topic();

foreach (var link in links)
{
    // parse the link data
}

Solution

A brief perusal of HtmlAgilityPack.HtmlWeb confirms that it is using the synchronous WebRequest API. You are therefore placing long-running tasks into the ThreadPool (via Parallel). The ThreadPool is designed for short-lived operations that yield the thread back to the pool quickly; blocking on IO is a big no-no. Given the ThreadPool's reluctance to start new threads (because it is not designed for this kind of usage), you're going to be constrained by this behaviour.
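As an illustration of that constraint (a sketch added here, not part of the original answer): you can partially work around the thread starvation by raising the pool's minimum thread count and the loop's degree of parallelism, but every iteration still blocks a thread on synchronous IO, so this masks the problem rather than fixing it. AllLinks, AllTopics, Topic and ExtractTopicData are the names from the question; the figure of 64 is arbitrary.

ThreadPool.SetMinThreads(64, 64); // worker threads, IO completion threads

var options = new ParallelOptions { MaxDegreeOfParallelism = 64 };

Parallel.ForEach(AllLinks, options, currentLink =>
{
    // Each iteration still blocks synchronously inside ExtractTopicData,
    // so throughput scales with thread count, not with CPU cores.
    Topic topic = this.ExtractTopicData(currentLink);
    AllTopics.TryAdd(currentLink, topic);
});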

Fetch your web content asynchronously (see here and here for the correct API to use; you'll have to investigate further yourself...) so that you are not tying up the ThreadPool with blocking tasks. You can then feed the decoded response to HtmlAgilityPack for parsing.
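As a rough sketch of that approach (an addition assuming the modern HttpClient API, which postdates the original answer): download concurrently with a SemaphoreSlim cap so that awaited IO releases pool threads, then hand the HTML to HtmlAgilityPack. ParseTopic and the cap of 16 are hypothetical; Topic is the question's type.

var client = new HttpClient();
var results = new ConcurrentDictionary<string, Topic>();
var throttle = new SemaphoreSlim(16); // hypothetical concurrency cap

var tasks = AllLinks.Select(async url =>
{
    await throttle.WaitAsync();
    try
    {
        // The await frees the pool thread while the download is in flight.
        string html = await client.GetStringAsync(url);

        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        results.TryAdd(url, ParseTopic(doc)); // ParseTopic: hypothetical parser
    }
    finally
    {
        throttle.Release();
    }
});

await Task.WhenAll(tasks);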

If you really want to jazz up performance, you'll also need to consider that WebRequest is incapable of performing asynchronous DNS lookup. IMO this is a terrible flaw in the design of WebRequest.

The BeginGetResponse method requires some synchronous setup tasks to complete (DNS resolution, proxy detection, and TCP socket connection, for example) before this method becomes asynchronous.

It makes high performance downloading a real PITA. It's at about this time that you might consider writing your own HTTP library so that everything can execute without blocking (and therefore starving the ThreadPool).
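Short of writing your own library, one hedged mitigation (an added sketch, not the answerer's suggestion): pre-resolve hostnames asynchronously up front, on the assumption that the synchronous DNS step inside BeginGetResponse will then be served from a warm resolver cache.

// Sketch: resolve hosts off the request path. Assumes the later
// synchronous lookup inside BeginGetResponse hits the warmed DNS cache.
var hosts = AllLinks.Select(u => new Uri(u).Host).Distinct();

await Task.WhenAll(hosts.Select(async host =>
{
    IPAddress[] addresses = await Dns.GetHostAddressesAsync(host);
    Console.WriteLine(host + ": " + addresses.Length + " address(es)");
}));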

As an aside, getting maximum throughput when churning through web pages is a tricky affair. In my experience, you get the code right and are then let down by the routing equipment it has to go through. Many domestic routers simply aren't up to the job.