How do I use Parallel.For / ForEach for maximum performance? (including performance timings)

2023-09-04 00:34:21 Author: 就此别过

I am trying to parallelize my web parsing tool, but the speed gains seem minimal. I have an i7-2600K (4 cores, 8 threads with Hyper-Threading).

Here is some code to illustrate; I only show Parallel.ForEach, but you get the idea:

List<string> AllLinks = this.GetAllLinks();
ConcurrentDictionary<string, Topic> AllTopics = new ConcurrentDictionary<string, Topic>();

int count = 0;
Stopwatch sw = new Stopwatch();
sw.Start();

Parallel.ForEach(AllLinks, currentLink =>
{
    Topic topic = this.ExtractTopicData(currentLink);
    AllTopics.TryAdd(currentLink, topic);

    // ++count is not atomic across threads, so these timing printouts are approximate.
    ++count;

    if (count > 50)
    {
        Console.WriteLine(sw.ElapsedMilliseconds);
        count = 0;
    }
});

I get these timings:

Standard foreach loop:
24582
59234
82800
117786
140315

2 links per second


Parallel.For:

21902
31649
41168
49817
59321


5 links per second

Parallel.ForEach:
10217
20401
39056
49220
58125

5 links per second

Firstly, why is the "startup" timing so much slower for Parallel.For?

Other than that, the parallel loops only give me about a 2.5x speedup over the standard foreach loop. Is this normal?

Is there a setting I can set so that the parallel loops can use all the cores?

EDIT:

Here is pretty much what ExtractTopicData does:

HtmlAgilityPack.HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url);
// SelectNodes returns null (not an empty collection) when nothing matches
IEnumerable<HtmlNode> links = doc.DocumentNode.SelectNodes("//*[@id=\"topicDetails\"]");

var topic = new Topic();

foreach (var link in links)
{
    // parse the link data
}

Solution

A brief perusal of HtmlAgilityPack.HtmlWeb confirms that it is using the synchronous WebRequest API. You are therefore placing long-running tasks into the ThreadPool (via Parallel). The ThreadPool is designed for short-lived operations that yield the thread back to the pool quickly; blocking on IO is a big no-no. Given the ThreadPool's reluctance to start new threads (because it is not designed for this kind of usage), you're going to be constrained by this behaviour.
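As an illustration of that constraint (a sketch added here, not part of the original answer): you can partially work around the thread starvation by raising the pool's minimum thread count and the loop's degree of parallelism, but every iteration still blocks a thread on synchronous IO, so this masks the problem rather than fixing it. AllLinks, AllTopics, Topic and ExtractTopicData are the names from the question; the figure of 64 is arbitrary.

ThreadPool.SetMinThreads(64, 64); // worker threads, IO completion threads

var options = new ParallelOptions { MaxDegreeOfParallelism = 64 };

Parallel.ForEach(AllLinks, options, currentLink =>
{
    // Each iteration still blocks synchronously inside ExtractTopicData,
    // so throughput scales with thread count, not with CPU cores.
    Topic topic = this.ExtractTopicData(currentLink);
    AllTopics.TryAdd(currentLink, topic);
});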

Fetch your web content asynchronously (see here and here for the correct API to use; you'll have to investigate further yourself...) so that you are not tying up the ThreadPool with blocking tasks. You can then feed the decoded response to HtmlAgilityPack for parsing.
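As a rough sketch of that approach (an addition assuming the modern HttpClient API, which postdates the original answer): download concurrently with a SemaphoreSlim cap so that awaited IO releases pool threads, then hand the HTML to HtmlAgilityPack. ParseTopic and the cap of 16 are hypothetical; Topic is the question's type.

var client = new HttpClient();
var results = new ConcurrentDictionary<string, Topic>();
var throttle = new SemaphoreSlim(16); // hypothetical concurrency cap

var tasks = AllLinks.Select(async url =>
{
    await throttle.WaitAsync();
    try
    {
        // The await frees the pool thread while the download is in flight.
        string html = await client.GetStringAsync(url);

        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        results.TryAdd(url, ParseTopic(doc)); // ParseTopic: hypothetical parser
    }
    finally
    {
        throttle.Release();
    }
});

await Task.WhenAll(tasks);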

If you really want to jazz up performance, you'll also need to consider that WebRequest is incapable of performing asynchronous DNS lookup. IMO this is a terrible flaw in the design of WebRequest.

The BeginGetResponse method requires some synchronous setup tasks to complete (DNS resolution, proxy detection, and TCP socket connection, for example) before this method becomes asynchronous.

It makes high performance downloading a real PITA. It's at about this time that you might consider writing your own HTTP library so that everything can execute without blocking (and therefore starving the ThreadPool).
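Short of writing your own library, one hedged mitigation (an added sketch, not the answerer's suggestion): pre-resolve hostnames asynchronously up front, on the assumption that the synchronous DNS step inside BeginGetResponse will then be served from a warm resolver cache.

// Sketch: resolve hosts off the request path. Assumes the later
// synchronous lookup inside BeginGetResponse hits the warmed DNS cache.
var hosts = AllLinks.Select(u => new Uri(u).Host).Distinct();

await Task.WhenAll(hosts.Select(async host =>
{
    IPAddress[] addresses = await Dns.GetHostAddressesAsync(host);
    Console.WriteLine(host + ": " + addresses.Length + " address(es)");
}));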

As an aside, getting maximum throughput when churning through web pages is a tricky affair. In my experience, you get the code right and are then let down by the routing equipment it has to go through. Many domestic routers simply aren't up to the job.