Why doesn't my threaded .Net app scale linearly when allocating large amounts of memory?

2023-09-04 00:25:26 Author: 青春之帆由我掌舵

I’ve run into something strange about the effect of large memory allocations on the scalability of the .Net runtime. In my test application I create lots of strings in a tight loop for a fixed number of cycles and spit out a rate of loop iterations per second. The weirdness comes in when I run this loop in several threads – it appears that the rate does not increase linearly. The problem gets even worse when you create large strings.

Let me show you the results. My machine is an 8 GB, 8-core box running Windows Server 2008 R1, 32-bit. It has two 4-core Intel Xeon 1.83 GHz (E5320) processors. The "work" performed is a set of alternating calls to ToUpper() and ToLower() on a string. I run the test with one thread, then two threads, and so on up to the maximum. The columns in the table below are:

Rate: The number of loops across all threads divided by the elapsed time (in milliseconds, as computed in the source code).
Linear Rate: The ideal rate if performance scaled linearly. It is calculated as the rate achieved by one thread multiplied by the number of threads for that test.
Variance: The percentage by which the rate falls short of the linear rate.
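The three column definitions can be written as formulas. The notation is introduced here and does not appear in the original: R_n is the measured rate with n threads, L_n is the linear rate, and V_n is the variance. The example checks the formulas against the 8-thread row of the first table. The table prints 2580.65 rather than 2580.64 because the program multiplies the unrounded single-thread rate.

```latex
L_n = n \cdot R_1, \qquad
V_n = \left(1 - \frac{R_n}{L_n}\right) \times 100\,\%

% Worked check, first table, n = 8:
L_8 = 8 \times 322.58 = 2580.64, \qquad
V_8 = \left(1 - \frac{2051.28}{2580.64}\right) \times 100\,\% \approx 20.51\,\%
```

A negative variance means that run was faster than linear. The two-thread row of the first table (-6.90 %) is an example.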

The first example starts off with one thread, then two threads and eventually runs the test with eight threads. Each thread creates 10,000 strings of 1024 chars each:


Creating 10000 strings per thread, 1024 chars each, using up to 8 threads
GCMode = Server

Rate          Linear Rate   % Variance    Threads
--------------------------------------------------------
322.58        322.58        0.00 %        1
689.66        645.16        -6.90 %       2
882.35        967.74        8.82 %        3
1081.08       1290.32       16.22 %       4
1388.89       1612.90       13.89 %       5
1666.67       1935.48       13.89 %       6
2000.00       2258.07       11.43 %       7
2051.28       2580.65       20.51 %       8
Done.

Example 2: 10,000 loops, 8 threads, 32,000 chars per string

In the second example I’ve increased the number of chars for each string to 32,000.


Creating 10000 strings per thread, 32000 chars each, using up to 8 threads
GCMode = Server

Rate          Linear Rate   % Variance    Threads
--------------------------------------------------------
14.10         14.10         0.00 %        1
24.36         28.21         13.64 %       2
33.15         42.31         21.66 %       3
40.98         56.42         27.36 %       4
48.08         70.52         31.83 %       5
61.35         84.63         27.51 %       6
72.61         98.73         26.45 %       7
67.85         112.84        39.86 %       8
Done.

Notice the difference in variance from the linear rate. With 8 threads, the actual rate in the second table is almost 40% (39.86%) below the linear rate, compared with about 20% (20.51%) in the first table.

My question is: why doesn't this program scale linearly?

I initially thought that this could be due to False Sharing but, as you’ll see in the source code, I’m not sharing any collections and the strings are quite big. The only overlap that could exist is at the beginning of one string and the end of another.

I’m using gcServer enabled=true so that each core gets its own heap and garbage collector thread.

I don't think the objects I allocate are being sent to the Large Object Heap, because they are under 85,000 bytes.
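We can sanity-check the 85,000-byte claim with arithmetic. .NET strings are UTF-16, so each char takes two bytes, plus a small fixed per-object overhead. The class name and the 20-byte overhead constant below are assumptions made for illustration, because the exact overhead depends on the runtime version and on 32- versus 64-bit. Under that assumption, a 32,000-char string is about 64,020 bytes and stays on the small object heap. Strings start landing on the LOH at roughly 42,500 chars.

```csharp
using System;

// Hypothetical helper: rough size estimate for a System.String, used to check
// which string lengths cross the large-object-heap (LOH) threshold.
public static class StringSizeEstimate
{
    // Objects of 85,000 bytes or more are allocated on the LOH.
    public const int LohThresholdBytes = 85000;

    // Assumed per-object overhead (object header, method-table pointer, length field,
    // null terminator). The real value varies by runtime and bitness; 20 is an approximation.
    public const int AssumedOverheadBytes = 20;

    // UTF-16 storage: two bytes per char, plus the assumed overhead.
    public static long ApproxBytes(int chars) => 2L * chars + AssumedOverheadBytes;

    // True when the estimated size reaches the LOH threshold.
    public static bool LikelyOnLoh(int chars) => ApproxBytes(chars) >= LohThresholdBytes;
}
```

Under these assumptions, StringSizeEstimate.LikelyOnLoh(32000) is false and StringSizeEstimate.LikelyOnLoh(50000) is true. That supports the view that neither example in the question touches the LOH.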

I thought that string values might be shared under the hood due to interning (MSDN), so I tried compiling with interning disabled. This produced worse results than those shown above.
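Interning applies to string literals and to strings explicitly passed to String.Intern. It does not apply to strings created at runtime by PadRight, ToUpper or ToLower, so the Worker's strings should not be shared through the intern pool. The original does not say how interning was disabled. Using the assembly attribute [assembly: CompilationRelaxations(CompilationRelaxations.NoStringInterning)] is a guess, prompted by the System.Runtime.CompilerServices import in the listing. The hypothetical helper below checks the interning point directly with string.IsInterned, which returns null for values that are not in the pool.

```csharp
using System;

// Hypothetical helper: checks whether runtime-built strings like the Worker's
// end up in the intern pool or are shared between calls.
public static class InterningCheck
{
    // Builds the same kind of string the Worker uses: stringLength copies of 'a'.
    public static string Build(int stringLength) => "".PadRight(stringLength, 'a');

    // string.IsInterned returns the pooled instance, or null if the value is not interned.
    public static bool IsInPool(string s) => string.IsInterned(s) != null;

    // Two ToUpper calls on a string that needs changing yield two distinct objects.
    public static bool ToUpperReturnsSameObject(string s) =>
        object.ReferenceEquals(s.ToUpper(), s.ToUpper());
}
```

For a 1024-char string, both checks return false. The string is not interned, and each ToUpper call produces its own object.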

I tried the same example using small and large integer arrays, in which I loop through each element and change the value. It produces similar results, following the trend of performing worse with larger allocations.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Diagnostics;
using System.Runtime;
using System.Runtime.CompilerServices;

namespace StackOverflowExample
{
  public class Program
  {
    private static int columnWidth = 14;

    static void Main(string[] args)
    {
      int loopCount, maxThreads, stringLength;
      loopCount = maxThreads = stringLength = 0;
      try
      {
        // Each argument is optional; check its own index before parsing it.
        loopCount = args.Length > 0 ? Int32.Parse(args[0]) : 1000;
        maxThreads = args.Length > 1 ? Int32.Parse(args[1]) : 4;
        stringLength = args.Length > 2 ? Int32.Parse(args[2]) : 1024;
      }
      catch
      {
        Console.WriteLine("Usage: StackOverFlowExample.exe [loopCount] [maxThreads] [stringLength]");
        System.Environment.Exit(2);
      }

      float rate;
      float linearRate = 0;
      Stopwatch stopwatch;
      Console.WriteLine("Creating {0} strings per thread, {1} chars each, using up to {2} threads", loopCount, stringLength, maxThreads);
      Console.WriteLine("GCMode = {0}", GCSettings.IsServerGC ? "Server" : "Workstation");
      Console.WriteLine();
      PrintRow("Rate", "Linear Rate", "% Variance", "Threads");
      PrintRow(4, "".PadRight(columnWidth, '-'));

      for (int runCount = 1; runCount <= maxThreads; runCount++)
      {
        // Create the workers
        Worker[] workers = new Worker[runCount];
        workers.Length.Range().ForEach(index => workers[index] = new Worker());

        // Start timing and kick off the threads
        stopwatch = Stopwatch.StartNew();
        workers.ForEach(w => new Thread(
          new ThreadStart(
            () => w.DoWork(loopCount, stringLength)
          )
        ).Start());

        // Wait until all threads are complete
        WaitHandle.WaitAll(
          workers.Select(p => p.Complete).ToArray());
        stopwatch.Stop();

        // Print the results
        rate = (float)loopCount * runCount / stopwatch.ElapsedMilliseconds;
        if (runCount == 1) { linearRate = rate; }

        PrintRow(String.Format("{0:#0.00}", rate),
          String.Format("{0:#0.00}", linearRate * runCount),
          String.Format("{0:#0.00} %", (1 - rate / (linearRate * runCount)) * 100),
          runCount.ToString()); 
      }
      Console.WriteLine("Done.");
    }

    private static void PrintRow(params string[] columns)
    {
      columns.ForEach(c => Console.Write(c.PadRight(columnWidth)));
      Console.WriteLine();
    }

    private static void PrintRow(int repeatCount, string column)
    {
      for (int counter = 0; counter < repeatCount; counter++)
      {
        Console.Write(column.PadRight(columnWidth));
      }
      Console.WriteLine();
    }
  }

  public class Worker
  {
    public ManualResetEvent Complete { get; private set; }

    public Worker()
    {
      Complete = new ManualResetEvent(false);
    }

    public void DoWork(int loopCount, int stringLength)
    {
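      // Build the string once; it is never reassigned inside the loop.
      string theString = "".PadRight(stringLength, 'a');
      for (int counter = 0; counter < loopCount; counter++)
      {
        // Strings are immutable: ToUpper()/ToLower() return their result instead of
        // modifying theString. The results are discarded, so the loop mostly exercises
        // allocation and copying.
        if (counter % 2 == 0) { theString.ToUpper(); }
        else { theString.ToLower(); }
      }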
      Complete.Set();
    }
  }

  public static class HandyExtensions
  {
    public static IEnumerable<int> Range(this int max)
    {
      for (int counter = 0; counter < max; counter++)
      {
        yield return counter;
      }
    }

    public static void ForEach<T>(this IEnumerable<T> items, Action<T> action)
    {
      foreach(T item in items)
      {
        action(item);
      }
    }
  }
}
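In DoWork, ToUpper() and ToLower() return new strings that are immediately discarded, so the loop is largely an allocation benchmark. Here is a back-of-the-envelope estimate, assuming every call produces a full copy (some runtimes may skip the copy when ToLower() has nothing to change). Example 2 allocates about 10,000 × 64,000 bytes ≈ 640 MB per thread, or roughly 5 GB across 8 threads, all of it short-lived. The hypothetical helper below (AllocationProbe is a made-up name) measures the allocation with GC.GetAllocatedBytesForCurrentThread(). That method exists on modern .NET but may not be available on older .NET Framework versions.

```csharp
using System;

// Hypothetical helper: measures how many bytes the current thread allocates
// for a number of ToUpper() calls, mirroring the discarded-result pattern in Worker.DoWork.
public static class AllocationProbe
{
    public static long BytesAllocatedByToUpper(int stringLength, int calls)
    {
        string s = new string('a', stringLength);
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < calls; i++)
        {
            s.ToUpper(); // result discarded, as in the original Worker
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

For a 32,000-char string, each call should account for at least 64,000 bytes. Multiplying by loopCount and the thread count gives a rough figure for the allocation pressure behind each row of the tables.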

app.config

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>

Running the Example

To run StackOverflowExample.exe on your box, call it with these command-line parameters:

StackOverFlowExample.exe [loopCount] [maxThreads] [stringLength]

loopCount: The number of times each thread will manipulate the string.
maxThreads: The number of threads to progress to.
stringLength: The number of characters to fill the string with.

Recommended Answer

You may want to look at this question of mine.

I ran into a similar problem that was due to the fact that the CLR performs inter-thread synchronization when allocating memory, to avoid overlapping allocations. With the server GC the locking algorithm may be different, but something along the same lines may be affecting your code.
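One cheap way to probe this hypothesis is to check whether the garbage collector is busy during the run. My understanding, which the answer does not state, is that the CLR gives each thread a small private allocation buffer. Under that model, synchronization is mostly paid when the buffer must be refilled, which happens far more often when every object is tens of kilobytes. The sketch below is hypothetical (GcDiagnostics and Gen0CollectionsDuring are made-up names). It reruns the question's loop on several threads and counts gen-0 collections with GC.CollectionCount(0). It is a diagnostic, not a fix.

```csharp
using System;
using System.Threading;

// Hypothetical diagnostic: counts gen-0 collections while threadCount threads
// run the question's ToUpper/ToLower loop.
public static class GcDiagnostics
{
    public static int Gen0CollectionsDuring(int threadCount, int loopCount, int stringLength)
    {
        int before = GC.CollectionCount(0);
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                string s = new string('a', stringLength);
                for (int n = 0; n < loopCount; n++)
                {
                    // Same allocation pattern as Worker.DoWork: results are discarded.
                    if (n % 2 == 0) { s.ToUpper(); } else { s.ToLower(); }
                }
            });
            threads[i].Start();
        }
        foreach (Thread t in threads) { t.Join(); }
        return GC.CollectionCount(0) - before;
    }
}
```

Comparing the count across thread counts and string lengths shows how much collection work accompanies each row. If collections climb steeply with string length, that is consistent with allocation and GC, rather than the string operations themselves, limiting the scaling.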