什么是初始化多维数组在.NET中的非缺省值的最快的方法?多维、数组、初始化、最快

2023-09-03 02:47:18 作者:掉落的记忆。那里寻回。

我如何初始化一个多维度的基本类型的数组尽可能快?

How do I initialize a multi-dimensional array of a primitive type as fast as possible?

我坚持使用多维数组。我的问题是性能。下面的程序初始化一个100×100阵列中的约。 500蜱。卸下int.MaxValue初始化会导致约180只蜱的循环。大约100蜱创建阵列没有循环,没有初始化为int.MaxValue。

I am stuck with using multi-dimensional arrays. My problem is performance. The following routine initializes a 100x100 array in approx. 500 ticks. Removing the int.MaxValue initialization results in approx. 180 ticks just for the looping. Approximately 100 ticks to create the array without looping and without initializing to int.MaxValue.

在例程过程中跑类同这个被称为几十万到几百万次。 阵列大小将不会在运行过程中改变,并且创建的阵列一在一次一,使用,然后被丢弃,并且新的数组创建。 运行,这可以从一分钟(使用10×10阵列)持续四十五分钟(100×100)。 在该应用程序创建INT,BOOL和结构的阵列。 在可以有多个运行,在同一时间执行,但不是因为性能降低了苦头。 在我使用的100×100作为一个基线。

我愿意就如何优化阵列本次非缺省初始化建议。我有一个想法是使用一个较小的基本类型时可用。例如,当使用的不是int字节,节省100蜱。我会很高兴这一点,但我希望我没有更改原始数据类型。

I am open to suggestions on how to optimize this non-default initialization of an array. One idea I had is to use a smaller primitive type when available. For instance, using byte instead of int, saves 100 ticks. I would be happy with this, but I am hoping that I don't have to change the primitive data type.

    public int[,] CreateArray(Size size) {
        int[,] array = new int[size.Width, size.Height];
        for (int x = 0; x < size.Width; x++) {
            for (int y = 0; y < size.Height; y++) {
                array[x, y] = int.MaxValue;
            }
        }
        return array;
    }

下到450蜱为以下内容:

Down to 450 ticks with the following:

    public int[,] CreateArray1(Size size) {
        int iX = size.Width;
        int iY = size.Height;
        int[,] array = new int[iX, iY];
        for (int x = 0; x < iX; x++) {
            for (int y = 0; y < iY; y++) {
                array[x, y] = int.MaxValue;
            }
        }
        return array;
    }

CreateArray5;接受的执行:限制:无法调整,可以改变

下到约165蜱蜱2800一次性初始化操作。 (见我的回答如下)。如果我能得到 stackalloc 用多维数组的工作,我应该能够得到相同的性能,而不必并初始化的私有静态阵列。

CreateArray5; Accepted Implementation: Limitation: Unable to Resize, can be changed

Down to approximately 165 ticks after a one-time initialization of 2800 ticks. (See my answer below.) If I can get stackalloc to work with multi-dimensional arrays, I should be able to get the same performance without having to intialize the private static array.

    private static bool _arrayInitialized5;
    private static int[,] _array5;

    public static int[,] CreateArray5(Size size) {
        if (!_arrayInitialized5) {
            int iX = size.Width;
            int iY = size.Height;
            _array5 = new int[iX, iY];
            for (int x = 0; x < iX; x++) {
                for (int y = 0; y < iY; y++) {
                    _array5[x, y] = int.MaxValue;
                }
            }
            _arrayInitialized5 = true;
        }
        return (int[,])_array5.Clone();
    }

CreateArray8;接受FPGA实现;限制:需要完全信任

低至约165蜱,而无需使用上面的克隆技术。 (见我的回答如下)。我相信我能得到蜱下,如果我可以计算出返回 CreateArray9

CreateArray8; Accepted Implemention; Limitation: Requires Full Trust

A01 数组基础 创建数组,多维数组切片,数组属性 使用数组 基本操作符,数组特殊运算符 索引 花式索引,布尔索引,缺省索引 python 涂作权的博客

Down to approximately 165 ticks without using the "clone technique" above. (See my answer below.) I am sure I can get the ticks lower, if I can just figure out the return of CreateArray9.

    public unsafe static int[,] CreateArray8(Size size) {
        int iX = size.Width;
        int iY = size.Height;
        int[,] array = new int[iX, iY];
        fixed (int* pfixed = array) {
            int count = array.Length;
            for (int* p = pfixed; count-- > 0; p++)
                *p = int.MaxValue;
        }
        return array;
    }

注意

我提供所有code和对于这个问题的答案说明。我们希望,这将节省别人的时间在未来。

Notes

I am providing all code and notes regarding this question as answers. Hopefully, it will save someone's time in the future.

分配在大对象堆(LOH)数组不是该讨论的一部分。注意听到的性能改进只为在堆中分配的数组。

Arrays allocated on the Large Object Heap (LOH) are not part of this discussion. The performance improvements noted hear are only for arrays allocated on the heap.

我用 stackalloc 来消除初始化数组为默认值没有工作,因为分配的堆栈存储器必须复制该方法的理念。意思是,我必须创建另一个数组来保存结果。这个阵列会被初始化击败使用的全部目的 stackalloc

My idea of using stackalloc to eliminate initializing array to default values did not work out because the allocated stack memory must be copied out of the method. Meaning, I would have to create another array to hold the results. This array would be initialized defeating the whole purpose of using stackalloc.

CLR将只执行不安全 code,如果它是在一个完全信任的程序集。

The CLR will only execute unsafe code if it is in a fully trusted assembly.

需要的变量,以确定是否阵列被初始化并存储初始化数组。性能是相同的不安全/固定初始化后方法。请参阅可能的解决方案丹道的答案。

Requires variables to determine if array is initialized and to store the initialized array. Performance is the same as the unsafe/fixed method after initialization. See Dan Tao's answer for possible solution.

我吸的百分比,但300%是我想通(500〜165滴答)。

I suck at percentages, but 300% is what I figured (500 to 165 ticks).

对于这种应用,我决定使用克隆的方法。以下是瘦肉精在性能样本应用程序中使用通用的实现。

For this application, I settled on using the "clone" method. Following is a "lean" Generic implementation used in the application with performance samples.

初​​始化:

栅格&LT; INT&GT; ;通用克隆类initalize:4348,4336,4339,4654 栅格&LT;布尔&GT; ;通用克隆类initalize:2692,2684,3916,2680 栅格&LT;颜色&GT; ;通用克隆类initalize:3747,4630,2702,2708 Grid<int>; generic clone class initalize: 4348, 4336, 4339, 4654 Grid<bool>; generic clone class initalize: 2692, 2684, 3916, 2680 Grid<Color>; generic clone class initalize: 3747, 4630, 2702, 2708

使用:

栅格&LT; INT&GT; ;通用克隆类:185,159,152,290 栅格&LT;布尔&GT; ;通用克隆类:39,36,44,46

栅格&LT;颜色&GT; ;通用克隆类:2229,2431,2460,2496 Grid<int>; generic clone class: 185, 159, 152, 290 Grid<bool>; generic clone class: 39, 36, 44, 46

Grid<Color>; generic clone class: 2229, 2431, 2460, 2496

public class Grid<T> {
    private T[,] _array;
    private T _value;
    private bool _initialized;
    private int _x;
    private int _y;
    public Grid(Size size, T value, bool initialize) {
        _x = size.Width;
        _y = size.Height;
        _value = value;
        if (initialize) {
            InitializeArray();
        }
    }
    private void InitializeArray() {
        int iX = _x;
        int iY = _y;
        _array = new T[iX, iY];
        for (int y = 0; y < iY; y++) {
            for (int x = 0; x < iX; x++) {
                _array[x, y] = _value;
            }
        }
        _initialized = true;
    }
    public T[,] CreateArray() {
        if (!_initialized) {
            InitializeArray();
        }
        return (T[,])_array.Clone();
    }
}

推荐答案

克隆办法的说明:我怀疑你将能够打败它,在以下方面性能。但是,它可能是一个重大更改考虑到第一次初始化后,它忽略了尺寸参数,只是返回相同大小的数组在每次调用。根据是否实际在方案中重要的,你既可以:

A note on your Clone approach: I doubt you will be able to beat it, in terms of performance. However, it could be a breaking change considering that after the first initialization, it disregards the Size parameter and just returns an array of the same size on every call. Depending on whether that actually matters in your scenario, you could either:

坚持下去,因为它并不重要。 创建一个词典&LT;尺寸,INT [,]&GT; (我的认为的尺寸将循规蹈矩作为重点 - 没有测试),以$ P $独特的尺寸要求每一次对初始化数组。这样做的开销,我不知道的。 抛弃克隆的想法。 Stick with it, because it doesn't matter. Create a Dictionary<Size, int[,]> (I believe Size would behave properly as a key -- haven't tested) to pre-initialize an array every time a unique Size is requested. The overhead of this I am not sure of. Abandon the Clone idea.

如果你最终不得不用3到上面去,这里有一些边缘荒谬的建议:

In case you end up having to go with 3 above, here are a few borderline ridiculous suggestions:

1。缓存你的宽度身高属性本地,而不是从访问它们大小结构在每次迭代

1. Cache your Width and Height properties locally, rather than accessing them from the Size struct on each iteration.

static int[,] CreateArray(Size size) {
    int w = size.Width;
    int h = size.Height;

    int[,] array = new int[w, h];
    for (int x = 0; x < w; x++) {
        for (int y = 0; y < h; y++) {
            array[x, y] = int.MaxValue;
        }
    }

    return array;
}

要创建一个1000×1000的数组,我的机器上,这导致约120000蜱与约14蜱的平均执行时间。

To create a 1000x1000 array, on my machine, this resulted in an average execution time of about 120000 ticks versus about 140000 ticks.

2。如果你有他们并初始化并行阵列利用多个内核。

static int[,] CreateArray(Size size) {
    int w = size.Width;
    int h = size.Height;

    int[,] array = new int[w, h];
    Action<int[,], int, int> fillFirstHalf = FillArray;
    Action<int[,], int, int> fillSecondHalf = FillArray;

    var firstResult = fillFirstHalf.BeginInvoke(array, 0, h / 2, null, null);
    var secondResult = fillSecondHalf.BeginInvoke(array, h / 2, h, null, null);

    fillFirstHalf.EndInvoke(firstResult);
    fillSecondHalf.EndInvoke(secondResult);

    return array;
}

static void FillArray(int[,] array, int ystart, int yend) {
    int w = array.GetLength(0);

    for (int x = 0; x < w; ++x) {
        for (int y = ystart; y < yend; ++y) {
            array[x, y] = int.MaxValue;
        }
    }
}

这一次可能不是在你的场景非常逼真的建议,因为它似乎是你只创建100×100阵列,在这种情况下并行的开销超过了性能提升。然而,对于创建一个1000×1000的数组,我发现,这种做法减少了我的执行时间下降到约70K蜱的平均水平(相对于〜120K蜱我从第一次优化,我建议得到了)。

This one probably isn't a very realistic suggestion in your scenario, since it seems that you're only creating 100x100 arrays, in which case the overhead of the parallelization exceeds the performance gain. However, for creating a 1000x1000 array, I found that this approach reduced my execution times down to about 70k ticks on average (compared to the ~120k ticks I got from the first optimization I suggested).

另外,如果你正在创建的多阵列的这种方式,我会极力推荐并行的是的(也就是说,如果你需要从创建一千阵列,创建500各两个线程),假设你有多个处理器做的工作适合你。如果没有多个处理器,把它忘掉;添加线程只会伤害你的表现。

Also, if you are creating many arrays this way, I would highly recommend parallelizing that (i.e., if you need to create a thousand arrays, create 500 each from two threads), assuming you have multiple processors to do the work for you. Without multiple processors, forget it; adding threads will only hurt your performance.

3。通过使用不安全指针获得更高的性能。

3. Get enhanced performance by using an unsafe pointer.

现在的这里的一个有趣的发现:似乎一个二维数组中的.NET被分配在predictable方式*:基本上作为内存的一维块,其中每个行从起点的量相当于所有previous行的长度被抵消。换言之,一个10X2阵列可使用指针就像20X1阵列进行访问;一个10×10阵列可以访问像100X1阵列等

Now here's an interesting discovery: it appears that a two-dimensional array in .NET is allocated in a predictable way*: basically as a one-dimensional block of memory, where each "row" is offset from the starting point by an amount equivalent to the length of all previous rows. In other words, a 10x2 array can be accessed using pointer just like a 20x1 array; a 10x10 array can be accessed like a 100x1 array, etc.

我不知道如果这是记录在案的行为或不。这可能是一个未指定的实现细节,你不希望依赖。无论哪种方式,这是值得研究的。

I have no idea if this is documented behavior or not. It may be an unspecified implementation detail that you don't want to depend on. Either way, it's worth looking into.

* 这有可能是其他大多数.NET开发人员已经知道这一点,我只是说明明显,在这种情况下,我解除我的评论关于这个是有趣。

在任何情况下,这方面的知识,您可以利用在不安全上下文固定的关键字显著性能提升:

In any case, knowledge of this allows you to exploit the fixed keyword in an unsafe context for a significant performance gain:

static int[,] CreateArray(Size size) {
    int w = size.Width;
    int h = size.Height;

    int[,] array = new int[w, h];
    unsafe {
        fixed (int* ptr = array) {
            for (int i = 0; i < w * h; ++i)
                ptr[i] = int.MaxValue;
        }
    }

    return array;
}

有关初始化signifcant大小的数组,我甚至会建议结合上述方法(并行)这一个 - 所以,从建议#2保持相同的 CreateArray ,然后重写 FillArray 为:

For initializing arrays of a signifcant size, I would even recommend combining the above approach (parallelization) with this one -- so, keep the same CreateArray from suggestion #2, and then rewrite FillArray as:

static void FillArray(int[,] array, int ystart, int yend) {
    int w = array.GetLength(0);

    unsafe {
        fixed (int* p = array) {
            for (int i = w * ystart; i < w * yend; ++i)
                p[i] = int.MaxValue;
        }
    } 
}

实际上,它似乎你已经想通了这最后一部分之前,我张贴了这个,但是我想我无论如何包括它主要用于对组合不安全与并行点

stackalloc 的说明:我想你可能会后文件prechaun在这一个在彩虹的尽头追逐。据对 stackalloc :

A note on stackalloc: I think you may be chasing after the leprechaun at the end of the rainbow with this one. According to the documentation on stackalloc:

的足够大小的内存块   包含类型的 EXPR 元素键入   分配堆栈,而不是上   堆;块的地址是   存储在指针 PTR 。这个存储器是   不受垃圾收集和   因此不必被钉扎   (通过固定)。的的寿命   存储器块被限制为   其中是该方法的寿命   定义。(重点煤矿)

A block of memory of sufficient size to contain expr elements of type type is allocated on the stack, not the heap; the address of the block is stored in pointer ptr. This memory is not subject to garbage collection and therefore does not have to be pinned (via fixed). The lifetime of the memory block is limited to the lifetime of the method in which it is defined. (emphasis mine)

这使我相信,你的不能的返回的对象,其数据存储在由 stackalloc 分配的内存从函数,因为该存储器是仅分配给功能的寿命。

This leads me to believe that you cannot return an object whose data is stored in memory allocated by stackalloc from a function, because that memory is only allocated for the lifetime of the function.