Why is copying references to strings much slower than copying ints (but vice versa for Array.Copy())?

2023-09-03 02:53:33 Author: [ 枷锁 ]

Let's say I want to move a part of an array right by 1. I can either use Array.Copy or just make a loop copying elements one by one:

private static void BuiltInCopy<T>(T[] arg, int start) {
    int length = arg.Length - start - 1;
    Array.Copy(arg, start, arg, start + 1, length);
}

private static void ElementByElement<T>(T[] arg, int start) {
    for (int i = arg.Length - 1; i > start; i--) {
        arg[i] = arg[i - 1];
    }
}

private static void ElementByElement2<T>(T[] arg, int start) {
    int i = arg.Length - 1;
    while (i > start)
        arg[i] = arg[--i];
}

(ElementByElement2 was suggested by Matt Howells.)
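As a quick sanity check that the three variants really do the same thing, here is a minimal sketch that runs each of them on a five-element array. The class name ShiftDemo and the sample data are made up for illustration; the method bodies are the ones shown above.

```csharp
using System;

public static class ShiftDemo
{
    // Same three strategies as above, repeated so this sketch compiles on its own.
    public static void BuiltInCopy<T>(T[] arg, int start)
    {
        // Array.Copy handles overlapping source and destination ranges correctly.
        Array.Copy(arg, start, arg, start + 1, arg.Length - start - 1);
    }

    public static void ElementByElement<T>(T[] arg, int start)
    {
        // Walk backwards so no element is overwritten before it has been read.
        for (int i = arg.Length - 1; i > start; i--)
            arg[i] = arg[i - 1];
    }

    public static void ElementByElement2<T>(T[] arg, int start)
    {
        int i = arg.Length - 1;
        // C# evaluates operands left to right: the target index uses the old i,
        // the source index uses the already decremented i.
        while (i > start)
            arg[i] = arg[--i];
    }

    public static void Main()
    {
        int[] a = { 0, 1, 2, 3, 4 };
        int[] b = (int[])a.Clone();
        int[] c = (int[])a.Clone();

        BuiltInCopy(a, 1);
        ElementByElement(b, 1);
        ElementByElement2(c, 1);

        Console.WriteLine(string.Join(",", a)); // 0,1,1,2,3
        Console.WriteLine(string.Join(",", b)); // 0,1,1,2,3
        Console.WriteLine(string.Join(",", c)); // 0,1,1,2,3
    }
}
```

The element at index start is duplicated into start + 1 and the last element falls off the end, which is what List<T>.Insert needs before it writes the new item.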

I benchmarked all three using MiniBench, and the results surprised me quite a lot.

internal class Program {
    private static int smallArraySize = 32;

    public static void Main(string[] args) {
        BenchArrayCopy();
    }

    private static void BenchArrayCopy() {
        var smallArrayInt = new int[smallArraySize];
        for (int i = 0; i < smallArraySize; i++)
            smallArrayInt[i] = i;

        var smallArrayString = new string[smallArraySize];
        for (int i = 0; i < smallArraySize; i++)
            smallArrayString[i] = i.ToString();

        var smallArrayDateTime = new DateTime[smallArraySize];
        for (int i = 0; i < smallArraySize; i++)
            smallArrayDateTime[i] = DateTime.Now;

        var moveInt = new TestSuite<int[], int>("Move part of array right by 1: int")
            .Plus(BuiltInCopy, "Array.Copy()")
            .Plus(ElementByElement, "Element by element (for)")
            .Plus(ElementByElement2, "Element by element (while)")
            .RunTests(smallArrayInt, 0);

        var moveString = new TestSuite<string[], string>("Move part of array right by 1: string")
            .Plus(BuiltInCopy, "Array.Copy()")
            .Plus(ElementByElement, "Element by element (for)")
            .Plus(ElementByElement2, "Element by element (while)")
            .RunTests(smallArrayString, "0");

        moveInt.Display(ResultColumns.All, moveInt.FindBest());
        moveString.Display(ResultColumns.All, moveInt.FindBest());
    }

    private static T ElementByElement<T>(T[] arg) {
        ElementByElement(arg, 1);
        return arg[0];
    }

    private static T ElementByElement2<T>(T[] arg) {
        ElementByElement2(arg, 1);
        return arg[0];
    }

    private static T BuiltInCopy<T>(T[] arg) {
        BuiltInCopy(arg, 1);
        return arg[0];
    }

    private static void BuiltInCopy<T>(T[] arg, int start) {
        int length = arg.Length - start - 1;
        Array.Copy(arg, start, arg, start + 1, length);
    }

    private static void ElementByElement<T>(T[] arg, int start) {
        for (int i = arg.Length - 1; i > start; i--) {
            arg[i] = arg[i - 1];
        }
    }

    private static void ElementByElement2<T>(T[] arg, int start) {
        int i = arg.Length - 1;
        while (i > start)
            arg[i] = arg[--i];
    }
}

Note that allocations are not being measured here. All methods just copy array elements. Since I am on a 32-bit OS, an int and a string reference take up the same amount of space (4 bytes each).

This is what I expected to see:

BuiltInCopy should be the fastest for two reasons: 1) it can do a memory copy; 2) List<T>.Insert uses Array.Copy. On the other hand, it's non-generic, and it can do a lot of extra work when the arrays have different types, so perhaps it doesn't take full advantage of 1). ElementByElement should be equally fast for int and string. BuiltInCopy should either be equally fast for int and string, or slower for int (in case it has to do some boxing).

However, all of these suppositions were wrong (at least, on my machine with .NET 3.5 SP1)!

BuiltInCopy<int> is significantly slower than ElementByElement<int> for 32-element arrays. When size is increased, BuiltInCopy<int> becomes faster. ElementByElement<string> is over 4 times slower than ElementByElement<int>. BuiltInCopy<int> is faster than BuiltInCopy<string>.

Can anybody explain these results?

UPDATE: From a CLR Code Generation Team blog post on array bounds check elimination:

Advice 4: when you’re copying medium-to-large arrays, use Array.Copy, rather than explicit copy loops. First, all your range checks will be "hoisted" to a single check outside the loop. If the arrays contain object references, you will also get efficient "hoisting" of two more expenses related to storing into arrays of object types: the per-element "store checks" related to array covariance can often be eliminated by a check on the dynamic types of the arrays, and garbage-collection-related write barriers will be aggregated and become much more efficient. Finally, we will be able to use more efficient "memcpy"-style copy loops. (And in the coming multicore world, perhaps even employ parallelism if the arrays are big enough!)
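To make the "store checks related to array covariance" part concrete, here is a small sketch of why the runtime has to check stores into reference-type arrays at all. The class and method names are invented for this example. It only shows that the check exists; it does not show whether or how often the JIT emits it in the loops from the question.

```csharp
using System;

public static class CovarianceDemo
{
    // Returns true if the runtime's store check rejects putting `value` into `target`.
    public static bool StoreIsRejected(object[] target, object value)
    {
        try
        {
            // The static type is object[], but the store is validated against
            // the array's actual element type at run time.
            target[0] = value;
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Legal: arrays of reference types are covariant in C#.
        object[] reallyStrings = new string[1];

        Console.WriteLine(StoreIsRejected(reallyStrings, "ok")); // False
        Console.WriteLine(StoreIsRejected(reallyStrings, 42));   // True: a boxed int is not a string
    }
}
```

An int[] can never be viewed as an object[], so stores into it need no such check. That is one of the per-element costs the quoted advice says Array.Copy can replace with a single up-front test.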

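In the output below, the columns are: test name, number of iterations, total duration, and score. The score is the total duration in ticks divided by the number of iterations, normalized by the best result, so lower is better. My locale prints the decimal separator as a comma, so 1,46 means 1.46.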
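For anyone who wants to check the score arithmetic by hand, here is a small sketch that recomputes one score from the first int run at size 32 (the Array.Copy() row against the best row, the while loop). The ScoreDemo class is made up for this illustration and is reconstructed from the description of the score, not taken from MiniBench's source.

```csharp
using System;
using System.Globalization;

public static class ScoreDemo
{
    // Cost of one iteration, in ticks.
    public static double TicksPerIteration(TimeSpan duration, long iterations)
    {
        return (double)duration.Ticks / iterations;
    }

    // Score of a row relative to the best row: 1.00 is best, larger is slower.
    public static double Score(TimeSpan duration, long iterations,
                               TimeSpan bestDuration, long bestIterations)
    {
        return TicksPerIteration(duration, iterations)
             / TicksPerIteration(bestDuration, bestIterations);
    }

    public static void Main()
    {
        // Array.Copy() row: 468791028 iterations in 0:30.350
        TimeSpan arrayCopy = new TimeSpan(0, 0, 0, 30, 350);
        // Element by element (while) row: 667595468 iterations in 0:29.549
        TimeSpan best = new TimeSpan(0, 0, 0, 29, 549);

        double score = Score(arrayCopy, 468791028L, best, 667595468L);
        Console.WriteLine(score.ToString("F2", CultureInfo.InvariantCulture)); // 1.46
    }
}
```

The string tables are displayed against moveInt.FindBest() as well, which is why no string row scores 1,00 and why the int and string scores can be compared directly.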

Two runs at smallArraySize = 32:

f:\MyProgramming\TimSort\Benchmarks\bin\Release>Benchmarks.exe
============ Move part of array right by 1: int ============
Array.Copy()               468791028 0:30.350 1,46
Element by element (for)   637091585 0:29.895 1,06
Element by element (while) 667595468 0:29.549 1,00

============ Move part of array right by 1: string ============
Array.Copy()               432459039 0:30.929 1,62
Element by element (for)   165344842 0:30.407 4,15
Element by element (while) 150996286 0:28.399 4,25


f:\MyProgramming\TimSort\Benchmarks\bin\Release>Benchmarks.exe
============ Move part of array right by 1: int ============
Array.Copy()               459040445 0:29.262 1,38
Element by element (for)   645863535 0:30.929 1,04
Element by element (while) 651068500 0:30.064 1,00

============ Move part of array right by 1: string ============
Array.Copy()               403684808 0:30.191 1,62
Element by element (for)   162646202 0:30.051 4,00
Element by element (while) 160947492 0:30.945 4,16

Two runs at smallArraySize = 256:

f:\MyProgramming\TimSort\Benchmarks\bin\Release>Benchmarks.exe
============ Move part of array right by 1: int ============
Array.Copy()               172632756 0:30.128 1,00
Element by element (for)    91403951 0:30.253 1,90
Element by element (while)  65352624 0:29.141 2,56

============ Move part of array right by 1: string ============
Array.Copy()               153426720 0:28.964 1,08
Element by element (for)    19518483 0:30.353 8,91
Element by element (while)  19399180 0:29.793 8,80


f:\MyProgramming\TimSort\Benchmarks\bin\Release>Benchmarks.exe
============ Move part of array right by 1: int ============
Array.Copy()               184710866 0:30.456 1,00
Element by element (for)    92878947 0:29.959 1,96
Element by element (while)  73588500 0:30.331 2,50

============ Move part of array right by 1: string ============
Array.Copy()               157998697 0:30.336 1,16
Element by element (for)    19905046 0:29.995 9,14
Element by element (while)  18838572 0:29.382 9,46

Solution

System.Buffer.BlockCopy is closer to C's memcpy but still has overhead. Your own method will generally be faster for small cases while BlockCopy will be faster for large cases.
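A minimal sketch of what a BlockCopy-based shift might look like, assuming a separate destination array (the helper name ShiftRightInto is invented here). Two things to keep in mind: the offsets and the count are in bytes, not elements, and Buffer.BlockCopy only accepts arrays of primitive types, so it is not an option for the string[] case at all.

```csharp
using System;

public static class BlockCopyDemo
{
    // Copies src[0..n-2] into dst[1..n-1]; dst[0] is left untouched.
    // Assumes dst is at least as long as src.
    public static void ShiftRightInto(int[] src, int[] dst)
    {
        if (src.Length == 0)
            return;

        int count = src.Length - 1;
        // Offsets and length are expressed in bytes, hence the sizeof(int) factors.
        Buffer.BlockCopy(src, 0, dst, sizeof(int), count * sizeof(int));
    }

    public static void Main()
    {
        int[] src = { 10, 20, 30, 40 };
        int[] dst = new int[src.Length];

        ShiftRightInto(src, dst);
        Console.WriteLine(string.Join(",", dst)); // 0,10,20,30

        try
        {
            // string is not a primitive type, so this call is rejected.
            Buffer.BlockCopy(new string[2], 0, new string[2], 0, 4);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("BlockCopy rejects string[]");
        }
    }
}
```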

Copying references is slower than copying ints because .NET has to do some extra work in most cases when you assign a reference - this extra work is related to garbage collection.

For a demonstration of this fact, look at the code below, which includes the native code for copying each string element vs. copying each int element (the native code is in comments). Notice that it actually makes a function call to assign the string reference to dst[i], while the int assignment is done inline:

    static void TestStrings()
    {
        string[] src = new string[5];
        for (int i = 0; i < src.Length; i++)
            src[i] = i.ToString();
        string[] dst = new string[src.Length];
        // Loop forever so we can break into the debugger when run
        // without debugger.
        while (true)
        {
            for (int i = 0; i < src.Length; i++)
                /*
                 * 0000006f  push        dword ptr [ebx+esi*4+0Ch] 
                 * 00000073  mov         edx,esi 
                 * 00000075  mov         ecx,dword ptr [ebp-14h] 
                 * 00000078  call        6E9EC15C 
                 */
                dst[i] = src[i];
        }
    }
    static void TestInts()
    {
        int[] src = new int[5];
        for (int i = 0; i < src.Length; i++)
            src[i] = i;
        int[] dst = new int[src.Length];
        // Loop forever so we can break into the debugger when run
        // without debugger.
        while (true)
        {
            for (int i = 0; i < src.Length; i++)
                /*
                 * 0000003d  mov         ecx,dword ptr [edi+edx*4+8] 
                 * 00000041  cmp         edx,dword ptr [ebx+4] 
                 * 00000044  jae         00000051 
                 * 00000046  mov         dword ptr [ebx+edx*4+8],ecx 
                 */
                dst[i] = src[i];
        }
    }
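If you would rather see the difference as timings than as disassembly, a rough Stopwatch sketch like the one below can be used. The class name, array size and round count are arbitrary choices for illustration, and the numbers will depend heavily on the runtime version, the JIT and the machine, so treat it as a starting point rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;

public static class CopyTiming
{
    // Copies src into dst element by element, `rounds` times, and returns the elapsed time.
    // Assumes dst is at least as long as src.
    public static TimeSpan TimeCopy<T>(T[] src, T[] dst, int rounds)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            for (int i = 0; i < src.Length; i++)
                dst[i] = src[i];
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        const int size = 32;
        const int rounds = 1000000;

        int[] ints = new int[size];
        string[] strings = new string[size];
        for (int i = 0; i < size; i++)
        {
            ints[i] = i;
            strings[i] = i.ToString();
        }

        // Warm up both instantiations so JIT compilation is not part of the timings.
        TimeCopy(ints, new int[size], 1);
        TimeCopy(strings, new string[size], 1);

        Console.WriteLine("int:    " + TimeCopy(ints, new int[size], rounds));
        Console.WriteLine("string: " + TimeCopy(strings, new string[size], rounds));
    }
}
```

Run it from a Release build without a debugger attached, for the same reason the code above loops forever and attaches the debugger afterwards: the JIT generates different code when a debugger is present from the start.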

 