String.Join performance issue in C#

2023-09-04 00:37:42 Author: 于笙

I've been researching a question that was presented to me: how to write a function that takes a string as input and returns a string with spaces between the characters. The function must be optimized for performance, because it will be called thousands of times per second.

I know that .NET has a function called String.Join, to which I can pass the space character as a separator along with the original string.

Barring the use of String.Join, I can use the StringBuilder class to append spaces after each character.

Another way to accomplish this task is to declare a character array with 2*n-1 characters (the n original characters plus n-1 spaces). The character array can be filled in a loop and then passed to the String constructor.

I've written some .NET code that runs each of these algorithms one million times with the parameter "Hello, World" and measures how long it takes to execute. Method (3) is much, much faster than (1) or (2).

I know that (3) should be very fast because it avoids creating any additional string references to be garbage collected, but it seems to me that a built-in .NET function such as String.Join should yield good performance. Why is using String.Join so much slower than doing the work by hand?

using System.Linq;
using System.Text;

public static class TestClass
{
    // 491 milliseconds for 1 million iterations
    public static string Space1(string s)
    {
        // AsEnumerable() selects the String.Join<T>(string, IEnumerable<T>) overload
        return string.Join(" ", s.AsEnumerable());
    }

    // 190 milliseconds for 1 million iterations
    public static string Space2(string s)
    {
        if (s.Length < 2)
            return s;
        StringBuilder sb = new StringBuilder();
        sb.Append(s[0]);
        for (int i = 1; i < s.Length; i++)
        {
            sb.Append(' ');
            sb.Append(s[i]);
        }
        return sb.ToString();
    }

    // 50 milliseconds for 1 million iterations
    public static string Space3(string s)
    {
        if (s.Length < 2)
            return s;
        // n characters plus n-1 spaces
        char[] array = new char[s.Length * 2 - 1];
        array[0] = s[0];
        for (int i = 1; i < s.Length; i++)
        {
            array[2 * i - 1] = ' ';
            array[2 * i] = s[i];
        }
        return new string(array);
    }
}

Update: I have changed my project to "Release" mode and updated my elapsed times in the question accordingly.
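
The question does not show the timing code itself. A minimal sketch of how such a measurement might be taken is below; the `TimingSketch` class and `TimeIt` helper are hypothetical names introduced here, and the warm-up call is an assumption about good practice rather than something the asker describes.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: calls func(input) the given number of times
    // and returns the elapsed wall-clock time in milliseconds.
    public static long TimeIt(Func<string, string> func, string input, int iterations)
    {
        func(input); // warm-up call so JIT compilation is not part of the measurement

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            func(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

With the class from the question, a call could look like `TimingSketch.TimeIt(TestClass.Space3, "Hello, World", 1000000)`. Results only mean something in a Release build, as the update notes.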

Solution

Why is using String.Join so much slower than doing the work by hand?

The reason String.Join is slower in this case is that you can write an algorithm that has prior knowledge of the exact nature of your IEnumerable<T>.

String.Join<T>(string, IEnumerable<T>) (the overload you're using), on the other hand, is intended to work with any arbitrary enumerable type, which means it cannot pre-allocate to the proper size. In this case, it's trading pure performance and speed for flexibility.
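
To make the cost of that flexibility concrete, here is a rough sketch of what a join over an arbitrary `IEnumerable<T>` has to do. This is an illustration only, not the framework's actual implementation; `JoinSketch` and `JoinAny` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

public static class JoinSketch
{
    // Illustrative only: NOT how String.Join is really implemented.
    public static string JoinAny<T>(string separator, IEnumerable<T> values)
    {
        // Neither the element count nor the total length is known up front,
        // so the buffer starts at a default capacity and grows as needed.
        StringBuilder sb = new StringBuilder();
        bool first = true;
        foreach (T value in values)
        {
            if (!first)
                sb.Append(separator);
            first = false;

            // Each element must be converted to a string, whatever T is.
            if (value != null)
                sb.Append(value.ToString());
        }
        return sb.ToString();
    }
}
```

Compared with the hand-written array version, this pays for enumerator calls, a per-element conversion to string, and possible buffer regrowth.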

Many of the framework methods do handle certain cases where things could be sped up by checking for conditions, but this typically is only done when that "special case" is going to be common.
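
As a sketch of what such a special-case check can look like, the hypothetical method below tests whether the incoming sequence is really a `string` and, if so, takes an exactly sized fast path; otherwise it falls back to the general routine. The names `FastPathSketch` and `SpaceOut` are invented for this example, and the `is` pattern assumes C# 7 or later.

```csharp
using System.Collections.Generic;

public static class FastPathSketch
{
    public static string SpaceOut(IEnumerable<char> chars)
    {
        // Special case: a string knows its own length, so the result
        // can be allocated at exactly the right size.
        if (chars is string s)
        {
            if (s.Length < 2)
                return s;
            char[] buffer = new char[s.Length * 2 - 1];
            buffer[0] = s[0];
            for (int i = 1; i < s.Length; i++)
            {
                buffer[2 * i - 1] = ' ';
                buffer[2 * i] = s[i];
            }
            return new string(buffer);
        }

        // General case: length unknown, defer to the flexible framework routine.
        return string.Join(" ", chars);
    }
}
```

The type check costs a little on every call, which is why a framework would only add it when the special case is expected to be common.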

Here you are effectively creating an edge case where a hand-written routine is faster, but it is not a common use case of String.Join. Because you know exactly what is required in advance, you can avoid all of the overhead that a flexible design needs by pre-allocating an array of exactly the right size and building the result manually.
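
The same pre-allocation idea can be taken one step further on newer runtimes. The sketch below assumes .NET Core 2.1 or later, where `string.Create` is available; it writes directly into the new string's memory, so the intermediate `char[]` is not needed. It was not part of the original question or its measurements, and `CreateSketch` and `SpaceWithCreate` are hypothetical names.

```csharp
using System;

public static class CreateSketch
{
    // Assumes .NET Core 2.1+ (string.Create is not available on .NET Framework).
    public static string SpaceWithCreate(string s)
    {
        if (s.Length < 2)
            return s;

        // Allocate the final string once (n characters plus n-1 spaces)
        // and fill it in place through the provided span.
        return string.Create(s.Length * 2 - 1, s, (span, source) =>
        {
            span[0] = source[0];
            for (int i = 1; i < source.Length; i++)
            {
                span[2 * i - 1] = ' ';
                span[2 * i] = source[i];
            }
        });
    }
}
```

Whether this is measurably faster than the array version would need to be benchmarked for the input sizes that matter.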

You'll find that, in general, it's often possible to write a method that outperforms a framework routine for specific input data. This is common, because the framework routines have to work with any dataset, which means they can't be optimized for one specific input scenario.