Fastest way to convert a possibly null-terminated ASCII byte[] to a string?

2023-09-03 06:00:03 Author: 只能亲一口喏

I need to convert a (possibly) null-terminated array of ASCII bytes to a string in C#, and the fastest way I've found to do it is by using my UnsafeAsciiBytesToString method shown below. This method uses the String.String(sbyte*) constructor, which contains a warning in its remarks:

"The value parameter is assumed to point to an array representing a string encoded using the default ANSI code page (that is, the encoding method specified by Encoding.Default).

Note: Because the default ANSI code page is system-dependent, the string created by this constructor from identical signed byte arrays may differ on different systems. ...

If the specified array is not null-terminated, the behavior of this constructor is system dependent. For example, such a situation might cause an access violation."

Now, I'm positive that the way the string is encoded will never change... but the default codepage on the system that my app is running on might change. So, is there any reason that I shouldn't run screaming from using String.String(sbyte*) for this purpose?

using System;
using System.Text;

namespace FastAsciiBytesToString
{
    static class StringEx
    {
        public static string AsciiBytesToString(this byte[] buffer, int offset, int maxLength)
        {
            int maxIndex = offset + maxLength;

            for( int i = offset; i < maxIndex; i++ )
            {
                // Skip non-nulls.
                if( buffer[i] != 0 ) continue;
                // First null we find, return the string.
                return Encoding.ASCII.GetString(buffer, offset, i - offset);
            }
            // Terminating null not found. Convert the entire section from offset to maxLength.
            return Encoding.ASCII.GetString(buffer, offset, maxLength);
        }

        public static string UnsafeAsciiBytesToString(this byte[] buffer, int offset)
        {
            string result = null;

            unsafe
            {
                fixed( byte* pAscii = &buffer[offset] )
                {
                    result = new String((sbyte*)pAscii);
                }
            }

            return result;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            byte[] asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c', 0, 0, 0 };

            string result = asciiBytes.AsciiBytesToString(3, 6);

            Console.WriteLine("AsciiBytesToString Result: \"{0}\"", result);

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            // Non-null-terminated test.
            asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c' };

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            Console.ReadLine();
        }
    }
}

Solution

Any reason not to use String(sbyte*, int, int)? That would avoid the issue of it potentially not being null-terminated. Probably along the lines of:

public static string UnsafeAsciiBytesToString(this byte[] buffer, int offset)
{
    string result = null;

    unsafe
    {
        // Pin the array and take a pointer to its first element.
        fixed( byte* pAscii = buffer )
        {
            result = new String((sbyte*)pAscii, offset, buffer.Length - offset);
        }
    }

    return result;
}
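
One caveat: as far as I understand it, the three-argument constructor converts the number of bytes it is given and does not stop at a terminator on its own, so trailing nulls could end up in the result. A minimal sketch of a variant that stops at the first null or at the end of the array, whichever comes first, might look like the following. BoundedAsciiBytesToString and StringExSketch are hypothetical names, not part of the original code, and the block assumes the project is compiled with unsafe code enabled.

```csharp
using System;

static class StringExSketch
{
    // Hypothetical helper: finds the first 0 byte without ever reading
    // past the end of the array, then converts only that many bytes.
    public static unsafe string BoundedAsciiBytesToString(this byte[] buffer, int offset)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || offset > buffer.Length) throw new ArgumentOutOfRangeException(nameof(offset));

        int available = buffer.Length - offset;
        if (available == 0) return string.Empty;

        fixed (byte* pBuffer = buffer)
        {
            byte* pStart = pBuffer + offset;
            int length = 0;
            while (length < available && pStart[length] != 0) length++;

            // Still decoded with the default code page, as in the original method.
            return new string((sbyte*)pStart, 0, length);
        }
    }

    static void Main()
    {
        byte[] unterminated = { 0, 0, 0, (byte)'a', (byte)'b', (byte)'c' };
        Console.WriteLine(unterminated.BoundedAsciiBytesToString(3)); // abc
    }
}
```

The explicit length keeps the scan inside the array, which is the part the single-argument constructor cannot guarantee.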

If this truly is an ASCII string (i.e. all bytes are less than 128) then the codepage problem shouldn't be an issue unless you've got a particularly strange default codepage which isn't based on ASCII.
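
If there is any doubt about the input, it may be worth checking that assumption before decoding. The sketch below uses a hypothetical helper, IsPureAscii, to test the byte range, and shows that Encoding.ASCII.GetString replaces any byte of 128 or above with '?' instead of running it through a code page.

```csharp
using System;
using System.Text;

static class AsciiCheckSketch
{
    // Hypothetical helper: true when every byte in the range is below 128.
    public static bool IsPureAscii(byte[] buffer, int offset, int count)
    {
        for (int i = offset; i < offset + count; i++)
        {
            if (buffer[i] >= 128) return false;
        }
        return true;
    }

    static void Main()
    {
        byte[] mixed = { (byte)'a', 0xE9, (byte)'c' };

        Console.WriteLine(IsPureAscii(mixed, 0, mixed.Length)); // False
        Console.WriteLine(Encoding.ASCII.GetString(mixed));     // a?c
    }
}
```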

Out of interest, have you actually profiled your application to make sure that this is really the bottleneck? Do you definitely need the absolute fastest conversion, instead of one which is more readable (e.g. using Encoding.GetString for the appropriate encoding)?
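
As a rough sketch of what the more readable route could look like, the null search can be handed to Array.IndexOf and the decoding to Encoding.ASCII.GetString. SafeAsciiBytesToString and ReadableSketch are hypothetical names, and the Stopwatch loop is only a crude timing aid, not a replacement for a real profiler.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class ReadableSketch
{
    // Hypothetical helper: decodes up to the first 0 byte, or up to maxLength bytes.
    public static string SafeAsciiBytesToString(byte[] buffer, int offset, int maxLength)
    {
        // Clamp so the search never runs past the end of the array.
        int count = Math.Min(maxLength, buffer.Length - offset);
        int nullIndex = Array.IndexOf(buffer, (byte)0, offset, count);
        int length = nullIndex < 0 ? count : nullIndex - offset;
        return Encoding.ASCII.GetString(buffer, offset, length);
    }

    static void Main()
    {
        byte[] data = { 0, 0, 0, (byte)'a', (byte)'b', (byte)'c', 0, 0, 0 };
        Console.WriteLine(SafeAsciiBytesToString(data, 3, 6)); // abc

        // Crude timing loop; the iteration count is arbitrary.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++)
        {
            SafeAsciiBytesToString(data, 3, 6);
        }
        sw.Stop();
        Console.WriteLine("1,000,000 conversions: {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Running the same loop over the unsafe version would give a first indication of whether the difference matters for the application.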