正位C#bitarray指数正位、指数、bitarray

2023-09-11 04:10:53 作者:我是文科男

我有交流#BitArray这是相当大的(500,000)的长度,我试图让阵列中设置的所有正位的索引。目前我实现这一目标是:

I have a c# BitArray that is fairly large (500,000) in length, and I am trying to get the index of all the positive bits set in the array. currently I am achieving this by:

public int[] GetIndexesForPositives()
{
    var idIndexes = new int[GetPositiveCount + 1];
    var idx = 0;
    for (var i = 0; i < Length; i++)
        {
            if (Get(i))
            {
                idIndexes[idx++] = i;
            }
        }
    return idIndexes;
}

我LOPP创建已知阳性位的大小的空数组,然后我在bitarray和索引值添加到返回的数组。

I create an empty array of the size of known positive bits, then i lopp over the bitarray and add the index value to the return array.

这意味着我必须完成50​​万遍历数组和它的不完全快。 (大约需要15毫秒)。

This means I have to perform 500,000 loops over the array and its not exactly fast. (takes around 15ms).

我知道BitArray使用一个整数数组在被窝里(我用它来写GetPositiveCount功能 - 通过alogrithm我下车堆栈)?我不知道是否有一个algorythm做到这一点藏汉

I know the BitArray uses an integer array under the covers (i used it to write the GetPositiveCount function - via an alogrithm I got off stack), I wonder if there is an algorythm to do this aswell?

推荐答案

如果你能得到一个int数组中BitArray底层,这应该提供更好的性能:

If you are able to get a int array underlying the BitArray, this should provide much better performance:

假设你不知道设置的位数:

Assuming you don't know the number of bits that are set:

public static int[] GetIndexesForPositives()
{
    var idIndexes = new List<int>();
    System.Reflection.FieldInfo field = data.GetType().GetField("m_array", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
    int[] values = field.GetValue(data) as int[];

    for (var i = 0; i < values.Length; i++)
    {
        int _i = values[i];
        if (_i != 0)
        {
            for (var j = 0; j < 32; j++)
            {
                if ((_i & (1 << j)) != 0)
                {
                    idIndexes.Add(i * 32 + j);
                }
            }
        }
    }
    return idIndexes.ToArray();
}

如果你知道设置,你可以做到这一点位的数量,而不是:

If you do know the number of bits that are set you can do this instead:

public static int[] GetIndexesForPositives(int length)
{
    var idIndexes = new int[length];
    var idx = 0;
    System.Reflection.FieldInfo field = data.GetType().GetField("m_array", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
    int[] values = field.GetValue(data) as int[];

    for (var i = 0; i < values.Length; i++)
    {
        int _i = values[i];
        if (_i != 0)
        {
            for (var j = 0; j < 32; j++)
            {
                if ((_i & (1 << j)) != 0)
                {
                    idIndexes[idx++] = i * 32 + j;
                }
            }
        }
}

我的测试中有这两个工作比你的方法快,即使是一个不知道返回数组多大是摆在首位。

My tests have these two working faster than your method, even the one that doesn't know how large the return array will be in the first place.

我的结果采用随机BitArray的5000万条记录进行测试:

My results tested using a random BitArray of 50million records:

1) 25001063 records found in 50000000, took 1415.5752ms
2) 25001063 records found in 50000000, took 1099.67ms
3) 25001063 records found in 50000000, took 1045.6862ms
4) 25001063 records found in 50000000, took 745.7762ms"

1) is your code but using an arraylist instead of using some `GetPositiveCount` to get the output length.
2) is your code
3) is my (revised) first example
4) is my (revised) second example

编辑:此外,值得指出的是,这是真的可以被制成多线程中受益的一个问题。打破的ByteArray成4个部分,并有你有可能在一次运行,检查数据,4线程

edit: furthermore it is worth pointing out that this is a problem that could really benefit from being made multi-threaded. Break the ByteArray up into 4 parts and there you have 4 threads that could run checking the data at once.

编辑:我知道这已经是公认的,但这里的另一个位可以做,以提高性能,如果你知道,大部分的时间你的名单将是非常稀疏的:

I know this is already accepted but here's another bit you can do to improve performance if you know that most of the time your list will be very sparse:

for (var j = 0; j < 32; j++)
{
     if (_i == 0)
         break;
     if ((_i & (1)) != 0)
     {
         idIndexes.Add(i * 32 + j);
     }
     _i = _i >> 1;
 }

这是稍微慢一些,当列表> 40%以上的人口不过,如果你知道列表始终将是10%1和90%0,那么这将运行得更快了你。

it is slightly slower when the list is >40% or more populated however if you know the list is always going to be 10% 1s and 90% 0s then this will run even faster for you.