Finding repeating signed integers in O(n) time and O(1) space

2023-09-11 03:09:40 Author: 誰是ωǒ的糖

(This is a generalization of: Finding duplicates in O(n) time and O(1) space)

Problem: Write a C++ or C function with time and space complexities of O(n) and O(1) respectively that finds the repeating integers in a given array without altering it.

Example: Given {1, 0, -2, 4, 4, 1, 3, 1, -2}, the function must print 1, -2, and 4 once each (in any order). EDIT: The following solution requires a duo-bit (two bits, to represent the counts 0, 1, and 2) for each integer in the range from the minimum to the maximum of the array. The number of bytes needed (regardless of array size) never exceeds (INT_MAX - INT_MIN)/4 + 1.

#include <stdio.h>
#include <stdlib.h> /* calloc, free */

void set_min_max(int a[], long long unsigned size,
                 int* min_addr, int* max_addr)
{
    long long unsigned i;

    if(!size) return;
    *min_addr = *max_addr = a[0];
    for(i = 1; i < size; ++i)
    {
        if(a[i] < *min_addr) *min_addr = a[i];
        if(a[i] > *max_addr) *max_addr = a[i];
    }
}

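void print_repeats(int a[], long long unsigned size)
{
    long long unsigned i;
    int min, max;
    long long diff, q, r;
    unsigned char* duos; /* unsigned so the shifts below are well defined */

    if(!size) return; /* empty array: min and max would be left unset */
    set_min_max(a, size, &min, &max);
    diff = (long long)max - (long long)min;
    duos = calloc(diff / 4 + 1, 1);
    if(!duos) return; /* allocation failed */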
    for(i = 0; i < size; ++i)
    {
        diff = (long long)a[i] - (long long)min; /* index of duo-bit
                                                    corresponding to a[i]
                                                    in sequence of duo-bits */
        q = diff / 4; /* index of byte containing duo-bit in "duos" */
        r = diff % 4; /* offset of duo-bit */
        switch( (duos[q] >> (6 - 2*r )) & 3 )
        {
            case 0: duos[q] += (1 << (6 - 2*r));
                    break;
            case 1: duos[q] += (1 << (6 - 2*r));
                    printf("%d ", a[i]);
        }
    }
    putchar('\n');
    free(duos);
}

int main(void)
{
    int a[] = {1, 0, -2, 4, 4, 1, 3, 1, -2};
    print_repeats(a, sizeof(a)/sizeof(int));
    return 0;
}
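
A minimal sketch of the duo-bit packing used in the code above, pulled out into two helpers. The names duo_get and duo_bump are hypothetical and not part of the original post. The layout (four 2-bit counters per byte, first counter in the top two bits) is assumed from the 6 - 2*r shift in print_repeats.

```c
#include <stddef.h>

/* Hypothetical helpers, not from the original post: each byte holds four
   2-bit counters, with counter 0 in the two most significant bits. */
static unsigned duo_get(const unsigned char *duos, size_t index)
{
    unsigned shift = 6 - 2 * (unsigned)(index % 4);
    return (duos[index / 4] >> shift) & 3u;
}

/* Increment the counter at index, saturating at 2 ("seen at least twice"). */
static void duo_bump(unsigned char *duos, size_t index)
{
    unsigned shift = 6 - 2 * (unsigned)(index % 4);
    if (duo_get(duos, index) < 2)
        duos[index / 4] = (unsigned char)(duos[index / 4] + (1u << shift));
}
```

With these, the loop body amounts to reading the counter for a[i] - min, printing a[i] when the counter is exactly 1, and then bumping it.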

Solution

Big-O notation takes a function f(x) as its argument and means that, as the variable x tends to infinity, there exists a constant K such that the cost function being measured is smaller than K·f(x). Typically f is chosen to be the smallest simple function that satisfies the condition. (It's pretty obvious how to lift the above to multiple variables.)
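
Stated a little more formally (the symbols T and x_0 are names assumed here for the cost function and the threshold; they do not appear in the answer):

```latex
T(x) = O\bigl(f(x)\bigr) \ \text{as } x \to \infty
\iff
\exists\, K > 0,\ \exists\, x_0 \ \text{such that}\
|T(x)| \le K\, f(x) \quad \text{for all } x \ge x_0 .
```

For instance, T(n) = 3n + 10^6 is O(n): with K = 4 and x_0 = 10^6, 3n + 10^6 <= 4n holds for every n >= 10^6, even though the constant term dominates for small n.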

This matters because that K, which you aren't required to specify, allows a whole multitude of complex behavior to be hidden out of sight. For example, if the core of the algorithm is O(n^2), it allows all sorts of other O(1), O(log n), O(n), O(n log n), O(n^(3/2)), etc. supporting bits to be hidden, even if for realistic input data those parts are what actually dominate. That's right, it can be completely misleading! (Some of the fancier bignum algorithms have this property for real. Lying with mathematics is a wonderful thing.)

So where is this going? Well, you can assume that int is a fixed size easily enough (e.g., 32-bit) and use that information to skip a lot of trouble and allocate fixed-size arrays of flag bits to hold all the information that you really need. Indeed, by using two bits per potential value (one bit to say whether you've seen the value at all, another to say whether you've printed it), you can handle the problem with a fixed chunk of memory 1GB in size (2 bits times 2^32 values is 2^33 bits). That gives you enough flag information to cope with as many 32-bit integers as you might ever wish to handle. (Heck, that's even practical on 64-bit machines.) Yes, it's going to take some time to set that memory block up, but that time is constant, so it's formally O(1) and drops out of the analysis. Given that, you have constant (but whopping) memory consumption and linear time (you've got to look at each value to see whether it's new, seen once, etc.), which is exactly what was asked for.
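
The memory figure can be checked with a few lines of C. This is only an illustration: flag_table_bytes is a hypothetical helper, and it assumes value_bits <= 32 and a small bits_per_value so the product fits in 64 bits.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed to keep bits_per_value flag bits for
   every possible value of an integer type that is value_bits wide.
   Assumes value_bits <= 32 and a small bits_per_value. */
static uint64_t flag_table_bytes(unsigned value_bits, unsigned bits_per_value)
{
    uint64_t total_bits = ((uint64_t)1 << value_bits) * bits_per_value;
    return (total_bits + 7) / 8; /* round up to whole bytes */
}
```

flag_table_bytes(32, 2) comes to 2^30 bytes, the 1GB quoted above; one flag bit per value needs half of that, 2^29 bytes.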

It's a dirty trick, though. You could also try scanning the input list first to work out the range of values, allowing less memory to be used in the normal case; again, that adds only linear time, and the memory required is still strictly bounded as above, so it remains constant. Yet more trickiness, but formally legal.
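
A rough sketch of that range-scanning variant, under stated assumptions: range_print_repeats is a hypothetical name, the two flags per value are packed into one table (bit 0 = seen, bit 1 = printed), and the return value (count of repeated values, or -1 on allocation failure) is added here only to make the function easy to check.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch only. Prints each value that occurs more than once in a[0..size-1]
   and returns how many such values there are, or -1 if calloc fails. */
static long range_print_repeats(const int a[], size_t size)
{
    if (size == 0) return 0;

    /* Pass 1: find the range of values actually present. */
    int min = a[0], max = a[0];
    for (size_t i = 1; i < size; ++i) {
        if (a[i] < min) min = a[i];
        if (a[i] > max) max = a[i];
    }

    /* Two flag bits per value in [min, max], four values per byte. */
    unsigned long long span =
        (unsigned long long)((long long)max - (long long)min) + 1;
    unsigned char *flags = calloc((size_t)(span / 4 + 1), 1);
    if (!flags) return -1;

    /* Pass 2: same seen/printed logic as the fixed-size approach. */
    long found = 0;
    for (size_t i = 0; i < size; ++i) {
        unsigned long long idx =
            (unsigned long long)((long long)a[i] - (long long)min);
        unsigned char *byte = &flags[idx / 4];
        unsigned shift = 2 * (unsigned)(idx % 4);
        unsigned state = (*byte >> shift) & 3u;
        if (state == 0) {
            *byte |= (unsigned char)(1u << shift); /* mark seen */
        } else if (state == 1) {
            *byte |= (unsigned char)(2u << shift); /* mark printed */
            printf("%d ", a[i]);
            ++found;
        }
    }
    putchar('\n');
    free(flags);
    return found;
}
```

For the example array the table is only two bytes (values -2 through 4), while the worst case, an array containing both INT_MIN and INT_MAX, still needs the full 1GB.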

[EDIT] Sample C code (this is not C++, but I'm not good at C++; the main difference would be in how the flag arrays are allocated and managed):

#include <stdio.h>
#include <stdlib.h>

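// Bit fiddling magic: one flag bit per 32-bit value, 32 flags per word.
// The words are unsigned so that 1u<<31 is well defined.
int is(const unsigned *ary, unsigned int value) {
    return (ary[value>>5] >> (value&31)) & 1u;
}
void set(unsigned *ary, unsigned int value) {
    ary[value>>5] |= 1u<<(value&31);
}

// Main loop
void print_repeats(int a[], unsigned size) {
    unsigned *seen, *done;
    unsigned i;

    // 134217728 words * 32 bits = 2^32 flags per array
    // (512MB each, assuming a 32-bit unsigned)
    seen = calloc(134217728, sizeof(unsigned));
    done = calloc(134217728, sizeof(unsigned));
    if (!seen || !done) {   // allocation failed; free(NULL) is a no-op
        free(done);
        free(seen);
        return;
    }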

    for (i=0; i<size; i++) {
        if (is(done, (unsigned) a[i]))
            continue;
        if (is(seen, (unsigned) a[i])) {
            set(done, (unsigned) a[i]);
            printf("%d ", a[i]);
        } else
            set(seen, (unsigned) a[i]);
    }

    printf("\n");
    free(done);
    free(seen);
}

int main(void) {
    int a[] = {1,0,-2,4,4,1,3,1,-2};
    print_repeats(a,sizeof(a)/sizeof(int));
    return 0;
}