对大澳清单PHP函数函数、清单、PHP

2023-09-10 22:24:51 作者:做自己的盖世英雄♡

使用PHP一段时间了之后,我注意到,并非所有的PHP内置函数预期的快。考虑一个函数,发现如果使用的质数的缓存阵列的数是素两个以下可能的实现。

After using PHP for a while now, I've noticed that not all PHP built in functions as fast as expected. Consider the below two possible implementations of a function that finds if a number is prime using a cached array of primes.

//very slow for large $prime_array
$prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... );
$result_array = array();
foreach( $array_of_number => $number ) {
    $result_array[$number] = in_array( $number, $large_prime_array );
}

//speed is much less dependent on size of $prime_array, and runs much faster.
$prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL,
                       11 => NULL, 13 => NULL, .... 104729 => NULL, ... );
foreach( $array_of_number => $number ) {
    $result_array[$number] = array_key_exists( $number, $large_prime_array );
}

这是因为in_array与线性搜索为O(n),这将线性减慢 $ prime_array 的增长来实现。其中, array_key_exists 功能与哈希查找Ô实施(1),这将不会放慢,除非哈希表变得非常人口(在这种情况下,它只是为O(n))

This is because in_array is implemented with a linear search O(n) which will linearly slow down as $prime_array grows. Where the array_key_exists function is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it's only O(n)).

到目前为止,我已经通过反复试验来发现大O的,偶尔looking在源$ C ​​$ C 。现在的问题...

So far I've had to discover the big-O's via trial and error, and occasionally looking at the source code. Now for the question...

是有理论(或实际)大O次数的所有列表*内置函数的PHP?

*或至少有趣的

例如觉得很难predict什么样的上市,因为可能的实现依赖于PHP的未知的核心数据结构功能的大O:array_merge,array_merge_recursive,array_reverse,和array_intersect,array_combine,str_replace函数(与数组输入)等等。

For example find it very hard to predict what the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.

推荐答案

由于它似乎并不像其他人也这样做之前,我认为这会是不错的主意,有其参考的地方。我已经走了,但并或者通过基准或code-撇表征阵列_ * 功能。我试图把更有趣的大O接近顶部。这份名单是不完整的。

Since it doesn't seem like anyone has done this before I thought it'd be good idea to have it for reference somewhere. I've gone though and either via benchmark or code-skimming to characterize the array_* functions. I've tried to put the more interesting Big-O near the top. This list is not complete.

注:其中假设计算哈希查找所有大-O是O(1),即使它是真正为O(n)。 n个系数是如此之低,存储足够大阵会伤害你以前的查找大O的特征将开始生效的RAM中的开销。例如至 array_key_exists 在N = 1和N = 1,000,000一个呼叫之差为〜50%的时间的增加。

Note: All the Big-O where calculated assuming a hash lookup is O(1) even though it's really O(n). The coefficient of the n is so low, the ram overhead of storing a large enough array would hurt you before the characteristics of lookup Big-O would start taking effect. For example the difference between a call to array_key_exists at N=1 and N=1,000,000 is ~50% time increase.

有趣的点

使用isset / array_key_exists in_array 更快和 array_search + (工会)比 array_merge 快一点(而且看起来更好)。但是它的工作方式不同,因此记住这一点。 洗牌是在同一个大O层为 array_rand array_pop / array_push 是快于 array_shift / array_unshift 由于重新索引点球 isset/array_key_exists is much faster than in_array and array_search +(union) is a bit faster than array_merge (and looks nicer). But it does work differently so keep that in mind. shuffle is on the same Big-O tier as array_rand array_pop/array_push is faster than array_shift/array_unshift due to re-index penalty

查找

array_key_exists O(N),但真正接近O(1) - 这是因为在碰撞中的线性投票,但由于冲突的机会非常小,系数也非常小。我觉得你对待哈希查找为O(1),得到一个更现实的大O。例如正之间= 1000和N =不同100000大约只有50%有所放缓。

array_key_exists O(n) but really close to O(1) - this is because of linear polling in collisions, but because the chance of collisions is very small, the coefficient is also very small. I find you treat hash lookups as O(1) to give a more realistic big-O. For example the different between N=1000 and N=100000 is only about 50% slow down.

使用isset($数组[$指数]) O(N),但真正接近O(1) - 它采用了相同的查找为array_key_exists。由于它的语言结构,将缓存查找,如果关键是硬codeD,从而加快的情况下相同的密钥被重复使用。

isset( $array[$index] ) O(n) but really close to O(1) - it uses the same lookup as array_key_exists. Since it's language construct, will cache the lookup if the key is hardcoded, resulting in speed up in cases where the same key is used repeatedly.

in_array O(N) - 这是因为它虽然阵列线性搜索,直到找到值

in_array O(n) - this is because it does a linear search though the array until it finds the value.

array_search O(N) - 它采用了相同的核心功能in_array但返回值

array_search O(n) - it uses the same core function as in_array but returns value.

队列功能

array_push O(Σvar_i,对于所有的i)

array_push O(∑ var_i, for all i)

array_pop O(1)

array_shift O(N) - 它必须重新索引所有的按键

array_shift O(n) - it has to reindex all the keys

array_unshift O(N +Σvar_i,对所有的i) - 它必须重新索引所有的按键

array_unshift O(n + ∑ var_i, for all i) - it has to reindex all the keys

阵交集,减

array_intersect_key 如果路口100%做O(MAX(param_i_size)*Σparam_i_count,对于所有的i),如果路口0%相交O(Σparam_i_size,对所有的i )

array_intersect_key if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)

和array_intersect 如果路口100%做到为O(n ^ 2 *Σparam_i_count,对于所有的i),如果路口0%相交为O(n ^ 2)

array_intersect if intersection 100% do O(n^2*∑param_i_count, for all i), if intersection 0% intersect O(n^2)

array_intersect_assoc 如果路口100%做O(MAX(param_i_size)*Σparam_i_count,对于所有的i),如果路口0%相交O(Σparam_i_size,对所有的i )

array_intersect_assoc if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)

和array_diff 0(πparam_i_size,对所有的i) - 这是所有的param_sizes产品

array_diff O(π param_i_size, for all i) - That's product of all the param_sizes

array_diff_key O(Σparam_i_size,其中i = 1!) - 这是因为我们并不需要遍历第一阵列

array_diff_key O(∑ param_i_size, for i != 1) - this is because we don't need to iterate over the first array.

array_merge O(Σarray_i,I = 1!) - 并不需要遍历第一阵列

array_merge O( ∑ array_i, i != 1 ) - doesn't need to iterate over the first array

+ (工会)O(n),其中n是第2个数组的大小(即array_first + array_second) - 比array_merge,因为它不具有较少的开销重新编号

+ (union) O(n), where n is size of the 2nd array (ie array_first + array_second) - less overhead than array_merge since it doesn't have to renumber

array_replace O(Σarray_i,对于所有的i)

array_replace O( ∑ array_i, for all i )

随机

洗牌 O(N)

array_rand O(n)的 - 需要一个线性的民意调查

array_rand O(n) - Requires a linear poll.

明显的大O

array_fill O(N)

array_fill_keys O(N)

范围 O(N)

array_splice 0(偏移+长度)

array_slice 0(偏移+长度)或O(N),如果长度= NULL

array_slice O(offset + length) or O(n) if length = NULL

array_keys O(N)

array_values​​ O(N)

array_reverse O(N)

array_pad O(pad_size)

array_pad O(pad_size)

array_flip O(N)

array_sum O(N)

array_product O(N)

array_reduce O(N)

array_filter O(N)

array_map O(N)

array_chunk O(N)

array_combine O(N)

我要感谢 Eureqa 以使它容易找到大O的功能。这是一个惊人的免费的程序,可以找到最适合的功能,任意数据。

I'd like to thank Eureqa for making it easy to find the Big-O of the functions. It's an amazing free program that can find the best fitting function for arbitrary data.

编辑:

对于那些谁怀疑PHP数组的查找 O(N),我写了一个基准测试(他们仍然有效 O(1)对于大多数实际值)。

For those who doubt that PHP array lookups are O(N), I've written a benchmark to test that (they are still effectively O(1) for most realistic values).

$tests = 1000000;
$max = 5000001;


for( $i = 1; $i <= $max; $i += 10000 ) {
    //create lookup array
    $array = array_fill( 0, $i, NULL );

    //build test indexes
    $test_indexes = array();
    for( $j = 0; $j < $tests; $j++ ) {
        $test_indexes[] = rand( 0, $i-1 );
    }

    //benchmark array lookups
    $start = microtime( TRUE );
    foreach( $test_indexes as $test_index ) {
        $value = $array[ $test_index ];
        unset( $value );
    }
    $stop = microtime( TRUE );
    unset( $array, $test_indexes, $test_index );

    printf( "%d,%1.15f\n", $i, $stop - $start ); //time per 1mil lookups
    unset( $stop, $start );
}
 
精彩推荐
图片推荐