使用PHP一段时间了之后,我注意到,并非所有的PHP内置函数预期的快。考虑一个函数,发现如果使用的质数的缓存阵列的数是素两个以下可能的实现。
After using PHP for a while now, I've noticed that not all PHP built in functions as fast as expected. Consider the below two possible implementations of a function that finds if a number is prime using a cached array of primes.
//very slow for large $prime_array
$prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... );
$result_array = array();
foreach( $array_of_number => $number ) {
$result_array[$number] = in_array( $number, $large_prime_array );
}
//speed is much less dependent on size of $prime_array, and runs much faster.
$prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL,
11 => NULL, 13 => NULL, .... 104729 => NULL, ... );
foreach( $array_of_number => $number ) {
$result_array[$number] = array_key_exists( $number, $large_prime_array );
}
这是因为in_array与线性搜索为O(n),这将线性减慢 $ prime_array
的增长来实现。其中, array_key_exists
功能与哈希查找Ô实施(1),这将不会放慢,除非哈希表变得非常人口(在这种情况下,它只是为O(n))
This is because in_array is implemented with a linear search O(n) which will linearly slow down as $prime_array
grows. Where the array_key_exists
function is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it's only O(n)).
到目前为止,我已经通过反复试验来发现大O的,偶尔looking在源$ C $ C 。现在的问题...
So far I've had to discover the big-O's via trial and error, and occasionally looking at the source code. Now for the question...
是有理论(或实际)大O次数的所有列表*内置函数的PHP?
*或至少有趣的
例如觉得很难predict什么样的上市,因为可能的实现依赖于PHP的未知的核心数据结构功能的大O:array_merge,array_merge_recursive,array_reverse,和array_intersect,array_combine,str_replace函数(与数组输入)等等。
For example find it very hard to predict what the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.
由于它似乎并不像其他人也这样做之前,我认为这会是不错的主意,有其参考的地方。我已经走了,但并或者通过基准或code-撇表征阵列_ *
功能。我试图把更有趣的大O接近顶部。这份名单是不完整的。
Since it doesn't seem like anyone has done this before I thought it'd be good idea to have it for reference somewhere. I've gone though and either via benchmark or code-skimming to characterize the array_*
functions. I've tried to put the more interesting Big-O near the top. This list is not complete.
注:其中假设计算哈希查找所有大-O是O(1),即使它是真正为O(n)。 n个系数是如此之低,存储足够大阵会伤害你以前的查找大O的特征将开始生效的RAM中的开销。例如至 array_key_exists
在N = 1和N = 1,000,000一个呼叫之差为〜50%的时间的增加。
Note: All the Big-O where calculated assuming a hash lookup is O(1) even though it's really O(n). The coefficient of the n is so low, the ram overhead of storing a large enough array would hurt you before the characteristics of lookup Big-O would start taking effect. For example the difference between a call to array_key_exists
at N=1 and N=1,000,000 is ~50% time increase.
有趣的点
使用isset
/ array_key_exists
比 in_array
更快和 array_search
+
(工会)比 array_merge
快一点(而且看起来更好)。但是它的工作方式不同,因此记住这一点。
洗牌
是在同一个大O层为 array_rand
array_pop
/ array_push
是快于 array_shift
/ array_unshift
由于重新索引点球
isset
/array_key_exists
is much faster than in_array
and array_search
+
(union) is a bit faster than array_merge
(and looks nicer). But it does work differently so keep that in mind.
shuffle
is on the same Big-O tier as array_rand
array_pop
/array_push
is faster than array_shift
/array_unshift
due to re-index penalty
查找
array_key_exists
O(N),但真正接近O(1) - 这是因为在碰撞中的线性投票,但由于冲突的机会非常小,系数也非常小。我觉得你对待哈希查找为O(1),得到一个更现实的大O。例如正之间= 1000和N =不同100000大约只有50%有所放缓。
array_key_exists
O(n) but really close to O(1) - this is because of linear polling in collisions, but because the chance of collisions is very small, the coefficient is also very small. I find you treat hash lookups as O(1) to give a more realistic big-O. For example the different between N=1000 and N=100000 is only about 50% slow down.
使用isset($数组[$指数])
O(N),但真正接近O(1) - 它采用了相同的查找为array_key_exists。由于它的语言结构,将缓存查找,如果关键是硬codeD,从而加快的情况下相同的密钥被重复使用。
isset( $array[$index] )
O(n) but really close to O(1) - it uses the same lookup as array_key_exists. Since it's language construct, will cache the lookup if the key is hardcoded, resulting in speed up in cases where the same key is used repeatedly.
in_array
O(N) - 这是因为它虽然阵列线性搜索,直到找到值
in_array
O(n) - this is because it does a linear search though the array until it finds the value.
array_search
O(N) - 它采用了相同的核心功能in_array但返回值
array_search
O(n) - it uses the same core function as in_array but returns value.
队列功能
array_push
O(Σvar_i,对于所有的i)
array_push
O(∑ var_i, for all i)
array_pop
O(1)
array_shift
O(N) - 它必须重新索引所有的按键
array_shift
O(n) - it has to reindex all the keys
array_unshift
O(N +Σvar_i,对所有的i) - 它必须重新索引所有的按键
array_unshift
O(n + ∑ var_i, for all i) - it has to reindex all the keys
阵交集,减
array_intersect_key
如果路口100%做O(MAX(param_i_size)*Σparam_i_count,对于所有的i),如果路口0%相交O(Σparam_i_size,对所有的i )
array_intersect_key
if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)
和array_intersect
如果路口100%做到为O(n ^ 2 *Σparam_i_count,对于所有的i),如果路口0%相交为O(n ^ 2)
array_intersect
if intersection 100% do O(n^2*∑param_i_count, for all i), if intersection 0% intersect O(n^2)
array_intersect_assoc
如果路口100%做O(MAX(param_i_size)*Σparam_i_count,对于所有的i),如果路口0%相交O(Σparam_i_size,对所有的i )
array_intersect_assoc
if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)
和array_diff
0(πparam_i_size,对所有的i) - 这是所有的param_sizes产品
array_diff
O(π param_i_size, for all i) - That's product of all the param_sizes
array_diff_key
O(Σparam_i_size,其中i = 1!) - 这是因为我们并不需要遍历第一阵列
array_diff_key
O(∑ param_i_size, for i != 1) - this is because we don't need to iterate over the first array.
array_merge
O(Σarray_i,I = 1!) - 并不需要遍历第一阵列
array_merge
O( ∑ array_i, i != 1 ) - doesn't need to iterate over the first array
+
(工会)O(n),其中n是第2个数组的大小(即array_first + array_second) - 比array_merge,因为它不具有较少的开销重新编号
+
(union) O(n), where n is size of the 2nd array (ie array_first + array_second) - less overhead than array_merge since it doesn't have to renumber
array_replace
O(Σarray_i,对于所有的i)
array_replace
O( ∑ array_i, for all i )
随机
洗牌
O(N)
array_rand
O(n)的 - 需要一个线性的民意调查
array_rand
O(n) - Requires a linear poll.
明显的大O
array_fill
O(N)
array_fill_keys
O(N)
范围
O(N)
array_splice
0(偏移+长度)
array_slice
0(偏移+长度)或O(N),如果长度= NULL
array_slice
O(offset + length) or O(n) if length = NULL
array_keys
O(N)
array_values
O(N)
array_reverse
O(N)
array_pad
O(pad_size)
array_pad
O(pad_size)
array_flip
O(N)
array_sum
O(N)
array_product
O(N)
array_reduce
O(N)
array_filter
O(N)
array_map
O(N)
array_chunk
O(N)
array_combine
O(N)
我要感谢 Eureqa 以使它容易找到大O的功能。这是一个惊人的免费的程序,可以找到最适合的功能,任意数据。
I'd like to thank Eureqa for making it easy to find the Big-O of the functions. It's an amazing free program that can find the best fitting function for arbitrary data.
编辑:
对于那些谁怀疑PHP数组的查找 O(N)
,我写了一个基准测试(他们仍然有效 O(1)
对于大多数实际值)。
For those who doubt that PHP array lookups are O(N)
, I've written a benchmark to test that (they are still effectively O(1)
for most realistic values).
$tests = 1000000;
$max = 5000001;
for( $i = 1; $i <= $max; $i += 10000 ) {
//create lookup array
$array = array_fill( 0, $i, NULL );
//build test indexes
$test_indexes = array();
for( $j = 0; $j < $tests; $j++ ) {
$test_indexes[] = rand( 0, $i-1 );
}
//benchmark array lookups
$start = microtime( TRUE );
foreach( $test_indexes as $test_index ) {
$value = $array[ $test_index ];
unset( $value );
}
$stop = microtime( TRUE );
unset( $array, $test_indexes, $test_index );
printf( "%d,%1.15f\n", $i, $stop - $start ); //time per 1mil lookups
unset( $stop, $start );
}