I need to get the n smallest numbers of a list in Python. It needs to be really fast because it sits in a performance-critical part of the code and is repeated many times.
n is usually no greater than 10 and the list usually has around 20000 elements. The list is different each time I call the function, and it can't be sorted in place.
Initially, I wrote this function:
def mins(items, n):
    mins = [float('inf')] * n
    for item in items:
        for i, min in enumerate(mins):
            if item < min:
                mins.insert(i, item)
                mins.pop()
                break
    return mins
But this function can't beat a simple sorted(items)[:n], which sorts the entire list. Here is my test:
from random import randint, random
import time
test_data = [randint(10, 50) + random() for i in range(20000)]
init = time.time()
mins = mins(test_data, 8)
print 'mins(items, n):', time.time() - init
init = time.time()
mins = sorted(test_data)[:8]
print 'sorted(items)[:n]:', time.time() - init
Results:
mins(items, n): 0.0632939338684
sorted(items)[:n]: 0.0231449604034
sorted()[:n] is about three times faster. I believe this is because:
1. The insert() operation is costly because Python lists are not linked lists.
2. sorted() is an optimized C function and mine is pure Python.
Is there any way to beat sorted()[:n]? Should I use a C extension, or Pyrex or Psyco or something like that?
Thanks in advance for your answers.
Solution:
You actually want a sorted sequence of mins.
mins = items[:n]
mins.sort()
for i in items[n:]:
    if i < mins[-1]:
        mins.append(i)
        mins.sort()
        mins = mins[:n]
This runs much faster because you don't even touch mins unless the given item is provably smaller than its largest value. It takes about 1/10th the time of the original algorithm.
This ran in zero time on my Dell. I had to run it 10 times to get a measurable run time.
mins(items, n): 0.297000169754
sorted(items)[:n]: 0.109999895096
mins2(items)[:n]: 0.0309998989105
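For anyone who wants to reproduce the comparison, here is a minimal sketch of the loop above wrapped in a function and timed with timeit. The name mins2 is borrowed from the results label; its (items, n) signature, the guard for n <= 0, and the use of timeit with number=10 are my assumptions rather than part of the original measurement. It is written in Python 3 syntax, and absolute timings will differ by machine and interpreter.

```python
import random
import timeit


def mins2(items, n):
    """Return the n smallest items in ascending order (assumed wrapper)."""
    if n <= 0:
        return []
    mins = sorted(items[:n])  # copy, so the caller's list is not modified
    for item in items[n:]:
        # Only touch the small list when the item beats its current largest.
        if item < mins[-1]:
            mins.append(item)
            mins.sort()
            mins = mins[:n]
    return mins


if __name__ == "__main__":
    test_data = [random.randint(10, 50) + random.random() for _ in range(20000)]
    # Sanity check against the full sort before timing anything.
    assert mins2(test_data, 8) == sorted(test_data)[:8]
    # number=10 calls each function 10 times and reports the total seconds.
    print("sorted(items)[:n]:", timeit.timeit(lambda: sorted(test_data)[:8], number=10))
    print("mins2(items, n):", timeit.timeit(lambda: mins2(test_data, 8), number=10))
```

The re-sort inside the loop only ever touches n + 1 elements, and with random data it is reached less and less often as the scan proceeds, which is where the saving comes from.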
Using bisect.insort instead of append and sort may speed this up a hair further.
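A rough sketch of what that bisect.insort variant could look like. The function name mins_insort and the n <= 0 guard are hypothetical, added for illustration; I have not timed it, so whether it is actually faster for n around 10 is something to measure rather than assume.

```python
from bisect import insort


def mins_insort(items, n):
    """Return the n smallest items in ascending order using bisect.insort."""
    if n <= 0:
        return []
    mins = sorted(items[:n])  # copy, so the caller's list is not modified
    for item in items[n:]:
        if item < mins[-1]:
            # Binary search for the slot, then a single list insert.
            insort(mins, item)
            # Drop the largest value so the list stays at length n.
            mins.pop()
    return mins
```

This replaces the append, sort and slice with one ordered insert and one pop, so the small list is never re-sorted or copied inside the loop.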