Complexity of using binary search and a trie

2023-09-11 22:41:38 Author: 无味是清欢

Given a large list of alphabetically sorted words in a file, I need to write a program that, given a word x, determines whether x is in the list. Preprocessing is OK, since I will be calling this function many times with different inputs. Priorities: 1. speed, 2. memory.

I already know I can use the following (n is the number of words, m is the average word length): 1. a trie: time is O(log(n)), space (best case) is O(log(n*m)), space (worst case) is O(n*m); 2. load the complete list into memory, then binary search: time is O(log(n)), space is O(n*m).

I'm not sure about the complexities for the trie; please correct me if they are wrong. Also, are there other good approaches?

Recommended answer

A trie lookup takes O(m) time, and binary search takes up to O(m*log(n)), since each of the O(log(n)) comparisons may have to examine up to m characters. Space is asymptotically O(n*m) for any reasonable method, though you can probably reduce it in some cases using compression. In theory the trie is somewhat better on memory, but in practice the devil is in the implementation details: the memory needed to store pointers, and potentially bad cache access.
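To make the two costs concrete, here is a minimal Python sketch of both lookups. It is only an illustration under stated assumptions: the names `contains_sorted` and `Trie` are hypothetical, the word list is a small in-memory sample rather than the file from the question, and the trie uses nested dicts, which is simple but not memory-efficient.

```python
from bisect import bisect_left


def contains_sorted(words, x):
    """Binary search over an already sorted list of words.

    O(log n) string comparisons, each costing up to O(m) characters.
    """
    i = bisect_left(words, x)
    return i < len(words) and words[i] == x


class Trie:
    """Dict-of-dicts trie; a lookup walks one node per character, so O(m)."""

    def __init__(self, words=()):
        self.root = {}
        for w in words:
            self.add(w)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        # End-of-word marker. The empty string can never be a character key,
        # because iterating over a string only yields single characters.
        node[""] = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return "" in node


# Sample data (assumed); the real list would be read from the sorted file.
words = ["apple", "apply", "banana", "band"]
trie = Trie(words)
```

With this sample, `contains_sorted(words, "apply")` and `"band" in trie` are both true, while the prefix `"ban"` is rejected by both because it is not itself a word in the list.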

There are other options for implementing a set structure: a hash set and a tree set are easy choices in most languages. I'd go for the hash set, as it is efficient and simple.
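A hash set version could look like the following sketch. The helper names `build_word_set` and `is_word` and the file name `words.txt` are placeholders, not part of the original question. Python's built-in `set` is hash-based, so a lookup hashes the word in O(m) and then takes expected constant time.

```python
def build_word_set(lines):
    """One-time preprocessing: collect the non-empty, stripped lines into a set."""
    return {line.strip() for line in lines if line.strip()}


# With a real file (the name "words.txt" is a placeholder):
#     with open("words.txt", encoding="utf-8") as f:
#         word_set = build_word_set(f)
# Here a small in-memory sample stands in for the file.
word_set = build_word_set(["apple\n", "apply\n", "banana\n", "\n"])


def is_word(x):
    """Membership test against the preprocessed set."""
    return x in word_set
```

Note that the set does not rely on the input being sorted, so the alphabetical order of the file is simply ignored here.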