Algorithm: a better way to count word frequencies in a word list

2023-09-11 04:06:14 Author: 夏沐兮╭╮

This question is actually quite simple, but I would like to hear some ideas before jumping into coding. Given a file with one word per line, find the n most frequent words.

The first, and unfortunately only, thing that pops into my mind is to use a std::map. I know fellow C++'ers will say that an unordered_map would be much more reasonable.
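As a rough sketch of that idea (the file name "words.txt" and the use of C++17 structured bindings are my own assumptions, not part of the question), the counting pass with an unordered_map could look like this:

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>

// Count word occurrences, assuming the input file has one word per line.
// "words.txt" is only a placeholder path for this sketch.
int main() {
    std::ifstream in("words.txt");
    std::unordered_map<std::string, std::size_t> counts;

    std::string word;
    while (std::getline(in, word)) {
        ++counts[word];   // amortized O(1) lookup/insert per word
    }

    for (const auto& [w, c] : counts) {
        std::cout << w << ' ' << c << '\n';
    }
}
```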

I would like to know whether anything can be added on the algorithm side, or whether this is basically a 'whoever picks the best data structure wins' type of question. I've searched the internet and read that a hash table combined with a priority queue can give an algorithm with O(n) running time, but I assume it would be too complex to implement.
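The hash-table-plus-priority-queue approach mentioned above is less daunting than it sounds. Here is one possible sketch (the function name topN is invented for illustration): it takes the count table built in the previous snippet and selects the n most frequent words with a min-heap bounded to n entries, which costs O(W log n) for W distinct words, so the whole pipeline stays near-linear for small n.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Select the n most frequent words from an existing count table.
std::vector<std::pair<std::string, std::size_t>>
topN(const std::unordered_map<std::string, std::size_t>& counts, std::size_t n) {
    using Entry = std::pair<std::size_t, std::string>;   // (count, word)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    for (const auto& [word, count] : counts) {
        heap.emplace(count, word);
        if (heap.size() > n) heap.pop();                 // evict the current minimum
    }

    std::vector<std::pair<std::string, std::size_t>> result;
    while (!heap.empty()) {
        result.emplace_back(heap.top().second, heap.top().first);
        heap.pop();
    }
    std::reverse(result.begin(), result.end());          // most frequent first
    return result;
}
```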

Any ideas?

Recommended Answer

The best data structure to use for this task is a Trie:

http://en.wikipedia.org/wiki/Trie

It will outperform a hash table for counting strings.
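For reference, a minimal counting trie could be sketched as below. This is only an illustration under the simplifying assumption that words consist of lowercase ASCII letters; a real implementation would need to handle a larger alphabet.

```cpp
#include <array>
#include <memory>
#include <string>

// Minimal counting trie, assuming words are lowercase ASCII ('a'..'z').
// Each node stores how many times the word ending at it was seen.
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> children{};
    std::size_t count = 0;
};

// Walk/extend one node per character, then bump the count at the last node.
// The work per word is proportional to its length, independent of how many
// other words are already stored.
void insert(TrieNode& root, const std::string& word) {
    TrieNode* node = &root;
    for (char ch : word) {
        auto& child = node->children[static_cast<std::size_t>(ch - 'a')];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    ++node->count;
}
```

The (word, count) pairs can then be recovered with a single depth-first traversal, and the n most frequent ones selected with the same bounded min-heap idea mentioned in the question.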