A way to store a large dictionary with a low memory footprint and fast lookups (on Android)

2023-09-11 02:55:02 Author: 假朋友比真敌人更可怕。

I'm developing an Android word game app that needs a large dictionary (~250,000 words) available. I need:

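Reasonably fast lookups, preferably constant time. Sometimes I need to do perhaps 200 lookups per second to solve a word puzzle, and more often perhaps 20 lookups within 0.2 seconds to check words the user has just spelled.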

EDIT: Lookups typically ask "Is this word in the dictionary?". I'd like to support up to two wildcards in a word as well, but this is easy enough: generate every letter each wildcard could stand for and check the generated words (i.e. 26 * 26 lookups for a word with two wildcards).

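As it's a mobile application, using as little memory as possible and requiring only a small initial download for the dictionary data is the top priority.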
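The wildcard handling described in the edit above could be sketched roughly as follows. This is only an illustration: the class name `WildcardLookup`, the `?` wildcard character, and the `Predicate<String>` standing in for the dictionary lookup are all assumptions, not part of the question.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

final class WildcardLookup {
    // Assumption: '?' marks a wildcard position in the pattern.
    static final char WILDCARD = '?';

    /** Returns every dictionary word that the pattern could stand for. */
    static Set<String> matches(String pattern, Predicate<String> inDictionary) {
        Set<String> found = new HashSet<>();
        expand(pattern.toCharArray(), 0, inDictionary, found);
        return found;
    }

    // Replaces the first wildcard at or after 'from' with each of a..z and recurses;
    // once no wildcard is left, checks the completed candidate against the dictionary.
    private static void expand(char[] chars, int from,
                               Predicate<String> inDictionary, Set<String> found) {
        for (int i = from; i < chars.length; i++) {
            if (chars[i] == WILDCARD) {
                for (char c = 'a'; c <= 'z'; c++) {
                    chars[i] = c;
                    expand(chars, i + 1, inDictionary, found);
                }
                chars[i] = WILDCARD; // restore the pattern for the caller
                return;
            }
        }
        String candidate = new String(chars);
        if (inDictionary.test(candidate)) {
            found.add(candidate);
        }
    }
}
```

With two wildcards this performs 26 * 26 = 676 lookups, so the cost is dominated by how fast a single lookup is.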

My first naive attempt used Java's HashMap class, which caused an out-of-memory exception. I've looked into using the SQLite databases available on Android, but that seems like overkill.

What's a good way to do what I need?

Recommended answer

You can also achieve your goals with humbler approaches. If it's a word game, then I suspect you are handling an alphabet of about 26 letters. So assume an alphabet of no more than 32 letters, i.e. 5 bits per letter. You can then cram 12 letters (12 x 5 = 60 bits) into a single Java long using a trivial 5-bits-per-letter encoding.
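One possible way to write that encoding is sketched below. The class name `WordCodec` is hypothetical, and two details are assumptions the answer does not spell out: letters are mapped to 1..26 so that code 0 can serve as padding (otherwise "a" and "aa" would pack to the same value), and the first letter goes into the highest of the 60 bits so that numeric order of the longs matches alphabetical order of the words.

```java
final class WordCodec {
    static final int BITS_PER_LETTER = 5;
    static final int MAX_LETTERS = 12; // 12 x 5 = 60 bits, fits in a 64-bit long

    /** Packs a lowercase a-z word of 1 to 12 letters into a single long. */
    static long encode(String word) {
        if (word.isEmpty() || word.length() > MAX_LETTERS) {
            throw new IllegalArgumentException("word must have 1.." + MAX_LETTERS + " letters");
        }
        long packed = 0;
        for (int i = 0; i < MAX_LETTERS; i++) {
            long code = 0; // 0 = padding after the end of the word
            if (i < word.length()) {
                char c = word.charAt(i);
                if (c < 'a' || c > 'z') {
                    throw new IllegalArgumentException("only a-z is supported: " + word);
                }
                code = c - 'a' + 1; // 'a' -> 1 ... 'z' -> 26
            }
            packed = (packed << BITS_PER_LETTER) | code;
        }
        return packed;
    }

    /** Inverse of encode: reads 5-bit codes from the top until padding is hit. */
    static String decode(long packed) {
        StringBuilder sb = new StringBuilder();
        for (int i = MAX_LETTERS - 1; i >= 0; i--) {
            int code = (int) ((packed >>> (i * BITS_PER_LETTER)) & 0x1F);
            if (code == 0) {
                break;
            }
            sb.append((char) ('a' + code - 1));
        }
        return sb.toString();
    }
}
```

Because only 60 bits are used, every packed value is non-negative, so plain signed comparison of longs is enough to order them.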

This means that if you have no words longer than 12 letters, you can simply represent your dictionary as a set of Java longs. With 250,000 words, a trivial representation of this set as a single sorted array of longs takes 250,000 words x 8 bytes/word = 2,000,000 bytes, or about 2 MB of memory. Lookup is then by binary search, which should be very fast given the small size of the data set (fewer than 20 comparisons, since 2^20 is already above one million).
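A minimal sketch of the sorted-array lookup, using the standard `Arrays.sort` and `Arrays.binarySearch` on a `long[]`. The class name `PackedDictionary` and its helper methods are hypothetical, and the packing scheme (letters as 1..26, first letter in the high bits, 0 as padding) is one assumed way to realise the 5-bit encoding.

```java
import java.util.Arrays;
import java.util.List;

final class PackedDictionary {
    private static final int MAX_LETTERS = 12;
    private final long[] sorted; // 8 bytes per word, no per-word object overhead

    /** Assumes every word is lowercase a-z and 1 to 12 letters long. */
    PackedDictionary(List<String> words) {
        long[] packed = new long[words.size()];
        for (int i = 0; i < packed.length; i++) {
            String w = words.get(i);
            if (!isPackable(w)) {
                throw new IllegalArgumentException("cannot pack: " + w);
            }
            packed[i] = pack(w);
        }
        Arrays.sort(packed);
        this.sorted = packed;
    }

    /** Binary search: about log2(n) comparisons, roughly 18 for 250,000 words. */
    boolean contains(String word) {
        return isPackable(word) && Arrays.binarySearch(sorted, pack(word)) >= 0;
    }

    private static boolean isPackable(String word) {
        if (word.isEmpty() || word.length() > MAX_LETTERS) {
            return false;
        }
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c < 'a' || c > 'z') {
                return false;
            }
        }
        return true;
    }

    // 5 bits per letter ('a' -> 1 ... 'z' -> 26), left-aligned in 60 bits, 0 = padding.
    private static long pack(String word) {
        long packed = 0;
        for (int i = 0; i < word.length(); i++) {
            packed = (packed << 5) | (word.charAt(i) - 'a' + 1);
        }
        return packed << (5 * (MAX_LETTERS - word.length()));
    }
}
```

In a real app the sorted array would more likely be produced once at build time and shipped as a binary file, so the device only has to read the longs rather than parse and sort a word list.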

If you have words longer than 12 letters, I would store those in a separate array in which each word is represented by 2 concatenated Java longs in the obvious manner.
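The two-longs idea might look something like the sketch below, which covers words of 13 to 24 letters. Everything here is an assumed interpretation of "the obvious manner": the class name `LongWordDictionary` is hypothetical, each word occupies two adjacent slots of one flat `long[]` (first 12 letters, then the rest), and the binary search compares the first long and falls back to the second on a tie.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LongWordDictionary {
    // pairs[2*i] holds letters 0..11 of word i, pairs[2*i + 1] holds letters 12..23.
    private final long[] pairs;

    /** Assumes every word is lowercase a-z and 13 to 24 letters long. */
    LongWordDictionary(List<String> words) {
        List<String> sortedWords = new ArrayList<>(words);
        // For a-z words, String order equals the order of the packed pairs,
        // because padding (0) sorts before every letter code (1..26).
        Collections.sort(sortedWords);
        pairs = new long[2 * sortedWords.size()];
        for (int i = 0; i < sortedWords.size(); i++) {
            String w = sortedWords.get(i);
            if (!isValid(w)) {
                throw new IllegalArgumentException("cannot pack: " + w);
            }
            pairs[2 * i] = pack(w, 0);
            pairs[2 * i + 1] = pack(w, 12);
        }
    }

    boolean contains(String word) {
        if (!isValid(word)) {
            return false;
        }
        long hi = pack(word, 0);
        long lo = pack(word, 12);
        int low = 0;
        int high = pairs.length / 2 - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = Long.compare(pairs[2 * mid], hi);
            if (cmp == 0) {
                cmp = Long.compare(pairs[2 * mid + 1], lo);
            }
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }

    private static boolean isValid(String word) {
        if (word.length() <= 12 || word.length() > 24) {
            return false;
        }
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c < 'a' || c > 'z') {
                return false;
            }
        }
        return true;
    }

    // Packs the 12 letters starting at 'start', 5 bits each, 0 past the end of the word.
    private static long pack(String word, int start) {
        long packed = 0;
        for (int i = start; i < start + 12; i++) {
            long code = i < word.length() ? word.charAt(i) - 'a' + 1 : 0;
            packed = (packed << 5) | code;
        }
        return packed;
    }
}
```

A lookup would first check the word length and then consult either the single-long array or this one, so short words still cost only 8 bytes each.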

NOTE: the reason this works, is likely more space-efficient than a trie, and is at least very simple to implement, is that the dictionary is constant. Search trees are good if you need to modify the data set, but if the data set is constant, you can often get away with a simple binary search.