Compression of an integer sequence with random access

2023-09-11 04:46:40 Author: 痴骨ら

I have a sequence of n integers in a small range [0,k) and all the integers have the same frequency f (so the size of the sequence is n=f∗k). What I'm trying to do now is to compress this sequence while providing random access (what is the i-th integer). The time to achieve random access doesn't have to be O(1). I'm more interested in achieving high compression at the expense of higher random access times.

I haven't tried with Huffman coding since it assigns codes based on frequencies (and all my frequencies are the same). Perhaps I'm missing some simple encoding for this particular case.

Any help or pointers would be appreciated.

Thanks in advance.

PS: Already asked in cs.stackexchange, but asking here also for better coverage, sorry.
Solution

If all your integers have the same frequency, then a fair approximation to optimal compression is ceil(log2(k)) bits per integer. You can store these as a packed bit array of fixed-width fields and access any element in constant time.
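A minimal sketch of the fixed-width idea, assuming the sequence fits in memory and k >= 2. The function names `pack_fixed` and `get_fixed` are made up for illustration; they are not from any library.

```python
def pack_fixed(values, k):
    """Pack integers in [0, k) into a bytearray, ceil(log2(k)) bits each."""
    if k < 2:
        raise ValueError("k must be at least 2")
    width = (k - 1).bit_length()  # equals ceil(log2(k)) for k >= 2
    buf = bytearray((len(values) * width + 7) // 8)
    for i, v in enumerate(values):
        if not 0 <= v < k:
            raise ValueError("value out of range")
        bit = i * width
        for b in range(width):
            if (v >> b) & 1:
                buf[(bit + b) >> 3] |= 1 << ((bit + b) & 7)
    return buf, width


def get_fixed(buf, width, i):
    """Return the i-th integer; touches only `width` bits of the buffer."""
    bit = i * width
    value = 0
    for b in range(width):
        if (buf[(bit + b) >> 3] >> ((bit + b) & 7)) & 1:
            value |= 1 << b
    return value
```

For k = 3 this spends 2 bits per integer, which is the waste the next paragraph addresses.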

If k is painfully small (like 3), the above method may waste a fair amount of space.  But, you can combine a fixed number of your small integers into a base-k number, which can fit more efficiently into a fixed number of bits (you may also be able to fit the result conveniently into a standard-sized word).  In any case, you can also access this coding in constant time.
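As a rough sketch of the base-k grouping, assume 2 <= k <= 256 and pack as many digits as fit into one byte. For k = 3 that is 5 digits per byte, since 3^5 = 243 <= 256, giving 1.6 bits per integer instead of 2. The names `digits_per_byte`, `pack_base_k` and `get_base_k` are hypothetical.

```python
def digits_per_byte(k):
    """Largest g such that k**g <= 256, i.e. g base-k digits fit in a byte."""
    if not 2 <= k <= 256:
        raise ValueError("this sketch assumes 2 <= k <= 256")
    g = 1
    while k ** (g + 1) <= 256:
        g += 1
    return g


def pack_base_k(values, k):
    """Combine each group of g integers into one byte as a base-k number."""
    g = digits_per_byte(k)
    out = bytearray()
    for start in range(0, len(values), g):
        word = 0
        # Horner's rule; the first value of the group ends up as the lowest digit.
        for v in reversed(values[start:start + g]):
            word = word * k + v
        out.append(word)
    return bytes(out), g


def get_base_k(packed, k, g, i):
    """Return the i-th integer: one byte lookup plus a division and a modulo."""
    return (packed[i // g] // k ** (i % g)) % k
```

Grouping into a wider word (for example 20 ternary digits in 32 bits) follows the same pattern; a single byte just keeps the sketch short.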

If your integers don't have the same frequency, optimal compression may yield variable bit rates from different parts of your input, so the simple array access won't work.  In that case, good random-access performance would require an index structure:  break your compressed data into conveniently sized chunks, each of which can be decompressed sequentially; the access time is then bounded by the chunk size.
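One way to sketch the chunk-plus-index layout is shown below. It uses Python's `zlib` as a stand-in for whatever variable-rate coder you would actually pick, and assumes every value fits in one byte. The function names and the `chunk_size` parameter are illustrative only.

```python
import zlib


def compress_chunked(values, chunk_size):
    """Compress fixed-size groups of values independently and record offsets."""
    blob = bytearray()
    offsets = [0]  # offsets[c] is where chunk c starts inside blob
    for start in range(0, len(values), chunk_size):
        blob += zlib.compress(bytes(values[start:start + chunk_size]), 9)
        offsets.append(len(blob))
    return bytes(blob), offsets


def get_chunked(blob, offsets, chunk_size, i):
    """Decompress only the chunk holding element i, then index into it."""
    c = i // chunk_size
    raw = zlib.decompress(blob[offsets[c]:offsets[c + 1]])
    return raw[i % chunk_size]
```

Larger chunks generally compress better but make each lookup slower, which is the trade-off the question says it is willing to make.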



If the frequency of each number is exactly the same, you may be able to save some space by taking advantage of this -- but it may not be enough to be worthwhile.

The entropy of n random numbers in range [0,k) is n log2(k), which is log2(k) bits per number; this is the number of bits it takes to encode your numbers without taking advantage of the exact frequency.

The entropy of distinguishable permutations of f copies each of k elements (where n=f*k) is:

    log2( n!/(f!)^k ) = log2(n!) - k * log2(f!)

Applying Stirling's approximation (which is good here only if n and f are large), yields:

    ~ n log2(n) - n log2(e) - k ( f log2(f) - f log2(e) )
    = n log2(n) - n log2(e) - n log2(f) + n log2(e)
    = n ( log2(n) - log2(f) )
    = n log2(n/f)
    = n log2(k)

What this means is that, if n is large and k is small, you will not gain a significant amount of space by taking advantage of the exact frequency of your input.

The total error from the Stirling approximation above is O(log2(n) + k log2(f)), which is O(log2(n)/n + log2(f)/f) per number encoded.  This does mean that if your k is so large that your f is small (i.e., each distinct number only has a small number of copies), you may be able to save some space with a clever encoding.  However, the question specifies that k is, in fact, small.
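The two regimes can be checked numerically. This sketch evaluates log2(n!/(f!)^k) through `math.lgamma` and compares it with n log2(k); the helper names are made up for illustration, and the results are floating-point estimates rather than exact counts.

```python
import math


def log2_factorial(m):
    """log2(m!) computed via the log-gamma function."""
    return math.lgamma(m + 1) / math.log(2)


def exact_bits(f, k):
    """log2(n! / (f!)^k) with n = f*k: bits needed when exact counts are known."""
    return log2_factorial(f * k) - k * log2_factorial(f)


def naive_bits(f, k):
    """n * log2(k): bits needed when the exact counts are ignored."""
    return f * k * math.log2(k)


def saving_per_symbol(f, k):
    """Bits per integer saved by exploiting the exact frequencies."""
    return (naive_bits(f, k) - exact_bits(f, k)) / (f * k)


if __name__ == "__main__":
    print(saving_per_symbol(10**6, 4))   # small k, large f
    print(saving_per_symbol(1, 10**6))   # large k, f = 1 (a permutation)
```

With k = 4 and f = 10^6 the saving is a negligible fraction of a bit per integer. With f = 1 it approaches log2(e), about 1.44 bits per integer, which matches the remark about large k and small f.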



                
                