Text packing algorithm

2023-09-11 01:53:11 Author: 栀夏微凉

I bet somebody has solved this before, but my searches have come up empty.

I want to pack a list of words into a buffer, keeping track of the starting position and length of each word. The trick is that I'd like to pack the buffer efficiently by eliminating redundancy.

Example: doll dollhouse house

These can be packed into the buffer simply as dollhouse, remembering that doll is four letters starting at position 0, dollhouse is nine letters at position 0, and house is five letters at position 4.

What I've come up with so far is:

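1. Sort the words longest to shortest: (dollhouse, house, doll).
2. Scan the buffer to see whether the string already exists as a substring; if so, note its location.
3. If it doesn't exist, add it to the end of the buffer.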
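
The steps above can be sketched in Python roughly as follows. This is only an illustration of the approach described in the question; the function name `pack_longest_first` and the `(start, length)` tuples are my own choices, not part of the original post.

```python
def pack_longest_first(words):
    """Pack words into one buffer, reusing any existing substring match."""
    buffer = ""
    positions = {}  # word -> (start offset, length)
    # Longest first, so shorter words can be found inside earlier ones.
    for word in sorted(set(words), key=lambda w: (-len(w), w)):
        start = buffer.find(word)
        if start == -1:
            # Not present yet: append at the end of the buffer.
            start = len(buffer)
            buffer += word
        positions[word] = (start, len(word))
    return buffer, positions


buffer, positions = pack_longest_first(["doll", "dollhouse", "house"])
print(buffer)  # dollhouse
print(positions)
```

Each word is only ever matched whole or appended whole, so partial overlaps between the end of the buffer and the start of a new word are never exploited.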

Since long words often contain shorter words, this works pretty well, but it should be possible to do significantly better. For example, if I extend the word list to include ragdoll, then my algorithm comes up with dollhouseragdoll, which is less efficient than ragdollhouse.

This is a preprocessing step, so I'm not terribly worried about speed. O(n^2) is fine. On the other hand, my actual list has tens of thousands of words, so O(n!) is probably out of the question.

As a side note, this storage scheme is used for the data in the 'name' table of a TrueType font; cf. http://www.microsoft.com/typography/otspec/name.htm

Recommended answer

This is the shortest superstring problem: find the shortest string that contains each of a given set of strings as a substring. According to this IEEE paper (which you may not have access to, unfortunately), solving this problem exactly is NP-complete. However, heuristic solutions are available.

As a first step, you should find all strings that are substrings of other strings and delete them (of course, you still need to record their positions relative to the containing strings somehow). These fully contained strings can be found efficiently using a generalised suffix tree.
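
To make this first step concrete, here is a minimal sketch. Note that it does not use a generalised suffix tree; it swaps in a plain quadratic scan with `str.find`, which is simpler but slower. The name `drop_contained` and the shape of the returned mapping are assumptions made for illustration.

```python
def drop_contained(words):
    """Separate maximal words from words contained in a longer word.

    Returns (kept, contained), where contained maps each dropped word to
    (containing word, offset of the dropped word inside it).
    """
    kept = []
    contained = {}
    # Longest first: a word can only be contained in a strictly longer one.
    for word in sorted(set(words), key=lambda w: (-len(w), w)):
        for host in kept:
            offset = host.find(word)
            if offset != -1:
                contained[word] = (host, offset)
                break
        else:
            # No kept word contains this one, so it stays.
            kept.append(word)
    return kept, contained


kept, contained = drop_contained(["doll", "dollhouse", "house", "ragdoll"])
print(kept)  # ['dollhouse', 'ragdoll']
print(contained)
```

Once the final buffer is built, the position of a dropped word is the position of its containing word plus the recorded offset.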

Then, by repeatedly merging the two strings that have the longest overlap, you are guaranteed to produce a solution whose length is no worse than 4 times the minimum possible length. It should be possible to find overlap sizes quickly by using two radix trees, as suggested in a comment by Zifre on Konrad Rudolph's answer. Or you might be able to use the generalised suffix tree somehow.
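
A rough sketch of the greedy merge might look like the following. It assumes the input no longer contains any word that is a substring of another, and it computes overlaps by brute force rather than with radix trees, so as written it is far too slow for tens of thousands of words; it is only meant to show the idea. The names `overlap` and `greedy_superstring` are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(words):
    """Repeatedly merge the ordered pair with the largest overlap.

    Assumes no word in `words` is a substring of another word.
    """
    pieces = list(dict.fromkeys(words))  # de-duplicate, keep order
    while len(pieces) > 1:
        best = (-1, 0, 1)  # (overlap length, left index, right index)
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        # Glue the right piece onto the left one, sharing k characters.
        merged = pieces[i] + pieces[j][k:]
        pieces = [p for idx, p in enumerate(pieces) if idx not in (i, j)]
        pieces.append(merged)
    return pieces[0] if pieces else ""


buffer = greedy_superstring(["dollhouse", "ragdoll"])
print(buffer)  # ragdollhouse
```

Every original word, including the contained ones removed earlier, can then be located in the result with `buffer.find(word)`.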

I'm sorry I can't dig up a decent link for you; there doesn't seem to be a Wikipedia page or any publicly accessible information on this particular problem. It is briefly mentioned here, though no suggested solutions are provided.



                
                