最短共同超弦问题的目标是找到最短 包含每个字符串中的一组给定的子字符串的可能。据我所知,这个问题是NP完全。但有一些问题近似策略。 鉴于短字符串,例如
ABRAC
ACADA
ADABR
DABRA
RACAD
如何实现在最短的共同超弦理论的问题,使得上面的给出的字符串输出胡言乱语
?
另一个例子
鉴于
fegiach
bfgiak
hfdegi
iakhfd
fgiakhg
字符串
bfgiakhfdegiach
是一个长度为15的一个可能的解决方案。
我想实现这个在Ruby中,虽然我没有一个在算法的深入研究,但我的工作,以改善这一点。
一个幼稚的贪婪的实施将涉及创建一个后缀阵列的每个子
高清suffix_tree(串)
大小= string.length减
后缀= Array.new(大小)
size.times做|我|
后缀[I] = string.slice(一,大小)
结束
后缀
结束
#store后缀哈希
#键是一个片段,值=后缀
高清hash_of_suffixes(片段)
suffixes_hash = Hash.new
fragments.each做|断枝|
suffixes_hash [#{}的frag] = suffix_tree(FRAG)
结束
suffixes_hash
结束
片段= ['ABRAC','ACADA','ADABR','DABRA','RACAD']
H = hash_of_suffixes(片段)
#then搜索每个片段中的所有后缀树和返回的数
#overlaps每个键
#store结果在图表??
#find片段的可能的排序
我将不胜感激一些帮助。
解决方案
请注意评论指出的问题与您的例子。还要注意的是,如果有一些华而不实的方式做到这一点,我不知道它是什么。我只是遍历所有的排列,smoosh在一起,并找到最短的。
类ShortestSuperstring
高清初始化(*字符串)
self.strings =字符串
结束
高清通话
@result || = smoosh_many strings.permutation.min_by {|排列| smoosh_many(permutation.dup).size}
结束
私人
attr_accessor:字符串
高清smoosh_many(置换,current_word ='')
如果permutation.empty返回current_word?
next_word = permutation.shift
smoosh_many置换,smoosh_two(current_word,next_word)
结束
高清smoosh_two(基地,除)
如果base.include返回基地?加成
max_offset(基地,除).downto 0做|胶印|
返回基地<<除了[偏移..- 1]如果addition.start_with?基地[-offset,抵消]
结束
结束
高清max_offset(字符串1,字符串)
分string1.size,string2.size
结束
高清分钟(N1,N2)
N1< N2? N1:N2
结束
结束
和测试套件:
描述ShortestSuperstring做
高清self.ss(*字符串,possible_results)
例如#{strings.inspect}可能来自超弦#{possible_results.inspect}吗
结果= described_class.new(*字符串).CALL
strings.each {|字符串| result.should包括字符串}
possible_results.should包括(结果),#{result.inspect}不是预期的弦。
结束
结束
SS'',['']
SS一,一个,一个,['一']
分类的a,b的,%(重量)[AB巴]
SS'AB','公元前',['ABC']
SS'公元前','AB',['ABC']
SS'ABCD','AB',['ABCD']
SS'ABCD','公元前',['ABCD']
SS'ABCD','CD',['ABCD']
SS'ABCD','一个','B','C','D',['ABCD']
SS'ABCD','一个','B','C','D','ab'开始,CD,BCD,['ABCD']
%W [ABRAC ACADA ADABR DABRA RACAD] .permutation.each办|排列|
SS *的置换,%(重量)[RACADABRAC ADABRACADA]
结束
%W [fegiach bfgiak hfdegi iakhfd fgiakhg] .permutation.each办|排列|
SS *的置换,%(重量)[bfgiakhgiakhfdegifegiach
bfgiakhgfegiachiakhfdegi
iakhfdegibfgiakhgfegiach
iakhfdegibfgiakhgfegiach
fegiachiakhfdegibfgiakhg
fegiachbfgiakhgiakhfdegi
iakhfdegifegiachbfgiakhg]
结束
结束
The goal of the shortest common superstring problem is to find the shortest possible string that contains every string in a given set as substrings. I understand that the problem is "NP complete". But there are approximation strategies for the problem. Given short strings for example
ABRAC
ACADA
ADABR
DABRA
RACAD
How to implement the shortest common superstring problem such that the output for the given strings above is ABRACADABRA
?
Another example
Given
fegiach
bfgiak
hfdegi
iakhfd
fgiakhg
the string
bfgiakhfdegiach
is a possible solution of length 15.
I would like to implement this in Ruby although I do not have an in depth study of algorithms, though am working to improve that.
A naive greedy implementation would involve creating a suffix array for each substring
def suffix_tree(string)
size = string.length
suffixes = Array.new(size)
size.times do |i|
suffixes[i] = string.slice(i, size)
end
suffixes
end
#store the suffixes in a hash
#key is a fragment, value = suffixes
def hash_of_suffixes(fragments)
suffixes_hash = Hash.new
fragments.each do |frag|
suffixes_hash["#{frag}"]= suffix_tree(frag)
end
suffixes_hash
end
fragments = ['ABRAC','ACADA','ADABR','DABRA','RACAD']
h = hash_of_suffixes(fragments)
#then search each fragment in all the suffix trees and return the number of
#overlaps for each key
#store the results in graph??
#find possible ordering of the fragments
I would be grateful with some help.
解决方案
Please note the comment pointing out the problems with your examples. Also note that if there is some slick way to do this, I don't know what it is. I just iterate over all the permutations, smoosh them together, and find the shortest.
class ShortestSuperstring
def initialize(*strings)
self.strings = strings
end
def call
@result ||= smoosh_many strings.permutation.min_by { |permutation| smoosh_many(permutation.dup).size }
end
private
attr_accessor :strings
def smoosh_many(permutation, current_word='')
return current_word if permutation.empty?
next_word = permutation.shift
smoosh_many permutation, smoosh_two(current_word, next_word)
end
def smoosh_two(base, addition)
return base if base.include? addition
max_offset(base, addition).downto 0 do |offset|
return base << addition[offset..-1] if addition.start_with? base[-offset, offset]
end
end
def max_offset(string1, string2)
min string1.size, string2.size
end
def min(n1, n2)
n1 < n2 ? n1 : n2
end
end
And test suite:
describe ShortestSuperstring do
def self.ss(*strings, possible_results)
example "#{strings.inspect} can come from superstrings #{possible_results.inspect}" do
result = described_class.new(*strings).call
strings.each { |string| result.should include string }
possible_results.should include(result), "#{result.inspect} was not an expected superstring."
end
end
ss '', ['']
ss "a", "a", "a", ['a']
ss "a", "b", %w[ab ba]
ss 'ab', 'bc', ['abc']
ss 'bc', 'ab', ['abc']
ss 'abcd', 'ab', ['abcd']
ss 'abcd', 'bc', ['abcd']
ss 'abcd', 'cd', ['abcd']
ss 'abcd', 'a', 'b', 'c', 'd', ['abcd']
ss 'abcd', 'a', 'b', 'c', 'd', 'ab', 'cd', 'bcd', ['abcd']
%w[ABRAC ACADA ADABR DABRA RACAD].permutation.each do |permutation|
ss *permutation, %w[RACADABRAC ADABRACADA]
end
%w[fegiach bfgiak hfdegi iakhfd fgiakhg].permutation.each do |permutation|
ss *permutation, %w[bfgiakhgiakhfdegifegiach
bfgiakhgfegiachiakhfdegi
iakhfdegibfgiakhgfegiach
iakhfdegibfgiakhgfegiach
fegiachiakhfdegibfgiakhg
fegiachbfgiakhgiakhfdegi
iakhfdegifegiachbfgiakhg]
end
end