红宝石最短常见的超弦红宝石、最短、常见

2023-09-11 07:23:45 作者:昨日悲喜

最短共同超弦问题的目标是找到最短 包含每个字符串中的一组给定的子字符串的可能。据我所知,这个问题是NP完全。但有一些问题近似策略。 鉴于短字符串,例如

  ABRAC
ACADA
ADABR
DABRA
RACAD
 

如何实现在最短的共同超弦理论的问题,使得上面的给出的字符串输出胡言乱语? 另一个例子

 鉴于

fegiach
bfgiak
hfdegi
iakhfd
fgiakhg
 

字符串 bfgiakhfdegiach 是一个长度为15的一个可能的解决方案。

我想实现这个在Ruby中,虽然我没有一个在算法的深入研究,但我的工作,以改善这一点。

揭秘 红宝石 常见仿品及处理手段

一个幼稚的贪婪的实施将涉及创建一个后缀阵列的每个子

 高清suffix_tree(串)
  大小= string.length减
  后缀= Array.new(大小)
  size.times做|我|
    后缀[I] = string.slice(一,大小)
  结束
  后缀
结束

#store后缀哈希
#键是一个片段,值=后缀
高清hash_of_suffixes(片段)
  suffixes_hash = Hash.new
  fragments.each做|断枝|
    suffixes_hash [#{}的frag] = suffix_tree(FRAG)
  结束
  suffixes_hash
结束


片段= ['ABRAC','ACADA','ADABR','DABRA','RACAD']
H = hash_of_suffixes(片段)

#then搜索每个片段中的所有后缀树和返回的数
#overlaps每个键

#store结果在图表??
#find片段的可能的排序

我将不胜感激一些帮助。
 

解决方案

请注意评论指出的问题与您的例子。还要注意的是,如果有一些华而不实的方式做到这一点,我不知道它是什么。我只是遍历所有的排列,smoosh在一起,并找到最短的。

 类ShortestSuperstring
  高清初始化(*字符串)
    self.strings =字符串
  结束

  高清通话
    @result || = smoosh_many strings.permutation.min_by {|排列| smoosh_many(permutation.dup).size}
  结束

  私人

  attr_accessor:字符串

  高清smoosh_many(置换,current_word ='​​')
    如果permutation.empty返回current_word?
    next_word = permutation.shift
    smoosh_many置换,smoosh_two(current_word,next_word)
  结束

  高清smoosh_two(基地,除)
    如果base.include返回基地?加成
    max_offset(基地,除).downto 0做|胶印|
      返回基地<<除了[偏移..- 1]如果addition.start_with?基地[-offset,抵消]
    结束
  结束

  高清max_offset(字符串1,字符串)
    分string1.size,string2.size
  结束

  高清分钟(N1,N2)
    N1< N2? N1:N2
  结束
结束
 

和测试套件:

 描述ShortestSuperstring做
  高清self.ss(*字符串,possible_results)
    例如#{strings.inspect}可能来自超弦#{possible_results.inspect}吗
      结果= described_class.new(*字符串).CALL
      strings.each {|字符​​串| result.should包括字符串}
      possible_results.should包括(结果),#{result.inspect}不是预期的弦。
    结束
  结束

  SS'',['']
  SS一,一个,一个,['一']
  分类的a,b的,%(重量)[AB巴]
  SS'AB','公元前',['ABC']
  SS'公元前','AB',['ABC']
  SS'ABCD','AB',['ABCD']
  SS'ABCD','公元前',['ABCD']
  SS'ABCD','CD',['ABCD']
  SS'ABCD','一个','B','C','D',['ABCD']
  SS'ABCD','一个','B','C','D','ab'开始,CD,BCD,['ABCD']

  %W [ABRAC ACADA ADABR DABRA RACAD] .permutation.each办|排列|
    SS *的置换,%(重量)[RACADABRAC ADABRACADA]
  结束

  %W [fegiach bfgiak hfdegi iakhfd fgiakhg] .permutation.each办|排列|
    SS *的置换,%(重量)[bfgiakhgiakhfdegifegiach
                        bfgiakhgfegiachiakhfdegi
                        iakhfdegibfgiakhgfegiach
                        iakhfdegibfgiakhgfegiach
                        fegiachiakhfdegibfgiakhg
                        fegiachbfgiakhgiakhfdegi
                        iakhfdegifegiachbfgiakhg]
  结束
结束
 

The goal of the shortest common superstring problem is to find the shortest possible string that contains every string in a given set as substrings. I understand that the problem is "NP complete". But there are approximation strategies for the problem. Given short strings for example

ABRAC
ACADA
ADABR
DABRA
RACAD

How to implement the shortest common superstring problem such that the output for the given strings above is ABRACADABRA? Another example

Given

fegiach
bfgiak
hfdegi
iakhfd
fgiakhg

the string bfgiakhfdegiach is a possible solution of length 15.

I would like to implement this in Ruby although I do not have an in depth study of algorithms, though am working to improve that.

A naive greedy implementation would involve creating a suffix array for each substring

def suffix_tree(string)
  size = string.length
  suffixes = Array.new(size)
  size.times do |i|
    suffixes[i] = string.slice(i, size)
  end
  suffixes
end

#store the suffixes in a hash
#key is a fragment, value = suffixes
def hash_of_suffixes(fragments)
  suffixes_hash = Hash.new
  fragments.each do |frag|
    suffixes_hash["#{frag}"]= suffix_tree(frag)
  end
  suffixes_hash
end


fragments = ['ABRAC','ACADA','ADABR','DABRA','RACAD']
h = hash_of_suffixes(fragments)

#then search each fragment in all the suffix trees and return the number of 
#overlaps for each key

#store the results in graph??
#find possible ordering of the fragments

I would be grateful with some help.

解决方案

Please note the comment pointing out the problems with your examples. Also note that if there is some slick way to do this, I don't know what it is. I just iterate over all the permutations, smoosh them together, and find the shortest.

class ShortestSuperstring
  def initialize(*strings)
    self.strings = strings
  end

  def call
    @result ||= smoosh_many strings.permutation.min_by { |permutation| smoosh_many(permutation.dup).size }
  end

  private

  attr_accessor :strings

  def smoosh_many(permutation, current_word='')
    return current_word if permutation.empty?
    next_word = permutation.shift
    smoosh_many permutation, smoosh_two(current_word, next_word)
  end

  def smoosh_two(base, addition)
    return base if base.include? addition
    max_offset(base, addition).downto 0 do |offset|
      return base << addition[offset..-1] if addition.start_with? base[-offset, offset]
    end
  end

  def max_offset(string1, string2)
    min string1.size, string2.size
  end

  def min(n1, n2)
    n1 < n2 ? n1 : n2
  end
end

And test suite:

describe ShortestSuperstring do
  def self.ss(*strings, possible_results)
    example "#{strings.inspect} can come from superstrings #{possible_results.inspect}" do
      result = described_class.new(*strings).call
      strings.each { |string| result.should include string }
      possible_results.should include(result), "#{result.inspect} was not an expected superstring."
    end
  end

  ss '', ['']
  ss "a", "a", "a", ['a']
  ss "a", "b", %w[ab ba]
  ss 'ab', 'bc', ['abc']
  ss 'bc', 'ab', ['abc']
  ss 'abcd', 'ab', ['abcd']
  ss 'abcd', 'bc', ['abcd']
  ss 'abcd', 'cd', ['abcd']
  ss 'abcd', 'a', 'b', 'c', 'd', ['abcd']
  ss 'abcd', 'a', 'b', 'c', 'd', 'ab', 'cd', 'bcd', ['abcd']

  %w[ABRAC ACADA ADABR DABRA RACAD].permutation.each do |permutation|
    ss *permutation, %w[RACADABRAC ADABRACADA]
  end

  %w[fegiach bfgiak hfdegi iakhfd fgiakhg].permutation.each do |permutation|
    ss *permutation, %w[bfgiakhgiakhfdegifegiach
                        bfgiakhgfegiachiakhfdegi
                        iakhfdegibfgiakhgfegiach
                        iakhfdegibfgiakhgfegiach
                        fegiachiakhfdegibfgiakhg
                        fegiachbfgiakhgiakhfdegi
                        iakhfdegifegiachbfgiakhg]
  end
end