下面我的问题是NP完全问题,但是,我试图找到至少一个稍快字符串搜索功能或模块可能会减少一些计算时间相比,它是现在帮助。任何建议将是AP preciated。
My problem below is NP-complete, however, I'm trying to find at least a marginally faster string search function or module that might help in reducing some of the computation time compared to where it is at now. Any suggestions would be appreciated.
级联(最长的)超弦理论是:
The concatenated (longest possible) superstring is:
AGGAGTCCGCGTGAGGGAGGTGTAGTGTAGTGG
AGGAGTCCGCGTGAGGGAGGTGTAGTGTAGTGG
下面code产生最短的超弦理论中16米:
The below code produces the shortest superstring in 16m:
CCGTAGGTGGAGT
CCGTAGGTGGAGT
import itertools as it
def main():
seqs = ['AGG', 'AGT', 'CCG', 'CGT', 'GAG', 'GGA', 'GGT', 'GTA', 'GTG', 'TAG', 'TGG']
seq_perms = [''.join(perm) for perm in it.permutations(seqs)]
for i in range(0, len(''.join(seqs))):
seq_perms = [''.join(perm)[:i] for perm in it.permutations(seqs)]
for perm in seq_perms:
if all(perm.find(seq) != -1 for seq in seqs) == True:
print 'Shortest superstring containing all strings:\n{}'.format(perm)
return
if __name__ == '__main__':
main()
在更少的时间在我的系统完成任何重构将被标记解决。
Any refactoring that completes in less time on my system will be marked solved.
@InbarRose code采用19S我的机器上。到目前为止最快的。
@InbarRose code takes 19s on my machine. The quickest so far.
这应该这样做。
import itertools as it
SEQUENCES = ['AGG', 'AGT', 'CCG', 'CGT', 'GAG', 'GGA', 'GGT', 'GTA', 'GTG', 'TAG', 'TGG']
LONGEST_SUPERSTRING = ''.join(SEQUENCES)
def find_shortest_superstring():
current_shortest = LONGEST_SUPERSTRING
trim = len(current_shortest)-1
seen_prefixes = set()
for perm in it.permutations(SEQUENCES):
candidate_string = ''.join(perm)[:trim]
if candidate_string in seen_prefixes:
continue
seen_prefixes.add(candidate_string)
while is_superstring(candidate_string):
current_shortest = candidate_string
candidate_string = candidate_string[:-1]
trim = len(current_shortest)-1
return current_shortest
def is_superstring(s):
return all(seq in s for seq in SEQUENCES)
def main():
print 'Searching for shortest superstring containing all strings.'
ss = find_shortest_superstring()
print 'Found shortest superstring containing all strings:\n{}'.format(ss)
if __name__ == '__main__':
main()
在code大约需要15秒的运行,并产生以下输出:
The code takes about 15 seconds to run and produces the following output:
Searching for shortest superstring containing all strings.
Found shortest superstring containing all strings:
CCGTAGGTGGAGT