Find the minimum Hamming distance in a set of strings in Python

2023-09-11 02:32:20 Author: ╰﹀ 花逝╰

I have a set of n (~1,000,000) strings (DNA sequences) stored in a list trans. I need to find the minimum Hamming distance between any two sequences in the list. I implemented a naive brute-force algorithm, which has been running for more than a day and has not finished yet. My code is:

dmin = len(trans[0])
for i in range(len(trans)):
    for j in range(i + 1, len(trans)):
        # [:-1] drops the last character of each string (e.g. a trailing newline)
        dist = hamdist(trans[i][:-1], trans[j][:-1])
        if dist < dmin:
            dmin = dist
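For a sense of scale, the nested loop compares every pair (i, j) with i < j, which is n(n-1)/2 pairs. The short Python 3 sketch below (pairs_visited is a hypothetical helper and the five sample sequences are made up) checks that formula against itertools.combinations and evaluates it for n = 1,000,000, which comes to roughly 5 × 10^11 calls to hamdist, each one scanning a full sequence.

```python
from itertools import combinations
from math import comb


def pairs_visited(n):
    """Number of (i, j) pairs with i < j that the nested loop compares."""
    return n * (n - 1) // 2


# Sanity check on a small made-up list: the formula matches the number of
# pairs itertools.combinations yields, which are the same pairs as the i/j loop
small = ["ACGT", "ACGA", "TTGA", "TTGG", "CCGA"]
assert pairs_visited(len(small)) == sum(1 for _ in combinations(small, 2))
assert pairs_visited(1_000_000) == comb(1_000_000, 2)

# About 5 * 10**11 hamdist calls for ~1,000,000 sequences
print(pairs_visited(1_000_000))  # 499999500000
```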

Is there a more efficient way to do this? Here, hamdist is a function I wrote to compute the Hamming distance between two strings:

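def hamdist(str1, str2):
    diffs = 0
    # Unequal lengths: report the longer length as the distance
    if len(str1) != len(str2):
        return max(len(str1), len(str2))
    # Count the positions where the characters differ
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
    return diffs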
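The same function can be written more compactly. This is a minimal Python 3 sketch that keeps the original's convention of returning the longer length for strings of different length; it relies on True counting as 1 inside sum(). It is a readability change with the same behavior, not a claimed speedup.

```python
def hamdist(str1, str2):
    """Hamming distance, keeping the original's rule for unequal lengths."""
    if len(str1) != len(str2):
        # Same fallback as the original: use the longer length
        return max(len(str1), len(str2))
    # Each mismatching pair contributes True (== 1) to the sum
    return sum(ch1 != ch2 for ch1, ch2 in zip(str1, str2))


print(hamdist("GATTACA", "GACTATA"))  # 2
```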

Solution

You could optimize your hamdist function by adding an optional parameter holding the minimum distance found so far. Once diffs exceeds that value, you can stop counting, because this pair cannot produce a smaller distance than the current minimum:

def hamdist(str1, str2, prevMin=None):
    diffs = 0
    if len(str1) != len(str2):
        return max(len(str1), len(str2))
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
            # Already worse than the best distance so far: give up on this pair
            if prevMin is not None and diffs > prevMin:
                return None
    return diffs
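To see how much scanning the early exit can skip on a single pair, here is a hypothetical instrumented variant, hamdist_bounded, that also reports how many positions it examined. It is a sketch that assumes equal-length strings (the length check is left out) and a mandatory bound; the name and the extra return value are illustrative, not part of the answer.

```python
def hamdist_bounded(str1, str2, prev_min):
    """Hypothetical instrumented variant of the early-exit hamdist.

    Assumes equal-length strings. Returns (distance or None, positions examined).
    """
    diffs = 0
    for pos, (ch1, ch2) in enumerate(zip(str1, str2), start=1):
        if ch1 != ch2:
            diffs += 1
            # Same rule as the answer: stop once diffs exceeds the bound
            if diffs > prev_min:
                return None, pos
    return diffs, len(str1)


# Two 1,000-character sequences that differ at every position
a, b = "A" * 1000, "T" * 1000
print(hamdist_bounded(a, b, 2))            # (None, 3)
print(hamdist_bounded("ACGT", "ACGA", 2))  # (1, 4)
```

With a bound of 2, the fully mismatching pair is abandoned after 3 of its 1,000 positions, while a pair within the bound is still scanned to the end.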

You will need to adapt your main loop to handle a None return value from hamdist:

dmin = len(trans[0])
for i in range(len(trans)):
    for j in range(i + 1, len(trans)):
        # Pass dmin so hamdist can stop early on pairs that cannot improve it
        dist = hamdist(trans[i][:-1], trans[j][:-1], dmin)
        if dist is not None and dist < dmin:
            dmin = dist
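A rough way to check that the pruning does not change the result, and to see how much work it saves on a given data set, is to run the main loop with and without the bound and count the character positions compared. This is a minimal Python 3 sketch: scan and min_hamming are hypothetical names, the sequences are assumed to be equal length with no trailing newline, and the 200 random sequences are made-up test data, so the actual saving on real DNA data may differ.

```python
import random


def scan(str1, str2, prev_min=None):
    """Hamming distance of equal-length strings plus positions examined.

    Returns (None, pos) as soon as the count exceeds prev_min, like the answer.
    """
    diffs = 0
    for pos, (ch1, ch2) in enumerate(zip(str1, str2), start=1):
        if ch1 != ch2:
            diffs += 1
            if prev_min is not None and diffs > prev_min:
                return None, pos
    return diffs, len(str1)


def min_hamming(seqs, use_bound=True):
    """Pairwise minimum as in the main loop; also totals positions examined."""
    dmin, work = len(seqs[0]), 0
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            # Passing the current dmin is what enables the early exit
            dist, examined = scan(seqs[i], seqs[j], dmin if use_bound else None)
            work += examined
            if dist is not None and dist < dmin:
                dmin = dist
    return dmin, work


# Hypothetical test data: 200 random sequences of length 100
random.seed(0)
seqs = ["".join(random.choice("ACGT") for _ in range(100)) for _ in range(200)]

best_plain, work_plain = min_hamming(seqs, use_bound=False)
best_bounded, work_bounded = min_hamming(seqs, use_bound=True)
print(best_plain, best_bounded)  # the same minimum either way
print(work_plain, work_bounded)  # positions compared without / with the bound
```

The bound only discards a pair after it is already known to be worse than the current minimum, so both runs must report the same distance; the pair count itself is unchanged, which is why the loop still grows with n².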