
2023-09-11 22:37:50 Author: 喂,伱的貞操丟了


I've been researching an efficient solution to this. I've looked into diffing engines (Google's diff-match-patch, Python's diff) and some longest common chain algorithms.


I was hoping to get your suggestions on how to solve this issue. Is there any algorithm or library in particular you would recommend?




I don't know what "longest common [[chain? substring?]]" has to do with "percent difference", especially after seeing in a comment that you expect a very small % difference between two strings that differ by one character in the middle (so their longest common substring is about one half of the strings' length).
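To make the "about one half" point concrete, here is a minimal sketch using `difflib.SequenceMatcher` from the standard library. The two example strings and the variable names are just assumptions for illustration; they differ by a single character near the middle.

```python
from difflib import SequenceMatcher

# Illustrative strings (an assumption): they differ by one character in the middle.
a = "the quick brown fox"
b = "the quick vrown fox"

# autojunk=False turns off difflib's "popular element" heuristic so the
# match is computed over every character.
matcher = SequenceMatcher(None, a, b, autojunk=False)
match = matcher.find_longest_match(0, len(a), 0, len(b))

# The longest block of characters that appears contiguously in both strings.
longest_common = a[match.a:match.a + match.size]

# Share of the longer string covered by that block, as a percentage.
share = 100.0 * match.size / max(len(a), len(b))
print(repr(longest_common), '%.2f' % share)
```

The longest common substring covers only about half of each string, so on its own it says little about how small the actual difference is.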

Ignoring the "longest common" strangeness, and defining "percent difference" as the edit distance between the strings divided by the max length (times 100 of course;-), what about:

def levenshtein_distance(first, second):
    """Find the Levenshtein distance between two strings."""
    # Make `first` the shorter string.
    if len(first) > len(second):
        first, second = second, first
    # `second` is now the longer one; if it is empty, both strings are empty.
    if len(second) == 0:
        return len(first)
    first_length = len(first) + 1
    second_length = len(second) + 1
    distance_matrix = [[0] * second_length for _ in range(first_length)]
    # First column: cost of deleting i characters from `first`.
    for i in range(first_length):
        distance_matrix[i][0] = i
    # First row: cost of inserting j characters of `second`.
    for j in range(second_length):
        distance_matrix[0][j] = j
    for i in range(1, first_length):
        for j in range(1, second_length):
            deletion = distance_matrix[i-1][j] + 1
            insertion = distance_matrix[i][j-1] + 1
            substitution = distance_matrix[i-1][j-1]
            if first[i-1] != second[j-1]:
                substitution += 1
            distance_matrix[i][j] = min(insertion, deletion, substitution)
    return distance_matrix[first_length-1][second_length-1]

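def percent_diff(first, second):
    """Edit distance as a percentage of the longer string's length."""
    longest = max(len(first), len(second))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return 100.0 * levenshtein_distance(first, second) / longest

a = "the quick brown fox"
b = "the quick vrown fox"
print('%.2f' % percent_diff(a, b))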


The Levenshtein function is from Stavros' blog. The result in this case would be 5.26 (percent difference).
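Since the question asks about efficiency, here is a hedged sketch of a lower-memory variant. It is not from the blog post: it uses the same recurrence but keeps only two rows of the matrix at a time. The names `levenshtein_two_rows` and `percent_diff_two_rows` are made up for this example.

```python
def levenshtein_two_rows(first, second):
    """Levenshtein distance keeping only two matrix rows in memory."""
    # Make `second` the shorter string so each row is as short as possible.
    if len(first) < len(second):
        first, second = second, first
    # Row 0: distance from the empty prefix of `first` to each prefix of `second`.
    previous = list(range(len(second) + 1))
    for i, ch_first in enumerate(first, start=1):
        # Column 0: distance from first[:i] to the empty string.
        current = [i]
        for j, ch_second in enumerate(second, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            # A bool adds 1 when the characters differ, 0 when they match.
            substitution = previous[j - 1] + (ch_first != ch_second)
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]


def percent_diff_two_rows(first, second):
    """Edit distance as a percentage of the longer string's length."""
    longest = max(len(first), len(second))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return 100.0 * levenshtein_two_rows(first, second) / longest


print('%.2f' % percent_diff_two_rows("the quick brown fox", "the quick vrown fox"))
```

Running time is still proportional to the product of the two lengths, but memory drops from the full matrix to two rows the length of the shorter string.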