找到最匹配输入字符串的字符串列表字符串、列表

2023-09-04 08:56:29 作者:遗落旧时光

我有问题,找到最接近的匹配字符串的.NET实施

I have problems finding an implementation of closest match strings for .net

我想匹配字符串列表,例如:

I would like to match a list of strings, example:

输入字符串:PublicznaSzkołaPodstawowa IMBolesławaChrobregoW¯¯Wąsoszu

input string: "Publiczna Szkoła Podstawowa im. Bolesława Chrobrego w Wąsoszu"

字符串列表:

PublicznaSzkołaPodstawowa IM。 B. ChrobregoW¯¯Wąsoszu

Publiczna Szkoła Podstawowa im. B. Chrobrego w Wąsoszu

SzkołaPodstawowa Specjalna

Szkoła Podstawowa Specjalna

SzkołaPodstawowa im.Henryka SienkiewiczaW¯¯Wąsoszu

Szkoła Podstawowa im.Henryka Sienkiewicza w Wąsoszu

SzkołaPodstawowa IM。 Romualda TrauguttaW¯¯WąsoszuGórnym

Szkoła Podstawowa im. Romualda Traugutta w Wąsoszu Górnym

这显然需要与匹配PublicznaSzkołaPodstawowa IM。B. ChrobregoW¯¯Wąsoszu。

This would clearly need to be matched with "Publiczna Szkoła Podstawowa im. B. Chrobrego w Wąsoszu".

是什么算法,有可用于.NET?

What algorithms are there available for .net?

推荐答案

编辑距离

编辑距离是如何量化不同的两个字符串的方法   (例如,字)是彼此通过计数的最小数量   所需的操作变换一根弦到另一个。

Edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.

Levenshtein距离

通俗地说,这两个词之间的莱文斯坦距离最小   的单字符编辑数(即插入,缺失或   取代)需要改变一个字成其他

Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.

快速,高效存储Levenshtein算法

C#莱文斯坦

using System;

/// <summary>
/// Contains approximate string matching
/// </summary>
static class LevenshteinDistance
{
    /// <summary>
    /// Compute the distance between two strings.
    /// </summary>
    public static int Compute(string s, string t)
    {
    int n = s.Length;
    int m = t.Length;
    int[,] d = new int[n + 1, m + 1];

    // Step 1
    if (n == 0)
    {
        return m;
    }

    if (m == 0)
    {
        return n;
    }

    // Step 2
    for (int i = 0; i <= n; d[i, 0] = i++)
    {
    }

    for (int j = 0; j <= m; d[0, j] = j++)
    {
    }

    // Step 3
    for (int i = 1; i <= n; i++)
    {
        //Step 4
        for (int j = 1; j <= m; j++)
        {
        // Step 5
        int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;

        // Step 6
        d[i, j] = Math.Min(
            Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
            d[i - 1, j - 1] + cost);
        }
    }
    // Step 7
    return d[n, m];
    }
}

class Program
{
    static void Main()
    {
    Console.WriteLine(LevenshteinDistance.Compute("aunt", "ant"));
    Console.WriteLine(LevenshteinDistance.Compute("Sam", "Samantha"));
    Console.WriteLine(LevenshteinDistance.Compute("flomax", "volmax"));
    }
}