DOTNET的SOUNDEX函数函数、DOTNET、SOUNDEX

2023-09-04 01:23:47 作者:扑通扑通少女心

我有一个数据库表中,有SQLServer的探测法的列恩codeD姓+名。 在我的C#程序,我想转换使用同音用在我的查询字符串。

I have a database table that has a column of SQLServer Soundex encoded last name + first name. In my C# program I would like to convert a string using soundex for use in my query.

有没有一个标准的字符串函数的soundex在DOTNET库或者是一个开源库,实现它(也许是作为字符串扩展方法)?

Is there either a standard string function for soundex in the dotnet library or is the an open source library that implements it (perhaps as an extension method on string)?

推荐答案

我知道这是晚了,但我还需要类似的东西(虽然没有数据库参与),和唯一的答案是不准确的(失败'Tymczak和菲斯特)。

I know this is late, but I also needed something similar (though no database involved), and the only answer isn't accurate (fails for 'Tymczak' and 'Pfister').

这就是我想出了:

class Program
{
    public static void Main(string[] args)
    {
                Assert.AreEqual(Soundex.Generate("H"), "H000");
                Assert.AreEqual(Soundex.Generate("Robert"), "R163");
                Assert.AreEqual(Soundex.Generate("Rupert"), "R163");
                Assert.AreEqual(Soundex.Generate("Rubin"), "R150");
                Assert.AreEqual(Soundex.Generate("Ashcraft"), "A261");
                Assert.AreEqual(Soundex.Generate("Ashcroft"), "A261");
                Assert.AreEqual(Soundex.Generate("Tymczak"), "T522");
                Assert.AreEqual(Soundex.Generate("Pfister"), "P236");
                Assert.AreEqual(Soundex.Generate("Gutierrez"), "G362");
                Assert.AreEqual(Soundex.Generate("Jackson"), "J250");
                Assert.AreEqual(Soundex.Generate("VanDeusen"), "V532");
                Assert.AreEqual(Soundex.Generate("Deusen"), "D250");
                Assert.AreEqual(Soundex.Generate("Sword"), "S630");
                Assert.AreEqual(Soundex.Generate("Sord"), "S630");
                Assert.AreEqual(Soundex.Generate("Log-out"), "L230");
                Assert.AreEqual(Soundex.Generate("Logout"), "L230");
                Assert.AreEqual(Soundex.Generate("123"), Soundex.Empty);
                Assert.AreEqual(Soundex.Generate(""), Soundex.Empty);
                Assert.AreEqual(Soundex.Generate(null), Soundex.Empty);
    }
}

public static class Soundex
{
    public const string Empty = "0000";

    private static readonly Regex Sanitiser = new Regex(@"[^A-Z]", RegexOptions.Compiled);
    private static readonly Regex CollapseRepeatedNumbers = new Regex(@"(\d)?\1*[WH]*\1*", RegexOptions.Compiled);
    private static readonly Regex RemoveVowelSounds = new Regex(@"[AEIOUY]", RegexOptions.Compiled);

    public static string Generate(string Phrase)
    {
        // Remove non-alphas
        Phrase = Sanitiser.Replace((Phrase ?? string.Empty).ToUpper(), string.Empty);

        // Nothing to soundex, return empty
        if (string.IsNullOrEmpty(Phrase))
            return Empty;

        // Convert consonants to numerical representation
        var Numified = Numify(Phrase);

        // Remove repeated numberics (characters of the same sound class), even if separated by H or W
        Numified = CollapseRepeatedNumbers.Replace(Numified, @"$1");

        if (Numified.Length > 0 && Numified[0] == Numify(Phrase[0]))
        {
            // Remove first numeric as first letter in same class as subsequent letters
            Numified = Numified.Substring(1);
        }

        // Remove vowels
        Numified = RemoveVowelSounds.Replace(Numified, string.Empty);

        // Concatenate, pad and trim to ensure X### format.
        return string.Format("{0}{1}", Phrase[0], Numified).PadRight(4, '0').Substring(0, 4);
    }

    private static string Numify(string Phrase)
    {
        return new string(Phrase.ToCharArray().Select(Numify).ToArray());
    }

    private static char Numify(char Character)
    {
        switch (Character)
        {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K': case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            default:
                return Character;
        }
    }
}