How do I remove diacritics (accents) from a string in .NET? [accents, string, diacritics, .NET]

2023-09-02 01:16:23 Author: 華麗旳謝幕

I'm trying to convert some strings that are in French Canadian and basically, I'd like to be able to take out the French accent marks in the letters while keeping the letter. (E.g. convert é to e, so crème brûlée would become creme brulee)

What is the best method for achieving this?

Recommended answer

I've not used this method, but Michael Kaplan describes a method for doing so in his blog post (with a confusing title) that talks about stripping diacritics: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)

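using System.Globalization;
using System.Text;

static string RemoveDiacritics(string text)
{
    // Decompose each accented character into a base character plus combining marks.
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder(normalizedString.Length);

    foreach (var c in normalizedString)
    {
        // Skip the combining marks (category Mn); keep everything else.
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    // Recompose whatever remains into canonical composed form.
    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}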

Note that this is a followup to his earlier post: Stripping diacritics....

The approach uses String.Normalize to decompose the input string into its constituent characters (basically separating the "base" characters from the combining diacritical marks), then scans the result and keeps only the base characters. It's a little complicated, but really you're looking at a complicated problem.
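
To make the decomposition step concrete, here is a small sketch that prints each UTF-16 code unit of a FormD-normalized string together with its Unicode category. The class name DecompositionDemo and the method Describe are made up for this illustration; only Normalize and CharUnicodeInfo.GetUnicodeCategory come from the answer above.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not part of the original answer.
static class DecompositionDemo
{
    // Lists every UTF-16 code unit of the FormD-normalized text
    // as "U+XXXX:Category", separated by spaces.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            sb.Append($"U+{(int)c:X4}:{CharUnicodeInfo.GetUnicodeCategory(c)} ");
        }
        return sb.ToString().TrimEnd();
    }

    static void Main()
    {
        // "\u00E9" is the precomposed letter é.
        Console.WriteLine(Describe("\u00E9"));
        // prints: U+0065:LowercaseLetter U+0301:NonSpacingMark
    }
}
```

The second entry (U+0301, COMBINING ACUTE ACCENT) is the kind of character that the NonSpacingMark check in RemoveDiacritics filters out, leaving the plain "e".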

Of course, if you're limiting yourself to French, you could probably get away with the simple table-based approach in How to remove accents and tilde in a C++ std::string, as recommended by @David Dibben.
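
For comparison, a table-based version might look roughly like the sketch below. This is not the code from the linked C++ answer; the class FrenchAccentTable, its two lookup strings, and the choice of letters are assumptions made for illustration, and the table only covers precomposed characters commonly used in French.

```csharp
using System;
using System.Text;

// Hypothetical table-based sketch; the letter list is an assumption, not an exhaustive table.
static class FrenchAccentTable
{
    // Parallel strings: Accented[i] is replaced by Plain[i].
    const string Accented = "àâäçéèêëîïôöùûüÿÀÂÄÇÉÈÊËÎÏÔÖÙÛÜŸ";
    const string Plain    = "aaaceeeeiioouuuyAAACEEEEIIOOUUUY";

    public static string Strip(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            int i = Accented.IndexOf(c); // ordinal lookup of a single char
            if (i >= 0) sb.Append(Plain[i]);
            else if (c == 'œ') sb.Append("oe"); // ligatures expand to two letters
            else if (c == 'Œ') sb.Append("OE");
            else if (c == 'æ') sb.Append("ae");
            else if (c == 'Æ') sb.Append("AE");
            else sb.Append(c); // anything not in the table passes through unchanged
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Strip("crème brûlée")); // prints: creme brulee
    }
}
```

One trade-off worth noting: a table can map ligatures such as œ to "oe", which the normalization approach leaves untouched because those characters have no canonical decomposition. On the other hand, the table silently ignores any character it does not list, as well as input that is already in decomposed form.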