I ran into a surprising issue.
I loaded a text file in my application, and I have some logic that compares values containing µ.
I realized that even when the texts look the same, the comparison returns false.
Console.WriteLine("μ".Equals("µ")); // returns false
Console.WriteLine("µ".Equals("µ")); // returns true
In the second line, the character µ was copy-pasted, so both operands are the same character.
However, these might not be the only characters that behave like this.
Is there any way in C# to compare characters that look the same but are actually different?
In many cases, you can normalize both of the Unicode characters to a certain normalization form before comparing them, and they should then match. Of course, which normalization form you need depends on the characters themselves; just because they look alike doesn't necessarily mean they represent the same character. You also need to consider whether it's appropriate for your use case; see Jukka K. Korpela's comment.
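As a minimal sketch of that caveat, take a look-alike pair that is not from the question and is used here only as an assumption for illustration: Latin capital A (U+0041) and Cyrillic capital А (U+0410). They usually render identically, but no normalization form maps one to the other, so they still compare as different. The helper name `EqualsAfter` is hypothetical.

```csharp
using System;
using System.Text;

public static class LookalikeDemo
{
    // Hypothetical helper: normalizes both strings to the given form,
    // then compares them code unit by code unit (ordinal comparison).
    public static bool EqualsAfter(string a, string b, NormalizationForm form)
    {
        return string.Equals(a.Normalize(form), b.Normalize(form), StringComparison.Ordinal);
    }

    static void Main()
    {
        string latinA = "\u0041";     // LATIN CAPITAL LETTER A
        string cyrillicA = "\u0410";  // CYRILLIC CAPITAL LETTER A

        // Try every normalization form; each one prints False for this pair.
        foreach (NormalizationForm form in Enum.GetValues(typeof(NormalizationForm)))
        {
            Console.WriteLine($"{form}: {EqualsAfter(latinA, cyrillicA, form)}");
        }
    }
}
```

Normalization only helps when the Unicode data actually defines a decomposition between the two characters; visually confusable characters from different scripts need a different approach.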
For this particular situation, if you refer to the links in Tony's answer, you'll see that the table for U+00B5 says:
Decomposition <compat> GREEK SMALL LETTER MU (U+03BC)
This means U+00B5, the second character in your original comparison, can be decomposed to U+03BC, the first character.
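If you're not sure which code point a pasted character actually is, printing it can help. The following is a small sketch, not part of the original answer; `Describe` is a hypothetical helper name. It also shows that only the compatibility forms rewrite U+00B5, because the decomposition listed above is tagged `<compat>`.

```csharp
using System;
using System.Text;

public static class CodePointDump
{
    // Hypothetical helper: lists the code points of a string as "U+XXXX" tokens.
    public static string Describe(string s)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < s.Length)
        {
            // Combine a surrogate pair into one code point; otherwise use the char as-is.
            int cp = char.IsSurrogatePair(s, i) ? char.ConvertToUtf32(s, i) : s[i];
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(cp.ToString("X4"));
            i += cp > 0xFFFF ? 2 : 1;
        }
        return sb.ToString();
    }

    static void Main()
    {
        string micro = "\u00B5"; // MICRO SIGN

        Console.WriteLine(Describe(micro));                                      // U+00B5
        Console.WriteLine(Describe(micro.Normalize(NormalizationForm.FormKD)));  // U+03BC
        Console.WriteLine(Describe(micro.Normalize(NormalizationForm.FormD)));   // U+00B5 (canonical form leaves it alone)
    }
}
```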
So you'll normalize the characters using full compatibility decomposition, with normalization form KC or KD. Here's a quick example I wrote up to demonstrate:
    using System;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            char first = 'μ';   // U+03BC GREEK SMALL LETTER MU
            char second = 'µ';  // U+00B5 MICRO SIGN

            // Technically you only need to normalize U+00B5 to obtain U+03BC, but
            // if you're unsure which character is which, you can safely normalize both
            string firstNormalized = first.ToString().Normalize(NormalizationForm.FormKD);
            string secondNormalized = second.ToString().Normalize(NormalizationForm.FormKD);

            Console.WriteLine(first.Equals(second));                     // False
            Console.WriteLine(firstNormalized.Equals(secondNormalized)); // True
        }
    }
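For whole strings rather than single characters, the same idea can be wrapped in a helper. This is only a sketch under my own assumptions: `CompatEquals` is a hypothetical name, and the sample strings are made up for illustration.

```csharp
using System;
using System.Text;

public static class CompatComparer
{
    // Hypothetical helper: ordinal comparison after compatibility normalization (form KC).
    public static bool CompatEquals(string a, string b)
    {
        // Treat two nulls as equal and a single null as unequal.
        if (a == null || b == null) return a == b;

        return string.Equals(
            a.Normalize(NormalizationForm.FormKC),
            b.Normalize(NormalizationForm.FormKC),
            StringComparison.Ordinal);
    }

    static void Main()
    {
        Console.WriteLine(CompatEquals("10 \u00B5F", "10 \u03BCF")); // True: micro sign vs. Greek mu
        Console.WriteLine(CompatEquals("x\u00B2", "x2"));            // True: superscript two folds to '2'
        Console.WriteLine(CompatEquals("\uFB01le", "file"));         // True: the "fi" ligature expands
    }
}
```

The last two lines show why the choice matters: compatibility normalization also erases distinctions such as superscripts and ligatures, which may or may not be what your comparison should ignore.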
For details on Unicode normalization and the different normalization forms, refer to System.Text.NormalizationForm and the Unicode spec.