博耶 - 穆尔实用的C#?穆尔、博耶

2023-09-03 08:52:01 作者:夏日倾情

博耶 - 穆尔可能是已知最快的非索引文本搜索算法。所以我实现它在C#中为我的黑带codeR 的网站。

Boyer-Moore is probably the fastest non-indexed text-search algorithm known. So I'm implementing it in C# for my Black Belt Coder website.

我有它的工作,它表明大约比 String.IndexOf预期的性能改进()。然而,当我加了 StringComparison.Ordinal 参数的IndexOf ,它开始超越我的博耶 - 穆尔的实现。有时,通过相当量的

I had it working and it showed roughly the expected performance improvements compared to String.IndexOf(). However, when I added the StringComparison.Ordinal argument to IndexOf, it started outperforming my Boyer-Moore implementation. Sometimes, by a considerable amount.

我不知道是否有人能帮助我找出原因。我明白了为什么 StringComparision.Ordinal 可能会加快速度,但它怎么可能比博耶摩尔定律更快?是不是因为的.NET平台本身的开销,也许是因为数组索引必须经过验证,以确保他们在范围内,或别的东西完全。有一些算法只是在C#.NET不实用?

I wonder if anyone can help me figure out why. I understand why StringComparision.Ordinal might speed things up, but how could it be faster than Boyer-Moore? Is it because of the the overhead of the .NET platform itself, perhaps because array indexes must be validated to ensure they're in range, or something else altogether. Are some algorithms just not practical in C#.NET?

下面是关键code。

// Base for search classes
abstract class SearchBase
{
    public const int InvalidIndex = -1;
    protected string _pattern;
    public SearchBase(string pattern) { _pattern = pattern; }
    public abstract int Search(string text, int startIndex);
    public int Search(string text) { return Search(text, 0); }
}

/// <summary>
/// A simplified Boyer-Moore implementation.
/// 
/// Note: Uses a single skip array, which uses more memory than needed and
/// may not be large enough. Will be replaced with multi-stage table.
/// </summary>
class BoyerMoore2 : SearchBase
{
    private byte[] _skipArray;

    public BoyerMoore2(string pattern)
        : base(pattern)
    {
        // TODO: To be replaced with multi-stage table
        _skipArray = new byte[0x10000];

        for (int i = 0; i < _skipArray.Length; i++)
            _skipArray[i] = (byte)_pattern.Length;
        for (int i = 0; i < _pattern.Length - 1; i++)
            _skipArray[_pattern[i]] = (byte)(_pattern.Length - i - 1);
    }

    public override int Search(string text, int startIndex)
    {
        int i = startIndex;

        // Loop while there's still room for search term
        while (i <= (text.Length - _pattern.Length))
        {
            // Look if we have a match at this position
            int j = _pattern.Length - 1;
            while (j >= 0 && _pattern[j] == text[i + j])
                j--;

            if (j < 0)
            {
                // Match found
                return i;
            }

            // Advance to next comparision
            i += Math.Max(_skipArray[text[i + j]] - _pattern.Length + 1 + j, 1);
        }
        // No match found
        return InvalidIndex;
    }
}

编辑:我已为我所有的测试$ C $对此事的http://www.blackbelt$c$cr.com/Articles/algorithms/fast-text-search-with-boyer-moore.

I've posted all my test code and conclusions on the matter at http://www.blackbeltcoder.com/Articles/algorithms/fast-text-search-with-boyer-moore.

推荐答案

根据我自己的测试,并在此提出的意见,我已经得出结论,其原因 String.IndexOf()执行这么好 StringComparision.Ordinal 是因为该方法调用非托管code,它很可能采用手工优化的汇编语言。

Based on my own tests and the comments made here, I've concluded that the reason String.IndexOf() performs so well with StringComparision.Ordinal is because the method calls into unmanaged code that likely employs hand-optimized assembly language.

我已经跑了许多不同的测试和 String.IndexOf()只是似乎比任何东西,我可以实现使用托管的C#code更快。

I've ran a number of different tests and String.IndexOf() just seems to be faster than anything I can implement using managed C# code.

如果任何人的兴趣,我已经写了一切,我已经发现了这件事,张贴博耶 - 穆尔算法的一些变化在C#中的的http://www.blackbelt$c$cr.com/Articles/algorithms/fast-text-search-with-boyer-moore.

If anyone's interested, I've written everything I've discovered about this and posted several variations of the Boyer-Moore algorithm in C# at http://www.blackbeltcoder.com/Articles/algorithms/fast-text-search-with-boyer-moore.