要找到所有的重复子给定的字符串中有的、要找、字符串

2023-09-12 21:19:37 作者:董先生、

我recetly遇到的面试问题: 为了找到一个给定字符串为2的最小尺寸在所有的重复子串。 该算法应该是高效的。

I recetly come across an interview question : To find all the repeating substring in a given string with a minimal size of 2. The algorithm should be efficient one.

code为以上问题给出如下,但它是没有效率的。

Code for above question is given below but it isn't efficient one.

#include <iostream>
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using namespace std;

int main()
{
    typedef string::const_iterator iterator;
    string s("ABCFABHYIFAB");
    set<string> found;

    if (2 < s.size())
        for (iterator i = s.begin() + 1, j = s.end(); i != j; ++i)
            for (iterator x = s.begin(); x != i; ++x)
            {
                iterator tmp = mismatch(i, j, x).second;;
                if (tmp - x > 1)
                    found.insert(string(x, tmp));
            }

            copy(found.begin(), found.end(),ostream_iterator<string>(cout, "\n"));
}

我的问题是,是否有可以实现的时间上面的问题的任何数据结构 复杂性为O(N)?

My question is that, is there any data structure which can implement above question in time complexity of O(N)?

如果你的答案是后缀树或散列请详细说明了。

If your answer is Suffix tree or Hashing please elaborate it.

推荐答案

如果你分析输出字符串AAAAAAAAAAAAAA,则有O(N²)字符它,因此算法至少O(N²)。

If you analyze the output for the string "AAAAAAAAAAAAAA", then there are O(n²) characters in it, so the algorithm is at least O(n²).

要实现O(N²),只是建立在后缀树获得第每个后缀(指数为[1..N],[第2..N],[3..n],...,[n..n])。这不要紧,如果其中一个字符串没有自己的终端节点,算了算多久每个节点使用。

To achieve O(n²), just build the suffix tree for each suffix of s (indices [1..n], [2..n], [3..n], ..., [n..n]). It doesn't matter if one of the strings has no own end node, just count how often each node is used.

最后,遍历每个节点计数> 1,并打印其路径。

At the end, iterate over each node with count>1 and print its path.

 
精彩推荐
图片推荐