删除子串递归的事件再度发生递归、发生、事件

2023-09-11 02:28:57 作者:深藏于心i

下面一个问题:

由于字符串A和字符串B,删除子B的第一次出现在字符串中的,直到它是可以这样做。需要注意的是删除一个子,还可以创建一个新的相同的字符串。例除去hehelllloworld'一次'地狱'将产生的HelloWorld,这之后,除去一次将成为oworld,所需的字符串。

Given string A and a substring B, remove the first occurence of substring B in string A till it is possible to do so. Note that removing a substring, can further create a new same substring. Ex. removing 'hell' from 'hehelllloworld' once would yield 'helloworld' which after removing once more would become 'oworld', the desired string.

写一个程序上的长度为10 ^ 6 A,和长度为100的B输入约束。

Write a program for the above for input constraints of length 10^6 for A, and length 100 for B.

这个问题是问我在接受采访时,我给了他们一个简单的算法来解决它,这是做什么的声明是和iteratievly删除它(以decresae在头上电话),我后来才知道有一个因为它是更好的解决方案的速度更快那会是什么?我已经想到了一些优化,但它仍然不是一样快,最快的SOLN的问题(根据公司),所以任何人都可以告诉一个更快的方法我解决这个问题呢?

This question was asked to me in an interview, I gave them a simple algorithm to solve it that was to do exactly what the statement was and remove it iteratievly(to decresae over head calls), I later came to know there's a better solution for it that's much faster what would it be ? I've thought of a few optimizations but it's still not as fast as the fastest soln for the problem(acc. the company), so can anyone tell me of a faster way to solve the problem ?

PS>我知道计算器规则和具有code是更好的,但对于这个问题,我不认为有code会以任何方式benificial ...

P.S> I know of stackoverflow rules and that having code is better, but for this problem, I dont think that having code would be in any way benificial...

推荐答案

您的做法有pretty的不好的复杂性。在一个非常不好的情况下,字符串 A aaaaaaaaabbbbbbbbb 和字符串 B AB ,在这种情况下,您将需要 O(| A |)搜索,每个以 O(| A | + | B |)(使用一些先进的搜索算法假设),从而 0的总复杂(| A | ^ 2 + | A | * | B |),这与它们的约束是年

Your approach has a pretty bad complexity. In a very bad case the string a will be aaaaaaaaabbbbbbbbb, and the string b will be ab, in which case you will need O(|a|) searches, each taking O(|a| + |b|) (assuming using some sophisticated search algorithm), resulting in a total complexity of O(|a|^2 + |a| * |b|), which with their constraints is years.

有关它们的约束一个很好的复杂性,务求将 O(| A | * | B |),也就是大约100万次,将在一秒内完成。下面是接近它的一种方式。对于字符串中的每个位置 A 让我们计算的最大长度 n_i ,使得 A [1 - n_i:我] = B [0:n_i] (换句话说,一个在该位置这是 B )一preFIX。 (| A | + | B |)使用的克努特 - 莫里斯 - 普拉特的算法。

For their constraints a good complexity to aim for would be O(|a| * |b|), which is around 100 million operations, will finish in subsecond. Here's one way to approach it. For each position i in the string a let's compute the largest length n_i, such that the a[i - n_i : i] = b[0 : n_i] (in other words, the longest suffix of a at that position which is a prefix of b). We can compute it in O(|a| + |b|) by using Knuth-Morris-Pratt algorithm.

在我们 n_i 计算,发现了第一次出现b A 是在发现第一个 n_i 等于 | B | 。这将是的事件之一b 在右端 A

After we have n_i computed, finding the first occurrence of b in a is just a matter of finding the first n_i that is equal to |b|. This will be the right end of one of the occurrences of b in a.

最后,我们需要修改克努特 - 莫里斯 - 普拉特略有下降。我们将在逻辑删除出现的 B 当我们计算一个 n_i 等于 | B | 。为了说明一个事实,即有些信件是从 A 删除,我们将依靠该克努特 - 莫里斯 - 普拉特只依赖于的最后一个值的事实n_i (以及计算了 B ),和目前的信 A ,所以我们只需要检索的 n_i 的最后一个值的一个快速方法后,我们在逻辑上删除的的出现b 。这可以用双端队列,存储 n_i 的所有有效的值来实现。每个值将被一次被推入双端队列一次,并从弹出,所以,保持它的复杂性是 O(| A |),而在Knuth-的复杂性莫里斯 - 普拉特是 O(| A | + | B |),致使 O(| A | + | B |)总复杂。

Finally, we will need to modify Knuth-Morris-Pratt slightly. We will be logically removing occurrences of b as soon as we compute an n_i that is equal to |b|. To account for the fact that some letters were removed from a we will rely on the fact that Knuth-Morris-Pratt only relies on the last value of n_i (and those computed for b), and the current letter of a, so we just need a fast way of retrieving the last value of n_i after we logically remove an occurrence of b. That can be done with a deque, that stores all the valid values of n_i. Each value will be pushed into the deque once, and popped from it once, so that complexity of maintaining it is O(|a|), while the complexity of the Knuth-Morris-Pratt is O(|a| + |b|), resulting in O(|a| + |b|) total complexity.

下面是一个C ++实现。它可能有一些关闭接一个错误,但它适用于你的样品,它飞了,我描述开头的最坏情况。

Here's a C++ implementation. It could have some off-by-one errors, but it works on your sample, and it flies for the worst case that I described at the beginning.

#include <deque>
#include <string>
#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

int main() {
    string a, b;
    cin >> a >> b;

    size_t blen = b.size();

    // make a = b$a
    a = b + "$" + a;

    vector<size_t> n(a.size()); // array for knuth-morris-pratt
    vector<bool> removals(a.size()); // positions of right ends at which we remove `b`s

    deque<size_t> lastN;
    n[0] = 0;

    // For the first blen + 1 iterations just do vanilla knuth-morris-pratt
    for (size_t i = 1; i < blen + 1; ++ i) {
        size_t z = n[i - 1];
        while (z && a[i] != a[z]) {
            z = n[z - 1];
        }
        if (a[i] != a[z]) n[i] = 0;
        else n[i] = z + 1;

        lastN.push_back(n[i]);
    }

    // For the remaining iterations some characters could have been logically
    //     removed from `a`, so use lastN to get last value of n instaed
    //     of actually getting it from `n[i - 1]`
    for (size_t i = blen + 1; i < a.size(); ++ i) {
        size_t z = lastN.back();
        while (z && a[i] != a[z]) {
            z = n[z - 1];
        }
        if (a[i] != a[z]) n[i] = 0;
        else n[i] = z + 1;

        if (n[i] == blen) // found a match
        {
            removals[i] = true;

            // kill last |b| - 1 `n_i`s
            for (size_t j = 0; j < blen - 1; ++ j) {
                lastN.pop_back();
            }
        }
        else {
            lastN.push_back(n[i]);
        }
    }

    string ret;
    size_t toRemove = 0;
    for (size_t pos = a.size() - 1; a[pos] != '$'; -- pos) {
        if (removals[pos]) toRemove += blen;
        if (toRemove) -- toRemove;
        else ret.push_back(a[pos]);
    }
    reverse(ret.begin(), ret.end());

    cout << ret << endl;

    return 0;
}

[in] hehelllloworld
[in] hell
[out] oworld

[in] abababc
[in] ababc
[out] ab

[in] caaaaa ... aaaaaabbbbbb ... bbbbc
[in] ab
[out] cc