Remove specified characters from a string: an efficient approach (time and space complexity)

2023-09-11 06:44:29 Author: 唯ぃ愛

Here is the problem: Remove specified characters from a given string.

Input: The string is "Hello World!" and characters to be deleted are "lor"
Output: "He Wd!"

Solving this involves two sub-parts:

1. Determine whether a given character is to be deleted.
2. If so, delete the character.

To solve the first part, I am reading the characters to be deleted into a std::unordered_map, i.e. I parse the string "lor" and insert each character into the hashmap. Later, when I am parsing the main string, I will look into this hashmap with each character as the key and if the returned value is non-zero, then I delete the character from the string.

Question 1: Is this the best approach?

Question 2: Which would be better for this problem, std::map or std::unordered_map? Since I am not interested in ordering, I used an unordered_map. But is there a higher overhead for creating the hash table? What should one do in such situations: use a map (balanced tree) or an unordered_map (hash table)?

Now coming to the next part, i.e. deleting the characters from the string. One approach is to delete the character and shift the data from that point on, back by one position. In the worst case, where we have to delete all the characters, this would take O(n^2).
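For illustration, the first approach might look like the sketch below. The name remove_chars_by_shifting is hypothetical (it is not from the original post), and membership is tested with std::string::find to keep the example short rather than with the hash map described above.

```cpp
#include <string>

// Hypothetical helper illustrating approach 1 only.
// Each erase() shifts the whole tail of the string left by one position,
// so deleting k characters from a string of length n costs O(k * n),
// which is O(n^2) when every character is deleted.
void remove_chars_by_shifting(std::string& str, const std::string& chars)
{
    std::string::size_type i = 0;
    while (i < str.size())
    {
        if (chars.find(str[i]) != std::string::npos)
            str.erase(i, 1);   // tail moves into position i, so do not advance
        else
            ++i;               // character is kept, move on
    }
}
```

Calling it on "Hello World!" with "lor" leaves "He Wd!" in the string.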

The second approach would be to copy only the required characters to another buffer. This would involve allocating enough memory to hold the original string and copying it over character by character, leaving out the ones that are to be deleted. Although this requires additional memory, it would be an O(n) operation.
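A minimal sketch of this copying approach, assuming a hypothetical function name remove_chars_copy and, again, std::string::find as a stand-in for the membership test (so the O(n) bound holds only while the set of characters to delete is small and of fixed size):

```cpp
#include <string>

// Hypothetical helper illustrating approach 2 only.
// The input is left untouched; the kept characters go into a new string.
std::string remove_chars_copy(const std::string& str, const std::string& chars)
{
    std::string result;
    result.reserve(str.size());  // worst case: nothing is removed
    for (char c : str)
    {
        if (chars.find(c) == std::string::npos)
            result.push_back(c); // keep characters that are not in chars
    }
    return result;
}
```

Because the function takes its input by const reference and returns a new string, the caller still has the original data afterwards, at the cost of the extra buffer.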

The third approach would be to start reading and writing from the 0th position, incrementing the source pointer every time I read and the destination pointer only when I write. Since the source pointer will always be at or ahead of the destination pointer, I can write over the same buffer. This saves memory and is also an O(n) operation. This is what I am doing, calling resize at the end to remove the leftover characters.

Here is the function I wrote:

#include <string>
#include <unordered_map>
using namespace std;

// str contains the string (Hello World!)
// chars contains the characters to be deleted (lor)
void remove_chars(string& str, const string& chars)
{
    unordered_map<char, int> chars_map;

    for(string::size_type i = 0; i < chars.size(); ++i)
        chars_map[chars[i]] = 1;

    string::size_type i = 0; // source
    string::size_type j = 0; // destination
    while(i < str.size())
    {
        if(chars_map[str[i]] != 0)
            ++i;
        else
        {
            str[j] = str[i];
            ++i;
            ++j;
        }
    }

    str.resize(j);
}

Question 3: In what ways can I improve this function? Or is this the best we can do?

Thanks!

Recommended answer

Assuming that you're studying algorithms, and not interested in library solutions:

Hash tables are most valuable when the number of possible keys is large, but you only need to store a few of them. Your hash table would make sense if you were deleting specific 32-bit integers from digit sequences. But with ASCII characters, it's overkill.

Just make an array of 256 bools and set a flag for each character you want to delete. This uses only one table lookup per input character, whereas a hash map needs at least a few more instructions to compute the hash function. Space-wise, a hash map is probably no more compact once you add up all its auxiliary data.

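#include <string>
#include <vector>
using std::string;

void remove_chars(string& str, const string& chars)
{
    // set up the look-up table: one flag per possible 8-bit char value
    std::vector<bool> discard(256, false);
    for (string::size_type i = 0; i < chars.size(); ++i)
    {
        // plain char may be signed; cast so the index is never negative
        discard[static_cast<unsigned char>(chars[i])] = true;
    }

    for (string::size_type j = 0; j < str.size(); ++j)
    {
        if (discard[static_cast<unsigned char>(str[j])])
        {
            // do something, depending on your storage choice
        }
    }
}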

Regarding your storage choices: choose between options 2 and 3 depending on whether you need to preserve the input data. Option 3 is obviously the most efficient, but you don't always want an in-place procedure.
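Putting the lookup table together with option 3 could look roughly like the sketch below. The name remove_chars_in_place is hypothetical, std::array is swapped in for the vector of bools purely as a matter of taste, and the table size of 256 assumes 8-bit chars.

```cpp
#include <array>
#include <string>

// Hypothetical complete version: lookup table plus in-place compaction.
void remove_chars_in_place(std::string& str, const std::string& chars)
{
    // One flag per possible char value (assumes 8-bit char); all start false.
    std::array<bool, 256> discard{};
    for (char c : chars)
        discard[static_cast<unsigned char>(c)] = true;

    // i reads every character, j only advances when a character is kept,
    // so j <= i always holds and writing never overtakes reading.
    std::string::size_type j = 0;
    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        if (!discard[static_cast<unsigned char>(str[i])])
            str[j++] = str[i];
    }

    str.resize(j);  // drop the leftover tail
}
```

The casts to unsigned char matter on platforms where plain char is signed: without them, a character with the high bit set would produce a negative index into the table.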