寻找最长回文子序列使用较少的内存回文、较少、序列、最长

2023-09-11 22:49:14 作者:Love is you(爱的就是你)

我想从Cormem的介绍解决动态规划问题的算法第三版 (第405),它要求如下:

I am trying to solve a dynamic programming problem from Cormem's Introduction to Algorithms 3rd edition (pg 405) which asks the following:

一个回文是一个非空字符串结束   一些字母,上面写着相同   前进和后退。示例   回文是长度的所有字符串   1,的公民赛车 aibohphobia   (怕回文)。

A palindrome is a nonempty string over some alphabet that reads the same forward and backward. Examples of palindromes are all strings of length 1, civic, racecar, and aibohphobia (fear of palindromes).

给出一个有效的算法来找到   最长的回文这是一个   亚序列的给定输入串。   例如,给定的输入   字符,你的算法应该   返回 carac

Give an efficient algorithm to find the longest palindrome that is a subsequence of a given input string. For example, given the input character, your algorithm should return carac.

好了,我可以解决这个问题有两种方式:

Well, I could solve it in two ways:

解决方案一:

最长的回文子序列串(LPS)仅仅是本身最长公共子而它的反面。 (我已经解决,询问了最长递增子另一个相关的问题后,构建此解决方案序列)。 因为它只是一个LCS变种,它也需要O(N²)时间和O(N²)内存。

The Longest Palindrome Subsequence (LPS) of a string is simply the Longest Common Subsequence of itself and its reverse. (I've build this solution after solving another related question which asks for the Longest Increasing Subsequence of a sequence). Since it's simply a LCS variant, it also takes O(n²) time and O(n²) memory.

解决方法二:

第二种解决方案是一个比较详细阐述,同时也遵循一般的LCS模板。它来自于以下几个复发:

The second solution is a bit more elaborated, but also follows the general LCS template. It comes from the following recurrence:

lps(s[i..j]) = 
    s[i] + lps(s[i+1]..[j-1]) + s[j], if s[i] == s[j];
    max(lps(s[i+1..j]), lps(s[i..j-1])) otherwise

伪code用于计算LPS的长度如下:

The pseudocode for calculating the length of the lps is the following:

compute-lps(s, n):

    // palindromes with length 1
    for i = 1 to n:
        c[i, i] = 1
    // palindromes with length up to 2
    for i = 1 to n-1:
        c[i, i+1] = (s[i] == s[i+1]) ? 2 : 1

    // palindromes with length up to j+1
    for j = 2 to n-1:
        for i = 1 to n-i:
            if s[i] == s[i+j]:
                c[i, i+j] = 2 + c[i+1, i+j-1]
            else:
                c[i, i+j] = max( c[i+1, i+j] , c[i, i+j-1] )

它仍然需要O(N²)时间和内存,如果我想要有效的构建的脂多糖(因为我将需要对表中的所有单元格)。试析相关的问题,如LIS,它可以与方法比LCS-喜欢用更少的内存等来解决(LIS是可解为O(n)的内存),我想知道是否有可能为O(n)的内存,以解决这个问题,了。

It still takes O(n²) time and memory if I want to effectively construct the lps (because I 'll need all cells on the table). Analysing related problems, such as LIS, which can be solved with approaches other than LCS-like with less memory (LIS is solvable with O(n) memory), I was wondering if it's possible to solve it with O(n) memory, too.

LIS实现这一约束连接候选序列,但回文它很难,因为这里有什么重要的不是$ P子序列$ pvious元素,但第一个。有谁知道,如果可以做到这一点,或者是previous解决方案的内存优化?

LIS achieves this bound by linking the candidate subsequences, but with palindromes it's harder because what matters here is not the previous element in the subsequence, but the first. Does anyone know if is possible to do it, or are the previous solutions memory optimal?

推荐答案

下面是一个非常有效的内存版本。但我还没有证明它是的总是的 O(N)内存。 (使用preprocessing一步就可以比 0更好的(N 2 ) CPU,虽然为O(n < SUP> 2 )是最坏的情况。)

Here is a very memory efficient version. But I haven't demonstrated that it is always O(n) memory. (With a preprocessing step it can better than O(n2) CPU, though O(n2) is the worst case.)

开始。对于每个位置,跟踪最远的表的出点而可以产生反射长度1,2,3等的亚序列(意思是一个子序列,以我们的点左侧被反射到右边。)对于每个反射子序列我们存储指向的子序列的下一部分。

Start from the left-most position. For each position, keep track of a table of the farthest out points at which you can generate reflected subsequences of length 1, 2, 3, etc. (Meaning that a subsequence to the left of our point is reflected to the right.) For each reflected subsequence we store a pointer to the next part of the subsequence.

由于我们的工作我们的方式吧,我们从字符串到当前元素中的任何出现的位置RHS搜索,并尝试用这些比赛来提高我们previously有界限。当我们完成后,我们来看看最长镜像子,我们可以很容易地构建最佳的回文。

As we work our way right, we search from the RHS of the string to the position for any occurrences of the current element, and try to use those matches to improve the bounds we previously had. When we finish, we look at the longest mirrored subsequence and we can easily construct the best palindrome.

让我们看看本作字符

我们先从我们最好的回文是字母'C',而我们的镜像序列被达成了对(0,11)这是关闭的两端字符串。 在接下来的位置是1审议C我们在形式最好的镜像子序列(长度,结束,开始)现在 [( 0,11,0),(1,6,1)] 。 (我就离开了链表你需要生成实际找到回文。 下一步考虑 ^ h 在位置2,我们不改善边界 [(0,11,0),(1,6, 1)] 。 下一步考虑 A 在位置3,我们提高了边界,以 [(0,11,0),(1,6,1 ),(2,5,3)] 。 下一步考虑研究在位置4.完善界限为 [(0,11,0),(1,10,4 ),(2,5,3)] 。 (这是该链接的表将是有益的。 We start with our best palindrome being the letter 'c', and our mirrored subsequence being reached with the pair (0, 11) which are off the ends of the string. Next consider the 'c' at position 1. Our best mirrored subsequences in the form (length, end, start) are now [(0, 11, 0), (1, 6, 1)]. (I'll leave out the linked list you need to generate to actually find the palindrome. Next consider the h at position 2. We do not improve the bounds [(0, 11, 0), (1, 6, 1)]. Next consider the a at position 3. We improve the bounds to [(0, 11, 0), (1, 6, 1), (2, 5, 3)]. Next consider the r at position 4. We improve the bounds to [(0, 11, 0), (1, 10, 4), (2, 5, 3)]. (This is where the linked list would be useful.

通过列表的其余部分的工作,我们不改善这组界。

Working through the rest of the list we do not improve that set of bounds.

因此​​,我们风最长镜像列表是一个长度为2,而我们会按照链表(我没有在这个描述记录来发现这是 C 。因为该名单的目的是在位置(5,3)我们可翻转表,插入字符 4 ,然后追加列表以获得 carac

So we wind up with the longest mirrored list is of length 2. And we'd follow the linked list (that I didn't record in this description to find it is ac. Since the ends of that list are at positions (5, 3) we can flip the list, insert character 4, then append the list to get carac.

在一般它将所需的最大存储器是存储所有的最大的长度的镜像子序列加上存储器存储的链表所述亚序列。典型地,这将是一个非常小的存储量。

In general the maximum memory that it will require is to store all of the lengths of the maximal mirrored subsequences plus the memory to store the linked lists of said subsequences. Typically this will be a very small amount of memory.

在一个典型的内存/ CPU的权衡,你可以preprocess名单一旦时间 O(N)来生成一个为O(n )在哪里特定序列元素出现的数组大小哈希值。这可以让你扫描改善镜像子这个配对,而不必考虑整个字符串,它通常应该是一个重大的节省对CPU进行更长的字符串。

At a classic memory/CPU tradeoff you can preprocess the list once in time O(n) to generate a O(n) sized hash of arrays of where specific sequence elements appear. This can let you scan for "improve mirrored subsequence with this pairing" without having to consider the whole string, which should generally be a major saving on CPU for longer strings.