The shortest common supersequence (SCS) is the problem of finding the shortest supersequence Z of given sequences X and Y such that both X and Y are subsequences of Z.

Consider the two following sequences X and Y

X: ABCBDAB

Y: BDCABA

The length of SCS is 9

SCS are ABCBDCABA, ABDCABDAB and ABDCBDABA

The SCS problem has an **optimal substructure**. That means the problem can be broken down into smaller, simple “subproblems”, which can be broken down into yet simpler subproblems, and so on, until, finally, the solution becomes trivial.

**1. Let us consider two sequences X and Y of length m and n respectively that both end in the same element. **

To find their SCS, we shorten each sequence by removing the last element, find the SCS of the shortened sequences, and to that SCS append the removed element. So we can say that

SCS(X[1..m], Y[1..n]) = SCS(X[1..m-1], Y[1..n-1]) + X[m] if X[m] = Y[n]

For example,

SCS(‘ABCBDA’, ‘BDCABA’) = SCS(‘ABCBD’, ‘BDCAB’) + ‘A’

SCS(‘ABCBDAB’, ‘BDCAB’) = SCS(‘ABCBDA’, ‘BDCA’) + ‘B’

**2. Now suppose that the two sequences do not end in the same element. **

Then the SCS of X and Y is the **shorter **of the two sequences SCS(X[1..m-1], Y[1..n]) + X[m] and SCS(X[1..m], Y[1..n-1]) + Y[n]. To understand this property, let’s consider the two following sequences

X: ABCBDAB *(n elements)*

Y: BDCABA *(m elements)*

The SCS of these two sequences either ends with a B (the last element of sequence X) or with A (the last element of sequence Y).

**Case 1:** If SCS ends with a B, then we add B from sequence X to SCS and the problem reduces to SCS(X[1..m-1], Y[1..n]) + *X[m]*.

**Case 2:** If SCS does not end with a B, that means it ends with A, then we add A from sequence Y to SCS and the problem reduces to SCS(X[1..m], Y[1..n-1]) + *Y[n]*.

For example,

SCS(‘ABCBD’, ‘BDCAB’) = minimum (SCS(‘ABCB’, ‘BDCAB’) + ‘D’,

SCS(‘ABCBD’, ‘BDCA’) + ‘B’)

Below solution finds the length of shortest common supersequence of sequences X[0..m-1] and Y[0..n-1] recursively by using optimal substructure property of SCS problem.

**C++ implementation –**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
#include <bits/stdc++.h> using namespace std; // Function to find length of shortest Common supersequence of // sequences X[0..m-1] and Y[0..n-1] int SCSLength(string X, string Y, int m, int n) { // if we have reached the end of either sequence, return // length of other sequence if (m == 0 || n == 0) return n + m; // if last character of X and Y matches if (X[m - 1] == Y[n - 1]) return SCSLength(X, Y, m - 1, n - 1) + 1; else // else if last character of X and Y don't match return min(SCSLength(X, Y, m, n - 1) + 1, SCSLength(X, Y, m - 1, n) + 1); } // main function int main() { string X = "ABCBDAB", Y = "BDCABA"; int m = X.length(), n = Y.length(); cout << "The length of shortest Common supersequence is " << SCSLength(X, Y, m, n); return 0; } |

**Output: **

The length of shortest Common supersequence is 9

The worst case time complexity of above solution is O(2^{(m+n)}) and auxiliary space used by the program is O(1). The worst case happens when there is no common character present in X and Y (i.e. SCS length is (m+n)) and each recursive call will end up in two recursive calls.

The SCS problem exhibits **overlapping subproblems**. A problem is said to have overlapping subproblems if the recursive algorithm for the problem solves the same subproblem over and over rather than always generating new subproblems.

Let us consider recursion tree for two sequences of length 6 and 8 whose SCS length is (6+8=14).

As we can see, the same sub-problems (highlighted in same color) are getting computed again and again. We know that problems having optimal substructure and overlapping subproblems can be solved by dynamic programming, in which subproblem solutions are memoized rather than computed again and again. This method is illustrated below –

**Memoized C++ implementation –**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
#include <bits/stdc++.h> using namespace std; // Function to find length of shortest Common supersequence of // sequences X[0..m-1] and Y[0..n-1] int SCSLength(string X, string Y, int m, int n, auto &lookup) { // if we have reached the end of either sequence, return // length of other sequence if (m == 0 || n == 0) return n + m; // construct a unique map key from dynamic elements of the input string key = to_string(m) + "|" + to_string(n); // if sub-problem is seen for the first time, solve it and // store its result in a map if (lookup.find(key) == lookup.end()) { // if last character of X and Y matches if (X[m - 1] == Y[n - 1]) lookup[key] = SCSLength(X, Y, m - 1, n - 1, lookup) + 1; else // else if last character of X and Y don't match lookup[key] = min (SCSLength(X, Y, m, n - 1, lookup) + 1, SCSLength(X, Y, m - 1, n, lookup) + 1); } // return the subproblem solution from the map return lookup[key]; } // main function int main() { string X = "ABCBDAB", Y = "BDCABA"; int m = X.length(), n = Y.length(); // create a map to store solutions of subproblems // we can also use array instead of map unordered_map<string, int> lookup; cout << "The length of shortest Common supersequence is " << SCSLength(X, Y, m, n, lookup); return 0; } |

**Output: **

The length of shortest Common supersequence is 9

The time complexity of above solution is O(mn) and auxiliary space used by the program is O(mn).

Above *Memo*ized version follows the top-down approach, since we first break the problem into subproblems and then calculate and store values. We can also solve this problem in bottom-up manner. In the bottom-up approach, we calculate the smaller values of SCS(i, j) first, then build larger values from them.

| j (if i == 0)

| i (if j == 0)

SCS[i][j] = | SCS[i-1][j-1] + 1 (if X[i-1] == Y[j-1])

| min (SCS[i-1][j] + 1, SCS[i][j-1] + 1) (if X[i-1] != Y[j-1])

**C++ implementation –**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 |
#include <bits/stdc++.h> using namespace std; // Function to find length of shortest Common supersequence of // sequences X[0..m-1] and Y[0..n-1] int SCSLength(string X, string Y) { int m = X.length(), n = Y.length(); // lookup table stores solution to already computed sub-problems // i.e. lookup[i][j] stores the length of SCS of substring // X[0..i-1] and Y[0..j-1] int lookup[m + 1][n + 1]; // initalize first column of the lookup table for (int i = 0; i <= m; i++) lookup[i][0] = i; // initalize first row of the lookup table for (int j = 0; j <= n; j++) lookup[0][j] = j; // fill the lookup table in bottom-up manner for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { // if current character of X and Y matches if (X[i - 1] == Y[j - 1]) lookup[i][j] = lookup[i - 1][j - 1] + 1; // else if current character of X and Y don't match else lookup[i][j] = min(lookup[i - 1][j] + 1, lookup[i][j - 1] + 1); } } // Uncomment below code to print the lookup table /*for (int i = 0; i <= m; i++) { for (int j = 0; j <= n; j++) cout << lookup[i][j] << " "; cout << endl; }*/ // SCS will be last entry in the lookup table return lookup[m][n]; } // main function int main() { string X = "ABCBDAB", Y = "BDCABA"; cout << "The length of shortest Common supersequence is " << SCSLength(X, Y); return 0; } |

**Output: **

The length of shortest Common supersequence is 9

The time complexity of above solution is O(mn) and auxiliary space used by the program is O(mn). The space complexity of above solution can be improved to O(n) as calculating SCS of a row of the SCS table requires only the solutions to the current row and the previous row.

**Exercise:**

1. Extend above solution for printing shortest Common supersequence.

2. Write space optimized code for iterative version.

**Thanks for reading.**

Please use ideone or C++ Shell or any other online compiler link to post code in comments.

Like us? Please spread the word and help us grow. Happy coding 🙂