The longest common substring problem is the problem of finding the longest string (or strings) that is a substring (or are substrings) of two strings.

The problem differs from the problem of finding the longest common subsequence. Unlike subsequences, substrings are required to occupy consecutive positions within the original strings.

For example, the longest common substring of the strings ‘ABABC’ and ‘BABCA’ is the string ‘BABC’, which has length 4. Other common substrings include ‘BAB’, ‘ABC’, ‘A’, ‘AB’, ‘B’, ‘BA’, ‘BC’ and ‘C’.

A naive solution would be to consider all substrings of the second string and find the longest one that is also a substring of the first string. The time complexity of this solution would be O((m+n)*m^{2}), where m is the length of the second string and n is the length of the first, as a substring search takes O(m+n) time and there are O(m^{2}) substrings of the second string. We can optimize this method by considering substrings in decreasing order of length and returning as soon as one of them occurs in the first string. However, the worst-case time complexity remains the same, for instance when the two strings have no characters in common.
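The naive approach described above can be sketched as follows. This is only an illustration: the function name `longestCommonSubstringNaive` is hypothetical, and `std::string::find` is used for brevity even though it is not guaranteed to run in linear time, so the O(m+n) search bound assumed in the analysis would need a linear-time matcher such as KMP.

```cpp
#include <string>

// Hypothetical helper: brute-force longest common substring.
// Tries every substring of Y, longest first, and returns the first
// one that also occurs in X (an empty string if there is none).
std::string longestCommonSubstringNaive(const std::string& X, const std::string& Y)
{
    const std::size_t n = Y.length();

    // candidate lengths in decreasing order, so the first hit is a longest one
    for (std::size_t len = n; len > 0; len--)
    {
        for (std::size_t start = 0; start + len <= n; start++)
        {
            const std::string candidate = Y.substr(start, len);

            // std::string::find is an assumption made for brevity;
            // it does not guarantee O(m+n) search time
            if (X.find(candidate) != std::string::npos) {
                return candidate;
            }
        }
    }

    return "";  // the strings share no character
}
```

With this sketch, `longestCommonSubstringNaive("ABABC", "BABCA")` returns `"BABC"`, and two strings with no common character yield an empty string after every candidate has been tried.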

##### Can we do better?

The idea is to find the length of the longest common suffix for every pair of prefixes of the two strings. With dynamic programming, these lengths can be computed using the relation:

$$
LCSuffix[i][j] =
\begin{cases}
LCSuffix[i-1][j-1] + 1 & \text{if } X[i-1] = Y[j-1] \\
0 & \text{otherwise}
\end{cases}
$$

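Here,

0 <= i - 1 < m, where m is the length of the string X

0 <= j - 1 < n, where n is the length of the string Y

The entries with i = 0 or j = 0 are 0, since an empty prefix shares no suffix with any other prefix.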

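For example, consider the strings ‘ABAB’ and ‘BABA’. Taking X = ‘ABAB’ (rows) and Y = ‘BABA’ (columns), the relation fills the LCSuffix table as follows:

|   |   | B | A | B | A |
|---|---|---|---|---|---|
|   | 0 | 0 | 0 | 0 | 0 |
| A | 0 | 0 | 1 | 0 | 1 |
| B | 0 | 1 | 0 | 2 | 0 |
| A | 0 | 0 | 2 | 0 | 3 |
| B | 0 | 1 | 0 | 3 | 0 |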

Finally, the length of the longest common substring is the maximum of these longest common suffix lengths over all pairs of prefixes.
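To make the relation concrete, here is a minimal sketch that builds the whole LCSuffix table and then takes its largest entry. The helper names `buildLCSuffixTable` and `maxEntry` are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: builds the full LCSuffix table for X and Y.
// table[i][j] is the length of the longest common suffix of the
// prefixes X[0..i-1] and Y[0..j-1]; row 0 and column 0 stay at 0.
std::vector<std::vector<int>> buildLCSuffixTable(const std::string& X, const std::string& Y)
{
    const std::size_t m = X.length(), n = Y.length();
    std::vector<std::vector<int>> table(m + 1, std::vector<int>(n + 1, 0));

    for (std::size_t i = 1; i <= m; i++)
    {
        for (std::size_t j = 1; j <= n; j++)
        {
            // a match extends the common suffix ending just before it;
            // a mismatch leaves the cell at its initial value 0
            if (X[i - 1] == Y[j - 1]) {
                table[i][j] = table[i - 1][j - 1] + 1;
            }
        }
    }

    return table;
}

// Hypothetical helper: the length of the longest common substring
// is the largest entry anywhere in the table.
int maxEntry(const std::vector<std::vector<int>>& table)
{
    int best = 0;
    for (const auto& row : table) {
        for (int value : row) {
            best = std::max(best, value);
        }
    }
    return best;
}
```

For ‘ABAB’ and ‘BABA’, `maxEntry(buildLCSuffixTable("ABAB", "BABA"))` evaluates to 3. That value is reached at `table[3][4]` (the common substring ‘ABA’) and at `table[4][3]` (the common substring ‘BAB’).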

The solution below finds the longest common substring of the sequences X and Y iteratively, filling the table of longest common suffix lengths bottom-up and remembering where the maximum occurs.

## C++

```cpp
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Function to find the longest common substring of sequences
// X[0..m-1] and Y[0..n-1]
string LCS(string X, string Y, int m, int n)
{
    int maxlen = 0;         // stores the max length of the common substring
    int endingIndex = m;    // stores the ending index of the common substring in X

    // lookup[i][j] stores the length of the longest common suffix of
    // the prefixes X[0..i-1] and Y[0..j-1]; all cells are initialized to 0
    vector<vector<int>> lookup(m + 1, vector<int>(n + 1, 0));

    // fill the lookup table in bottom-up manner
    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            // if current character of X and Y matches
            if (X[i - 1] == Y[j - 1])
            {
                lookup[i][j] = lookup[i - 1][j - 1] + 1;

                // update the maximum length and ending index
                if (lookup[i][j] > maxlen)
                {
                    maxlen = lookup[i][j];
                    endingIndex = i;
                }
            }
        }
    }

    // return the longest common substring having length maxlen
    return X.substr(endingIndex - maxlen, maxlen);
}

// main function
int main()
{
    string X = "ABC", Y = "BABA";
    int m = X.length(), n = Y.length();

    // Find the longest common substring
    cout << "The Longest common substring is " << LCS(X, Y, m, n);

    return 0;
}
```

**Output:**

The Longest common substring is AB

The time complexity of the above solution is O(m*n) and the auxiliary space used by the program is O(m*n), where m and n are the lengths of X and Y. The space complexity can be improved to O(n), as filling a row of the lookup table requires only the current row and the previous row. We can also store only the non-zero values of each row, using hash tables instead of arrays.
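As a rough illustration of the two-row idea, the sketch below keeps only the previous and the current row of the table and returns just the length of the longest common substring. The function name `longestCommonSubstringLength` is hypothetical; recovering the substring itself would additionally require tracking the ending index, as the full-table version does.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: length of the longest common substring of X and Y
// using O(n) auxiliary space (two rows of n + 1 cells each).
int longestCommonSubstringLength(const std::string& X, const std::string& Y)
{
    const std::size_t m = X.length(), n = Y.length();

    // prev holds row i - 1 of the table, curr holds row i
    std::vector<int> prev(n + 1, 0), curr(n + 1, 0);
    int maxlen = 0;

    for (std::size_t i = 1; i <= m; i++)
    {
        for (std::size_t j = 1; j <= n; j++)
        {
            if (X[i - 1] == Y[j - 1])
            {
                curr[j] = prev[j - 1] + 1;
                if (curr[j] > maxlen) {
                    maxlen = curr[j];
                }
            }
            else
            {
                // the buffer is reused, so stale values must be cleared explicitly
                curr[j] = 0;
            }
        }

        // the current row becomes the previous row for the next iteration
        std::swap(prev, curr);
    }

    return maxlen;
}
```

Note the explicit reset on a mismatch: with a full table every cell starts at 0, but here each row buffer still holds values from two iterations earlier.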

We can also solve this problem in O(m + n) time by using a generalized suffix tree. The suffix tree approach will be discussed in a separate post.

**Exercise:** Write space optimized code for iterative version.

**References:** https://en.wikipedia.org/wiki/Longest_common_substring_problem

## Comments

The post states:

“For example, the longest common substring of the strings ‘ABABC’, ‘BABCA’ is string ‘ABC’ having length 3.”

but the actual length is 4:

‘BABC’ occurs in both ‘ABABC’ and ‘BABCA’.

Good catch. Thanks for the correction. We have updated the example. Please let us know if you find any more issues. Happy coding!

If you have two strings such as “DABCD” and “BABCA”, for which the substring would be “ABC”, your program prints out “AB” because it doesn’t keep track of the beginning of the substring.

Could you please run the code? It gives the correct output:

https://ideone.com/dtjGkc