Shortest Superstring Problem


Given a list of strings where no string is substring of another, find a shortest string that contains each string in given list as a substring.

 

For example,

Input: string[] = “CATGC”, “CTAAGT”, “GCTA”, “TTCA”, “ATGCATC”

Output: The Shortest Superstring is GCTAAGTTCATGCATC

Explanation: GCTAAGTTCATGCATC is the shortest possible string such that it contains every string in input list as its substring.

GCTAAGTTCATGCATC
GCTAAGTTCATGCATC
GCTAAGTTCATGCATC
GCTAAGTTCATGCATC
GCTAAGTTCATGCATC

 


The Shortest Superstring Problem is NP-Hard. But it can be solved by taking a greedy approach. Below is the Greedy Algorithm –


input: A set of strings S

T = S
while |T| > 1 do
    Let a and b be the most overlapping strings of T
    Replace a and b with the string obtained by overlapping and b

T contains shortest superstring of S

For example,

S = T = {CATGC, CTAAGT, GCTA, TTCA, ATGCATC}
T = {CATGCATC, CTAAGT, GCTA, TTCA}
T = {CATGCATC, GCTAAGT, TTCA}
T = {TTCATGCATC, GCTAAGT}
T = {GCTAAGTTCATGCATC}

Now how to find the most overlapping strings of T?

The above greedy algorithm looks simple but real difficulty lies in how to find the most overlapping strings in given set of strings. Below is the naive algorithm that does that –


Check maximum overlap for each pair of strings str1 and str2 by
    a. checking if suffix of str1 matches with prefix of str2 by comparing
        last i characters in str1 with first i characters in str2

    b. checking if prefix of str1 matches with suffix of str2 by comparing
        first i characters in str1 with last i characters in str2

return str1 and str2 involved in maximum overlap

 
C++ implementation –
 

Download   Run Code

Output:

The Shortest Superstring is GCTAAGTTCATGCATC

 
Author: Aditya Goel

 
Thanks for reading.




Please use ideone or C++ Shell or any other online compiler link to post code in comments.
Like us? Please spread the word and help us grow. Happy coding 🙂
 





Leave a Reply

Notify of
avatar
Sort by:   newest | oldest | most voted
Aditya Goel
Member

nice article with very useful comments!!

wpDiscuz