Shortest Superstring Problem

Given a list of strings where no string is substring of another, find a shortest string that contains each string in given list as a substring.


 

For example,

Input: [CATGC, CTAAGT, GCTA, TTCA, ATGCATC]

Output: The Shortest Superstring is GCTAAGTTCATGCATC

GCTAAGTTCATGCATC is the shortest possible string such that it contains every string in input list as its substring.

GCTAAGTTCATGCATC
GCTAAGTTCATGCATC
GCTAAGTTCATGCATC
GCTAAGTTCATGCATC
GCTAAGTTCATGCATC

 

 
The Shortest Superstring Problem is NP-Hard. But it can be solved by taking a greedy approach. Below is the Greedy Algorithm –


Input: A set of strings S

T = S
while |T| > 1 do
    Let a and b be the most overlapping strings of T
    Replace a and b with the string obtained by overlapping and b

T contains shortest superstring of S

 
For example,

S = T = {CATGC, CTAAGT, GCTA, TTCA, ATGCATC}
T = {CATGCATC, CTAAGT, GCTA, TTCA}
T = {CATGCATC, GCTAAGT, TTCA}
T = {TTCATGCATC, GCTAAGT}
T = {GCTAAGTTCATGCATC}

 

Now how to find the most overlapping strings of T?

The above greedy algorithm looks simple but real difficulty lies in how to find the most overlapping strings in given set of strings. Below is the naive algorithm that does that –


Check maximum overlap for each pair of strings s1 and s2 by
    a. checking if suffix of s1 matches with prefix of s2 by comparing
        last i characters in s1 with first i characters in s2

    b. checking if prefix of s1 matches with suffix of s2 by comparing
        first i characters in s1 with last i characters in s2

return s1 and s2 involved in maximum overlap

 
Below are the C++ and Java program that demonstrates it:

C++

Download   Run Code

Output:

The Shortest Superstring is GCTAAGTTCATGCATC

Java

Download   Run Code

Output:

The Shortest Superstring is GCTAAGTTCATGCATC

 
Author: Aditya Goel

 
Thanks for reading.

Please use ideone or C++ Shell or any other online compiler link to post code in comments.
Like us? Please spread the word and help us grow. Happy coding 🙂
 

Get great deals at Amazon




Leave a Reply

avatar
  Subscribe  
newest oldest most voted
Notify of
Top Coder
Guest
Top Coder

nice article with very useful comments!!

foobar
Guest
foobar

Don’t we need to reset the builder in findOverlappingPair function before appending?