Implement Diff Utility

Implement your own diff utility. i.e given two similar strings, efficiently list out all differences between them.

 


The diff utility is a data comparison tool that calculates and displays the differences between two text. It tries to determine the smallest set of deletions and insertions to create one text from the other. Diff is line-oriented rather than character-oriented, unlike edit distance.

For example,


Input:
string X = “XMJYAUZ”;
string Y = “XMJAATZ”;

Output:
X M J -Y A -U +A +T Z

(- indicates that character is deleted from Y but it was present in X)
(+ indicates that character is inserted in Y but it was not present in X)

 


 

We can use Longest Common Subsequence (LCS) to solve this problem. The idea is to find a longest sequence of characters that is present in both original sequences in the same order. From a longest common subsequence it is only a small step to get diff-like output –

  • If an character is absent in the subsequence but present in the first original sequence, it must have been deleted (indicated by the ‘–’ marks).
  • If it is absent in the subsequence but present in the second original sequence, it must have been inserted (indicated by the ‘+’ marks).

 
C++ implementation –
 

Download   Run Code

Output:

A B C D +E F G -H +I J -Q +K +R +X +Y Z

 

The time complexity of above solution is O(mn) and auxiliary space used by the program is O(mn).

 
Exercise: Modify above solution to find differences by reading lookup table in bottom-up manner
 

References: https://en.wikipedia.org/wiki/Diff_utility

Thanks for reading.




Please use ideone or C++ Shell or any other online compiler link to post code in comments.
Like us? Please spread the word and help us grow. Happy coding 🙂