Implement your own diff utility. i.e given two similar strings, efficiently list out all differences between them.

The diff utility is a data comparison tool that calculates and displays the differences between two text. It tries to determine the smallest set of deletions and insertions to create one text from the other. Diff is line-oriented rather than character-oriented, unlike edit distance.

Input:

string X = “XMJYAUZ”;

string Y = “XMJAATZ”;

Output:

X M J -Y A -U +A +T Z

(- indicates that character is deleted from Y but it was present in X)

(+ indicates that character is inserted in Y but it was not present in X)

We can use **Longest Common Subsequence (LCS)** to solve this problem. The idea is to find a longest sequence of characters that is present in both original sequences in the same order. From a longest common subsequence it is only a small step to get diff-like output –

- If an character is absent in the subsequence but present in the first original sequence, it must have been deleted (indicated by the ‘–’ marks).
- If it is absent in the subsequence but present in the second original sequence, it must have been inserted (indicated by the ‘+’ marks).

**C++ implementation –**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 |
#include <bits/stdc++.h> using namespace std; // define maximum possible length of string X and Y #define M 20 #define N 25 // lookup[i][j] stores the length of LCS of substring X[0..i-1], Y[0..j-1] int lookup[M][N]; // Function to display the differences between two strings void Diff(string X, string Y, int m, int n) { // if last character of X and Y matches if (m > 0 && n > 0 && X[m - 1] == Y[n - 1]) { Diff(X, Y, m - 1, n - 1); cout << " " << X[m - 1]; } // current character of Y is not present in X else if (n > 0 && (m == 0 || lookup[m][n - 1] >= lookup[m - 1][n])) { Diff(X, Y, m, n - 1); cout << " +" << Y[n - 1]; } // current character of X is not present in Y else if (m > 0 && (n == 0 || lookup[m][n - 1] < lookup[m - 1][n])) { Diff(X, Y, m - 1, n); cout << " -" << X[m - 1]; } } // Function to fill lookup table by finding the length of LCS // of substring X[0..m-1] and Y[0..n-1] void LCSLength(string X, string Y, int m, int n) { // first column of the lookup table will be all 0 for (int i = 0; i <= m; i++) lookup[i][0] = 0; // first row of the lookup table will be all 0 for (int j = 0; j <= n; j++) lookup[0][j] = 0; // fill the lookup table in bottom-up manner for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { // if current character of X and Y matches if (X[i - 1] == Y[j - 1]) lookup[i][j] = lookup[i - 1][j - 1] + 1; // else if current character of X and Y don't match else lookup[i][j] = max(lookup[i - 1][j], lookup[i][j - 1]); } } } // main function int main() { string X = "ABCDFGHJQZ"; string Y = "ABCDEFGIJKRXYZ"; int m = X.length(), n = Y.length(); // fill lookup table LCSLength(X, Y, m, n); // find difference by reading lookup table in top-down manner Diff(X, Y, m, n); return 0; } |

**Output: **

A B C D +E F G -H +I J -Q +K +R +X +Y Z

The time complexity of above solution is O(mn) and auxiliary space used by the program is O(mn).

**Exercise:** Modify above solution to find differences by reading lookup table in bottom-up manner

**References:** https://en.wikipedia.org/wiki/Diff_utility

**Thanks for reading.**

Please use ideone or C++ Shell or any other online compiler link to post code in comments.

Like us? Please spread the word and help us grow. Happy coding 🙂