Implement your own diff utility. i.e given two similar strings, efficiently list out all differences between them.

The diff utility is a data comparison tool that calculates and displays the differences between two text. It tries to determine the smallest set of deletions and insertions to create one text from the other. Diff is line-oriented rather than character-oriented, unlike edit distance.

**Input:**

string X = “XMJYAUZ”;

string Y = “XMJAATZ”;

**Output:**

X M J -Y A -U +A +T Z

(- indicates that character is deleted from Y but it was present in X)

(+ indicates that character is inserted in Y but it was not present in X)

We can use Longest Common Subsequence (LCS) to solve this problem. The idea is to find a longest sequence of characters that is present in both original sequences in the same order. From a longest common subsequence it is only a small step to get diff-like output –

- If an character is absent in the subsequence but present in the first original sequence, it must have been deleted (indicated by the ‘–’ marks).

- If it is absent in the subsequence but present in the second original sequence, it must have been inserted (indicated by the ‘+’ marks).

Below is C++ and Java implementation of the idea:

## C++

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 |
#include <iostream> #include <string> using namespace std; // define maximum possible length of string X and Y #define M 20 #define N 25 // lookup[i][j] stores the length of LCS of substring X[0..i-1], Y[0..j-1] int lookup[M][N]; // Function to display the differences between two strings void diff(string X, string Y, int m, int n) { // if last character of X and Y matches if (m > 0 && n > 0 && X[m - 1] == Y[n - 1]) { diff(X, Y, m - 1, n - 1); cout << " " << X[m - 1]; } // current character of Y is not present in X else if (n > 0 && (m == 0 || lookup[m][n - 1] >= lookup[m - 1][n])) { diff(X, Y, m, n - 1); cout << " +" << Y[n - 1]; } // current character of X is not present in Y else if (m > 0 && (n == 0 || lookup[m][n - 1] < lookup[m - 1][n])) { diff(X, Y, m - 1, n); cout << " -" << X[m - 1]; } } // Function to fill lookup table by finding the length of LCS // of substring X[0..m-1] and Y[0..n-1] void lcs_length(string X, string Y, int m, int n) { // first column of the lookup table will be all 0 for (int i = 0; i <= m; i++) lookup[i][0] = 0; // first row of the lookup table will be all 0 for (int j = 0; j <= n; j++) lookup[0][j] = 0; // fill the lookup table in bottom-up manner for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { // if current character of X and Y matches if (X[i - 1] == Y[j - 1]) lookup[i][j] = lookup[i - 1][j - 1] + 1; // else if current character of X and Y don't match else lookup[i][j] = max(lookup[i - 1][j], lookup[i][j - 1]); } } } // Implement Diff Utility int main() { string X = "ABCDFGHJQZ"; string Y = "ABCDEFGIJKRXYZ"; int m = X.length(), n = Y.length(); // fill lookup table lcs_length(X, Y, m, n); // find difference by reading lookup table in top-down manner diff(X, Y, m, n); return 0; } |

`Output:`

A B C D +E F G -H +I J -Q +K +R +X +Y Z

## Java

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 |
class DiffUtility { // Function to display the differences between two Strings public static void diff(String X, String Y, int m, int n, int[][] lookup) { // if last character of X and Y matches if (m > 0 && n > 0 && X.charAt(m - 1) == Y.charAt(n - 1)) { diff(X, Y, m - 1, n - 1, lookup); System.out.print(" " + X.charAt(m - 1)); } // current character of Y is not present in X else if (n > 0 && (m == 0 || lookup[m][n - 1] >= lookup[m - 1][n])) { diff(X, Y, m, n - 1, lookup); System.out.print(" +" + Y.charAt(n - 1)); } // current character of X is not present in Y else if (m > 0 && (n == 0 || lookup[m][n - 1] < lookup[m - 1][n])) { diff(X, Y, m - 1, n, lookup); System.out.print(" -" + X.charAt(m - 1)); } } // Function to fill lookup table by finding the length of LCS // of substring X[0..m-1] and Y[0..n-1] public static void LCSLength(String X, String Y, int m, int n, int[][] lookup) { // first column of the lookup table will be all 0 for (int i = 0; i <= m; i++) { lookup[i][0] = 0; } // first row of the lookup table will be all 0 for (int j = 0; j <= n; j++) { lookup[0][j] = 0; } // fill the lookup table in bottom-up manner for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { // if current character of X and Y matches if (X.charAt(i - 1) == Y.charAt(j - 1)) { lookup[i][j] = lookup[i - 1][j - 1] + 1; } // else if current character of X and Y don't match else { lookup[i][j] = Integer.max(lookup[i - 1][j], lookup[i][j - 1]); } } } } // main function public static void main(String[] args) { String X = "ABCDFGHJQZ"; String Y = "ABCDEFGIJKRXYZ"; // lookup[i][j] stores the length of LCS of substring // X[0..i-1], Y[0..j-1] int[][] lookup = new int[X.length() + 1][Y.length() + 1]; // fill lookup table LCSLength(X, Y, X.length(), Y.length(), lookup); // find difference diff(X, Y, X.length(), Y.length(), lookup); } } |

`Output:`

A B C D +E F G -H +I J -Q +K +R +X +Y Z

The time complexity of above solution is O(mn) and auxiliary space used by the program is O(mn).

**Exercise:** Modify above solution to find differences by reading lookup table in bottom-up manner

**References:** https://en.wikipedia.org/wiki/Diff_utility

**Thanks for reading.**

Please use our online compiler to post code in comments. To contribute, get in touch with us.

Like us? Please spread the word and help us grow. Happy coding 🙂

## Leave a Reply

This is solved using longest common subsequence