Word Break Problem | Using Trie Data Structure

Given a string and a dictionary of words, determine if string can be segmented into a space-separated sequence of one or more dictionary words.


For example,


Input:
dict[] = { “this”, “th”, “is”, “famous”, “Word”, “break”, “b”,
        “r”, “e”, “a”, “k”, “br”, “bre”, “brea”, “ak”, “problem” };

string = “Wordbreakproblem”

Output: String can be segmented

The segmented string(s) is shown below –

Word b r e a k problem
Word b r e ak problem
Word br e a k problem
Word br e ak problem
Word bre a k problem
Word bre ak problem
Word brea k problem
Word break problem

 


 

We have already discussed recursive solution of word break problem and alternate version where we actually print all sequences in previous post. In this post, we will cover iterative solution using Trie data structure that also offers better time complexity.
 

Consider the problem of breaking a string into component words. Call this string s. Let x be a prefix of s, and y be the remaining characters forming a suffix; so xy (x concatenated with y) is s. Then if we can break x and y into words recursively, we can break xy = s by merging the two sets of words. We can simplify things for ourselves by assuming that x will be a dictionary word; the problem is then to construct such x. We can do this with a trie. Since x is known to be a prefix of s, any candidate word in the dictionary must be found on the path of the trie corresponding to the first few letters of s. To do this using dynamic programming, any time we find an appropriate x, we set our “good” array to have a solution at |x|+1, where |x| is the size of the prefix x. Then we can simply check the last entry to find if the entire string can be broken up.

C++

Download   Run Code

Output:

String can be segmented

Runtime Analysis –

A naive analysis suggests an O(n^2) runtime, but notice that the second loop will break when the node is null. This must occur after k steps, where k is the deepest vertex in the trie (though it could of course occur earlier). It is not too difficult to see that with a dictionary containing a word of maximum length w, we would have k = w+1. So the complexity of the loop is actually O(w); and thus the whole function has a complexity of O(nw). The auxiliary space used is the space necessary to hold a trie, as well as the good array; O(n + sum of word lengths).

 
Author: Whitehead, Spencer Nicholas

 
Thanks for reading.




Please use ideone or C++ Shell or any other online compiler link to post code in comments.
Like us? Please spread the word and help us grow. Happy coding 🙂