Given a string and a dictionary of words, determine if string can be segmented into a space-separated sequence of one or more dictionary words.

For example,

**Input:**

dict[] = { “this”, “th”, “is”, “famous”, “Word”, “break”, “b”,

“r”, “e”, “a”, “k”, “br”, “bre”, “brea”, “ak”, “problem” };

string = “Wordbreakproblem”

**Output:** String can be segmented

The segmented string(s) is shown below –

Word b r e a k problem

Word b r e ak problem

Word br e a k problem

Word br e ak problem

Word bre a k problem

Word bre ak problem

Word brea k problem

Word break problem

We have already discussed recursive solution of word break problem and alternate version where we actually print all sequences in previous post. In this post, we will cover iterative solution using Trie data structure that also offers better time complexity.

Consider the problem of breaking a string into component words. Call this string `s`

. Let `x`

be a prefix of `s`

, and `y`

be the remaining characters forming a suffix; so `xy`

(`x`

concatenated with `y`

) is `s`

. Then if we can break `x`

and `y`

into words recursively, we can break `xy = s`

by merging the two sets of words. We can simplify things for ourselves by assuming that `x`

will be a dictionary word; the problem is then to construct such `x`

. We can do this with a trie. Since `x`

is known to be a prefix of `s`

, any candidate word in the dictionary must be found on the path of the trie corresponding to the first few letters of `s`

. To do this using dynamic programming, any time we find an appropriate `x`

, we set our “good” array to have a solution at `|x|+1`

, where `|x|`

is the size of the prefix `x`

. Then we can simply check the last entry to find if the entire string can be broken up.

## C++

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 |
#include <bits/stdc++.h> using namespace std; // define character size // currently Trie supports lowercase English characters (a - z) #define CHAR_SIZE 26 // A Trie node struct Node { bool exist; // true when node is a leaf node Node* next[CHAR_SIZE]; }; // Function that returns a new Trie node Node* newTrieNode() { Node* node = (Node*)malloc(sizeof(Node)); node->exist = false; for (int i = 0; i < CHAR_SIZE; i++) node->next[i] = NULL; return node; } // Recursive function to delete Trie in postorder manner void freeTrie(Node* node) { if (!node) return; for (int i = 0; i < CHAR_SIZE; i++) freeTrie(node->next[i]); free(node); } // Iterative function to insert a string in Trie void* insertTrie(Node* head, string str) { // start from root node Node* node = head; // do for each character in the string for (int i = 0; i < str.length(); i++) { // create a new node if path doesn't exists if (node->next[str[i] - 'a'] == NULL) node->next[str[i] - 'a'] = newTrieNode(); // go to next node node = node->next[str[i] - 'a']; } // mark last node as leaf node->exist = true; } // Function to determine if string can be segmented into a space-separated // sequence of one or more dictionary words bool wordBreak(Node* head, string str) { // get length of the string int n = str.length(); // good[i] is true if the first i characters of str can be segmented bool good[n + 1]; memset(good, false, sizeof good); good[0] = true; // base case for (int i = 0; i < n; i++) { if (good[i]) { Node* node = head; for (int j = i; j < n; j++) { if (!node) break; node = node->next[str[j] - 'a']; // we can make [0, i] using our known decomposition, // and [i+1, j] using this string in trie if (node && node->exist) good[j + 1] = true; } } } // good[n] would be true if all characters of str can be segmented return good[n]; } // main function int main() { // vector of strings to represent dictionary vector<string> dict = { "this", "th", "is", "famous", "word", "break", "b", "r", "e", "a", "k", "br", "bre", "brea", "ak", "prob", "lem"}; // use trie to store dictionary Node* t = newTrieNode(); for (auto x : dict) insertTrie(t, x); // given string string str = "wordbreakproblem"; // check if string can be segmented or not if (wordBreak(t, str)) cout << "String can be segmented"; else cout << "String can't be segmented"; // deallocate trie memory freeTrie(t); return 0; } |

**Output: **

String can be segmented

##### Runtime Analysis –

A naive analysis suggests an O(n^2) runtime, but notice that the second loop will break when the node is null. This must occur after `k`

steps, where `k`

is the deepest vertex in the trie (though it could of course occur earlier). It is not too difficult to see that with a dictionary containing a word of maximum length `w`

, we would have `k = w+1`

. So the complexity of the loop is actually O(w); and thus the whole function has a complexity of O(nw). The auxiliary space used is the space necessary to hold a trie, as well as the good array; O(n + sum of word lengths).

**Author: **Whitehead, Spencer Nicholas

**Thanks for reading.**

Please use ideone or C++ Shell or any other online compiler link to post code in comments.

Like us? Please spread the word and help us grow. Happy coding 🙂

## Leave a Reply

thanks for your post.

// good[i] is true if the first i characters of str can be segmented

bool good[n + 1];

memset(good, false, sizeof good);

Why use memset?

Could this be initialized with

bool good[n+1] = { false };

?

bool good[n + 1] = { false };means set the first element to false and set the remaining elements to false. It works, but not the correct way of doing things.In C++, we should use

std::fill_n, and in Cmemsetis the preferred. The author preferredmemsethere.Hope that cleared things for you. Happy coding 🙂