Find first k maximum occurring words in given set of strings

Given a huge set of strings with duplicate strings present, find first k-maximum occurring words in it.

 

For example,

Input:  

keys = [“code”, “coder”, “coding”, “codable”, “codec”, “codecs”, “coded”,
“codeless”, “codec”, “codecs”, “codependence”, “codex”, “codify”,
“codependents”, “codes”, “code”, “coder”, “codesign”, “codec”,
“codeveloper”, “codrive”, “codec”, “codecs”, “codiscovered”]

k = 4

Output:

codec occurs 4 times
codecs occurs 3 times
code occurs 2 times
coder occurs 2 times

 

Related Posts: Find maximum occurring word in given set of strings

 


 

The idea is to use Trie (Prefix Tree) to solve this problem. We start by inserting each key into the trie and store its count so far (along with the key itself) in the leaf nodes. After all keys are inserted into the trie, we perform its pre-order traversal (depth first search) and insert all key’s count in a max heap. Then the problem is reduced to removing first k elements from min-Heap.

 
C++ implementation –
 

Download   Run Code

Output:

codec occurs 4 times
codecs occurs 3 times
code occurs 2 times
coder occurs 2 times

 

Thanks for reading.




Please use ideone or C++ Shell or any other online compiler link to post code in comments.
Like us? Please spread the word and help us grow. Happy coding 🙂