Given a string, find first K non-repeating characters in it by doing only single traversal of it.

A simple solution would be to store count of each character in a map or an array by traversing it once. Then we traverse the string once more to find the first k characters having their count as 1.

The time complexity of this solution is O(n^{}) and auxiliary space used is O(n). The problem in this solution is that we are traversing the string twice and it violates the program constraints.

We can solve this problem in single traversal of the string. The idea is to use a map to store each distinct character count and the index of its first or last occurrence in the string. Then we traverse the map and push index of all characters having count 1 into the min-heap. Finally we pop top k keys from the min-heap and that will be our first k non-repeating characters in the string.

Note that in this solution we are doing one complete traversal of the string and the map. Since the size of the map is equal to alphabet size in worst-case (which is a constant), it can be ignored.

**C++ implementation –**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 |
#include<iostream> #include<unordered_map> #include<vector> #include<queue> using namespace std; // Function to find the first k non-repeating character in // the string by doing only one traversal of it int firstKNonRepeating(string str, int k) { // map to store character count and the index of its // last occurrence in the string unordered_map<char, pair<int, int>> map; for (int i = 0 ; i < str.length(); i++) { map[str[i]].first++; map[str[i]].second = i; } // create an empty min-heap priority_queue<int, vector<int>, greater<int>> pq; // traverse the map and push index of all characters // having count of 1 into the min-heap for (auto it : map) { int count = it.second.first; int index = it.second.second; if (count == 1) pq.push(index); } // pop top k keys from the min-heap while (k-- && !pq.empty()) { // extract the minimum node from the min-heap int min_index = pq.top(); pq.pop(); cout << str[min_index] << " "; } } // main function int main() { string str = "ABCDBAGHCHFAC"; int k = 3; firstKNonRepeating(str, k); return 0; } |

`Output: `

D G F

The time complexity of above solution is O(nlog(n)) and auxiliary space used by the program is O(n).

Above solution inserts all characters of the map (all having count of 1) into the min-heap. So the heap size becomes O(n) in the worst case. We can reduce the heap size to O(k) in worst case. The idea is to push only first k characters into the max-heap and then for all subsequent elements in the map, if current element is less than the root of the heap, we replace the root with it. After we have processed every key of the map, the heap will contain first k non-repeating characters.

(*Here by character we mean index of a character*)

**C++ implementation –**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 |
#include<iostream> #include<unordered_map> #include<vector> #include<queue> using namespace std; // Function to find the first k non-repeating character in // the string by doing only one traversal of it int firstKNonRepeating(string str, int k) { // map to store character count and the index of its // last occurrence in the string unordered_map<char, pair<int, int>> map; int n = str.length(); for (int i = 0 ; i < n; i++) { map[str[i]].first++; map[str[i]].second = i; } // create an empty max-heap (max size k) priority_queue<int, vector<int>> pq; // traverse the map and process index of all characters // having count of 1 for (auto it : map) { int count = it.second.first; int index = it.second.second; if (count == 1) { // if heap has less than k keys in it // push index of current character if (--k >= 0) { pq.push(index); } // else if index of current element is less than the root of the // heap, replace the root with the current element else if (index < pq.top()) { pq.pop(); pq.push(index); } } } // Now the heap contains index of first k non-repeating characters // pop all keys from the max-heap while (!pq.empty()) { // extract the maximum node from the max-heap int max_index = pq.top(); pq.pop(); cout << str[max_index] << " "; } } // main function int main() { string str = "ABCDBAGHECHFAC"; int k = 3; firstKNonRepeating(str, k); return 0; } |

`Output: `

F G D

**Thanks for reading.**

Please use ideone or C++ Shell or any other online compiler link to post code in comments.

Like us? Please spread the word and help us grow. Happy coding 🙂

## Leave a Reply

The second code should have a max-heap instead of a min heap

Thanks Manish. Code is updated.

How come the time complexity of the first algorithm is O(N)?

1. Traverse the string and store count in map O(N).

2. Traverse the map and push elements with count=1 into heap. O(NlogN) [In worst case all the n characters in string can have count=1].

3. pop top K element in heap. O(klogN).

Total: O(NlogN)

Please rectify me if I am wrong.

Hi Venishek,

Building a heap takes O(n) time if all elements are known in advance before initializing the heap. But since that is not the case, the complexity is

`O(nlogn)`

only. Thanks a lot for bringing this to our notice. We will update the complexity.we can also use a list, that way we don’t have to traverse twice, following is the Java implementation

public static List firstKNonRepeating(String str, int k){

/*

map to store char count and the index of its last occurrence in the string

*/

Map map = new HashMap();

List uniqueChar = new LinkedList();

for(int i = 0; i<str.length(); i++){

if(map.get(str.charAt(i))==null){

map.put(str.charAt(i), 1);

uniqueChar.add(str.charAt(i));

}else{

uniqueChar.remove(Character.valueOf(str.charAt(i)));

}

}

return uniqueChar.subList(0,k);

}

The second program yields wrong input for other test cases.

Here: https://ideone.com/m0KkGL

Expected output : D G E

Actual Output: D E F

Program should have a max heap instead of a min heap. Please check and tell if I’m wrong.

Thanks

Thanks Abhishek for bringing this to our notice. We have updated the code.