Find first k non-repeating characters in a string in single traversal

Given a string, find the first k non-repeating characters in it using only a single traversal of the string.


 

For example, if the string is ABCDBAGHCHFAC and k = 3, the output would be D G F.

 
A simple solution is to store the count of each character in a map or an array by traversing the string once, and then traverse the string a second time to find the first k characters whose count is 1.
The time complexity of this solution is O(n), and the auxiliary space used is O(n). The problem with this solution is that it traverses the string twice, which violates the problem constraint.
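
For reference, the two-traversal approach described above might look like the following minimal Java sketch. The class name `TwoPassFirstK` and the method name `firstKNonRepeating` are illustrative assumptions, not names taken from the article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this illustration
class TwoPassFirstK {
    // Returns up to k non-repeating characters in order of appearance
    static List<Character> firstKNonRepeating(String str, int k) {
        // First traversal: count the occurrences of each character
        Map<Character, Integer> count = new HashMap<>();
        for (char c : str.toCharArray()) {
            count.merge(c, 1, Integer::sum);
        }

        // Second traversal: collect the first k characters whose count is 1
        List<Character> result = new ArrayList<>();
        for (int i = 0; i < str.length() && result.size() < k; i++) {
            char c = str.charAt(i);
            if (count.get(c) == 1) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(firstKNonRepeating("ABCDBAGHCHFAC", 3)); // prints [D, G, F]
    }
}
```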

 
We can solve this problem in a single traversal of the string. The idea is to use a map to store, for each distinct character, its count and the index of its first or last occurrence in the string (for a character with count 1, the two are the same). Then we traverse the map and push the index of every character having count 1 into a min-heap. Finally, we pop the top k keys from the min-heap; these are the indices of the first k non-repeating characters in the string.

Note that this solution does one complete traversal of the string and one of the map. Since the size of the map is at most the alphabet size (a constant), the map traversal can be ignored. The time complexity of the solution below is O(n log n), and the auxiliary space used by the program is O(n).
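
The min-heap approach described above can be sketched in Java roughly as follows. This is one possible reading of the description, not the original listing: the class name `MinHeapFirstK`, the method name, and the choice of a two-element `int[]` holding {count, last index} per character are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical class name, used only for this illustration
class MinHeapFirstK {
    static List<Character> firstKNonRepeating(String str, int k) {
        // Single traversal of the string:
        // map each distinct character to {count, index of last occurrence}
        Map<Character, int[]> info = new HashMap<>();
        for (int i = 0; i < str.length(); i++) {
            int[] entry = info.computeIfAbsent(str.charAt(i), c -> new int[2]);
            entry[0]++;
            entry[1] = i;
        }

        // Traverse the map and push the index of every character
        // with count 1 into a min-heap
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int[] entry : info.values()) {
            if (entry[0] == 1) {
                minHeap.add(entry[1]);
            }
        }

        // Pop up to k smallest indices and look up their characters
        List<Character> result = new ArrayList<>();
        while (k-- > 0 && !minHeap.isEmpty()) {
            result.add(str.charAt(minHeap.poll()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(firstKNonRepeating("ABCDBAGHCHFAC", 3)); // prints [D, G, F]
    }
}
```

The final `str.charAt` calls are index lookups rather than another pass over the string, so the string itself is still traversed only once.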

C++


Output:

D G F

Java


Output:

D G F

 
The above solution inserts every character of the map that has a count of 1 into the min-heap, so the heap size becomes O(n) in the worst case. We can reduce the heap size to O(k) in the worst case. The idea is to push only the first k such elements into a max-heap; then, for each subsequent element in the map, if the current element is less than the root of the heap, we replace the root with it. After we have processed every key of the map, the heap will contain the first k non-repeating characters.
(Here, by an element we mean the index of a character with count 1. Because the elements are popped from a max-heap, they come out in reverse order of appearance.)
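
A minimal Java sketch of this bounded max-heap variant is shown below, under the same assumptions as before: the class name `MaxHeapFirstK`, the method name, and the {count, last index} array layout are illustrative and not taken from the original listing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical class name, used only for this illustration
class MaxHeapFirstK {
    static List<Character> firstKNonRepeating(String str, int k) {
        List<Character> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }

        // Single traversal: character -> {count, index of last occurrence}
        Map<Character, int[]> info = new HashMap<>();
        for (int i = 0; i < str.length(); i++) {
            int[] entry = info.computeIfAbsent(str.charAt(i), c -> new int[2]);
            entry[0]++;
            entry[1] = i;
        }

        // Max-heap holding at most k indices of non-repeating characters
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int[] entry : info.values()) {
            if (entry[0] != 1) {
                continue;
            }
            if (maxHeap.size() < k) {
                maxHeap.add(entry[1]);
            } else if (entry[1] < maxHeap.peek()) {
                // Smaller index found: evict the largest index kept so far
                maxHeap.poll();
                maxHeap.add(entry[1]);
            }
        }

        // Popping a max-heap yields the largest index first,
        // so the characters come out in reverse order of appearance
        while (!maxHeap.isEmpty()) {
            result.add(str.charAt(maxHeap.poll()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(firstKNonRepeating("ABCDBAGHCHFAC", 3)); // prints [F, G, D]
    }
}
```

If the characters are needed in order of appearance, the returned list can simply be reversed, for example with `Collections.reverse(result)`.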

C++


Output:

F G D

Java


Output:

F G D

 
Thanks for reading.

MANISH PERIWAL

The second code should have a max-heap instead of a min heap

Avishek Dutta

How come the time complexity of the first algorithm is O(N)?
1. Traverse the string and store the counts in a map: O(N).
2. Traverse the map and push elements with count = 1 into the heap: O(N log N) [in the worst case, all N characters in the string can have count = 1].
3. Pop the top K elements from the heap: O(K log N).

Total: O(N log N)

Please correct me if I am wrong.

Appun

We can also use a list; that way we don't have to traverse the string twice. The following is a Java implementation:

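public static List<Character> firstKNonRepeating(String str, int k) {
    // map marking the characters seen so far
    Map<Character, Integer> map = new HashMap<>();

    // characters seen exactly once so far, in order of first occurrence
    List<Character> uniqueChar = new LinkedList<>();

    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (map.get(c) == null) {
            map.put(c, 1);
            uniqueChar.add(c);
        } else {
            // Character.valueOf selects remove(Object), not remove(int index)
            uniqueChar.remove(Character.valueOf(c));
        }
    }
    // guard against fewer than k non-repeating characters
    return uniqueChar.subList(0, Math.min(k, uniqueChar.size()));
}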

Abhishek

The second program yields wrong output for other test cases.
Here: https://ideone.com/m0KkGL
Expected output : D G E
Actual Output: D E F

Program should have a max heap instead of a min heap. Please check and tell if I’m wrong.
Thanks