Given a string and a positive number k, find the longest substring of the string containing k distinct characters. If k is more than the total number of distinct characters in the string, return the whole string.

The problem differs from the problem of finding the longest subsequence with k distinct characters. Unlike subsequences, substrings are required to occupy consecutive positions within the original string.

 
For example, consider string abcbdbdbbdcdabd.

For k = 2, o/p is ‘bdbdbbd’
For k = 3, o/p is ‘bcbdbdbbdcd’
For k = 5, o/p is ‘abcbdbdbbdcdabd’

Practice this problem

A simple solution would be to generate all substrings of the given string and return the longest substring containing k distinct characters. The time complexity of this solution is O(n3) since it takes O(n2) time to generate all substrings for a string of length n and O(n) time to process each substring.

 
We can easily solve this problem in O(n) time and O(n) space. The idea is to use a sliding window technique. In the sliding window technique, a window is maintained that satisfies the problem constraints. The window is unstable if it violates the problem constraints, and it tries to stabilize by increasing or decreasing its size.

The window (substring) is stable for the current problem if it contains k distinct characters at any point. If the window has less than k distinct characters, it expands by adding characters to it from the right; otherwise, if the window contains more than k distinct characters, it shrinks by removing characters from the left until it becomes stable again. The steady-state window tends to increase its size by adding characters to it until it becomes unstable again. We continue this process until the window reaches the last character in the string. At each point the window size changes, update the maximum window size.

 
The algorithm can be implemented as follows in C++, Java, and Python:

C++


Download  Run Code

Output:

bdbdbbd

Java


Download  Run Code

Output:

bdbdbbd

Python


Download  Run Code

Output:

bdbdbbd

The time complexity of the above solution is O(n) as it does two traversals of the given string of length n.