Given a string and a positive number k, find the longest substring of given string containing k distinct characters. If k is more than number of distinct characters in the string, return the whole string.

For example, consider string abcbdbdbbdcdabd

For k = 2, o/p is ‘bdbdbbd’

For k = 3, o/p is ‘bcbdbdbbdcd’

For k = 5, o/p is ‘abcbdbdbbdcdabd’

A simple solution would be to generate all substrings of the given string and return longest substring containing k distinct characters. The time complexity of this solution would be exponential.

We can easily solve this problem in O(n^{}) time and O(n) space. The idea is to use **sliding window** technique. In sliding window technique, we maintain a window that satisfies the problem constraints. The window is unstable if it violates the problem constraints and it tries stabilize by increasing or decreasing it’s size.

So for current problem, the window (subarray) is stable if it contains k distinct characters at any point. If window contains less than k distinct characters, it expands by adding characters to it from the right else if window contains more than k distinct characters, it shrinks by removing characters from the left till it becomes stable again. The window in the stable state tends to increase its size by adding characters to it till it becomes unstable again. We continue this process till the window reaches the last character in the string. At each point the window size changes, we update the maximum window size.

**C++ implementation –**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
#include <bits/stdc++.h> using namespace std; // define character range #define CHAR_RANGE 128 // Function to find longest substring of given string containing // k distinct characters using sliding window string longestSubstr(string str, int k, int n) { // stores longest substring boundaries int end = 0, begin = 0; // set to store distinct characters in a window unordered_set<char> window; // count array to store frequency of characters present in // current window // we can also use a map instead of count array int freq[CHAR_RANGE] = { 0 }; // [low..high] maintain sliding window boundaries for (int low = 0, high = 0; high < n; high++) { window.insert(str[high]); freq[str[high]]++; // if window size is more than k, remove characters from the left while (window.size() > k) { // if the frequency of leftmost character becomes 0 after // removing it in the window, remove it from set as well if (--freq[str[low]] == 0) window.erase(str[low]); low++; // reduce window size } // update maximum window size if necessary if (end - begin < high - low) { end = high; begin = low; } } // return longest substring found at str[begin..end] return str.substr(begin, end - begin + 1); } // main function int main() { string str = "abcbdbdbbdcdabd"; int k = 2; int n = str.length(); cout << longestSubstr(str, k, n); return 0; } |

**Output: **

bdbdbbd

The time complexity of above solution is O(n^{}) as it does 2 traversal of the given string.

**Thanks for reading.**

Please use ideone or C++ Shell or any other online compiler link to post code in comments.

Like us? Please spread the word and help us grow. Happy coding 🙂

## Leave a Reply