Find the longest substring of a string containing `k` distinct characters
Given a string and a positive number k
, find the longest substring of the string containing k
distinct characters. If k
is more than the total number of distinct characters in the string, return the whole string.
The problem differs from the problem of finding the longest subsequence with k
distinct characters. Unlike subsequences, substrings are required to occupy consecutive positions within the original string.
For example, consider string abcbdbdbbdcdabd
.
For k = 3, o/p is ‘bcbdbdbbdcd’
For k = 5, o/p is ‘abcbdbdbbdcdabd’
A simple solution would be to generate all substrings of the given string and return the longest substring containing k
distinct characters. The time complexity of this solution is O(n3) since it takes O(n2) time to generate all substrings for a string of length n
and O(n) time to process each substring.
We can easily solve this problem in O(n) time and O(n) space. The idea is to use a sliding window technique. In the sliding window technique, a window is maintained that satisfies the problem constraints. The window is unstable if it violates the problem constraints, and it tries to stabilize by increasing or decreasing its size.
The window (substring) is stable for the current problem if it contains k
distinct characters at any point. If the window has less than k
distinct characters, it expands by adding characters to it from the right; otherwise, if the window contains more than k
distinct characters, it shrinks by removing characters from the left until it becomes stable again. The steady-state window tends to increase its size by adding characters to it until it becomes unstable again. We continue this process until the window reaches the last character in the string. At each point the window size changes, update the maximum window size.
The algorithm can be implemented as follows in C++, Java, and Python:
C++
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
#include <iostream> #include <string> #include <unordered_set> using namespace std; // Define the character range #define CHAR_RANGE 128 // Function to find the longest substring of a given string containing // `k` distinct characters using a sliding window string findLongestSubstring(string str, int k, int n) { // stores the longest substring boundaries int end = 0, begin = 0; // set to store distinct characters in a window unordered_set<char> window; // Count array `freq` stores the frequency of characters present in the // current window. We can also use a map instead of a count array. int freq[CHAR_RANGE] = { 0 }; // `[low…high]` maintains the sliding window boundaries for (int low = 0, high = 0; high < n; high++) { window.insert(str[high]); freq[str[high]]++; // if the window size is more than `k`, remove characters from the left while (window.size() > k) { // If the leftmost character's frequency becomes 0 after // removing it in the window, remove it from the set as well if (--freq[str[low]] == 0) { window.erase(str[low]); } low++; // reduce window size } // update the maximum window size if necessary if (end - begin < high - low) { end = high; begin = low; } } // return the longest substring found at `str[begin…end]` return str.substr(begin, end - begin + 1); } int main() { string str = "abcbdbdbbdcdabd"; int k = 2; int n = str.length(); cout << findLongestSubstring(str, k, n); return 0; } |
Output:
bdbdbbd
Java
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
import java.util.HashSet; import java.util.Set; class Main { // Define the character range public static final int CHAR_RANGE = 128; // Function to find the longest substring of a given string containing // `k` distinct characters using a sliding window public static String findLongestSubstring(String str, int k) { // base case if (str == null || str.length() == 0) { return str; } // stores the longest substring boundaries int end = 0, begin = 0; // set to store distinct characters in a window Set<Character> window = new HashSet<>(); // Count array `freq` stores the frequency of characters present in the // current window. We can also use a map instead of a count array. int[] freq = new int[CHAR_RANGE]; // `[low…high]` maintains the sliding window boundaries for (int low = 0, high = 0; high < str.length(); high++) { window.add(str.charAt(high)); freq[str.charAt(high)]++; // if the window size is more than `k`, remove characters from the left while (window.size() > k) { // If the leftmost character's frequency becomes 0 after // removing it in the window, remove it from the set as well if (--freq[str.charAt(low)] == 0) { window.remove(str.charAt(low)); } low++; // reduce window size } // update the maximum window size if necessary if (end - begin < high - low) { end = high; begin = low; } } // return the longest substring found at `str[begin…end]` return str.substring(begin, end + 1); } public static void main(String[] args) { String str = "abcbdbdbbdcdabd"; int k = 2; System.out.print(findLongestSubstring(str, k)); } } |
Output:
bdbdbbd
Python
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
# define the character range CHAR_RANGE = 128 # Function to find the longest substring of a given containing # `k` distinct characters using a sliding window def findLongestSubstring(s, k): # stores the longest substring boundaries end = begin = 0 # set to store distinct characters in a window window = set() # `freq` stores the frequency of characters present in the # current window. We can also use a dictionary instead. freq = [0] * CHAR_RANGE # `[low…high]` maintains the sliding window boundaries low = high = 0 while high < len(s): window.add(s[high]) freq[ord(s[high])] = freq[ord(s[high])] + 1 # if the window size is more than `k`, remove characters from the left while len(window) > k: # If the leftmost character's frequency becomes 0 after # removing it in the window, remove it from the set as well freq[ord(s[low])] = freq[ord(s[low])] - 1 if freq[ord(s[low])] == 0: window.remove(s[low]) low = low + 1 # reduce window size # update the maximum window size if necessary if end - begin < high - low: end = high begin = low high = high + 1 # return the longest substring found at `s[begin…end]` return s[begin:end + 1] if __name__ == '__main__': s = 'abcbdbdbbdcdabd' k = 2 print(findLongestSubstring(s, k)) |
Output:
bdbdbbd
The time complexity of the above solution is O(n) as it does two traversals of the given string of length n
.
Find the longest substring of a string containing distinct characters
Construct the longest palindrome by shuffling or deleting characters from a string
Check if a repeated subsequence is present in a string or not
Thanks for reading.
To share your code in the comments, please use our online compiler that supports C, C++, Java, Python, JavaScript, C#, PHP, and many more popular programming languages.
Like us? Refer us to your friends and support our growth. Happy coding :)