Find all substrings containing exactly `k` distinct characters
Given a string and a positive integer k, find all distinct substrings of any length containing exactly k distinct characters.
Please note that the problem specifically targets substrings that are contiguous (i.e., occupy consecutive positions) and inherently maintains the order of elements.
For example,
Output: [abc, abcb, bcbd, cbd]
Input: str = abcadce, k = 4
Output: [abcad, abcadc, bcad, bcadc, cadce, adce]
Input: str = aa, k = 1
Output: [a, aa]
A simple solution would be to generate all substrings of the given string and return substrings containing exactly k distinct characters. The time complexity of this solution is O(n3) since it takes O(n2) time to generate all substrings for a string of length n and O(n) time to process each substring.
We can improve the time complexity using extra space. The idea is to process substrings starting with each index i in the input string. In each iteration of the loop, add each character present in the processed substring to a hash table. If the count of distinct characters becomes k at index j at any point of time, include substring str[i…j] in the result.
Following is the C++, Java, and Python program that demonstrates it:
C++
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
#include <iostream> #include <string> #include <unordered_set> using namespace std; // Function to find all distinct substrings containing exactly `k` distinct characters unordered_set<string> findDistinctSubstrings(string str, int k) { // create a set to store substrings containing exactly `k` distinct characters unordered_set<string> result; // in each iteration of the loop, consider substring starting with `str[i]` for (int i = 0; i < str.length() - k + 1; i++) { // create a set to store distinct characters in the current substring unordered_set<char> chars; // process substring starting with `str[i]` for (int j = i; j < str.length() && chars.size() <= k; j++) { // insert current character `str[j]` into the hash set chars.insert(str[j]); /* If current character `str[j]` is seen before in the substring `str[i…j-1]`, the count remains the same since the hash set only allows unique values */ // if the count of distinct characters becomes `k` if (chars.size() == k) { // add the current substring to the result result.insert(str.substr(i, j - i + 1)); } } } return result; } int main() { string str = "abcadce"; int k = 4; unordered_set<string> result = findDistinctSubstrings(str, k); for (string s: result) { cout << s << endl; } return 0; } |
Output:
adce
abcadc
bcadc
abcad
bcad
cadce
Java
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
import java.util.HashSet; import java.util.Set; class Main { // Function to find all distinct substrings containing exactly `k` distinct // characters public static Set<String> findDistinctSubstrings(String str, int k) { // create a set to store substrings containing exactly `k` distinct characters Set<String> result = new HashSet<>(); // base case if (str == null) { return result; } // in each iteration of the loop, consider substring starting with `str[i]` for (int i = 0; i < str.length() - k + 1; i++) { // create a set to store distinct characters in the current substring Set<Character> chars = new HashSet<>(); // process substring starting with `str[i]` for (int j = i; j < str.length() && chars.size() <= k; j++) { // insert current character `str[j]` into the hash set chars.add(str.charAt(j)); /* If current character `str[j]` is seen before in the substring `str[i…j-1]`, the count remains the same since the hash set only allows unique values */ // if the count of distinct characters becomes `k` if (chars.size() == k) { // add the current substring to the result result.add(str.substring(i, j + 1)); } } } return result; } public static void main(String[] args) { String str = "abcadce"; int k = 4; Set<String> result = findDistinctSubstrings(str, k); System.out.println(result); } } |
Output:
[abcad, abcadc, cadce, adce, bcad, bcadc]
Python
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
# Function to find all distinct substrings containing exactly `k` distinct characters def findDistinctSubstrings(s, k): # create a set to store substrings containing exactly `k` distinct characters result = set() # in each iteration of the loop, consider substring starting with `s[i]` for i in range(len(s) - k + 1): # create a set to store distinct characters in the current substring chars = set() # process substring starting with `s[i]` for j in range(i, len(s)): # insert current character `s[j]` into the hash set chars.add(s[j]) ''' If current character `s[j]` is seen before in the substring `s[i…j-1]`, the count remains the same since the hash set only allows unique values ''' # if the count of distinct characters becomes `k` if len(chars) == k: # add the current substring to the result result.add(s[i: j + 1]) return result if __name__ == '__main__': s = 'abcadce' k = 4 result = findDistinctSubstrings(s, k) print(result) |
Output:
{‘bcad’, ‘adce’, ‘cadce’, ‘bcadc’, ‘abcad’, ‘abcadc’}
The time complexity of the above solution is O(n2) and requires O(n) extra space, where n is the length of the input string.
Exercise: Modify the solution to find substrings of length k containing exactly k distinct characters.
Find the longest substring of a string containing `k` distinct characters
Find the longest substring of a string containing distinct characters
Thanks for reading.
To share your code in the comments, please use our online compiler that supports C, C++, Java, Python, JavaScript, C#, PHP, and many more popular programming languages.
Like us? Refer us to your friends and support our growth. Happy coding :)