Run Length Encoding (RLE) Data Compression Algorithm
Run–length encoding (RLE) is a simple form of lossless data compression that runs on sequences with the same value occurring many consecutive times. It encodes the sequence to store only a single value and its count.
For example, consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space and many short runs of black pixels within the text.
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWWWith a run–length encoding (RLE) data compression algorithm applied to the above hypothetical scan line, it can be rendered as 12W1B12W3B24W1B14W. This can be interpreted as a sequence of twelve W’s, one B, twelve W’s, three B’s, etc.
The idea is to run a linear scan on the string, and for each distinct character, append the character and its consecutive occurrence in the output string.
The algorithm can be implemented as follows in C++, Java, and Python. Note that the output size will double the input size in the worst case, so the algorithm can’t run in-place. e.g. ABCD —> A1B1C1D1.
C++
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
#include <iostream> #include <string> using namespace std; // Perform Run–length encoding (RLE) data compression algorithm // on string `str` string encode(string str) { // stores output string string encoding = ""; int count; for (int i = 0; str[i]; i++) { // count occurrences of character at index `i` count = 1; while (str[i] == str[i + 1]) { count++, i++; } // append current character and its count to the result encoding += to_string(count) + str[i]; } return encoding; } int main() { string str = "ABBCCCD"; cout << encode(str); return 0; } |
Output:
1A2B3C1D
Java
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
class Main { // Perform Run–length encoding (RLE) data compression algorithm // on string `str` public static String encode(String str) { // stores output string String encoding = ""; // base case if (str == null) { return encoding; } int count; for (int i = 0; i < str.length(); i++) { // count occurrences of character at index `i` count = 1; while (i + 1 < str.length() && str.charAt(i) == str.charAt(i + 1)) { count++; i++; } // append current character and its count to the result encoding += String.valueOf(count) + str.charAt(i); } return encoding; } public static void main(String[] args) { String str = "ABBCCCD"; System.out.print(encode(str)); } } |
Output:
1A2B3C1D
Python
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
# Perform Run–length encoding (RLE) data compression algorithm on string `str` def encode(s): encoding = "" # stores output string i = 0 while i < len(s): # count occurrences of character at index `i` count = 1 while i + 1 < len(s) and s[i] == s[i + 1]: count = count + 1 i = i + 1 # append current character and its count to the result encoding += str(count) + s[i] i = i + 1 return encoding if __name__ == '__main__': s = 'ABBCCCD' print(encode(s)) |
Output:
1A2B3C1D
The time complexity of the above solution is O(n), where n is the length of the input string and doesn’t require any extra space.
References: Run–length encoding – Wikipedia
Thanks for reading.
To share your code in the comments, please use our online compiler that supports C, C++, Java, Python, JavaScript, C#, PHP, and many more popular programming languages.
Like us? Refer us to your friends and support our growth. Happy coding :)