Run Length Encoding (RLE) is a simple lossless data compression algorithm that works by replacing consecutive occurrences of a character with that character followed by the number of times it appears consecutively. For example, the string "AAABBBCCCC" would be compressed to "A3B3C4".
RLE is commonly used in situations where the data being compressed contains long runs of identical values, such as in images, audio, or video files.
Compression
def compress(string)
compressed = ""
count = 1
prev_char = string[0]
string.chars.each_with_index do |char, index|
if index == 0
next
end
if char == prev_char
count += 1
else
compressed += prev_char + count.to_s
count = 1
prev_char = char
end
end
compressed += prev_char + count.to_s
if compressed.length < string.length
return compressed
else
return string
end
end
The compress function takes a string as input and initializes two variables, compressed and count, to empty strings and 1, respectively. It also initializes prev_char to the first character in the input string.
The function iterates over each character in the input string using the chars method. It uses the each_with_index method to get both the character and its index in the string.
If the character is the first in the string (i.e. the index is 0), the function skips it and moves on to the next character.
If the current character is the same as the previous character, the function increments the count variable by 1.
If the current character is different from the previous character, the function appends the previous character and its count to the compressed string, resets the count variable to 1, and sets the prev_char variable to the current character.
After the loop finishes, the function appends the last character and its count to the compressed string.
Finally, the function checks whether the length of the compressed string is less than the length of the input string.
If it is, the function returns the compressed string. If not, it returns the original input string.
require 'minitest/autorun'
class TestRLE < Minitest::Test
def test_compresses_simple_string
assert_equal 'A3B3C4', compress('AAABBBCCCC')
end
def test_does_not_compress_short_string
assert_equal 'ABC', compress('ABC')
end
def test_compresses_string_with_spaces
assert_equal 'A2 B2 C2', compress('AA BB CC')
end
def test_compresses_string_with_numbers
assert_equal 'A2B2C2D2', compress('AABBCCDD')
assert_equal '1A2B3C', compress('ABBCCC')
end
def test_does_not_compress_non_ascii_characters
assert_equal 'Α3Β3Γ3', compress('ΑΑΑΒΒΒΓΓΓ')
end
end
Decompression
def decompress(string)
decompressed = ""
i = 0
while i < string.length do
char = string[i]
if char.match?(/[a-zA-Z]/)
if i + 1 < string.length && i + 2 < string.length && string[i+1].match?(/\d/) && string[i+2].match?(/\d/)
count = string[i+1..i+2].to_i
i += 3
elsif i + 1 < string.length && string[i+1].match?(/\d/)
count = string[i+1].to_i
i += 2
else
raise "Invalid input string"
end
decompressed += char * count
else
raise "Invalid input string"
end
end
decompressed
end
The decompress function takes a compressed string as input and initializes two variables, decompressed and i, to empty strings and 0, respectively.
The function enters a while loop that continues as long as i is less than the length of the input string.
Inside the loop, the function gets the current character at index i and checks whether it is a letter using a regular expression. If it is not a letter, the function raises an error.
If the current character is a letter, the function gets the next two characters as a substring starting at index i+1, and converts it to an integer using the to_i method.
The function appends char repeated count times to the decompressed string.
The function increments i by 3, since each letter in the compressed string is followed by two digits representing its count.
After the loop finishes, the function returns the decompressed string.
require 'minitest/autorun'
class TestRLE < Minitest::Test
# ...
def test_decompresses_simple_string
assert_equal 'AAABBBCCCC', decompress('A3B3C4')
end
def test_does_not_decompress_non_ascii_characters
assert_raises(RuntimeError) { decompress('Α3Β3Γ3') }
end
def test_raises_error_with_invalid_input_string
assert_raises(RuntimeError) { decompress('A3B3C4D5') }
end
end
Conclusion
In conclusion, the Run-Length Encoding (RLE) algorithm is a simple and effective method for compressing data that contains long runs of identical characters. It works by replacing each run with a count of the number of consecutive characters, followed by the character itself.
RLE can be easily implemented in any programming language and is particularly useful for compressing data with a high degree of repetition, such as text files, images, and videos. However, RLE may not be effective for compressing data with a low degree of repetition or complex patterns.
To use RLE, we first compress the data using a function that takes the input string as an argument and returns the compressed string. Then, to decompress the data, we use a separate function that takes the compressed string as an argument and returns the original uncompressed string.
When implementing RLE, it's important to handle edge cases, such as strings with special characters or invalid input strings, and to test the functions thoroughly to ensure that they work correctly in all scenarios.
Overall, RLE is a useful and widely used algorithm for data compression, and understanding how it works can help developers create more efficient and optimized applications.
Top comments (0)