🗜️ Using Cantor Pairing for String Compression

#javascript #algorithms

This compression method may have already been invented, but I'll share it nonetheless.

Cantor pairing, a function that combines two non-negative integers into a single one, works well for string compression. I experimented with it in JavaScript and arrived at the approach described here.

During compression, the string's characters are grouped into pairs, with a single leftover character when the length is odd:

hello => he, ll, o
world! => wo, rl, d!
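
As a rough sketch of this grouping step (toPairs is a hypothetical helper name, not taken from the gist), the split could look like this. It iterates by code point, which is my own assumption, so that an emoji is not cut in half:

```javascript
// Hypothetical helper: split a string into groups of up to two characters.
function toPairs(text) {
  // Array.from iterates by code point, so an emoji stays in one piece.
  const chars = Array.from(text);
  const groups = [];
  for (let i = 0; i < chars.length; i += 2) {
    groups.push(chars.slice(i, i + 2).join(""));
  }
  return groups;
}

console.log(toPairs("hello"));  // [ 'he', 'll', 'o' ]
console.log(toPairs("world!")); // [ 'wo', 'rl', 'd!' ]
```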

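Each pair is then converted to a single number by applying the pairing function to the two characters' Unicode code points. That number becomes the code point of one output character, so the resulting string contains non-Latin characters such as Chinese, hieroglyphs, Arabic, emoji, and so on.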

function pair(a, b) {
  return 0.5 * (a + b) * (a + b + 1) + b;
}
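
To make the step concrete, here is a minimal sketch of how a whole string might be compressed with pair. The compress function is hypothetical and may differ from the gist. In particular, pairing a leftover character with 0 is my own assumption, and it means the input must not contain the NUL character:

```javascript
function pair(a, b) {
  return 0.5 * (a + b) * (a + b + 1) + b;
}

// Hypothetical compress(): a sketch, not the code from the gist.
function compress(text) {
  const points = Array.from(text, (ch) => ch.codePointAt(0));
  let out = "";
  for (let i = 0; i < points.length; i += 2) {
    // Assumption: an unpaired trailing character is paired with 0.
    const b = i + 1 < points.length ? points[i + 1] : 0;
    // String.fromCodePoint throws a RangeError for values above 0x10FFFF.
    out += String.fromCodePoint(pair(points[i], b));
  }
  return out;
}

console.log(pair(104, 101));           // 21216, from "h" (104) and "e" (101)
console.log(compress("hello").length); // 3
```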

For decompression, each character's code point is passed through the inverse Cantor pairing function, which recovers the two original code points and therefore the original string.

function unpair(n) {
  var w = Math.floor((Math.sqrt(8 * n + 1) - 1) / 2);
  var t = (w ** 2 + w) / 2;
  return [w - (n - t), n - t];
}
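
A matching decompression sketch could look like the following. The decompress function is hypothetical, and it assumes that a second value of 0 marks a leftover single character, which is my own convention rather than something taken from the gist. The code points 21216, 23544, and 6216 are what "hello" gives under that convention:

```javascript
function unpair(n) {
  const w = Math.floor((Math.sqrt(8 * n + 1) - 1) / 2);
  const t = (w ** 2 + w) / 2;
  return [w - (n - t), n - t];
}

// Hypothetical decompress(): a sketch, not the code from the gist.
function decompress(packed) {
  let out = "";
  // for...of walks the string one code point at a time.
  for (const ch of packed) {
    const [a, b] = unpair(ch.codePointAt(0));
    out += String.fromCodePoint(a);
    // Assumption: b === 0 marks an unpaired trailing character.
    if (b !== 0) out += String.fromCodePoint(b);
  }
  return out;
}

console.log(unpair(21216)); // [ 104, 101 ]
console.log(decompress(String.fromCodePoint(21216, 23544, 6216))); // hello
```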

For further information about this algorithm, here are the pros, cons, and considerations:

Pros:

Fast processing.
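Cuts the number of characters in the string roughly in half (the character count, not necessarily the size in bytes once the string is encoded).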

Cons:

Limited portability, since the output is made of unusual characters that some systems may not display or store correctly.

Considerations:

Avoid compressing an already compressed string: pairing code points that are already large can produce values above U+10FFFF, which are not valid Unicode.
Exercise caution with short strings, as they may lead to corrupted output.
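
These points come down to which numbers the pairing produces. As a rough sketch, one could check a pair before emitting it and measure the result. Here isSafePair is a hypothetical helper, and the code points 21216, 23544, and 6216 assume "hello" is grouped as he, ll, o with the lone o paired with 0:

```javascript
function pair(a, b) {
  return 0.5 * (a + b) * (a + b + 1) + b;
}

// Hypothetical guard: true when pair(a, b) is a code point that can be
// stored as one well-formed character.
function isSafePair(a, b) {
  const n = pair(a, b);
  const inRange = n <= 0x10ffff;                  // highest Unicode code point
  const isSurrogate = n >= 0xd800 && n <= 0xdfff; // reserved for UTF-16 pairs
  return inRange && !isSurrogate;
}

console.log(isSafePair(104, 101));     // true: "h" and "e" give 21216
console.log(isSafePair(21216, 23544)); // false: pairing two paired values

// Size check for "hello" and its three paired code points.
const packed = String.fromCodePoint(21216, 23544, 6216);
const utf8 = new TextEncoder();
console.log("hello".length, packed.length);                           // 5 3
console.log(utf8.encode("hello").length, utf8.encode(packed).length); // 5 9
```

In this example the character count drops from 5 to 3, but the UTF-8 size grows from 5 to 9 bytes, so whether the result counts as smaller depends on how the string is stored.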

If you are interested, check out the gist.
