DEV Community

Cover image for Huffman coding in Javascript
Vincent Corbee
Vincent Corbee

Posted on

Huffman coding in Javascript

Often I find my self fascinated by some piece of software or algorithm and get so excited that I think to my self: I‘m going to build that! And every time thinking: This can’t be so hard to build.. And then realising it is and see all my hopes and dreams shattered. 😭 Ok slight exaggeration for some extra drama. But most of the time what I found is when I embark on such an adventure, I have a hard time finding good sources, especially in Javascript, that I can understand. So with that in mind I decided to write this post, in the hope that someone else who is also interested in these esoteric topics might find this helpful.

When I was trying to figure out how to read a JPEG file in Javascript, there was a segment called DHT: Define Huffman Table. Not knowing what Huffman was, I of course professionally Googled it. It turned out to be a compression algorithm for compressing text. For example, Huffman encoding is also used in the deflate algorithm which in turn is used in, to most familiar, Gzip. Obviously the first thing I thought was, I have to build this in Javascript. I mean how hard could it be right? 🤓

So for anyone who is interested, the following is an exploration into how to implement Huffman coding in Javascript from scratch. At the end of this post we will, hopefully, have created a working library that is able to encode and decode a text with Huffman coding. 🥹

Fyi, I don’t use semicolons when I don’t have to. Hey, where talking about text compression, why waste the extra space. 🧐

Along the way we are also touching on the topic of how to work with binary data in Javascript because we will be needing it in implementing Huffman coding.


So the first question is, what is Huffman coding?

Huffman coding is a lossless compression method for compressing text.
What does this mean? It means that with Huffman coding the original text can be stored with fewer bits, at least most of the time, without the loss of information.

If we assume that we are working with text where are characters are stored as 8bits, 1 byte, then the total amount of bits required to store a text of n characters is n * 8 bits.

So For example if we want to store the text: abracadabra with a length of 11 characters we need 88 bits or 11 bytes. So how can we store the same information in fewer bits?

As the title implies, one way of doing this is through Huffman coding. The output that Huffman coding produces is a variable length code table. This code table maps characters to a code that are generated during the coding process. On average, these codes will be smaller than 8 bits required to store the actual character.

So how does Huffman coding work?

Creating the frequentie table

We first starts by creating a frequentie table. This is a table of the characters based on the occurrence. This table is then ordered based on the frequenties from highest to lowest.

If we take our previous text: abracadabra our frequentie table will look like this:

a: 5
b: 2
r: 2
c: 1
d: 1
Enter fullscreen mode Exit fullscreen mode

Creating a Huffman tree

The next step is to convert this into what is called a Huffman tree.

We start by taking the characters with the lowest frequentie. In our case that is c and d and convert them into an intermediary node, which is the sum of both frequenties, and two leaf nodes with the respective character and frequentie.

                              dc:2

                          d:1    c:1
Enter fullscreen mode Exit fullscreen mode

We then move this new intermediary node back in to the list based on the frequentie of the node. In our case that new node will be inserted after the r. So our new list will look like this:

a:  5
b:  2
r:  2
dc: 2 -> Which is the node we just created.
Enter fullscreen mode Exit fullscreen mode

We then repeat this process until we are left with a single node in our list. The resulting list will look like this:

rdcba: 11
And our Huffman tree looks like this:

                            rdcba:11 

                       rdc:4        ba:7

                     r:2  dc:2    b:2  a:5 

                        d:1  c:1
Enter fullscreen mode Exit fullscreen mode

Note that the root node’s frequentie is the sum of all characters.

Constructing the code table

Now from this binary tree we can construct our code table. So how do we do this? Well, we go down the tree and for every time we go right we wright down a 1 and when we go left we wright down a 0 until we reach a leaf node.

For the character a we first go left then right so we wright down 01.
If we do this for every character in the tree we end up with these codes:

a: 11
b: 10
r: 00
c: 011
d: 010

Compress the data

To compress the data, instead of storing the character as is, we store the code in the associated with the character in the encoding table above. In our example abracadabra we will end up with the following encoding:

11 10 00 11 011 11 010 11 10 00 11

But we can’t store only this compressed data because then we have no way to decompress is. You always need a certain “view” on the data, i.e. you need to know how to interpet it. So to be able to decompress this data, we need to store our encoding table along with te compressed data. Because we need to store this extra data, especially for shorter texts, we will actually expand our data instead of reducing it.

Note that the choice for 1 = right and 0 = left is arbitrary. You could also use 0 for right and 1 for left. Also observe that you could swap the second and third node and end up with different codes but with the same size code sizes for the same characters. Thus you can create more than one tree from the same input. For the end result is doesn’t matter because we add the encoding table along with the compressed data.

Decompress the data

To decompress the data we can simply look up a sequence of bits in the encoding table to see which character to use.


Now let’s implement it in Javascript

You can type along or clone the repo from here: https://github.com/vincentcorbee/Huffman

Fyi, this implementation is based on the premise that every character in the source text can be read as an unsigned 8 bit integer. So if you are feeling adventures, you can the ability to use multibyte characters. Further I try to be as descriptive as possible in my function and variable naming so hopefully the code itself can in large part be self explanatory.

So let’s get typing. First we create a new directory called huffman and initialise a new project. We are going to be making this without any dependencies. But since we are going to be using Typescript we are going to install typescript and ts-node and types/nodes a dev dependency.

touch huffman && cd huffman

npm init -y

npm i -D ts-node typescript @types/node

npx tsc --init

mkdir src && mkdir src/lib && mkdir src/modules && mkdir src/samples

touch src/lib/index.ts && touch src/modules/index.ts && touch src/index.ts && touch src/types.ts
Enter fullscreen mode Exit fullscreen mode

In **index.ts **add the following.

export * from './encode'
export * from './decode'
Enter fullscreen mode Exit fullscreen mode

In lib/index.ts add the following.

export * from './create-frequentie-table'
export * from './create-huffman-tree'
export * from './create-encoding-table'
export * from './encode-data'
export * from './to-array-buffer'
export * from './get-bit-count'
Enter fullscreen mode Exit fullscreen mode

And in modules/index.ts add.

export * from './binary-reader'
export * from './binary-writer'
Enter fullscreen mode Exit fullscreen mode

All of the above functionality we will be creating at the appropriate time.

In our src folder create a new file called encode.ts. In this file we are going to create a function that takes a string and outputs an array buffer with our compressed data along with the encoding table.

import { createFrequentieTable, createHuffmanTree, createEncodingTable, encodeData, toArrayBuffer } from './lib'

export const encode = (input: string) => {
  const frequentieTable = createFrequentieTable(input)
  const huffmanTree = createHuffmanTree(frequentieTable)
  const encodingTable = createEncodingTable(frequentieTable, huffmanTree)

  const encodedData = encodeData(encodingTable)(input)

  return toArrayBuffer(encodedData, encodingTable)
}
Enter fullscreen mode Exit fullscreen mode

Inside our function encode we first create a frequentie table. Then we use this function to create our Huffman tree. With the frequentie table and Huffman tree we construct our encoding table. We can then proceed and encode our original data. The last thing we do is combine our encoded data and encoding table into an array buffer and return that.

So the first step is to create our frequentie tabel. In our lib folder we create a new file called create-frequentie-tabel.ts.

import { FrequentieTable } from '../types'

export const createFrequentieTable = (input: string) => {
  const frequentieTabel: FrequentieTable = {}

  for (let i = 0, l = input.length; i < l; i++) {
    const char = input.charAt(i)

    if (frequentieTabel[char] !== undefined) {
      frequentieTabel[char] ++
    } else {
      frequentieTabel[char] = 1
    }
  }

  return frequentieTabel
}
Enter fullscreen mode Exit fullscreen mode

In types.ts we add the type we imported above.

export type FrequentieTable = Record<string, number>
Enter fullscreen mode Exit fullscreen mode

In this function we first create an empty table. We then loop through every character and when it is in the table, we increment it’s value and if it is not in our table we add it and set it’s value to one. Once the entire input is processed, we return our table.

Next up, the huffman tree. Still in our lib folder, create a file called create-huffman-tree.ts.

import { FrequentieTable, HuffmanTree, IntermediaryNode, Node } from '../types'

const getNewIndex = (entries: Node[], count: number) => {
  let index = entries.length - 1

  while(index > 0) {
    if (count <= entries[index][1]) return index

    index--
  }

  return 0
}

const getSortedEntries = (frequentieTable: FrequentieTable): Node[] =>
  Object.entries(frequentieTable).sort((a, b) => b[1] - a[1])

export const createHuffmanTree = (frequentieTable: FrequentieTable): HuffmanTree => {
  const entries = getSortedEntries(frequentieTable)

  while(entries.length > 1) {
    const left = entries.pop() as Node
    const right = entries.pop() as Node

    const [charLeft, countLeft] = left
    const [charRight, countRight] = right

    const newCount = countLeft + countRight
    const newIndex = getNewIndex(entries, newCount)

    const intermediaryNode: IntermediaryNode = [`${charLeft}${charRight}`, newCount, [left, right]]

    entries.splice(newIndex, 0, intermediaryNode)
  }

  return entries as HuffmanTree
}
Enter fullscreen mode Exit fullscreen mode

So this function needs some more explanation. Because we created our frequentie table as a simple hash map, it is not sorted on frequentie. So first we are going to converted it to a sorted array based on frequentie from greatest to lowest in the form of:
[ [ string, number] ]

Then we our going to loop through our list and on each iteration we create a new intermediary node and put it back in the list at the correct index based on the new frequentie of the new node. The intermediary node that we create is a tripple in the form of [ characters, frequentie, [ left node, right node]].

We continue this process until we end up with a single node, the root of our Huffman tree. When de loop terminates we return the Huffman tree.

We also need to add a few new types to types.ts.

// After previous types

export type IntermediaryNode = [string, number, [Node, Node]]

export type LeafNode = [string, number]

export type Node = IntermediaryNode | LeafNode

export type HuffmanTree = [IntermediaryNode]
Enter fullscreen mode Exit fullscreen mode

No we have created the function for our Huffman tree, it is time to wright the function that constructs the encoding table. In our lib folder, create a new file called create-encoding-table.ts.

import { EncodingTable, HuffmanTree, Node, Encoding, FrequentieTable } from '../types'

const traverseHuffmanTree = (node: Node, char: string, encoding: Encoding = []): Encoding => {
  /* We don't need the count, so will only be using the characters and children */
  const [characters, _count, children] = node

  if (char === characters) return encoding

  if (children) {
    const [left, right] = children

    /* If left node matches go left and add 0 to the encoding */
    if (left[0].includes(char)) return traverseHuffmanTree(left, char, [...encoding, 0])

    /* If right node matches go right and add 1 to the encoding */
    if (right[0].includes(char)) return traverseHuffmanTree(right, char, [...encoding, 1])
  }

  throw Error(`No match found for: ${char}`)
}

const getCharacterEncoding = (huffmanTree: HuffmanTree) => (char: string) => {
  const [rootNode] = huffmanTree

  return traverseHuffmanTree(rootNode, char)
}

export const createEncodingTable = (frequentieTable: FrequentieTable, huffmanTree: HuffmanTree): EncodingTable =>
  Object.keys(frequentieTable).reduce((table, char) => {
    table[char] = getCharacterEncoding(huffmanTree)(char)

    return table
  }, {} as EncodingTable)
Enter fullscreen mode Exit fullscreen mode

In this function we are going to map over our frequentie table and for every character in the table we are going to retrieve our new character code from our Huffman tree. This code is going to be an array of 1’s and 0’s. To do this we have to traverse the Huffman tree and look for a matching leaf node.

So for every character in the table we call getCharacterEncoding with our Huffman tree and the character we want the code for. This function then call traverseHuffmanTree with the root node and the character.

In traverseHuffmanTree, the first thing we do is check if the character matches the character in the node. If it matches we have found a matching leaf node and return the encoding, success. 🥳 Note that at this point we don’t care about the count.

If we don’t have a match, but we do have children, i.e. it is an intermediary node and we check if either the left or right node’s characters includes the character we want our code for.

If the left matches we traverse the left node and add a 0 to the encoding. And if the right matches we traverse the right node and add a 1 to the encoding.

If we don’t have a match and none of the children includes the character, we throw an error because our character does not exist in our Huffman tree and we burst into tears. 😭

In types.ts add the following types.

// After previous types

export type Encoding = number[]

export type EncodingTable = Record<string, Encoding>
Enter fullscreen mode Exit fullscreen mode

So at this point, we have all we need to start encoding our data. We have created our frequentie table, we then used this to construct our Huffman tree and finally we have created our encoding table.

In our lib folder we will create a file called encode-data.ts.

import { EncodedData, EncodingTable } from '../types'

export const encodeData = (encodingTable: EncodingTable) => (input: string): EncodedData =>
  input
    .split('')
    .flatMap(char => encodingTable[char])
Enter fullscreen mode Exit fullscreen mode

Also in types.ts add the following.

// After previous types

export type EncodedData = number[]
Enter fullscreen mode Exit fullscreen mode

The function encodeData takes in an encoding table and returns a function that takes in a string, our input. We then split this string into separate characters and for every character we select our matching code, which is an array of 1’s and 0’s. We use flatMap because we are returning a single array of 1’s and 0's.

Now the last thing we need to do is take our encoded data and our encoding table and put them together. We are going to store our entire output in an ArrayBuffer.

We are going to create two modules to work more easily with our binary data. One is a reader and the other is a writer that extends from the reader. First we are gonna create the binary reader. In modules create a file called binary-reader.ts.

import { getBitCount } from "../lib"

export class BinaryReader {
  protected chunk: number
  protected pos: number
  protected bitPos: number
  protected bitCount: number

  view: DataView

  constructor(arrayBuffer: ArrayBuffer) {
    this.view = new DataView(arrayBuffer)
    this.pos = 0
    this.bitPos = -1
    this.chunk = 0
    this.bitCount = 0
  }

  protected getData(type = 'Uint8') {
    if (this.view.byteLength > this.pos) {
      this.bitPos = -1

      // @ts-ignore
      return this.view[`get${type}`](this.pos++)
    }

    throw new RangeError()
  }

  get buffer () {
    return this.view.buffer
  }

  getBytePosition() {
    return this.pos
  }

  seek(pos: number) {
    const oldPos = this.pos

    this.pos = pos

    return oldPos
  }

  peak(index = this.pos + 1) {
    if (this.view.byteLength > index && index > -1) {
      return this.view.getUint8(index)
    }

    return null
  }

  peakBit() {
    const chunk = this.chunk
    const pos = this.pos
    const bitPos = this.bitPos
    const bitCount = this.bitCount
    const bit = this.getBit()

    this.bitPos = bitPos
    this.chunk = chunk
    this.pos = pos
    this.bitCount = bitCount

    return bit
  }

  getPadSize() {
    if (this.chunk === null) {
      return 0
    } else {
      const bitCount = getBitCount(this.chunk)

      return 8 - bitCount
    }
  }

  getBitPos() {
    return getBitCount(this.chunk) - 1 + this.getPadSize()
  }

  getBit() {
    if (this.bitPos === -1) {
      this.chunk = this.getData()

      this.bitPos = this.getBitPos()
      this.bitCount = getBitCount(this.chunk)
    }

    if (this.chunk === null) return null

    const bitCount = this.bitCount
    const bit = this.bitPos >= bitCount ? 0 : (this.chunk >> this.bitPos) & 1

    this.bitPos--

    return bit
  }

  getUint32() {
    return (
      (this.getData() << 24) |
      (this.getData() << 16) |
      (this.getData() << 8) |
      this.getData()
    )
  }

  getUint16() {
    return (this.getData() << 8) | this.getData()
  }

  getUint8() {
    return this.getData()
  }
}
Enter fullscreen mode Exit fullscreen mode

The constructor of BinaryReader takes an ArrayBuffer as input. What is an ArrayBuffer? It is a fixed length array that contains raw bytes. You can’t really do anything with it. It is simply a container. To read binary data, you need a specific window into your data. To see why you need this, consider the following. Let’s say you have the following sequence of bytes:

011000010110001001100011

How do you know what this means? This all depends on the context, without it you don’t know what to read. Let’s say the data above is part of a text which uses ASCII encoding. Then we know how to interpret it. We can read it as three blocks of unsigned 8 bit integers:

01100001|01100010|01100011

The three blocks would translate to 97 98 99 which represent the characters a b c.

But if a context dictates that we should read them as a unsigned 8 bit integer and a unsigned 16 bit integer we read the sequence as two blocks:

01100001|0110001001100011

The two blocks then translate to 97 and 25187. These two numbers then derive meaning further from the software in which it is used. This context is also what we will be providing for how to read and interpret our data.

In Javascript you use various arrays to get a window into the array buffer, e.g. Uint8Array for unsigned 8 bit integers, Uint16Array for unsigned 16 bit integers etc. Instead of using different types of array’s, our binary reader is going to use the DataView. This is a convenience class for reading and writing different types of numbers. Next to that, the DataView also handles endianness, i.e. the byte order.

Our class exposes methods to read bytes as uint8, uint16 and uint32. Also we have added the ability to read individual bits. Each of these methods internally call getData with the appropriate number type. This function in turns calls the appropriate function on our dataView with the right offset and then updates the internal position.

The reason we keep an internal position is that we can continuously read data without having to manually specify an offset as to where our view needs to look in our ArrayBuffer.

Note that instead of using dataView.getUint32 and dataView.getUint16, we are shift our result from getData by n bits and add the next result if any. The reason we do this is that we can used reuse getData and let our position be updated by that function for the number of bytes we have read.

Next to these reading methods we can also peak bits and peak bytes or go to a specific offset.

Now for our writer. In modules create a file called binary-writer.ts

import { BinaryReader } from './binary-reader'

export class BinaryWriter extends BinaryReader {
  constructor(length: number) {
    super(new ArrayBuffer(length))
  }

  private setData(data: number, type = 'Uint8') {
    let advance = 0

    switch(type) {
      case 'Uint16':
        advance = 2
        break;
      case 'Uint32':
        advance = 4
        break;
      default:
        advance = 1
    }

    if (this.view.byteLength > this.pos) {
      this.bitPos = -1

      // @ts-ignore
      this.view[`set${type}`](this.pos, data)

      this.pos += advance

      return this
    }

    return this
  }

  setUint32(data: number) {
    return this.setData(data, 'Uint32')
  }

  setUint16(data: number) {
    return this.setData(data, 'Uint16')
  }

  setUint8(data: number) {
    return this.setData(data)
  }
}
Enter fullscreen mode Exit fullscreen mode

The constructor of this class takes a length and calls super with a new ArrayBuffer with that length into which we will be writing and or reading our data. This class extends the reader. So next to all the read operations we are also able to write uint8, uint16 and uint32 into the underlying ArrayBuffer. The setData method also advances our current position based on the number type we are writing.

No it’s time to wright the actual function that creates our output. So in our lib folder, create a file called to-array-buffer.ts.

import { BinaryWriter } from '../modules'
import { Encoding, EncodingTable, EncodedData } from '../types'

const getCharacterCode = (char: string) => char.charCodeAt(0)

const mapCodeToInteger = (code: Encoding) =>
  code.reduce((acc, bit) => (acc << 1) | bit, 0)

const getByteCount = (bitCount: number) => {
  const remainder = bitCount % 8

  /* Round to whole bytes*/

  return (bitCount + (remainder ? 8 - remainder : 0)) / 8
}

const mapEncodingTable = (encodingTable: EncodingTable) => {
  let lengthEncodingTable = 0

  const entries = Object.entries(encodingTable).map(([ char, code]) => {
    const { length } = code

    /*
      We add 2 bytes for the length and character and 1 or 2 for the code depending
      on wether it's code length is bigger then 8 bits
    */

    lengthEncodingTable += 2 + (length > 8 ? 2 : 1)

    /* For each code, return a triplet [ number, number, number ] */

    return [length, getCharacterCode(char), mapCodeToInteger(code)]
  })

  return [ entries, lengthEncodingTable ] as const
}

export const toArrayBuffer = (data: EncodedData, encodingTable: EncodingTable) => {
  const BYTE_COUNT_LENGTH_ENCODING_TABLE = 2
  const BYTE_COUNT_LENGTH_ENCODED_DATA = 4
  const PADDING_ENCODED_DATA = 1

  const [ entries, lengthEncodingTable ]= mapEncodingTable(encodingTable)

  const { length: bitCount } = data
  const byteCount = getByteCount(bitCount)
  const padCountEncodedData = bitCount % 8

  const binaryWriter = new BinaryWriter(
    byteCount +
    BYTE_COUNT_LENGTH_ENCODING_TABLE +
    BYTE_COUNT_LENGTH_ENCODED_DATA +
    PADDING_ENCODED_DATA +
    lengthEncodingTable)

  binaryWriter.setUint16(lengthEncodingTable)

  entries.forEach(([length, charCode, code]) => {
    binaryWriter.setUint8(length)
    binaryWriter.setUint8(charCode)

    if (length > 8) binaryWriter.setUint16(code)
    else binaryWriter.setUint8(code)
  })

  let chunk = 0
  let index = 0

  binaryWriter.setUint32(byteCount)
  binaryWriter.setUint8(padCountEncodedData)

  while (index < bitCount) {
    if (index % 8 === 0 && index !== 0) {
      binaryWriter.setUint8(chunk)

      chunk = 0
    }

    /* Shift chunk one bit to the left and add current bit */

    chunk = (chunk << 1) | data[index]

    index++
  }

  if (chunk) {
    /* We add padding to fill out the last chunk to a whole byte */

    for (let pad = 0; pad < padCountEncodedData; pad++) chunk = chunk << 1

    binaryWriter.setUint8(chunk)
  }

  return binaryWriter.buffer
}
Enter fullscreen mode Exit fullscreen mode

The function toArrayBuffer takes as input our encoded data and the encoding table and outputs, as the function name suggests an ArrayBuffer.

As mentioned earlier, we need to store a couple of things in order to interpet our binary data later when we want to decompress it. So what we want to store in our ArrayBuffer is:

  • The encoding table
  • The length of the encoding table
  • The byte size of the source data
  • Number of pad bits if the length of our encoded data is not on a 8 bit boundary.
  • Our encoded data

So in order to be able to decode our encoded data, we need to store some meta data. This extra overhead means that, as we talked about in the beginning, that the size of the data expands. The example abracadabra that we have used in the beginning, actually expands from 11 bytes, the number of characters in the string, to 25 bytes.


Time to decode

Now that we have encoded our data, it is time to decode it again.

In our src folder, create a file called decode.ts. And add the following.

import { BinaryReader } from './modules'
import { HuffmanTree, IntermediaryNode, Node } from './types'

const getDecodedData = (binaryReader: BinaryReader) => (huffmanTree: HuffmanTree, bitCountEncodedData: number) => {
  const [root] = huffmanTree

  let bitsRead = 0

  const traverse = (node: Node, binaryReader: BinaryReader): string => {
    if (bitsRead > bitCountEncodedData) return ''

    const nodes = node[2]

    if (!nodes) return node[0]

    const bit = binaryReader.getBit()

    bitsRead++

    if (bit !== null) {
      if (nodes) return traverse(nodes[bit], binaryReader)
      else return node[0]
    } else {
      return ''
    }
  }

  let decodedData = '';

  while (bitsRead < bitCountEncodedData && binaryReader.peakBit() !== null) {
    decodedData += traverse(root, binaryReader)
  }

  return decodedData
}

const decodeHuffmanTree = (binaryReader: BinaryReader): HuffmanTree => {
  const root: IntermediaryNode = ['', 0, []]

  const huffmanTreeLength = binaryReader.getUint16()

  let indexHeader = 0

  while (indexHeader < huffmanTreeLength) {
    let lengthCode = binaryReader.getUint8()
    let indexCode = lengthCode - 1

    const char = String.fromCharCode(binaryReader.getUint8())
    const code = lengthCode > 8 ? binaryReader.getUint16() : binaryReader.getUint8()

    let entry: Node = root

    while (indexCode > -1) {
      const bit = (code >> indexCode) & 1

      entry[0] += char
      entry[1] += 1

      const nodes = entry[2] as Node[]
      const node = nodes[bit]

      if (node) {
        if (indexCode === 0) {
          nodes[bit] = [char, 1]
        } else {
          entry = node
        }
      } else if (indexCode === 0) {
        nodes[bit] = [char, 1]
      } else {
        const newEntry: IntermediaryNode = ['', 0, []]

        nodes[bit] = newEntry

        entry = newEntry
      }

      indexCode--
    }

    indexHeader += 2 + (lengthCode > 8 ? 2 : 1)
  }

  return [root]
}

const getBitCountEncodedData = (byteCountEncodedData: number, paddingSizeEncodedData: number) => byteCountEncodedData * 8 - paddingSizeEncodedData

export const decode = (buffer: ArrayBuffer) => {
  const binaryReader = new BinaryReader(buffer)

  const huffmanTree = decodeHuffmanTree(binaryReader)

  const byteCountEncodedData = binaryReader.getUint32()
  const paddingSizeEncodedData = binaryReader.getUint8()

  return getDecodedData(binaryReader)(huffmanTree, getBitCountEncodedData(byteCountEncodedData, paddingSizeEncodedData))
}
Enter fullscreen mode Exit fullscreen mode

The decode function takes in an array buffer. We first instantiate a new BinaryReader with our ArrayBuffer. Then we can start decoding our data.

The first thing we are going to do is decode our Huffman tree. Inside decodeHuffmanTree, the first thing we do is extract our Huffman tree length as a uint16. When we have our length, we can read out the data from our buffer. Remember that we stored every entry in our table as a triplet [ length of code: uint8, character: uint8, code: uint8 | uint16 (depending on the size of the code)].

After we created our Huffman tree, we read out our byte count for the encoded data as a uint32 and our padding size as a uint8. Armed with that information, we can start decoding our encoded data.

Inside the function getEncodedData, we keep traversing the Huffman tree for ever bit that we read until we reach a leaf node. If we do, we return it. We continue to do this for as long as we have bits to read.

When all bits have been processed, we should have decoded our encoded data back into the original.

Moment of truth
Now it is time to see wether or not this actually works.

In our src folder create a file called test.ts. We import three samples from the samples directory. You can find these samples in the github repo: https://github.com/vincentcorbee/Huffman/tree/main/src/samples.

import assert from 'assert'

import { encode, decode } from './'
import { sampleOne, sampleTwo, sampleThree } from './samples'

const encodedDataSampleOne = encode(sampleOne)
const encodedDataSampleTwo = encode(sampleTwo)
const encodedDataSampleThree = encode(sampleThree)

const decodedSampleOne = decode(encodedDataSampleOne)
const decodedSampleTwo = decode(encodedDataSampleTwo)
const decodedSampleThree = decode(encodedDataSampleThree)

assert(decodedSampleOne === sampleOne)
assert(decodedSampleTwo === sampleTwo)
assert(decodedSampleThree === sampleThree)

console.log({
  encoded: encodedDataSampleOne.byteLength,
  decoded: decodedSampleOne.length,
  orignal: sampleOne.length
})

console.log({
  encoded: encodedDataSampleTwo.byteLength,
  decoded: decodedSampleTwo.length,
  orignal: sampleTwo.length
})

console.log({
  encoded: encodedDataSampleThree.byteLength,
  decoded: decodedSampleThree.length,
  orignal: sampleThree.length
})
Enter fullscreen mode Exit fullscreen mode

Now if from our project root we run:

npx ts-node src/test.ts

{ encoded: 13268, decoded: 24543, orignal: 24543 }
{ encoded: 787, decoded: 1087, orignal: 1087 }
{ encoded: 25, decoded: 11, orignal: 11 }
Enter fullscreen mode Exit fullscreen mode

Our assertions should pass and we should get the output above. 🎉 We verified that our original and decoded samples our still the same. Our output gives us the encoded, decoded and original sizes. 🥹

From our samples, we can see that the largest sample yields the biggest compression. The largest sample gives us ~46% compression, the second one ~26% compression and the last one, abracadabra, actually expands with ~127%.


Conclusion

I have found it an intriguing exercise in understanding the algorithm and implement it in Javascript. I hope that if you have gotten this far you might have found it useful and not have the feeling you have just wasted an our of your life.

Top comments (0)