Manipulating Files in the Browser

Kitanga Nday ・13 min read

I've been wanting to write a tutorial for a while now, so I figured what better way to get into posting than to write about a topic that almost had me growing grey hair.

Now, this isn't going to be just a tutorial though, but also a personal experiment to see if I can achieve this simple task and hopefully teach you some of the issues I ran into whilst attempting this.

Anyways, to get you into learning how to read, write, and parse a custom file format, we'll be making a pack file: a file that holds different files in it (think ZIP file without the compression step). This is extremely useful on the web, since you'll want to make as few requests to the server as possible. Though, if I remember correctly, HTTP/2 should have resolved this (Spoiler Alert: yep, it mostly did. HTTP/2 multiplexes many requests over a single connection, so lots of small requests are much cheaper now).

Our pack file contains image (PNG) and audio (OGG) files and concatenates them into one file. You can also alternatively use any other file types you like (e.g. replace OGG with MP3).

On to the Packer

Now for this demo I'll be using Codepen. But, you can use whatever environment you feel comfortable with, so long as you can follow along. If you get lost, you can check out my pen here to keep up.

The HTML/CSS

For brevity's sake, we'll add a file input element that accepts multiple files, but is limited to only PNG and OGG files, an add button for adding the selected files into a temporary "store", and a download button that we use to start the whole pack process.

<div id="wrapper">
    <input type="file" accept=".png, .ogg" multiple>
    <div><button id="download">Download</button></div>
    <button type="button" id="add">Add</button>
</div>

Apart from the basic setup style, our CSS will mainly center the input element horizontally using flexbox.

* {
   box-sizing: border-box;
}

html, body, #wrapper {
   width: 100%;
   height: 100%;
   margin: 0;
   padding: 0;
   overflow: hidden;
}

#wrapper {
   padding-top: 1%;
   padding-left: 10%;
   padding-right: 10%;
   display: flex;
   justify-content: center;
   flex-wrap: wrap;
}

#wrapper>* {
   flex-grow: 1;
}

Now for some JavaScript

First, we'll create an onclick event listener for the add button that adds the file names and their sizes to the page. It also adds them to our "store" variable which we'll call files for now.

const VERSION = 1;

// Get ref to the input and the files display elements.
const input = document.querySelector('input');
const filesContainer = document.querySelector('.files');
const addBtn = document.getElementById('add');
const download = document.getElementById("download");
// We'll use this to download the file
const anchor = document.createElement('a');


// We'll store files here
let files = [];

addBtn.onclick = () => {
   // Disable the file uploader and add button
   input.disabled = true;
   addBtn.innerText = "...";
   addBtn.disabled = true;

   // Convert files list to an array
   const selectedFiles = Array.prototype.slice.call(input.files);

   if (selectedFiles.length) {
      // Clear the files list on screen
      filesContainer.innerHTML = "";

      // We'll run this each time a file is done processing
      let count = selectedFiles.length;
      const callback = () => {
         // If our count is zero then we are done
         if (!--count) {
            // Enable the file uploader and add button
            input.disabled = false;
            addBtn.innerText = "Add";
            addBtn.disabled = false;
         }
      };

      // Add into files array
      selectedFiles.forEach(_file => {
         // We need to get the byte array of the file first
         _file.arrayBuffer().then(buffer => {
            let file = {
               name: _file.name,
               size: _file.size,
               type: _file.type,
               buffer
            };
            files.push(file);

            // Add files to the page
            filesContainer.innerHTML += `<div>${file.name} ${Math.round(
               file.size / 1024
            )}kb</div>`;
            callback();
         });
      });
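   } else {
      // Nothing was selected, so re-enable the controls straight away
      input.disabled = false;
      addBtn.innerText = "Add";
      addBtn.disabled = false;
   }
}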

The code comments are pretty self-explanatory really. I only ask that you pay close attention to the VERSION constant and anchor element, since we'll use them later when creating the HEADER, downloading the pack, and parsing the file.

I do think having the byte array (a.k.a. ArrayBuffer) of each file in memory might be bad, but hey, who's complaining, right?

Quick note: I was having a few issues using Babel in Codepen, so I switched that off and started coding more defensively (i.e. using more ES5 techniques). I know most of you probably didn't turn on Babel transpilation, but for those who did it's just a warning.

Now you should have seen a filesContainer constant in the code. This is a reference to the HTML we are about to add:

<div id="wrapper">
    <input type="file" accept=".png, .ogg" multiple>
    <div><button id="download">Download</button></div>
    <button type="button" id="add">Add</button>
    <div class="files"></div>
</div>

And its CSS:

/* ...previous styles... */

.files {
   width: 100%;
   height: 100%;
   padding-top: 1%;
   text-align: center;
}

This is simply to show the files on screen.

Let's Talk File Format Design

Before we move on to creating the file, we need to design our file's structure. This requires that you understand two things: what primitive data types are and what structured data types are.

Primitive data types give us a sort of standard to use in order to specify how many bits a property (more on this in a moment) takes up. Since we know that computers work in bytes, 1 byte (i.e. 8 bits) will be our smallest primitive. Now these integers come in two flavours: they can either be signed or unsigned. Signed means that our byte can represent a positive or negative number (-128 to 127). Unsigned means it can only hold zero and positive numbers (0 to 255). In our case though, we'll only be using unsigned bytes since we have no use for negative numbers. Here's a list of some data types to give you a better idea of what I'm talking about:

List of primitive types

Please do try to remember the abbreviated versions (i.e. U8 for unsigned 8 bit integer), since I'll be using those in this tutorial a lot.
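
To make the signed/unsigned difference concrete, here is a small sketch (the variable names are just for illustration) that looks at the very same byte through an unsigned and a signed typed array:

```javascript
// One byte with all 8 bits set, viewed two ways over the same buffer.
const oneByte = new ArrayBuffer(1);
const unsignedView = new Uint8Array(oneByte); // unsigned 8-bit: 0 to 255
const signedView = new Int8Array(oneByte);    // signed 8-bit: -128 to 127

unsignedView[0] = 0xff;

console.log(unsignedView[0]); // 255
console.log(signedView[0]);   // -1
```

The bits never change, only how we choose to interpret them.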

Structured data types, on the other hand, represent chunks of our binary file and consist of primitive data types represented by properties/attributes in the design documentation.

// e.g
Structure {
    property   U16
    property2  U8
}

So the structure above will take up 24 bits (16 + 8) or 3 bytes in total and the first 16 bits (2 bytes) will represent the property property, whilst the remaining 8 bits (1 byte) will represent the property property2.
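
As a rough sketch of how that example Structure could land in memory, the snippet below writes it with a DataView. The values 513 and 7 are made up, and the U16 is written in DataView's default big-endian order, which is an assumption of this sketch rather than part of the design:

```javascript
// Structure { property U16, property2 U8 } laid out in 3 bytes.
const structure = new DataView(new ArrayBuffer(3));
structure.setUint16(0, 513); // property: bytes 0-1 (513 is 0x0201)
structure.setUint8(2, 7);    // property2: byte 2

const structureBytes = Array.from(new Uint8Array(structure.buffer));
console.log(structure.byteLength); // 3
console.log(structureBytes);       // [2, 1, 7]
```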

OK! On to designing this thing.

Pack File Format Design

The PACK file will have a HEADER and a list of FILEs.

PACK {
    header     HEADER
    files      FILE[HEADER.count]
}

The HEADER will hold information on the version number of the packer and how many files are in the pack.

HEADER {
    version    U8
    count      U8
}

The FILE structure will hold information on the file type and length in bytes.

FILE {
    type       U24
    length     U32
    data       [length]
}

The type property (prop) is a 3 character ASCII string that will inform the parser of what file this is. At the moment, type can be 1 of 2 things: 'IMG' or 'AUD'. The length, on the other hand, just tells us how long the file is in bytes. And the data prop holds the bytes of the file.
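
To see what those first 7 bytes of a FILE might look like, here is a minimal sketch. makeFileHeader is a hypothetical helper (it isn't in the pens), and it fixes the length to little-endian byte order, which is an assumption made only for this illustration:

```javascript
// Hypothetical helper: builds the 7-byte FILE header (type U24 + length U32).
function makeFileHeader(type, length) {
   const fileHeader = new Uint8Array(7);

   // type: three ASCII characters, one byte each
   for (let i = 0; i < 3; i++) {
      fileHeader[i] = type.charCodeAt(i);
   }

   // length: 4 bytes starting at offset 3, little-endian (assumed here)
   new DataView(fileHeader.buffer).setUint32(3, length, true);

   return fileHeader;
}

console.log(Array.from(makeFileHeader('IMG', 1024))); // [73, 77, 71, 0, 4, 0, 0]
```

The data bytes of the file would follow straight after these 7 bytes.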

Pack file implementation

Now that we have our design, let's move on to implementing our header and adding the packed files.

But, before we do anything else we have to talk about TypedArrays. And by "talk about" I mean you'll have to go read up on it, because me explaining it here would definitely bloat this article up. Sorry 😉.

Done? OK, let's continue then.

We'll first have to create our file's header, which as you obviously still remember, has a version and a count number both of which are represented by 8-bit unsigned integers.

HEADER {
    version    U8
    count      U8
}

Now for the code:

function pack(VERSION, selectedFiles) {
   // Making sure that we have files selected
   if (!selectedFiles.length) {
      return false;
   }

   // Create the header with a version number and file count
   let header = new Uint8Array([VERSION, selectedFiles.length]);

   // ...
}

Here, we create a function called pack that takes two parameters: the version of our packer and the list of file objects that we previously stored. The first thing we do in the function is check if any files have been selected, if not then we end the process. Afterwards, we create the HEADER structure using a Uint8Array and set our two prop values: version and count.

Before creating our final file, we'll have to find what our final file size will be.

function pack(VERSION, selectedFiles) {
   // Making sure that we have files selected
   if (!selectedFiles.length) {
      return false;
   }

   // Create the header with a version number and file count
   let header = new Uint8Array([VERSION, selectedFiles.length]);

   // Set our final file's size
   let fileSize = header.byteLength;
   // Start the reduce at 0 so that every file, including the first, gets its 7 header bytes
   fileSize += selectedFiles
      .map(file => file.size)
      .reduce((acc, currentVal) => acc + (currentVal + 7), 0);

   // ...
}

After setting the initial file size to the header's byte count, we then reduce the file list we have so that all the sizes are added together. The reason why we do currentVal + 7 is because each FILE has a header of 56 bits or 7 bytes.
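
Here is the size arithmetic on its own as a small worked example. packedSize and the two constants are hypothetical names used only for this sketch:

```javascript
const HEADER_SIZE = 2;      // version U8 + count U8
const FILE_HEADER_SIZE = 7; // type U24 (3 bytes) + length U32 (4 bytes)

// Hypothetical helper: total pack size for a list of file sizes in bytes.
function packedSize(fileSizes) {
   return fileSizes.reduce(
      (total, size) => total + FILE_HEADER_SIZE + size,
      HEADER_SIZE // initial value, so the first file is counted like the rest
   );
}

console.log(packedSize([100, 250])); // 2 + (7 + 100) + (7 + 250) = 366
```

Passing an initial value to reduce matters here: without one, the first size is used as the starting accumulator and never gets its 7 header bytes added.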

After that, we create our file, add the header to it, and create an offset variable that we'll use when adding the files in the next step:

function pack(VERSION, selectedFiles) {
   // ...

   // Our final file
   let finalFile = new Uint8Array(fileSize);

   // First add the header
   finalFile.set(header);

   // Keep count of the last file's offset
   let offset = header.byteLength;

   // ...
}

Following this will be our file concatenating code. Just in case you forgot what the FILE looks like:

FILE {
    type       U24
    length     U32
    data       [length]
}

Now to implement this:

function pack(VERSION, selectedFiles) {
   // ...

   // Now, take files and concatenate them
   selectedFiles.forEach(file => {
      // The FILE structure
      let outputFile = new Uint8Array(7 + file.size);

      // File's type
      if (file.type.includes("image")) {
         // Fill the ASCII characters IMG
         outputFile.set([73, 77, 71]);
      } else {
         // Fill the ASCII characters AUD
         outputFile.set([65, 85, 68]);
      }

      // Create a 32 bit integer...
      const fileLength32 = new Uint32Array([file.size]);
      // ...and finally add it as the length
      outputFile.set(new Uint8Array(fileLength32.buffer), 3);

      // And now add the file itself
      outputFile.set(new Uint8Array(file.buffer), 7);

      // Concatenate the file to the final file
      finalFile.set(outputFile, offset);

      // Update the offset value so the next FILE starts right after this one
      offset += outputFile.byteLength;
   });

   // ...
}

Just a quick note: I ran into an interesting gotcha trying to do this with outputFile.set(new Uint32Array([file.size]), 3). set() copies element by element, so the single 32-bit value gets squeezed into one 8-bit slot and only its lowest byte survives. To make sure the length is correct, you'll have to create a U32 integer and then use its buffer to create a Uint8Array.
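
The gotcha is easy to reproduce in isolation. The sketch below uses a made-up length of 300. For the second half it swaps in DataView's setUint32 instead of the Uint32Array buffer trick, purely so the byte order is explicit and the result is the same on every machine:

```javascript
const target = new Uint8Array(7);

// set() copies element by element, so the single U32 value 300 is
// squeezed into one U8 slot and wraps around (300 % 256 = 44).
target.set(new Uint32Array([300]), 3);
const truncated = Array.from(target.slice(3));
console.log(truncated); // [44, 0, 0, 0]

// Writing through a DataView stores all four bytes (little-endian here).
target.fill(0);
new DataView(target.buffer).setUint32(3, 300, true);
const full = Array.from(target.slice(3));
console.log(full); // [44, 1, 0, 0]
```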

At the end of our logic for this function, we return our pack file as a blob:

function pack(VERSION, selectedFiles) {
   // ...

   return new Blob([finalFile.buffer]);
}

The final code for our pack function should look like this.

function pack(VERSION, selectedFiles) {
   // Making sure that we have files selected
   if (!selectedFiles.length) {
      return false;
   }

   // Create the header with a version number and file count
   let header = new Uint8Array([VERSION, selectedFiles.length]);

   // Set our final file's size
   let fileSize = header.byteLength;
   // Start the reduce at 0 so that every file, including the first, gets its 7 header bytes
   fileSize += selectedFiles
      .map(file => file.size)
      .reduce((acc, currentVal) => acc + (currentVal + 7), 0);

   // Our final file
   let finalFile = new Uint8Array(fileSize);

   // First add the header
   finalFile.set(header);

   // Keep count of the last file's offset (the first FILE starts right after the header)
   let offset = header.byteLength;

   // Now, take files and concatenate them
   selectedFiles.forEach(file => {
      // The FILE structure
      let outputFile = new Uint8Array(7 + file.size);

      // File's type
      if (file.type.includes("image")) {
         // Fill the ASCII characters IMG
         outputFile.set([73, 77, 71]);
      } else {
         // Fill the ASCII characters AUD
         outputFile.set([65, 85, 68]);
      }

      // Create a 32 bit integer...
      const fileLength32 = new Uint32Array([file.size]);
      // ...and add its 4 bytes as the length
      outputFile.set(new Uint8Array(fileLength32.buffer), 3);

      // And now add the file itself
      outputFile.set(new Uint8Array(file.buffer), 7);

      // Concatenate the file to the final file
      finalFile.set(outputFile, offset);

      // Update the offset value so the next FILE starts right after this one
      offset += outputFile.byteLength;
   });
   });

   return new Blob([finalFile.buffer]);
}

At the moment, it isn't all that useful. We'll have to start the process when the user presses the download button. Then, after creating our file, verify that we have a file and add the right attributes to our anchor element that we created earlier so that we can download a "resource.packed" file:

download.onclick = () => {
   // Disable all the action elements onscreen
   const actionEles = document.querySelectorAll('#wrapper > *');
   actionEles.forEach(ele => (ele.disabled = true));

   // Pack the files
   let packedFile = pack(VERSION, files);

   // If there isn't a file exported then we don't do anything
   if (!packedFile) {
      // Re-enable the action buttons
      actionEles.forEach(ele => (ele.disabled = false));
      return false;
   }

   // Set the download and href values
   anchor.download = "resource.packed";
   anchor.href = window.URL.createObjectURL(packedFile);
   // Prompt user to save file
   anchor.click();

   // Re-enable the action buttons
   actionEles.forEach(ele => (ele.disabled = false));
};

And that should do it. If you want to verify that the file has been created correctly you can use a Hex Editor like HxD to inspect your '.packed' file.

Learning to use a hex editor has proven to be extremely useful since I found that it is really easy to see where the file wasn't built correctly (I'm talking about the header for the pack file by the way, trying to see where the next file starts is really difficult). But yea, I think you should look into using one too.
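
If you'd rather peek at the first few bytes without leaving the console, a tiny helper can print them as hex pairs, similar to a row in a hex editor. This is only a sketch, and toHex is a made-up name that isn't part of the pens:

```javascript
// Hypothetical helper: formats bytes as two-digit hex pairs.
function toHex(bytes) {
   return Array.from(bytes, byte => byte.toString(16).padStart(2, '0')).join(' ');
}

// version 1, count 1, followed by the ASCII characters IMG
console.log(toHex(new Uint8Array([1, 1, 73, 77, 71]))); // 01 01 49 4d 47
```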

(By the way, you could have also just created the different structures as TypedArray components and then pushed them into an array. You would then create a blob file using this array. I haven't tried this, but it should be simpler than what I've implemented here)
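
For the curious, that alternative could look roughly like the sketch below. It is untested against the pens. packAsParts is a hypothetical name, it expects the same { type, buffer } file objects we stored earlier, and it writes the length as little-endian through a DataView, which is an assumption of this sketch:

```javascript
// Hypothetical alternative: collect every structure as a Blob part
// and let the Blob constructor do the concatenation.
function packAsParts(version, files) {
   const parts = [new Uint8Array([version, files.length])]; // HEADER

   files.forEach(file => {
      const fileHeader = new Uint8Array(7);
      // 'IMG' or 'AUD' as ASCII codes
      fileHeader.set(file.type.includes('image') ? [73, 77, 71] : [65, 85, 68]);
      // length as 4 bytes, little-endian (assumed)
      new DataView(fileHeader.buffer).setUint32(3, file.buffer.byteLength, true);
      parts.push(fileHeader, file.buffer);
   });

   return new Blob(parts);
}

const demoBlob = packAsParts(1, [
   { type: 'image/png', buffer: new Uint8Array([1, 2, 3]).buffer },
   { type: 'audio/ogg', buffer: new Uint8Array([4, 5]).buffer }
]);
console.log(demoBlob.size); // 2 + (7 + 3) + (7 + 2) = 21
```

With this approach there is no final size to precompute and no offset to track, since the parts are joined in the order they were pushed.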

Now... the parser

Our parser will extract the header and the files. We'll also write some custom code that takes these extracted files and posts them onto the page. I'll put all our structure designs here so as to make remembering the structure a little bit easier:

PACK {
    header     HEADER
    files      FILE[HEADER.count]
}

HEADER {
    version    U8
    count      U8
}

FILE {
    type       U24
    length     U32
    data       [length]
}

Now, what we'll do is create a second pen where we can upload the pack file and immediately see the different files once they've been parsed.

If you get lost, just check out my new pen over here for help.

For brevity's sake, I'll post the HTML/CSS and move straight to our JavaScript.

<div id="wrapper">
    <input type="file" accept=".packed">
    <div class="files"></div>
</div>

.file {
   width: 100vw;
}

img {
   width: 43vw;
}

OK, now we create our DOM references to these elements and then add an onchange event to the input element so that we can start our parse process.

const VERSION = 1;

// Get ref to the input and the files display elements.
const input = document.querySelector("input");
const filesContainer = document.querySelector(".files");

input.onchange = () => {
   // First we get our pack file if it was selected
   let file = input.files[0];

   // Make sure that we have our file
   if (file) {
      // We get the buffer
      file.arrayBuffer().then(buffer => {
         // Parse our file
         let unpackedFiles = parse(VERSION, buffer);
         // Display files on screen to make sure everything is OK
         displayFiles(unpackedFiles);
      });
   }
}

We'll follow this up with our parse function which will return an array of blob files. But first, some setup:

function parse(VERSION, buffer) {
   // Check if big/little endian
   const isLittleEndian = new Uint8Array(new Uint32Array([0x12345678]).buffer)[0] === 0x78;

   // Create a view of file's byte data
   let file = new DataView(buffer);

   // ...
}

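Endianness. Yes, that's a word. Apparently different computers store multi-byte numbers differently (who would have guessed). Little-endian machines put the least significant byte first, while big-endian machines put the most significant byte first. The first line in our parse function checks whether the computer is little-endian. A DataView lets us state the byte order explicitly on every multi-byte read, so the same code works on both big-endian and little-endian architectures. This is important because you don't want to read your length field from the wrong end. Keep in mind that our packer wrote the length in the byte order of the machine it ran on, so this check only lines up when the pack was created on a machine with the same endianness as the one parsing it.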
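
Here is a small sketch of what byte order means in practice. It stores one 32-bit number and reads it back both ways (the value 0x12345678 is the same one used in the check above, the variable names are made up):

```javascript
const view = new DataView(new ArrayBuffer(4));

// Store 0x12345678 in little-endian order: least significant byte first.
view.setUint32(0, 0x12345678, true);

const storedBytes = Array.from(new Uint8Array(view.buffer), b => b.toString(16));
console.log(storedBytes); // ['78', '56', '34', '12']

// Reading with the matching flag gives the number back...
const asLittle = view.getUint32(0, true).toString(16);
console.log(asLittle); // '12345678'

// ...reading from the wrong end gives something else entirely.
const asBig = view.getUint32(0, false).toString(16);
console.log(asBig); // '78563412'
```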

Now that that's all done, let's move on to reading our header and setting up file extraction:

function parse(VERSION, buffer) {
   // ...

   // If the version number of the parser is the same as the version number of packer then continue
   if (file.getUint8(0, isLittleEndian) === VERSION) {
      // How many files we have to get
      let count = file.getUint8(1 , isLittleEndian);
      // Our offset from the start of the buffer
      let offset = 2;
      // An array of Blob objects
      let unpackedFiles = [];

      // ...
   }
}

We can start extracting our files:

function parse(VERSION, buffer) {
   // ...
   // If the version number of the parser is the same as the version number of packer then continue
   if (file.getUint8(0, isLittleEndian) === VERSION) {
      // ...

      // Get the files
      for (let index = 0; index < count; index++) {
         // Get the type first
         let type = "";
         for (let char = 0; char < 3; char++) {
            type += String.fromCharCode(file.getUint8(offset++, isLittleEndian));
         }

         // Get the length of file
         const fileLength = file.getUint32(offset, isLittleEndian);
         offset += 4;

         // Our array of byte values
         let byteArray = [];

         // Get the bytes
         for (let fileIndex = 0; fileIndex < fileLength; fileIndex++) {
            //
            byteArray.push(file.getUint8(offset + fileIndex, isLittleEndian));
         }

         // Move past this file's data so the next FILE header is read from the right place
         offset += fileLength;

         // Create a temporary store for our bytes
         let fileArray = new Uint8Array(byteArray);

         // Create our options object based on file type
         let options = { type: "" };
         if (type === 'IMG') {
            options.type = 'image/png';
         } else {
            options.type = 'audio/ogg';
         }

         // Add the file to our collection of files
         unpackedFiles.push(new Blob([fileArray.buffer], options));
      }

      return unpackedFiles;
   }

   // Version mismatch: there is nothing we know how to read
   return [];
}

OK, that's a lot of code. Let me give you the gist of what's happening here:

  1. We get the type which can be one of two possible types: IMG or AUD.
  2. Get the file's length
  3. Create an array to hold the bytes and create a loop that adds them.
  4. Create a blob file and add it to our array of blob files
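
The four steps above can also be sketched in a more compact form. This is not the code from the pen: readEntries is a hypothetical helper that skips the version check, assumes the length was written little-endian, copies the bytes with ArrayBuffer's slice instead of a loop, and returns plain { type, bytes } objects rather than blobs so the result is easy to inspect:

```javascript
// Hypothetical helper: walks the FILE list of a pack buffer.
function readEntries(buffer) {
   const view = new DataView(buffer);
   const count = view.getUint8(1);
   const entries = [];
   let offset = 2; // skip the 2-byte HEADER

   for (let index = 0; index < count; index++) {
      // 1. The type: three ASCII characters
      const type = String.fromCharCode(
         view.getUint8(offset),
         view.getUint8(offset + 1),
         view.getUint8(offset + 2)
      );
      // 2. The length (little-endian is an assumption of this sketch)
      const length = view.getUint32(offset + 3, true);
      // 3. The bytes, copied straight out of the buffer
      const start = offset + 7;
      const bytes = new Uint8Array(buffer.slice(start, start + length));
      // 4. Collect the entry and jump to the next FILE
      entries.push({ type, bytes });
      offset = start + length;
   }

   return entries;
}

// A tiny hand-built pack: version 1, an 'IMG' with 1 byte, an 'AUD' with 2 bytes.
const samplePack = new Uint8Array([
   1, 2,
   73, 77, 71, 1, 0, 0, 0, 7,
   65, 85, 68, 2, 0, 0, 0, 9, 8
]).buffer;
const sampleEntries = readEntries(samplePack);
console.log(sampleEntries.map(entry => entry.type)); // ['IMG', 'AUD']
```

Turning an entry into a blob is then one more call to new Blob([entry.bytes], options).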

The function then ends by returning the unpacked files, which then get displayed in our displayFiles function:

function displayFiles(unpackedFiles) {
   // Clear the files list
   filesContainer.innerHTML = "";

   unpackedFiles.forEach(file => {
      // Create a div element
      let div = `<div class="file">[FILE]</div>`;

      // Check type
      if (file.type.includes('image')) {
         // It's an image
         div = div.replace('[FILE]', `<img src="${window.URL.createObjectURL(file)}"/>`);
      } else {
         // It's an audio file
         div = div.replace('[FILE]', `<audio src="${window.URL.createObjectURL(file)}" controls></audio>`);
      }
      // Update DOM
      filesContainer.innerHTML += div;
   })
}

This simply takes our blob files, creates a link to them, and then adds them to the DOM. That's pretty much it.

And finally...

YOU. ARE. AWESOME. 😎

Thanks for making it this far. I wrote a lot I know, just hope that this was helpful to someone. I know I learnt a lot.

Exercise

I think it would be nice if I left you with something to do after all of that. There is a lot missing from our packer/parser duo: adding identifiers so that we know what file we are dealing with, grouping files instead of having to read the type each time, supporting different file types by adding the mime type into the FILE's header, also using a STRING structure for the version prop in headers so that we can use semver strings, and making the code better in general.

So your optional exercise would be to add as many of these features/optimizations as is possible to the packer/parser code.

Any Queries?

If you have any questions you can also leave a comment below. I'll definitely be checking in.

I do have a question for you though: could this tutorial be better? Maybe it can be shorter. I do feel like I didn't explain the initial sections well. Would this be better off as a series instead?

I don't know much about this community, so let me know if my writing style looks off.
