DEV Community

ndesmic


Building a PWA Music Player Part 4: ID3 Tags

So far we have a player, but file names are not always the nicest, so now we'd like to display the data stored within the file itself. To do so we can read the metadata attached to the file, which has all the nice info for display, categorizing and sorting. This metadata is in a format called ID3 and it's an absolute mess.

ID3 Versions

ID3 was never supposed to be an official format. It just arose as it was needed and software added support for it. There are 2 primary versions of ID3: v1 and v2. Most files you'll find now have v2. In v1, tags are part of a 128-byte block at the end of the file prefixed with the ASCII string "TAG". I won't be dealing with this at this point as all of the stuff I'm using uses ID3v2. However, if you want to make a more robust system you'll need to consider adding support for it.
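
If you do want to at least detect v1 tags, a minimal sketch could look like the following. The `hasId3v1` helper is hypothetical and not part of the player. It only checks for the "TAG" marker 128 bytes before the end of the file and does not parse the fields.

```javascript
// Hypothetical helper (not in the player): does the file end with an ID3v1 block?
function hasId3v1(arrayBuffer) {
    const TAG_SIZE = 128; // ID3v1 is a fixed 128-byte block at the end of the file
    if (arrayBuffer.byteLength < TAG_SIZE) return false;
    // Only look at the first 3 bytes of that trailing block
    const view = new DataView(arrayBuffer, arrayBuffer.byteLength - TAG_SIZE, 3);
    return view.getUint8(0) === 0x54    // "T"
        && view.getUint8(1) === 0x41    // "A"
        && view.getUint8(2) === 0x47;   // "G"
}
```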

ID3v2 comes in a few versions and they are slightly different to read.

  • ID3v2.2 was the first public release, meaning 2.0 and 2.1 don't exist. It uses 3-byte values in each field header (name and length).

  • ID3v2.3 uses 4-byte values in each field header (name and length). This is the most commonly supported version.

  • ID3v2.4 can have multiple values per field with a NULL separator and can use UTF-8 instead of UTF-16.

For this I'm only going to target ID3v2.3 as it is the most common.

ID3v2 Header

ID3v2 is always in the first part of the file, starting at byte index 0. Numbers are big endian.

| Byte Index | Length | Format | Description |
| --- | --- | --- | --- |
| 0 | 3 bytes | ASCII | Header (always ID3) |
| 3 | 1 byte | Integer | Major Version |
| 4 | 1 byte | Integer | Minor Version |
| 5 | 1 byte | Bitflags | File Flags |

Let's stop for a second. In general all files have short codes to identify them and ID3-labeled files do so as well. If the first 3 bytes are ID3 you know you have an ID3-labeled file of some type. The next 2 bytes are the major and minor version of the tag format. For the ID3v2.3 files I'm targeting the major version should read 3 (2 and 4 would mean v2.2 and v2.4), though perhaps one day newer versions will show up. The minor version can be significant to determine the fields but I found it unreliable. Files come from all over the place and aren't always tagged well. The byte after that, at index 5, is a block of bitflags.

  • The highest bit is the unsynchronize flag which is used in older software for compatibility and you can ignore it.

  • The second highest bit is the extended header flag, which indicates that an extended header follows the regular one. I didn't see this in practice but it's something that could come up in some files.

  • The third highest bit is the experimental bit. We can ignore this too unless you're testing out some ID3 spec changes.

  • The rest of the bits are reserved but unused.
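
As a rough sketch, the first six bytes could be read like this. `readId3Header` and the property names on the returned object are made up for illustration and are not taken from the player's source.

```javascript
// Hypothetical helper: identifies an ID3v2 tag and splits out the version and flag bits.
function readId3Header(dataView) {
    if (dataView.byteLength < 10) return null; // the full ID3v2 header is 10 bytes
    const magic = String.fromCharCode(
        dataView.getUint8(0), dataView.getUint8(1), dataView.getUint8(2));
    if (magic !== "ID3") return null; // not an ID3v2-labeled file

    const flags = dataView.getUint8(5);
    return {
        majorVersion: dataView.getUint8(3),
        minorVersion: dataView.getUint8(4),
        unsynchronized: (flags & 0b10000000) !== 0, // highest bit
        extendedHeader: (flags & 0b01000000) !== 0, // second highest bit
        experimental: (flags & 0b00100000) !== 0    // third highest bit
    };
}
```

Returning `null` for anything that doesn't start with ID3 is just one way to signal "no tag"; throwing would work as well.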

| Byte Index | Length | Format | Description |
| --- | --- | --- | --- |
| 6 | 4 bytes | 28-bit Integer | Length of tag section |

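The length field is completely stupid. It's 4 bytes with each byte having its highest bit set to zero and ignored. So if you squash these 7-bit values together you get a 28-bit number. Seriously I'm curious how this happened (the spec's stated reason is to keep the size from containing false MPEG sync signals). Also note that the length counts what comes after the 10-byte header, not the header itself. In any case here's how to read it: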

function readLength(dataView, index){
    return (dataView.getUint8(index) << 21) | 
    (dataView.getUint8(index + 1) << 14) | 
    (dataView.getUint8(index + 2) << 7) |
    (dataView.getUint8(index + 3));
}
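
To make the bit squashing concrete, here is a small worked example. `readLength` is repeated (with a defensive `& 0x7f` mask added) so the snippet runs on its own, and `writeLength` is a hypothetical inverse that the player doesn't need but which is handy for building test data.

```javascript
// Same idea as readLength above; the mask drops the ignored high bit of each byte
function readLength(dataView, index) {
    return ((dataView.getUint8(index) & 0x7f) << 21) |
        ((dataView.getUint8(index + 1) & 0x7f) << 14) |
        ((dataView.getUint8(index + 2) & 0x7f) << 7) |
        (dataView.getUint8(index + 3) & 0x7f);
}

// Hypothetical inverse: splits a 28-bit number into four 7-bit bytes
function writeLength(dataView, index, value) {
    dataView.setUint8(index, (value >> 21) & 0x7f);
    dataView.setUint8(index + 1, (value >> 14) & 0x7f);
    dataView.setUint8(index + 2, (value >> 7) & 0x7f);
    dataView.setUint8(index + 3, value & 0x7f);
}

const view = new DataView(new ArrayBuffer(4));
[0x00, 0x00, 0x02, 0x01].forEach((byte, i) => view.setUint8(i, byte));
console.log(readLength(view, 0)); // 257, which is (2 << 7) | 1
```

Read as a normal 32-bit integer the same bytes would be 513, so mixing the two encodings up gives a wrong tag length.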

ID3v2 Fields

Immediately following the length is a series of frames. For v2.3 the headers are going to look like:

| Byte Index (from start of field) | Length | Format | Description |
| --- | --- | --- | --- |
| 0 | 4 bytes | String | Field type |
| 4 | 4 bytes | Integer | Length of field |
| 8 | 2 bytes | Bitflags | Field Bit Flags |

The first 2 parts are rather self explanatory. The 3rd is a set of bit flags.
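
A sketch of reading one of these 10-byte v2.3 headers is below. `readFrameHeader` is a hypothetical name. It assumes v2.3, where the field length is a plain big-endian 32-bit integer rather than the 7-bits-per-byte encoding used for the tag length.

```javascript
// Hypothetical helper: reads the 10-byte header of an ID3v2.3 field
function readFrameHeader(dataView, offset) {
    const id = String.fromCharCode(
        dataView.getUint8(offset),
        dataView.getUint8(offset + 1),
        dataView.getUint8(offset + 2),
        dataView.getUint8(offset + 3));
    return {
        id,                                    // e.g. "TIT2"
        size: dataView.getUint32(offset + 4),  // DataView reads big endian by default
        flags: dataView.getUint16(offset + 8)  // 2 bytes of bit flags
    };
}
```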

Following the bit flags is the frame data. For text fields, the first byte in the data will either be 0 or 1: 0 if the frame uses ISO-8859-1 text encoding and 1 if it uses UTF-16 encoding.

ISO-8859-1

Values 0-127 will be the same as ASCII so any ASCII-like encoding will work, though for values 128-255 you should be sure you are using the right encoding. The string may or may not be terminated. Either it'll end with a 0 byte or you'll simply hit the length limit.

UTF-16

This is a bit more complicated. The first thing you'll see following the encoding byte is the byte-order mark which is two bytes 0xff and 0xfe where the order determines the endianness.

  • If you see þÿ (0xFE, 0xFF) that means it's UTF-16 big endian.

  • If you see ÿþ (0xFF, 0xFE) that means it's UTF-16 little endian.

Like with ISO-8859-1 it can be null terminated (0x0000) or it will simply stop at the length limit.

Here's what I came up with to read the tag field:

function readIso8859(dataView, offset, length = Infinity){
    const bytes = [];
    let i = 0;
    // Check the length limit first so we never read past the end of the field
    while (i < length && dataView.getUint8(offset + i) !== 0) {
        bytes.push(dataView.getUint8(offset + i));
        i += 1;
    }
    // Returns [decoded string, number of bytes consumed (terminator not included)]
    return [new TextDecoder("iso-8859-1").decode(Uint8Array.from(bytes)), i];
}

function readUtf16(dataView, offset, length = Infinity){
    const bytes = [];
    let i = 0;
    // Read 2 bytes at a time until a 0x0000 terminator or the length limit
    while (i + 1 < length && dataView.getUint16(offset + i) !== 0) {
        bytes.push(dataView.getUint8(offset + i));
        bytes.push(dataView.getUint8(offset + i + 1));
        i += 2;
    }
    // The first 2 bytes are expected to be the byte-order mark
    let encoding;
    if (bytes[0] === 0xFF && bytes[1] === 0xFE) {
        encoding = "utf-16le";
    } else if (bytes[0] === 0xFE && bytes[1] === 0xFF) {
        encoding = "utf-16be";
    }
    // slice(2) drops the byte-order mark before decoding
    return [new TextDecoder(encoding).decode(Uint8Array.from(bytes.slice(2))), i];
}

function readField(dataView, offset, length){
    // First byte is the encoding: 0 = ISO-8859-1, 1 = UTF-16
    const encodingType = dataView.getUint8(offset);
    return encodingType === 1
        ? readUtf16(dataView, offset + 1, length - 1)[0]
        : readIso8859(dataView, offset + 1, length - 1)[0];
}

What's probably a little weird here is the final length parameter and the return type. The length parameter is optional as it allows us to cut off even if we don't find the null terminator. This could happen in the case of a malformed tag. We definitely do not want to go past the size or every other calculation will be wrong.

The second is the return type. In the case we don't use a length and just go until a null terminator we need to know how far we've actually gone. This will be useful in the next section. Other than that it should be a straightforward implementation of the algorithm outlined. I could technically cache the TextDecoders so I don't need a new one every time but without needing such performance yet, I think this is clearer.
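
If the decoder creation ever did show up in a profile, caching could be as small as the sketch below. `getDecoder` is a hypothetical helper and not something the player currently has. It relies on the fact that a `TextDecoder` can be reused for multiple `decode` calls when streaming mode is not used.

```javascript
// Hypothetical cache: one TextDecoder per encoding label, created on first use
const decoders = new Map();

function getDecoder(label) {
    let decoder = decoders.get(label);
    if (!decoder) {
        decoder = new TextDecoder(label);
        decoders.set(label, decoder);
    }
    return decoder;
}

// Usage: swap `new TextDecoder("utf-16le")` for `getDecoder("utf-16le")`
const text = getDecoder("utf-16le").decode(Uint8Array.from([0x48, 0x00, 0x69, 0x00]));
console.log(text); // "Hi"
```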

Images

The field APIC contains image data. Unlike other fields this can appear multiple times as there are various types of images. As with other fields it's also annoying to read. First you'll get the encoding byte, which is 0 or 1 for ISO-8859-1 or UTF-16 respectively. The next part is a string containing the MIME type. Per the spec this one is always ISO-8859-1 and terminated with a single 0x00, whatever the encoding byte says, and it sounds like it could be a partial MIME type too (though I didn't encounter this and so I don't deal with it). Following that is a byte that represents the type of image, you can find a reference here: https://id3.org/id3v2.3.0#Attached_picture. Then you need to read another string, the description, which does use the text encoding from the first byte and terminates with 0x00 or 0x0000 accordingly. I found that this value was often just empty. Finally, the rest of the bytes are a raw data payload for the image.

function readPicture(dataView, offset, length){
    const image = {};
    const encodingType = dataView.getUint8(offset);
    let i = 1;

    // The MIME type is always ISO-8859-1 with a single 0x00 terminator
    const mimeType = readIso8859(dataView, offset + i);

    image.mimeType = mimeType[0];
    i += mimeType[1] + 1; //the one is the null byte
    image.imageType = dataView.getUint8(offset + i);
    i += 1;

    // Only the description uses the encoding from the first byte
    const description = encodingType === 1
        ? readUtf16(dataView, offset + i)
        : readIso8859(dataView, offset + i);

    image.description = description[0];
    i += description[1] + (encodingType === 1 ? 2 : 1); //skip the 0x0000 or 0x00 terminator

    // Everything left in the field is the raw image payload
    image.data = dataView.buffer.slice(
        dataView.byteOffset + offset + i,
        dataView.byteOffset + offset + length);
    return image;
}

This is where that 2 value return type with the length comes in handy because the subfields do not terminate at length so we don't know beforehand where it stops. It does make the code a little awkward but it was easier to do it this way than to duplicate code with slight API changes.


For a full list of what each field means see: https://id3.org/id3v2.3.0#Declared_ID3v2_frames

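To find out when you have finished reading all the tags, you need to compare where you are to the length of the ID3 tag section as specified in the ID3 header. Note that this length does not include the 10-byte ID3 header itself, but it does include each field's 10-byte header and any zero padding after the last field. So the tags end at byte 10 + length, and you can stop early if the next field type starts with a 0 byte because that means you've hit padding.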
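
Here is a minimal sketch of that walk. It assumes, per the v2.3 spec, that the tag length counts everything after the 10-byte tag header (field headers and trailing zero padding included), that each field length excludes its own 10-byte header, and that there is no extended header. `listFrames` is a hypothetical name and it only records where each field's data lives rather than decoding it.

```javascript
// Hypothetical sketch: lists the fields of an ID3v2.3 tag without decoding them
function listFrames(arrayBuffer) {
    const view = new DataView(arrayBuffer);
    const HEADER_SIZE = 10; // both the tag header and each v2.3 field header are 10 bytes

    // 28-bit tag length at byte 6, 7 bits per byte
    const tagSize = (view.getUint8(6) << 21) | (view.getUint8(7) << 14)
        | (view.getUint8(8) << 7) | view.getUint8(9);
    const end = Math.min(HEADER_SIZE + tagSize, view.byteLength);

    const frames = [];
    let offset = HEADER_SIZE;
    while (offset + HEADER_SIZE <= end) {
        if (view.getUint8(offset) === 0) break; // zero byte where a name should be: padding
        const id = String.fromCharCode(
            view.getUint8(offset), view.getUint8(offset + 1),
            view.getUint8(offset + 2), view.getUint8(offset + 3));
        const size = view.getUint32(offset + 4); // length of the data only
        frames.push({ id, offset: offset + HEADER_SIZE, size });
        offset += HEADER_SIZE + size; // skip header + data to reach the next field
    }
    return frames;
}
```

Each entry's `offset` and `size` could then be handed to something like `readField` or `readPicture` depending on the `id`.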

Updating the player

The refactorings become so scattered throughout the player that it's hard to really highlight what changed. You'll want to check the code. But for the sake of this post perhaps the easiest way is to look at the playFile method and walk it back:

async playFile({ file, id3 = {} }){
    const fileData = await file.getFile();
    this.updateDisplay({ file, id3 });
    const url = URL.createObjectURL(fileData);
    this.dom.audio.src = url;

    this.togglePlay(true);
}

Looking at usages of playFile this made sense. Instead of passing around file handles we pass around fileWithMeta objects, which are a fileHandle + id3 metadata. State holders like #files and #fileLinks have also been updated to use the new object. This way when we do things like generate links we can use the id3 data.

Displaying Metadata

I made a new right panel which can display the metadata for the currently playing track. To update the data we call the updateDisplay method:

updateDisplay({ file, id3 = {}}){
    this.dom.title.textContent = id3["TIT2"] ?? file.name;
    this.dom.infoTitle.textContent = id3["TIT2"] ?? file.name;
    this.dom.infoAlbum.textContent = id3["TALB"] ?? "";
    this.dom.infoArtist.textContent = id3["TPE1"] ?? "";
    this.dom.infoYear.textContent = id3["TYER"] ?? "";


    if(id3["APIC"]){
        const url = URL.createObjectURL(new Blob([id3["APIC"][0].data]));
        this.dom.albumArt.src = url;
    } else {
        this.dom.albumArt.src = "";
    }
}

The title will either be the TIT2 id3 field or the file name. We also have fields for artist, album, and year released. There's tons more data but I didn't see a need to go all out with it and the rest is also less likely to be filled in.

The album art image is a little more complicated but that's mostly due to how the APIs work. We first need to get the APIC field which is an array of images (with some metadata). I'm only grabbing the first but technically we should really have a preference (e.g. 3 - Album Cover) and a fallback when it doesn't exist. What I have found is that most tracks only contain one image anyway and it is the album cover (too much metadata takes up a lot of unnecessary space) so this is probably fine. We need the image data, and because it's an ArrayBuffer we have to convert it to a Blob. This is done via the Blob constructor, a handy but slightly odd API.

new Blob([arrayBuffer]);

It needs an array (I always forget and think rest parameters work) and that array can have several types including ArrayBuffers, raw text and some other things, and it just jams it all together. In our case we have all the data in one block so it's an array of one and then we create the object URL from the Blob. It should also be pointed out that I really should revoke the URLs after usage using URL.revokeObjectURL but that requires tracking them until the track changes and I'm too lazy to implement that right now. There will be memory leaks. Finally, if we can't find an image we'll just use an empty source and show nothing. This could also be improved by setting a placeholder image.
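
For what it's worth, both the image preference and the URL tracking could be sketched roughly as below. `pickCover`, `createArtUrl` and `currentArtUrl` are hypothetical names that don't exist in the player, and the image objects are assumed to have the shape `readPicture` returns (`mimeType`, `imageType`, `data`).

```javascript
// Hypothetical helper: prefer image type 3 (front cover), fall back to the first image
function pickCover(images = []) {
    return images.find(image => image.imageType === 3) ?? images[0] ?? null;
}

// Hypothetical tracking of the one object URL currently in use
let currentArtUrl = null;

function createArtUrl(images) {
    if (currentArtUrl) {
        URL.revokeObjectURL(currentArtUrl); // release the previous track's image
        currentArtUrl = null;
    }
    const cover = pickCover(images);
    if (!cover) return ""; // no image: empty source, same as before
    // Passing the MIME type is optional, but it lets the Blob report what it holds
    currentArtUrl = URL.createObjectURL(new Blob([cover.data], { type: cover.mimeType }));
    return currentArtUrl;
}
```

In `updateDisplay` this would replace the `if`/`else` with something like `this.dom.albumArt.src = createArtUrl(id3["APIC"])`.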

Issues

One problem this presents is that the unpermissioned player can no longer show data. Since we want to show ID3 title data we need to actually read the file, but we can't until we have permission, so this data is useless for display before then. In order to get around this, we'd need to build an IndexedDB database of track info. If you've used a music player like iTunes, Amazon Music etc. you'll know that they have all this weird syncing going on. That's because they store all that metadata (plus other data like play counts) in a big database so they don't have to parse it out of the files each time. Perhaps at some point we can tackle that issue.

The other problem you'll notice is that mp4a files don't have track data. They have other means to store that which was deeper than I was able to get into.

Anyway, that was a messy introduction to ID3 tags. But at least we have something that works now even if it has some sharp edges. Hopefully with this we can move on to a more interesting feature: Media Session API.

Source code is here: https://github.com/ndesmic/music-player/tree/v0.5

https://gh.ndesmic.com/music-player/
