Patrick Winters


Reverse-engineering a weird video file

Recently, I happened across a rather unusual video file whilst browsing Discord: a video that would show something different depending on your platform.

Screenshot on Firefox

Investigating

After sending it to a suitable number of other servers and friends, I decided to try to figure out how it worked. FFmpeg is a great tool (which you should definitely have), and includes a program called ffprobe. ffprobe can, as the name suggests, probe media files and dump their data. Here's what it gave me:

[matroska,webm @ 0x555f75cd8f00] Unknown or unsupported track type 3
    Last message repeated 1 times
Input #0, matroska,webm, from 'discord_moment_1-1.webm':
  Metadata:
    title           : https://pious.dev
    encoder         : Lavf58.76.100
  Duration: 00:00:00.00, start: 0.000000, bitrate: 48328952 kb/s
    Stream #0:0: Video: vp9 (Profile 0), yuv420p(tv), 480x480, SAR 1:1 DAR 1:1, 30 fps, 30 tbr, 1k tbn, 1k tbc
    Stream #0:1: Audio: vorbis, 48000 Hz, stereo, fltp (default)

Shout out to pious.dev; great advertising.

Looking at the stream list near the bottom shows us nothing unusual. Most video files follow this format: one video stream, and one audio stream. In the usual sense of the word, "video files" are actually containers for multiple streams. Each stream holds only the encoded data, and only of one type (out of audio, video, subtitles, etc.). In our day-to-day usage, we only tend to interact with container formats, such as MP4 or WebM - under the hood, these formats most commonly use codecs with weird names like H.265, AAC, Vorbis, or VP9.

In full, ffprobe is telling us that we have two streams:

  1. A video stream encoded as VP9, pixel format of YUV420P, at 30 FPS
  2. An audio stream encoded as Vorbis, at 48000 samples per second, in stereo audio

However, something more interesting is the warning at the top:

[matroska,webm @ 0x555f75cd8f00] Unknown or unsupported track type 3
    Last message repeated 1 times

What does this mean? What's a "track type"? We need some docs. The first port of call would be some documentation about the WebM format, and a quick search leads us to the Container Guidelines on the WebM Project site. Hit Ctrl+F, and we get to the TrackType element of a Track:

A set of track types coded on 8 bits (1: video, 2: audio, 3: complex, 0x10: logo, 0x11: subtitle, 0x12: buttons, 0x20: control).

Interesting - the "track type 3" that ffprobe is complaining about is a "complex" track. Container formats often contain other kinds of data in addition to the video and audio data you'd expect, and the fact that such data is present in this file looks like it might be a clue.
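To keep those numbers straight, here's a minimal sketch that maps the raw value to a name. It only covers the values from the quote above; the `TrackType` enum and `from_code` are names I made up for illustration, not part of any library.

```rust
/// Track types from the Container Guidelines quote above.
/// (Hypothetical helper type, not taken from any crate.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrackType {
    Video,
    Audio,
    Complex,
    Logo,
    Subtitle,
    Buttons,
    Control,
}

impl TrackType {
    /// Maps the raw 8-bit value to a named type; unlisted values give `None`.
    fn from_code(code: u8) -> Option<Self> {
        match code {
            0x01 => Some(Self::Video),
            0x02 => Some(Self::Audio),
            0x03 => Some(Self::Complex),
            0x10 => Some(Self::Logo),
            0x11 => Some(Self::Subtitle),
            0x12 => Some(Self::Buttons),
            0x20 => Some(Self::Control),
            _ => None,
        }
    }
}
```

So `TrackType::from_code(3)` is the "complex" type that ffprobe refused to deal with.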

We can use this information in conjunction with some confused messages around the original post to infer that there are three video tracks. One shows Chromium, one shows Firefox, and one shows Android. This means it must be taking advantage of how each of those platforms' parsers works differently.

The FFmpeg set of tools includes another useful one: ffplay can play videos how FFmpeg would interpret them. And it shows... Chromium. Now we know that Chromium purportedly only shows the video track which has the correct track type of 1 (video). This also means that Firefox is using one of the tracks that has a complex track type. To understand further, we need to dig deeper.

Some more research eventually reveals that WebM is based on Matroska, another video container format, whose specification is in turn based on EBML (created specifically for it, it seems). EBML claims to be a "simplified binary extension of XML" - so we can parse it like markup.
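For a feel of what "binary XML" means: as I understand the EBML layout, each element is stored as an ID, then a size, then the payload, with the ID and size written as variable-length integers whose length is given by the leading zero bits of the first byte. Here's a rough sketch of reading one element by hand. `read_vint` and `read_element` are my own hypothetical helpers (not the webm_iterable API), and unknown-size elements and lengths beyond 8 bytes are deliberately not handled.

```rust
/// Reads one EBML variable-length integer from the start of `buf`.
/// Returns the value and the number of bytes consumed.
/// Element IDs keep their length-marker bit; sizes have it stripped.
fn read_vint(buf: &[u8], keep_marker: bool) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    if first == 0 {
        return None; // would be longer than 8 bytes; not handled in this sketch
    }
    // The count of leading zero bits in the first byte gives the total length.
    let len = first.leading_zeros() as usize + 1;
    if buf.len() < len {
        return None;
    }
    let mut value = if keep_marker {
        u64::from(first)
    } else {
        // Mask off the marker bit and everything above it.
        u64::from(first) & ((1u64 << (8 - len)) - 1)
    };
    for &byte in &buf[1..len] {
        value = (value << 8) | u64::from(byte);
    }
    Some((value, len))
}

/// Splits the first element off `buf`: (id, payload, total bytes used).
/// Returns `None` if the buffer is too short for the declared size.
fn read_element(buf: &[u8]) -> Option<(u64, &[u8], usize)> {
    let (id, id_len) = read_vint(buf, true)?;
    let (size, size_len) = read_vint(&buf[id_len..], false)?;
    let start = id_len + size_len;
    let end = start.checked_add(usize::try_from(size).ok()?)?;
    let payload = buf.get(start..end)?;
    Some((id, payload, end))
}
```

In the Matroska element list, TrackType has the ID 0x83, so the three bytes 83 81 03 should decode as "TrackType, one byte long, value 3".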

Rust is great, let's use that. If we search crates.io for an EBML parser there's webm_iterable, perhaps that can help? Quickly creating a Rust project and adding webm_iterable to it, a basic program can be produced which just dumps the data in a file specified:

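use std::env::args;
use std::error::Error;
use std::fs::File;
use webm_iterable::WebmIterator;

// main returns a Result so that `?` can be used on each tag
fn main() -> Result<(), Box<dyn Error>> {
    // the file to dump is the first command-line argument
    let path = args().nth(1).ok_or("usage: pass the path to a WebM file")?;
    let src = File::open(path)?;
    let tag_iterator = WebmIterator::new(src, &[]);

    // print every tag in the order it appears in the file
    for tag in tag_iterator {
        println!("{:?}", tag?);
    }

    Ok(())
}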

Running that on the file gives us a very long list of all of the tags in the file, though we can narrow it down to the bit we care about: the track list. You can try to do it yourself if you want the raw log, but translating it loosely to XML, it looks like this:

<Tracks>
    <TrackEntry>
        <TrackNumber>1</TrackNumber>
        <TrackUid>6533231215085170651</TrackUid>
        <CodecId>V_VP9</CodecId>
        <TrackType>3</TrackType>
        <Video>
            <PixelWidth>480</PixelWidth>
            <PixelHeight>480</PixelHeight>
        </Video>
    </TrackEntry>
    <TrackEntry>
        <TrackNumber>2</TrackNumber>
        <TrackUid>15605971589599203008</TrackUid>
        <FlagDefault>0</FlagDefault>
        <CodecId>V_VP9</CodecId>
        <TrackType>1</TrackType>
        <Video>
            <PixelWidth>480</PixelWidth>
            <PixelHeight>480</PixelHeight>
        </Video>
        <TrackType>3</TrackType>
    </TrackEntry>
    <TrackEntry>
        <TrackNumber>3</TrackNumber>
        <TrackUid>1</TrackUid>
        <FlagDefault>0</FlagDefault>
        <CodecId>V_VP9</CodecId>
        <TrackType>1</TrackType>
        <Video>
            <PixelWidth>480</PixelWidth>
            <PixelHeight>480</PixelHeight>
        </Video>
    </TrackEntry>
    <TrackEntry>
        <TrackNumber>4</TrackNumber>
        <TrackUid>79470792871596424</TrackUid>
        <CodecId>A_VORBIS</CodecId>
        <TrackType>2</TrackType>
        <Audio>
            <Channels>2</Channels>
            <SamplingFrequency>48000.0</SamplingFrequency>
            <BitDepth>32</BitDepth>
        </Audio>
    </TrackEntry>
</Tracks>

And hey, this looks familiar! This echoes exactly what ffprobe gave us - the streams it showed us appear here as track numbers 3 and 4 (if you want, you can check that all of their values are the same) - and now we have two extras. These two must be the ones that had track type 3, which it ignored. Indeed, track number 1 has a track type element with value 3. However, track number 2 has two of these elements: the first one has a value of 1, and the second has a value of 3. This is weird, and I'm pretty sure this isn't valid - it's likely key to understanding the file.

Let's forgo some of this fancy tooling for now, and open up the beloved hex editor. Here, I'm using Okteta, though it's not because it's good in any way (sorry). Using those handy codec ID strings as a guide, we can pinpoint the location of this track data.

Hex editor data

Since we know that FFmpeg will ignore any tracks which have a track type of 3, how about we try to change one of the tracks with a track type of 3 to not be? We'll target track number 2, so we need to find that second track type element and change it to be a 1. Starting at the second obvious V_VP9 string, we can work forward until we see the two 480 values for width and height. Sure enough, pretty soon afterwards there is a lone 03 in the hex data. Switch that to a 01, save, and run ffplay with it, and...
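If you'd rather not poke at bytes by hand, the same edit can be roughly sketched in code. This assumes the track type element is encoded as the three bytes 83 81 xx (ID 0x83, a one-byte size, then the value), and it's a blunt byte-pattern search rather than a real parser, so it could in principle hit a false match. `find_from` and `patch_track_type` are hypothetical names of my own.

```rust
/// Finds `needle` in `haystack`, starting the search at byte offset `start`.
fn find_from(haystack: &[u8], needle: &[u8], start: usize) -> Option<usize> {
    haystack
        .get(start..)?
        .windows(needle.len())
        .position(|window| window == needle)
        .map(|pos| pos + start)
}

/// Skips past the `nth_codec`-th "V_VP9" string (counting from 1), then
/// rewrites the next track type element whose value is `from` to `to`.
/// Returns the offset of the changed byte, or `None` if nothing matched.
fn patch_track_type(data: &mut [u8], nth_codec: usize, from: u8, to: u8) -> Option<usize> {
    const CODEC: &[u8] = b"V_VP9";
    let mut pos = 0;
    for _ in 0..nth_codec {
        pos = find_from(data, CODEC, pos)? + CODEC.len();
    }
    // Assumed encoding: ID 0x83, size 0x81 (one byte), then the value.
    let element = find_from(data, &[0x83, 0x81, from], pos)?;
    data[element + 2] = to;
    Some(element + 2)
}
```

Reading the file with `std::fs::read`, calling `patch_track_type(&mut data, 2, 3, 1)`, and writing the result back out with `std::fs::write` should be equivalent to the manual edit on track number 2.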

ffplay showing Firefox

Awesome. So now we know that the second track is intended for Firefox, which implies that Firefox treats the duplicate track type element differently to FFmpeg. Firefox only sees the first one (1, video), whereas FFmpeg uses the last (3, complex, which we changed to a 1 just now). It also tells us that each player will choose the first track which it considers valid, and ignore any other tracks, even if they are also valid.

This editing process with the hex editor can be repeated with the first track to extract the Android version; the reader is invited to attempt doing so themselves.

Since Android apparently picks the first track despite it not being marked as a video track, we assume that it just ignores the track type entirely. Weird.

Let's do a quick recap: we have a file with multiple video tracks in it, and these tracks each have a track type (or multiple); and each parser that this video targets interprets them differently, either picking the first or last when there are duplicates, or ignoring the value entirely. This is a pretty cool discovery.
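To make the recap concrete, here's a toy model of the three behaviours. It is not how any of these parsers is actually implemented: the `Strategy` and `Track` types are hypothetical, and treating Android as "anything whose codec ID starts with V_ is video" is purely my guess at what ignoring the track type might look like.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    FirstWins,  // what Firefox appears to do
    LastWins,   // consistent with what FFmpeg and Chromium showed
    IgnoreType, // assumed for Android
}

struct Track {
    number: u32,
    codec_id: &'static str,
    track_types: Vec<u8>, // every TrackType element, in file order
}

/// Returns the number of the first track the parser would treat as video.
fn pick_video_track(tracks: &[Track], strategy: Strategy) -> Option<u32> {
    tracks
        .iter()
        .find(|track| {
            let effective = match strategy {
                Strategy::FirstWins => track.track_types.first().copied(),
                Strategy::LastWins => track.track_types.last().copied(),
                // Guesswork: fall back on the codec ID prefix instead.
                Strategy::IgnoreType => return track.codec_id.starts_with("V_"),
            };
            effective == Some(1) // 1 means video
        })
        .map(|track| track.number)
}

/// The four tracks from the dump of the original file.
fn example_tracks() -> Vec<Track> {
    vec![
        Track { number: 1, codec_id: "V_VP9", track_types: vec![3] },
        Track { number: 2, codec_id: "V_VP9", track_types: vec![1, 3] },
        Track { number: 3, codec_id: "V_VP9", track_types: vec![1] },
        Track { number: 4, codec_id: "A_VORBIS", track_types: vec![2] },
    ]
}
```

With the tracks from the dump, `LastWins` lands on track 3, `FirstWins` on track 2, and `IgnoreType` on track 1, which matches what each platform showed.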

Extending

Now that we have acquired knowledge, let's apply it. At this stage, we could make our own video by manually modifying a specially-crafted WebM file in the hex editor. But let's try to automate that.

Going back to our little Rust project, we'll add a simple bit of code to the start that will create the basic WebM file with the multiple tracks. Should be as simple as ffmpeg -i video1.mp4 -i video2.mp4 -i video3.mp4 -i audio.mp3 output.webm, right? Er... no.

Save me, StackOverflow!

Okay, seems we need to use the -map option. So ffmpeg -i video1.mp4 -i video2.mp4 -i video3.mp4 -i audio.mp3 -map 0 -map 1 -map 2 -map 3 output.webm? Right!

This is fairly trivial to add to the code, though we may also want to throw in a -loglevel error to make FFmpeg shut up - it's rather verbose.
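As a sketch of how that might look from Rust, the arguments can be built up in a loop and handed to `std::process::Command`. The function names are mine, and this version uses `-loglevel error` to keep the output down; it assumes ffmpeg is on your PATH.

```rust
use std::process::{Command, ExitStatus};

/// Builds the argument list: one -i per input, then one -map per input
/// so that every input ends up in the output file.
fn ffmpeg_args(inputs: &[&str], output: &str) -> Vec<String> {
    // Only print errors; FFmpeg is chatty by default.
    let mut args = vec!["-loglevel".to_string(), "error".to_string()];
    for input in inputs {
        args.push("-i".to_string());
        args.push(input.to_string());
    }
    for index in 0..inputs.len() {
        args.push("-map".to_string());
        args.push(index.to_string());
    }
    args.push(output.to_string());
    args
}

/// Runs ffmpeg with those arguments and waits for it to finish.
fn run_ffmpeg(inputs: &[&str], output: &str) -> std::io::Result<ExitStatus> {
    Command::new("ffmpeg").args(ffmpeg_args(inputs, output)).status()
}
```

Calling `run_ffmpeg(&["video1.mp4", "video2.mp4", "video3.mp4", "audio.mp3"], "output.webm")` should then be the equivalent of the command above.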

Once we've created this file, we can go through it to make the changes that we originally made via the hex editor. Luckily for us, that library we were using earlier - webm_iterable - also provides a way for us to write files as well as reading them.

Let's take our existing tag-dump code and modify it to write the tags to the output instead. Now it'll just directly copy one file to the other.

Making all of these tiny modifications is pretty boring, and it essentially boils down to matching one thing and switching it for another. Here's how we insert the track type, for example:

match tag { // tag is an element in the file
    // if it's a track type element
    Spec::TrackType(_) => {
        // Chromium is the only one who does it "correctly"
        if
            platforms[i] == Platform::All ||
            platforms[i] == Platform::Chromium
        {
            tag_output.write(tag)?;
        }
    },

    // if it's the codec ID element
    Spec::CodecId(id) => {
        // copy out the tag
        tag_output.write(tag)?;

        // figure out what the actual type should be
        // for example, "V_VP9" is a video codec
        let true_type = match &id[..2] {
            "A_" => 2, // audio
            "V_" => 1, // video
            _ => bail!("codec id is not video or audio"),
        };

        let track_type = match platforms[i] {
            // Android is only 3
            Platform::Android => 3,

            // Firefox initially should have the real type
            Platform::Firefox => true_type,

            // otherwise, just ignore this
            _ => continue,
        };

        // write the tag
        tag_output.write(&Spec::TrackType(track_type))?;
    },

    // if we're at the end of the track
    Spec::Video(_) | Spec::Audio(_) => {
        // copy this over, we're just using it as an anchor
        tag_output.write(tag)?;

        // write the second Firefox tag
        if platforms[i] == Platform::Firefox {
            tag_output.write(&Spec::TrackType(3))?;
        }
    },

    // ...

There are a few other minor tweaks like this that need to be made for a full, useful program, but you get the gist. Notably, our program also supports changing the audio on each platform, though the original video did not. The original creator likely did know about this though, since a friend later linked me another example from his GitHub which does take advantage of the audio changing.
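Stripped of the tag-writing plumbing, the decision in that match boils down to "which track type values does each platform's track get, and in what order?". Here's a small standalone sketch of just that logic. It isn't the code from the Gist: `planned_track_types` is a hypothetical helper, and it flattens the two write positions (after the codec ID, and after the video/audio element) into one ordered list.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Platform {
    All,
    Chromium,
    Firefox,
    Android,
}

/// Works out the real track type from the codec ID prefix.
fn true_type(codec_id: &str) -> Option<u8> {
    if codec_id.starts_with("V_") {
        Some(1) // video
    } else if codec_id.starts_with("A_") {
        Some(2) // audio
    } else {
        None
    }
}

/// Lists the track type values to write for a track, in file order.
fn planned_track_types(platform: Platform, codec_id: &str) -> Option<Vec<u8>> {
    let real = true_type(codec_id)?;
    Some(match platform {
        // Keep the single, correct element.
        Platform::All | Platform::Chromium => vec![real],
        // Only the "complex" type.
        Platform::Android => vec![3],
        // The real type first, then a second element saying "complex".
        Platform::Firefox => vec![real, 3],
    })
}
```

For a VP9 track this gives [1, 3] for Firefox, [3] for Android and [1] for Chromium, which is the same pattern we saw in the original file.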

If we bring all this together with a nice CLI and some video and audio files, we get our own amazing oddity of a video. Brilliant.

Conclusion

This was pretty fun. I've put all of the code in a GitHub Gist (despite how bad it is) so you can go off and make your own.

I'd still like to stress that I didn't discover this (I wish), and all of this is just reverse-engineering that file that I saw on Discord a couple of days ago.

Also, first DEV article! Written at 4am, feels fitting. Comments, feedback, insults, whatever - appreciated below :)
