mateusfmcota

Posted on Jan 7, 2023

Reading binary files with Go. A practical example using WAVE files

#go #programming #codenewbie #tutorial

Source-code used here: https://github.com/mateusfmcota/reading-wave-go
Portuguese version: https://dev.to/mateusfmcota/leitura-de-arquivos-binarios-em-go-um-guia-pratico-em-como-ler-arquivos-wav-kk0

Introduction

A few weeks ago I was talking to a colleague about programming, and one of the subjects that came up was the reading and parsing of files. While thinking about this I decided to make a simple program for reading and writing binary files in Go.

The format chosen was WAV files (PCM to be exact).

Understanding the structure of a wav file

PCM WAV is a file format that follows Microsoft's RIFF (Resource Interchange File Format) specification for storing multimedia files. The canonical form of the file consists of these 3 sections:

The first structure, in purple, is called the RIFF Header, which has the following 3 fields:

ChunkID: Identifies the type of the chunk. Since this is a RIFF chunk, the expected value is the string "RIFF".
ChunkSize: Total size of the file minus 8. The count excludes the ChunkID and ChunkSize fields themselves (4 bytes each), so the easiest way to calculate this field is to take the total size of the file and subtract 8.
Format: The type of file format; in this case it is the string "WAVE".
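As a minimal sketch, assuming the first 12 bytes of the file are already in memory, the RIFF header can be validated and ChunkSize turned back into the total file size like this. parseRIFFHeader is a hypothetical helper, not part of the original program:

```go
package main

import (
	"encoding/binary"
	"errors"
)

// parseRIFFHeader checks the 12-byte RIFF header and returns the total
// file size it implies. Hypothetical helper for illustration only.
func parseRIFFHeader(h []byte) (fileSize uint32, err error) {
	if len(h) < 12 {
		return 0, errors.New("header shorter than 12 bytes")
	}
	if string(h[0:4]) != "RIFF" {
		return 0, errors.New("ChunkID is not RIFF")
	}
	if string(h[8:12]) != "WAVE" {
		return 0, errors.New("format is not WAVE")
	}
	// ChunkSize excludes the 8 bytes of ChunkID and ChunkSize themselves
	chunkSize := binary.LittleEndian.Uint32(h[4:8])
	return chunkSize + 8, nil
}
```

For example, a header whose ChunkSize bytes are 24 08 00 00 (2084) describes a 2092-byte file.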

The next section, in green, is called fmt. This structure specifies the format and the metadata of the sound file.

SubChunk1Id: Contains the string "fmt " with a trailing space: ID fields are 4 bytes long and "fmt" has only 3 characters, so a space was added.
Subchunk1Size: The total size of the fields that follow in this sub-chunk; for PCM WAV this value is 16.
AudioFormat: 1 means PCM; any other value indicates some form of compression.
NumChannels: Number of channels: 1 = mono, 2 = stereo, ...
SampleRate: Number of samples per second, e.g. 8000, 44100, ...
ByteRate: SampleRate * NumChannels * BitsPerSample / 8, the number of bytes in 1 second of sound.
BlockAlign: NumChannels * BitsPerSample / 8, the number of bytes per sample including all channels.
BitsPerSample: Number of bits per sample: 8 bits, 16 bits, ...
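To make the two derived fields concrete, here is a small sketch that applies the ByteRate and BlockAlign formulas above (fmtDerived is a hypothetical helper). For CD-quality audio, 44100 Hz stereo at 16 bits, it gives 176400 bytes per second and 4 bytes per sample:

```go
package main

// fmtDerived computes ByteRate and BlockAlign from the three basic fmt
// fields, using the formulas from the field list. Hypothetical helper.
func fmtDerived(sampleRate uint32, numChannels, bitsPerSample uint16) (byteRate uint32, blockAlign uint16) {
	byteRate = sampleRate * uint32(numChannels) * uint32(bitsPerSample) / 8
	blockAlign = numChannels * bitsPerSample / 8
	return byteRate, blockAlign
}
```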

The third section, in orange, is the data sub-chunk, where the sound itself is stored. It has the following fields:

Subchunk2ID: Contains the string "data".
Subchunk2Size: NumSamples * NumChannels * BitsPerSample/8, this is also the number of bytes left in the file.
data: The sound data.

LIST Chunk

When I used ffmpeg to create a sound file to test the program, I realized that it had an extra header. Although this header is not part of the canonical specification, I ended up creating a basic structure for it.

This chunk is of type LIST and has the following fields:

ChunkId: Contains the string "LIST"
Size: The size of the LIST chunk minus 8, that is, the number of bytes remaining in the LIST chunk after this field.
listType: Four ASCII characters that identify the kind of list, e.g. "INFO" for metadata such as the name of the encoder.
data: Its contents depend on listType; this program does not use them.

Details of each header:

One detail I left out of the previous section for simplicity is the size and byte order (little-endian or big-endian) of each field. So I created these tables with every field, its offset, its size in bytes and its byte order:

RIFF Header:

Offset	Field	Size	Byte-order
0	ChunkId	4	big
4	ChunkSize	4	little
8	Format	4	big

FMT Header:

Offset	Field	Size	Byte-order
12	Subchunk1ID	4	big
16	Subchunk1Size	4	little
20	AudioFormat	2	little
22	NumChannels	2	little
24	SampleRate	4	little
28	ByteRate	4	little
32	BlockAlign	2	little
34	BitsPerSample	2	little

LIST Header:

Offset	Field	Size	Byte-order
*	chunkID	4	big
*	size	4	little
*	listType	4	big
*	data	Variable	big

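* The LIST chunk is optional, its length varies from file to file, and this program doesn't use its contents, so I skip its offset calculation. When a LIST chunk is present it sits between the fmt and data headers, so the data offsets below only hold for files without one.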

Data Header:

Offset	Field	Size	Byte-order
36	SubChunk2ID	4	big
40	SubChunk2Size	4	little
44	Data	Variable	little
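To see why the byte-order column matters, here is a small sketch that decodes the same 4 bytes both ways. The bytes 24 08 00 00 are an illustrative ChunkSize value, not taken from a real file; littleEndian32 is a hypothetical helper showing by hand what binary.LittleEndian.Uint32 does:

```go
package main

import "encoding/binary"

// decodeBoth reads the same 4 bytes with both byte orders.
// Only the little-endian reading is the real value of a WAV size field.
func decodeBoth(b []byte) (little, big uint32) {
	return binary.LittleEndian.Uint32(b), binary.BigEndian.Uint32(b)
}

// littleEndian32 does by hand what binary.LittleEndian.Uint32 does:
// the first byte in the file is the least significant one.
func littleEndian32(b []byte) uint32 {
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}
```

Read as little-endian these bytes are 2084; read as big-endian they would be 604504064, which is clearly not a plausible chunk size.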

Creating the program

After this lengthy explanation of how a WAVE file works, it is time to get down to business. To make the job easier, I will use encoding/binary from Go's standard library.

Creating the structs

The first thing I did in the application was to create 4 structs, one for each header as follows:

type RIFF struct {
    ChunkID     []byte
    ChunkSize   []byte
    ChunkFormat []byte
}

type FMT struct {
    SubChunk1ID   []byte
    SubChunk1Size []byte
    AudioFormat   []byte
    NumChannels   []byte
    SampleRate    []byte
    ByteRate      []byte
    BlockAlign    []byte
    BitsPerSample []byte
}

type LIST struct {
    ChunkID  []byte
    size     []byte
    listType []byte
    data     []byte
}

type DATA struct {
    SubChunk2Id   []byte
    SubChunk2Size []byte
    data          []byte
}

Creating a function to help reading bytes

Although the encoding/binary package is very helpful for reading binary files, it has no function that simply reads N raw bytes from a file.

For this I created a function that reads n bytes from an os.File and returns them.

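func readNBytes(file *os.File, n int) []byte {
    temp := make([]byte, n)

    // io.ReadFull keeps reading until all n bytes arrive;
    // a single file.Read call may return fewer bytes than requested
    if _, err := io.ReadFull(file, temp); err != nil {
        panic(err)
    }

    return temp
}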

Reading and parsing a WAVE file

Now we are going to read the file and for this we use os.Open:

    file, err := os.Open("audio.wav")
    if err != nil {
        panic(err)
    }
    // Close the file when the surrounding function returns
    defer file.Close()

To parse the file, we first create a variable for each structure and use the readNBytes function to read each field:

    // RIFF Chunk
    RIFFChunk := RIFF{}

    RIFFChunk.ChunkID = readNBytes(file, 4)
    RIFFChunk.ChunkSize = readNBytes(file, 4)
    RIFFChunk.ChunkFormat = readNBytes(file, 4)

    // FMT sub-chunk
    FMTChunk := FMT{}

    FMTChunk.SubChunk1ID = readNBytes(file, 4)
    FMTChunk.SubChunk1Size = readNBytes(file, 4)
    FMTChunk.AudioFormat = readNBytes(file, 2)
    FMTChunk.NumChannels = readNBytes(file, 2)
    FMTChunk.SampleRate = readNBytes(file, 4)
    FMTChunk.ByteRate = readNBytes(file, 4)
    FMTChunk.BlockAlign = readNBytes(file, 2)
    FMTChunk.BitsPerSample = readNBytes(file, 2)

    subChunk := readNBytes(file, 4)
    var listChunk *LIST

    if string(subChunk) == "LIST" {
        listChunk = new(LIST)
        listChunk.ChunkID = subChunk
        listChunk.size = readNBytes(file, 4)
        listChunk.listType = readNBytes(file, 4)
        listChunk.data = readNBytes(file, int(binary.LittleEndian.Uint32(listChunk.size))-4)
    }

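    // Data sub-chunk
    data := DATA{}

    if listChunk == nil {
        // No LIST chunk: the 4 bytes already read into subChunk are the "data" ID
        data.SubChunk2Id = subChunk
    } else {
        data.SubChunk2Id = readNBytes(file, 4)
    }
    data.SubChunk2Size = readNBytes(file, 4)
    data.data = readNBytes(file, int(binary.LittleEndian.Uint32(data.SubChunk2Size)))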

One detail I would like to explain is the line that contains the code:

if string(subChunk) == "LIST"

This line exists because the LIST header is not part of the canonical specification of a WAVE file, so I check whether it is present: if it is, I read it into listChunk; otherwise I skip it.
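A more general alternative, sketched here under the assumption that the whole file is loaded into memory (for example with os.ReadFile), is to skip any chunk whose ID is not the one you want instead of special-casing LIST. findChunk is a hypothetical helper, not part of the original program:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// findChunk scans the chunks after the 12-byte RIFF header and returns the
// body of the first chunk whose ID is want. Chunks with any other ID
// (LIST or anything else) are skipped using their little-endian size field.
func findChunk(file []byte, want string) ([]byte, error) {
	pos := 12 // skip "RIFF", ChunkSize and "WAVE"
	for pos+8 <= len(file) {
		id := string(file[pos : pos+4])
		size := int(binary.LittleEndian.Uint32(file[pos+4 : pos+8]))
		body := pos + 8
		if body+size > len(file) {
			return nil, fmt.Errorf("chunk %q claims %d bytes but the file is too short", id, size)
		}
		if id == want {
			return file[body : body+size], nil
		}
		// RIFF pads every chunk body to an even number of bytes
		pos = body + size + size%2
	}
	return nil, fmt.Errorf("chunk %q not found", want)
}
```

Calling findChunk(file, "fmt ") and then findChunk(file, "data") works whether or not a LIST chunk (or any other unknown chunk) sits in between.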

Printing the fields:

Although we didn't use the encoding/binary package for reading, it is very useful for printing. The tables above, which give the size and byte order of each field, tell us which fields are little-endian and which are big-endian.

To print the fields on the screen I created these 4 functions, 1 for each header type, that prints the field according to its byte-order:

func printRiff(rf RIFF) {
    fmt.Println("ChunkId: ", string(rf.ChunkID))
    fmt.Println("File size (ChunkSize + 8): ", binary.LittleEndian.Uint32(rf.ChunkSize)+8)
    fmt.Println("ChunkFormat: ", string(rf.ChunkFormat))

}

func printFMT(fm FMT) {
    fmt.Println("SubChunk1Id: ", string(fm.SubChunk1ID))
    fmt.Println("SubChunk1Size: ", binary.LittleEndian.Uint32(fm.SubChunk1Size))
    fmt.Println("AudioFormat: ", binary.LittleEndian.Uint16(fm.AudioFormat))
    fmt.Println("NumChannels: ", binary.LittleEndian.Uint16(fm.NumChannels))
    fmt.Println("SampleRate: ", binary.LittleEndian.Uint32(fm.SampleRate))
    fmt.Println("ByteRate: ", binary.LittleEndian.Uint32(fm.ByteRate))
    fmt.Println("BlockAlign: ", binary.LittleEndian.Uint16(fm.BlockAlign))
    fmt.Println("BitsPerSample: ", binary.LittleEndian.Uint16(fm.BitsPerSample))
}

func printLIST(list LIST) {
    fmt.Println("ChunkId: ", string(list.ChunkID))
    fmt.Println("size: ", binary.LittleEndian.Uint32(list.size))
    fmt.Println("listType: ", string(list.listType))
    fmt.Println("data: ", string(list.data))
}

func printData(data DATA) {
    fmt.Println("SubChunk2Id: ", string(data.SubChunk2Id))
    fmt.Println("SubChunk2Size: ", binary.LittleEndian.Uint32(data.SubChunk2Size))
    fmt.Println("data: ", data.data)
}

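Since a file is read from "left to right", the four-character ID fields (ChunkID, Format, SubChunk1ID, ...) are simply ASCII text stored in reading order, which is why the tables list them as big-endian. Converting them with string() is enough; no byte-order conversion is needed.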
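As a small illustration of that point, this sketch (idAsNumbers is a hypothetical helper) interprets the ID "RIFF" as a number both ways. Read as big-endian, the hex digits follow the ASCII codes in file order (R = 0x52, I = 0x49, F = 0x46), while little-endian reverses them; string() needs neither:

```go
package main

import "encoding/binary"

// idAsNumbers shows that a four-character ID is just bytes in file order:
// big-endian keeps that order, little-endian reverses it.
func idAsNumbers(id [4]byte) (big, little uint32) {
	return binary.BigEndian.Uint32(id[:]), binary.LittleEndian.Uint32(id[:])
}
```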

Optimizations

Although we didn't use the encoding/binary package to read the file in the example above, it can read files in a faster and more elegant, though not initially intuitive, way.

It has a Read function that reads values from an io.Reader directly into a struct. Although that sounds simple, binary.Read() has 2 quirks:

binary.Read requires the struct to be fully defined: every field must have a fixed size known from its type (fixed-size arrays and numbers, not slices)
binary.Read requires you to pass it the byte order (big-endian or little-endian) used to decode multi-byte numbers

With this in mind, we can improve the code this way.

Refactoring the structs

The first thing we need to do is to give the struct fields fixed sizes wherever possible. Since some fields have a variable size, I will leave those as slices.

type RIFF struct {
    ChunkID     [4]byte
    ChunkSize   [4]byte
    ChunkFormat [4]byte
}

type FMT struct {
    SubChunk1ID   [4]byte
    SubChunk1Size [4]byte
    AudioFormat   [2]byte
    NumChannels   [2]byte
    SampleRate    [4]byte
    ByteRate      [4]byte
    BlockAlign    [2]byte
    BitsPerSample [2]byte
}

type LIST struct {
    ChunkID  [4]byte
    size     [4]byte
    listType [4]byte
    data     []byte
}

type DATA struct {
    SubChunk2Id   [4]byte
    SubChunk2Size [4]byte
    data          []byte
}

As noted above, the data fields of the LIST and DATA headers are still slices without a fixed size, so we will deal with them in another way later on.

Turning the print functions into methods on the structs

The next step will be to couple the print functions to their respective structs, so that it will be easier to call them up in the future:

func (r RIFF) print() {
    fmt.Println("ChunkId: ", string(r.ChunkID[:]))
    fmt.Println("File size (ChunkSize + 8): ", binary.LittleEndian.Uint32(r.ChunkSize[:])+8)
    fmt.Println("ChunkFormat: ", string(r.ChunkFormat[:]))
    fmt.Println()
}

func (fm FMT) print() {
    fmt.Println("SubChunk1Id: ", string(fm.SubChunk1ID[:]))
    fmt.Println("SubChunk1Size: ", binary.LittleEndian.Uint32(fm.SubChunk1Size[:]))
    fmt.Println("AudioFormat: ", binary.LittleEndian.Uint16(fm.AudioFormat[:]))
    fmt.Println("NumChannels: ", binary.LittleEndian.Uint16(fm.NumChannels[:]))
    fmt.Println("SampleRate: ", binary.LittleEndian.Uint32(fm.SampleRate[:]))
    fmt.Println("ByteRate: ", binary.LittleEndian.Uint32(fm.ByteRate[:]))
    fmt.Println("BlockAlign: ", binary.LittleEndian.Uint16(fm.BlockAlign[:]))
    fmt.Println("BitsPerSample: ", binary.LittleEndian.Uint16(fm.BitsPerSample[:]))
    fmt.Println()
}

func (list LIST) print() {
    fmt.Println("ChunkId: ", string(list.ChunkID[:]))
    fmt.Println("size: ", binary.LittleEndian.Uint32(list.size[:]))
    fmt.Println("listType: ", string(list.listType[:]))
    fmt.Println("data: ", string(list.data))
    fmt.Println()
}

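func (data DATA) print() {
    fmt.Println("SubChunk2Id: ", string(data.SubChunk2Id[:]))
    // SubChunk2Size is little-endian, like every other numeric field
    fmt.Println("SubChunk2Size: ", binary.LittleEndian.Uint32(data.SubChunk2Size[:]))
    // Avoid a panic on files with fewer than 100 bytes of audio
    n := len(data.data)
    if n > 100 {
        n = 100
    }
    fmt.Println("first 100 bytes", data.data[:n])
    fmt.Println()
}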

From now on you will be able to call the print functions just by calling the print() method in the struct.

Reading structs with defined field sizes

With the structs well defined, reading them using the encoding/binary package is done by the Read function.

func Read(r io.Reader, order ByteOrder, data any) error

This Read function expects you to pass it a data stream (such as a file), the byte order (big or little), and where to store the data.

If the destination is a struct whose fields all have fixed sizes, Read fills it field by field, reading exactly as many bytes as each field holds.

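    // RIFF chunk: binary.Read fills all 3 fixed-size fields (12 bytes)
    RIFFChunk := RIFF{}
    if err := binary.Read(file, binary.BigEndian, &RIFFChunk); err != nil {
        panic(err)
    }

    // FMT sub-chunk: 24 bytes
    FMTChunk := FMT{}
    if err := binary.Read(file, binary.BigEndian, &FMTChunk); err != nil {
        panic(err)
    }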

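This does not work for the LIST and DATA structs. binary.Read only handles fixed-size types, so a struct with a slice field makes it return an error ("binary.Read: invalid type ...") instead of reading anything. A slice passed on its own is filled only up to its current length, so it has to be allocated with the right size first.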
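The encoding/binary package can tell you up front whether a type is usable: binary.Size reports how many bytes a value encodes to, or -1 when that cannot be known from the type. This sketch copies the refactored RIFF, FMT and DATA structs; encodedSizes is a hypothetical helper:

```go
package main

import "encoding/binary"

// Copies of the refactored structs from this article.
type RIFF struct {
	ChunkID     [4]byte
	ChunkSize   [4]byte
	ChunkFormat [4]byte
}

type FMT struct {
	SubChunk1ID   [4]byte
	SubChunk1Size [4]byte
	AudioFormat   [2]byte
	NumChannels   [2]byte
	SampleRate    [4]byte
	ByteRate      [4]byte
	BlockAlign    [2]byte
	BitsPerSample [2]byte
}

type DATA struct {
	SubChunk2Id   [4]byte
	SubChunk2Size [4]byte
	data          []byte
}

// encodedSizes reports the byte size binary.Read would need for each struct;
// -1 means the struct contains a field (the slice) with no fixed size.
func encodedSizes() (riff, fmtSize, data int) {
	return binary.Size(RIFF{}), binary.Size(FMT{}), binary.Size(DATA{})
}
```

RIFF and FMT come out at 12 and 24 bytes, matching the offsets in the tables, while DATA gives -1.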

Reading structs with undefined fields

One of the simplest ways to read fields of dynamic size is to read them after finding out their size. To do this, I created read() methods on the LIST and DATA structs that handle this reading.

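func (list *LIST) read(file *os.File) {
    // Peek at the next 4 bytes, then seek back so they can be read again
    listCondition := make([]byte, 4)
    if _, err := io.ReadFull(file, listCondition); err != nil {
        panic(err)
    }
    if _, err := file.Seek(-4, io.SeekCurrent); err != nil {
        panic(err)
    }

    if string(listCondition) != "LIST" {
        return
    }

    binary.Read(file, binary.BigEndian, &list.ChunkID)
    binary.Read(file, binary.BigEndian, &list.size)
    binary.Read(file, binary.BigEndian, &list.listType)
    // size also counts the 4-byte listType, so the data is size - 4 bytes
    list.data = make([]byte, binary.LittleEndian.Uint32(list.size[:])-4)
    binary.Read(file, binary.BigEndian, &list.data)
}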

func (data *DATA) read(file *os.File) {
    binary.Read(file, binary.BigEndian, &data.SubChunk2Id)
    binary.Read(file, binary.BigEndian, &data.SubChunk2Size)
    data.data = make([]byte, binary.LittleEndian.Uint32(data.SubChunk2Size[:]))
    binary.Read(file, binary.BigEndian, &data.data)
}

In LIST's read method, I first read the next 4 bytes to check whether they contain the string "LIST", which identifies the header, and then seek back 4 bytes so they can be read again. If the header is there, the method continues; otherwise it returns. After this check, I read the first 3 fields separately with binary.Read(), then use the size field to allocate the variable-size data field with the right length before reading it.

Having done all this, you have a simple program that can read and interpret the data from a .wav file.
