Source-code used here: https://github.com/mateusfmcota/reading-wave-go
Portuguese version: https://dev.to/mateusfmcota/leitura-de-arquivos-binarios-em-go-um-guia-pratico-em-como-ler-arquivos-wav-kk0
Introduction
A few weeks ago I was talking to a colleague about programming, and one of the subjects that came up was the reading and parsing of files. While thinking about this I decided to make a simple program for reading and writing binary files in Go.
The format chosen was WAV files (PCM to be exact).
Understanding the structure of a wav file
A PCM WAV file follows Microsoft's RIFF specification for storing multimedia files. In its canonical form, the file consists of these 3 sections:
The first structure, in purple, is called the RIFF Header, which has the following 3 fields:
- ChunkID: This is used to specify the type of the chunk, since it is of type RIFF, the expected value is the string "RIFF".
- ChunkSize: Total size of the file minus 8. ChunkID and ChunkSize take 4 bytes each and are not counted, so the easiest way to calculate this field is to take the total size of the file and subtract 8 from it.
- Format: The type of file format, in this case it is the string "WAVE".
The next section, in green, is called fmt. This structure specifies the format and the metadata of the sound file.
- SubChunk1Id: Contains the string "fmt ". The trailing space is there because ID fields are 4 bytes long and "fmt" has only 3 characters.
- Subchunk1Size: The total size of the fields that follow. For PCM WAV this value is 16.
- AudioFormat: 1 means PCM; any other value indicates some form of compression.
- NumChannels: Number of channels, 1 = mono, 2 = stereo, ...
- SampleRate: Sample rate of sound e.g. 8000, 44100, ...
- ByteRate: SampleRate * NumChannels * BitsPerSample / 8, is the number of bytes in 1 second of sound
- BlockAlign: NumChannels * BitsPerSample / 8, is the amount of bytes per sample including all channels
- BitsPerSample: Amount of bits per sample, 8 bits, 16 bits, ...
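As a quick sanity check of the ByteRate and BlockAlign formulas above, here is a minimal sketch that derives both from the other fields. The helper names `blockAlign` and `byteRate` and the example values are assumptions made for illustration; they are not part of the original program.

```go
package main

import "fmt"

// blockAlign returns the number of bytes in one sample across all channels.
func blockAlign(numChannels, bitsPerSample uint16) uint16 {
	return numChannels * bitsPerSample / 8
}

// byteRate returns the number of bytes in one second of sound.
func byteRate(sampleRate uint32, numChannels, bitsPerSample uint16) uint32 {
	return sampleRate * uint32(numChannels) * uint32(bitsPerSample) / 8
}

func main() {
	// Assumed example values: 44100 Hz, stereo, 16 bits per sample.
	fmt.Println("BlockAlign:", blockAlign(2, 16))    // 2 * 16 / 8 = 4
	fmt.Println("ByteRate:", byteRate(44100, 2, 16)) // 44100 * 2 * 16 / 8 = 176400
}
```

A parser could compare these computed values with the ones stored in the file as a cheap consistency check on the header.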
The third section, in orange, is the data structure, where the sound itself is stored. It has the following fields:
- Subchunk2ID: Contains the string "data".
- Subchunk2Size: NumSamples * NumChannels * BitsPerSample/8, this is also the number of bytes left in the file.
- data: The sound data.
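The Subchunk2Size formula can also be inverted to find out how much sound a file holds. The sketch below assumes the header has already been parsed; `numSamples` and `duration` are hypothetical helpers, not functions from the original program.

```go
package main

import (
	"fmt"
	"time"
)

// numSamples inverts the Subchunk2Size formula: it returns the number of
// samples per channel stored in a data chunk of dataSize bytes.
func numSamples(dataSize uint32, numChannels, bitsPerSample uint16) uint32 {
	bytesPerFrame := uint32(numChannels) * uint32(bitsPerSample) / 8
	if bytesPerFrame == 0 {
		return 0 // malformed header; avoid dividing by zero
	}
	return dataSize / bytesPerFrame
}

// duration converts a per-channel sample count into playing time.
func duration(samples, sampleRate uint32) time.Duration {
	if sampleRate == 0 {
		return 0
	}
	return time.Duration(samples) * time.Second / time.Duration(sampleRate)
}

func main() {
	// Assumed example: 176400 bytes of 44100 Hz, 16-bit stereo audio.
	n := numSamples(176400, 2, 16)
	fmt.Println(n, duration(n, 44100)) // 44100 1s
}
```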
LIST Chunk
When I created a sound with ffmpeg to test the program, I realized that the file had an extra header. Although this header is not in the canonical specification, I ended up creating a basic structure for it.
This structure is of type LIST and has the following layout:
- ChunkId: Contains the string "LIST"
- Size: The size of the LIST structure - 8. Basically it tells you the size in bytes remaining in the LIST structure.
- listType: ASCII characters that depend on the type of the file; some examples are WAVE, DLS, ...
- data: Depends on listType, but in this case does not apply to this program
Details of each header:
For simplicity, one detail I left out of the previous section is the size and byte order (little-endian or big-endian) of each field. These tables list every field with its size and byte order:
RIFF Header:
Offset | Field | Size | Byte-order |
---|---|---|---|
0 | ChunkId | 4 | big |
4 | ChunkSize | 4 | little |
8 | Format | 4 | big |
FMT Header:
Offset | Field | Size | Byte-order |
---|---|---|---|
12 | Subchunk1ID | 4 | big |
16 | Subchunk1Size | 4 | little |
20 | AudioFormat | 2 | little |
22 | NumChannels | 2 | little |
24 | SampleRate | 4 | little |
28 | ByteRate | 4 | little |
32 | BlockAlign | 2 | little |
34 | BitsPerSample | 2 | little |
LIST Header:
Offset | Field | Size | Byte-order |
---|---|---|---|
* | chunkID | 4 | big |
* | size | 4 | little |
* | listType | 4 | big |
* | data | Variable | big |
* Because this chunk is platform specific and I will not use it when creating files, I will not calculate its offsets.
Data Header:
Offset | Field | Size | Byte-order |
---|---|---|---|
36 | SubChunk2ID | 4 | big |
40 | SubChunk2Size | 4 | little |
44 | Data | Variable | little |
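To see why the byte order matters, here is a small sketch that decodes the same 4 bytes both ways. RIFF stores its numeric fields in little-endian order, while the ID fields are plain ASCII bytes. The helper `decodeSize` and the sample bytes are assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeSize interprets the first four bytes of b as a little-endian
// uint32, which is how RIFF stores chunk sizes.
func decodeSize(b []byte) uint32 {
	return binary.LittleEndian.Uint32(b)
}

func main() {
	// The bytes of a Subchunk1Size field holding 16, in file order.
	raw := []byte{0x10, 0x00, 0x00, 0x00}

	fmt.Println(decodeSize(raw))              // 16
	fmt.Println(binary.BigEndian.Uint32(raw)) // 268435456, the wrong value

	// ID fields need no conversion: the bytes are already in reading order.
	fmt.Println(string([]byte{'R', 'I', 'F', 'F'})) // RIFF
}
```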
Creating the program
Now that we know how a WAVE file is laid out, it is time to get down to business. To make the job easier I will use Go's standard encoding/binary library.
Creating the structs
The first thing I did in the application was to create 4 structs, one for each header as follows:
type RIFF struct {
	ChunkID     []byte
	ChunkSize   []byte
	ChunkFormat []byte
}

type FMT struct {
	SubChunk1ID   []byte
	SubChunk1Size []byte
	AudioFormat   []byte
	NumChannels   []byte
	SampleRate    []byte
	ByteRate      []byte
	BlockAlign    []byte
	BitsPerSample []byte
}

type LIST struct {
	ChunkID  []byte
	size     []byte
	listType []byte
	data     []byte
}

type DATA struct {
	SubChunk2Id   []byte
	SubChunk2Size []byte
	data          []byte
}
Creating a function to help reading bytes
Although the encoding/binary library is very helpful for reading binary files, it does not provide a function that reads a given number N of bytes from a file.
For this I created a function that reads n bytes from an os.File and returns them.
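func readNBytes(file *os.File, n int) []byte {
	temp := make([]byte, n)
	// io.ReadFull fills the whole buffer or returns an error,
	// whereas a single file.Read may return fewer than n bytes.
	_, err := io.ReadFull(file, temp)
	if err != nil {
		panic(err)
	}
	return temp
}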
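One possible variation of this helper, sketched here as an assumption rather than the code from the repository, accepts any io.Reader and returns an error instead of panicking. It relies on io.ReadFull, which keeps reading until the buffer is full and reports an error if the input ends first. The names `readN` and `parseID` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readN reads exactly n bytes from r, or returns an error if r runs out.
func readN(r io.Reader, n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// parseID reads the 4-byte chunk ID found at the start of data.
func parseID(data []byte) (string, error) {
	id, err := readN(bytes.NewReader(data), 4)
	if err != nil {
		return "", err
	}
	return string(id), nil
}

func main() {
	id, err := parseID([]byte("RIFF\x24\x00\x00\x00WAVE"))
	fmt.Println(id, err) // RIFF <nil>

	// Only 2 bytes are available, so the read fails instead of
	// silently returning a half-filled buffer.
	_, err = parseID([]byte("RI"))
	fmt.Println(err) // unexpected EOF
}
```

Taking an io.Reader instead of an *os.File also makes the helper easy to try on in-memory data, as parseID does here.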
Reading and parsing a WAV file
Now we are going to read the file, and for this we use os.Open:
file, err := os.Open("audio.wav")
if err != nil {
	panic(err)
}
// Close the file when the surrounding function returns
defer file.Close()
To parse the file, we first create a variable for each structure and use the readNBytes function to read each field:
// RIFF Chunk
RIFFChunk := RIFF{}
RIFFChunk.ChunkID = readNBytes(file, 4)
RIFFChunk.ChunkSize = readNBytes(file, 4)
RIFFChunk.ChunkFormat = readNBytes(file, 4)
// FMT sub-chunk
FMTChunk := FMT{}
FMTChunk.SubChunk1ID = readNBytes(file, 4)
FMTChunk.SubChunk1Size = readNBytes(file, 4)
FMTChunk.AudioFormat = readNBytes(file, 2)
FMTChunk.NumChannels = readNBytes(file, 2)
FMTChunk.SampleRate = readNBytes(file, 4)
FMTChunk.ByteRate = readNBytes(file, 4)
FMTChunk.BlockAlign = readNBytes(file, 2)
FMTChunk.BitsPerSample = readNBytes(file, 2)
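// The next chunk is either the optional LIST chunk or the data chunk
subChunk := readNBytes(file, 4)
var listChunk *LIST
if string(subChunk) == "LIST" {
	listChunk = new(LIST)
	listChunk.ChunkID = subChunk
	listChunk.size = readNBytes(file, 4)
	listChunk.listType = readNBytes(file, 4)
	listChunk.data = readNBytes(file, int(binary.LittleEndian.Uint32(listChunk.size))-4)
	// Read the ID of the chunk that follows LIST
	subChunk = readNBytes(file, 4)
}
// Data sub-chunk: its ID has already been read into subChunk,
// so reading 4 more bytes here would consume the size field by mistake
data := DATA{}
data.SubChunk2Id = subChunk
data.SubChunk2Size = readNBytes(file, 4)
data.data = readNBytes(file, int(binary.LittleEndian.Uint32(data.SubChunk2Size)))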
One detail I would like to explain is the line that contains the code:
if string(subChunk) == "LIST"
This check is there because the LIST header is not part of the canonical specification of a WAVE file, so I test whether it exists. If it does, I fill in the struct; otherwise I skip it.
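A more general way to cope with optional chunks such as LIST is to walk the file chunk by chunk and skip everything until the wanted ID shows up. The sketch below assumes the reader is positioned right after the fmt chunk; `chunkHeader`, `findChunk` and `dataChunkSize` are hypothetical names, not part of the original program. It also accounts for the RIFF rule that a chunk with an odd size is followed by one pad byte.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// chunkHeader is the 8-byte header that starts every RIFF chunk.
type chunkHeader struct {
	ID   [4]byte
	Size uint32
}

// findChunk reads chunk headers from r until it finds one whose ID equals
// want, skipping the payload of every other chunk. It returns the payload
// size and leaves r positioned at the start of that payload.
func findChunk(r io.Reader, want string) (uint32, error) {
	for {
		var h chunkHeader
		if err := binary.Read(r, binary.LittleEndian, &h); err != nil {
			return 0, err
		}
		if string(h.ID[:]) == want {
			return h.Size, nil
		}
		// Chunks are word-aligned: an odd-sized payload has one pad byte.
		skip := int64(h.Size) + int64(h.Size%2)
		if _, err := io.CopyN(io.Discard, r, skip); err != nil {
			return 0, err
		}
	}
}

// dataChunkSize is a small wrapper for trying findChunk on in-memory bytes.
func dataChunkSize(b []byte) (uint32, error) {
	return findChunk(bytes.NewReader(b), "data")
}

func main() {
	// Made-up bytes that could follow the fmt chunk: a 5-byte LIST chunk
	// (plus its pad byte) and then a 4-byte data chunk.
	tail := []byte("LIST\x05\x00\x00\x00INFOx\x00data\x04\x00\x00\x00\x01\x02\x03\x04")
	size, err := dataChunkSize(tail)
	if err != nil {
		panic(err)
	}
	fmt.Println("data chunk size:", size) // data chunk size: 4
}
```

With this approach the parser does not need to know about LIST at all; any unknown chunk between fmt and data is skipped the same way.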
Printing the fields:
Although we didn't use the encoding/binary library for reading, it is very useful for printing. The tables above, which list the size and byte order of each field, tell us which fields are little-endian and which are big-endian.
To print the fields on the screen I created these 4 functions, one for each header type, which print each field according to its byte order:
func printRiff(rf RIFF) {
	fmt.Println("ChunkId: ", string(rf.ChunkID))
	fmt.Println("ChunkSize: ", binary.LittleEndian.Uint32(rf.ChunkSize)+8)
	fmt.Println("ChunkFormat: ", string(rf.ChunkFormat))
}

func printFMT(fm FMT) {
	fmt.Println("SubChunk1Id: ", string(fm.SubChunk1ID))
	fmt.Println("SubChunk1Size: ", binary.LittleEndian.Uint32(fm.SubChunk1Size))
	fmt.Println("AudioFormat: ", binary.LittleEndian.Uint16(fm.AudioFormat))
	fmt.Println("NumChannels: ", binary.LittleEndian.Uint16(fm.NumChannels))
	fmt.Println("SampleRate: ", binary.LittleEndian.Uint32(fm.SampleRate))
	fmt.Println("ByteRate: ", binary.LittleEndian.Uint32(fm.ByteRate))
	fmt.Println("BlockAlign: ", binary.LittleEndian.Uint16(fm.BlockAlign))
	fmt.Println("BitsPerSample: ", binary.LittleEndian.Uint16(fm.BitsPerSample))
}

func printLIST(list LIST) {
	fmt.Println("ChunkId: ", string(list.ChunkID))
	fmt.Println("size: ", binary.LittleEndian.Uint32(list.size))
	fmt.Println("listType: ", string(list.listType))
	fmt.Println("data: ", string(list.data))
}

func printData(data DATA) {
	fmt.Println("SubChunk2Id: ", string(data.SubChunk2Id))
	fmt.Println("SubChunk2Size: ", binary.LittleEndian.Uint32(data.SubChunk2Size))
	fmt.Println("data", data.data)
}
The ID fields are plain ASCII text. Their bytes are stored in the same order in which they are read, from "left to right" (which is why the tables list them as big-endian), so they can be converted to a string directly, without any byte-order conversion.
Optimizations
Although we didn't use the encoding/binary library to read the file in the example above, it is possible to use it to read files in a faster and more elegant, but not initially intuitive, way.
It has a Read function that lets you read values from an io.Reader directly into a struct. Although it sounds simple, binary.Read has 2 quirks:
- binary.Read requires the struct to be well defined, with the size and type of each field fixed in advance
- binary.Read requires you to pass it the byte order (big or little-endian)
With this in mind, we can improve the code this way.
Refactoring the structs
The first thing we need to do is declare the struct fields with fixed sizes wherever possible. Some fields have a variable size, so I will leave those as plain slices for now.
type RIFF struct {
	ChunkID     [4]byte
	ChunkSize   [4]byte
	ChunkFormat [4]byte
}

type FMT struct {
	SubChunk1ID   [4]byte
	SubChunk1Size [4]byte
	AudioFormat   [2]byte
	NumChannels   [2]byte
	SampleRate    [4]byte
	ByteRate      [4]byte
	BlockAlign    [2]byte
	BitsPerSample [2]byte
}

type LIST struct {
	ChunkID  [4]byte
	size     [4]byte
	listType [4]byte
	data     []byte
}

type DATA struct {
	SubChunk2Id   [4]byte
	SubChunk2Size [4]byte
	data          []byte
}
As noted above, the data fields of the LIST and DATA headers were left without a fixed size, so we will deal with them in another way later on.
Making the print functions belong to the structs and not to the package
The next step is to attach the print functions to their respective structs, so that they are easier to call later:
func (r RIFF) print() {
	fmt.Println("ChunkId: ", string(r.ChunkID[:]))
	fmt.Println("ChunkSize: ", binary.LittleEndian.Uint32(r.ChunkSize[:])+8)
	fmt.Println("ChunkFormat: ", string(r.ChunkFormat[:]))
	fmt.Println()
}

func (fm FMT) print() {
	fmt.Println("SubChunk1Id: ", string(fm.SubChunk1ID[:]))
	fmt.Println("SubChunk1Size: ", binary.LittleEndian.Uint32(fm.SubChunk1Size[:]))
	fmt.Println("AudioFormat: ", binary.LittleEndian.Uint16(fm.AudioFormat[:]))
	fmt.Println("NumChannels: ", binary.LittleEndian.Uint16(fm.NumChannels[:]))
	fmt.Println("SampleRate: ", binary.LittleEndian.Uint32(fm.SampleRate[:]))
	fmt.Println("ByteRate: ", binary.LittleEndian.Uint32(fm.ByteRate[:]))
	fmt.Println("BlockAlign: ", binary.LittleEndian.Uint16(fm.BlockAlign[:]))
	fmt.Println("BitsPerSample: ", binary.LittleEndian.Uint16(fm.BitsPerSample[:]))
	fmt.Println()
}

func (list LIST) print() {
	fmt.Println("ChunkId: ", string(list.ChunkID[:]))
	fmt.Println("size: ", binary.LittleEndian.Uint32(list.size[:]))
	fmt.Println("listType: ", string(list.listType[:]))
	fmt.Println("data: ", string(list.data))
	fmt.Println()
}
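func (data DATA) print() {
	fmt.Println("SubChunk2Id: ", string(data.SubChunk2Id[:]))
	// The size is stored little-endian, like the other numeric fields
	fmt.Println("SubChunk2Size: ", binary.LittleEndian.Uint32(data.SubChunk2Size[:]))
	// Print at most 100 bytes so that short files do not cause a panic
	n := len(data.data)
	if n > 100 {
		n = 100
	}
	fmt.Println("first 100 bytes", data.data[:n])
	fmt.Println()
}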
From now on you can print a header just by calling the print() method on its struct.
Reading structs with defined field sizes
With the structs well defined, we can read them with the Read function of the encoding/binary package:
func Read(r io.Reader, order ByteOrder, data any) error
This Read function expects a data stream (such as a file), the byte order (big or little), and a pointer to where the data should be stored.
If the destination is a struct with fixed-size fields, Read walks through the struct field by field and fills each one with as many bytes as its type requires.
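// RIFF Chunk (every field is a byte array, so the byte order passed here has no effect)
RIFFChunk := RIFF{}
if err := binary.Read(file, binary.BigEndian, &RIFFChunk); err != nil {
	panic(err)
}
// FMT sub-chunk
FMTChunk := FMT{}
if err := binary.Read(file, binary.BigEndian, &FMTChunk); err != nil {
	panic(err)
}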
Structs that contain slices, such as LIST and DATA, cannot be read this way: binary.Read needs to know the size of every field in advance, and a slice inside a struct has no fixed size, so the call returns an error.
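A further option, sketched here as an assumption and not taken from the repository, is to declare the numeric fields as uint16 and uint32 instead of byte arrays. binary.Read then decodes them using the byte order it is given, so no later call to binary.LittleEndian.Uint32 is needed. The names `fmtChunk` and `parseFmt` and the sample bytes are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// fmtChunk mirrors the FMT header with integer fields. The fields must be
// exported so that binary.Read can set them.
type fmtChunk struct {
	SubChunk1ID   [4]byte
	SubChunk1Size uint32
	AudioFormat   uint16
	NumChannels   uint16
	SampleRate    uint32
	ByteRate      uint32
	BlockAlign    uint16
	BitsPerSample uint16
}

// parseFmt decodes a 24-byte fmt chunk held in memory.
func parseFmt(b []byte) (fmtChunk, error) {
	var c fmtChunk
	err := binary.Read(bytes.NewReader(b), binary.LittleEndian, &c)
	return c, err
}

func main() {
	raw := []byte{
		'f', 'm', 't', ' ', // SubChunk1ID
		0x10, 0x00, 0x00, 0x00, // SubChunk1Size = 16
		0x01, 0x00, // AudioFormat = 1 (PCM)
		0x02, 0x00, // NumChannels = 2
		0x44, 0xAC, 0x00, 0x00, // SampleRate = 44100
		0x10, 0xB1, 0x02, 0x00, // ByteRate = 176400
		0x04, 0x00, // BlockAlign = 4
		0x10, 0x00, // BitsPerSample = 16
	}
	c, err := parseFmt(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("SampleRate:", c.SampleRate)
	fmt.Println("ByteRate:", c.ByteRate)
}
```

The byte order only affects the integer fields; the 4-byte ID array is copied as-is, so a single little-endian read covers the whole header.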
Reading structs with undefined fields
One of the simplest ways to read fields of dynamic size is to read them after figuring out their size. To do this, I have created methods on the LIST and DATA structs called read() that handle this reading.
func (list *LIST) read(file *os.File) {
	// Peek at the next 4 bytes, then rewind so the chunk can be read from its start
	listCondition := make([]byte, 4)
	file.Read(listCondition)
	file.Seek(-4, io.SeekCurrent)
	// No LIST chunk at this position: leave the struct empty
	if string(listCondition) != "LIST" {
		return
	}
	binary.Read(file, binary.BigEndian, &list.ChunkID)
	binary.Read(file, binary.BigEndian, &list.size)
	binary.Read(file, binary.BigEndian, &list.listType)
	// size counts listType (4 bytes) plus the data, so the data is size-4 bytes long
	list.data = make([]byte, binary.LittleEndian.Uint32(list.size[:])-4)
	binary.Read(file, binary.BigEndian, &list.data)
}

func (data *DATA) read(file *os.File) {
	binary.Read(file, binary.BigEndian, &data.SubChunk2Id)
	binary.Read(file, binary.BigEndian, &data.SubChunk2Size)
	// Allocate the slice with the size just read, so binary.Read knows how much to read
	data.data = make([]byte, binary.LittleEndian.Uint32(data.SubChunk2Size[:]))
	binary.Read(file, binary.BigEndian, &data.data)
}
In the read method of LIST, I first check whether the next 4 bytes contain the string "LIST", which is what identifies the header. If they do, I continue; otherwise I return. After this check I read the first 3 fields separately using binary.Read(), then use the size field I just read to allocate the variable-size field with the right length before reading it.
Having done all this, you have a simple program that can read and interpret the data from a .wav file.
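As a possible next step, the raw bytes of the data chunk can be turned into actual sample values. The sketch below assumes BitsPerSample is 16, in which case PCM samples are signed little-endian integers (interleaved by channel for stereo files). `decodePCM16` is a hypothetical helper, not part of the original program.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodePCM16 converts the raw bytes of a data chunk into signed 16-bit
// samples. It assumes BitsPerSample is 16; a trailing odd byte is ignored.
func decodePCM16(raw []byte) []int16 {
	samples := make([]int16, len(raw)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(raw[2*i:]))
	}
	return samples
}

func main() {
	// Four made-up samples: zero, the maximum, the minimum and -1.
	raw := []byte{0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80, 0xFF, 0xFF}
	fmt.Println(decodePCM16(raw)) // [0 32767 -32768 -1]
}
```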