What's happening?
This started around a year back, when a small speaker I had broke out of nowhere. It just wouldn't connect anymore.
I used it to listen to music every night before going to sleep (not sure if anybody else does the same, but it's one of my only fixed routines since childhood :D).
The idea
Okay, so what's the idea?
I had a thought: why not build something resembling a speaker? I had never worked on a project that involved audio and all that, but I knew what I'd use to implement it: WebSocket (I'll explain why I chose it in a moment).
And I did start on it.
The irony is that I had just finished Tour of Go, Let's Go, Let's Go Further, and had just started on 100 Go Mistakes and How to Avoid Them.
So... I had to do this in Golang.
I had a pretty good grasp of Golang by then, so it had to be done with it. I also wanted to get away from the JavaScript ecosystem (the same Node.js, React, Next.js, yada yada was just too much).
Why WebSocket
So, let's come to the plan. Why WebSocket?
The honest reason is that WebSocket is what came to mind the moment I had the idea, before I'd even started coding the project.
And WebSocket kind of makes sense as well. This was my thought process.
- Lets one device open multiple connections? Yes
- Stays synchronized enough when streaming to multiple devices on a LAN? Yes
- Something I've worked with a lot? Totally yes
That's pretty much the reason I chose WebSocket. It might not be the best option, but it's what I had and something I was confident with when I started.
One more thing I'll add here: since I'm streaming audio frames sequentially (PCM chunks, one after another), I actually need them to arrive in order.
WebSocket runs over TCP, so that's handled for me. I don't have to think about it. That alone made it feel like the right call.
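If you're curious, here's roughly what that send loop looks like in Go. This is just a sketch using gorilla/websocket, not the actual gophercast code:

```go
package main

import (
	"io"

	"github.com/gorilla/websocket"
)

// streamPCM reads fixed-size PCM chunks and sends each one as a binary
// WebSocket frame. Because WebSocket runs over TCP, frames arrive at the
// client in exactly this order (or the connection errors out), so there
// is no reordering to handle on the receiving side.
func streamPCM(conn *websocket.Conn, pcm io.Reader) error {
	buf := make([]byte, 4096) // chunk size is arbitrary here
	for {
		n, err := io.ReadFull(pcm, buf)
		if n > 0 {
			if werr := conn.WriteMessage(websocket.BinaryMessage, buf[:n]); werr != nil {
				return werr
			}
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil // stream finished
		}
		if err != nil {
			return err
		}
	}
}
```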
Why not WebRTC or UDP?
Let me tell you something pretty frankly: I didn't even know WebRTC existed. I came across the term halfway through building the project.
So, why not UDP?
Okay, so UDP gives you no delivery guarantees: there's no guarantee that every packet arrives, and more importantly, no guarantee they arrive in order. For something like video calls or games, that's totally fine. You drop a frame, move on, nobody cares.
But for audio? If a chunk goes missing or arrives out of order, you either get silence or a glitch.
And since I'm streaming raw PCM (basically just a stream of bytes that represent sound), every single chunk matters. You can't just skip them.
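To give a sense of how dense that stream is, the math is simple. I'm assuming CD-quality PCM here (44.1 kHz, stereo, 16-bit), which is a common format, not necessarily exactly what's used everywhere in the project:

```go
package main

import "fmt"

func main() {
	// 44,100 samples/s * 2 channels * 2 bytes per 16-bit sample.
	const bytesPerSecond = 44100 * 2 * 2
	fmt.Println(bytesPerSecond, "bytes every second") // 176400 bytes every second
}
```

That's roughly 176 KB of raw bytes every second, and every one of them is sound.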
So UDP was out as well.
And WebRTC? It's mostly built around browser-to-browser, peer-to-peer stuff. My setup is a server broadcasting to multiple clients over LAN, which isn't really what WebRTC is designed for. So even if I'd known about it from day one, I'm not sure it would've been the right fit anyway.
WebSocket was fine. It worked. And sometimes that's enough. I didn't want to overengineer.
👀 I learned terms like PCM and WebRTC during the build, so I might get something wrong here and there. I'm not deeply familiar with them yet. Just call me out in the comments if so.
The Architecture
Okay, so the high level is pretty simple.
There's a server and there are clients. The server is where the music is stored, and the clients are the devices that play it (could be the same device).
Here's what actually happens when you press play:
The server takes the MP3 file, decodes it into raw PCM (basically just bytes of sound data), and starts broadcasting those bytes over WebSocket to every connected client. No client-side decoding. The server does all of that.
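Here's a rough sketch of that decode-and-broadcast loop. I'm using github.com/hajimehoshi/go-mp3 and gorilla/websocket as stand-ins; treat it as the shape of the idea, not the repo's actual code:

```go
package main

import (
	"io"
	"log"
	"os"

	"github.com/gorilla/websocket"
	mp3 "github.com/hajimehoshi/go-mp3"
)

// broadcastFile decodes an MP3 to raw PCM on the server and fans the
// bytes out to every connected client. Clients never touch the MP3.
func broadcastFile(path string, clients []*websocket.Conn) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	dec, err := mp3.NewDecoder(f) // emits 16-bit little-endian stereo PCM
	if err != nil {
		return err
	}
	log.Printf("decoding at %d Hz", dec.SampleRate())

	buf := make([]byte, 4096)
	for {
		n, err := dec.Read(buf)
		if n > 0 {
			for _, c := range clients {
				if werr := c.WriteMessage(websocket.BinaryMessage, buf[:n]); werr != nil {
					log.Printf("client dropped: %v", werr)
				}
			}
		}
		if err == io.EOF {
			return nil // track finished
		}
		if err != nil {
			return err
		}
	}
}
```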
Here's the high-level architecture:
The tricky part is sync. If you just start streaming, each client will start playing at slightly different times, and it'll sound like a little echo.
So what I did is, before playback starts, the server sends a timestamp to all clients saying, "Start playing at exactly this moment in time." Every client gets that timestamp, buffers the audio frames, and waits. When the clock hits that time, everyone starts together.
It works because on a LAN with NTP, all the device clocks are usually within a millisecond or two of each other. Close enough that you can't tell the difference (much).
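In code, the client side of that handshake can be as simple as this (the message shape here is hypothetical; the real protocol in the repo may look different):

```go
package main

import "time"

// startMsg is the "start playing at exactly this moment" message the
// server broadcasts before streaming. (Hypothetical shape for the sketch.)
type startMsg struct {
	StartAt time.Time `json:"start_at"`
}

// waitForStart blocks until the shared start time. The client keeps
// buffering incoming PCM frames in the meantime, then starts feeding the
// audio device the moment this returns. Because all the LAN clocks are
// NTP-synced, everyone wakes up within a millisecond or two of each other.
func waitForStart(msg startMsg) {
	if d := time.Until(msg.StartAt); d > 0 {
		time.Sleep(d)
	}
}
```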
That's the whole thing, honestly.
Server decodes -> broadcasts -> clients sync -> play
What I still don't know
Okay, so during the build, I ran into something I didn't fully understand.
Turns out your speaker doesn't play audio the exact moment you write data to it. The data spends some time sitting in a system buffer before it actually comes out, and that delay varies by OS and audio stack. On Linux, it seems to be around 50ms.
Honestly, I had no idea this was something you need to account for. This part was debugged entirely with GPT-5.4. There's a hardcoded 50ms constant in the code that gets factored in when clients actually start playing.
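As far as I understand it, the compensation just shifts the schedule: start pushing bytes to the device a little early so the sound actually comes out at the agreed moment. Something like this (a sketch of the idea, not the exact code):

```go
package main

import "time"

// The OS buffers audio before it reaches the speaker, so writes have to
// begin early for sound to exit on time. 50ms is the hardcoded guess
// (roughly right on Linux; other platforms may differ).
const assumedOutputLatency = 50 * time.Millisecond

// writeStart returns when the client should begin writing PCM so the
// first sample leaves the speaker at startAt.
func writeStart(startAt time.Time) time.Time {
	return startAt.Add(-assumedOutputLatency)
}
```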
There's also something in the code that keeps checking, roughly every second, whether the audio playing on your device is slightly ahead or behind.
If it is, it quietly adds or removes a tiny bit of audio to bring it back in line. So small you'd never hear it.
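I can at least sketch the idea, even if I can't fully explain the numbers. Roughly once a second, compare where playback should be (by the wall clock) against how much audio has actually been written, then trim or pad a few milliseconds' worth of samples to close the gap. All the names here are hypothetical, not gophercast's actual code:

```go
package main

import "time"

const bytesPerSecond = 176400 // 44.1 kHz * 2 channels * 2 bytes (assumed format)

// nudge runs about once a second. It compares how many PCM bytes should
// have played since start (by the wall clock) with how many were actually
// written, then pads or trims ~10ms of audio to close the gap. That
// little is inaudible.
func nudge(start time.Time, bytesWritten int64, next []byte) []byte {
	expected := int64(time.Since(start).Seconds() * bytesPerSecond)
	drift := bytesWritten - expected

	const step = bytesPerSecond / 100 // ~10ms; 1764 bytes, stays frame-aligned
	switch {
	case drift > step: // track is ahead of the clock: insert a sliver of silence
		return append(make([]byte, step), next...)
	case drift < -step && len(next) > step: // behind: drop a sliver to catch up
		return next[step:]
	default:
		return next
	}
}
```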
Both of these kind of work. I tested it. I just couldn't tell you exactly why the numbers are what they are.
How I usually run it
Since my plan is to use my whole laptop as a "speaker", I usually have the server and client on the same system (my personal laptop).
And I usually have it like this:
- Start the server:
gophercast serve # now follow the TUI setup....
- Connect clients (each in a tmux split). Usually, I have around 2 clients when running on the same machine. More than 2 kind of distorts the audio quality.
gophercast play --host <ip_from_serve> --port 8080 --name "client 1"
gophercast play --host <ip_from_serve> --port 8080 --name "client 2"
The steps are the same when connecting from multiple machines. Just make sure that all of them are connected to the same LAN.
Here's a quick demo on a single machine, working as the speaker.
(Ignore the audio quality)
My take
I'm pretty happy with how this worked out.
This was my very first time working with Charm's Bubble Tea and with audio in general.
I started this project just because I wanted to DIY. It was just the perfect time: my speaker broke, and I had somehow finished learning basic Go programming.
There are probably still a ton of bugs. I've tried to test most of it. I just picked the tool I knew and figured it out as I went.
It plays music across multiple devices in sync, or the same one if you connect through multiple terminals. My laptop, another computer, whatever's connected, all playing together.
That's what I wanted. That's what it does.
(I haven't tested how well it works on Mac, so I can't tell there. Also, I'm not sure Windows will work either due to oto's limitation and our hardcoded 50ms delay.)
You can find the repo here: shricodev/gophercast