DEV Community

Nigisa Yamada
Building a Browser-Based Focus Timer with Web Audio API and Picture-in-Picture

I built Pomoria, a browser-based focus timer that combines custom work and break sequences, ambient sound generation, notes, and Picture-in-Picture.

At first glance, it may look like another Pomodoro timer. But the part I found most interesting was not the countdown itself. It was building a small focus environment directly inside the browser.

Most Pomodoro apps are based on a fixed 25/5 loop: 25 minutes of work, then a 5-minute break. That works for some tasks, but not for everything. Studying for an exam, writing, coding, or reviewing a long project often follows a different rhythm.

For example, a study session might look like this:

setup
60-minute attempt
short break
review
correction
cooldown

I wanted a timer where each step could have its own duration, notes, and sound context.
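One way to model such a sequence is a plain list of steps plus a function that maps elapsed time to the active step. This is a minimal sketch, not Pomoria's actual code: the `Step` fields, the `positionAt` helper, and every duration except the 60-minute attempt are assumptions made for illustration.

```typescript
// Hypothetical shape of one step in a session (field names are assumptions).
interface Step {
  label: string;
  minutes: number;
  note?: string;  // free-form note shown during the step
  sound?: string; // illustrative sound preset name, e.g. "brown-noise"
}

interface Position {
  index: number;            // index of the active step
  step: Step;               // the active step itself
  remainingSeconds: number; // time left in the active step
}

// Returns the active step for a given elapsed time,
// or null once the whole sequence has finished.
function positionAt(steps: Step[], elapsedSeconds: number): Position | null {
  let start = 0;
  for (let index = 0; index < steps.length; index++) {
    const step = steps[index];
    const end = start + step.minutes * 60;
    if (elapsedSeconds < end) {
      return { index, step, remainingSeconds: end - elapsedSeconds };
    }
    start = end;
  }
  return null;
}

// The study session from the list above, with made-up durations.
const examSession: Step[] = [
  { label: "setup", minutes: 5 },
  { label: "attempt", minutes: 60, sound: "brown-noise" },
  { label: "short break", minutes: 10 },
  { label: "review", minutes: 20, note: "mark every wrong answer" },
  { label: "correction", minutes: 15 },
  { label: "cooldown", minutes: 5 },
];
```

Deriving the position from elapsed time, rather than decrementing a counter on every tick, keeps the timer correct even if the tab is throttled in the background.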

The audio engine

The core audio system is built with the Web Audio API.

Instead of only playing background audio files, Pomoria generates and mixes several sound layers locally in the browser:

white noise
pink noise
brown noise
rain-like ambience
binaural tones
metronome sounds
premium soundscape loops
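As an example of how small some of these layers can be, a binaural tone comes down to two sine tones, one per ear, whose frequencies differ by the desired beat rate. The sketch below only computes that pair. The `binauralPair` name and the 200 Hz / 10 Hz values are illustrative assumptions, not settings taken from Pomoria.

```typescript
interface BinauralPair {
  leftHz: number;
  rightHz: number;
}

// Splits a carrier frequency into two ear-specific tones whose
// difference equals the requested beat frequency.
function binauralPair(carrierHz: number, beatHz: number): BinauralPair {
  if (carrierHz <= 0 || beatHz < 0 || beatHz >= 2 * carrierHz) {
    throw new RangeError("carrier must be positive and beat must be in [0, 2 * carrier)");
  }
  return {
    leftHz: carrierHz - beatHz / 2,
    rightHz: carrierHz + beatHz / 2,
  };
}

// In a browser, each frequency could drive its own OscillatorNode,
// panned hard left and hard right with a StereoPannerNode (pan = -1 and 1).
const focusTone = binauralPair(200, 10); // illustrative values only
```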

The app creates an AudioContext and builds a simple mastering chain with gain, EQ, compression, an analyser, and output nodes.

Conceptually, the audio graph looks like this:

sound layers
-> atmosphere / binaural / generative buses
-> master gain
-> EQ
-> compressor
-> output gain
-> analyser
-> speakers

This makes it possible to control the whole sound environment from one place while still letting each layer behave independently.
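The master part of that chain can be sketched as a series connection. The helper below is a hypothetical illustration: `Connectable` is a structural stand-in for `AudioNode`, so the snippet also runs outside a browser, and the commented usage shows how the real Web Audio nodes might be passed in. Pomoria's actual node setup is not shown in this post.

```typescript
// Minimal structural view of an AudioNode: connect() takes a destination.
interface Connectable {
  connect(destination: Connectable): unknown;
}

// Connects nodes in series (a -> b -> c ...) and returns the last node.
function connectInSeries<T extends Connectable>(nodes: T[]): T {
  if (nodes.length === 0) {
    throw new Error("need at least one node");
  }
  for (let i = 0; i < nodes.length - 1; i++) {
    nodes[i].connect(nodes[i + 1]);
  }
  return nodes[nodes.length - 1];
}

// Possible browser usage (assumed wiring, mirroring the diagram above):
//
//   const ctx = new AudioContext();
//   const master = ctx.createGain();
//   const eq = ctx.createBiquadFilter();
//   const compressor = ctx.createDynamicsCompressor();
//   const output = ctx.createGain();
//   const analyser = ctx.createAnalyser();
//   connectInSeries<AudioNode>([
//     master, eq, compressor, output, analyser, ctx.destination,
//   ]);
//
// Each sound layer or bus would then connect to `master`.
```

Because an `AnalyserNode` passes audio through unchanged, it can sit in the signal path just before the speakers and still feed a visualizer.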

Noise colors

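White noise is generated by filling an audio buffer with random sample values between -1 and 1.

Pink noise and brown noise need more shaping. Pink noise loses roughly 3 dB of power per octave and brown noise roughly 6 dB per octave, so both have to be derived from a white source by filtering or accumulating it.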

Pink noise feels softer and more balanced than white noise. Brown noise is deeper and warmer. This difference matters because in a focus app, sound is not just decoration. It changes how aggressive or calm the work environment feels.

I wanted users to be able to choose the kind of masking that fits the session.
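The three colors can be sketched as functions that fill a `Float32Array`. This post does not say which algorithms Pomoria uses, so the following are stand-ins: the pink generator uses the filter coefficients commonly attributed to Paul Kellet's approximation, and the brown generator is a simple leaky integrator. The function names and the injectable `rng` parameter are assumptions added so the output can be tested.

```typescript
// Random source returning values in [0, 1), like Math.random.
type Rng = () => number;

// White noise: independent uniform samples in [-1, 1).
function whiteNoise(length: number, rng: Rng = Math.random): Float32Array {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    out[i] = rng() * 2 - 1;
  }
  return out;
}

// Pink noise: white noise run through a bank of one-pole filters
// (coefficients as commonly quoted for Paul Kellet's approximation).
function pinkNoise(length: number, rng: Rng = Math.random): Float32Array {
  const out = new Float32Array(length);
  let b0 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, b5 = 0, b6 = 0;
  for (let i = 0; i < length; i++) {
    const white = rng() * 2 - 1;
    b0 = 0.99886 * b0 + white * 0.0555179;
    b1 = 0.99332 * b1 + white * 0.0750759;
    b2 = 0.969 * b2 + white * 0.153852;
    b3 = 0.8665 * b3 + white * 0.3104856;
    b4 = 0.55 * b4 + white * 0.5329522;
    b5 = -0.7616 * b5 - white * 0.016898;
    // Scale down so typical output stays near the [-1, 1] range.
    out[i] = (b0 + b1 + b2 + b3 + b4 + b5 + b6 + white * 0.5362) * 0.11;
    b6 = white * 0.115926;
  }
  return out;
}

// Brown noise: a leaky integrator over white noise, with makeup gain.
function brownNoise(length: number, rng: Rng = Math.random): Float32Array {
  const out = new Float32Array(length);
  let last = 0;
  for (let i = 0; i < length; i++) {
    const white = rng() * 2 - 1;
    last = (last + 0.02 * white) / 1.02;
    out[i] = last * 3.5;
  }
  return out;
}

// In a browser, the samples could be copied into an AudioBuffer with
// buffer.copyToChannel(samples, 0) and played by a looping
// AudioBufferSourceNode.
```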

Smooth transitions

One thing I learned quickly: sudden audio changes are distracting.

So volume changes are scheduled gradually instead of jumping immediately. For longer soundscape loops, I also used crossfading so the loop does not restart in an obvious way.
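Both ideas can be sketched in a few lines. `fadeTo` schedules a linear ramp on a gain parameter, and `crossfadeGains` returns an equal-power gain pair for blending the end of a loop into its start. These are assumptions about one reasonable approach, not Pomoria's actual code: the post does not say which ramp shape or crossfade curve it uses, and `ParamLike` is a structural subset of `AudioParam` so the snippet runs outside a browser.

```typescript
// Structural subset of AudioParam used by this sketch.
interface ParamLike {
  value: number;
  cancelScheduledValues(time: number): unknown;
  setValueAtTime(value: number, time: number): unknown;
  linearRampToValueAtTime(value: number, endTime: number): unknown;
}

// Ramps a gain-like parameter to `target` over `seconds`, starting at `now`
// (in a browser, `now` would be audioContext.currentTime).
function fadeTo(param: ParamLike, target: number, now: number, seconds: number): void {
  param.cancelScheduledValues(now);        // drop any pending automation
  param.setValueAtTime(param.value, now);  // pin the ramp's starting point
  param.linearRampToValueAtTime(target, now + seconds);
}

// Equal-power crossfade gains for a position t in [0, 1]:
// t = 0 is all outgoing, t = 1 is all incoming.
function crossfadeGains(t: number): { outgoing: number; incoming: number } {
  const x = Math.min(1, Math.max(0, t));
  return {
    outgoing: Math.cos((x * Math.PI) / 2),
    incoming: Math.sin((x * Math.PI) / 2),
  };
}
```

With an equal-power curve, the squared gains always sum to 1, which avoids the audible dip in loudness that a straight linear crossfade tends to produce at the midpoint.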

This is a small detail, but it matters a lot for an app that is supposed to help people stay focused.

Picture-in-Picture without a real video

Another feature I wanted was a timer that stays visible while working in another tab or app.

To do this, Pomoria draws the timer interface to a canvas, turns that canvas into a video source, and sends the video element into Picture-in-Picture.

So the PiP window is not a normal video. It is a live timer view.

It can show the current step, remaining time, visual progress, and different timer styles.
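The plumbing can be sketched as follows. This is an illustration under assumptions, not Pomoria's source: `CanvasLike` and `VideoLike` are structural stand-ins for `HTMLCanvasElement` and `HTMLVideoElement`, the 30 fps capture rate is arbitrary, and `formatRemaining` is a hypothetical helper for the text drawn onto the canvas.

```typescript
// Formats a remaining time as MM:SS, rounding partial seconds up.
function formatRemaining(totalSeconds: number): string {
  const pad2 = (n: number): string => (n < 10 ? "0" : "") + n;
  const s = Math.max(0, Math.ceil(totalSeconds));
  return pad2(Math.floor(s / 60)) + ":" + pad2(s % 60);
}

// Structural subsets of the DOM elements involved.
interface CanvasLike {
  captureStream(frameRate?: number): unknown;
}
interface VideoLike {
  srcObject: unknown;
  muted: boolean;
  play(): Promise<void>;
  requestPictureInPicture(): Promise<unknown>;
}

// Turns a canvas into a live video source and sends it to Picture-in-Picture.
// Browsers require this to run in response to a user gesture such as a click.
async function openTimerPip(canvas: CanvasLike, video: VideoLike): Promise<void> {
  video.srcObject = canvas.captureStream(30); // live stream of canvas frames
  video.muted = true;                         // the stream carries no audio
  await video.play();                         // the video needs to be playing
  await video.requestPictureInPicture();
}

// Possible browser usage (element lookups are assumptions):
//
//   button.addEventListener("click", () => {
//     void openTimerPip(canvasElement, videoElement);
//   });
//
// The app then keeps redrawing the canvas, for example with
// ctx2d.fillText(formatRemaining(secondsLeft), x, y), and the PiP
// window follows along.
```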

I also connected the timer state to the Media Session API, so play and pause controls can reflect the session state.
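A sketch of that connection, assuming a timer object with `start`, `pause`, and `isRunning` methods (hypothetical names). `MediaSessionLike` is a structural subset of `navigator.mediaSession`, covering only `setActionHandler` and `playbackState`; the post does not show how Pomoria wires this up.

```typescript
// Structural subset of the MediaSession interface.
interface MediaSessionLike {
  playbackState: string;
  setActionHandler(action: string, handler: (() => void) | null): void;
}

// Hypothetical timer interface; not Pomoria's actual API.
interface TimerControls {
  isRunning(): boolean;
  start(): void;
  pause(): void;
}

// Routes media play/pause actions to the timer and mirrors the timer
// state back into the session. Returns a function that re-syncs the
// state, to be called whenever the timer changes for other reasons.
function bindMediaSession(session: MediaSessionLike, timer: TimerControls): () => void {
  const sync = (): void => {
    session.playbackState = timer.isRunning() ? "playing" : "paused";
  };
  session.setActionHandler("play", () => {
    timer.start();
    sync();
  });
  session.setActionHandler("pause", () => {
    timer.pause();
    sync();
  });
  sync();
  return sync;
}

// Possible browser usage:
//
//   if ("mediaSession" in navigator) {
//     bindMediaSession(navigator.mediaSession, timer);
//   }
```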

Why build it this way?

Because I think a focus timer is not only about measuring time.

It is also about rhythm, sound, visibility, and flow.

A normal timer asks: how much time is left?

I wanted Pomoria to ask a slightly different question: what kind of environment helps you stay with the work?

That is why I describe it as a focus environment rather than just a Pomodoro timer.

What I learned

The Web Audio API is powerful enough for lightweight generative ambience.

Picture-in-Picture can be useful even when the app is not a video app.

Browser autoplay restrictions force you to design around user interaction.

Small UX details, like fade-ins and persistent visibility, matter a lot in productivity tools.

A timer becomes more interesting when time, sound, notes, and session structure are designed together.
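On the autoplay point: browsers typically keep an `AudioContext` in the `suspended` state until the user interacts with the page, so a common pattern is to resume it on the first gesture. The sketch below shows that pattern with structural stand-ins (`ContextLike`, `TargetLike`) for `AudioContext` and an event target. It is one possible approach, not a description of Pomoria's code.

```typescript
// Structural subsets of AudioContext and EventTarget.
interface ContextLike {
  state: string;
  resume(): Promise<void>;
}
interface TargetLike {
  addEventListener(
    type: string,
    listener: () => void,
    options?: { once?: boolean }
  ): void;
}

// Resumes a suspended audio context on the first pointer or key event.
function unlockAudioOnGesture(ctx: ContextLike, target: TargetLike): void {
  const unlock = (): void => {
    if (ctx.state === "suspended") {
      void ctx.resume();
    }
  };
  target.addEventListener("pointerdown", unlock, { once: true });
  target.addEventListener("keydown", unlock, { once: true });
}

// Possible browser usage:
//
//   const ctx = new AudioContext();
//   unlockAudioOnGesture(ctx, document);
```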

Try it

Pomoria is live here:

https://pomoria.app

I would love feedback, especially on the audio controls, the Picture-in-Picture experience, and whether “focus environment” makes sense as positioning.

Top comments (1)

Harjot Singh

Using Picture-in-Picture for a focus timer is a genuinely clever trick - it's the one way to keep a small always-on-top element visible without a native app or a browser extension, so the timer floats over whatever you're working in. Most people only think of PiP for video, so repurposing it as a persistent UI surface is the kind of platform-API creativity that makes a web app feel native. Pairing it with Web Audio (real scheduled tones instead of a janky setTimeout beep) shows you went after the details that separate a toy from something pleasant to actually use daily.

The thing I appreciate here is it's a great example of "the browser can do more than people use it for" - reaching for the right platform primitive instead of reaching for a framework or a dependency. That instinct (use what the platform gives you, keep it lean) is the same one I value in how I build Moonshift, the thing I work on - a multi-agent pipeline that takes a prompt to a deployed SaaS, favoring the right primitive over bloat. Multi-model routing keeps a build ~$3 flat, first run free no card. Really nice little build. Did PiP fight you on the rendering (you have to draw to a canvas/video to put arbitrary UI in the PiP window, right)? That canvas-to-PiP plumbing is the part I'd expect to be fiddlier than it looks.