The circle fills and pulses in sync with the audio — this is what your phone is feeling. The GIF shows it, but you won't really get it until you feel it. Open this on Android and try it yourself →
Other links:
View on GitHub
View on npm
The haptics landscape
Native platforms have solid haptics support, and if haptics are the core of your product, the native APIs are worth learning. But very few apps make haptics the focus; for most, they're an addition that polishes the UX. While the native APIs on iOS and Android can create a more refined experience, they come with their own constraints.
iOS Core Haptics lets you author precise AHAP files — you define the exact timing, intensity, and sharpness of every pulse. That level of control is what makes native iOS haptics feel polished, both in the API and in what the user actually feels. The Taptic Engine is high-quality hardware, and Core Haptics is built to take full advantage of it — the result, when authored well, is haptics that feel genuinely premium. The trade-off is that it's entirely manual: deriving patterns from audio isn't something the API does for you, so syncing haptics to arbitrary audio means authoring by hand and re-authoring whenever the audio changes.
Android 12+ ships HapticGenerator — hardware-level automatic analysis, no code required. The HAL derives vibration patterns from audio directly, and the timing is exact. It's the most capable approach to audio-driven haptics that exists. It's also native-only.
A few other gotchas worth knowing. Cross-platform coverage means two separate native implementations — though frameworks like Expo partially address this. expo-haptics gives you a unified JS API that maps to the right native backend under the hood, which is a genuine improvement. The catch is that it only exposes preset haptic types: impact (light/medium/heavy), notification (success/warning/error), selection. It's designed for UI feedback — a tap, a confirmation, an error state — not audio analysis. If you want haptics to sync with what's actually playing, you'd still be manually triggering calls based on audio events, which is back to the same hand-authored timing problem. Audio-derived pattern analysis isn't in scope for any of these APIs. Beyond that: any audio change on iOS means re-authoring AHAP files from scratch, and every tweak ships through the app store review cycle.
The web gets navigator.vibrate(pattern). You pass an array of millisecond durations alternating between vibrate and pause — [200, 100, 200] means "on 200ms, off 100ms, on 200ms." The motor fires at full power for each on-duration. No amplitude parameter, no intensity control, no automatic analysis. If you want haptics to match specific moments in the audio, you write that pattern array yourself.
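The whole API surface fits in a couple of lines:

```js
// Durations alternate vibrate/pause; the motor runs at full power each "on"
navigator.vibrate([200, 100, 200]); // on 200ms, off 100ms, on 200ms

// A single number is shorthand for one on-duration
navigator.vibrate(200);

// Zero (or an empty array) cancels any ongoing pattern
navigator.vibrate(0);
```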
None of that is a criticism of the web platform — navigator.vibrate does exactly what it says. The gap is that there's no equivalent of HapticGenerator for the web: nothing that takes audio and derives a pattern automatically. That gap is what this library fills.
A few places where it comes up:
Landing pages and product launches — your Show HN link, Product Hunt page, or landing page opens in a mobile browser. There's no app to install. If you want haptics, you're building them yourself from timing arrays.
Web games — browser games already have audio: explosions, impacts, pickups. The audio element is already there. Without a library, syncing haptics to it means manually mapping game events to vibrate() calls and maintaining that mapping every time the audio changes.
PWAs — technically web, live on the home screen, run in the browser engine. navigator.vibrate works identically. You could wrap a PWA in Capacitor or a native shell to access HapticGenerator and Core Haptics — but that means app store submissions, separate iOS/Android builds, and native maintenance overhead just to add haptics. Whether that trade-off is worth it comes down to one question: are haptics the feature of your product, or are they an enhancement?
If haptics are your core product — haptics are why someone is using the app — the native path is worth the investment. The quality difference is real and it will matter to your users.
But most apps aren't "haptics apps." A music player, a game, a product demo — haptics are the layer of polish on top, not the reason someone is there. A well-timed vibration on a beat, a gunshot, a UI interaction adds to the experience. For that use case, the overhead of native builds, AHAP authoring, and cross-platform implementation costs more than the enhancement is worth. Two lines of JS gets you there.
Web-based audio and video players — any site that embeds audio or video with impactful sound. The <audio> or <video> element is already there.
Rapid prototyping for native — even if you're eventually shipping iOS Core Haptics with hand-authored AHAP files, iterating in a browser first is much faster. Use the library to find what feels right, then port those timings to AHAP once you know the answer. The manual authoring step becomes a lot less painful when the creative work is already done.
audio-to-haptics fills that gap: analyze any audio URL or file, get a derived vibration pattern, attach it to a <video> or <audio> element — haptics fire automatically in sync with playback. Swap the audio and re-analyze; the haptics update to match. No manual authoring required.
The Web Audio API gives you a Float32Array of per-sample amplitude data across the full frequency spectrum to work with. Turning that into something that actually feels right on an on/off motor turns out to be harder than it looks, and it surfaced some browser behaviors that are easy to miss in the documentation.
The algorithm
The core problem: the Web Audio API gives you per-sample amplitude data across the full frequency spectrum, and navigator.vibrate fires a motor. Deciding when to fire — and how hard to simulate — from raw amplitude data turns out to have a lot of wrong answers before you find a right one.
The approach went through a few iterations. Fixed amplitude thresholds, bass filtering, frame-by-frame FFT analysis — each worked for some audio and broke on others. What eventually worked is a combination of three things: past-only neighbor comparison for onset detection, a sustain mechanism for decay tails, and PWM for intensity simulation. The sections below cover each in order, including why the earlier approaches failed.
Why fixed thresholds don't work
The obvious first approach: fire navigator.vibrate() whenever amplitude exceeds a fixed threshold. It works fine for isolated sounds.
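A minimal sketch of that approach (the threshold and bucket size here are illustrative):

```js
// Naive fixed-threshold detection over decoded channel data.
// `samples` comes from AudioBuffer.getChannelData(0).
function fixedThresholdEvents(samples, sampleRate) {
  const THRESHOLD = 0.3;                        // absolute amplitude: the flaw
  const BUCKET = Math.floor(sampleRate * 0.06); // ~60ms buckets
  const events = [];
  for (let i = 0; i < samples.length; i += BUCKET) {
    let peak = 0;
    for (let j = i; j < Math.min(i + BUCKET, samples.length); j++) {
      peak = Math.max(peak, Math.abs(samples[j]));
    }
    if (peak >= THRESHOLD) events.push((i / sampleRate) * 1000); // ms timestamps
  }
  return events;
}
```

Here's what it does on real audio: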
| Audio | Duration | Haptic events (fixed threshold) |
|---|---|---|
| Bike rev | ~10s | 10 |
| Chainsaw | ~15s | 52 |
| Death metal drumming | ~30s | 396 |
| Chippin' In — Refused / Cyberpunk 2077 (guitar) | ~3min | 290 |
The goal is to fire haptics on distinct moments — a beat, an impact, a transient. Haptics get their value from contrast: the motor punching on a loud moment feels meaningful because there was silence before it. For a 10-second bike rev, 10 events is reasonable — a handful of distinct peaks. For a 30-second clip, 396 means the motor is running almost continuously. There's no contrast, no punch — just a constant buzz that most users will turn off within seconds.
A bass filter helps marginally but doesn't fix the root problem: sustained bass guitar still triggers constantly.
Root cause: music has no absolute quiet. An absolute threshold answers "is this loud?" when the real question is "is this louder than what just happened?"
Why FFT doesn't work either
The next instinct is to use the Web Audio API's AnalyserNode for real-time FFT — read getByteFrequencyData() each RAF frame, average the bass bins (the low-frequency buckets where kicks and impacts live), and vibrate when that energy is high. Targeting specific frequency ranges seemed like a cleaner approach than raw amplitude — you could isolate the bass frequencies that actually feel like impacts and ignore the mids and highs.
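A sketch of that loop (the bin range and trigger level are illustrative):

```js
const audioEl = document.querySelector('audio');
const ctx = new AudioContext();
const analyser = ctx.createAnalyser();
analyser.fftSize = 256; // 128 bins, each ~172Hz wide at 44.1kHz

ctx.createMediaElementSource(audioEl).connect(analyser);
analyser.connect(ctx.destination);

const bins = new Uint8Array(analyser.frequencyBinCount);
(function loop() {
  analyser.getByteFrequencyData(bins);
  const bass = (bins[0] + bins[1]) / 2; // ~0–344Hz, where kicks and impacts live
  if (bass > 200) navigator.vibrate(60);
  requestAnimationFrame(loop);
})();
```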
It helped a little. But it introduced a new failure mode alongside the old one.
The old problem: sustained bass notes and bass transients look identical in a single frequency frame. Both show high energy in the low bins. The static threshold problem was still there, just shifted to a different signal.
The new problem: anything outside the targeted frequency range doesn't register at all, even when it clearly should. Testing against a simple beep-beep audio made this obvious — four beeps, same audio, same perceived loudness. The first three sent haptics. The fourth didn't, because it sat at a slightly different frequency and fell outside the bass bins being watched. From a haptic standpoint all four beeps are the same event. The frequency-based approach treated them as fundamentally different things.
This is the core issue with targeting specific frequency ranges: you're deciding what kind of sound triggers haptics, when what actually matters is whether a sound is louder than what just happened. That's a relative amplitude question, not a frequency question.
The next variant added frame-to-frame spike detection — instead of asking "is bass energy high?" it asked "did bass energy jump from the previous frame?" That's closer to the right question. But at 16ms per RAF frame, a transient spans 3–4 frames. The jump you're looking for arrives spread across multiple frames, and comparing frame n to frame n-1 doesn't reliably catch it. Extending to a running average of recent frames helped at the edges but introduced its own lag.
Five variants in total were tested. They all had the same structural problem: real-time frame-by-frame analysis can only see a tiny window at a time. It can't see the full shape of a transient — the rise and fall across 60–120ms that makes a kick drum identifiable as a spike relative to what came before it. Each frame only knows its immediate neighbors.
Pre-computed analysis sidesteps this entirely. Before playback starts, the entire audio file is processed into 60ms buckets. When the RAF loop runs, it's reading from a pre-built map — it knows exactly what the last 240ms of audio looked like before firing a single vibration. You could theoretically build something similar with real-time FFT using a rolling look-back buffer, but by this point the more fundamental problem was already clear: frequency targeting was the wrong frame for the question we were asking. Even with perfect historical context, watching bass bins specifically would still miss the fourth beep.
Past-only neighbor comparison
Split the audio into ~60ms buckets (2646 samples at 44100Hz — small enough for timing precision, large enough for the motor to spin up). For each bucket, compare its peak amplitude to the mean of the last 4 buckets (~240ms of past audio):
```
ratio = current.max / mean(last 4 buckets)
vibrate if ratio >= 1.5
```
Two scenarios that illustrate why this works:

- Scenario A — quiet past, big spike: pastAvg = 0.08, current.max = 0.29, ratio = 3.6× → fires
- Scenario B — loud past, modest rise: pastAvg = 0.41, current.max = 0.45, ratio = 1.1× → silent
This is local edge detection applied to audio. A kick drum in a wall of guitar stands out locally even when everything is globally loud.
Why past-only, not symmetric neighbors? Symmetric averaging pulls future silence backward into the window — it attenuates a spike's ratio by averaging in the quiet that follows. Past-only: a genuine onset always clears the threshold regardless of what comes after it. With symmetric neighbors, spikeRatio needed to be ~2.0 for music but ~1.5 for isolated sounds — the same formula couldn't serve both. With past-only, 1.5 works for everything.
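The detection step itself is small. A sketch over pre-computed bucket peaks:

```js
const SPIKE_RATIO = 1.5;
const LOOKBACK = 4; // ~240ms of past audio

// `buckets` holds the peak amplitude of each ~60ms bucket,
// computed from the decoded audio before playback starts.
function detectOnsets(buckets) {
  const onsets = [];
  for (let i = LOOKBACK; i < buckets.length; i++) {
    let pastAvg = 0;
    for (let j = i - LOOKBACK; j < i; j++) pastAvg += buckets[j];
    pastAvg /= LOOKBACK;
    if (pastAvg > 0 && buckets[i] / pastAvg >= SPIKE_RATIO) {
      onsets.push(i); // bucket index; time = i × 60ms
    }
  }
  return onsets;
}
```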
Results after switching:
| Audio | Duration | Symmetric neighbors | Past-only |
|---|---|---|---|
| Bike rev | ~10s | 22 | 10 |
| Chainsaw | ~15s | 108 | 52 |
| Death metal | ~30s | 396 | 71 |
| Chippin' In (Refused / Cyberpunk 2077) | ~3min | 290 | 82 |
The death metal clip is the clearest case: 396 events in 30 seconds is roughly 13 per second — the motor never stops, it's just on. Past-only brings that to 71, which is about 2–3 per second. For a fast drum track that's actually right: you feel distinct hits, not a constant buzz. Chippin' In tells a similar story from the other direction — 290 over 3 minutes sounds manageable until you realize that's still one haptic every 0.6 seconds for an entire song. At 82 it's roughly one every 2 seconds, which is what punctuation feels like.
Sustain: catching decay tails
Past-only comparison drops off immediately after a spike's peak. But a thunderclap or an explosion has a natural decay tail — there's the initial bang, and then the rumble that follows for a second or two afterward. You want the haptics to fade with it, not cut off the moment the peak bucket passes.
Natural decays are geometric: each bucket is roughly prev × k for some constant k < 1. That shape is scale-invariant, which means a percentage threshold works at any absolute amplitude. So before the spike check:
```
if prev.max × 0.75 <= current.max <= prev.max × 1.01 → sustain (continue vibrating)
```
sustainLowerBound=0.75 catches the full decay tail. It's also self-terminating: as the chain decays geometrically, eventually current.max < prev.max × 0.75 and the chain ends naturally.
sustainUpperBound=1.01 blocks rising sections from being sustained through; only true decay tails qualify. The sustain check runs before the noise floor gate so quiet tails still fire.
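As a sketch, the whole gate is one comparison per bucket:

```js
const SUSTAIN_LOWER = 0.75; // catches geometric decay tails
const SUSTAIN_UPPER = 1.01; // blocks rising sections

// Runs before the spike check: keeps the motor on through a decay tail.
function isSustain(currentMax, prevMax) {
  return currentMax >= prevMax * SUSTAIN_LOWER &&
         currentMax <= prevMax * SUSTAIN_UPPER;
}
```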
PWM intensity simulation
navigator.vibrate(duration) has no amplitude parameter. Simulating intensity requires PWM: shorter duty cycles for soft vibration, longer for hard.
Each 20ms cycle is split into on and off based on intensity:
| Intensity | On | Off | Pattern (one cycle) |
|---|---|---|---|
| 100% | 20ms | 0ms | [20, 0, ...] — effectively solid on |
| 75% | 15ms | 5ms | [15, 5, 15, 5, ...] |
| 50% | 10ms | 10ms | [10, 10, 10, 10, ...] |
| 25% | 5ms | 15ms | [5, 15, ...] — motor stalls, too short to spin up |
That pattern repeats to fill the full chain duration. A 200ms chain at 75% becomes [15, 5, 15, 5, 15, 5, ...] — 20 entries, the motor cycling on and off fast enough that inertia smooths it into perceived partial amplitude. The same principle as hardware PWM motor control, implemented in JS.
Two firing modes:

- Short chains (< 4 buckets, ~240ms): a solid [remainingMs] pulse. Transients need impact; there's no room for PWM texture in a 1–2 bucket chain.
- Long chains: PWM pattern. Sustained audio gets textured vibration at intensity × duty cycle.
intensityFloor=0.5 because below ~40% duty cycle the motor stalls rather than spinning weakly.
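A sketch of the pattern generation (names are illustrative, not the library's internals):

```js
const CYCLE_MS = 20;

// Map an intensity in [0, 1] to an on/off pattern filling durationMs
function pwmPattern(intensity, durationMs) {
  const on = Math.round(CYCLE_MS * intensity);
  const off = CYCLE_MS - on;
  const pattern = [];
  for (let t = 0; t < durationMs; t += CYCLE_MS) pattern.push(on, off);
  return pattern;
}

navigator.vibrate(pwmPattern(0.75, 200)); // [15, 5, 15, 5, ...], 20 entries
```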
Credit: PWM for pseudo-intensity on the web was first demonstrated by web-haptics by Lochie. audio-to-haptics applies the same technique to audio-driven timing rather than manually authored patterns.
Browser behavior worth knowing
The latency behind the mute window is queryable via AudioContext, the decodeAudioData transfer behaviour is in the Web Audio spec, and MDN does acknowledge that pattern arrays get truncated — all documented, all easy to miss. The specific truncation limit isn't stated anywhere; you only find the number by hitting it. All of it was found through testing on Android Chrome on a budget Samsung Galaxy phone.
The mute window
When you call audio.play(), audioEl.currentTime starts advancing immediately — but the audio hardware pipeline has latency. The sound doesn't physically reach the speakers for outputLatency + baseLatency milliseconds. If you fire haptics the moment currentTime hits a spike in the waveform, the vibration fires before the associated sound does. Early haptics feel awful — instead of a satisfying punch that lands with the audio, you get a phantom vibration followed by the sound a beat later. The sync illusion breaks completely. The library suppresses all navigator.vibrate() calls for exactly outputLatency + baseLatency ms after every play, seek, and pause to keep haptics aligned with the audio.
The following examples illustrate when sound plays versus when haptics fire. Since GIFs don't carry audio, the GIF visualises both moments in the UI instead. You can also try it yourself by cloning the repo and running:

```sh
npm install
npm run dev
```
As you can see in the gif above, the haptics fire even before the sound plays.
For an even more exaggerated example: rapidly press play/pause and the audio is never even heard, yet somehow the haptics still fire nonetheless.
Now here's the same audio with the mute window implemented:
With the mute window in place, the haptics are much more in sync with the audio being played.
This won't be a noticeable difference on a flagship phone. Older devices, however, can have a much larger delay, which makes for an extremely poor UX. The phone shown above is a Samsung Galaxy M32 (roughly 4 years old now).
[PS - Why did I record this on another phone instead of screen capture, you might ask? I recorded an mp4 with a second device so you could set the phone on a table and hear the vibration rattling, only to then realise I can't upload mp4 here 😅, not directly I mean]
Both values are properties on the same AudioContext you're already using to decode audio — ctx.outputLatency is the latency added by the browser's audio rendering pipeline, ctx.baseLatency is the latency from the underlying hardware audio buffer. Together they give you the total delay between currentTime advancing and sound reaching the speakers:
```js
const ctx = new AudioContext();
const muteWindowMs = (ctx.outputLatency + ctx.baseLatency) * 1000;
```
This varies significantly by device. On the budget Samsung tested: ~168ms + ~171ms = ~339ms total. Modern flagship Android devices can sit at sub-3ms — over 100× lower. Without this compensation, haptics that feel roughly synced on a flagship will fire noticeably early on older budget hardware. More noticeable on older devices, but worth fixing regardless.
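A sketch of the suppression logic, assuming a hypothetical maybeVibrate wrapper around every vibration call (audioEl and ctx as defined above):

```js
let mutedUntil = 0;

function startMuteWindow() {
  const latencyMs = (ctx.outputLatency + ctx.baseLatency) * 1000;
  mutedUntil = performance.now() + latencyMs;
}

// Re-arm the window on every event that restarts the audio pipeline
['play', 'seeked', 'pause'].forEach((ev) =>
  audioEl.addEventListener(ev, startMuteWindow)
);

function maybeVibrate(pattern) {
  if (performance.now() < mutedUntil) return; // inside the mute window
  navigator.vibrate(pattern);
}
```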
decodeAudioData transfers the ArrayBuffer
The Web Audio API transfers the ArrayBuffer into the decoder rather than copying it. Once decodeAudioData completes, the original reference is detached and its byteLength reads 0.
```js
const buffer = await fetchArrayBuffer(url);

// Pass a clone so the original survives the transfer:
await ctx.decodeAudioData(buffer.slice(0));

// Re-analyzing with different options works, because `buffer` is intact:
await ctx.decodeAudioData(buffer.slice(0));

// Passing `buffer` directly would leave buffer.byteLength === 0 afterwards.
```
This is spec-compliant behavior but easy to miss. The symptom is silent or garbage analysis on the second call with no error thrown — the kind of bug that takes a while to trace.
Android pattern array limit
Chrome caps navigator.vibrate() pattern arrays at around 128 entries in practice (the W3C spec says 10, which Chrome clearly ignores — MDN acknowledges truncation happens but leaves the limit as "implementation-dependent"). A 45-bucket sustained chain (45 × 60ms = 2700ms) at 50% intensity generates [10, 10] × 135 = 270 entries — well over the ~128 limit, so the motor stops mid-chain.
In practice this is fine — nobody wants 10 seconds of continuous haptics anyway. But if you're working with long sustained events, cap the pattern below the ~128-entry limit or re-fire mid-chain.
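If you do hit it, a cap is one line (the exact limit is undocumented, so the number here is deliberately below the observed cutoff):

```js
const MAX_ENTRIES = 120; // below the ~128-entry truncation observed on Chrome

function cappedPattern(pattern) {
  return pattern.length > MAX_ENTRIES ? pattern.slice(0, MAX_ENTRIES) : pattern;
}
```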
The API
```tsx
import { useRef, useEffect } from 'react';
import { useHaptics } from 'audio-to-haptics/react';

function VideoPlayer({ src }: { src: string }) {
  const videoRef = useRef<HTMLVideoElement>(null);
  const { analyze } = useHaptics(videoRef);

  useEffect(() => { void analyze(src); }, [src]);

  return <video ref={videoRef} src={src} controls />;
}
```
Haptics fire automatically on play and stop on pause or seek. The hook calls detach() on unmount.
For non-React projects, the vanilla JS API works the same way:
```html
<video id="player" controls src="your-video.mp4"></video>
```
```js
import { HapticEngine } from 'audio-to-haptics';

const video = document.getElementById('player');
const engine = new HapticEngine();

await engine.analyze(video.src);
engine.attach(video);

// Call when done: stops the RAF loop, cancels vibration, removes listeners.
// Unlike the React hook, this won't happen automatically.
engine.detach();
```
Works on Android Chrome, Samsung Internet, Opera Mobile. Desktop browsers don't implement the Vibration API.
Results
A library that lets you sync your phone's vibration motor with any audio or video playing in the browser — point it at an <audio> or <video> element and the haptics follow automatically. A thunderclap, a punch landing, a jump scare, an explosion fading into rumble: whatever the audio does, the motor tracks it. On the web, with two lines of code.
Wrapping up
The core insight that made this work — comparing each audio bucket to its recent past rather than an absolute threshold — turned out to be the same principle as edge detection. Once that clicked, the rest followed: sustain for decay tails, PWM for intensity, pre-computed analysis for timing precision. The platform findings along the way (audio pipeline latency, decodeAudioData's transfer semantics, the pattern array cap) were the kind of things you only find by testing on real hardware.
If you're building something on the web that already has audio, adding haptics is two lines. Give it a try on Android and see how it feels — it's one of those things that's hard to appreciate until you actually feel it.


