Amanda Gama
Going native for voice recording

When you build a recording feature in React Native, the obvious move is to reach for one of the popular audio libraries. I did. Then I tried another. Then a third.

Every one of them fell over on the same two scenarios: the OS backgrounding the app, and a memory warning while a long recording was in flight. Files would land on disk corrupt, half-finalized, or just gone. Internal state would desync from reality. There was no recovery hook. Once the lifecycle went sideways, you got back false from a promise and that was the end of it.

A 90-second voice memo silently disappearing is a hard failure. I needed control of the audio session and the interruption surface. So I wrote my own native module.

The audio session config

The session setup is small but every flag earns its place:

- (BOOL)configureAudioSession:(NSError **)error {
    AVAudioSession *session = [AVAudioSession sharedInstance];

    BOOL success = [session setCategory:AVAudioSessionCategoryPlayAndRecord
                            withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker |
                                        AVAudioSessionCategoryOptionAllowBluetooth |
                                        AVAudioSessionCategoryOptionMixWithOthers
                                  error:error];
    if (!success) return NO;

    success = [session setMode:AVAudioSessionModeSpokenAudio error:error];
    if (!success) return NO;

    return [session setActive:YES error:error];
}

PlayAndRecord is the only category that lets me record and play back from the same module. DefaultToSpeaker keeps playback off the earpiece; recordings get reviewed on speaker, not held to the ear. AllowBluetooth so AirPods users don't have to take them out. MixWithOthers so opening the recorder doesn't kill the user's podcast or music.

AVAudioSessionModeSpokenAudio pays for itself: iOS tunes signal processing for voice when it's set, and you can hear the difference on playback.

The catch: setActive: can fail on first call if another audio client is mid-handoff. A short sleep and one retry recovers from that race almost every time:

- (BOOL)configureAudioSessionWithRetry:(NSError **)error {
    if ([self configureAudioSession:error]) return YES;

    // Clear the error from the first attempt so a successful retry
    // doesn't hand the caller a stale NSError.
    if (error) *error = nil;
    [NSThread sleepForTimeInterval:0.6];
    return [self configureAudioSession:error];
}

Surviving interruptions and memory warnings

This is the part the libraries got wrong.

AVAudioSessionInterruptionNotification fires when a phone call comes in, when Siri activates, or when another app grabs the audio session. The default behavior is "your recorder stops, good luck." I pause cleanly, persist state, and emit an event back to JS so the UI can show "recording paused (call interruption)":

- (void)handleInterruption:(NSNotification *)notification {
    NSDictionary *info = notification.userInfo;
    AVAudioSessionInterruptionType type =
        [info[AVAudioSessionInterruptionTypeKey] unsignedIntegerValue];

    if (type == AVAudioSessionInterruptionTypeBegan) {
        if (audioRecorder && audioRecorder.isRecording) {
            [audioRecorder pause];
            [self stopProgressTimer];
            [self persistRecordingState];
            [self emitStateChange:@"paused"];
            // iOS doesn't report what caused the interruption;
            // a phone call is the overwhelmingly common case.
            [self emitInterruption:@"phone_call"];
        }
    }
}

When the interruption ends, I re-activate the session (with the same retry logic) and signal JS whether the system wants the recording resumed.
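
The Ended branch is the mirror image of the pause. A sketch of how that same handler might continue, using the system-provided ShouldResume hint — the event name onRecordingInterruptionEnded and the payload keys are my assumptions, not shown in the module:

```objectivec
// Sketch of the Ended branch inside handleInterruption:.
// "onRecordingInterruptionEnded" is an assumed event name.
if (type == AVAudioSessionInterruptionTypeEnded) {
    AVAudioSessionInterruptionOptions options =
        [info[AVAudioSessionInterruptionOptionKey] unsignedIntegerValue];

    NSError *error = nil;
    BOOL sessionActive = [self configureAudioSessionWithRetry:&error];

    // ShouldResume is iOS saying "it's appropriate to continue";
    // the actual decision is surfaced to JS, not made here.
    BOOL shouldResume =
        (options & AVAudioSessionInterruptionOptionShouldResume) != 0;

    [self sendEventWithName:@"onRecordingInterruptionEnded" body:@{
        @"sessionActive": @(sessionActive),
        @"shouldResume": @(shouldResume)
    }];
}
```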

Memory warnings are the more interesting case. iOS will tear the process down without ceremony if pressure stays high. But the warning itself is a chance to land cleanly. I stop the recorder, which causes AVFoundation to finalize the M4A header so the partial file is playable, then emit a distinct state so the UI can offer recovery:

- (void)handleMemoryWarning:(NSNotification *)notification {
    if (audioRecorder && (audioRecorder.isRecording || isPaused)) {
        [self persistRecordingState];
        [audioRecorder stop];
        [self stopProgressTimer];
        [self sendEventWithName:@"onRecordingStateChange" body:@{
            @"state": @"stopped_memory_warning",
            @"filePath": currentFilePath ?: @"",
            @"reason": @"Recording stopped due to low memory."
        }];
    }
}

The libraries' failure mode here was silence. The OS killed them and the user lost the take. Stopping early with a finalized file on disk is a much better trade.

Level metering math

For a waveform UI, AVAudioRecorder gives you averagePowerForChannel: and peakPowerForChannel:. They return dBFS (full-scale referenced), so 0 dB is clipping and the floor is around -160 dB.

The math everyone gets wrong on the first try is the normalization. You can't just divide by 160. Real voice rarely registers below -50 dB; using the full range gives you a bar that lives in the bottom 10% of its travel and barely moves. Pick a practical floor (-50 dB works) and clamp:

// Requires audioRecorder.meteringEnabled = YES, set once before
// recording starts; otherwise the power values never update.
[audioRecorder updateMeters];
float averagePower = [audioRecorder averagePowerForChannel:0];
float peakPower = [audioRecorder peakPowerForChannel:0];

// Map [-50 dB, 0 dB] onto [0, 1] and clamp.
float normalizedLevel = (averagePower + 50.0) / 50.0;
normalizedLevel = MAX(0.0, MIN(1.0, normalizedLevel));

float normalizedPeak = (peakPower + 50.0) / 50.0;
normalizedPeak = MAX(0.0, MIN(1.0, normalizedPeak));

I sample at 50 ms: NSTimer at 0.05 s on the main run loop. Faster looks jittery, slower looks laggy. 50 ms is also roughly the cadence the RN bridge can deliver events at without batching, so there's no point emitting more.
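
Wired up, the sampling loop is just a repeating timer driving the meter read above. A sketch, assuming a progressTimer ivar and an onMeterLevel event name (both my inventions, not from the module):

```objectivec
// Assumed names: progressTimer ivar, sampleMeters, onMeterLevel.
// Called from the main thread after [audioRecorder record] succeeds,
// so the timer lands on the main run loop.
- (void)startProgressTimer {
    progressTimer = [NSTimer scheduledTimerWithTimeInterval:0.05
                                                     target:self
                                                   selector:@selector(sampleMeters)
                                                   userInfo:nil
                                                    repeats:YES];
}

- (void)sampleMeters {
    [audioRecorder updateMeters];
    float level = ([audioRecorder averagePowerForChannel:0] + 50.0f) / 50.0f;
    level = MAX(0.0f, MIN(1.0f, level));
    [self sendEventWithName:@"onMeterLevel" body:@{@"level": @(level)}];
}
```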

The boring numbers that matter

The recorder settings are a five-line dictionary, but each value is a deliberate choice:

- (NSDictionary *)audioRecorderSettings {
    return @{
        AVFormatIDKey: @(kAudioFormatMPEG4AAC),
        AVSampleRateKey: @44100.0,
        AVNumberOfChannelsKey: @1,
        AVEncoderAudioQualityKey: @(AVAudioQualityMedium),
        AVEncoderBitRateKey: @64000,
    };
}

AAC in an M4A container plays everywhere: iOS, Android, browsers. PCM/WAV is an order of magnitude bigger. Opus still has gaps on older Safari.

44.1 kHz over 48: a hair smaller on disk, and downstream voice pipelines will resample anyway. Mono halves file size again and matches reality. No one is recording a stereo voice memo.

64 kbps AAC at AVAudioQualityMedium is transparent for speech. Apple's own defaults sit at 128 to 256 kbps, which is overkill for voice and produces files four times bigger than they need to be.
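
The four-times claim is just bitrate arithmetic. A quick C-compatible sanity check — kb_per_minute is a throwaway helper for the math, not part of the module:

```c
/* Back-of-envelope file-size math for the bitrate choice. */
static double kb_per_minute(int bits_per_second) {
    /* bits/s -> bytes/s -> bytes/min -> KB/min */
    return bits_per_second / 8.0 * 60.0 / 1000.0;
}

/* kb_per_minute(64000)  -> 480.0   (~480 KB per minute of speech)
   kb_per_minute(256000) -> 1920.0  (~1.9 MB per minute, 4x bigger) */
```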

What I got back

About 150 lines of Objective-C replaced three failed library integrations. The recorder now survives a phone call, a memory warning, and a backgrounded app, and emits structured events JS can react to instead of swallowing the failure.

I own this surface now, though. The libraries handled enough cases that you could ship without thinking about audio sessions at all; I chose to think about them, and that bill comes due whenever iOS changes interruption semantics. I accept it because the alternative was silently losing recordings.

Top comments (1)

Guilherme Yamakawa de Oliveira

Really enjoyed this one, @aoligama. The AVAudioSession interruption + memory warning bit is where most third-party libs quietly fall apart, and losing a 90-second memo is the kind of bug users don't forgive. Same goes for the dBFS normalization for the waveform: a small detail, but a big difference when you're staring at a flat line wondering if the mic is working.

Honestly refreshing to read a "why I went native" post that actually explains the why instead of just dropping a wrapper and moving on.