From Scaling Data to Transcribing Voices: Building Resilience Under Pressure

Elijah Arhinful — Sat, 13 Jun 2026 12:48:17 +0000

As my backend engineering internship wraps up, I’ve been reflecting on the tasks that pushed me the hardest. Building minimum viable products is one thing, but making them resilient, scalable, and fault-tolerant is an entirely different beast.

Here are two of the most memorable tasks from my time here: a solo dive into system scaling and a team effort tackling asynchronous voice processing.

Task 1: Scaling the Insighta Labs+ Query Engine (Individual)

What it was

Insighta Labs+ is a demographic intelligence platform where analysts and engineers run structured queries on user profiles via a CLI and a Web Portal (backed by GitHub OAuth and RBAC). My task was to take a functional MVP and evolve it into a robust query engine capable of handling tens of millions of records and hundreds of concurrent queries per minute.

The problem it was solving

The initial architecture worked flawlessly for a few thousand records, but at scale it started showing cracks.

Latency: Without indexing, every filter query triggered a full-table scan.
Redundancy: Identical queries from different users wasted CPU and DB cycles.
Write-Pressure: Users needed to bulk-upload CSVs containing up to 500,000 rows. Processing these synchronously locked the database, bringing read operations to a halt.

How I approached it

Instead of blindly throwing more server power at the problem, I focused on doing less work.

Targeted Indexing: I added indexes only to frequently filtered columns.
Caching & Normalization: I introduced Redis for TTL-based caching. To maximize cache hits, I built a query normalization layer. Whether a user queried "young males" or "men under 30", the parser normalized the filter object into a canonical form before hashing the cache key.
Connection Pooling: I set up PgBouncer to manage database connections and prevent exhaustion under high concurrency.
Chunked Ingestion: For the massive CSV uploads, I implemented chunked streaming. Rows were validated individually; valid rows streamed in, while invalid rows were skipped and reported in a summary.
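The chunked ingestion step can be sketched roughly as follows. This is a minimal illustration, not the actual Insighta Labs+ code: the `ProfileRow` shape, the validation rules, the `insertChunk` callback, and the default chunk size are all assumptions made up for the example.

```typescript
// Hypothetical row shape; the real profile schema is not shown in this post.
interface ProfileRow {
  name: string;
  age: number;
  gender: string;
}

interface IngestSummary {
  inserted: number;
  skipped: { line: number; reason: string }[];
}

// Validate one raw CSV record. Returns the parsed row, or a rejection reason.
// The rules here are illustrative assumptions.
function validateRow(raw: Record<string, string | undefined>): ProfileRow | string {
  const name = (raw.name ?? "").trim();
  if (name === "") return "missing name";
  const ageText = (raw.age ?? "").trim();
  const age = Number(ageText);
  if (ageText === "" || !Number.isInteger(age) || age < 0 || age > 130) {
    return "invalid age";
  }
  const gender = (raw.gender ?? "").trim().toLowerCase();
  if (gender === "") return "missing gender";
  return { name, age, gender };
}

// Consume rows one at a time and flush valid ones in fixed-size chunks,
// so no single insert holds the database for the whole upload.
async function ingestInChunks(
  rows: AsyncIterable<Record<string, string | undefined>> | Iterable<Record<string, string | undefined>>,
  insertChunk: (chunk: ProfileRow[]) => Promise<void>, // assumed DB writer
  chunkSize = 1000,
): Promise<IngestSummary> {
  const summary: IngestSummary = { inserted: 0, skipped: [] };
  let chunk: ProfileRow[] = [];
  let line = 0;
  for await (const raw of rows) {
    line += 1;
    const result = validateRow(raw);
    if (typeof result === "string") {
      // Invalid rows are skipped and recorded, never inserted.
      summary.skipped.push({ line, reason: result });
      continue;
    }
    chunk.push(result);
    if (chunk.length >= chunkSize) {
      await insertChunk(chunk);
      summary.inserted += chunk.length;
      chunk = [];
    }
  }
  // Flush whatever is left after the last full chunk.
  if (chunk.length > 0) {
    await insertChunk(chunk);
    summary.inserted += chunk.length;
  }
  return summary;
}
```

Because `rows` can be any iterable or async iterable, the same function works with a streaming CSV parser or a plain array in tests. The returned summary is what would back the "skipped rows" report.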

What broke and how I fixed it

During testing, I noticed our Redis cache hit rate was suspiciously low. It turned out that slight variations in how the frontend constructed query objects (e.g., the ordering of keys) produced completely different cache hashes for functionally identical queries. I fixed this by making the query normalizer sort keys and serialize query parameters deterministically before the cache key was computed.
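A minimal sketch of that deterministic serialization, assuming filters are plain JSON objects. The `query:` prefix and the example filter shape are hypothetical, and the hashing step is left out: the canonical string could be passed through a hash such as SHA-256 before being used as the Redis key.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively rebuild the value so object keys are always inserted in sorted
// order. Array order is preserved, since it may be meaningful.
function canonicalize(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => canonicalize(item));
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Build a cache key that depends only on the filter's content,
// not on the order in which the caller happened to add its keys.
function cacheKey(filter: Json): string {
  return "query:" + JSON.stringify(canonicalize(filter));
}

// Two functionally identical filters, constructed in different key order.
const a = cacheKey({ gender: "male", age: { max: 29 } });
const b = cacheKey({ age: { max: 29 }, gender: "male" });
console.log(a === b); // true
```

Plain `JSON.stringify` serializes keys in insertion order, which is why the unsorted version produced different keys for the same query.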

Another issue was bulk inserts starving read operations. After I switched to streaming and chunked inserts, the database was no longer locked for the duration of an upload, and reads stayed snappy even during a 500,000-row upload.

What I took away from it

Scaling isn't always about distributed systems jargon or microservices. Often, the best scaling techniques are boring: good indexes, connection pooling, and canonical caching. Doing things the "simple" way usually yields the most maintainable code.

Why I picked it

It forced me to transition from a "does the code work?" mindset to "how does the system behave under stress?" It was a masterclass in pragmatic, constraint-driven system design.

Task 2: Resilient Voice Transcription for Onboarding (Team Task)

What it was

For the Flowbrand API, our team built an asynchronous voice onboarding feature. Instead of typing out long business descriptions and target audiences, users could simply record an audio snippet detailing their business.

The problem it was solving

Long text forms are conversion killers. Voice input reduces friction dramatically, but audio processing is notoriously flaky and slow. We needed to reliably handle audio file uploads, transcribe them to text using AI models, map that text to structured onboarding data, and ensure that a slow transcription or a failed third-party API didn't ruin the user experience.

How we approached it

We built an event-driven pipeline in NestJS using Redis and Bull queues. When a user uploaded an audio snippet, we immediately stored the file, created an active VoiceSession in Redis, and pushed a job to a VOICE_TRANSCRIPTION queue.

To guarantee reliability, we implemented a strict fallback strategy: our VoiceTranscriptionService attempted transcription via the ultra-fast Groq API first. If Groq timed out or failed, it automatically failed over to AssemblyAI.
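The fallback logic can be sketched along these lines. This is an illustration of the pattern rather than our VoiceTranscriptionService: the two providers are passed in as plain functions, so no Groq or AssemblyAI SDK calls are shown, and the `Transcriber` type, function names, and 10-second default timeout are assumptions.

```typescript
// A provider is modeled as any function that turns audio bytes into text.
type Transcriber = (audio: Uint8Array) => Promise<string>;

// Reject if the wrapped promise has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Try the primary provider first; on any failure (timeout, rate limit,
// network error) fall through to the secondary provider.
async function transcribeWithFallback(
  audio: Uint8Array,
  primary: Transcriber,
  fallback: Transcriber,
  timeoutMs = 10000,
): Promise<{ text: string; provider: "primary" | "fallback" }> {
  try {
    const text = await withTimeout(primary(audio), timeoutMs, "primary");
    return { text, provider: "primary" };
  } catch {
    // If the fallback also fails, its error propagates to the caller,
    // where the queue's retry policy can take over.
    const text = await withTimeout(fallback(audio), timeoutMs, "fallback");
    return { text, provider: "fallback" };
  }
}
```

Returning which provider answered makes it easy to log how often the fallback path is taken.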

What broke and how we fixed it

Initially, we attempted to do the transcription synchronously within the HTTP request lifecycle. This was a disaster: requests timed out on larger files or slower internet connections, leaving the frontend hanging.

We fixed this by decoupling the process. The API returned a 202 Accepted with a session ID, and the frontend polled our session status endpoint while the queue handled the heavy lifting in the background.
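On the client side, the polling half of this flow might look like the sketch below. The status shapes, the `fetchStatus` callback (standing in for a call to the session status endpoint), and the interval and attempt limits are all hypothetical.

```typescript
// Assumed status payloads; the real endpoint's response shape is not shown here.
type SessionStatus =
  | { state: "processing" }
  | { state: "completed"; transcript: string }
  | { state: "failed"; reason: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the session completes or fails, giving up after maxAttempts.
async function pollSession(
  sessionId: string,
  fetchStatus: (sessionId: string) => Promise<SessionStatus>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus(sessionId);
    if (status.state === "completed") return status.transcript;
    if (status.state === "failed") {
      throw new Error(`transcription failed: ${status.reason}`);
    }
    // Still processing: wait before the next check, except after the last one.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`session ${sessionId} still processing after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a stuck job from leaving the frontend polling forever.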

We also ran into third-party API rate limits. During bursts of concurrent uploads, the primary transcription provider threw errors. Our fallback implementation caught these exceptions and routed the overflow to AssemblyAI, so the failures stayed invisible to the user.

What I took away from it

Never trust third-party APIs: Always assume they will fail and build fallbacks.
Decouple heavy tasks: Background processing (via queues) is non-negotiable for media uploads and AI integrations.
Redis is versatile: We used it not just for caching, but for managing ephemeral session state across distributed background workers.

Why I picked it

This task was a highlight because of the cross-functional teamwork required. We didn't just bolt on a feature; we engineered a fault-tolerant pipeline that felt incredibly satisfying to see in action.

If you're interested in system design, NestJS, or scaling backend infrastructure, let's connect!

DEV Community: Elijah Arhinful
