Jingyi

Posted on Jun 26 • Edited on Jun 30

While Learning LangChain, I Ended Up With This Realtime Voice Pattern

#ai #beginners #voiceagent #abotwrotethis

I've recently started learning LangChain while exploring different ways to build voice agents.

As someone who's still new to LangChain, I tried a few different approaches before landing on one that felt surprisingly clean.

I don't know whether this is already a common pattern in the LangChain ecosystem, but it worked well enough that I thought it was worth sharing. Hopefully it can also spark some discussion with people who've been building voice agents for longer.

The idea

The basic idea is to separate responsibilities instead of letting one framework handle everything.

In this pattern:

LangChain stays responsible for tool selection, tool execution, and response composition.
A separate realtime runtime handles RTC / RTM, speech input/output, and session lifecycle.
The two communicate through an OpenAI-compatible endpoint.

What surprised me

When I first started experimenting, I assumed adding realtime voice would require restructuring most of the application.

Instead, it felt more like adding another interaction layer.

LangChain continues doing what it already does well—tool orchestration and workflow management—while the realtime runtime focuses on voice-specific concerns.

That separation made the overall architecture feel much simpler.

The pattern

To better understand the idea, I put together a small recipe that demonstrates this integration pattern.

The recipe consists of three pieces:

Python exposes an OpenAI-compatible endpoint and manages the agent lifecycle.
Next.js handles the client-side realtime interaction.
LangChain remains server-side as the orchestration and tool layer.

It's not intended to be a complete application or a production-ready project.

It's simply a pattern that helped me understand how these pieces can fit together.

A quick note

For transparency, I work at Agora, so I naturally used Agora as the realtime runtime while putting this recipe together.

That said, I'm much more interested in learning how other people are building voice agents with LangChain.

If you're using a different stack or have taken a completely different approach, I'd love to hear about it.

The recipe

If you're curious, here's the recipe:

https://github.com/bluemotional/recipe-agent-langchain

Since I'm still learning LangChain myself, I'd really appreciate any feedback or suggestions.

What's next?

If people find this pattern useful, I'd like to keep expanding it with more recipes, for example:

RAG
Internal tools
Docs Copilot
MCP integrations

I'm also curious to see what other patterns the community has found useful.

DEV Community