<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Harshit Kumar</title>
    <description>The latest articles on DEV Community by Harshit Kumar (@harshitk).</description>
    <link>https://dev.to/harshitk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1377834%2Fb2193139-02a4-4f7c-b859-e561f1348e0e.jpeg</url>
      <title>DEV Community: Harshit Kumar</title>
      <link>https://dev.to/harshitk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/harshitk"/>
    <language>en</language>
    <item>
      <title>Memory Is Not a Vector Database: Why AI Agents Need Beliefs, Not Storage</title>
      <dc:creator>Harshit Kumar</dc:creator>
      <pubDate>Wed, 04 Feb 2026 07:25:35 +0000</pubDate>
      <link>https://dev.to/harshitk/memory-is-not-a-vector-database-why-ai-agents-need-beliefs-not-storage-2baj</link>
      <guid>https://dev.to/harshitk/memory-is-not-a-vector-database-why-ai-agents-need-beliefs-not-storage-2baj</guid>
      <description>&lt;p&gt;&lt;em&gt;Why storage is not the same as remembering&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you've built an AI agent that works with users over multiple sessions, you've probably hit this wall: the agent keeps forgetting things it should know.&lt;/p&gt;

&lt;p&gt;You store user preferences. The agent ignores them. You correct it. It makes the same mistake tomorrow. You add more context to the prompt. It works for a while, then breaks again.&lt;/p&gt;

&lt;p&gt;So you reach for the obvious solution: a vector database. Store everything, retrieve what's relevant, inject it into the prompt. Problem solved, right?&lt;/p&gt;

&lt;p&gt;Not quite.&lt;/p&gt;

&lt;p&gt;I've been building agents for a while now, and I keep seeing the same pattern. Vector retrieval gets you 70% of the way there. The last 30% is where things fall apart—and it's the part that actually matters for user experience.&lt;/p&gt;




&lt;h2&gt;The Pattern That Keeps Breaking&lt;/h2&gt;

&lt;p&gt;Here's a scenario I've seen repeatedly:&lt;/p&gt;

&lt;p&gt;A user tells your agent: "I prefer dark mode." You store it. Great.&lt;/p&gt;

&lt;p&gt;A week later, the user says: "Actually, I've switched to light mode—easier on my eyes during the day."&lt;/p&gt;

&lt;p&gt;What happens? Your vector store now has two contradictory statements. When the agent retrieves "user display preferences," it might get either one. Or both. It has no way to know which is current, which is outdated, or how confident it should be in either.&lt;/p&gt;

&lt;p&gt;The agent ends up flip-flopping, or worse, confidently asserting the wrong preference.&lt;/p&gt;

&lt;p&gt;This isn't a retrieval problem. It's a representation problem. We're treating memory as storage when it should be treated as belief.&lt;/p&gt;

&lt;p&gt;Here's another failure mode:&lt;/p&gt;

&lt;p&gt;A support agent learns that asking clarifying questions before proposing a fix reduces churn. You observe this pattern manually. The agent never does.&lt;/p&gt;

&lt;p&gt;You can store past conversations. You can retrieve similar ones. But nothing in a vector store turns "this approach worked" into "do this more often."&lt;/p&gt;

&lt;p&gt;That's not memory. That's logging.&lt;/p&gt;

&lt;p&gt;I call this the &lt;strong&gt;storage fallacy&lt;/strong&gt;: assuming persistence automatically produces understanding.&lt;/p&gt;




&lt;h2&gt;A List of Embeddings Is Not Memory&lt;/h2&gt;

&lt;p&gt;Vector databases are excellent at one thing: finding semantically similar content. But similarity isn't the same as relevance, and retrieval isn't the same as remembering.&lt;/p&gt;

&lt;p&gt;Human memory doesn't work like a filing cabinet where you pull out documents. It's an active system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reinforces&lt;/strong&gt; things you encounter repeatedly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgets&lt;/strong&gt; things you don't use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updates&lt;/strong&gt; when new information contradicts old information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weighs&lt;/strong&gt; memories by confidence, not just similarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you tell someone the same thing three times, they become more confident it's true. When you contradict yourself, they become less certain about both statements. When you don't mention something for months, it fades.&lt;/p&gt;

&lt;p&gt;None of this happens in a vector store. Every embedding sits there with equal weight, forever, until you manually delete it.&lt;/p&gt;




&lt;h2&gt;Memory as Belief&lt;/h2&gt;

&lt;p&gt;The shift that changed how I think about this: stop treating memories as facts and start treating them as beliefs.&lt;/p&gt;

&lt;p&gt;A belief has properties that a stored fact doesn't:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confidence.&lt;/strong&gt; How certain are we that this is true? A preference mentioned once in passing is different from one stated emphatically three times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reinforcement.&lt;/strong&gt; When we encounter similar information again, confidence should increase. "User likes dark mode" and "User prefers dark themes" shouldn't create two entries—they should strengthen one belief.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decay.&lt;/strong&gt; Beliefs that aren't accessed or reinforced should fade over time. Not deleted, but deprioritized. The user's preference from two years ago probably matters less than what they said last week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contradiction handling.&lt;/strong&gt; When new information conflicts with existing beliefs, both should be affected. The old belief loses confidence. The new one starts with moderate confidence. The system acknowledges uncertainty rather than pretending it doesn't exist.&lt;/p&gt;

&lt;p&gt;Beliefs are functions of time, not static rows.&lt;/p&gt;

&lt;p&gt;Here's what this looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"User prefers dark mode" → stored with confidence 0.6

User mentions dark mode again → confidence rises to 0.75

User later says "I prefer light mode" →
  - "dark mode" drops to 0.45
  - "light mode" created at 0.65
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the agent doesn't see "two facts." It sees uncertainty. It can say "I think you prefer light mode now, though you used to prefer dark" instead of confidently asserting the wrong thing.&lt;/p&gt;

&lt;p&gt;Concretely, a belief looks more like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User prefers dark mode"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"last_verified_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-01-12"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reinforcement_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_statement"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not a document. Not an embedding. A stateful object.&lt;/p&gt;




&lt;h2&gt;Memory Isn't One Thing&lt;/h2&gt;

&lt;p&gt;The other realization: I've been lumping together very different types of memory under one label.&lt;/p&gt;

&lt;p&gt;Cognitive science has known for decades that human memory isn't monolithic. There are distinct systems serving different purposes. For agents, four types matter most:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic memory&lt;/strong&gt; is factual knowledge. "The user is a backend engineer." "They work at a fintech company." "They prefer Python for scripting." These are beliefs about the world and the user that persist across sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Episodic memory&lt;/strong&gt; is experiential. Not just what happened, but the context around it—when, where, what was the emotional tone, what was the outcome. "Last Tuesday, the user was frustrated about a deployment failure. We helped them set up monitoring. They were satisfied." This is richer than extracted facts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working memory&lt;/strong&gt; is the active scratchpad. What's the current goal? What context is relevant right now? This is session-scoped and limited in capacity—you can't hold everything active at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Procedural memory&lt;/strong&gt; is learned skills. Not facts, but patterns of successful action. "When a user says they want to cancel, offering a discount before processing usually leads to retention." This is how agents get better at their jobs over time.&lt;/p&gt;

&lt;p&gt;This isn't academic purity. Each type maps cleanly to an engineering responsibility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic&lt;/strong&gt; → long-term knowledge store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Episodic&lt;/strong&gt; → append-only experience log&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Working&lt;/strong&gt; → active context assembler&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedural&lt;/strong&gt; → policy selection system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most "memory" implementations I've seen treat everything as semantic memory. They miss the temporal richness of episodes, the active focus of working memory, and the skill accumulation of procedural memory.&lt;/p&gt;




&lt;h2&gt;Bigger Models Don't Fix Memory&lt;/h2&gt;

&lt;p&gt;Larger context windows help agents remember more, but they don't help agents remember &lt;em&gt;better&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Without reinforcement, decay, and contradiction handling, bigger context just means more clutter. You can fit 100k tokens in a prompt. You still can't represent "I used to believe X but now I believe Y with moderate confidence."&lt;/p&gt;

&lt;p&gt;Many agent failures that look like reasoning problems are actually memory problems. The model reasons fine—it's just reasoning over the wrong context because retrieval gave it stale or contradictory information with no signal about which to trust.&lt;/p&gt;




&lt;h2&gt;What We're Building&lt;/h2&gt;

&lt;p&gt;This is what led me to build Engram, a cognitive memory layer for AI agents.&lt;/p&gt;

&lt;p&gt;The core idea: memory should behave like memory, not like storage.&lt;/p&gt;

&lt;p&gt;Think of it as a cognitive operating layer that sits between your agent and its storage. The current focus is correctness and cognitive behavior, not feature breadth.&lt;/p&gt;

&lt;p&gt;When you store a belief, it has confidence. When you encounter it again, it reinforces. When you contradict it, both beliefs adjust. When you don't access it, it decays. When you retrieve, you get not just similarity but a weighted score that accounts for confidence, recency, and relevance.&lt;/p&gt;
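&lt;p&gt;A hedged sketch of such a weighted score in Go; the weights and half-life are invented for illustration:&lt;/p&gt;

```go
package main

import "math"

// RetrievalScore blends semantic similarity with belief confidence and
// recency, instead of ranking by similarity alone. The weights and
// half-life are illustrative tuning knobs, not fixed values.
func RetrievalScore(similarity, confidence, ageDays float64) float64 {
	const (
		wSim         = 0.5
		wConf        = 0.3
		wRec         = 0.2
		halfLifeDays = 30.0
	)
	// Recency decays exponentially: a belief untouched for one
	// half-life contributes half the recency weight.
	recency := math.Exp(-math.Ln2 * ageDays / halfLifeDays)
	return wSim*similarity + wConf*confidence + wRec*recency
}
```

&lt;p&gt;With these weights, a fresh, high-confidence belief at similarity 0.80 outranks a year-old, low-confidence near-duplicate at similarity 0.85, which is exactly the behavior plain cosine ranking can't give you.&lt;/p&gt;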

&lt;p&gt;You can store episodes with full context (entities, emotional valence, outcomes) and have the system extract semantic beliefs automatically. You can record what worked and what didn't, and have procedural patterns emerge that make your agent better over time.&lt;/p&gt;

&lt;p&gt;We consistently see agents stop repeating the same errors after a few dozen interactions. That's not because we tuned anything; it's because the memory system does what memory should do.&lt;/p&gt;

&lt;p&gt;It's an HTTP API. You plug it into whatever agent framework you're using. It handles the cognitive complexity so your agent code stays clean.&lt;/p&gt;




&lt;h2&gt;What This Isn't&lt;/h2&gt;

&lt;p&gt;A few things Engram explicitly doesn't do, because scope matters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not an agent framework.&lt;/strong&gt; We're not competing with LangChain or CrewAI. We're infrastructure they can use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not a vector database.&lt;/strong&gt; We use vectors, but that's an implementation detail. The interface is cognitive, not geometric.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not long-term context stuffing.&lt;/strong&gt; We don't build giant prompts. We build a system that knows what matters.&lt;/p&gt;

&lt;p&gt;The goal is to be the memory layer—one thing, done well.&lt;/p&gt;




&lt;h2&gt;Who This Is For&lt;/h2&gt;

&lt;p&gt;If you're building agents that interact with users over time, and you've been frustrated by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context limits forcing you to drop important information&lt;/li&gt;
&lt;li&gt;Agents repeating the same mistakes&lt;/li&gt;
&lt;li&gt;Preferences that don't stick&lt;/li&gt;
&lt;li&gt;No sense of learning or improvement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then this might be useful.&lt;/p&gt;

&lt;p&gt;It's early. The API is stabilizing. We're looking for people who want to build with it and give feedback.&lt;/p&gt;




&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;We're publishing examples and benchmarks showing the learning dynamics in action. There's a demo that shows error rates dropping as an agent accumulates procedural memory, not because we tweaked numbers, but because the memory system is doing what memory should do.&lt;/p&gt;

&lt;p&gt;Here's the repo: &lt;a href="https://github.com/Harshitk-cp/engram" rel="noopener noreferrer"&gt;github.com/Harshitk-cp/engram&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If this resonates with how you've been thinking about agent memory, I'd love to hear from you. If you think I'm wrong about something, I'd love to hear that too.&lt;/p&gt;

&lt;p&gt;Building in public means being wrong in public. That's the tradeoff for building things that actually work :)&lt;/p&gt;




</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>engram</category>
    </item>
    <item>
      <title>Integrating RTMP and WebRTC for Real-Time Streaming</title>
      <dc:creator>Harshit Kumar</dc:creator>
      <pubDate>Thu, 27 Jun 2024 22:34:55 +0000</pubDate>
      <link>https://dev.to/harshitk/integrating-rtmp-and-webrtc-for-real-time-streaming-2lbb</link>
      <guid>https://dev.to/harshitk/integrating-rtmp-and-webrtc-for-real-time-streaming-2lbb</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
In the rapidly evolving landscape of real-time communication and streaming, integrating different protocols to leverage their unique strengths is crucial. This project presents an RTMP server inspired by the LiveKit Ingress Service. It receives an RTMP stream from a user in a room, transcodes the audio from AAC to Opus (making it WebRTC compatible), ensures the video is H264, then pushes both to WebRTC tracks connected to clients. The server acts as a peer, maintaining a peer-to-peer (P2P) connection with each client.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why RTMP and WebRTC?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;RTMP&lt;/strong&gt;: A Proven Protocol for Live Streaming&lt;br&gt;
Real-Time Messaging Protocol (RTMP) is a mature and robust protocol widely used for live streaming. It provides low-latency transmission of audio, video, and data over the Internet. RTMP is favored for its ability to handle high-quality streams with minimal buffering and its support for a variety of codecs and formats. This makes it an excellent choice for ingesting live video streams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WebRTC&lt;/strong&gt;: Real-Time Communication in the Browser&lt;br&gt;
Web Real-Time Communication (WebRTC) is a cutting-edge technology that enables real-time audio, video, and data sharing directly between browsers without the need for plugins. WebRTC is designed for low-latency communication, making it ideal for video conferencing, live streaming, and interactive applications. Its peer-to-peer architecture ensures efficient data transmission and scalability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integrating RTMP and WebRTC&lt;/strong&gt;: The Best of Both Worlds&lt;br&gt;
By integrating RTMP for stream ingestion and WebRTC for stream delivery, we can create a powerful real-time streaming solution. RTMP handles the initial high-quality stream intake, and WebRTC ensures efficient, low-latency distribution to end-users. This combination provides a seamless streaming experience with the reliability of RTMP and the real-time capabilities of WebRTC.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RTMP to WebRTC&lt;/strong&gt;: Receives RTMP streams and delivers them to WebRTC clients.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Audio Transcoding&lt;/strong&gt;: Transcodes AAC audio to Opus for WebRTC compatibility.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Video Transcoding&lt;/strong&gt;: Ensures video is encoded in H264 for WebRTC delivery.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Webhook Notifications&lt;/strong&gt;: Uses webhooks to notify the publishing state of the stream to different rooms.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;WebSocket Signaling&lt;/strong&gt;: Establishes WebRTC connections using WebSockets for offer/answer exchange.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concurrency for Performance&lt;/strong&gt;: Utilizes Go's concurrency patterns and channels to enhance streaming performance and reduce latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Core Libraries and Packages&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pion WebRTC&lt;/strong&gt;: Handles WebRTC connections.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;yutopp/go-rtmp&lt;/strong&gt;: Handles incoming RTMP streams.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;fdkaac&lt;/strong&gt;: AAC decoding.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;gopkg.in/hraban/opus.v2&lt;/strong&gt;: Opus encoding.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;go-chi/chi&lt;/strong&gt;: Lightweight, idiomatic, and composable router for building Go HTTP services.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;logrus&lt;/strong&gt;: Logging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RTMP Server&lt;/strong&gt;&lt;br&gt;
The RTMP server listens for incoming RTMP streams. When a stream is published:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Audio Processing&lt;/strong&gt;: Decodes AAC audio and encodes it into Opus format using fdkaac and opus.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Video Processing&lt;/strong&gt;: Ensures the video stream is in H264 format.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;WebRTC Integration&lt;/strong&gt;: Sends processed audio and video to WebRTC tracks connected to clients.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;WebRTC Connection&lt;/strong&gt;&lt;br&gt;
The WebRTC connection is established via WebSockets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;WebSocket Handler&lt;/strong&gt;: Manages WebRTC signaling (offer/answer exchange) over WebSockets.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Peer Connection&lt;/strong&gt;: Each client establishes a peer connection with the server.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Track Delivery&lt;/strong&gt;: Delivers audio and video tracks to clients via WebRTC.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Webhooks&lt;/strong&gt;&lt;br&gt;
Webhooks listen to the audio and video channels and notify the state of streams to their subscribers: notifications are sent via a webhook manager when streams start or stop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenges Faced&lt;/strong&gt;&lt;br&gt;
One of the primary challenges in this project was the lack of support for Opus audio in RTMP. RTMP and OBS (Open Broadcaster Software) share only one common audio codec: AAC. This posed a problem since WebRTC requires Opus audio for optimal performance. Here's how I tackled this issue:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initial Approach: External Pipeline&lt;/strong&gt;&lt;br&gt;
My first solution was to use a separate GStreamer or FFmpeg pipeline to convert the AAC encoded audio. This pipeline would process the audio and pass it to an RTP channel, which would then ingest the audio packets directly into WebRTC. However, this approach increased CPU utilization by 70%, significantly impacting performance when handling multiple streams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optimized Solution: In-Memory Encoding&lt;/strong&gt;&lt;br&gt;
After further research, I discovered a more efficient method. By performing in-memory encoding of the audio buffer directly to the Go channel, I could pass it to WebRTC tracks in the Opus codec. I used the gopkg.in/hraban/opus.v2 package, a Go translation layer for C libraries like libopus and libopusfile, which provide encoders and decoders.&lt;/p&gt;

&lt;p&gt;This approach allowed for in-memory translation of the audio layer from AAC to Opus, drastically reducing the performance cost compared to the initial solution. The overhead was minimal, making it almost as efficient as streaming without encoding.&lt;/p&gt;
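&lt;p&gt;The shape of that pipeline, sketched with a placeholder encoder standing in for the hraban/opus call (&lt;code&gt;opus.Encoder.Encode&lt;/code&gt; in the real server):&lt;/p&gt;

```go
package main

// encodeFrame stands in for the real Opus encoder; it is a placeholder
// "encoding" so this sketch stays dependency-free. In the actual server,
// this is where opus.Encoder.Encode is called on the PCM frame.
func encodeFrame(pcm []int16) []byte {
	out := make([]byte, 0, len(pcm))
	for _, s := range pcm {
		out = append(out, byte(s>>8))
	}
	return out
}

// EncodePipeline reads decoded PCM frames from a channel and emits encoded
// packets on another, entirely in process memory. There is no external
// GStreamer/FFmpeg process and no pipe overhead.
func EncodePipeline(pcmFrames <-chan []int16) <-chan []byte {
	packets := make(chan []byte, 16) // small buffer smooths bursts
	go func() {
		defer close(packets)
		for frame := range pcmFrames {
			packets <- encodeFrame(frame)
		}
	}()
	return packets
}
```

&lt;p&gt;Because everything flows through buffered in-memory channels, there is no external pipeline to spawn or pipe data through, which is where the CPU savings over the first approach came from.&lt;/p&gt;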

&lt;p&gt;&lt;strong&gt;Performance Enhancements&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;: Utilizes Go's concurrency patterns to efficiently handle multiple streams.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Channels&lt;/strong&gt;: Uses channels for buffering video and audio data, ensuring smooth delivery to WebRTC tracks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimized Transcoding&lt;/strong&gt;: Efficiently transcodes audio and video to minimize latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project demonstrates the power of combining RTMP and WebRTC to create a real-time streaming solution that is both robust and efficient. By leveraging the strengths of each protocol, we can deliver high-quality, low-latency streams to users seamlessly. Whether you're building a live streaming platform, a video conferencing tool, or any other real-time application, this RTMP server provides a solid foundation for your needs.&lt;/p&gt;

&lt;p&gt;Stay tuned for further updates and enhancements to this project, and feel free to contribute!&lt;br&gt;
&lt;a href="https://github.com/Harshitk-cp/rtmp_server" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webrtc</category>
      <category>go</category>
      <category>backend</category>
      <category>rtmp</category>
    </item>
  </channel>
</rss>
