
Jacob H

I built a memory API that keeps long AI conversations fast without losing context

As conversations get longer, passing the full message history to your LLM
on every request gets slow and expensive. At 500 messages you're sending
thousands of tokens every single time just to maintain context. Most
developers either cap the history and lose important details, or keep
growing the context and watch latency and costs climb.

ChatSorter solves this by automatically compressing chat history in the
background while keeping the facts that matter.

How it works

Every message you send to ChatSorter gets buffered. Every 5 messages, a
background server summarizes the batch using a local LLM, scores each
message by importance, and stores the result. High-importance facts
(names, allergies, preferences, relationships) are promoted to a
master memory, so they are never lost even as older messages
get compressed away.
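The buffer-and-promote step can be sketched roughly like this. Everything here is illustrative: the batch size, the threshold, and the keyword-based `score_importance` are stand-ins (ChatSorter uses a local LLM for scoring, and its actual thresholds aren't documented here):

```python
from collections import defaultdict

FLUSH_EVERY = 5          # batch size, matching the "every 5 messages" described above
PROMOTE_THRESHOLD = 0.8  # hypothetical cutoff for the master memory

buffers = defaultdict(list)        # chat_id -> messages awaiting summarization
master_memory = defaultdict(list)  # chat_id -> promoted high-importance facts

def score_importance(message: str) -> float:
    """Toy scorer; the real system scores each message with an LLM."""
    keywords = ("allergic", "my name is", "i prefer")
    return 1.0 if any(k in message.lower() for k in keywords) else 0.3

def process_message(chat_id: str, message: str) -> None:
    buffers[chat_id].append(message)
    if len(buffers[chat_id]) >= FLUSH_EVERY:
        batch = buffers.pop(chat_id)
        for msg in batch:
            if score_importance(msg) >= PROMOTE_THRESHOLD:
                # Promoted facts survive even after the batch is compressed
                master_memory[chat_id].append(msg)
        # Low-importance messages would be summarized into one compact entry here
```

The key property is that promotion happens before compression, so a fact like an allergy never depends on the summary keeping it.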

When you need context, a single search call returns the most relevant memories, ranked by importance score and recency.
Your bot gets the right context every time without ever seeing the full
raw history.
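One common way to combine importance and recency is an exponential decay on age. This is a sketch of that idea, not ChatSorter's actual weighting (the `half_life_s` parameter and the formula are assumptions):

```python
import math
import time

def rank_memories(memories, now=None, half_life_s=86400.0, top_k=3):
    """Rank memories by importance discounted by age.

    Each memory is a dict with an "importance" float and a "ts" timestamp.
    A memory loses half its weight every `half_life_s` seconds (hypothetical).
    """
    now = time.time() if now is None else now

    def score(m):
        age = max(now - m["ts"], 0.0)
        return m["importance"] * math.exp(-age * math.log(2) / half_life_s)

    return sorted(memories, key=score, reverse=True)[:top_k]
```

Under this scheme a very important but older fact can still outrank a recent low-importance one, which is the behavior you want for things like allergies.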

The results

In a head-to-head test against a standard context window approach over
150 messages:

  • ChatSorter: 100% recall accuracy, 135 avg tokens per request
  • Standard: 20% recall accuracy, 1914 avg tokens per request
  • Token reduction: 89%

These results are from a controlled benchmark: 5 personal facts
introduced in the first 5 messages, tested after 150 total messages
of filler conversation. Real-world results will vary depending on
how facts are distributed across messages and how your users write.

The standard bot forgot almost everything or timed out. ChatSorter remembered all of
it at a fraction of the cost.

Integration

Two API calls. A few changes to your chat logic.

import requests

BASE_URL = "https://YOUR_CHATSORTER_HOST"  # replace with the API base URL from the docs
HEADERS = {"Authorization": "Bearer YOUR_KEY"}

# Store a message
requests.post(f"{BASE_URL}/process", json={
    "chat_id": "user_123",
    "message": "I'm allergic to peanuts and shellfish"
}, headers=HEADERS)

# Retrieve relevant context before your next LLM call
result = requests.post(f"{BASE_URL}/search", json={
    "chat_id": "user_123",
    "query": "what do I know about this user"
}, headers=HEADERS)

memories = result.json()["memories"]
# inject memories into your system prompt
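For the injection step, one simple pattern is to fold the retrieved memories into the system prompt as a bulleted fact list. The helper below is just one way to do it (the prompt wording is mine, not part of the API):

```python
def build_system_prompt(memories):
    """Fold retrieved memory strings into a system prompt (one possible pattern)."""
    facts = "\n".join(f"- {m}" for m in memories)
    return (
        "You are a helpful assistant. Known facts about this user:\n"
        + facts
        + "\nUse these facts when relevant; do not invent new ones."
    )

# prompt = build_system_prompt(memories)  # then pass as your LLM's system message
```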

Who it's for

ChatSorter is built for developers running chatbots or AI assistants where
users return across multiple sessions: companion apps, productivity
assistants, tutoring bots, or anything where the bot should remember who
it's talking to.

Get a free beta key

Docs and examples: https://github.com/codeislife12/Chatsorter
Free beta key: chatsorter-website.vercel.app

Currently in beta, free access, no credit card required.
