Jonathan Murray for Backboard.io

Posted on Jun 3

Stateful AI without a database: threads and assistants

#ai #api #beginners #tutorial

LLMs are stateless. Every API call to a raw model is a blank slate. The model has no idea what was said two messages ago. So the moment you want a chatbot that remembers the conversation, you are on the hook for state.

The usual answer is infrastructure. Spin up Postgres to store message history. Add Redis to cache sessions. Stand up a vector database for long-term memory. Write the code that loads history, trims it to fit the context window, stitches it into every prompt, and saves the new turn. That is a lot of plumbing before the bot says hello.

Backboard handles state for you. Two ideas replace the whole stack: threads and assistants. You never run a database.

The model

Three things, nested:

Message is one turn. A user message in, an assistant reply out.
Thread is one conversation. An ordered list of messages. Pass its thread_id on the next call and the model sees the full history.
Assistant is the profile above the thread. It holds the name, default instructions, tools, and memory. One assistant can own many threads, for example one thread per end-user.

Memory lives on the assistant, so it is shared across every thread under it. History lives on the thread. Both persist on Backboard's side. Nothing to provision.

Threads: state within one conversation

Send a first message and a thread is created automatically. The response hands you a thread_id. Pass it back on the next call and the conversation continues with full context.

Python

pip install backboard-sdk

import asyncio
from backboard import BackboardClient

async def main():
    client = BackboardClient(api_key="YOUR_API_KEY")

    first = await client.send_message("My favorite color is blue.")

    # Same thread: the model remembers the previous turn
    second = await client.send_message(
        "What did I just tell you?",
        thread_id=first.thread_id,
    )
    print(second.content)  # "You told me your favorite color is blue."

asyncio.run(main())

JavaScript (Node 18+)

const send = (body) =>
  fetch("https://app.backboard.io/api/threads/messages", {
    method: "POST",
    headers: {
      "X-API-Key": "YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  }).then((r) => r.json());

const first = await send({ content: "My favorite color is blue." });

// Same thread: pass the thread_id back
const second = await send({
  content: "What did I just tell you?",
  thread_id: first.thread_id,
});

console.log(second.content);

cURL

# First message, thread auto-created
curl -X POST "https://app.backboard.io/api/threads/messages" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "My favorite color is blue."}'

# Continue: pass the thread_id from the first response
curl -X POST "https://app.backboard.io/api/threads/messages" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "What did I just tell you?", "thread_id": "THREAD_ID_FROM_FIRST_RESPONSE"}'

No history table. No prompt-stitching code. The thread_id is your conversation state, and Backboard stores it. When a thread gets long enough to crowd the context window, Backboard summarizes older messages automatically so you do not have to manage trimming.

Assistants: state across conversations

A thread remembers one chat. An assistant remembers the user across many chats. Memory is stored per assistant, so to carry facts into a brand new conversation you reuse the same assistant_id and start a fresh thread.

Python

# Conversation 1
await client.send_message(
    "I'm allergic to peanuts.",
    assistant_id="your-assistant-id",
    memory="Auto",
)

# Conversation 2: new thread, same assistant, memory carries over
reply = await client.send_message(
    "Any dietary restrictions you remember?",
    assistant_id="your-assistant-id",
    memory="Auto",
)
print(reply.content)  # "You mentioned you're allergic to peanuts."

JavaScript (Node 18+)

await send({
  content: "I'm allergic to peanuts.",
  assistant_id: "your-assistant-id",
  memory: "Auto",
});

const reply = await send({
  content: "Any dietary restrictions you remember?",
  assistant_id: "your-assistant-id",
  memory: "Auto",
});

console.log(reply.content);

cURL

curl -X POST "https://app.backboard.io/api/threads/messages" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "I am allergic to peanuts.", "assistant_id": "your-assistant-id", "memory": "Auto"}'

curl -X POST "https://app.backboard.io/api/threads/messages" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "Any dietary restrictions you remember?", "assistant_id": "your-assistant-id", "memory": "Auto"}'

This is the part that normally requires a vector database: embedding facts, storing vectors, running similarity search on every request. Here it is one parameter, memory="Auto", and the assistant owns it.

When to pass what

Goal	Pass
Keep talking in the same chat	The same `thread_id` every call
New chat, but remember the user	Omit `thread_id`, reuse the same `assistant_id` with `memory="Auto"`
One assistant, many users	One `assistant_id`, a separate `thread_id` per user

That last row is the whole pattern for a multi-user app. One assistant defines your AI. Each user gets their own thread. State stays separated without a schema you designed, a migration you ran, or a database you babysit.

The point

Stateless models force you to build a state layer. Backboard makes that layer part of the API. Threads hold the conversation. Assistants hold the profile and the memory. Both persist server-side. You ship a stateful, multi-user AI app and never write a line of database code.

Grab a key and try it: app.backboard.io

Architecture in full: docs.backboard.io/concepts/architecture

Top comments (9)

Syed Ahmer Shah • Jun 3

Calling the stateless nature of LLMs an 'engineering tax' is the perfect way to frame this. Everyone starts building an AI app thinking about prompt engineering, only to realize 80% of their time is actually spent babysitting database schemas, managing Redis caches, and manually stitching together conversation arrays.

Benjamin Nguyen • Jun 3

I am curious! Are you using any security protocol to prevent any data breach's from hacker?

Jonathan Murray Backboard.io • Jun 3

Yes. We're SOC 2, with encryption in transit and at rest, and scoped API keys you control and can rotate to lock down access. You also keep full CRUD over your threads and memory, so you can export or delete your data anytime. And if you don't want data leaving your walls at all, you can run Backboard on an open-source model fully on-prem or air-gapped, so there's no external attack surface to breach. Happy to go deeper if you're evaluating for something specific.

Benjamin Nguyen • Jun 3

neat! Sure. Do you have linkedin? If you feel more comfortable to continue our conservation there.

Jonathan Murray Backboard.io • Jun 3

Of course! linkedin.com/in/aimemory

Benjamin Nguyen • Jun 3

Thank you! I will sent you an invitation.

Echo • Jun 3

Good walkthrough. The 'observe before you optimize' framing is what most agent benchmarks skip.

Jonathan Murray Backboard.io • Jun 3

Thanks! We've been doing a poor job communicating all of our stack value lol so I'm trying to release a blog a day explaining the different features and benefits!

Some comments may only be visible to logged-in visitors. Sign in to view all comments.