We have spent a decade watching hackers grind through the same 48-hour cycle at MLH: build something brilliant, realize it has the memory of a gold...
Quite an interesting API integration with MLH! Can't wait to see how it goes!
We thought the friendship was a no-brainer 😀
You guys certainly made a great choice indeed! Congrats Jonathan :D
Can't wait to see what these builders do with it! Let's go!
Great news.
woo woo!
I agree with the problem statement, but I’m not fully convinced the solution space is “one stateful API to rule them all.” 🤔
Memory in AI apps isn’t just persistence — it’s:
- relevance filtering
- temporal decay
- user intent modeling
If those concerns are abstracted too aggressively, we risk trading engineering tax for loss of control and explainability.
Curious how Backboard handles things like memory pruning and conflict resolution?
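Temporal decay is easy to name but fiddly to implement. One common illustrative approach (a sketch under assumed parameters, not anything Backboard documents) is to down-weight a memory's retrieval score exponentially by its age:

```python
import time

def decayed_score(similarity: float, created_at: float,
                  half_life_days: float = 30.0) -> float:
    """Down-weight a memory's relevance score by its age.

    similarity: raw retrieval score in [0, 1]
    created_at: UNIX timestamp when the memory was written
    half_life_days: age at which the weight halves (assumed value)
    """
    age_days = (time.time() - created_at) / 86_400
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * decay

# With a 30-day half-life, a month-old memory needs roughly twice
# the raw similarity of a fresh one to rank equally, so stale facts
# gradually lose to recent signal instead of dominating forever.
```

The half-life is exactly the kind of policy knob that is hard to abstract away: a trading agent may want hours, a personal assistant may want months.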
Hey Sven! Really appreciate the depth here. These are exactly the right questions to be asking.
You're correct that memory isn't just persistence, and we'd push back hard on any system that treated it that way. The "one API" framing is about abstracting the infrastructure, not the policy. Control stays with you.
On your specific concerns:
Relevance filtering: there's a sensible default that handles what gets extracted from conversations, but you're not locked into it. Through the `custom_fact_extraction_prompt` field on the assistant, you define in natural language exactly what the system should treat as worth remembering. Your schema, your rules.
Conflict resolution and pruning: the `custom_update_memory_prompt` field gives you the same level of control over how memories evolve. You write the prompt that governs when memories get added, updated, or deleted. Newer signal wins by default, but you're defining what "newer signal" means and how aggressively the system acts on it.
User intent modelling: users can also set memory behaviour conversationally at runtime, with things like "don't remember anything about X" or "always surface Y when relevant." Control exists at both the developer layer and the end user layer.
These aren't toggle-style config options. They're natural language prompt overrides at the assistant level, which means the memory behaviour is readable, auditable, and fully in your hands.
On the "loss of control" concern more broadly: the abstraction handles the plumbing. The policy is yours.
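To make that concrete: the two field names (`custom_fact_extraction_prompt`, `custom_update_memory_prompt`) come from the reply above, but the surrounding payload shape and any endpoint are assumptions for illustration, not Backboard's documented API:

```python
import json

# Hypothetical assistant configuration. Only the two prompt-override
# field names are taken from the discussion above; the other keys
# and the idea of POSTing this anywhere are illustrative.
assistant_config = {
    "name": "support-agent",
    # Policy for WHAT gets remembered (relevance filtering):
    "custom_fact_extraction_prompt": (
        "Extract only durable user preferences and account facts. "
        "Ignore small talk and one-off requests."
    ),
    # Policy for HOW memories evolve (conflict resolution / pruning):
    "custom_update_memory_prompt": (
        "If a new fact contradicts an existing memory, replace the "
        "older one. Drop memories not referenced in 90 days."
    ),
}

payload = json.dumps(assistant_config)
# This serialized config is what you'd send when creating the assistant.
```

Because the policies are plain natural-language prompts rather than opaque flags, they can live in version control and be reviewed like any other code.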
Engineering tax is the perfect phrase for this.
Every team building AI agents solves memory and state management from scratch. The first 80% feels productive. The last 20%, making it actually work reliably, is where the tax shows up. And it shows up the same way for everyone.
What frustrates me isn't that it's hard. It's that we're all solving it separately.
What do you think is the biggest blocker to a shared solution? Technical? Or just no industry consensus on what memory should look like?
Great piece. This conversation needs to happen more. 🙌
The "engineering tax" framing nails it. I run an automation consultancy and the pattern I see repeatedly is: client wants an AI agent that remembers user preferences across sessions, team spends 3 weeks building a bespoke memory layer with Redis + vector DB + custom retrieval logic, and then realizes they need decay, conflict resolution, and pruning — which is another 3 weeks.
The trading system comment from @juangonzalez below is the most telling example here. Five distinct failure modes from stateless memory, and the fix was 200 lines of structured JSONL. The gap isn't that we lack tools — it's that every team rediscovers the same failure modes independently.
One thing I'd push on: the 17,000+ model support is great for flexibility, but the real value proposition is the memory abstraction, not the routing. Teams can switch LLMs easily. They can't switch memory architectures easily. That's where the lock-in tax actually lives.
This hits close to home. I run a multi-agent trading system with more than 20 autonomous agents. The "stateless tax" nearly destroyed it.
The system generates strategies, tests them, kills losers, promotes winners. After 40 days and 34,000 strategies, I found that the biggest problems weren't strategy problems. They were state problems.
Five concrete ways statelessness cost me:
1. The memory existed but the agents never consulted it. The decision log was right there; nobody read it before deciding.
2. My best agent generated 87% of the profits. The system killed it over 5 small losses because the kill function didn't check cumulative PnL. The state was recorded. The logic ignored it.
3. After 30 days, the memory had thousands of entries. Loading all of it made the agents slower without improving decisions. I needed typed categories and filtered retrieval, not raw dumps.
4. Old memories from a trending market biased decisions during a sideways market. Facts that were true 30 days ago were actively harmful today.
5. Even after fixing retrieval, some agents still ignored the injected context. I had to add a feedback loop that logs whether the retrieved memory actually influenced each decision.
The fix was 200 lines of Python. JSONL per agent, structured entries, filtered by event type and market regime. No vector DB. No embeddings. No framework.
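The actual 200 lines aren't shown, but the shape described (per-agent JSONL files, typed entries, retrieval filtered by event type and regime) can be sketched in a few lines. All names and fields here are illustrative, not the commenter's code:

```python
import json
from pathlib import Path
from typing import Iterator

MEMORY_DIR = Path("memory")  # one .jsonl file per agent (assumed layout)

def remember(agent: str, event_type: str, regime: str, data: dict) -> None:
    """Append one structured memory entry to the agent's JSONL log."""
    MEMORY_DIR.mkdir(exist_ok=True)
    entry = {"event_type": event_type, "regime": regime, "data": data}
    with (MEMORY_DIR / f"{agent}.jsonl").open("a") as f:
        f.write(json.dumps(entry) + "\n")

def recall(agent: str, event_type: str, regime: str) -> Iterator[dict]:
    """Stream only entries matching the current event type and market
    regime, instead of dumping the whole history into context."""
    path = MEMORY_DIR / f"{agent}.jsonl"
    if not path.exists():
        return
    with path.open() as f:
        for line in f:
            entry = json.loads(line)
            if entry["event_type"] == event_type and entry["regime"] == regime:
                yield entry
```

Filtering at read time is what addresses failure modes 3 and 4 above: agents only see memories typed for the decision at hand and tagged with the current regime, so stale trending-market entries never reach a sideways-market decision.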
This is exactly the kind of hard-won experience we built Backboard for.
40 days, 34,000 strategies, and real losses before arriving at 200 lines of Python. That is not a small fix. That is expensive, painful, and deeply specific knowledge that most teams never reach. You diagnosed five distinct failure modes that most people do not even have names for yet. That is genuinely impressive.
We hope your solution holds. Seriously. But if it does not, or when the next edge case shows up, that is what Backboard is here for. Not to replace what you built, but so the next developer does not have to spend a month getting there.
We built Backboard so your peers can skip the 40 days and get straight to what you already know. Less time, less stress, less cost of discovery.