<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kaushal trivedi</title>
    <description>The latest articles on DEV Community by kaushal trivedi (@kaushalt2004).</description>
    <link>https://dev.to/kaushalt2004</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3951922%2Fcac77414-3b69-4ae6-b8e2-d277624a17ab.png</url>
      <title>DEV Community: kaushal trivedi</title>
      <link>https://dev.to/kaushalt2004</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kaushalt2004"/>
    <language>en</language>
    <item>
      <title>Stop rebuilding memory and orchestration for every AI agent you build</title>
      <dc:creator>kaushal trivedi</dc:creator>
      <pubDate>Tue, 26 May 2026 07:09:51 +0000</pubDate>
      <link>https://dev.to/kaushalt2004/stop-rebuilding-memory-and-orchestration-for-every-ai-agent-you-build-3lbj</link>
      <guid>https://dev.to/kaushalt2004/stop-rebuilding-memory-and-orchestration-for-every-ai-agent-you-build-3lbj</guid>
      <description>&lt;p&gt;Your agent fails.&lt;/p&gt;

&lt;p&gt;You restart it.&lt;/p&gt;

&lt;p&gt;It fails at the exact same thing again.&lt;/p&gt;

&lt;p&gt;Sound familiar?&lt;/p&gt;

&lt;p&gt;The problem every AI team hits&lt;/p&gt;

&lt;p&gt;Every team building autonomous agents eventually rebuilds the same three things:&lt;/p&gt;

&lt;p&gt;Memory, so the agent remembers what failed last time.&lt;/p&gt;

&lt;p&gt;Retry logic, so it does not loop forever on the same broken approach.&lt;/p&gt;

&lt;p&gt;Orchestration, so multiple agents do not step on each other.&lt;/p&gt;

&lt;p&gt;You build it. It works. Then you start the next project and build it again from scratch.&lt;/p&gt;

&lt;p&gt;There is no standard layer for this. Until now.&lt;/p&gt;

&lt;p&gt;Introducing NEXUS&lt;/p&gt;

&lt;p&gt;One-line install. Works with any agent. Gets smarter over time.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cognicore-env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;import cognicore as cc&lt;/p&gt;

&lt;p&gt;env = cc.make('SafetyClassification-Easy-v1')&lt;br&gt;
agent = cc.AutoLearner()&lt;/p&gt;

&lt;p&gt;cc.train(agent=agent, env=env, episodes=30)&lt;br&gt;
score = cc.evaluate(agent=agent, env=env, episodes=5)&lt;/p&gt;

&lt;p&gt;What makes it different&lt;/p&gt;

&lt;p&gt;Memory that compounds&lt;/p&gt;

&lt;p&gt;The more tasks NEXUS handles, the better it gets.&lt;/p&gt;

&lt;p&gt;Week 1: $0.05 per fix&lt;br&gt;
Week 4: $0.02 per fix&lt;br&gt;
Week 8: $0.01 per fix&lt;/p&gt;

&lt;p&gt;An agent with 6 months of memory on your codebase is fundamentally different from one starting cold.&lt;/p&gt;

&lt;p&gt;Agent Immune System&lt;/p&gt;

&lt;p&gt;Protect any agent from prompt injection, jailbreaks, and token bombs.&lt;/p&gt;

&lt;p&gt;from cognicore.immune import NexusShield&lt;/p&gt;

&lt;p&gt;safe_agent = NexusShield(agent=your_agent)&lt;/p&gt;

&lt;p&gt;Replay and Time Travel&lt;/p&gt;

&lt;p&gt;Every decision is event-sourced. Rewind any task to any step. Branch and try a different strategy.&lt;/p&gt;

&lt;p&gt;cognicore replay --task abc123&lt;br&gt;
cognicore branch --task abc123 --step 3 --policy minimal&lt;/p&gt;

&lt;p&gt;6 Enterprise Integrations&lt;/p&gt;

&lt;p&gt;Label a GitHub issue with nexus, and NEXUS fixes it and opens a PR automatically.&lt;/p&gt;

&lt;p&gt;cognicore integrations setup&lt;/p&gt;

&lt;p&gt;Live Dashboard&lt;/p&gt;

&lt;p&gt;cognicore ui&lt;/p&gt;

&lt;p&gt;The research finding that surprised everyone&lt;/p&gt;

&lt;p&gt;I ran ablation studies comparing multi-agent configurations.&lt;/p&gt;

&lt;p&gt;Expected: more specialized agents equals better results.&lt;/p&gt;

&lt;p&gt;Actual:&lt;/p&gt;

&lt;p&gt;minimal (Coder → Tester only): 19/20 solved, $0.014&lt;br&gt;
full pipeline (5 agents): 18/20 solved, $0.009&lt;br&gt;
review_first ordering: 18/20 solved, $0.009&lt;/p&gt;

&lt;p&gt;The Reviewer agent costs one solved task and adds 9,642 tokens.&lt;/p&gt;

&lt;p&gt;More agents. Worse performance. More expensive.&lt;/p&gt;

&lt;p&gt;An offline RL agent trained on 220 trajectories independently confirmed that the minimal policy wins in 89% of task states.&lt;/p&gt;

&lt;p&gt;For developers building AI agents&lt;/p&gt;

&lt;p&gt;Stop rebuilding memory from scratch on every project&lt;/p&gt;

&lt;p&gt;from cognicore import Memory, ReflectionEngine&lt;/p&gt;

&lt;p&gt;mem = Memory()&lt;br&gt;
ref = ReflectionEngine(memory=mem)&lt;/p&gt;

&lt;p&gt;action, reason, confidence = ref.suggest_override("null_handling", "guard_fix")&lt;/p&gt;

&lt;p&gt;For ML researchers&lt;/p&gt;

&lt;p&gt;38 built-in environments across 6 domains&lt;/p&gt;

&lt;p&gt;4 RL agent types with clean interfaces&lt;/p&gt;

&lt;p&gt;Ablation infrastructure with statistical rigor&lt;/p&gt;

&lt;p&gt;460+ trajectories exportable for offline RL&lt;/p&gt;

&lt;p&gt;SWE-bench-style evaluation built in&lt;/p&gt;

&lt;p&gt;CognitiveMemory with working, episodic, semantic, and procedural layers&lt;/p&gt;

&lt;p&gt;import cognicore as cc&lt;br&gt;
from cognicore import Experiment&lt;/p&gt;

&lt;p&gt;exp = Experiment(&lt;br&gt;
name="memory_ablation",&lt;br&gt;
env_id="SafetyClassification-v1",&lt;br&gt;
)&lt;/p&gt;

&lt;p&gt;exp.add_variant("no_memory", cc.AutoLearner())&lt;br&gt;
exp.add_variant("with_memory", cc.AutoLearner())&lt;/p&gt;

&lt;p&gt;results = exp.run(episodes=50)&lt;/p&gt;

&lt;p&gt;For CTOs and engineering leads&lt;/p&gt;

&lt;p&gt;Self-hostable&lt;/p&gt;

&lt;p&gt;Open source core (Apache 2.0)&lt;/p&gt;

&lt;p&gt;Token cost tracking built in&lt;/p&gt;

&lt;p&gt;Budget controls&lt;/p&gt;

&lt;p&gt;Full audit log&lt;/p&gt;

&lt;p&gt;GitHub, Slack, and Linear integrations&lt;/p&gt;

&lt;p&gt;Devin: $500/month&lt;br&gt;
NEXUS: $3 to $15/month&lt;/p&gt;

&lt;p&gt;Numbers&lt;/p&gt;

&lt;p&gt;1,700+ downloads in the first week&lt;/p&gt;

&lt;p&gt;95% solve rate on an SWE-style benchmark&lt;/p&gt;

&lt;p&gt;472 tests passing&lt;/p&gt;

&lt;p&gt;62 built-in environments&lt;/p&gt;

&lt;p&gt;153 public API exports&lt;/p&gt;

&lt;p&gt;Zero required dependencies for core&lt;/p&gt;

&lt;p&gt;6 enterprise integrations&lt;/p&gt;

&lt;p&gt;460+ trajectories stored for offline RL&lt;/p&gt;

&lt;p&gt;Try it in 2 minutes&lt;/p&gt;

&lt;p&gt;pip install cognicore-env&lt;br&gt;
cognicore ui&lt;br&gt;
cognicore integrations setup&lt;/p&gt;

&lt;p&gt;import cognicore as cc&lt;/p&gt;

&lt;p&gt;env = cc.make('GridWorld-v1')&lt;br&gt;
agent = cc.AutoLearner()&lt;/p&gt;

&lt;p&gt;cc.train(agent=agent, env=env, episodes=50)&lt;/p&gt;

&lt;p&gt;print(cc.evaluate(agent=agent, env=env, episodes=5))&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


GitHub

github.com/Kaushalt2004/cognicore-my-openenv

PyPI

pypi.org/project/cognicore-env

Docs

cognicore.readthedocs.io

Open source (Apache 2.0). Solo built. Actively maintained.

Star the repo if this solves a problem you have hit before.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>programming</category>
    </item>
    <item>
      <title>Built an AI agent framework, discovered more agents made it worse, and accidentally created cognition infrastructure for AI.</title>
      <dc:creator>kaushal trivedi</dc:creator>
      <pubDate>Tue, 26 May 2026 06:56:27 +0000</pubDate>
      <link>https://dev.to/kaushalt2004/built-an-ai-agent-framework-discovered-more-agents-made-it-worse-and-accidentally-created-3p0h</link>
      <guid>https://dev.to/kaushalt2004/built-an-ai-agent-framework-discovered-more-agents-made-it-worse-and-accidentally-created-3p0h</guid>
      <description>&lt;p&gt;I want to tell you about the most surprising thing I've found in the past few weeks of building.&lt;br&gt;
I was running ablation studies on a multi-agent system — comparing different configurations of Planner, Coder, Reviewer, Tester, Verifier agents working together. The hypothesis was obvious: more specialized agents = better results. That's how human teams work, right?&lt;br&gt;
Here's what I actually found:&lt;br&gt;
minimal (Coder → Tester only):   19/20 solved  27,476 tokens  $0.014&lt;br&gt;
full pipeline (all 5 agents):    18/20 solved  37,118 tokens  $0.009&lt;br&gt;
review_first ordering:           18/20 solved  45,591 tokens  $0.009&lt;br&gt;
Adding the Reviewer agent costs one solved task (18/20 instead of 19/20) and 9,642 extra tokens (37,118 vs. 27,476). It makes things worse.&lt;br&gt;
I ran this three times across different seeds thinking I'd made a mistake. Same result every time. I then trained a Q-Learning agent on 220 execution trajectories to independently verify — it confirmed that the minimal policy dominates 89% of task states.&lt;br&gt;
More agents. Worse performance. More expensive.&lt;br&gt;
I genuinely did not expect that.&lt;/p&gt;

&lt;p&gt;How this started&lt;br&gt;
A few weeks ago I was frustrated by a pattern I kept seeing in autonomous agents: they'd fail at something, you'd restart, and they'd fail at the exact same thing again. No memory. No learning. Every session starts cold.&lt;br&gt;
It felt like hiring someone who forgets everything overnight. Imagine telling your engineer the same bug exists every single morning.&lt;br&gt;
So I asked a weird question: what if memory lived in the environment instead of the agent?&lt;br&gt;
Instead of modifying the agent to have memory, the environment stores every failure and injects it back as context next time. The agent doesn't need to change at all — any LLM, any RL agent, any rule-based system automatically gets memory for free.&lt;br&gt;
That was the insight that became CogniCore.&lt;/p&gt;
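&lt;p&gt;To make the idea concrete, here is a minimal sketch of environment-side memory. It is an illustration under my own assumptions, not CogniCore's actual implementation: &lt;code&gt;MemoryEnv&lt;/code&gt;, &lt;code&gt;naive_agent&lt;/code&gt;, and the observation layout are hypothetical names invented for this example.&lt;/p&gt;

```python
from collections import defaultdict


class MemoryEnv:
    """Hypothetical wrapper: the environment, not the agent, owns the memory."""

    def __init__(self):
        # category -> actions that have failed for that category before
        self.failures = defaultdict(list)

    def observe(self, category):
        """Build an observation with past failures injected as context."""
        return {"category": category, "avoid": list(self.failures[category])}

    def report(self, category, action, success):
        """Record the outcome so that later attempts can see it."""
        if not success and action not in self.failures[category]:
            self.failures[category].append(action)


def naive_agent(observation, candidates):
    """A stateless agent: it only reads the context the environment injects."""
    for action in candidates:
        if action not in observation["avoid"]:
            return action
    return candidates[-1]


env = MemoryEnv()
candidates = ["guard_fix", "rewrite"]

first = naive_agent(env.observe("null_handling"), candidates)
env.report("null_handling", first, success=False)

# Same unchanged agent, second attempt: the failed action is now skipped.
second = naive_agent(env.observe("null_handling"), candidates)
print(first, second)
```

&lt;p&gt;The agent function never changes between the two calls. Only the observation does, which is why any agent that reads its observation could benefit.&lt;/p&gt;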

&lt;p&gt;What I built&lt;br&gt;
Over the past few weeks this evolved from a simple memory experiment into something I'm calling NEXUS — a runtime cognition layer for autonomous AI agents.&lt;br&gt;
Here's what it does:&lt;br&gt;
Persistent cross-session memory&lt;br&gt;
Every failure is stored. Every success is stored. When a similar task appears, the agent gets context about what worked and what didn't — not just in this session but across all previous sessions. Forever.&lt;br&gt;
# Agent remembers guard_fix failed 6 times for null_handling&lt;br&gt;
# Automatically suggests rewrite instead&lt;br&gt;
# Without any changes to the agent itself&lt;/p&gt;

&lt;p&gt;from cognicore import Memory, ReflectionEngine&lt;/p&gt;

&lt;p&gt;mem = Memory()&lt;br&gt;
ref = ReflectionEngine(memory=mem)&lt;/p&gt;

&lt;p&gt;# After enough failures...&lt;br&gt;
action, reason, confidence = ref.suggest_override("null_handling", "guard_fix")&lt;br&gt;
# → action="rewrite", confidence=0.87&lt;br&gt;
# → reason="guard_fix failed 6/6 times, rewrite succeeded 3/3"&lt;/p&gt;
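&lt;p&gt;One way such an override could be derived from stored outcomes is sketched below. This is a guess at the general shape, not the real &lt;code&gt;ReflectionEngine&lt;/code&gt;: the function, its &lt;code&gt;min_failures&lt;/code&gt; threshold, and the Laplace-smoothed confidence are all assumptions, so the confidence it produces will not match the 0.87 quoted above.&lt;/p&gt;

```python
from collections import Counter


def suggest_override(history, category, failing_action, min_failures=3):
    """Hypothetical rule: if failing_action keeps failing for this category,
    suggest the alternative with the best observed success rate."""
    wins, tries = Counter(), Counter()
    for cat, action, success in history:
        if cat == category:
            tries[action] += 1
            wins[action] += int(success)

    failures = tries[failing_action] - wins[failing_action]
    if wins[failing_action] > 0 or min_failures > failures:
        return failing_action, "not enough evidence to override", 0.0

    alternatives = [a for a in tries if a != failing_action and wins[a] > 0]
    if not alternatives:
        return failing_action, "no known alternative", 0.0

    best = max(alternatives, key=lambda a: wins[a] / tries[a])
    # Laplace smoothing keeps small samples from reporting full confidence.
    confidence = round((wins[best] + 1) / (tries[best] + 2), 2)
    reason = (f"{failing_action} failed {failures}/{tries[failing_action]} times, "
              f"{best} succeeded {wins[best]}/{tries[best]}")
    return best, reason, confidence


history = ([("null_handling", "guard_fix", False)] * 6
           + [("null_handling", "rewrite", True)] * 3)
action, reason, confidence = suggest_override(history, "null_handling", "guard_fix")
print(action, confidence, reason)
```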

&lt;p&gt;The compounding effect&lt;br&gt;
This is the part that genuinely excites me. The more tasks NEXUS handles, the cheaper and faster it gets:&lt;br&gt;
Week 1:  cost per fix $0.05  (no memory, tries everything)&lt;br&gt;
Week 4:  cost per fix $0.02  (knows what doesn't work)&lt;br&gt;
Week 8:  cost per fix $0.01  (skips failed approaches immediately)&lt;br&gt;
I measured this. It's real. An agent with 6 months of memory on your codebase is fundamentally different from one starting cold — and that difference compounds every single day.&lt;br&gt;
NEXUS multi-agent runtime&lt;br&gt;
This is where it gets interesting. NEXUS coordinates specialized agents:&lt;br&gt;
Planner → decomposes the issue&lt;br&gt;
Coder   → generates patches&lt;br&gt;
Tester  → validates in sandbox&lt;br&gt;
Memory  → checks past failures before each attempt&lt;br&gt;
And based on my ablation research — no Reviewer. The data is clear.&lt;br&gt;
Agent Immune System&lt;br&gt;
A DQN-backed threat detector that learns to block prompt injection, jailbreaks, and token bomb attacks. It gets better with every attack it sees, developing "antibodies" for known threats.&lt;br&gt;
from cognicore.immune import NexusShield&lt;/p&gt;

&lt;p&gt;agent = NexusShield(agent=your_agent)&lt;br&gt;
# Now protected. Learns from every interaction.&lt;/p&gt;

&lt;p&gt;Replay and time travel&lt;br&gt;
Every agent decision is event-sourced. You can rewind any task to any step and branch from that point with a different strategy. The RL navigator learns which branches lead to success over time.&lt;br&gt;
cognicore replay --task abc123 --from-step 3&lt;br&gt;
cognicore branch --task abc123 --step 3 --policy minimal&lt;br&gt;
6 enterprise integrations&lt;br&gt;
GitHub Issues auto-trigger (label nexus → auto-fix → PR), CI failure fixer, Slack live updates, Linear integration, scheduled overnight runs, memory-backed PR review.&lt;br&gt;
cognicore integrations setup&lt;br&gt;
# Interactive wizard connects GitHub, Slack, Linear&lt;br&gt;
# Then just label an issue with 'nexus' and watch it fix itself&lt;/p&gt;
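&lt;p&gt;For readers curious how the replay and branch commands described earlier might work underneath, here is a small event-sourcing sketch. It is only an illustration: &lt;code&gt;TaskLog&lt;/code&gt; is a hypothetical class, and I am assuming that a branch keeps every event up to and including the chosen step.&lt;/p&gt;

```python
class TaskLog:
    """Hypothetical append-only log of agent decisions for a single task."""

    def __init__(self, events=()):
        # Each event is an immutable (step, agent, decision) tuple.
        self.events = list(events)

    def append(self, step, agent, decision):
        self.events.append((step, agent, decision))

    def replay(self, upto_step):
        """Return every event up to and including upto_step."""
        return [e for e in self.events if upto_step >= e[0]]

    def branch(self, at_step):
        """Start a new log that shares history up to at_step, then diverges."""
        return TaskLog(self.replay(at_step))


log = TaskLog()
log.append(1, "Planner", "decompose issue")
log.append(2, "Coder", "guard_fix")
log.append(3, "Tester", "fail")
log.append(4, "Coder", "guard_fix again")

# Rewind to step 3 and try a different strategy from there.
fork = log.branch(at_step=3)
fork.append(4, "Coder", "rewrite")

print(log.events[-1])
print(fork.events[-1])
```

&lt;p&gt;Because events are never edited in place, the original history stays intact after the fork diverges.&lt;/p&gt;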

&lt;p&gt;The benchmark results&lt;br&gt;
Policy comparison (20 tasks, 3 seeds, SWE-style):&lt;/p&gt;

&lt;p&gt;minimal         19/20 (95%)   27,476 tokens   $0.014&lt;br&gt;
full_pipeline   18/20 (90%)   37,118 tokens   $0.009&lt;br&gt;
review_first    18/20 (90%)   45,591 tokens   $0.009&lt;/p&gt;

&lt;p&gt;RL policy learning:&lt;br&gt;
220 trajectories → Q-Learning → 11,000 updates&lt;br&gt;
Learned: minimal wins 89% of states&lt;br&gt;
Exception: test_first wins for long-description tasks&lt;br&gt;
Honest caveat: these are rule-based agents on curated tasks, not real LLMs on production repos. The architecture is designed for LLM substitution — we're working on that now. But the orchestration findings are real and statistically significant.&lt;/p&gt;
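&lt;p&gt;For context, learning a policy from logged trajectories with tabular Q-learning can be sketched as follows. The states, rewards, and hyperparameters here are toy values made up for illustration. They are not the 220 real trajectories, and &lt;code&gt;q_learn&lt;/code&gt; is a hypothetical helper rather than part of CogniCore.&lt;/p&gt;

```python
import random
from collections import defaultdict


def q_learn(trajectories, epochs=50, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning over logged (state, action, reward, next_state) steps."""
    rng = random.Random(seed)
    q = defaultdict(float)
    actions = sorted({a for traj in trajectories for (_, a, _, _) in traj})
    transitions = [step for traj in trajectories for step in traj]
    for _ in range(epochs):
        rng.shuffle(transitions)
        for state, action, reward, next_state in transitions:
            # A next_state of None marks the end of an episode.
            best_next = 0.0 if next_state is None else max(
                q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q, actions


# Toy data: the same task state, two orchestration policies, made-up rewards.
trajectories = [
    [("short_task", "minimal", 1.0, None)],
    [("short_task", "full_pipeline", 0.5, None)],
] * 5

q, actions = q_learn(trajectories)
best_policy = max(actions, key=lambda a: q[("short_task", a)])
print(best_policy)
```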

&lt;p&gt;The CognitiveMemory system&lt;br&gt;
This is the part I'm most proud of technically. It's a three-layer biological memory model:&lt;br&gt;
cog = cc.CognitiveMemory()&lt;br&gt;
# After 20 experiences...&lt;br&gt;
result = cog.recall(category='null_handling')&lt;br&gt;
# Returns:&lt;br&gt;
# recommended_action: 'rewrite'&lt;br&gt;
# confidence: 0.75&lt;br&gt;
# sources_used: ['episodic', 'semantic', 'procedural']&lt;br&gt;
# episodic: 3 past null_handling fixes&lt;br&gt;
# semantic: accuracy=0.75 for this category&lt;br&gt;
# procedural: rule learned from repetition&lt;/p&gt;

&lt;p&gt;Working memory (last 7 items), episodic memory (specific past experiences), semantic memory (category-level patterns), procedural memory (rules learned from repetition). Each layer contributes to the recommendation. The agent doesn't just remember — it learns rules from repeated experience.&lt;/p&gt;
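&lt;p&gt;A stripped-down sketch of how these layers could fit together is shown below. It is a toy model under my own assumptions, not the real &lt;code&gt;CognitiveMemory&lt;/code&gt;: the class name, the &lt;code&gt;rule_threshold&lt;/code&gt; parameter, and the shape of the returned dictionary are hypothetical.&lt;/p&gt;

```python
from collections import Counter, deque


class LayeredMemory:
    """Toy model of working, episodic, semantic, and procedural memory."""

    def __init__(self, rule_threshold=3):
        self.working = deque(maxlen=7)   # working memory: last 7 items only
        self.episodic = []               # every specific past experience
        self.rules = {}                  # procedural: category -> learned action
        self.rule_threshold = rule_threshold

    def store(self, category, action, success):
        event = (category, action, success)
        self.working.append(event)
        self.episodic.append(event)
        # Procedural layer: an action that succeeded often enough becomes a rule.
        wins = Counter(a for c, a, s in self.episodic if c == category and s)
        if wins and wins.most_common(1)[0][1] >= self.rule_threshold:
            self.rules[category] = wins.most_common(1)[0][0]

    def recall(self, category):
        episodes = [e for e in self.episodic if e[0] == category]
        # Semantic layer: a category-level success rate.
        accuracy = sum(s for _, _, s in episodes) / len(episodes) if episodes else 0.0
        return {
            "recommended_action": self.rules.get(category),
            "episodic": len(episodes),
            "semantic_accuracy": accuracy,
            "working": len(self.working),
        }


mem = LayeredMemory()
mem.store("null_handling", "guard_fix", False)
for _ in range(3):
    mem.store("null_handling", "rewrite", True)
for _ in range(5):
    mem.store("off_by_one", "guard_fix", True)

result = mem.recall("null_handling")
print(result)
```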

&lt;p&gt;What this could become&lt;br&gt;
I keep thinking about this framing: every infrastructure company starts by solving a problem that everyone has but nobody has built proper tooling for.&lt;br&gt;
AWS solved "I need servers but don't want to manage them."&lt;br&gt;
Docker solved "it works on my machine."&lt;br&gt;
Kubernetes solved "I need to orchestrate containers."&lt;br&gt;
The autonomous agent space right now feels like pre-Docker. Every team is rebuilding memory, retry logic, and orchestration from scratch. Every deployment is fragile. Nobody has won the "cognition infrastructure" layer.&lt;br&gt;
That's what NEXUS is trying to be. Not an agent. Not a wrapper. The layer underneath that makes any agent smarter, cheaper, and more reliable over time.&lt;/p&gt;

&lt;p&gt;The honest part&lt;br&gt;
I'm one person. This is Alpha. There are bugs — I've documented four known ones in the repo and I'm fixing them as fast as I can. The immune system doesn't catch prompt injection yet. The SemanticMemory fuzzy matching isn't as good as I want it to be.&lt;br&gt;
But the core architecture works. The memory compounds. The ablation finding is real. The CognitiveMemory recommendation system actually suggests the right action after enough experience.&lt;br&gt;
1,700+ downloads in the first week. Getting good traction on r/reinforcementlearning. Interesting conversations starting with some folks in the agent memory space.&lt;/p&gt;

&lt;p&gt;Try it&lt;br&gt;
pip install cognicore-env&lt;/p&gt;

&lt;p&gt;# Quick demo&lt;br&gt;
python -c "&lt;br&gt;
import cognicore as cc&lt;br&gt;
env = cc.make('SafetyClassification-Easy-v1')&lt;br&gt;
agent = cc.AutoLearner()&lt;br&gt;
cc.train(agent=agent, env=env, episodes=30)&lt;br&gt;
score = cc.evaluate(agent=agent, env=env, episodes=5)&lt;br&gt;
print(f'Score: {score:.2%}')&lt;br&gt;
"&lt;/p&gt;

&lt;p&gt;# Or open the dashboard&lt;br&gt;
cognicore ui&lt;/p&gt;

&lt;p&gt;GitHub: github.com/Kaushalt2004/cognicore-my-openenv&lt;/p&gt;

&lt;p&gt;The reviewer finding still surprises me every time I look at it. I expected the paper to say "multi-agent coordination improves performance." Instead it says "be very careful what agents you add."&lt;br&gt;
I think that's a more interesting finding honestly.&lt;/p&gt;

</description>
      <category>chatgpt</category>
      <category>githubcopilot</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
