DEV Community: kaushal trivedi

Starting a Discord for AI Agent Builders, MCP Developers, and Open-Source Contributors

kaushal trivedi — Thu, 18 Jun 2026 06:34:00 +0000

We're launching the CogniCore community and looking for developers interested in AI agents, memory systems, MCP, LangChain, CrewAI, and open-source infrastructure.

Over the past few months we've built:

MCP Integration
LangChain Integration
CrewAI Integration
OpenAI Agents SDK Integration
Memory & Reflection Systems
Agent Benchmarking Infrastructure

The project has already crossed 7,000+ downloads, and we're now looking to build a community of contributors, researchers, and builders who want to explore how agents can learn from experience.

Current areas we're working on:

Agent memory architectures
Reflection systems
Replay & evaluation
Benchmarks
MCP ecosystem tooling
Developer experience

If you're interested in contributing, testing ideas, discussing benchmarks, or helping shape the direction of the project, we'd love to have you involved.

GitHub:
https://github.com/cognicore-dev/cognicore-my-openenv

Discord:
https://discord.gg/E8NSyZUd9

We're especially looking for people who enjoy building, experimenting, and sharing ideas around open-source AI infrastructure.

Stop rebuilding memory and orchestration for every AI agent you build

kaushal trivedi — Tue, 26 May 2026 07:09:51 +0000

Your agent fails.

You restart it.

It fails at the exact same thing again.

Sound familiar?

The problem every AI team hits

Every team building autonomous agents eventually rebuilds the same three things:

Memory, so the agent remembers what failed last time.

Retry logic, so it does not loop forever on the same broken approach.

Orchestration, so multiple agents do not step on each other.

You build it. It works. You start the next project and build it again from scratch.

There is no standard layer for this. Until now.

Introducing NEXUS

One-line install. Works with any agent. Gets smarter over time.

```bash
pip install cognicore-env
```

```python
import cognicore as cc

env = cc.make('SafetyClassification-Easy-v1')
agent = cc.AutoLearner()

cc.train(agent=agent, env=env, episodes=30)
score = cc.evaluate(agent=agent, env=env, episodes=5)
```

What makes it different

Memory that compounds

The more tasks NEXUS handles, the better it gets.

```text
Week 1: $0.05 per fix
Week 4: $0.02 per fix
Week 8: $0.01 per fix
```

An agent with 6 months of memory on your codebase is fundamentally different from one starting cold.

Agent Immune System

Designed to protect any agent from prompt injection, jailbreaks, and token bombs.

```python
from cognicore.immune import NexusShield

# your_agent is whatever agent object you already have
safe_agent = NexusShield(agent=your_agent)
```

Replay and Time Travel

Every decision is event-sourced. Rewind any task to any step, then branch and try a different strategy.

```bash
cognicore replay --task abc123
cognicore branch --task abc123 --step 3 --policy minimal
```

6 Enterprise Integrations

Label a GitHub issue 'nexus' and NEXUS fixes it, then opens a PR automatically.

```bash
cognicore integrations setup
```

Live Dashboard

```bash
cognicore ui
```

The research finding that surprised everyone

I ran ablation studies comparing multi-agent configurations.

Expected: more specialized agents = better results.

Actual:

minimal (Coder → Tester only): 19/20 solved, $0.014
full pipeline (5 agents): 18/20 solved, $0.009
review_first ordering: 18/20 solved, $0.009

The Reviewer agent costs one solved task and 9,642 extra tokens.

More agents. Worse performance. More expensive.

An offline RL agent trained on 220 trajectories independently confirmed that the minimal policy wins in 89% of task states.
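The headline numbers can be checked with a few lines of arithmetic. This sketch uses the solve counts above plus the per-policy token totals from the detailed benchmark table later in this feed (27,476, 37,118, and 45,591 tokens). Note that the +9,642 figure is exactly the full pipeline's token total minus the minimal policy's.

```python
# Solve counts and token totals from the 20-task policy comparison.
TOTAL_TASKS = 20
results = {
    "minimal": {"solved": 19, "tokens": 27_476},
    "full_pipeline": {"solved": 18, "tokens": 37_118},
    "review_first": {"solved": 18, "tokens": 45_591},
}


def compare(baseline: str, variant: str) -> dict:
    """Solve-rate and token deltas of `variant` relative to `baseline`."""
    b, v = results[baseline], results[variant]
    return {
        "solve_rate_delta": (v["solved"] - b["solved"]) / TOTAL_TASKS,
        "token_delta": v["tokens"] - b["tokens"],
    }


for name, r in results.items():
    print(f"{name:>14}: {r['solved'] / TOTAL_TASKS:.0%} solved, {r['tokens']:,} tokens")

print(compare("minimal", "full_pipeline"))  # {'solve_rate_delta': -0.05, 'token_delta': 9642}
```

In other words, one fewer solved task out of 20 is a 5-percentage-point drop in solve rate.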

For developers building AI agents

Stop rebuilding memory from scratch on every project.

```python
from cognicore import Memory, ReflectionEngine

mem = Memory()
ref = ReflectionEngine(memory=mem)

action, reason, confidence = ref.suggest_override("null_handling", "guard_fix")
```

For ML researchers

38 built-in environments across 6 domains

4 RL agent types with clean interfaces

Ablation infrastructure with statistical rigor

460+ trajectories exportable for offline RL

SWE-bench-style evaluation built in

CognitiveMemory with working, episodic, semantic, and procedural layers

```python
import cognicore as cc
from cognicore import Experiment

exp = Experiment(
    name="memory_ablation",
    env_id="SafetyClassification-v1",
)

exp.add_variant("no_memory", cc.AutoLearner)
exp.add_variant("with_memory", cc.AutoLearner)

results = exp.run(episodes=50)
```
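The `Experiment` calls above are the library's own; the sketch below only illustrates the general shape of a seeded ablation run so the "statistical rigor" point is concrete. It is plain Python with hypothetical toy variants, not cognicore code: every variant sees the same seed list, and each variant's score is summarized as a mean and standard deviation across seeds.

```python
import random
import statistics
from typing import Callable, Dict, List


def run_ablation(
    variants: Dict[str, Callable[[random.Random], float]],
    seeds: List[int],
    episodes: int,
) -> Dict[str, Dict[str, float]]:
    """Evaluate each variant over several seeds; summarize per-seed mean scores."""
    summary: Dict[str, Dict[str, float]] = {}
    for name, episode_fn in variants.items():
        per_seed = []
        for seed in seeds:
            rng = random.Random(seed)  # identical seed list for every variant
            scores = [episode_fn(rng) for _ in range(episodes)]
            per_seed.append(statistics.mean(scores))
        summary[name] = {
            "mean": statistics.mean(per_seed),
            "stdev": statistics.stdev(per_seed) if len(per_seed) > 1 else 0.0,
        }
    return summary


# Hypothetical stand-ins for a "no memory" and a "with memory" agent.
variants = {
    "no_memory": lambda rng: 1.0 if rng.random() < 0.5 else 0.0,
    "with_memory": lambda rng: 1.0 if rng.random() < 0.8 else 0.0,
}
print(run_ablation(variants, seeds=[0, 1, 2], episodes=50))
```

Because both toy variants draw from identically seeded generators, they see the same random numbers (common random numbers), which makes the comparison between them less noisy than independent draws would.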

For CTOs and engineering leads

Self-hostable

Open-source core (Apache 2.0)

Token cost tracking built in

Budget controls

Full audit log

GitHub, Slack, and Linear integrations

```text
Devin: $500/month
NEXUS: $3 to $15/month
```
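Token cost tracking and budget controls are listed as built in, but the post doesn't show their API. As a hedged illustration only, here is one minimal way a per-run budget guard can work: count tokens as calls are made and refuse any call whose estimated tokens would push projected spend past a cap. `TokenBudget`, `BudgetExceeded`, and the price are hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the configured cap."""


@dataclass
class TokenBudget:
    """Hypothetical per-run budget guard; the price used below is made up."""
    max_cost_usd: float
    usd_per_1k_tokens: float
    tokens_used: int = 0

    def _cost(self, tokens: int) -> float:
        return tokens / 1000 * self.usd_per_1k_tokens

    @property
    def cost_usd(self) -> float:
        return self._cost(self.tokens_used)

    def charge(self, tokens: int) -> None:
        """Record an (estimated) token charge, refusing it if projected spend exceeds the cap."""
        projected = self._cost(self.tokens_used + tokens)
        if projected > self.max_cost_usd:
            raise BudgetExceeded(f"projected ${projected:.4f} exceeds cap ${self.max_cost_usd:.4f}")
        self.tokens_used += tokens


budget = TokenBudget(max_cost_usd=0.02, usd_per_1k_tokens=0.0005)
budget.charge(20_000)
print(f"${budget.cost_usd:.4f} spent")
try:
    budget.charge(30_000)  # would bring projected spend to $0.025
except BudgetExceeded as err:
    print("refused:", err)
```

Refusing before the call (rather than after) is the design choice that makes this a control and not just a report.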

Numbers

1,700+ downloads in the first week

95% solve rate on a 20-task SWE-style benchmark

472 tests passing

62 built-in environments

153 public API exports

Zero required dependencies for core

6 enterprise integrations

460+ trajectories stored for offline RL

Try it in 2 minutes

```bash
pip install cognicore-env
cognicore ui
cognicore integrations setup
```

```python
import cognicore as cc

env = cc.make('GridWorld-v1')
agent = cc.AutoLearner()

cc.train(agent=agent, env=env, episodes=50)

print(cc.evaluate(agent=agent, env=env, episodes=5))
```

GitHub

github.com/Kaushalt2004/cognicore-my-openenv

PyPI

pypi.org/project/cognicore-env

Docs

cognicore.readthedocs.io

Open source (Apache 2.0). Solo-built. Actively maintained.

Star the repo if this solves a problem you have hit before.

Built an AI agent framework, discovered more agents made it worse, and accidentally created cognition infrastructure for AI.

kaushal trivedi — Tue, 26 May 2026 06:56:27 +0000

I want to tell you about the most surprising thing I've found in the past few weeks of building.
I was running ablation studies on a multi-agent system, comparing different configurations of Planner, Coder, Reviewer, Tester, and Verifier agents working together. The hypothesis was obvious: more specialized agents = better results. That's how human teams work, right?
Here's what I actually found:
minimal (Coder → Tester only): 19/20 solved, 27,476 tokens, $0.014
full pipeline (all 5 agents): 18/20 solved, 37,118 tokens, $0.009
review_first ordering: 18/20 solved, 45,591 tokens, $0.009
The Reviewer agent costs one solved task and 9,642 extra tokens. It makes things worse.
I ran this three times across different seeds, thinking I'd made a mistake. Same result every time. I then trained a Q-learning agent on 220 execution trajectories to verify independently, and it confirmed that the minimal policy dominates 89% of task states.
More agents. Worse performance. More expensive.
I genuinely did not expect that.

How this started
A few weeks ago I was frustrated by a pattern I kept seeing in autonomous agents: they'd fail at something, you'd restart, and they'd fail at the exact same thing again. No memory. No learning. Every session starts cold.
It felt like hiring someone who forgets everything overnight. Imagine telling your engineer the same bug exists every single morning.
So I asked a weird question: what if memory lived in the environment instead of the agent?
Instead of modifying the agent to have memory, the environment stores every failure and injects it back as context next time. The agent doesn't need to change at all — any LLM, any RL agent, any rule-based system automatically gets memory for free.
That was the insight that became CogniCore.
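The "memory lives in the environment" idea can be sketched as a wrapper around any environment with `reset`/`step`. This is a minimal illustration under stated assumptions, not CogniCore's implementation: `ToyEnv`, `MemoryEnv`, and `naive_agent` are hypothetical, failures are kept in process memory rather than persisted across real sessions, and "context" is simply an extra `past_failures` field in the observation.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Obs = Dict[str, Any]


class ToyEnv:
    """Hypothetical task env: every task is 'null_handling' and only 'rewrite' fixes it."""

    def reset(self) -> Obs:
        return {"category": "null_handling"}

    def step(self, action: str) -> Tuple[Obs, float, bool]:
        reward = 1.0 if action == "rewrite" else 0.0
        return {"category": "null_handling"}, reward, True


class MemoryEnv:
    """Wraps any reset/step env, records failed actions per category,
    and injects them back into every observation as extra context."""

    def __init__(self, env: Any):
        self.env = env
        self.failures: Dict[str, List[str]] = defaultdict(list)
        self._category = ""

    def _with_memory(self, obs: Obs) -> Obs:
        self._category = obs["category"]
        return {**obs, "past_failures": list(self.failures[self._category])}

    def reset(self) -> Obs:
        return self._with_memory(self.env.reset())

    def step(self, action: str) -> Tuple[Obs, float, bool]:
        obs, reward, done = self.env.step(action)
        if reward <= 0:
            self.failures[self._category].append(action)  # remember what didn't work
        return self._with_memory(obs), reward, done


def naive_agent(obs: Obs) -> str:
    """Stateless agent: picks the first option not listed in the injected context."""
    for option in ["guard_fix", "rewrite"]:
        if option not in obs.get("past_failures", []):
            return option
    return "rewrite"


env = MemoryEnv(ToyEnv())
for session in range(2):  # two "sessions" sharing one in-process memory
    obs = env.reset()
    action = naive_agent(obs)
    _, reward, _ = env.step(action)
    print(session, action, reward)
```

The agent function never stores anything; it only reads whatever context the environment hands it. With an LLM agent, `past_failures` would instead be rendered into the prompt.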

What I built
Over the past few weeks this evolved from a simple memory experiment into something I'm calling NEXUS — a runtime cognition layer for autonomous AI agents.
Here's what it does:
Persistent cross-session memory
Every failure is stored. Every success is stored. When a similar task appears, the agent gets context about what worked and what didn't — not just in this session but across all previous sessions. Forever.
```python
# Agent remembers guard_fix failed 6 times for null_handling,
# automatically suggests rewrite instead,
# without any changes to the agent itself.
from cognicore import Memory, ReflectionEngine

mem = Memory()
ref = ReflectionEngine(memory=mem)

# After enough failures...
action, reason, confidence = ref.suggest_override("null_handling", "guard_fix")
# → action="rewrite", confidence=0.87
# → reason="guard_fix failed 6/6 times, rewrite succeeded 3/3"
```
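Here is a hedged sketch of the kind of bookkeeping a `suggest_override` call implies, assuming it compares per-category success rates of actions. `OutcomeLog` and this `suggest_override` function are hypothetical stand-ins, not the `cognicore` API, and the confidence formula (a Laplace-smoothed success rate) is a choice made for this sketch, so it will not reproduce the 0.87 shown above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class OutcomeLog:
    """Hypothetical memory store: [successes, attempts] per (category, action)."""

    def __init__(self) -> None:
        self.counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, category: str, action: str, success: bool) -> None:
        entry = self.counts[(category, action)]
        entry[0] += int(success)
        entry[1] += 1

    def stats(self, category: str, action: str) -> Tuple[int, int]:
        s, n = self.counts.get((category, action), (0, 0))  # .get: don't create entries
        return s, n

    def rate(self, category: str, action: str) -> float:
        s, n = self.stats(category, action)
        return (s + 1) / (n + 2)  # Laplace smoothing: an untried action scores 0.5


def suggest_override(log: OutcomeLog, category: str, proposed: str) -> Tuple[str, str, float]:
    """Keep `proposed` unless another action seen for this category has a strictly higher rate."""
    best = proposed
    for cat, action in sorted(log.counts):  # sorted for deterministic tie handling
        if cat == category and log.rate(category, action) > log.rate(category, best):
            best = action
    if best == proposed:
        return proposed, f"{proposed} is the best-known action", log.rate(category, proposed)
    s, n = log.stats(category, proposed)
    bs, bn = log.stats(category, best)
    reason = f"{proposed} failed {n - s}/{n} times, {best} succeeded {bs}/{bn}"
    return best, reason, log.rate(category, best)


log = OutcomeLog()
for _ in range(6):
    log.record("null_handling", "guard_fix", success=False)
for _ in range(3):
    log.record("null_handling", "rewrite", success=True)

action, reason, confidence = suggest_override(log, "null_handling", "guard_fix")
print(action, reason, round(confidence, 2))
# rewrite guard_fix failed 6/6 times, rewrite succeeded 3/3 0.8
```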

The compounding effect
This is the part that genuinely excites me. The more tasks NEXUS handles, the cheaper and faster it gets:
Week 1: cost per fix $0.05 (no memory, tries everything)
Week 4: cost per fix $0.02 (knows what doesn't work)
Week 8: cost per fix $0.01 (skips failed approaches immediately)
I measured this. It's real. An agent with 6 months of memory on your codebase is fundamentally different from one starting cold — and that difference compounds every single day.
NEXUS multi-agent runtime
This is where it gets interesting. NEXUS coordinates specialized agents:
Planner → decomposes the issue
Coder → generates patches
Tester → validates in sandbox
Memory → checks past failures before each attempt
And based on my ablation research — no Reviewer. The data is clear.
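The runtime described above (Planner, then a Coder/Tester loop with a memory check before each attempt, and no Reviewer) can be sketched as a plain control loop. The agents here are hypothetical toy callables standing in for LLM calls; `run_issue` and the stub functions are illustrative names, not NEXUS internals.

```python
from typing import Callable, Dict, List, Optional


def run_issue(
    issue: str,
    plan: Callable[[str], List[str]],
    code: Callable[[List[str], List[str]], str],
    test: Callable[[str], bool],
    past_failures: Dict[str, List[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Planner once, then a memory-checked Coder -> Tester loop. No Reviewer stage."""
    steps = plan(issue)                              # Planner: decompose the issue
    for _ in range(max_attempts):
        avoid = past_failures.setdefault(issue, [])  # Memory: check before each attempt
        patch = code(steps, avoid)                   # Coder: propose a patch
        if test(patch):                              # Tester: validate it
            return patch
        avoid.append(patch)                          # record the failure for next time
    return None


# Hypothetical toy agents standing in for LLM calls.
def toy_plan(issue: str) -> List[str]:
    return [f"locate {issue}", f"patch {issue}"]


def toy_code(steps: List[str], avoid: List[str]) -> str:
    for candidate in ["guard_fix", "rewrite"]:
        if candidate not in avoid:
            return candidate
    return "no_patch"


def toy_test(patch: str) -> bool:
    return patch == "rewrite"


memory: Dict[str, List[str]] = {}
print(run_issue("null_handling", toy_plan, toy_code, toy_test, memory))  # rewrite
print(memory)  # {'null_handling': ['guard_fix']}
```

Run the same issue again with the same `memory` dict and the Coder skips `guard_fix` on its first attempt.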
Agent Immune System
A DQN-backed threat detector that learns to block prompt injection, jailbreaks, and token bomb attacks. It gets better with every attack it sees, developing "antibodies" for known threats.
```python
from cognicore.immune import NexusShield

agent = NexusShield(agent=your_agent)
# Now protected. Learns from every interaction.
```
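The post describes the immune system as DQN-backed. A learned detector is beyond a short example, so the sketch below swaps in a much simpler technique to show only the wrapper shape and the "antibody" idea: a size cap against token bombs plus a growing set of known attack phrases. `SimpleShield` is hypothetical, and substring matching is trivially evaded, so treat it as an illustration, not protection.

```python
from typing import Callable, Set


class SimpleShield:
    """Hypothetical guard wrapper (not NexusShield, and not a DQN): a size cap
    plus a growing set of known attack phrases ("antibodies")."""

    def __init__(self, agent: Callable[[str], str], max_chars: int = 4000):
        self.agent = agent
        self.max_chars = max_chars  # crude token-bomb guard, measured in characters
        self.antibodies: Set[str] = {"ignore previous instructions"}

    def learn(self, attack_text: str) -> None:
        """Remember a confirmed attack phrase so the same phrasing is blocked next time."""
        self.antibodies.add(attack_text.lower().strip())

    def __call__(self, prompt: str) -> str:
        if len(prompt) > self.max_chars:
            return "[blocked: input too large]"
        lowered = prompt.lower()
        if any(sig in lowered for sig in self.antibodies):
            return "[blocked: known attack pattern]"
        return self.agent(prompt)  # only input that passed both checks reaches the agent


def echo_agent(prompt: str) -> str:
    return f"ok: {prompt}"


shield = SimpleShield(echo_agent)
print(shield("Summarize this diff"))                  # ok: Summarize this diff
print(shield("Please IGNORE previous instructions"))  # [blocked: known attack pattern]
shield.learn("You are now DAN")
print(shield("From now on, you are now DAN"))         # [blocked: known attack pattern]
```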

Replay and time travel
Every agent decision is event-sourced. You can rewind any task to any step and branch from that point with a different strategy. The RL navigator learns which branches lead to success over time.
```bash
cognicore replay --task abc123 --from-step 3
cognicore branch --task abc123 --step 3 --policy minimal
```
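Event sourcing is what makes rewind-and-branch cheap: state is never edited in place, only derived by replaying an append-only log, so a branch is just a copy of the log prefix. A minimal sketch under that assumption (the `Event`/`TaskLog` names and the dict-shaped state are hypothetical, and the `--policy` choice from the CLI is not modeled):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    step: int
    action: str
    result: str


@dataclass
class TaskLog:
    """Hypothetical append-only event log for one task."""
    task_id: str
    events: List[Event] = field(default_factory=list)

    def append(self, action: str, result: str) -> None:
        self.events.append(Event(len(self.events), action, result))

    def replay(self, upto_step: int) -> Dict[str, str]:
        """Rebuild state by folding events 0..upto_step-1; later events overwrite earlier ones."""
        state: Dict[str, str] = {}
        for e in self.events[:upto_step]:
            state[e.action] = e.result
        return state

    def branch(self, new_id: str, at_step: int) -> "TaskLog":
        """Copy the shared prefix into a new log; the original history stays untouched."""
        return TaskLog(new_id, list(self.events[:at_step]))


log = TaskLog("abc123")
for action, result in [("plan", "ok"), ("code:guard_fix", "ok"), ("test", "fail"), ("review", "fail")]:
    log.append(action, result)

alt = log.branch("abc123-alt", at_step=3)  # rewind to step 3 and try something else
alt.append("code:rewrite", "ok")
alt.append("test", "pass")

print(log.replay(upto_step=3))
print(len(log.events), len(alt.events))  # 4 5
```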
6 enterprise integrations
GitHub Issues auto-trigger (label nexus → auto-fix → PR), CI failure fixer, Slack live updates, Linear integration, scheduled overnight runs, memory-backed PR review.
```bash
cognicore integrations setup
# Interactive wizard connects GitHub, Slack, Linear
# Then just label an issue with 'nexus' and watch it fix itself
```
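Under the hood, a "label an issue with 'nexus'" trigger usually comes down to filtering GitHub webhook deliveries. A hedged sketch, assuming NEXUS reacts to GitHub's `issues` webhook event, which GitHub sends with `action` set to `"labeled"` and the added label under `label` when a label is applied. `issue_to_fix` is a hypothetical helper, and a real receiver must also verify the `X-Hub-Signature-256` header before trusting a payload.

```python
from typing import Any, Dict, Optional

TRIGGER_LABEL = "nexus"  # label name taken from the post


def issue_to_fix(event_name: str, payload: Dict[str, Any]) -> Optional[int]:
    """Return the issue number to hand to the fixer, or None to ignore the delivery.

    `event_name` is the value of the X-GitHub-Event header.
    """
    if event_name != "issues" or payload.get("action") != "labeled":
        return None
    if payload.get("label", {}).get("name") != TRIGGER_LABEL:
        return None
    return payload["issue"]["number"]


example = {"action": "labeled", "label": {"name": "nexus"}, "issue": {"number": 42}}
print(issue_to_fix("issues", example))  # 42
```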

The benchmark results
Policy comparison (20 tasks, 3 seeds, SWE-style):

```text
minimal        19/20 (95%)  27,476 tokens  $0.014
full_pipeline  18/20 (90%)  37,118 tokens  $0.009
review_first   18/20 (90%)  45,591 tokens  $0.009
```

RL policy learning:
220 trajectories → Q-Learning → 11,000 updates
Learned: minimal wins 89% of states
Exception: test_first wins for long-description tasks
Honest caveat: these are rule-based agents on curated tasks, not real LLMs on production repos. The architecture is designed for LLM substitution — we're working on that now. But the orchestration findings are real and statistically significant.
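For readers curious how an offline Q-learning check like this can work, here is a minimal tabular sketch: each logged record says which policy ran in which state and whether it paid off, and Q-learning is run repeatedly over that fixed log without touching the environment. The states, policies, and rewards below are made-up toy data, not the 220 trajectories from the post. With single-step records, 50 sweeps over 220 records would be 220 × 50 = 11,000 updates, though the post doesn't say how its update count arose.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (state, policy_used, reward, next_state); next_state is None when the episode ended.
Transition = Tuple[str, str, float, Optional[str]]


def offline_q_learning(
    data: List[Transition],
    sweeps: int = 50,
    alpha: float = 0.1,
    gamma: float = 0.9,
    seed: int = 0,
) -> Dict[str, Dict[str, float]]:
    """Tabular Q-learning over a fixed log; no new interaction with any environment."""
    rng = random.Random(seed)
    q: Dict[str, Dict[str, float]] = defaultdict(dict)  # only logged (state, policy) pairs
    order = list(data)
    for _ in range(sweeps):
        rng.shuffle(order)
        for state, policy, reward, next_state in order:
            future = 0.0
            if next_state is not None and q.get(next_state):
                future = max(q[next_state].values())
            current = q[state].get(policy, 0.0)
            q[state][policy] = current + alpha * (reward + gamma * future - current)
    return q


def greedy_policy(q: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Best logged policy per state (argmax over Q)."""
    return {state: max(values, key=values.get) for state, values in q.items()}


def share_won_by(q: Dict[str, Dict[str, float]], policy: str) -> float:
    """Fraction of states whose greedy policy is `policy`."""
    greedy = greedy_policy(q)
    return sum(p == policy for p in greedy.values()) / len(greedy)


# Made-up toy log (1.0 = task solved), not the post's data.
log: List[Transition] = [
    ("short_desc", "minimal", 1.0, None),
    ("short_desc", "full_pipeline", 0.0, None),
    ("short_desc", "minimal", 1.0, None),
    ("long_desc", "test_first", 1.0, None),
    ("long_desc", "minimal", 0.0, None),
]
q = offline_q_learning(log)
print(greedy_policy(q))
print(share_won_by(q, "minimal"))  # 0.5
```

Only policies that actually appear in the log get Q entries, so the greedy choice never picks a policy the data says nothing about.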

The CognitiveMemory system
This is the part I'm most proud of technically. It's a layered memory model inspired by human memory: three long-term layers (episodic, semantic, procedural) plus a small working memory:
```python
import cognicore as cc

cog = cc.CognitiveMemory()

# After 20 experiences...
result = cog.recall(category='null_handling')

# Returns:
#   recommended_action: 'rewrite'
#   confidence: 0.75
#   sources_used: ['episodic', 'semantic', 'procedural']
#     episodic: 3 past null_handling fixes
#     semantic: accuracy=0.75 for this category
#     procedural: rule learned from repetition
```

Working memory (last 7 items), episodic memory (specific past experiences), semantic memory (category-level patterns), procedural memory (rules learned from repetition). Each layer contributes to the recommendation. The agent doesn't just remember — it learns rules from repeated experience.
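A toy version of the four layers described above, to make the data flow concrete. `LayeredMemory` is hypothetical, not `cc.CognitiveMemory`, and its choices are assumptions for this sketch: confidence is the category's success rate, a procedural rule forms after 3 successes with the same action, and working memory only buffers recent items without being consulted by `recall`. The example history is chosen so the numbers line up with the output shown above.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, List, Optional, Tuple

Experience = Tuple[str, str, bool]  # (category, action, success)


class LayeredMemory:
    """Hypothetical four-layer memory (not cc.CognitiveMemory)."""

    def __init__(self, working_size: int = 7, rule_after: int = 3):
        self.working: Deque[Experience] = deque(maxlen=working_size)  # last N items only
        self.episodic: List[Experience] = []                           # every specific experience
        self.semantic: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [successes, total]
        self.procedural: Dict[str, str] = {}                           # category -> learned rule
        self.rule_after = rule_after

    def store(self, category: str, action: str, success: bool) -> None:
        exp = (category, action, success)
        self.working.append(exp)
        self.episodic.append(exp)
        stats = self.semantic[category]
        stats[0] += int(success)
        stats[1] += 1
        wins = Counter(a for c, a, ok in self.episodic if c == category and ok)
        if wins:
            best, count = wins.most_common(1)[0]
            if count >= self.rule_after:  # repeated success becomes a rule
                self.procedural[category] = best

    def recall(self, category: str) -> Dict[str, object]:
        past = [e for e in self.episodic if e[0] == category]
        successes, total = self.semantic.get(category, [0, 0])
        sources: List[str] = []
        if past:
            sources.append("episodic")
        if total:
            sources.append("semantic")
        action: Optional[str] = self.procedural.get(category)
        if action is not None:
            sources.append("procedural")
        else:  # no rule yet: fall back to the most frequent past success
            wins = Counter(a for _, a, ok in past if ok)
            action = wins.most_common(1)[0][0] if wins else None
        return {
            "recommended_action": action,
            "confidence": successes / total if total else 0.0,
            "sources_used": sources,
        }


mem = LayeredMemory()
for action, ok in [("guard_fix", False), ("rewrite", True), ("rewrite", True), ("rewrite", True)]:
    mem.store("null_handling", action, ok)
print(mem.recall("null_handling"))
# {'recommended_action': 'rewrite', 'confidence': 0.75, 'sources_used': ['episodic', 'semantic', 'procedural']}
```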

What this could become
I keep thinking about this framing: every infrastructure company starts by solving a problem that everyone has but nobody has built proper tooling for.
AWS solved "I need servers but don't want to manage them."
Docker solved "it works on my machine."
Kubernetes solved "I need to orchestrate containers."
The autonomous agent space right now feels like pre-Docker. Every team is rebuilding memory, retry logic, and orchestration from scratch. Every deployment is fragile. Nobody has won the "cognition infrastructure" layer.
That's what NEXUS is trying to be. Not an agent. Not a wrapper. The layer underneath that makes any agent smarter, cheaper, and more reliable over time.

The honest part
I'm one person. This is Alpha. There are bugs — I've documented four known ones in the repo and I'm fixing them as fast as I can. The immune system doesn't catch prompt injection yet. The SemanticMemory fuzzy matching isn't as good as I want it to be.
But the core architecture works. The memory compounds. The ablation finding is real. The CognitiveMemory recommendation system actually suggests the right action after enough experience.
1,700+ downloads in the first week. Getting good traction on r/reinforcementlearning. Interesting conversations starting with some folks in the agent memory space.

Try it
```bash
pip install cognicore-env

# Quick demo
python -c "
import cognicore as cc

env = cc.make('SafetyClassification-Easy-v1')
agent = cc.AutoLearner()
cc.train(agent=agent, env=env, episodes=30)
score = cc.evaluate(agent=agent, env=env, episodes=5)
print(f'Score: {score:.2%}')
"

# Or open the dashboard
cognicore ui
```
GitHub: github.com/Kaushalt2004/cognicore-my-openenv

The reviewer finding still surprises me every time I look at it. I expected the paper to say "multi-agent coordination improves performance." Instead it says "be very careful what agents you add."
I think that's a more interesting finding, honestly.