TL;DR Our CI was flaky, our tests hit live APIs, and every run burned tokens unnecessarily. So the CopilotKit team built AIMock - a mock server tha...
Great concept—this is exactly the kind of tooling AI teams need as stacks get more complex.
Love how AIMock goes beyond just LLM mocking and actually covers the entire agentic pipeline (LLM + MCP + vector DB + search + rerank, etc.). That’s a huge gap in current testing setups where people mock one layer and unknowingly leave the rest non-deterministic.
The record & replay + drift detection combo is especially powerful—feels like a practical way to keep tests realistic without silently breaking when providers change APIs.
Curious how it performs at scale in large CI pipelines and whether teams mix it with tools like MSW or fully replace their existing mocks?
thanks! on the MSW question -- you don't have to fully replace it. the recommendation is to keep MSW for general REST/GraphQL mocking and aimock for AI-specific endpoints. the docs have a section specifically on using them alongside each other.
aimock.copilotkit.dev/migrate-from...
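roughly, the coexistence looks like this (a sketch, not the exact config from the docs -- the port and the REST handler are placeholders):

```typescript
// MSW keeps handling normal REST/GraphQL; anything aimed at the aimock
// server passes through untouched. Port 4000 is a placeholder.
import { http, passthrough, HttpResponse } from 'msw'
import { setupServer } from 'msw/node'

const server = setupServer(
  // let requests to the aimock server reach it for AI-specific endpoints
  http.all('http://localhost:4000/*', () => passthrough()),

  // keep MSW for ordinary REST mocking
  http.get('https://api.example.com/users', () =>
    HttpResponse.json([{ id: 1, name: 'Ada' }]),
  ),
)

server.listen({ onUnhandledRequest: 'error' })
```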
as for CI scale, it runs as a plain HTTP server so any process can hit it and the Docker image makes it straightforward to drop into pipelines.
aimock.copilotkit.dev/aimock-cli/
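in a test that usually just means pointing the SDK's base URL at the container -- something like this (URL and port are placeholders for whatever your pipeline exposes):

```typescript
// point the OpenAI SDK at the mock server instead of the live API
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: process.env.AIMOCK_URL ?? 'http://localhost:4000/v1', // placeholder
  apiKey: 'test-key', // no real key needed against a mock
})

const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'ping' }],
})
console.log(completion.choices[0].message.content)
```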
Excellent article on AIMock! Love how it mocks the entire agentic stack—from LLMock with 11+ providers to MCPMock, A2AMock, VectorMock, and even overlooked services like search/rerank. The drift detection, record & replay, and chaos testing are game-changers for reliable, fast CI without token burn or flakiness. Open-source gold for 2026 AI devs—huge thanks to the CopilotKit team! 🚀
Drift detection is the killer feature here. Most AI mock servers just replay canned responses, but the real pain point is when prompts evolve and tests still pass against stale fixtures. Having the mock flag when real API behavior has drifted prevents an entire class of bugs that only surface in production. Would love to see cost tracking in mock mode too — estimating token usage during tests could catch expensive prompt patterns before they hit real APIs.
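Something like this, schematically (my own sketch, not an AIMock feature -- the ~4 chars/token heuristic is a rough approximation):

```typescript
// estimate token usage per mocked request and fail fast on expensive prompts
const CHARS_PER_TOKEN = 4 // rough heuristic; real tokenizers vary by model

function estimateTokens(messages: { content: string }[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0)
  return Math.ceil(chars / CHARS_PER_TOKEN)
}

const prompt = [{ content: 'Summarize this 40-page document...' }]
const tokens = estimateTokens(prompt)
if (tokens > 8_000) {
  throw new Error(`prompt is ~${tokens} tokens -- check for runaway context`)
}
```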
yep, the ability to run daily drift detection to catch provider changes is super practical :)
cost tracking is an interesting idea but feels like a separate use case. feel free to create a discussion in the repo if you are using aimock & really want this.
The testing problem you're solving is real — flaky CI caused by live API calls is one of the most underestimated sources of wasted engineering time. AIMock's approach of covering the full agentic stack (MCP, A2A, vector, moderation) in one fixture layer is exactly the kind of abstraction the ecosystem needs.
One layer that becomes critical once you move past local testing is governance at the MCP server level in production. When your agents are calling real GitHub, Slack, or database MCP servers autonomously, questions emerge fast: who called what, when, and can you prove it? How do you prevent sensitive payloads from reaching the LLM? Can you kill a misbehaving agent instantly?
Vinkius (vinkius.com) addresses this by running 2,000+ pre-governed MCP servers inside V8 Isolate sandboxes — each with SHA-256 cryptographic audit trails, compiled PII redaction, and a global kill switch. The SDK is Vurb.ts, which wraps MCP tool calls with governance primitives baked in rather than bolted on.
AIMock solves the dev/test phase; Vinkius solves the production governance phase. Used together, you'd have a complete lifecycle: deterministic tests locally, auditable execution in production. Solid work from the CopilotKit team — the drift detection feature alone would save a lot of on-call headaches.
The drift detection is the standout feature here. I've gone through 12 agent codebases and the number of times I've seen tests pass against mocks that no longer match the real API is depressing. Most teams find out when a user reports it, not when CI catches it.
The chaos testing part maps to real problems too. Most agents I've analyzed have zero error handling for mid-stream disconnects -- they just crash or silently drop the partial response. Being able to inject that in tests before it happens in prod is worth the whole tool.
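For reference, the missing error handling is usually just this pattern -- keep the partial output instead of crashing (a generic sketch over AsyncIterable, not any specific SDK's stream type):

```typescript
// consume a token stream defensively: a mid-stream disconnect surfaces the
// partial response instead of throwing it away
async function consumeStream(stream: AsyncIterable<string>): Promise<string> {
  let partial = ''
  try {
    for await (const chunk of stream) {
      partial += chunk
    }
  } catch (err) {
    // TCP drop / mid-stream abort: log it and return what we got
    console.warn(`stream aborted after ${partial.length} chars`, err)
  }
  return partial
}
```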
yep, drift detection, record & replay, chaos testing are the most useful things here that no other mocking tool offers... and it's all open source so if you want to contribute or extend it, go for it. hope the community loves this, deserves all the attention :)
we were spending like $40/day just on test runs hitting live endpoints. switched to recording responses and replaying them which helped but maintaining those fixtures became its own nightmare. curious how this handles streaming responses and partial chunks since that's where most mock setups fall apart for us.
Really well written, and whoever worked on the docs deserves credit; the drift detection page is very clear on why the problem exists, not just how to use the tool.
One question on A2AMock: when you register agents with onMessage, are handlers stateless per request, or is there a way to carry context across a chain so agent B can reference what agent A returned in the same task flow?
docs are really nice. as far as I'm aware handlers are stateless per request right now so agent B can't natively reference what agent A returned in the same flow. pls create an issue in the repo and I'm sure the team will look into it!
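in the meantime a workaround is to carry context yourself, outside the handlers -- something like this (the onMessage shape below is illustrative, not the actual API):

```typescript
// share per-task state across otherwise stateless handlers via a map keyed by task id
const taskContext = new Map<string, string>()

const agentA = {
  onMessage: async (msg: { taskId: string; text: string }) => {
    const result = `A processed: ${msg.text}`
    taskContext.set(msg.taskId, result) // record A's output for later agents
    return result
  },
}

const agentB = {
  onMessage: async (msg: { taskId: string; text: string }) => {
    const fromA = taskContext.get(msg.taskId) // B reads what A returned
    return `B saw "${fromA}" and handled: ${msg.text}`
  },
}
```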
Drift detection is the killer feature that makes this more than just another mocking library. The fundamental problem with AI mocks is that they rot silently — the provider changes a response shape, your mock still returns the old format, CI stays green, and you discover the mismatch in production. The three-way comparison (SDK types vs real API vs mock output) is the right architecture for catching that.

The chaos testing for mid-stream disconnects is also addressing a real gap. I've seen so many agent implementations that handle clean errors fine but completely fall apart on partial streaming responses where the TCP connection drops after 3 chunks. That failure mode is nearly impossible to reproduce manually but trivial with configurable disconnect rates.

The fact that this covers MCP, A2A, and vector DBs alongside LLMs in one server is what makes it practical — stitching together separate mock tools per protocol is exactly the kind of yak-shaving that kills test coverage.
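To make the drift idea concrete, a schematic shape-level check (my own sketch of the concept, not how AIMock implements it):

```typescript
// flag top-level keys that appeared or disappeared between a recorded
// fixture and a fresh live response
function shapeDrift(fixture: object, live: object): string[] {
  const recorded = new Set(Object.keys(fixture))
  const current = new Set(Object.keys(live))
  const drift: string[] = []
  for (const k of recorded) if (!current.has(k)) drift.push(`removed: ${k}`)
  for (const k of current) if (!recorded.has(k)) drift.push(`added: ${k}`)
  return drift
}

// e.g. the provider renames finish_reason -> stop_reason
console.log(shapeDrift({ finish_reason: 'stop' }, { stop_reason: 'stop' }))
// -> [ 'removed: finish_reason', 'added: stop_reason' ]
```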
AIMock looks like a practical solution for streamlining AI development—having one mock server for the entire stack can simplify testing, reduce dependencies, and speed up iteration. It’s especially useful for teams working with multiple AI services and APIs.
The drift detection feature is something I wish existed when I was building my test framework. I ended up with a different problem in the same space — not mocking AI responses, but validating that data stays consistent across layers (UI → API → DB). I used set theory on the DB state to mathematically prove that exactly one record changed per operation.
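Schematically, the check looked something like this (simplified):

```typescript
// snapshot row state before and after an operation, then prove via set
// difference that exactly one record changed
type Row = { id: number; value: string }

function changedIds(before: Row[], after: Row[]): Set<number> {
  const prev = new Map(before.map(r => [r.id, r.value]))
  const next = new Map(after.map(r => [r.id, r.value]))
  const ids = new Set<number>()
  for (const [id, v] of prev) if (next.get(id) !== v) ids.add(id) // updated or deleted
  for (const id of next.keys()) if (!prev.has(id)) ids.add(id) // inserted
  return ids
}

const before = [{ id: 1, value: 'x' }, { id: 2, value: 'y' }]
const after = [{ id: 1, value: 'x' }, { id: 2, value: 'z' }]
console.assert(changedIds(before, after).size === 1) // exactly one record changed
```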
Your point about "the other six services are live and quietly making your test suite a lie" really resonates. In my case it was simpler — just three layers — but the same principle applies: if you're only mocking one thing, you're testing your mocks, not your system.
Curious about one thing: does AIMock support asserting on the sequence of calls across services? Like verifying that the LLM was called before the vector DB, not after?
as far as I'm aware there's no dedicated way to assert call sequence across services right now. the journal does log all calls across LLM, A2A, vector etc. so you could read it after the test and check ordering yourself but that's more of a workaround. pls create a feature request in the repo!
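the workaround would look roughly like this (the journal entry shape here is an assumption, check the docs for the real format):

```typescript
// read the journal after the test and assert call ordering manually
type JournalEntry = { service: string; timestamp: number } // assumed shape

function assertCalledBefore(journal: JournalEntry[], first: string, second: string) {
  const i = journal.findIndex(e => e.service === first)
  const j = journal.findIndex(e => e.service === second)
  if (i === -1 || j === -1 || i >= j) {
    throw new Error(`expected ${first} to be called before ${second}`)
  }
}

const journal: JournalEntry[] = [
  { service: 'llm', timestamp: 1 },
  { service: 'vector', timestamp: 2 },
]
assertCalledBefore(journal, 'llm', 'vector') // llm ran before the vector DB
```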
We hit this same wall - mocking just the LLM wasn't enough once we added tool calls and vector lookups. The cascade of external deps is where pinpointing the flakiness breaks down.
yep same experience. the moment tool calls and vector lookups get involved you don't even know which service is breaking your tests anymore.
ran into this exact thing with deep agents: everything worked fine until the one unmocked piece, Tavily search, made it all fall apart. so having everything in one place without stitching 4-5 libraries together is nice.
yeah Tavily specifically is a pain. the latency variance alone makes it hard to write stable tests. we ended up just hitting real endpoints in a sandboxed env, way easier than trying to mock the behavior accurately
this feels like a much-needed tool for anyone building with multiple ai services. mock servers for apis are old news, but for ai, it's a game changer.
yeah, this will be insanely useful for production teams. I wonder why all of this didn't exist before haha, like the ecosystem is already huge but if we can't test it, it's a major problem.
Love this — AIMock feels like the missing piece for reliable AI testing. The fact that it mocks the entire agentic stack and adds drift detection, record & replay, and chaos testing makes CI actually trustworthy. I’ve burned tokens and time on flaky tests myself, so seeing a single, Docker‑friendly mock server that covers LLMs, MCP, A2A, vector DBs, and overlooked services like search/rerank is really exciting. Thanks for open‑sourcing this.
It was interesting to read, thank you!