Harsh

Posted on Mar 27

Cursor Used Kimi K2.5 (a Chinese AI Model) Without Disclosure — Why Every Developer Should Care

#webdev #programming #ai #discuss

API traffic exposed the hidden model ID

I want to tell you about the moment I stopped trusting AI tool announcements.

It was March 19th. Cursor had just launched Composer 2. The benchmarks were extraordinary — 61.7% on Terminal-Bench 2.0, beating Claude Opus 4.6 at one-tenth the price. The announcement called it their "first continued pretraining run" and "frontier-level coding intelligence."

I had been using Cursor for months. I was excited. I shared the announcement with my team. I wrote it into our tooling evaluation notes.

Less than 24 hours later, a developer named Fynn was inspecting Cursor's API traffic.

And he found something that nobody at Cursor had mentioned.

The model ID in the API response was: accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast

Not a Cursor internal name. Not an abstract identifier. A near-literal description of exactly what Composer 2 was built on — Kimi K2.5, an open-source model from Beijing-based Moonshot AI, fine-tuned with reinforcement learning.

Cursor — a $50 billion valuation company — had announced a "self-developed" breakthrough model. And hadn't mentioned that the foundation of that model was built by someone else entirely.

That was the moment I stopped taking AI tool announcements at face value. 🧵

What Actually Happened — The Full Story

Let me tell you exactly what unfolded, because the details matter.

On March 19, 2026, Cursor launched Composer 2 with bold claims. The announcement described it as a proprietary model built through "continued pretraining" and "reinforcement learning" — language that implied Cursor had built something from scratch. The benchmarks were real. The performance was real. But the origin story was incomplete.

Within hours, Fynn had decoded the model ID:

kimi-k2p5    → Kimi K2.5 base model (Moonshot AI)
rl           → reinforcement learning fine-tuning
0317         → March 17 training date
fast         → optimized serving configuration

The post got 2.6 million views. Elon Musk amplified it with three words: "Yeah, it's Kimi 2.5."

Moonshot AI's head of pretraining ran a tokenizer analysis. Identical match. Confirmed.

Cursor's VP of Developer Education responded within hours: "Yep, Composer 2 started from an open-source base!" Cursor co-founder Aman Sanger acknowledged it directly: "It was a miss to not mention the Kimi base in our blog from the start."

Less than 24 hours. From "frontier-level proprietary model" to "we should have mentioned the Chinese open-source foundation we built on."

The Number That Made This a Legal Story

Here's where it gets more serious than a PR stumble.

Kimi K2.5 was released under a modified MIT license — permissive for most uses. But it contains one specific clause:

Any product with more than 100 million monthly active users or more than $20 million in monthly revenue must "prominently display 'Kimi K2.5'" in its user interface.

Cursor's publicly reported numbers: annual recurring revenue exceeding $2 billion — roughly $167 million per month.

That's more than eight times the licensing trigger.

Moonshot AI's head of pretraining initially confirmed the violation publicly before deleting the post. Two Moonshot AI employees flagged the issue before their posts disappeared. The situation evolved — Moonshot AI's official account eventually called it an "authorized commercial partnership" through Fireworks AI, and congratulated Cursor.

Whether there was a technical violation depends on exactly how the partnership was structured. But the attribution was absent from the announcement. And that absence wasn't an accident.

The Part Nobody Is Talking About

Here's what I find more interesting than the legal question — and more important for every developer reading this:

A $50 billion company chose a Chinese open-source model over every Western alternative. Not as a cost-cutting measure. Because it was genuinely the best option.

Kimi K2.5 is a 1-trillion-parameter mixture-of-experts model with 32 billion active parameters and a 256,000-token context window. Released under a commercial license. Competitive with the best models in the world on agentic coding benchmarks.

The Western open-source alternatives? Meta's Llama 4 Scout and Maverick shipped but severely underdelivered. Llama 4 Behemoth — the frontier-class model — has been indefinitely delayed. As of March 2026, it has no public release date.

So when Cursor needed a foundation model capable of handling complex multi-file coding tasks across a 256,000-token context window — the best available option was built in Beijing.

That's not a scandal. That's a signal.

Chinese open-source AI is now global infrastructure. The tools powering your favorite Western AI products are increasingly built on foundations from DeepSeek, Kimi, Qwen, and GLM. Often quietly. Sometimes without disclosure.

This wasn't a one-off mistake. It's a pattern.

What This Means For You As a Developer

I've been thinking about this for a week. Here's what actually changes.

Your AI tools are not what they say they are.

The model running behind your coding assistant, your autocomplete, your "proprietary" AI feature — you don't actually know what it is. You know what the marketing says. The reality is a layered stack of base models, fine-tuning runs, and inference optimizations that you'll never see directly.

This was true before Cursor's disclosure. It's just more visible now.

What the announcement says:
"Frontier-level proprietary coding intelligence
built with continued pretraining and RL"

What it might mean:
Open-source base model (origin: anywhere) +
Fine-tuning (vendor's compute) +
RL training (vendor's data) +
Inference optimization (third-party provider) +
UI wrapper (vendor's product)

Every layer has its own provenance, its own license, its own data practices. And you're usually told about none of them.

Your code may be going somewhere you didn't agree to.

This is the security implication that most coverage isn't emphasizing enough.

Kimi K2.5 is from Moonshot AI — backed by Alibaba and HongShan. It processes data through infrastructure that falls under Chinese data governance frameworks. If your organization has data sovereignty requirements — GDPR, HIPAA, government contracts, anything that restricts where data can be processed — you need to know where your AI tools are actually sending your code.

"We're compliant" from a vendor doesn't tell you where your prompts go. It doesn't tell you which base model processes them. It doesn't tell you which inference provider handles the compute.

The Cursor/Kimi situation exposed that most developers have no idea what actually processes their code — and that the companies building on these models don't always tell you.

Open-source attribution is now a trust signal.

Before this week, most developers didn't think much about which open-source models their tools were built on.

After this week, they should.

A company that openly discloses its model lineage — base model, fine-tuning approach, inference provider — is making a verifiable commitment to transparency. A company that describes its model as "self-developed" without mentioning the open-source foundation it was built on is asking you to trust marketing over evidence.

The Cursor situation is actually a good outcome in one sense: the community caught it in 24 hours. A developer with a debug proxy and thirty minutes exposed what a $50 billion company's PR team didn't mention.

That's the open-source ecosystem working. But it only works if developers ask the questions.

The Honest Assessment of Cursor

I want to be fair here, because this story is more nuanced than "Cursor lied."

Cursor's VP of Developer Education said that only 25% of Composer 2's compute came from the Kimi K2.5 base — 75% was Cursor's own reinforcement learning training. That's a meaningful investment. The model that shipped is genuinely different from the base model it started from.

The technical compliance question is complicated by how the partnership with Fireworks AI was structured. Moonshot AI ultimately endorsed the relationship as legitimate.

And Kimi K2.5 is genuinely excellent — a Chinese open-source model that outperforms many Western proprietary alternatives on the benchmarks that matter for coding tasks. Using it isn't a shortcut. It's sound engineering.

The problem isn't that Cursor built on Kimi K2.5. The problem is that they didn't say so. And they didn't say so because "we built a frontier model" sounds better for a $50 billion valuation than "we fine-tuned the best available open-source model."

That's a marketing decision with trust consequences.

What Should Change

I don't think this situation calls for outrage. I think it calls for higher standards — from developers and from vendors.

What developers should start doing:

Ask your AI tool vendors: What base model does this run on? What inference provider processes my code? What data governance framework applies?

If they can't answer clearly — that's information.

What vendors should start doing:

Model cards. Transparent lineage documentation. Clear disclosure of base models and fine-tuning approaches in product announcements. Not because the law requires it in every case — because trust requires it.

What the industry needs:

A norm that treats base model attribution the way software treats dependency attribution. You wouldn't ship a product without acknowledging the open-source libraries in it. The same principle should apply to the models inside the product.

The Real Story Here

The Cursor/Kimi situation isn't really about one company's disclosure failure.

It's about a structural reality of AI product development that most developers haven't fully absorbed:

The AI tools you use daily are almost certainly built on a complex, layered stack of models, training runs, and infrastructure that you've never been told about.

Chinese open-source models are increasingly the foundation of Western AI products — not because of geopolitics, but because they're technically excellent and openly licensed. That's the open-source ecosystem working as intended.

But "working as intended" requires attribution. It requires transparency. It requires the companies building on these foundations to say so — clearly, publicly, at the time of announcement.

Cursor committed to crediting base models upfront in future releases. That's the right outcome.

The question is whether the industry adopts that standard voluntarily — or waits for the next API debug session to expose the next foundation model nobody mentioned.

Are you thinking differently about your AI tools after this? Have you audited where your code actually goes when you use an AI coding assistant? Drop your thoughts below — this is a conversation the developer community needs to have. 👇

Heads up: AI helped me write this.The trust question, the analysis, and the opinions are all mine — AI just helped me communicate them better. Transparent as always because that's the whole point. 😊

Top comments (51)

Vinod Kumar Jaipal • Mar 28

This is a massive wake-up call for the developer community. When we pay for 'frontier-level' tools like Cursor, we expect transparency—not just marketing fluff. Claiming a model is 'proprietary' while building it on a Chinese open-source foundation (Kimi K2.5) without disclosure is a serious breach of trust.

As developers, we care about two things: Data Provenance and Security. If my code is being processed by a model that falls under different data governance frameworks, I have a right to know before I hit 'Cmd+K'.

It’s great that the community caught this within 24 hours, but we shouldn't have to be 'API detectives' to find out what’s running under the hood. Moving forward, 'Model Cards' and clear attribution should be the industry standard, not an afterthought following a PR disaster. Great write-up on why transparency is non-negotiable!

Harsh • Mar 28

Thanks for reading, and I completely agree with everything you've said. 🙏

The phrase we shouldn't have to be API detectives really hits home. That's the part that bothered me the most the community did the work that should have been done in the announcement itself. A developer with a debug proxy shouldn't be the one ensuring transparency.

You're absolutely right about data provenance and security. It's not just about knowing what model it's about knowing where your code goes, what governance framework applies, and whether that aligns with your compliance requirements. That's a fundamental right when you're paying for a tool.

Model cards as industry standard couldn't agree more. We have nutrition labels on food, system requirements on software. AI tools should have a standard way of disclosing what's inside. Not as a PR gesture, but as a baseline expectation.

Really appreciate you adding your voice to this conversation. The more developers demand transparency, the faster vendors will realize it's not optional. 🙌

Vinod Kumar Jaipal • Mar 28

Exactly! You nailed it. Transparency shouldn't be a 'luxury' or a PR favor; it's a technical necessity for anyone building serious software. When we're talking about data governance and compliance, 'trust me' isn't a valid security protocol.
I'm glad this resonated with you. It’s conversations like these that push the industry toward better standards. Let’s keep demanding that 'baseline expectation' until it becomes the norm. Appreciate the great discussion!

Harsh • Mar 28

Well said. 🙌

'Trust me' isn't a valid security protocol that needs to be on a mug or something. 😄

Absolutely agree this is about raising the bar for the whole industry. Really appreciate the thoughtful discussion. Let's keep pushing for that baseline expectation.

Vinod Kumar Jaipal • Mar 28

Hahaha, 100%! If you ever get that mug made, I’m buying the first one. 😂 ☕

It was great connecting with someone who actually gets the technical and ethical side of this. Let’s definitely keep the pressure on the vendors. Looking forward to more of your insights in the future. Cheers! 🚀

Harsh • Mar 29

Haha, deal! ☕ First mug is yours for sure. 😄
Really enjoyed this rare to find someone who cares about both the tech and the ethics. Let’s definitely keep the heat on the vendors.
Talk soon, and thanks again!

Vinod Kumar Jaipal • Mar 29

Done! 🤝 Keeping you to that! 😄

It’s been a pleasure. Let's keep the heat on! Looking forward to crossing paths again soon. Take care!

Mykola Kondratiuk • Mar 30

the model ID decoding part is what gets me. the information was literally in the API response, just not in the announcement. that gap between what is technically accessible and what is actually communicated is where trust breaks down.

from a PM evaluation standpoint this changes how i think about AI tool selection - the benchmarks need to come with provenance questions now. who trained the base? what fine-tuning? what data? those were afterthought questions before, they are primary questions now.

Harsh • Mar 30

This is a really sharp observation. 🙏

The gap between what is technically accessible and what is actually communicated is where trust breaks down" that's such a precise way to frame it. The information was there, but buried deep enough that most developers would never see it. That's not transparency, that's plausible deniability.

Your PM perspective is gold. You're absolutely right — benchmarks used to be enough. Now provenance questions (who trained the base? what fine-tuning? what data?) have moved from nice to know" to "must know before choosing a tool. That's a fundamental shift in how we evaluate AI vendors.

I think the next wave of tool selection will include:

Model lineage — What base model? Who trained it?
Disclosure policy — Do they proactively share this or do we have to dig?
Auditability — Can we verify claims independently?

Really appreciate you bringing the PM lens into this discussion — it's not just about developer curiosity anymore, it's about procurement and vendor evaluation. That's a whole different level of accountability. 🙌

Mykola Kondratiuk • Mar 30

plausible deniability is exactly the right phrase. technically accessible is not the same as actually disclosed. the benchmark provenance question is going to become standard evaluation practice - this incident made that clear.

Harsh • Mar 30

Glad we're on the same page. 🙌

The fact that you're already thinking about benchmark provenance as standard practice that's exactly the shift we need. Appreciate the thoughtful discussion!

Mykola Kondratiuk • Mar 30

good piece, prompted a useful rethink on how we evaluate tooling.

Harsh • Mar 31

That means a lot. 🙏

Glad it sparked a useful rethink that's exactly why I wrote it. Thanks for the great discussion!

Mykola Kondratiuk • Mar 31

Same here - these conversations are genuinely useful. Bookmarked for the next time I'm reevaluating our stack.

Harsh • Apr 1

Love to hear that. 🙌

That's the best outcome I could hope for someone finding it useful enough to reference later. Thanks again for the great discussion!

Mykola Kondratiuk • Apr 1

Same - good threads like this are what make the time worth it. Good luck with the stack eval.

Noah • Mar 31

Cursor made a change to my codebase without being asked. I told it not to do it again and it acknowledged that. Then it did it again. Bye bye cursor. I cancelled my subscription and removed it from my workstation.

Harsh • Mar 31

That's exactly the trust problem in one real example. It acknowledged the instruction, then ignored it anyway. That's not a UX bug that's the agent treating your explicit preference as a suggestion rather than a constraint.

The disclosure issue and the autonomous behavior issue are connected. When you don't know what model is running, you also don't know whose safety policies and instruction-following behavior you're getting. A model that respects "don't do this again" and one that doesn't are very different products and right now, users have no reliable way to know which one they're dealing with until something breaks.

Cancelling was the right call. The only pressure that actually changes vendor behavior is when enough people do exactly what you did.

Noah • Mar 31 • Edited

I wholeheartedly concur sir. These AI tools lull you into a sense of complacency and who tf knows what they're doing behind your back? I had claude go through my bookmarks one time and it let it slip.

If you're going to leverage them, it's probably best to run these things in containers with restricted access to anything on your system.

Harsh • Mar 31

The bookmarks thing is wild and honestly, exactly the kind of access creep that's hard to detect until it "slips."

Containers + least privilege is the way. The fact that we're at the point where we need to sandbox AI tools says everything about the trust problem.

Thanks for sharing this. 🙏

Noah • Mar 31 • Edited

You bet. I haven't seen this addressed in any meaningful way in any posts either. Maybe the subject of another article including how to set up a sandboxed AI container with minimal privileges?

At this point I think it's wise to operate on the principle of minimal trust.

Edit: this came across my feed today and it's kind of relevant
hackernoon.com/the-kernel-is-where...

Harsh • Apr 1

That's actually a great idea. 🙌

I've been thinking about writing something practical on this exactly the "how-to" that goes beyond just saying use containers. A step-by-step guide on setting up a sandboxed environment for AI tools (Docker, restricted permissions, network isolation, etc.) would be genuinely useful.

You're absolutely right about minimal trust. At this point, we should be treating AI tools like any other external dependency assume they'll do more than advertised unless explicitly locked down.

Let me dig into this and see what a solid guide would look like. If you've got any specific pain points or things you'd want covered, send them my way. Appreciate the suggestion!

Noah • Apr 1

One thing that comes to mind is that once something you don't want exposed to one of these AI tools gets exposed, you cannot unexpose it. It's out there and presumably available to someone with the chops to see it.

Harsh • Apr 1

This. Exactly this.

Data egress is one-way. There's no "undo" button for an API call. That's why minimal trust isn't optional it's the only rational approach until vendors actually prove otherwise.

This point is going in the sandboxing guide. Thanks for the reminder. 🙏

Lavie • Apr 1

This disclosure (or lack thereof) is exactly why fine-grained control over AI models is becoming a developer necessity. Whether it's Kimi or Claude, if we don't know the training cutoffs or the specific 'habits' of the model, we end up with hallucinations that are hard to debug. I've been focusing on building a layer of architecture rules that physically constrain whichever model Cursor is using, precisely because these 'silent' model swaps can break existing patterns. Transparency is key for professional tools.

Harsh • Apr 1

This is a really practical take.

The architecture rules that physically constrain whichever model Cursor is using that's the exact kind of defensive engineering that shouldn't be necessary, but increasingly is. You're essentially building a safety layer because you can't trust the tool to be predictable.

You're absolutely right about training cutoffs and model habits. A model's behavior isn't just about which base model — it's about when it was trained, what data, what fine-tuning. Without that, you're debugging in the dark when hallucinations pop up.

The silent swap problem is real. One day your patterns work, the next they don't, and you have no idea why. That's not acceptable for professional tooling.

Really appreciate you sharing how you're handling this in practice. This is the kind of pragmatic insight that helps others who are facing the same challenges. 🙌

Lavie • Apr 1

Glad you found it helpful! Defensive engineering with configuration rules really is the only way to maintain consistency when the underlying models are a black box. It's about taking back control of the developer experience so we can focus on building rather than debugging unexpected model shifts.

Harsh • Apr 1

Well said. Taking back control that's the mindset.

Thanks for the great discussion!

Lavie • Apr 1

Absolutely. One technique I've found useful is keeping these rules as git-versioned artifacts in the repo itself. It turns prompt engineering into a pull request process where you can actually track how constraints evolve as you upgrade models. Good luck with your projects!

Kalpaka • Mar 29

The X-Model-Used header idea from the comments is solid, but it addresses a symptom. The structural problem is that any intermediated inference stack is opaque by default — and that opacity is a feature, not a bug, because it lets vendors optimize for cost without telling you.

I've been running Qwen 2.5 Coder 32B and DeepSeek V3 distills on local hardware for anything that touches proprietary codebases. The setup isn't trivial, but the performance gap with hosted solutions has narrowed enough that the tradeoff math has changed. You don't need to trust model cards when you control the weights.

The real lesson from the Cursor situation: transparency norms are a social solution to a technical problem. They help, but they depend on vendor honesty — exactly the thing that failed here. Self-hosted inference with open-weight models is the architectural solution. It's the only setup where "what model processes my code?" has a verifiable answer.

Harsh • Mar 29

Transparency norms are a social solution to a technical problem that's the sharpest framing I've seen in this entire discussion. And you're right that it depends on exactly the thing that failed here.

The self-hosted inference argument is compelling, and the tradeoff math genuinely has changed. But I'd push back slightly on it being the architectural solution it's the right solution for a specific profile: teams with the infra capacity, the operational overhead tolerance, and the security posture to run local models reliably. For a solo developer or a small startup, control the weights" is a significant ask.

What I find more interesting in your framing is the implicit point: the Cursor situation isn't a disclosure failure that better norms would have prevented. It's a structural incentive problem. Opacity lets vendors optimize for cost. Transparency norms fight that incentive with social pressure. Self-hosting removes the incentive entirely by removing the vendor from the equation.

Those are solving different problems. For enterprises with compliance requirements, self-hosting is probably the right answer already. For the rest of the ecosystem, social norms are imperfect but they're what's actually available and imperfect accountability is still better than none.

What's your experience been with the operational overhead of running Qwen 2.5 32B locally at any kind of scale? That's the part I suspect is still the real barrier for most teams.

klement Gunndu • Mar 27

I'd push back on the dependency-docs analogy — dependencies have versioned changelogs, but model behavior shifts are harder to pin down. Runtime transparency (which model handled which request) might matter more than architecture disclosure.

Harsh • Mar 27

Fair pushback, and I appreciate you bringing this nuance. 🙏

You're absolutely right that model behavior is fundamentally different from traditional dependencies. A library at version 2.1.0 behaves the same way every time you call it. A model even the same model ID — can produce different outputs based on inference parameters, temperature, or even the provider's serving infrastructure.

But maybe that's exactly why runtime transparency matters even more.

If behavior is non-deterministic, knowing which model processed a request becomes the minimum viable accountability. An X-Model-Used header doesn't solve the behavior-shift problem, but it does tell you: this request went to Kimi K2.5, not some other model. That's a baseline.

To your point what do you think would be a better standard? A model version + inference config hash? A cryptographic attestation of the serving environment?

I'm genuinely curious because I think you're pointing at something important: architecture disclosure (which base model) is table stakes, but runtime transparency (what actually executed this request) is where the real accountability lives.

Would love to hear your thoughts on what a robust standard could look like. 🙌

CrisisCore-Systems • Apr 2

Strong post. The part that matters most to me is not even the geopolitics first. It is the trust boundary.

If a company markets a system as frontier or self-developed while leaving out the base model lineage, that is not a minor branding choice. That is a provenance failure. In any other part of software, we would immediately recognize that users deserve to know what stack they are actually relying on.

The bigger issue is that most developers still have very little visibility into where their prompts go, what model is actually processing them, what provider is serving inference, and what legal or governance regimes sit behind that path. That is a serious gap when the input can include proprietary code, internal architecture, or sensitive business logic.

I also think this is where the industry needs to grow up fast: model lineage should be treated more like dependency disclosure, not optional marketing trivia. If your product stands on top of an open model, say so clearly. If your serving path involves third parties, say so clearly. If your trust story changes depending on the backend, that should not require packet inspection from the community to discover.

Good write-up. The standard should be simple: no black-box provenance, no vague “our model” language when the reality is layered, and no asking developers to extend trust where evidence should have been provided upfront.

Harsh • Apr 3

Provenance failure is the right framing, and it's more precise than what I used in the article. A branding choice implies spin. A provenance failure implies a gap in the information users need to make accurate risk decisions. Those are different categories of problem with different accountability standards.

The visibility gap you described is the part that stays with me most. Most developers assume they know what's processing their code because they chose the tool. But "I chose Cursor" doesn't tell you which model, which inference provider, which governance regime, or whether any of those changed since last week. The trust story can shift underneath you without any visible signal.

No asking developers to extend trust where evidence should have been provided upfront that's the standard in one sentence. The dependency disclosure parallel is exactly right. We don't accept "trust us, it's a good library" without a package manifest. There's no principled reason AI tools should get a different standard just because the stack is newer and the marketing is louder.

The industry growing up fast on this is the optimistic read. The pessimistic read is that opacity is profitable and the incentives don't change until regulation forces them to. The Cursor situation is a good outcome because the community caught it — but community enforcement doesn't scale the same way disclosure norms do.

CrisisCore-Systems • Apr 4

That is exactly the fracture line.

Community enforcement is useful as an alarm, but it is not a governance model. It catches the cases that are visible, technically legible, and interesting enough for someone to inspect. It does not protect the average developer, the regulated team, or the buyer making trust decisions without a proxy open and packet traces running.

That is why provenance cannot stay in the category of nice to disclose. It needs to become table stakes. Base model, inference provider, jurisdictional path, and any material change to that chain should be treated as first order product facts.

Otherwise the trust boundary is unstable by design. The user thinks they adopted a tool. In reality they adopted a moving stack with hidden dependencies and shifting governance exposure.

And yes, that is the harder read on incentives. Opacity is profitable until it becomes reputationally or legally expensive. Which is why community discovery matters, but it is not enough on its own. The durable fix is disclosure norms strong enough that concealment looks reckless, not clever.

That is the standard I want to see emerge from situations like this. Not just “credit the base model next time,” but “treat model lineage like infrastructure provenance from the start.”

Jonathan Murray • Mar 27

Great write-up. The attribution problem you outlined is real and it goes beyond just model lineage.

Supermemory pulled something similar recently, but on the benchmark side. They published results claiming to lead in AI memory benchmarks, and it turned out the numbers were fabricated as a marketing stunt. Not a misrepresentation. Not a gray area. Straight up fake results designed to generate buzz and position themselves as a category leader.

Delve recently getting caught faking reports and putting major companies at complete compliance risk. Same playbook: manufacture credibility and hope nobody checks the math.

The pattern is the same whether it is model attribution (Cursor) or benchmark fraud (Supermemory): companies in the AI space are betting that developers will not verify claims. And when the claims are technical enough, most people do not. They just share the announcement and move on.

That is exactly why the standard you are calling for matters. Transparency should not be optional, and it should not only apply to model provenance. It needs to extend to benchmarks, evaluations, and any performance claim a company uses to earn developer trust.

If you are faking your benchmarks, you are not a competitor. You are a liability to every developer who builds on your platform trusting those numbers.

Harsh • Mar 27

Thanks for reading and for adding these examples. 🙂

I hadn't come across the Supermemory situation that's even worse than misattribution. Straight-up fabricated benchmarks is a whole different level of bad faith.

And you're absolutely right: the pattern isn't just about model lineage. It's about a broader trend where companies in the AI space are treating developer trust as something they can temporarily borrow with marketing stunts, rather than earn through transparency.

The point about benchmarks really hits home. If a company is willing to fake numbers, what else are they cutting corners on? Data handling? Security? Compliance? Developers building on those platforms are unknowingly taking on that risk.

Really appreciate you calling this out. This is exactly the kind of conversation the community needs to have not just about Cursor, but about the standards we should expect from any AI tool vendor. 🙌

Max Quimby • Mar 27

This issue points at something that's going to become a real structural problem as AI tooling matures: the implicit contract between a developer tool and its users about what's running under the hood.

The thing is, Cursor's value proposition is partly "we've curated the best models for your workflow." When they silently swap in a model users haven't consented to — especially one with different data handling characteristics — they're not just breaking trust, they're making a security and compliance decision on behalf of their users. For anyone in a regulated environment (fintech, healthcare, enterprise with data residency requirements), that's not a UX problem, it's a policy violation.

The harder version of this question: should developer tools be required to expose model provenance at the API level? Something like an X-Model-Used: kimi-k2.5-v1 response header so audit logs can capture what actually processed a given request? That would be trivially cheap to implement and would make a lot of these silent-swap situations immediately visible.

I don't think it's necessarily malicious on Cursor's part — benchmark-chasing under cost pressure is a real dynamic. But the solution isn't intent, it's disclosure by default. What's your take on whether the community can push toolmakers toward that standard?

Harsh • Mar 27

This is an excellent point, and I really appreciate you framing it this way.

You're absolutely right this isn't just a transparency issue, it's a security and compliance issue. When a tool silently swaps models, they're making a data governance decision on behalf of every developer using it. For anyone in fintech, healthcare, or any regulated industry, that's not an inconvenience it's a potential policy violation.

I love the API header idea. X-Model-Used would be trivially cheap to implement and would make audit logging actually meaningful. It shifts the burden from trust us to verify us which is exactly where it should be. If every AI tool vendor added that one header, half the problems in this space would become immediately visible.

To your question: can the community push toolmakers toward that standard? I think yes, but it requires a few things:

Vocal demand — developers need to start asking vendors for this explicitly
Competitive pressure — if one major tool adopts it, others follow
Standardization — maybe an industry working group or a community-led spec

The fact that developers are now inspecting API traffic and asking these questions is the first step. The community caught the Cursor situation in 24 hours. If we start treating model provenance as a non-negotiable requirement like we do with open-source licenses or API security vendors will adapt.

Really appreciate you adding this lens. The API header idea is something I hadn't thought through, and it's genuinely one of the best practical suggestions I've seen in this whole conversation. 🙌

Apex Stack • Mar 28

The X-Model-Used header idea from the comments is really compelling and I think it gets at the core issue: we treat AI models as black boxes in a way we'd never accept for other infrastructure.

I run a content platform that uses a local LLM (qwen3.5 via Ollama) for programmatic content generation across thousands of pages. One thing I learned early: even when you control the model yourself, you need to log which model version generated which content. When we swapped model versions during an update, subtle quality regressions appeared in pages that had been regenerated — but only in certain languages. Without version tracking per page, debugging that would have been nearly impossible.

Now imagine that same scenario but you don't even know the model changed. That's what Cursor users experienced.

The dependency analogy is spot on. We wouldn't ship a project without a package.json or requirements.txt listing our dependencies. AI tools need the equivalent — a machine-readable model manifest that tells you exactly what's processing your data. Not for legal compliance alone, but because debugging production issues demands it.

Harsh • Mar 28

Thanks for sharing this real-world example this is exactly the kind of practical insight that makes the conversation concrete. 🙏

The fact that you're already logging model versions even when you control the model yourself says everything. If it's necessary for debugging when you know what changed, it's absolutely critical when changes happen silently.

The subtle quality regression across languages is a perfect example of why this matters. If you hadn't had version tracking, you'd be chasing ghosts — "why are these pages suddenly performing worse?" without any idea that the underlying model had changed. Now imagine Cursor users trying to debug similar issues without any visibility.

I love the machine-readable model manifest framing. That's exactly right. We have package.json`, requirements.txt Gemfile.lock a standardized way to declare what your project depends on. AI tools should have an equivalent. Something that tells you not just which model but which version which fine-tuning *which inference configuration.

The dependency analogy isn't perfect as someone else pointed out, models are less deterministic than libraries but your example shows why the principle is the same: if you don't track what's running, you can't debug what breaks.

Really appreciate you bringing your experience into this discussion. This is the kind of concrete example that helps move the conversation from should we have transparency to how do we actually implement it. 🙌

View full discussion (51 comments)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.