Which AI coding assistants train on your code? A 2026 zero-retention comparison

#ai #privacy #security #devtools

On 24 April 2026, GitHub flipped a default. Copilot Free, Pro and Pro+ now use your prompts and accepted code suggestions to train models unless you go into settings and switch it off. Until that date the same data sharing was opt-in. Now it's opt-out, and most individual subscribers never saw the toggle move.

That change is a good reason to read what your coding assistant does with the code you feed it. I went through the published terms for seven assistants developers commonly use and pulled out the one thing that matters: does your code train a model, and can the vendor hold onto it? The short version is that "it depends on your plan" is not a cop-out. For most of these tools it's the correct answer, and the line usually runs right between the free tier you're using and the business tier you're not.

Here's where each one stands as of June 2026.

The quick comparison

Tool	Trains on your code by default?	Zero data retention?
Tabnine	No, on every plan	Yes. Ephemeral, nothing stored
Sourcegraph Cody	No	Yes. ZDR with providers
GitHub Copilot	Free/Pro/Pro+: yes (opt-out). Business/Enterprise: no	No individual ZDR product
Cursor	Free/Pro with Privacy Mode off: yes. Privacy Mode on / Business: no	Yes, when Privacy Mode is on
Codeium / Windsurf	Individual non-ZDR: code logs may be stored. Teams/Enterprise: no	Yes. Default on Teams/Enterprise, opt-in for individuals
Amazon Q Developer	Free: yes (opt-out). Pro: no	Enterprise posture only, no self-serve toggle
Replit AI	Public Apps: yes. Private Apps / paid endpoints: no	Enterprise routes to ZDR endpoints only

Two tools say no across the board. The other five gate it by tier. If you only remember one thing: the free tier is almost never the safe tier.
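
If you manage more than a couple of seats, the table can be turned into a small lookup. The following is a minimal sketch, not anything a vendor ships: the tier labels ("individual", "business", "public-app" and so on), the `Policy` class and both function names are hypothetical, and the yes/no values are transcribed from the table as of June 2026. Codeium / Windsurf is left out because its row is about stored code logs rather than a clean yes/no on training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    trains_by_default: bool
    fix: str  # what the article suggests doing about it, if anything


# Transcribed from the comparison table; tier labels are simplified and hypothetical.
POLICIES = {
    ("tabnine", "any"): Policy(False, "nothing to change"),
    ("cody", "any"): Policy(False, "nothing to change"),
    ("copilot", "individual"): Policy(True, "disable the training setting"),
    ("copilot", "business"): Policy(False, "nothing to change"),
    ("cursor", "individual"): Policy(True, "turn on Privacy Mode"),
    ("cursor", "business"): Policy(False, "nothing to change"),
    ("amazon-q", "free"): Policy(True, "opt out of content sharing in the IDE"),
    ("amazon-q", "pro"): Policy(False, "nothing to change"),
    ("replit", "public-app"): Policy(True, "move the work to a private App"),
    ("replit", "private-app"): Policy(False, "nothing to change"),
}


def lookup(tool: str, tier: str) -> Policy:
    """Return the policy for a tool and tier, falling back to an all-tiers entry."""
    for key in ((tool, tier), (tool, "any")):
        if key in POLICIES:
            return POLICIES[key]
    raise KeyError(f"no entry for {tool!r} on tier {tier!r}")


def seats_needing_action(seats):
    """Filter (tool, tier) pairs down to those that train on code by default."""
    return [(tool, tier) for tool, tier in seats if lookup(tool, tier).trains_by_default]


if __name__ == "__main__":
    team = [("copilot", "individual"), ("tabnine", "dev"), ("cursor", "business")]
    for tool, tier in seats_needing_action(team):
        print(f"{tool} ({tier}): {lookup(tool, tier).fix}")
```

The fallback to an "any" entry mirrors the two tools that say no on every plan; everything else has to be looked up by tier, which is the whole point of the table.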

The two that don't train, regardless of plan

Tabnine runs a no-train, no-retain policy on every plan, from the solo Dev seat up to fully air-gapped self-hosting. Code you send for a completion is held in memory as ephemeral context to generate the suggestion, then discarded as soon as the response comes back. There's nothing to opt out of because training on customer code never happens by default. Its base completion and chat models are trained only on permissively licensed open-source code, and if you want a model that knows your codebase you can pre-train a private one inside your own environment where only your team can reach it. The compliance paperwork backs the claim up: SOC 2 Type II, ISO 27001, GDPR.

Sourcegraph Cody also doesn't train on your code, but the mechanics are worth understanding because Cody isn't running its own model. It sends code snippets to Anthropic and OpenAI to generate responses, and the protection is that those calls happen under zero-retention agreements on both inputs and outputs. The same holds through Sourcegraph's own Cody Gateway, and the Fireworks.ai endpoint used for autocomplete doesn't store chat or autocomplete data either. Enterprises that want inference to never leave their cloud can bring their own LLM keys through Azure OpenAI or Amazon Bedrock. Cody is now positioned as the Sourcegraph Enterprise code-intelligence assistant, and it carries SOC 2 Type II, GDPR and CCPA compliance.

The distinction between these two matters if your threat model includes third parties: Tabnine can run with no data leaving your infrastructure at all, while Cody's default path still routes snippets to external model providers, just under contracts that forbid retention and training.

The five that hinge on your tier

GitHub Copilot is the one that changed. Since 24 April 2026, interaction data (your prompts, the code you accept, the surrounding file context) trains GitHub's models on Free, Pro and Pro+ unless you disable it. Copilot Business and Enterprise are exempt and were never folded into the policy change; prompts and suggestions on those tiers are never used for training. There's no zero-retention product aimed at individuals. If you're on an individual plan and want out, click your profile photo, open Copilot settings, and set "Allow GitHub to use my data for AI model training" to Disabled. A separate "block suggestions matching public code" filter exists on every tier and is worth turning on regardless.

Cursor routes the entire question through one switch called Privacy Mode. With it on, none of your code is ever trained on by Cursor or any third party, and zero data retention kicks in with the model providers. With it off, which is the default for Free and Pro, Cursor may store and train on your codebase data, prompts, editor actions and code snippets. Privacy Mode is forced on for Business and Teams and can be enforced org-wide by an admin, so company seats are covered automatically; individual users have to opt in themselves. One detail people miss: with Privacy Mode off, the underlying providers like OpenAI and Anthropic may retain prompts for around 30 days for trust and safety, so it's not just Cursor in the loop.

Codeium / Windsurf ties everything to zero-data-retention mode. Code submitted by ZDR users is never serialized, never stored in plaintext on Codeium's servers or subprocessors, and never trained on. ZDR is on by default for Teams and Enterprise. Individuals have to opt in from their profile page, and until they do, logs containing code snippets may be stored. So on Free or Pro you'll want to enable ZDR and disable telemetry under Settings. Enterprise admins get an explicit "train on customer code" toggle and US/EU data-residency selection that lower tiers don't expose, plus HIPAA BAAs for significant implementations. I'd rate the confidence here a notch lower than the others; the public terms are less precise about individual non-ZDR handling than I'd like.

Amazon Q Developer splits cleanly on Free versus Pro. On the Pro tier, AWS does not use your content for service improvement or to train foundation models at all; it's governed by the AWS service terms and GDPR DPA. On the Free tier, AWS may use your questions, the responses, and your generated code for service improvement, including model training, unless you opt out. The opt-out lives in your IDE: in VS Code, search settings for "Amazon Q: Share Content" and deselect it; JetBrains and Eclipse have an equivalent "Share Amazon Q content with AWS" checkbox. Organizations can also set an AI services opt-out policy in AWS Organizations to cover console and chat usage. There's no self-serve per-request zero-retention switch; data handling rides on the AWS agreement rather than a toggle.
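
For the Organizations route, the policy document itself is short. Here is a minimal sketch that builds one in Python, assuming the AI services opt-out policy syntax AWS documents for Organizations (`services`, `opt_out_policy`, `@@assign`). The function name is hypothetical, and using the `default` key to opt out of every covered service at once is my assumption about what most teams want; check the current AWS documentation for which services the policy covers before relying on it.

```python
import json


def build_ai_opt_out_policy() -> dict:
    """Build an org-wide opt-out document (assumed AWS Organizations syntax)."""
    return {
        "services": {
            # "default" applies the setting to all covered AI services (assumption:
            # you want a blanket opt-out rather than per-service entries).
            "default": {
                "opt_out_policy": {
                    "@@assign": "optOut",
                }
            }
        }
    }


if __name__ == "__main__":
    # Print JSON you can paste into the Organizations console when creating the policy.
    print(json.dumps(build_ai_opt_out_policy(), indent=2))
```

The policy only takes effect once it is attached to the organization root, an OU or an account, so generating the JSON is the easy half.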

Replit AI is the odd one out because its training line is public versus private, not free versus paid. Content you publish in a public App may be used by Replit to develop and train large language models, during and after your term. Private App content is excluded from AI training. For Replit's AI Integrations, paid model endpoints have training disabled, but free endpoints may train on and publish your prompts and completions. Enterprise enforces routing to zero-data-retention endpoints only, which narrows the model selection in exchange for the guarantee. A DPA with Standard Contractual Clauses is available, account data is deleted within 30 days of request, and Replit holds SOC 2 Type II.

What to actually do about it

If you're on a free or individual plan of anything other than Tabnine or Cody, assume your code is in scope for training until you've changed a setting. The concrete moves:

Copilot (individual): profile → Copilot settings → "Allow GitHub to use my data for AI model training" → Disabled.
Cursor (Free/Pro): turn on Privacy Mode. That single switch both stops training and triggers ZDR with providers.
Codeium/Windsurf (individual): enable zero-data-retention mode from your profile page and disable telemetry under Settings.
Amazon Q (Free): deselect "Amazon Q: Share Content" in your IDE, or set an Organizations opt-out policy.
Replit: keep work in private Apps, and if you use AI Integrations, stay on paid endpoints or bring your own API key.

If the data simply can't leave the building, the conversation is shorter: Tabnine offers VPC, on-prem and air-gapped deployments. If a contractual guarantee is enough, Cody Enterprise and Amazon Q Pro give you no-training postures under a DPA.

How I checked this, and where the numbers come from

Every fact above comes from the vendors' own published data-use pages, security pages, docs and terms of service, cross-checked and dated. I didn't paraphrase marketing copy. The tier breakdowns track what the actual privacy and security documentation says, including the awkward parts vendors don't lead with (Copilot's default flip, Replit's free-endpoint publishing, Codeium's non-ZDR logging).

I keep these write-ups current at AI Data Watch, a cited directory of AI training and retention verdicts. Each tool has its own page with the per-tier breakdown and source links.

Vendor terms change, and they depend on your plan, region and contract, so treat this as a starting point for your own due diligence rather than legal advice. If something's drifted since I last verified it (most of these were re-checked on 31 May–1 June 2026), the per-tool pages carry the latest verification date.