GitHub just made one of its most consequential policy changes since Copilot launched. On March 25, 2026 (today), Microsoft-owned GitHub announced that starting April 24, 2026, all interaction data from Copilot Free, Pro, and Pro+ users will be used to train AI models. Code snippets you write, file names in your repos, navigation patterns across your codebase, even the comments and documentation you author: all of it feeds the training pipeline unless you explicitly opt out.
This isn't a future possibility. It's a 30-day countdown. If you're one of the millions of developers using GitHub Copilot on a personal account and you do nothing before April 24, your coding interactions become training data for Microsoft's AI models. Business and Enterprise tier customers are exempt; their contracts prohibit it. But if you're on Free, Pro, or Pro+, you're enrolled by default, and the clock is ticking.
Let's break down exactly what's happening, what data is at stake, how to opt out in under 60 seconds, and whether this changes the calculus for Copilot versus its competitors.
What GitHub Actually Announced
GitHub's Chief Product Officer Mario Rodriguez published a blog post today framing the change as an improvement to model quality. The core claim: training on real-world developer interaction data will produce "more accurate and secure code pattern suggestions" and better bug detection. GitHub says their experiments with Microsoft employee interaction data showed "meaningful improvements, including increased acceptance rates in multiple languages."
The pitch is straightforward: your data makes the models better for everyone. The problem is the delivery mechanism: it's opt-out, not opt-in. And for paying Pro and Pro+ customers already sending GitHub $10–$39/month, discovering that their coding data would silently become training material feels like a breach of trust, not a feature upgrade.
⚠️ 30-day deadline. The new policy takes effect April 24, 2026. If you haven't opted out by then, your Copilot interaction data — code snippets, file names, repo structure, navigation patterns — will be used for AI model training. Opt out now →
Exactly What Data Is (and Isn't) Collected
GitHub was relatively transparent about the scope. Here's the complete breakdown from the official announcement:
Data That WILL Be Used for Training
If you don't opt out, GitHub will collect and use:
- Code outputs you accept or modify — every suggestion you tab-complete or edit becomes training data
- Inputs sent to Copilot — including code snippets shown to the model for context
- Code context surrounding your cursor position — the model sees what's around your cursor, and so does the training pipeline
- Comments and documentation you write — your inline comments, docstrings, and documentation
- File names, repository structure, and navigation patterns — how your project is organized and how you move through it
- Copilot feature interactions — chat conversations, inline suggestions, code review interactions
- Feedback on suggestions — thumbs up/down ratings
Data That Won't Be Used
- Copilot Business or Enterprise interaction data — contractually prohibited
- Enterprise-owned repository content — excluded regardless of user tier
- Data from users who opt out — your preference is respected (GitHub says)
- Private repository content "at rest" — GitHub draws a careful distinction here
ℹ️ The "at rest" distinction matters. GitHub explicitly says they don't use "content from your issues, discussions, or private repositories at rest." But they add: "Copilot does process code from private repositories when you are actively using Copilot. This interaction data is required to run the service and could be used for model training unless you opt out." Translation: your private code isn't scraped from repos, but the moment Copilot sees it during a session, it's fair game.
Where the Data Goes
The collected data may be shared with "GitHub affiliates, which are companies in our corporate family including Microsoft." It will not be shared with third-party AI model providers or independent service providers. So OpenAI, Anthropic, and Google won't see your Copilot interaction data directly — but every team inside Microsoft's AI division potentially could.
How to Opt Out (60-Second Guide)
The opt-out process is simple, but GitHub didn't exactly make it easy to find. The blog post links to github.com/settings/copilot but doesn't name the specific toggle. Here's exactly where to go:
Step 1: Go to https://github.com/settings/copilot
Step 2: Scroll down to the "Privacy" section (or go directly to https://github.com/settings/copilot/features)
Step 3: Find "Allow GitHub to use my data for AI model training"
Step 4: Set it to Disabled
That's it. One toggle. But there's an important nuance: if you previously opted out of data collection for product improvements, GitHub says your preference has been retained, so you shouldn't need to opt out again. Verify anyway: multiple users on Hacker News reported finding the toggle enabled despite believing they had previously disabled it.
ℹ️ Go to github.com/settings/copilot/features → Look for "Allow GitHub to use my Copilot interaction data for AI model training" → Set to Disabled. The toggle is under the "Features" tab in your Copilot settings.
🚨 Verify your setting, don't assume. Several developers in the HN discussion reported that the training toggle was enabled on their accounts despite having previously opted out of data collection. One user wrote: "I just checked my Github settings, and found that sharing my data was 'enabled'. This setting does not represent my wishes and I definitely would not have set it that way on purpose." Whether this is a bug or a dark pattern, the safest move is to check right now.
The Business/Enterprise Exemption — and Its Gray Areas
GitHub is clear that Copilot Business and Enterprise customers are contractually exempt. From their FAQ:
"Our agreements with Business and Enterprise customers prohibit using their Copilot interaction data for model training, and we honor those commitments."
But the real world is messier than clean tier boundaries. Consider this scenario, raised by multiple commenters on HN: a developer has a personal GitHub account with Copilot Free or Pro. Their employer uses GitHub Enterprise. The developer contributes to private enterprise repositories using their personal Copilot subscription. Does the enterprise exemption apply?
GitHub employee Martin Woodward clarified on HN: "We do not train on the contents from any paid organization's repos, regardless of whether a user is working in that repo with a Copilot Free, Pro, or Pro+ subscription. If a user's GitHub account is a member of or outside collaborator with a paid organization, we exclude their interaction data from model training."
That's a strong statement — if it's enforced consistently. But the fact that this clarification required a GitHub employee to step into an HN thread, rather than being spelled out in the official blog post, tells you something about how well this rollout was communicated.
ℹ️ For enterprise security teams: If your developers use personal Copilot accounts on company code, GitHub says they exclude interaction data from members of paid organizations. But this relies on account-level detection, not repository-level enforcement. Consider whether your organization's security posture should depend on this distinction.
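For teams that want to operationalize that audit, GitHub's REST API does expose the two relevant lists: `GET /orgs/{org}/members` and `GET /orgs/{org}/copilot/billing/seats` (the org-billed Copilot seat assignments). The helper below is a hypothetical sketch of the comparison step only, with sample data standing in for the API responses:

```python
# Sketch of an audit helper: given your org's member logins
# (from GET /orgs/{org}/members) and the org-billed Copilot seat holders
# (from GET /orgs/{org}/copilot/billing/seats), flag members with no
# org-managed seat. If those developers use Copilot on company code at all,
# it is likely through a personal Free/Pro account subject to the new default.

def members_without_org_seat(members, seat_holders):
    """Return org members holding no org-billed Copilot seat, sorted by login."""
    return sorted(set(members) - set(seat_holders))

# Sample data standing in for the two API responses:
members = ["alice", "bob", "carol", "dave"]
org_seats = ["alice", "dave"]

print(members_without_org_seat(members, org_seats))  # → ['bob', 'carol']
```

In practice you would feed this real data, for example via the `gh` CLI (`gh api /orgs/YOUR-ORG/members --paginate --jq '.[].login'` and the equivalent for the seats endpoint), where `YOUR-ORG` is a placeholder for your organization name.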
The Dark Pattern Question
Let's talk about how GitHub framed this toggle. Multiple developers on Hacker News noted the way the setting is presented:
- Enabled = "You will have access to the feature"
- Disabled = "You won't have access to the feature"
As one commenter put it: "As if handing over your data for free is a perk. Kinda hilarious."
The framing isn't accidental. Describing data training as a "feature" you "have access to" is textbook FOMO design — it implies you're losing something by opting out. GitHub could have labeled this "Allow GitHub to use my interaction data for AI model training: Yes/No." Instead, they made opting out feel like giving something up.
The notification email GitHub sent was similarly opaque. As another HN user noted: "They didn't even link the setting in their email. They didn't even name it specifically, just vaguely gestured toward it."
This doesn't make GitHub evil. It makes them a large corporation optimizing for data collection while maintaining plausible deniability about user friction. The opt-out exists. The default does the work.
How Copilot's Privacy Compares to Every Major Competitor
This policy change doesn't exist in a vacuum. Every AI coding assistant handles data differently, and the differences matter — especially if you're writing proprietary code or working under NDA. Here's how the major players stack up:
| Feature | GitHub Copilot (Free/Pro/Pro+) | Cursor | Windsurf (Codeium) | Cline | Tabnine |
|---|---|---|---|---|---|
| Trains on your code by default? | ✅ Yes (as of Apr 24) | ✅ Yes (Privacy Mode OFF by default on Free/Pro) | ⚠️ Possible (privacy policy allows it) | ❌ No (open source, local API keys) | ❌ No ("Your code never trains our models") |
| Opt-out available? | ✅ Yes, single toggle | ✅ Yes, Privacy Mode toggle | ✅ Yes, Zero Data Retention mode | N/A (no data sent to vendor) | N/A (never trains on code) |
| Zero data retention option? | ❌ No (only opt-out of training) | ✅ Yes (with Privacy Mode) | ✅ Yes (Zero Data Retention mode) | ✅ Inherent (bring-your-own API keys) | ✅ Yes (default behavior) |
| Enterprise/Business exempt? | ✅ Yes, contractually | ✅ Yes (Business plan) | ✅ Yes (Enterprise plan) | N/A | ✅ Yes |
| Self-hosted option? | ❌ No | ❌ No | ✅ Yes (on-prem available) | ✅ Yes (runs locally) | ✅ Yes (on-prem, air-gapped) |
| Data shared with parent company? | ✅ Yes (Microsoft affiliates) | Unclear | ⚠️ Oracle Cloud for inference | ❌ No | ❌ No |
| Open source? | ❌ No | ❌ No | ❌ No | ✅ Yes (Apache 2.0) | ❌ No |
| GDPR/SOC2 compliance? | ✅ Yes | ✅ Yes | ✅ Yes (SOC2 Type II) | N/A (self-hosted) | ✅ Yes (SOC2 Type II) |
The takeaway is stark: Copilot is now the least private mainstream AI coding assistant for individual users. Cursor at least enables Privacy Mode by default on its Business plan and offers true zero data retention. Tabnine has built its entire brand around "your code never trains our models." Cline sidesteps the problem entirely by running through your own API keys; beyond the LLM provider handling the inference call, no vendor ever sees your code.
Windsurf is an interesting middle ground. Their privacy policy technically allows training on user data, and Reddit users have raised concerns about the gap between marketing ("zero data retention!") and what the fine print permits. But even Windsurf offers a zero-retention mode that Copilot doesn't match.
ℹ️ Privacy isn't the only factor. Copilot still has the deepest IDE integration, the largest model ecosystem (Claude, GPT, Gemini, and Copilot's own models), and the tightest GitHub platform integration. But if privacy is your top priority, alternatives have meaningfully better defaults.
Community Reaction: "What Did Everyone Expect?"
The announcement hit Hacker News within hours and rapidly climbed to the front page, accumulating 150+ points and 72+ comments. The discussion thread reads like a focus group for developer distrust of big tech platforms.
The dominant sentiment isn't surprise — it's weary resignation. "What did everyone expect?" wrote one of the top-voted commenters. "I can't understand this community's trust of Microsoft or startups. It's the typical land grab: start off decent, win people over, build a moat, then start shaking everybody down in the most egregious way possible. It's just unusual how quickly they're going for the shakedown this time." Others echoed this with pointed references to Microsoft's history: "Can't believe Microslop is force-feeding people Copilot in yet another way." The cynicism is baked into years of watching platform incentives play out — free tier gets you hooked, then the extraction begins. Several developers announced they were moving repositories to self-hosted git or Codeberg, with one writing: "Thanks to Github and the AI apocalypse, all my software is now stored on a private git repository on my server."
But the technical objections cut deeper than sentiment. EU developers immediately raised GDPR concerns: "What is the legal basis of this in the EU? The collected information could easily contain PII, and consent would have to be freely given, specific, informed and unambiguous." Others pointed out a fundamental enforcement problem: if a developer with a personal Copilot account works on proprietary or source-available code, GitHub's system would absorb that code into training data, potentially violating the code's license. The question of whether GitHub can legally train on copyleft-licensed code that passes through Copilot sessions remains genuinely unresolved. On Reddit's r/GithubCopilot, similar threads emerged with developers sharing opt-out instructions and debating whether the toggle was trustworthy. The overall temperature across platforms is clear: developers feel this was done to them, not for them, regardless of GitHub's framing about improving model quality for the community.
The reaction on X/Twitter was equally pointed. Developer @HedgieMarkets posted a detailed breakdown criticizing GitHub for "enrolling users into a training program through opt-out rather than active consent," noting that "opt-out defaults exist because companies know most people never change them." The post highlighted that the data scope goes far beyond autocomplete — encompassing file names, repository structure, navigation patterns, and code context.
Other developers were less analytical and more visceral. @bruvimtired's response — simply quoting the policy text with "lol wtf" — captured the gut reaction many felt upon reading the announcement.
The concern isn't new, either. Back in September 2025, Simon Willison — one of the most respected voices in the developer community — was already asking whether Microsoft could clearly promise that nothing passed to Copilot would be used as training data. Six months later, we have our answer: it will be, unless you opt out.
What This Means for the AI Coding Assistant Market
This policy change is a strategic bet by Microsoft. By training on interaction data from millions of Copilot users, they're building a data moat that competitors can't easily replicate. OpenAI has code training data from the public internet. Anthropic trains Claude on curated datasets. But no one else has real-time interaction data from developers actively coding — the accepted suggestions, the rejected ones, the patterns of how developers navigate and modify code.
The question is whether the privacy cost is worth the quality improvement. GitHub claims their experiments with Microsoft employee data showed "meaningful improvements." But they haven't published benchmarks, acceptance rate deltas, or any verifiable metric. "Trust us, it's better" isn't a privacy policy — it's a marketing pitch.
For individual developers, the calculation is simple:
- If you don't care about data training and want the best possible Copilot experience, leave it enabled. Your data arguably helps the model.
- If you write proprietary code, work under NDA, or care about intellectual property, opt out immediately. The risk-reward ratio is terrible.
- If you're evaluating alternatives, this is a good moment to trial Cursor (with Privacy Mode on), Cline (fully local), or Tabnine (never-train guarantee).
For engineering leaders, this is a policy review trigger. Audit which developers use personal Copilot accounts on company code. Confirm your enterprise exemption is properly configured. And have an honest conversation about whether GitHub's platform lock-in is worth the privacy trade-off.
FAQ
How do I opt out of GitHub Copilot training data collection?
Go to https://github.com/settings/copilot, scroll to the Privacy section (or navigate directly to https://github.com/settings/copilot/features), find "Allow GitHub to use my data for AI model training," and set it to Disabled. This takes effect immediately.
What data does GitHub Copilot collect for training?
GitHub collects: code outputs you accept or modify, inputs and code snippets sent to Copilot, code context around your cursor, comments and documentation you write, file names, repository structure, navigation patterns, Copilot feature interactions (chat, inline suggestions), and your feedback on suggestions (thumbs up/down).
Is GitHub Copilot Enterprise exempt from training data collection?
Yes. Copilot Business and Copilot Enterprise customers are contractually exempt. Their interaction data is never used for model training. Additionally, if your personal GitHub account is a member or outside collaborator of a paid organization, your interaction data is excluded from training.
When does the new Copilot data policy take effect?
The policy takes effect April 24, 2026 — 30 days from the announcement on March 25, 2026. If you haven't opted out by that date, your interaction data will begin being used for training.
Is Cursor better than Copilot for privacy?
It depends on your configuration. Cursor's Privacy Mode (when enabled) provides zero data retention — your code is never stored or used for training. However, Privacy Mode is off by default on Free and Pro plans, meaning Cursor also trains on your data unless you opt in to privacy. The key difference: Cursor offers true zero data retention, while Copilot only offers an opt-out from training (your data may still be stored). Tabnine and Cline offer even stronger privacy guarantees by default.
Does opting out affect Copilot's functionality?
No. GitHub explicitly states: "If you prefer not to participate, that's fine too — you will still be able to take full advantage of the AI features you know and love." You lose nothing by opting out.
Can GitHub be trusted to honor the opt-out?
GitHub says preferences are respected and previously opted-out users' choices are preserved. However, some developers have reported finding the toggle re-enabled despite previously disabling it. There's no independent audit mechanism. The answer is: trust but verify — check your settings periodically.
This article was published on March 25, 2026 — the day of GitHub's announcement. We'll update this piece if GitHub modifies the policy, adds new privacy controls, or if the April 24 deadline changes. Subscribe to ComputeLeap for updates.
Sources: GitHub Official Announcement · GitHub Community FAQ · Hacker News Discussion · Cursor Privacy Policy · Windsurf Security · Tabnine Code Privacy
Originally published on ComputeLeap. We cover practical AI tools, tutorials, and industry analysis.