Brian Carpio

Posted on May 20 • Originally published at outcomeops.ai

How to Find Your Own Code Inside ChatGPT — The Tiger Team Method Every Engineering Leader Should Run This Week

#ai #chatgpt #tigerteam #claude

There is a ten-minute test you can run on Monday morning that will tell you whether your engineers are pasting proprietary source code into ChatGPT. It costs nothing. It requires no procurement, no security review, no consultant. If the test comes back positive, you have an open audit finding the regulator hasn't discovered yet. If it comes back negative, you have a brief window to ship the platform that prevents the next finding.

The test is the Tiger Team Method. The platform is the architectural answer that the cloud and DevOps transformations already proved works. This post covers both, and explains why the operator playbook that won the cloud transformation in 2014 is the same playbook that wins the AI governance transformation in 2026. Same human dynamics. Different layer.

The Tiger Team Method

Pull a distinctive internal artifact from your own codebase — a function name no one outside your engineering organization would invent, a variable naming convention specific to your team, a comment pattern your style guide enforces, an internal acronym that appears in your ADRs. The more idiosyncratic, the better. calculate_loyalty_tier_uplift, not calculate_total. RiskWeightedExposureCalculator, not Calculator. In a regulated health setting, riskAdjustedMemberScore or an internal acronym like HEDIS_gap_closure_engine — the kind of artifact that only exists inside a payer or provider system and would never appear in public training data by chance.

Open ChatGPT in a private window. No login. Run three queries:

"How do I implement [your function signature]?"
"Explain what [your internal class name] does in a Python service."
"Give me a working example of [your internal acronym] with error handling."

Repeat the same three queries against Claude, Gemini, and any other public model your engineers have access to. Look for two patterns in the responses. First, does the model return code that uses your exact internal terminology: not generic equivalents, but the specific names and patterns from your own codebase? Second, does the model exhibit suspicious confidence about an artifact that should have zero public footprint, explaining what your internal class "does" with details consistent with how it actually behaves in your production system?

Either signal is a positive Tiger Team result. The model has seen fragments of your code. The only way that happens is if one of your engineers pasted it in. The exfiltration has already occurred. What you do next determines whether you discover the rest of the leakage from your own audit, or from a regulator's.
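The mechanical half of the test can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed tool: the function names, the templates (adapted from the three queries above), and the example identifier TierUpliftPolicy are all hypothetical. It also assumes one design choice the method does not spell out: the identifier you typed into the prompt is excluded from the match, since a model echoing your own input back is not a signal. Responses are pasted in by hand; no model API is called.

```python
import re

# Hypothetical templates adapted from the three queries in this post.
QUERY_TEMPLATES = [
    "How do I implement {artifact}?",
    "Explain what {artifact} does in a Python service.",
    "Give me a working example of {artifact} with error handling.",
]


def build_queries(artifact: str) -> list[str]:
    """Fill each template with one internal artifact name."""
    return [template.format(artifact=artifact) for template in QUERY_TEMPLATES]


def find_internal_terms(response: str, internal_terms: list[str], queried: str) -> list[str]:
    """Return the internal terms that appear verbatim in a model response.

    The queried artifact is skipped (assumption of this sketch): the model
    repeating the name it was just given proves nothing.
    """
    hits = []
    for term in internal_terms:
        if term == queried:
            continue
        # Match whole identifiers only, so a term embedded in a longer
        # identifier does not count as a hit.
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(term) + r"(?![A-Za-z0-9_])"
        if re.search(pattern, response):
            hits.append(term)
    return hits
```

In use, you would call build_queries("calculate_loyalty_tier_uplift"), paste each prompt into the model by hand, and pass each response to find_internal_terms along with a list of other identifiers from the same module. Any hit is a term the model produced without being given it, which is the first of the two patterns described above.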

The method has a name because it deserves to be a named process inside every engineering organization. The Tiger Team is a five-person standing audit — one engineer from platform, one from security, one from each of the two largest product orgs, and one from the office of the CTO. They run the test quarterly. They report findings to the executive leadership team. The artifact is a one-page memo. That's the entire process. The technical complexity is zero. The organizational complexity is whether you have the discipline to look.

What the Research Already Shows

Four findings frame the scale of the problem. None of them is speculative. All four are sourced to research the executive leadership team can quote in a board deck without qualification.

IBM's 2025 Cost of a Data Breach Report found that one in five organizations had a breach attributed to shadow AI. The average shadow-AI breach cost $670,000 more than a comparable standard breach — making shadow AI the third-costliest breach factor in the 2025 dataset, displacing security skills shortages from previous years. Sixty-five percent of shadow-AI breaches exposed customer PII, against a 53% global average. Ninety-seven percent of organizations that suffered an AI-related breach lacked AI access controls. Only 37% have an AI governance policy. Only 17% have technical controls that can prevent employees from uploading confidential data to public AI tools.

Gartner's November 2025 analysis, based on a survey of 302 cybersecurity leaders, predicts that more than 40% of global enterprises will suffer a security or compliance incident linked to unauthorized AI tools by 2030. Sixty-nine percent of the cybersecurity leaders surveyed already have evidence, or suspect, that their employees are using public GenAI tools at work. The question is not whether incidents will happen. It is which enterprises will be in the 40% and which will be in the 60% that built a sanctioned alternative in time.

The Netskope Cloud and Threat Report on Generative AI (2025) documented that prompts sent to GenAI tools grew sixfold in one year — from 3,000 to 18,000 prompts per organization per month, with the top quartile of organizations sending more than 70,000 per month. Data volume into GenAI tools grew thirtyfold over the same period. Organizations now detect an average of 223 monthly attempts by employees to include sensitive data in GenAI prompts. The trajectory is not slowing. It is accelerating.

Cyberhaven's 2026 AI Adoption & Risk Report, based on analysis of usage patterns across millions of AI interactions at hundreds of enterprises, found that 39.7% of all AI interactions involve sensitive data, and 32.3% of ChatGPT usage occurs through personal accounts, bypassing SSO, centralized logging, enterprise retention policies, and any control your existing data loss prevention stack might apply. Among the categories of sensitive data flowing into AI tools, source code is the single largest at 18.7% of all sensitive data inputs. The share of corporate data going into AI tools that is sensitive has grown from 10.7% two years ago to 27.4% last year to 34.8% today. On average, an employee inputs sensitive data into AI tools once every three days.

The named incidents you already know are the visible portion of this picture. Samsung Electronics suffered three separate semiconductor source-code and meeting-transcript leaks into ChatGPT within twenty days in April 2023, leading to a company-wide ban. JPMorgan Chase, Apple, and Amazon each restricted ChatGPT firm-wide in early 2023 after similar incidents, with Amazon's own counsel warning employees in writing that ChatGPT output "closely matches existing material we have already produced." The difference between those four organizations and yours is that they discovered their incidents. The Cyberhaven numbers say the same pattern is occurring at virtually every enterprise.

You've Lived This Movie Before

Anyone who ran a cloud or DevOps transformation between 2010 and 2022 has watched this pattern play out three or four times already. The shape is invariant. A new technology arrives. Developers want to use it. The official answer is "not yet, security hasn't reviewed it." Developers use it anyway, on personal credit cards or personal accounts, and the organization discovers the adoption during an audit. The platform team eventually ships a sanctioned alternative. The unsafe path stops being the easy path. The transition completes.

I've led enterprise-wide cloud and DevOps transformations more times than I'll recount here — the four below are the ones that map most directly onto the AI moment. The mechanics differ. The human dynamics do not. What follows is not theory. It is what I watched happen at four Fortune 50 organizations, in chronological order, with the specific platforms and outcomes documented at the time.

Pearson, 2012–2014 — The Nibiru Platform

Developers at Pearson were spinning up AWS accounts on personal credit cards. Security had banned AWS use because it hadn't been reviewed. The ban produced exactly zero compliance — engineers used AWS regardless, and the organization had no visibility into what was running where. We built Nibiru, a self-service platform that was effectively an IaaS/IaC layer over AWS before that category had a name — a Flask web UI and REST API on top of Puppet configuration management, Zabbix monitoring, Route 53 DNS, and an LDAP-backed inventory, with encryption, network controls, naming standards, and audit logging baked into the deployment surface itself. Provisioning that had previously taken 12–18 months through traditional IT collapsed to minutes through Nibiru. The platform was the guardrail. The ban became unnecessary because the sanctioned path was faster than the unsafe one. Gene Kim came on-site to see what we were doing. Engineers stopped using personal AWS accounts because the platform path was strictly better — not because anyone enforced anything.

Aetna, 2014–2017 — The Utopia Platform

Same playbook, container layer — except this fight was about where the platform ran, not what it ran on. Aetna's Enterprise IT and security organizations wanted the consumer business out of AWS entirely and back inside Aetna's own data centers. When Aetna's CISO called me and asked what I was going to do when the platform got DDoSed, I told him "autoscale — what are you going to do?" We built Utopia on Mesosphere and SaltStack, with Twistlock for container security and Checkmarx for SAST integrated into the deployment surface. The platform delivered 0.05% security defect density on the consumer code base, against 5% on the legacy core. That number won the argument politics couldn't: the cloud-native, containerized platform was measurably more secure than the on-prem stack we were being told to retreat to. The CISO did not grudgingly relent — he mandated Docker enterprise-wide, and Aetna became one of the most publicly referenced Twistlock deployments in the industry. The platform changed what governance looked like — the safe path and the fast path were the same path, and the data proved it.

Liberty Mutual, 2016–2017 — The Fusion Platform

Liberty Mutual's Consumer business unit had a struggling Docker migration when I was brought in. We built Fusion on Chef and Docker Datacenter, with a declarative Fusionfile at the center: teams declared what they needed (upstream/downstream sidecars, data layer components, pre/post deploy hooks) and the platform figured out the rest. By 2017 the platform scaled to 300+ services in containers, hundreds of deployments per day, and the team told the story in their own Docker conference talk, All Roads Lead to the Cloud: Liberty Mutual's Journey with Docker EE, walking through the Jenkins-driven pipelines and Docker Datacenter foundation the platform ran on. The Fusionfile pattern was the architectural ancestor of every "declare-what-you-need-in-a-config-file" system we use today — including the per-repo ADR and code-map manifest patterns that power modern AI engineering platforms. I unpacked that lineage in detail in What Is an AI Engineering Platform? (2026 Guide).
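To make the declare-what-you-need idea concrete, here is a minimal sketch of the pattern in Python. It is not the Fusionfile schema: the class, the field names, and the step labels are hypothetical, chosen only to mirror the categories listed above (sidecars, data layer components, pre/post deploy hooks). The ordering of steps is also an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDeclaration:
    """Hypothetical stand-in for a declarative manifest (not the real Fusionfile)."""
    name: str
    sidecars: list[str] = field(default_factory=list)
    data_layer: list[str] = field(default_factory=list)
    pre_deploy: list[str] = field(default_factory=list)
    post_deploy: list[str] = field(default_factory=list)


def plan_deployment(decl: ServiceDeclaration) -> list[str]:
    """Expand a declaration into an ordered list of platform steps.

    The team states what it needs; the ordering lives in the platform.
    """
    steps = [f"hook:{hook}" for hook in decl.pre_deploy]
    steps += [f"provision:{component}" for component in decl.data_layer]
    steps += [f"sidecar:{sidecar}" for sidecar in decl.sidecars]
    steps.append(f"deploy:{decl.name}")
    steps += [f"hook:{hook}" for hook in decl.post_deploy]
    return steps
```

The point of the shape is that a team touching only ServiceDeclaration never has to know how plan_deployment orders or implements the steps, so the platform can change the how without any team editing its manifest.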

Comcast, 2019–2022 — The SEED Platform and the Governance Argument

Comcast is the centerpiece anchor for this post because it is the lived governance story that maps most directly onto where every enterprise sits with AI today. At an internal cloud summit during my time there, a VP at Comcast told the room something like "80% of our AWS spend is on EC2, and 80% of those EC2 instances are sitting at 1% utilization." That single statement was a governance failure quantified. It was the financial signature of an engineering organization where every team had been given AWS and left alone to figure it out. Local optimization at scale. Hundreds of teams, each writing their own Terraform, each spinning up their own EC2 fleets, each making the same decisions in isolation, and the consolidated bill telling the story neither the teams nor the central architecture function had bothered to look at.
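The two 80% figures compound. As a rough back-of-the-envelope sketch, and assuming (my assumption, not the VP's) that EC2 spend was spread roughly evenly across instances, the share of the total AWS bill going to near-idle instances works out as follows. The function name is hypothetical.

```python
def idle_spend_share(ec2_share_of_spend: float, idle_share_of_instances: float) -> float:
    """Share of total spend on near-idle EC2.

    Assumes cost is spread evenly across instances, so the share of idle
    instances approximates the share of EC2 spend that is idle.
    """
    return ec2_share_of_spend * idle_share_of_instances


share = idle_spend_share(0.80, 0.80)
print(f"{share:.0%} of total AWS spend on near-idle EC2")
```

Under that assumption, roughly 64% of the total bill was paying for instances sitting at 1% utilization.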

It got worse before it got better. Comcast's senior architecture leadership read Accelerate, became convinced of the value of standardized CI/CD tooling, and mandated Concourse company-wide. Every engineering org was instructed to move to Concourse. The mandate produced the most expensive form of local optimization imaginable. Hundreds of teams spent months each writing Concourse YAML — thousands of lines of it per pipeline — reinventing the same patterns in isolation because Concourse had no concept of shared libraries at the time. I personally reviewed pipelines from team after team where engineers told me with pride that they had spent three months building their CI/CD. Three months. Per team. Across hundreds of teams. To produce variants of the same pipeline.

SEED predated all of this. We built it before the company-wide Concourse mandate landed: a self-service platform on Jenkins where teams consumed a shared library from their Jenkinsfile, passed in a few parameters and a tfvars file declaring the services they wanted, and got our underlying Terraform modules wired together and provisioned out of the box. When the Concourse mandate came down, senior architecture leadership gave SEED a hallway pass: SEED teams were exempted until Concourse could reach feature parity with what SEED already delivered, which it never did. We banned EC2 across the platform, not by writing a memo, but by making the alternative the paved road: SEED only deployed to Lambda and ECS Fargate, both of which autoscaled on demand and routinely ran at 90% utilization rather than the 1% the EC2 fleet was averaging. SEED integrated directly with Comcast's Change Management API, which meant any team adopting SEED could bypass the change advisory board (CAB) for routine deploys. SEED also integrated with Comcast's centralized logging, AWS Config inventory, and code-quality gates. SEED teams never spent three months reinventing pipelines and Terraform the way the Concourse-path teams did, because making the right way the easy way meant they got the whole thing in an afternoon.
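The "ban by structure, not by memo" idea can be sketched in Python. This is an illustration of the pattern only, not SEED's code: the module paths, the function name, and the shape of the declaration are hypothetical. The one detail taken from the description above is that the platform's deployment surface only knows about Lambda and ECS Fargate, so an EC2 request has nowhere to go.

```python
# Per the description above, the paved road offers only these compute targets.
ALLOWED_COMPUTE = {"lambda", "ecs_fargate"}

# Hypothetical mapping from a declared compute type to a Terraform module path.
MODULES = {
    "lambda": "modules/lambda_service",
    "ecs_fargate": "modules/fargate_service",
}


def resolve_modules(declared: dict[str, str]) -> dict[str, str]:
    """Map each declared service name to a module, rejecting unsupported compute.

    `declared` maps a service name to its compute type, standing in for what
    a team would put in its tfvars file.
    """
    rejected = {name: kind for name, kind in declared.items() if kind not in ALLOWED_COMPUTE}
    if rejected:
        raise ValueError(f"unsupported compute types: {sorted(rejected.items())}")
    return {name: MODULES[kind] for name, kind in declared.items()}
```

Calling resolve_modules({"billing-api": "ecs_fargate"}) returns a module path, while resolve_modules({"legacy": "ec2"}) raises a ValueError. No policy document is involved; the unsafe option simply does not exist on the platform surface.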

The platform was the nucleus. Security got centralized logging and policy enforcement without having to chase teams individually. Finance got the EC2 spend collapsed by structural design. Architects got a single place to make decisions that propagated across the entire organization without having to convince every team one at a time. Engineers got CAB bypass, instant deploys, and Lambda-grade autoscaling without writing Terraform. Every constituency won because the platform was the governance, instead of governance being a tax bolted onto whatever the teams were going to do anyway.

I wrote about this exact pattern in December 2022 in a post called DevOps Is the New Waste in 2023. The argument then: hundreds of teams reinventing the same CI/CD patterns is not DevOps, it is overproduction. The DORA 2022 State of DevOps Report had just shown zero elite performers globally for the first time in the survey's history. The reason was visible to anyone running a platform org inside a Fortune 50: the "DevOps" movement had decayed into every team building bespoke versions of the same infrastructure, in isolation, because no one had built the nucleus. Three years later that argument is exactly the AI argument. Every team building bespoke prompt frameworks, per-team RAG implementations, per-team context layers. No central nucleus. No queryable audit trail. No governance shape that compliance can sign off on. Same waste pattern. Same answer.

The Pattern Is the Same. The Layer Is Different.

An engineer pasting a function into ChatGPT in 2026 is the same human dynamic as an engineer spinning up an EC2 instance on a personal credit card in 2013. The engineer has a problem. The sanctioned path is too slow, too painful, or too restrictive. The unsanctioned path is right there, takes thirty seconds, and gives them an answer. The engineer is not malicious. The engineer is rational. The organization that ignores this dynamic ends up with the consolidated bill that the VP at Comcast read out loud, except this time the bill is denominated in proprietary source code on someone else's training set, regulatory exposure, and the audit finding waiting to be discovered.

The fix has the same shape as every prior layer's fix. Build a sanctioned platform that is faster, easier, and better than the unsafe option. Put the governance into the platform surface, not into a policy memo. Make the safe path the easy path. The platform is the guardrail. Everything else is theater.

The Cyberhaven number that 32.3% of ChatGPT usage occurs through personal accounts is the 2026 version of the personal AWS credit cards at Pearson in 2012. The IBM finding that 97% of breached organizations lack AI access controls is the 2026 version of every team having its own Terraform module at Comcast in 2020. The Gartner prediction that 40% of enterprises will suffer a shadow-AI incident by 2030 is the 2026 version of DORA's 0% elite performer finding from 2022 — a leading indicator that the gap between the platform-led organizations and the policy-led organizations is about to widen by an order of magnitude.

What the AI Platform Has to Do

The architectural requirements are short and have been litigated extensively in the prior posts on this site. I'll summarize and link rather than relitigate.

The platform must run inside the customer's own AWS account, not in a vendor cloud. SaaS data exfiltration is the entire problem the platform is supposed to solve, and you cannot solve an exfiltration problem with a platform whose architecture is exfiltration by design. We covered the customer-AWS deployment model in detail in AI Coding Tool That Deploys in Your AWS Account.

The platform must replace ChatGPT for code questions, and beat ChatGPT on time-to-useful-answer. If the safe path is slower than the unsafe path, engineers will route around it. The advantage the platform has is that it can retrieve authoritative internal context — ADRs, code maps, internal documentation — that ChatGPT does not have access to. That retrieval architecture is what turns the safe path into the faster path. We unpacked the retrieval design in Why RAG Isn't Enough for Code: Adding a Graph.

The platform must log every interaction in a customer-owned audit trail. Who asked, what was retrieved, what was generated, what got merged. The audit trail is the artifact the Tiger Team uses to demonstrate governance to a regulator. The audit trail is not optional. It is the entire reason executive leadership invested in the platform.
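As a rough sketch of what one entry in such a trail might look like, here is a hypothetical record type in Python, serialized as one JSON object per line. The field names are mine, mapped onto the four questions above (who asked, what was retrieved, what was generated, what got merged); a real platform would settle its own schema and storage.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditRecord:
    """Hypothetical audit entry; field names are illustrative only."""
    user: str                             # who asked
    prompt: str                           # what they asked
    retrieved: list[str]                  # what was retrieved (e.g. ADR or file IDs)
    generated: str                        # what was generated
    merged_commit: Optional[str] = None   # what got merged, if anything
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_jsonl(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

An append-only file or table of these lines, stored in the customer's own account, is the kind of artifact that can be queried later by user, by retrieved document, or by merged commit.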

The platform must use customer context, not training data. The output should reflect your architectural decisions and your internal patterns. The model is commodity. The context is the moat. We made this argument in What Are Context Engineering Platforms?

The platform must deploy in weeks, not quarters. Shadow AI is in flight right now. Quarterly procurement is not a strategy for an active incident. The customer-AWS deployment model collapses procurement to a Terraform read-through because the platform inherits the existing AWS posture rather than introducing a new vendor. We covered the compliance procurement path in AI Coding Tools for Regulated Industries.

That is what an AI engineering platform must do to actually solve the shadow-AI governance problem rather than merely perform a solution. Anything that is structurally a SaaS subscription with a security policy attached is policy theater dressed up as a platform. The Tiger Team will still find your code in ChatGPT. The bill will still arrive.

The Nucleus Argument

SEED was not just a deployment platform. SEED was the nucleus that gave every constituency inside Comcast's engineering org what it needed without forcing them to negotiate with each other. Security got centralized policy. Finance got cost structure. Architects got organizational consistency. Engineers got speed. Compliance got audit. The nucleus is the platform pattern that scales because every constituency wins simultaneously rather than one at the expense of another.

An AI engineering platform serves the same role at the new layer. It is the nucleus of the engineering organization's intelligence layer. The security team uses it to enforce policy on AI usage and produce audit evidence. The development teams use it because it is faster than ChatGPT and grounded in their own patterns. The architects use it to make organizational decisions that propagate without per-team negotiation. The compliance function uses it to demonstrate governance to regulators. The platform is the answer to every constituency's shadow-AI problem at once, not just one constituency's.

The organizations that built nucleus platforms during the cloud transformation won the cloud transformation. The organizations that built nucleus platforms during the DevOps transformation are the organizations that show up as elite performers on the DORA report when there are any elite performers to show up. The organizations that build nucleus platforms during the AI transformation will be the 60% that did not appear in Gartner's 2030 shadow-AI incident statistic. The pattern is invariant. The transformation is on a slightly faster clock this time. That is the only difference.

Run the Test Tomorrow

The Tiger Team Method is ten minutes of work. Pull a distinctive internal artifact from your codebase. Run three queries against ChatGPT, Claude, and Gemini in a private window. Look for your own terminology coming back. If it does, the exfiltration has already happened and you have a window to ship the platform before the regulator does the test for you. If it doesn't, you have an even better window because you can ship the platform while the problem is still ahead of you instead of behind.

The platform is what the cloud transformation, the container transformation, the CI/CD transformation, and the DevOps transformation all proved already. Build the nucleus. Make the safe path faster than the unsafe path. Put the governance into the platform surface instead of into a policy memo no one reads. Hand every constituency — security, developers, architects, compliance, finance — the same answer to the same problem.

Make the right way the easy way. That's how you won cloud. That's how you'll win AI.

How to Evaluate

The two-week proof of concept is structured for this evaluation. Apply the OutcomeOps Terraform into a non-production AWS account, run the Tiger Team Method against your own codebase, connect 20 representative repositories, and verify that the audit trail captures the interaction quality your compliance function needs. Book an enterprise briefing to start the PoC, or run the five-minute Readiness Assessment to get a written report on where your organization sits before scheduling.

Run the Tiger Team Test. Then Build the Nucleus.

If the test comes back positive, the exfiltration has already happened and you have a narrow window to ship the platform before a regulator runs the test for you. If it comes back negative, you have an even better one. Either way the answer is the same nucleus the cloud and DevOps transformations already proved works — deployed in your own AWS account, in weeks.

The platform is the guardrail. Everything else is theater.

Book an Enterprise Briefing
