80% of DE Candidates Use AI on Take-Homes. Companies Can't Stop It.

#dataengineering #interview #career #beginners

I've been on both sides of the hiring table for data engineering roles. I've given take-homes, graded take-homes, argued with other panelists about take-homes, and done my share of them as a candidate. So when I tell you the entire system is broken in a way nobody wants to talk about honestly, I'm not theorizing. I watched it happen in real time.

Here's the situation: 64% of companies now prohibit AI tools in technical interviews. Meanwhile, 35% of candidates are using LLMs anyway, up from 15% just six months prior. In purely technical roles, that number climbs to 48%. And 61% of those candidates pass the approval threshold and advance without anyone noticing. The ban exists on paper. In practice, it's a suggestion that penalizes the people who follow it.

The Honest Candidate Tax

This is the part that actually pisses me off. If you're a data engineering candidate who follows the rules, who sits down with your take-home and writes your own SQL, builds your own pipeline, tests your own edge cases, you are now competing against people whose submissions were polished by an LLM in a fraction of the time. And the hiring team cannot tell the difference.

Cheaters have a roughly 3:1 pass rate advantage. That's not a guess; that's from Fabric's analysis of 19,368 interviews between July 2025 and January 2026. Candidates using AI tools scored above the 7.0 approval threshold 61% of the time. The honest candidates? They're producing slower, rougher, less polished work. Because that's what real human output looks like when you're solving an unfamiliar problem under time pressure.

It gets worse. Take-home assignments have ballooned. What used to be a 2- to 3-hour exercise is now routinely 10 to 20 hours of unpaid work: full pipeline implementations, data modeling, documentation, testing, presentations. At that scope, using AI isn't just tempting; it's economically rational. You're asking someone to do a part-time job for free and then punishing them for using the most efficient tool available.

The 20-hour take-home created the cheating incentive. Companies shifted from live coding to extended take-homes to "reduce bias" and inadvertently built the perfect environment for undetectable AI assistance.

83% of candidates say they would use AI if they could get away with it. I'm honestly surprised the number is that low. The game theory here is a textbook prisoner's dilemma: if you assume your competition is cheating (and statistically, a growing share of them are), following the rules is the losing move. Genuine candidates report feeling forced to cheat because they assume everyone else already is.
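To make the prisoner's dilemma claim concrete, here's a minimal sketch with made-up payoff numbers (the classic 5/3/1/0 ordering, not data from any study) for two candidates competing on the same take-home. The names `PAYOFF`, `best_response`, and `is_dominant` are hypothetical, chosen for illustration. It checks that AI use is the best response no matter what the other candidate does, even though mutual honesty leaves both better off than mutual AI use.

```python
# Hypothetical payoffs for a two-candidate take-home "game".
# The numbers are illustrative assumptions, not measured data.
HONEST, AI = "honest", "ai"
STRATEGIES = (HONEST, AI)

# PAYOFF[(my_choice, their_choice)] -> my payoff
PAYOFF = {
    (HONEST, HONEST): 3,  # fair comparison; the take-home still carries signal
    (HONEST, AI): 0,      # the "honest candidate tax": I lose to a polished submission
    (AI, HONEST): 5,      # I win on polish
    (AI, AI): 1,          # everyone polishes; the signal collapses
}


def best_response(theirs: str) -> str:
    """Return the strategy that maximizes my payoff given the other candidate's choice."""
    return max(STRATEGIES, key=lambda mine: PAYOFF[(mine, theirs)])


def is_dominant(strategy: str) -> bool:
    """True if `strategy` is my best response to every possible opposing strategy."""
    return all(best_response(theirs) == strategy for theirs in STRATEGIES)


if __name__ == "__main__":
    for theirs in STRATEGIES:
        print(f"Other candidate is {theirs}: my best response is {best_response(theirs)}")
    print("AI use is dominant:", is_dominant(AI))
    print("Mutual honesty beats mutual AI use:", PAYOFF[(HONEST, HONEST)] > PAYOFF[(AI, AI)])
```

The exact numbers don't matter; any payoffs with the same ordering produce the same trap. Each candidate acting rationally lands everyone in the worst shared outcome.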

And the detection? It's theater. Some platforms claim 93% accuracy analyzing keystroke patterns and tab-switching behavior. But invisible overlay tools like Cluely and Interview Coder now render answers using DirectX and Metal at the OS level, completely invisible to screen sharing. A second device listening to interview audio works just as well. The detection arms race was lost before it started.

The Ban That Nobody Can Enforce

Here's the double standard that makes this whole thing absurd: 64% of organizations using AI in HR apply it to recruiting and interviewing on their end. They're screening your resume with AI, generating interview questions with AI, scoring your responses with AI. But you, the candidate? You're banned from using AI. Because integrity.

Amazon explicitly disqualifies candidates caught using AI. Goldman Sachs told campus recruits they "must not use ChatGPT, Google, or any external AI assistance." Noble policies, with no enforcement mechanism behind them. Neither company has a reliable way to detect AI use, so enforcement depends on candidates self-reporting or stumbling in live follow-ups.

71% of engineering leaders admit AI makes technical skills harder to assess. And yet 62% still prohibit it despite acknowledging they cannot detect violations. This isn't a policy; it's a prayer.

The detection tools themselves are worse than useless. AI detectors such as Turnitin's built-in checker and GPTZero are, by multiple 2026 analyses, "increasingly wrong." Part of the problem is structural: a candidate can prompt an LLM to generate a novel solution, and plagiarism software correctly reports it as original work (because it is). Meanwhile, false positive rates range from 1% to 30% depending on the tool. So you've got honest candidates getting flagged for coincidental code similarity while actual cheaters using invisible overlays sail through. The system protects liars better than truth-tellers.
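Why does a 30% false positive rate matter so much? Base rates. Here's a back-of-the-envelope sketch, assuming the 35% AI-usage figure cited earlier and a detection rate of 50% for actual AI users. That 50% is a made-up number for illustration; overlay tools that never touch the submission would push it lower. The function name `flagged_breakdown` is hypothetical.

```python
# Back-of-the-envelope: who actually gets flagged by an imperfect detector?
# Assumptions (hypothetical, for illustration only):
#   - ai_share: fraction of candidates using AI (35% is the usage figure cited in this post)
#   - detection_rate: fraction of AI users the detector catches (50% is made up)
#   - fpr: fraction of honest candidates wrongly flagged (1% to 30% is the cited range)


def flagged_breakdown(n: int, ai_share: float, detection_rate: float, fpr: float):
    """Return (AI users flagged, honest candidates flagged, share of all flags that are honest)."""
    ai_users = n * ai_share
    honest = n - ai_users
    true_flags = ai_users * detection_rate
    false_flags = honest * fpr
    honest_share = false_flags / (true_flags + false_flags)
    return true_flags, false_flags, honest_share


if __name__ == "__main__":
    for fpr in (0.01, 0.30):
        caught, wrongly_flagged, honest_share = flagged_breakdown(1000, 0.35, 0.50, fpr)
        print(
            f"FPR {fpr:.0%}: {caught:.1f} AI users flagged, "
            f"{wrongly_flagged:.1f} honest candidates flagged, "
            f"{honest_share:.1%} of flags land on honest candidates"
        )
```

With these made-up inputs and 1,000 candidates, a 30% false positive rate flags about 195 honest candidates against 175 actual AI users, so more than half of all flags hit people who did nothing wrong. At 1%, it's roughly 6 or 7 honest candidates. Lower the detection rate to reflect invisible overlays and the honest share of flags only grows.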

The core problem isn't that AI is too good. It's that detection is unsolvable at scale. A candidate can prompt GPT-4 to generate novel, non-plagiarized code for any assignment, and no static analysis can distinguish it from original work without access to the candidate's reasoning process. The only reliable detection is process visibility: pair programming, timestamped drafts, in-person walkthroughs. And companies resist all of those because they don't scale cheaply.

One company, when shown data that 80% of its take-home submissions used LLMs, decided to ignore the cheating and simply move top performers to the next round. That's not a hiring process. That's capitulation.

Three Camps, Three Incompatible Bets on the Future

The industry hasn't converged on a solution. It's fractured into at least three incompatible approaches, and if you're job hunting in data engineering right now, you need to understand all of them.

The AI-required camp. Meta launched AI-enabled interviews in October 2025. Candidates work in CoderPad with access to GPT-4o, Claude, Gemini, or Llama. They're evaluated on AI fluency, prompt engineering, output validation, and debugging. The company plans to expand this to all backend and ops roles in 2026. Canva went further: they replaced their entire "Computer Science Fundamentals" interview with "AI-Assisted Coding" for backend, ML, and frontend roles. Candidates must use Copilot, Cursor, or Claude. The problems are designed so they can't be solved with a single prompt; they require iterative thinking and judgment.

The signal these companies are hiring for isn't "can you code without help." It's "can you direct AI correctly, catch its mistakes, and defend every architectural decision." Candidates who passed these rounds weren't better prompters. They knew what to build, caught what the AI got wrong, and could explain why.

The ban-and-hope camp. Amazon and Goldman Sachs sit here. Explicit prohibition, no reliable detection, trust-based enforcement. Fewer than 30% of companies that ban AI have actually retrained their interviewers to spot it. The policy exists to provide legal cover, not to change outcomes.

The hybrid camp. 41% of companies now pair a take-home with a synchronous defense session. You do the work at home (with whatever tools you actually use), then you sit down with an engineer for 30 minutes and explain it. This is where LLM help evaporates. If you can't walk through your own solution, modify it on the fly, and handle edge cases in conversation, the take-home score doesn't matter. It's spreading as the unspoken standard because it's the only format that actually tests what companies care about.

The red flags interviewers are learning to spot in those defense sessions: explanation-code mismatch (your spoken reasoning contradicts what you wrote), terminology beyond your demonstrated level (a junior suddenly discussing architectural patterns they can't elaborate on), and the telltale 3- to 5-second delay before every answer that suggests an overlay is generating responses in real time.

The Career Implications Nobody's Saying Out Loud

Entry-level data engineering roles are getting hammered the hardest. Junior candidate cheating nearly tripled, from 15% to 40% year over year. And junior candidates have the lowest detection risk because interviewers expect less fluency from them. A senior engineer dropping suspiciously polished system design answers raises eyebrows. A junior producing clean code? That just looks like a strong candidate.

This inverts the hiring funnel in a way that should terrify everyone. The most junior, least skilled cohort has the highest incentive to cheat and the best chance of getting away with it. They get hired. They can't do the job. The team absorbs the cost. And six months later, the same team posts the same req, runs the same broken process, and wonders why their pipeline keeps breaking.

Here's where it lands for your career. If you're a candidate: the interview is a game. It has always been a game. AI didn't make it arbitrary; it was already arbitrary. DS&A has always been a mechanism to rank candidates, not an indicator of data engineering experience. What changed are the rules of the game, and right now nobody agrees on what they are. So you need to prepare for all three formats. Know your fundamentals cold, not because a take-home requires it, but because the 30-minute live defense does. That's where the real hiring decision happens now.

If you're on a hiring panel: stop pretending your take-home ban is enforceable. It isn't. Either redesign around it (hybrid format, live defense, AI-collaborative sessions) or accept that you're selecting for candidates who are good at hiding AI use. That's a skill, sure. It's just not the one you think you're testing for.

82% of data professionals now use AI tools daily. We're banning in interviews the exact workflow we expect on the job. At some point, the industry has to reconcile those two facts.

The companies that figure this out first will hire the best engineers. The ones clinging to unenforceable bans will hire the best cheaters. Same resume, same score, very different outcome six months in.

What's your read? If you're interviewing right now, are you using AI on take-homes, and do you think the hybrid format (take-home plus live defense) actually solves the problem, or just moves it?