DEV Community

DataDriven

Data Engineers Don't Need DSA. So Why Do Interviews Still Test It?

I did somewhere around 20 interview loops during my last job search. Phone screens, take-homes, onsites, "culture chats" that were secretly technical screens. At one company I did eight rounds, was told I passed, was told the offer was sent, it was never sent, then a new recruiter said I'd declined the offer I never saw. I did four more rounds. Passed again. Headcount was closed.

Through all of that, you know what never once came up on the actual job? Inverting a binary tree.

This debate isn't new. Every six months, a Reddit thread blows up with senior data engineering folks asking why they're being tested on dynamic programming when their actual job is debugging why a pipeline silently dropped 2M rows last Tuesday. But in 2026, with 80,000 tech layoffs in Q1 alone and companies banning AI tools in interviews while AI reshapes the job itself, the question isn't theoretical anymore. It's a breaking point.

The DSA Debate That Won't Die

Here's why this fight resurfaces like clockwork: nobody agrees on what a data engineer even is.

I'm not being glib. The title means something completely different at Google than it does at a Series B startup than it does at a Fortune 500 retailer. When companies can't define the role, they fall back on the only standardized proxy they have: LeetCode-style algorithm problems. Binary trees. Graph traversal. Backtracking. Problems that software engineers have been grinding for a decade, repurposed wholesale for a role that shares maybe 30% of the same skill set.

DSA is a mechanism to rank candidates, not an indicator of data engineering experience. I've said this before and I'll keep saying it. Accept it for the arbitrary IQ measuring stick that it is.

But here's the thing that's changed. The market broke.

80,000 people lost their jobs in the first quarter of 2026. Nearly half of those cuts were attributed to AI and automation. The people flooding the interview pipeline aren't junior developers taking their first shot; they're experienced engineers with production systems under their belt, competing for fewer roles, facing 5 to 7 round loops that stretch 60 to 90 days. Karat's data across 600,000+ technical interviews confirms this is the norm now, not the exception.

And fewer than 30% of companies have updated their assessment systems to reflect what data engineering actually requires.

Let that sit. Seven out of ten companies are screening data engineers the same way they screened them in 2022. The tools changed. The job changed. AI changed everything about how we write and review code. The interview didn't change.

Senior DEs aren't walking away from interview loops because they can't code. They're walking away because the cost/benefit calculation broke. Twenty hours of prep for problems that have zero correlation with the job, in a market where the roles might disappear before the loop finishes.

What DSA Actually Measures (and What It Doesn't)

Let's be precise about this. LeetCode has 3,000+ problems. The vast majority test binary trees, dynamic programming, and graph algorithms: skills that data engineers report using "never to rarely" in production.

You know what I use daily? SQL window functions. CTEs. Deduplication logic. Understanding why a LEFT JOIN is silently inflating row counts because someone upstream changed a grain without telling anyone. Figuring out why a Spark job is spilling to disk. Debugging schema drift that broke a downstream dashboard the CFO reads every Monday.

None of that is on LeetCode.
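To make the join-inflation bug above concrete, here's a minimal sketch using an in-memory SQLite database. The table and column names (`orders`, `shipments`) are hypothetical, invented purely to illustrate the failure mode: an upstream table quietly moves from one row per order to one row per shipment, and every `LEFT JOIN` downstream starts double-counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Downstream fact table: one row per order.
cur.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (2, 250.0)])

# Upstream table was originally one row per order_id; someone changed the
# grain to one row per (order_id, shipment) without telling anyone.
cur.execute("CREATE TABLE shipments (order_id INTEGER, carrier TEXT)")
cur.executemany("INSERT INTO shipments VALUES (?, ?)",
                [(1, "UPS"), (1, "FedEx"), (2, "USPS")])

# The LEFT JOIN now fans out: order 1 appears twice, so any SUM(amount)
# computed after the join double-counts it.
rows = cur.execute("""
    SELECT o.order_id, o.amount, s.carrier
    FROM orders o
    LEFT JOIN shipments s ON o.order_id = s.order_id
""").fetchall()

print(len(rows))                 # 3 rows out of 2 orders: silent inflation
print(sum(r[1] for r in rows))   # 450.0 instead of the true 350.0

# The cheap guardrail: compare row counts before and after the join.
orders_count = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
if len(rows) != orders_count:
    print(f"grain check failed: {len(rows)} rows out vs {orders_count} orders in")
```

That last count-in/count-out check is the kind of thing experienced DEs bolt onto every join-heavy model, and it never shows up in an algorithms round.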

An empirical study from interviewing.io found that LeetCode rating has no correlation with interview performance percentile. What does correlate? Problem volume solved. Which isn't a signal of capability; it's a signal of free time. That's selection bias, not a predictor of job performance.

SQL appears in 61% of data engineering job postings. Data modeling skills have 122,000+ open US roles. Cloud cost optimization is now a top-5 interview category at companies tying bonus incentives to infrastructure savings. Yet the screening gate for all of these roles is still "solve this medium in 25 minutes."

I've been on hiring panels where we passed on strong candidates for the dumbest reasons. "They got the optimal solution but took too long." Meanwhile, the candidate who speed-ran the binary search problem couldn't explain what idempotency means or why you'd want it in a pipeline. We hired the fast one. That pipeline broke in production within a month.
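Since idempotency keeps coming up, here's a minimal sketch of what it means in a pipeline, again with an in-memory SQLite table (`daily_revenue` is a hypothetical name). The load replaces a whole partition inside one transaction, so a retried or backfilled run produces the same result as a single run instead of duplicating rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (ds TEXT, amount REAL)")

def load_partition(conn, ds, amounts):
    """Idempotent load: wipe and rewrite the partition in one transaction,
    so reruns of the same task are safe."""
    with conn:  # delete + insert commit (or roll back) together
        conn.execute("DELETE FROM daily_revenue WHERE ds = ?", (ds,))
        conn.executemany("INSERT INTO daily_revenue VALUES (?, ?)",
                         [(ds, amt) for amt in amounts])

# Run the same load twice, simulating an orchestrator retry.
load_partition(conn, "2026-03-01", [100.0, 250.0])
load_partition(conn, "2026-03-01", [100.0, 250.0])  # retry / backfill

count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM daily_revenue WHERE ds = '2026-03-01'"
).fetchone()
print(count, total)  # 2 350.0 — no duplicates despite the rerun
```

An append-only version of that loader would have written four rows after the retry. That's exactly the class of bug the fast binary-search candidate shipped.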

50+ companies (Airtable, Buffer, Calendly, CircleCI, and others) have moved away from LeetCode-style assessments entirely, replacing them with take-home projects, code reviews, and system design discussions. The signal is there. The industry just hasn't followed it at scale.

The AI-Banning Irony

This is the part that makes me want to punch a hole in the wall.

62% of organizations prohibit AI use in technical interviews. At the same time, 76% of data engineering work is now enhanced by AI tools, delivering 25% productivity improvements on average. Companies are telling candidates: "Don't use the thing you'll use every single day if we hire you."

It's like banning calculators from a math test in a world where every math job involves using calculators.

And it gets better. Karat's data says over half of candidates use AI anyway, despite being told not to. No company has disclosed a scalable detection method beyond "watch their eyes" and "screen recording." The enforcement is theater.

Anthropic, the company that built Claude, initially banned candidates from using AI in interviews. Then reversed the policy in July 2025. If the company most invested in AI's credibility can't figure out a coherent policy, what chance does your average enterprise hiring committee have?

Meanwhile, Meta went the opposite direction and piloted AI-enabled interviews where Claude, GPT, and Gemini are built into the coding environment. Amazon explicitly bans all GenAI with disqualification as the penalty. Google brought back in-person rounds because remote assessments were too easy to game.

There's no consensus. There's no stable equilibrium. There's just companies reacting quarter by quarter while candidates try to figure out which rules apply at which company.

Here's the contrarian take nobody wants to hear: if an AI can spit out a clean solution to a medium LC problem, what does asking that problem actually tell you about the candidate? That they memorized something a machine produces on demand? The signal was already thin. Now it's basically noise.

What Actually Predicts Whether Someone Can Do This Job

The actual job of data engineering is less "write a DAG" and more "figure out why finance's board deck had wrong numbers for three months and nobody noticed." It's debugging. It's data modeling. It's understanding the business well enough to catch when something looks wrong before stakeholders do.

Karat's own data from 400 engineering leaders confirms the baseline assessment focus should be SQL proficiency, window functions, CTEs, and Python fundamentals. Not graph algorithms. Not dynamic programming.

The companies getting this right are testing for:

  • Data modeling fluency. Can you design a schema that won't collapse when requirements change? Can you explain why you'd keep fact tables at a single, clearly declared grain? This is the make-or-break round, and every practitioner knows it.
  • Pipeline architecture. Not system design in the SWE sense (I don't care about load balancers and reverse proxies). Can you design an ETL pipeline that handles late-arriving data, schema evolution, and failure recovery?
  • Cost reasoning. Cloud cost optimization is now a top interview category. Can you explain why denormalizing that table saves $40K/year in compute even though it costs $200/year in storage? The economics argument wins every time.
  • Incident debugging. What broke, why, and how do you make sure it never happens again? This is 60% of the actual job and maybe 5% of interview loops.
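To show what the late-arriving-data bullet means in practice, here's a minimal sketch of a last-write-wins upsert. Everything here is hypothetical (`warehouse` as a plain dict standing in for a target table, field names invented): records are keyed on a natural key, and a late or corrected record replaces the earlier version instead of duplicating it.

```python
from datetime import date

def merge_late_arrivals(warehouse, new_records):
    """Last-write-wins upsert keyed on event_id: late or corrected records
    replace earlier versions instead of piling up as duplicates."""
    for rec in new_records:
        existing = warehouse.get(rec["event_id"])
        if existing is None or rec["updated_at"] >= existing["updated_at"]:
            warehouse[rec["event_id"]] = rec
    return warehouse

warehouse = {}

# Day 1 load.
merge_late_arrivals(warehouse, [
    {"event_id": "a1", "event_date": date(2026, 3, 1),
     "amount": 100.0, "updated_at": date(2026, 3, 1)},
])

# Day 3: a correction for the same event arrives two days late.
merge_late_arrivals(warehouse, [
    {"event_id": "a1", "event_date": date(2026, 3, 1),
     "amount": 90.0, "updated_at": date(2026, 3, 3)},
])

print(len(warehouse), warehouse["a1"]["amount"])  # 1 90.0
```

A naive append would have left both versions in place and overstated revenue; an upsert keyed the wrong way would have dropped the correction. Reasoning through that trade-off is a far better interview question than a graph traversal.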

35% year-over-year growth in data engineering demand tells you the career isn't going anywhere. 2.9 million data-related roles remain open globally. The role is healthy. The hiring process is sick.

Play the Game, But Name It

I'm not going to sit here and tell you to boycott DSA prep. That's bad advice from people who already have jobs. The game is the game. If you're interviewing at companies that screen on LeetCode, you grind LeetCode. Stick to mediums; do 50 and you'll be solid. Few companies ask hards consistently.

But let's stop pretending this process is meritocratic. It's not. It's standardized and defensible, which is what legal departments and risk-averse hiring committees want. It has almost nothing to do with predicting whether you'll be good at maintaining the pipeline that finance depends on for board decks.

Interviewing is a skill, and it's separate from the actual job. Treat prep like a job. I'ma be super honest: I have a degree from a degree mill and don't feel particularly "skilled." It's just grind.

The real fix isn't going to come from candidates complaining on Reddit. It's going to come from companies losing great engineers because those engineers did the math. Twenty hours of algorithm prep for a role where you'll never touch an algorithm, in a market where you might get ghosted anyway, while simultaneously being told you can't use the AI tools that define modern engineering work. At some point, the experienced people just stop showing up for that loop.

Some companies have figured this out. The 50+ that ditched LeetCode. The ones testing pipeline architecture, data modeling, and cost optimization. They're getting better candidates because they're filtering for the right signal.

The rest are going to keep wondering why their data pipelines break and their senior engineers leave.

I've been through three waves of "data engineering is getting automated away." Still here. Still employed. Still debugging the same categories of problems. The tools change every 18 months. The problems don't change. Schema drift, late-arriving data, upstream teams breaking contracts without telling you. These are eternal.

The interview process should test for those eternal problems. Not for whether you memorized the optimal solution to "Minimum Window Substring."

What's the worst interview loop you've been through, and did the questions have anything to do with the actual job?
