I use an AI coding agent for almost everything on my job board GlobalRemote. It writes my scrapers, builds my CI pipelines, architects my database schemas. It's written the vast majority of the codebase.
After a few months of building this way, I've noticed a pattern: the most valuable thing I do isn't writing code. It's catching where the AI gets it wrong — specifically the cases where the output looks correct but doesn't hold up once you think about it.
Here are three recent examples.
1. The Wrong Tool for the Job
My pipeline extracts tech stack requirements from job postings using regex. A role showed up on the board with no tech stack listed. The AI investigated, found the regex wasn't matching that posting's format, and proposed expanding the regex pattern.
Fair enough. But we already had LLMs classifying and extracting other fields from these same job descriptions. Why maintain a brittle regex when we could use the LLM we're already paying for?
The agent agreed and built the LLM-based extraction instead. More resilient, handles edge cases the regex never would have caught.
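The difference between the two approaches can be sketched like this. Everything here is hypothetical — `TECH_STACK_RE`, `EXTRACTION_PROMPT`, and the `call_llm` parameter are stand-ins, not the actual pipeline's code:

```python
import re

# Brittle approach: a regex keyed to one posting format.
# It matches "Tech stack: Python, Django" but silently misses any
# posting that phrases the same information differently.
TECH_STACK_RE = re.compile(r"Tech stack:\s*(?P<stack>[^\n]+)")

def extract_stack_regex(posting: str) -> list[str]:
    m = TECH_STACK_RE.search(posting)
    if not m:
        return []  # unseen format -> empty result, no error
    return [s.strip() for s in m.group("stack").split(",")]

# Resilient approach: reuse the LLM that already classifies other
# fields from the same descriptions. `call_llm` is whatever client
# function the pipeline already has for its other extractions.
EXTRACTION_PROMPT = (
    "List the technologies this job posting requires, "
    "as a comma-separated list. Posting:\n\n{posting}"
)

def extract_stack_llm(posting: str, call_llm) -> list[str]:
    reply = call_llm(EXTRACTION_PROMPT.format(posting=posting))
    return [s.strip() for s in reply.split(",") if s.strip()]
```

The regex version fails closed on every format it has never seen; the LLM version degrades gracefully because natural-language variation is exactly what it's built to absorb.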
The AI optimized within the current approach. I questioned whether the approach itself was right. That's a pattern I keep seeing — AI agents are excellent at solving the problem you give them, but they don't question whether you're solving the right problem. That's still on you.
2. Technically Correct, Actually Misleading
My pipeline extracted geographic data from a GitLab job posting — a role open in the US, Canada, France, Germany, Ireland, Netherlands, Spain, and the UK — and tagged it as multi-region with regions Americas and Europe. I asked the agent to verify. It confirmed the data was accurate — the posting listed countries across both regions.
The problem: if a user from Brazil sees "Americas", they'll assume they can apply. Someone in Hungary sees "Europe", same thing. But this job is only open in 8 specific countries.
Both tags are technically accurate and both mislead.
The agent hadn't considered this. It checked my existing data, found I already had a select-countries badge for this situation, updated the job, and then updated the LLM extraction prompt so the system would get this distinction right on future runs.
I caught this because I've been the person in a non-obvious country getting excluded from roles that say "Americas" or "Global Remote." I've had Zapier, Outliant, and others reject me on location after their postings implied I was eligible.
3. The Silent Failure
My pipeline ran on schedule. Scraped 39 jobs. Processed them. Reported: "No new entries to add." No errors, clean exit.
Zero new jobs from 39 listings didn't seem right. I pulled the raw data and asked the agent to audit its own pipeline's decisions.
It found two bugs. One was a dedup rule that matched a new job against a discontinued listing with a similar title — different posting, different job ID, valid salary data, silently dropped. The other was a salary field the pipeline never parsed, so jobs with visible salary data were dropped for "no salary transparency."
The pipeline didn't error or warn. It reported success while quietly dropping valid jobs.
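The structural fix is to make every drop carry a reason and to treat an implausible summary as an error rather than a success. A minimal sketch, with hypothetical `process_jobs` and `audit` helpers that don't mirror my actual pipeline:

```python
from collections import Counter

def process_jobs(jobs: list[dict], filters) -> tuple[list[dict], Counter]:
    """Run each job through (check, reason) filter pairs, recording
    why each drop happened instead of silently discarding it."""
    kept, drops = [], Counter()
    for job in jobs:
        reason = next((r for check, r in filters if not check(job)), None)
        if reason:
            drops[reason] += 1
        else:
            kept.append(job)
    return kept, drops

def audit(scraped: int, kept: int, drops: Counter) -> None:
    # Zero additions from dozens of scraped listings is exactly the
    # signal that hid both bugs: fail loudly instead of exiting clean.
    if scraped and not kept:
        raise RuntimeError(
            f"0 of {scraped} jobs kept; drop reasons: {dict(drops)}"
        )
```

With this in place, the run that scraped 39 jobs and added none would have exited with an error listing "no salary transparency" and the dedup rule as the culprits, instead of printing "No new entries to add."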
I didn't catch this by reading code. I caught it because the output didn't pass a gut check.
Why This Matters
Ben Shoemaker wrote a piece recently arguing that engineers should stop reading code line-by-line and invest in the "harness" — specs, tests, verification layers, trust boundaries. OpenAI calls this Harness Engineering.
Looking at these three examples through that lens, that's what I've been doing without realizing it. The AI handles production. I handle specification, trust boundaries, and the "does this actually make sense for my users?" layer.
If you're an engineer building with AI tools right now, I'd suggest paying attention to the moments where you override the AI's suggestions. Those moments aren't interruptions to your workflow — they're the most valuable part of it. That's the skill set the market is shifting toward, and it's worth documenting for yourself even if you never publish it.
I'm a Senior Software Engineer with over a decade of experience, including building internationalization systems serving 50M+ users. I write about building with AI at blog.alleyne.dev.