How to survive as a developer without AI code-gen and without autocomplete


Hey everyone! My name is Viktor, I'm a senior backend dev at an online store, working on the logistics team.

Today I'll tell you how I survive as a developer surrounded by vibe-coders, code-writing agents, and a legacy project. I'll also talk about how I use LLMs and share my experience with neural nets.

TL;DR

  • I've been using AI tools since 2019 (tabnine → cursor → newer stuff), but consciously chose to write code by hand, for self-development and a deeper understanding of the project
  • For a 13-year-old legacy project, AI code-gen hallucinates and breaks conventions. AI for review is a different story — works great
  • I built 1 orchestrator + 7 specialized subagents (logic, style, security, perf-db, ops, ai-smell, tests) running in parallel on the diff vs master
  • The AI-smell agent specifically catches vibe-coded PRs from teammates AND my own tired-eyes mistakes — both end up in the same noise
  • All of it is just .claude/agents/*.md markdown files — copy-pasteable to any project. Setup details + gotchas at the bottom

Why I don't write code with AI

Real quick: I'm not someone who just opened AI tools yesterday. Since 2019 I've been using tabnine, then moved to cursor, then looked at newer stuff. So my choice to write code by hand isn't from being clueless; it's a conscious one, made for self-development and a deeper understanding of the project.

The main problem is that the project is over 13 years old: code is written inconsistently, and functions, classes, and variables are used for things they weren't intended for, have other logic wired into them, or just sit around as dead weight tagged TODO or FIXME. Because of all this, combined with a huge codebase, we're currently ripping out the domain logic, but for now we live in this world :). On a project like this, AI models hallucinate, generate inconsistent code, and often just get it wrong.

It's also important to mention that devs in our org are tightly bound to their domain, so part of the analytical work, onboarding managers and analysts, and writing docs all require a really deep understanding of the project's code. That understanding is what lets you research tasks efficiently, handle incidents, and estimate timelines for new tasks and requirements.

What I use AI for

The main things I use when working with code are code quality tools: ruff, mypy, tests written with pytest, and now also AI for detailed review on the project. I built a set of agents, 1 orchestrator and 7 subagents, each working in its own direction: security, algorithm logic, code style, autotests, code organization and bloat, devops, and the DB. I'll briefly describe each one below.

Here's what the final report from the orchestrator looks like (anonymized example, real findings from one PR):

| ID | Sev | Aspect | File:line | Description | Fix |
|----|-----|--------|-----------|-------------|-----|
| B1 | BLOCKER | Logic | payments/insurance.py:355 | `MIN_INSURANCE_COST = 1` set in main currency, but provider works in cents → insurance is 100× lower than expected | Set to 100, comment `# in cents` |
| B2 | BLOCKER | Security | scripts/executor.py:57 | `exec()` with full `__builtins__`, only protected by `IS_PRODUCTION` flag — any prod misconfig = full RCE | RestrictedPython or subprocess + seccomp |
| B3 | BLOCKER | Migration | orders/migrations/0021_add_entity.py | `AddField(entity, FK, default=None)` without `null=True` — will crash on a non-empty table at deploy | Two-step: `null=True` + backfill in a separate migration |
| H1 | HIGH | Logic/Types | feature/config.py:130 | `list[int]` for whitelist, but `provider_item.id` is `str`, so `str in list[int]` is always False → feature is fully dead | Replace with `list[str]` |
| H2 | HIGH | Logic/Ops | feature/service.py:102 | Fail-open on empty whitelist: `all(x in [] for x in set())` is True → gate inverted, releases everything when it's supposed to release nothing | Add `if whitelist and not providers: return False` |
| H3 | HIGH | Perf-DB | feature/service.py:102 | `@cached_property` reads two OneToOne (not in `select_related`) + ORM fallback → up to 3 SQL per item, called twice per order in serializer (hot path GET /orders/) | `select_related('outlet', 'delivery_info', 'ppo_info')` |
| H4 | HIGH | Security | payments/processor.py:172 | Full payload (name, phone, email, items) logged on every request — PII leak into log storage | Mask PII or log only payment_id + amount + status |
| H5 | HIGH | AI-smell | utils/helpers.py:42 | `from utils.misc import flatten` — module/symbol does not exist in the project, hallucinated import | `list(itertools.chain.from_iterable(items))` |
| M1 | MEDIUM | Tests | feature/tests/test_x.py:260 | Magic number `expected_insurance_sum = 1` — should reference `Conf.MIN_INSURANCE_COST` so the test breaks on constant change | `expected_insurance_sum = Conf.MIN_INSURANCE_COST` |
| M2 | MEDIUM | Logging | feature/service.py:101 | New rejection branch is not logged — can't distinguish "provider not in whitelist" from "provider_item=None" during incidents | `log_info('feature.provider_not_allowed', extra={...})` |

Verdict: NEEDS WORK — 3 Blockers must be fixed before merge.

Each finding has author + sha (from git blame) and a Verify by: command, but I've trimmed those columns so the table fits. Now to the agents themselves.

Style agent

This agent looks at what a regular linter doesn't catch. Ruff already flags unused imports, line length, quotes, and PEP8 naming — that's routine, automation handles it just fine. So in the prompt for the style agent I write directly: "don't duplicate ruff". Its job is semantics. For example, function name process_data says nothing to the reader, but apply_discount does. Or a function called get_user that also updates the record inside — that's already a trap. Or a boolean flag named flag instead of is_bulk or should_retry. Or a variable without units in the name where it actually matters — price instead of price_kopecks, timeout instead of timeout_ms.
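
To make the naming point concrete, here's a small made-up before/after in the spirit of what this agent flags (the function and field names are invented, not from the project):

```python
from decimal import Decimal

# Before: the name says nothing, the flag is opaque, and the variable has no units
def process_data(order, flag):
    price = order.total                      # total in what? currency units? kopecks?
    if flag:
        price = price * Decimal("0.9")
    return price

# After: the same logic, but the signature and names explain themselves
def apply_bulk_discount(order, is_bulk: bool) -> Decimal:
    price_kopecks = order.total_kopecks      # units are explicit
    if is_bulk:
        price_kopecks = price_kopecks * Decimal("0.9")
    return price_kopecks
```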

Besides naming, the agent looks at signatures (5+ positional args, mutable default), duplicated logic, functions over 40 lines, and just inconsistency with how the rest of the project is built. And separately — systemic patterns: if the same mistake appears in 5 places, it's not 5 findings, it's one with the note "this is what's all over your diff".

Logic agent

The most important agent in the "don't deploy a bug" category. Focus — runtime correctness: types, algorithms, async, datetime/Decimal, error handling. Specific patterns that come up regularly in any project:

  • Truthy traps: value or default triggers on 0, "", [], Decimal('0.00') — but you expected it to fire only on None (see the sketch after this list)
  • Naive vs aware datetime, comparing datetime in different timezones
  • Decimal rounding in the wrong place, or mixing Decimal + float
  • Missed await, blocking requests.get inside async-view, sequential await where gather is begging
  • except Exception: pass — silent error swallowing
  • Off-by-one, inverted if, changing function semantics without migrating tests
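
A minimal sketch of the first two traps (the names are made up, the behaviour is the point):

```python
from datetime import datetime, timezone
from decimal import Decimal

def effective_discount(discount: Decimal | None) -> Decimal:
    # Truthy trap: Decimal('0.00') is falsy, so an explicit zero discount
    # silently falls back to the default, even though only None should.
    return discount or Decimal("0.05")                           # bug
    # return Decimal("0.05") if discount is None else discount   # fix

# Naive vs aware: subtracting or comparing these raises TypeError at runtime
created_at = datetime(2026, 1, 1)                # naive
now = datetime.now(timezone.utc)                 # aware
# now - created_at  -> TypeError: can't subtract offset-naive and offset-aware datetimes
```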

Each finding must contain a line, a concrete scenario, and Verify by: — i.e. a command or test you can verify it with. Without that, the agent is forbidden to write "maybe there's a bug here" — otherwise it starts hallucinating and making noise.

Security agent

Standard OWASP plus a couple of modern things. From the classics — SQL/Command/Template injections, secrets in code or logs, SSRF (external HTTP without allowlist and timeouts), path traversal on open(user_input), unsafe deserialization (pickle, yaml without SafeLoader, eval), missed @login_required/permission_classes, IDOR.

From the less obvious but very painful:

  • DB-Lock-DoS on migrations — separate category. ALTER TABLE ADD COLUMN with volatile DEFAULT (now(), gen_random_uuid()), CREATE INDEX without CONCURRENTLY, SET NOT NULL without two-step schema (NOT VALID + VALIDATE), DDL without lock_timeout. It's not a "vulnerability" in the classic sense, but takes down prod just as well as SQL injection (see the migration sketch after this list).
  • Supply-chain — typosquatting in new dependencies (reqeusts vs requests), pins on pre-release versions, downgrade of a library with a security patch.
  • LLM-specific — if the code builds prompts for a neural net, we check prompt injection (user input straight into prompt), secret leaks via logged prompts, using LLM output as SQL/shell without validation.
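
For the DB-Lock-DoS category, a minimal sketch of the safe variant as a Django migration (the app, model, and index names are invented; assumes Postgres and Django's `AddIndexConcurrently`):

```python
from django.contrib.postgres.operations import AddIndexConcurrently
from django.db import migrations, models

class Migration(migrations.Migration):
    atomic = False  # CONCURRENTLY cannot run inside a transaction

    dependencies = [("orders", "0020_previous")]

    operations = [
        # Give up quickly instead of queueing all writes behind a stuck DDL lock
        migrations.RunSQL("SET lock_timeout = '5s';", migrations.RunSQL.noop),
        AddIndexConcurrently(
            model_name="order",
            index=models.Index(fields=["status"], name="order_status_idx"),
        ),
    ]
```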

Important detail: the agent checks reachability. If a dangerous pattern sits in test code or a dev script that's not reachable from outside — it's either Low or not a finding at all. Otherwise you get a flood of false positives and people stop using it.

Performance and DB agent

Everything that shows up under load. Algorithmic complexity (O(n²) where set for lookup is begging), memory (loading a full dataset into RAM instead of streaming), async (blocking I/O in async, no timeouts and retry, unlimited gather on a thousand tasks), but the main thing — DB and ORM.

N+1 is the number one problem in legacy projects: a query in a loop, no select_related/prefetch_related, SerializerMethodField without prefetch. Then query quality: WHERE LOWER(email) = ... kills the index, LIKE '%pattern' too, JSONB filters without a GIN index, COUNT(*) on a big table (EXISTS is faster), SELECT * where .only() would do. Transactions: long transactions with external I/O inside, SELECT FOR UPDATE without SKIP LOCKED. And cache: cache keys without user/tenant (data leaks between users), no TTL, cache stampede.
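
The classic N+1 and its fix, on a hypothetical Order model with an outlet FK:

```python
from orders.models import Order  # hypothetical app and model

# N+1: one query for the orders, then one extra query per order for the FK
for order in Order.objects.filter(status="new"):
    print(order.outlet.name)      # hits the DB on every iteration

# Fix: join the related table up front; same loop, a single query
for order in Order.objects.filter(status="new").select_related("outlet"):
    print(order.outlet.name)      # already loaded, no extra queries
```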

Each finding contains Hot path: Yes/No/Unknown — because N+1 in an endpoint pulled once a month and N+1 in hot-path are different priorities. If the agent couldn't tell — it writes Unknown, doesn't guess.

Ops and compatibility agent

What will break the deploy or monitoring, or breaks backward compatibility. New required env without default — breaks pod startup. NOT NULL without DEFAULT in migration — breaks zero-downtime deploy. Dropping a column without two-step deploy — old pods crash. Changing Celery task signature without fan-out migration — old workers start failing. Changing Kafka message schema — consumers on the old schema break.
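
A tiny illustration of the first item, in a Django-style settings module (the variable name is invented):

```python
import os

# Breaks pod startup: KeyError at import time on any pod that doesn't have
# the new variable yet (e.g. old config during a rolling deploy).
FEATURE_PROVIDER_URL = os.environ["FEATURE_PROVIDER_URL"]

# Survives the deploy: a default (or a loud, readable error) lets old and new
# pods coexist; the real value can be rolled out in a follow-up change.
FEATURE_PROVIDER_URL = os.environ.get("FEATURE_PROVIDER_URL", "")
```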

Separately — observability. Important new logic without logs, wrong level (debug for critical, error for normal branches), no metrics and span on a new external call, no structured logging where the project already uses it. And feature flags — is there a way to turn the feature off without a release, is there a rollback plan.

In each finding the agent writes Deploy risk: (what and when will break — at deploy / in prod / over time) and Rollback: (is there a rollback plan). This turns the review into a checklist for the deploy conversation, instead of an abstract "maybe something will happen".

AI-smell agent

The most "socially useful" agent in 2026, when a significant part of PRs are AI-assisted. I use this review to run other devs' code through too — it might be vibe-coded partially or fully, and without this filter a lot of garbage lands in master. Plus, it catches my own tired-eyes mistakes: when you sit on a task all day, things like try/except: pass, a forgotten print, an extra helper or a commented-out block "I'll delete it later" — you just stop noticing them. The agent is a fresh look.

Its job is to bring the code back to the minimum-needed solution. What it catches:

  • Hallucinations — import of a non-existent module/function, calling a method with a non-existent name, reference to a missing attribute, settings.XXXXX for a key not in config. Verified via grep on the project.
  • Over-engineering — class Helper/Manager/Factory for a one-function task, ABC/Protocol with a single implementation, generic with one type-param, decorator for one-off use, DI where an import is enough.
  • Defensive noise — if x is not None right after x = get_x() returning X, try/except: pass for no reason, isinstance on internal data, validating already-validated-by-serializer inputs.
  • Noisy docstrings — literal repetition of the signature, multiline docstrings on a one-line function, # increment counter before counter += 1.
  • Dead code from refactors — private function with no callers, commented-out blocks "just in case", a function param no longer used inside, an if/else with the same code in both branches.
  • Duplication — new helper that already exists (found via grep), reinvented flatten/chunked/lru_cache from stdlib.
  • AI-tells — overly "schoolbook" naming (calculate_total_price_with_discount when the project uses calc_total), emoji in logs where there's none anywhere else, "Successfully completed..." logs, overly polite error messages.

Main rule of the agent: "would removing this make the code worse? If no — it's noise."
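
A condensed, invented before/after of the kind of diff this agent trims:

```python
# Before: noisy docstring, defensive check nobody needs, reinvented stdlib helper
def flatten(items):
    """Flatten items."""                  # repeats the signature, says nothing
    result = []
    for sub in items:
        if sub is not None:               # callers never pass None here
            for x in sub:
                result.append(x)
    return result

# After: the minimum-needed solution
import itertools

def flatten(items):
    return list(itertools.chain.from_iterable(items))
```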

Test agent

Optional — only works if docker-stack is up. Never runs the full suite (that's hours), instead does three things: finds tests relevant to the diff, runs them in parallel, calculates differential coverage on changed lines.

Relevance is calculated four ways at once: by directory structure (app/module.pyapp/tests/test_module.py), by imports (grep for from .module import), by function/class name in tests (grep for the symbol), and the test files in the diff itself. Then — dedup, limit to 20 files, and pytest --lf -n auto -x (last-failed first, parallel by cores, stop on first failure).
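
A rough sketch of the first three relevance signals for a single changed file (the helper name and grep patterns are mine; a real repo layout will differ):

```python
import subprocess
from pathlib import Path

def candidate_tests(changed_file: str, symbols: list[str]) -> set[str]:
    found: set[str] = set()
    p = Path(changed_file)

    # 1) directory structure: app/module.py -> app/tests/test_module.py
    mirror = p.parent / "tests" / f"test_{p.name}"
    if mirror.exists():
        found.add(str(mirror))

    # 2) imports of the changed module and 3) its changed symbols by name
    for pattern in [f"from .{p.stem} import", *symbols]:
        out = subprocess.run(
            ["grep", "-rl", "--include=test_*.py", pattern, "."],
            capture_output=True, text=True,
        ).stdout
        found.update(out.split())

    # the fourth signal (test files already present in the diff) is a filename check
    return found
```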

Differential coverage — the most useful part. We take coverage json, compare executed_lines against changed lines from diff, calculate the percentage. If less than 50% — High. This is way more honest than overall project coverage: it might be 80%, but the actual new code has zero tests.
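
A sketch of how that number can be computed from `coverage json` output and the saved diff (function names are mine; the 50% threshold is the one described above):

```python
import json
import re

def added_lines(diff_text: str) -> dict[str, set[int]]:
    """Map file -> set of added line numbers, parsed from a unified diff."""
    changed: dict[str, set[int]] = {}
    current, lineno = None, 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            current = line[6:] if line.startswith("+++ b/") else None
        elif line.startswith("@@"):
            lineno = int(re.search(r"\+(\d+)", line).group(1)) - 1
        elif current and line.startswith("+"):
            lineno += 1
            changed.setdefault(current, set()).add(lineno)
        elif current and not line.startswith("-"):
            lineno += 1          # context line advances the new-file counter
    return changed

def diff_coverage(coverage_json_path: str, diff_path: str) -> float:
    cov = json.load(open(coverage_json_path))
    diff = added_lines(open(diff_path).read())
    covered = total = 0
    for path, lines in diff.items():
        executed = set(cov["files"].get(path, {}).get("executed_lines", []))
        total += len(lines)
        covered += len(lines & executed)
    return 100.0 * covered / total if total else 100.0

# < 50% coverage on the changed lines -> the test agent reports High
```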

If docker isn't available — status SKIPPED, the agent still returns a list of tests for manual run. It's not an error, it's a normal mode.

Orchestrator

The conductor that ties everything together. There's an architectural limit of the Claude Code SDK here: a subagent can't spawn other subagents. So the orchestrator isn't a separate subagent, it's a playbook the main agent reads in its main context and follows itself.

Five steps:

  1. Diff collection — git diff --merge-base origin/master HEAD, stats, list of files, commits. Saved to /tmp.
  2. Pre-analysis — deterministic checks before subagents: ruff with auto-fix, search for migrations and dependencies in diff, regex-scan for hardcoded secrets. This reduces LLM load and hallucinations — concrete bugs are caught by regex, agents handle what regex can't catch.
  3. Blame — for each added line we run git blame -L, build file:line → author + sha. This block then goes into the final table — next to each finding you see whose code it is. Very useful for the PR-review conversation.
  4. Parallel dispatch — main agent in one message launches all 6 (or 7 with tests) subagents through the Agent tool. This is critical — only this way they run in parallel, not sequentially.
  5. Report synthesis — parse agents' responses, dedupe by key (file, line, category) (same file+lines in multiple agents = one row in the table, "Aspect" column has all agents through slash), prioritize Blocker → High → Medium → Low, format the final markdown table with verdict READY TO MERGE / NEEDS ATTENTION / NEEDS WORK.
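
A sketch of the dedup-and-ordering part of step 5, with a simplified finding structure (the dict fields are mine, not the actual agent output format):

```python
SEVERITY_ORDER = {"Blocker": 0, "High": 1, "Medium": 2, "Low": 3}

def merge_findings(findings: list[dict]) -> list[dict]:
    """Dedupe by (file, line, category); duplicates from several agents become one row."""
    merged: dict[tuple, dict] = {}
    for f in findings:
        key = (f["file"], f["line"], f["category"])
        if key in merged:
            merged[key]["aspect"] += "/" + f["aspect"]   # Aspect column joined with a slash
            if SEVERITY_ORDER[f["severity"]] < SEVERITY_ORDER[merged[key]["severity"]]:
                merged[key]["severity"] = f["severity"]  # keep the scarier severity
        else:
            merged[key] = dict(f)
    # Blocker -> High -> Medium -> Low, as in the final table
    return sorted(merged.values(), key=lambda f: SEVERITY_ORDER[f["severity"]])
```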

Implementation gotchas I tripped on:

  • Artifacts go into the prompt as text, not as a /tmp path. Subagents kept hanging on cat /tmp/review_diff.patch — turns out, in some runtimes their /tmp and the main agent's /tmp are different. Solution: paste the diff straight into the prompt between === DIFF === ... === END DIFF ===. ~20 KB limit per artifact (a small sketch of this follows after the list).
  • Short-circuit for trivial diffs — under 20 lines, only .md/docs, no .py/.sql/migrations → skip the parallel run, return minimal report. Saves tokens.
  • Relative paths only (server/path/file.py:LINE, not /Users/...) — IDE links don't click otherwise, and sharing with colleagues breaks.
  • Confidence per finding (High/Med/Low). Low = agent isn't sure, check by hand. More honest than making it stay silent on suspicions.
  • Verify by: per finding — a concrete command. Stops the agent from writing abstract "maybe there's a problem here".
  • Dedup by (file, line_range, category) — otherwise one bug gets reported 3× by three agents and the review becomes unreadable.
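
The embedding workaround from the first point is literally string concatenation, something like this (the marker text and the ~20 KB limit are the ones mentioned above):

```python
MAX_ARTIFACT_BYTES = 20 * 1024   # keep each embedded artifact around 20 KB

def embed_diff(subagent_prompt: str, diff_text: str) -> str:
    """Paste the diff into the subagent prompt instead of passing a /tmp path."""
    return (
        f"{subagent_prompt}\n\n"
        f"=== DIFF ===\n{diff_text[:MAX_ARTIFACT_BYTES]}\n=== END DIFF ==="
    )
```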

How to integrate agents at your place

All agents live in .claude/agents/*.md, skills — in .claude/commands/*.md. These are just markdown files with frontmatter (name, description, model, tools) and instructions in the body. Claude Code picks them up automatically. Global install — same thing but in ~/.claude/agents/ and ~/.claude/commands/.

Minimal subagent template:

```markdown
---
name: my-reviewer
description: Brief description for the main agent — when to call
model: sonnet
tools: [Read, Grep, Glob, Bash]
---

# Agent prompt

What to do, what to check, response format.
```

Practical tips from getting burned:

  • Tool names — PascalCase (Read, Bash, Grep, Glob, Agent). read_file/bash doesn't work.
  • Model IDs — short: sonnet / opus / haiku / inherit. For the orchestrator inherit is best — it runs on the model of the current session.
  • Don't give subagents the Agent tool — they can't spawn other subagents, and if they try, the runtime crashes with an unclear error.
  • Strict output format — each agent returns a structured block (=== AGENT: name === ... === END: name ===) with FINDINGS containing ID, Severity, Confidence, Location, Fix, Verify by. Free-form output → synthesis becomes mush.
  • Severity ≠ Confidence — Severity is how scary (Blocker/High/Med/Low), Confidence is how sure the agent is. Two axes, not one.
  • Prompts in your team's language, terms in English — if your team isn't English-speaking, agents read more naturally this way.
  • Pre-analysis is deterministic — linter, regex, blame run before the LLM. Cheaper for the agent to check pre-filtered material than to search from scratch.

Skills (slash-commands) are even simpler: a markdown file describing what to do with the user input from $ARGUMENTS. Useful home-grown skills I've made:

  • /diff — branch changes summary vs master (files, commits, affected functions, migrations). Handy before a PR.
  • /lint — wrapper over make format (ruff + auto-fix).
  • /find-usage <name> — search all usages of a symbol (definition → imports → calls → tests → templates).
  • /debug <traceback> — traceback analysis: last frame, reading source at the right line, fix suggestion.
  • /logs [search] [--lines=N] [--level=ERROR] — viewing structured logs inside the docker container.
  • /orm <expr|sql> — running Django ORM or raw SQL inside docker (with auto-LIMIT, confirmation request on DELETE/UPDATE).
  • /new-test <path> [name] — generating a pytest-file by project conventions with a run via docker.
  • /make <target> [args] — proxy to Makefile with categorized reference.

The main thing about skills — they should be thin. Not "universal devops agent", but "wrapper over one command with a couple of if-else's". The less freedom a skill has, the more predictable the result.

Bonus step

Besides the code review agents, you can also ask the bot to highlight a developer's weak spots at the level of language understanding and beyond: code can pass every check and still not be ideal (there's no such thing as ideal code). The bot can also suggest what to improve, track the dynamics of your growth, and give advice.

How it works

Step by step:

  1. Write code in a branch
  2. Run the agents on the diff between the feature branch and the main branch
  3. Look at the review, study, fix or leave as is
  4. Get fewer bugs and better code, without losing the developer's project context

Wrapping up

There's already a problem: code-writing by agents is pretty expensive, agents can't work with legacy code, and the project's devs and analysts are important for its growth. New approaches are starting to appear that could solve these problems, but we're still needed in this world, and we should work as efficiently as possible to stay afloat longer and grow faster than the new AI models :)


Main thing to remember — Bender is our brother, and the one who doesn't develop, degrades!
