DEV Community: Raffaele Pizzari

The Engineering Manager Is the Most Informed Person in the AI Room

Raffaele Pizzari — Sun, 31 May 2026 21:17:47 +0000

Engineering managers are almost entirely absent from the AI transformation discourse. There's a structural reason for that, and understanding it is the first step to doing something about it.

Engineers write on the internet. C-suite decisions make headlines. Engineering managers absorb pressure from above, complexity from below, and produce outcomes that get credited in both directions. The system doesn't reward the EM voice publicly. But the EM position gives you something that's genuinely hard to replicate: accountability for what happens to the team, combined with proximity to all three layers of the problem at once.

That's not a consolation prize. It's a specific kind of leverage, if you decide to use it deliberately.

You're accountable for what nobody else fully sees

Writers go where the audience is or where the authority sits. EMs are neither, which is why the playbooks keep missing them. Executives get advice that assumes frictionless implementation. Engineers get advice that assumes organizational stability. At the team level, neither holds.

The EM isn't the only person with this view. A good Staff or Principal Engineer often has comparable exposure: technical depth, some business context, real influence on architecture decisions. In many organizations, the senior IC has more technical credibility than the EM and less organizational noise to cut through.

The difference isn't the view. It's the accountability. When something goes wrong at the team level (delivery slips, quality degrades, an engineer burns out, AI adoption produces incidents instead of velocity), the EM is the one who carries it. That asymmetry is uncomfortable. It's also what makes the EM's perspective structurally different from everyone else's. You don't just see the intersection where the playbooks break down. You're responsible for what happens there.

The question isn't whether that position is valuable. It is. The question is whether you're using it actively or just absorbing it quietly.

The numbers give you something to work with, if you translate them

49% of tech employers expect to use AI to reduce headcount by 2029 ¹. 87% of developers don't view AI as an immediate threat to their employment ¹.

Leadership and the team are operating in different realities. That gap lands on you. But before we get to what to do with it, there's a harder truth worth naming: if the executive mandate really is to cut headcount, no engineering metric will stop it. The EM who walks into that conversation armed with first-pass acceptance rates and post-merge rework data will be read, correctly, as a middle manager defending their headcount.

So the first move isn't metrics. It's understanding what conversation you're actually in.

If the goal is genuinely better AI adoption, not just faster adoption, then the data is useful. What the productivity numbers leave out is that only a third of developers report significant gains from AI investments ². The actual bottlenecks (knowledge fragmentation, legacy systems, undocumented architecture, the particular flavor of technical debt in every mature codebase) don't respond to code generation. Research shows a 25% increase in AI adoption is associated with a 1.5% decrease in delivery throughput and a 7.2% decrease in system stability ³, largely because faster generation produces pull requests that are larger, harder to review, and more likely to break things downstream.

That data is useful in the right framing. Not "we need to slow down AI adoption" (that's a losing argument). But: "here's what adoption without governance is costing us in incident rate and rework, and here's what it would cost to fix it." Incidents have a price. Rework has a price. Attrition (which follows when engineers feel responsible for code they didn't fully understand) has a price. Those are numbers a C-suite can act on.

Martin Fowler's work on AI-assisted development points at a useful metric shift: instead of measuring time to first output (what leadership tends to track), measure first-pass acceptance rate, iteration cycles per task, and post-merge rework ⁴. The value of those numbers isn't in presenting them as engineering KPIs. It's in translating them: "every 10% increase in post-merge rework costs us roughly X engineer-weeks per quarter." That's a margin conversation, not a methodology conversation.

Make the invisible track visible, in business terms

The Harness State of Software Delivery 2025 documents a pattern that's become structural ⁵. Developers who use AI to rapidly ship features get noticed by non-technical leadership. The same prompting loop that generates fragile code can generate fast hotfixes for the incidents that code creates. From far enough away, that looks like responsiveness.

The engineer who spent three weeks preventing five future incidents has nothing to show for it.

This isn't a new problem. Engineering organizations have always struggled to see the person who quietly keeps things from breaking. AI has made the visible track faster and cheaper, which widens the gap. The EM sees both tracks. The intervention isn't explaining this to leadership in engineering terms: it's translating it into risk.

The practical move: attach a cost to the invisible work. Not in sprint points or engineering hours, but in incidents avoided, in on-call load, in customer impact. "This refactor eliminated the class of failures that caused three P1s last year" is a sentence leadership can evaluate. It's not a guarantee they'll fund the next one. But it's the only version of the argument that has a chance. Prevention work that gets described in engineering terms stays invisible. Prevention work described in business terms at least enters the conversation.

Help your engineers through the identity shift, without going backwards

A 2025 PNAS study: developers who use AI receive competence ratings 9% lower for identical output, with the penalty worse for women ⁶.

The stigma shapes behavior before it shows up in surveys. Engineers use the tools privately, skip mentioning it in reviews, don't raise it in 1:1s. A longitudinal study of GitHub developers found that AI adoption reduced peer collaboration by nearly 80% ⁷, not because people stopped caring, but because the stigma quietly restructured how work gets done.

The practical consequence: knowledge stops moving the way it used to. For senior engineers, working alone with AI can feel like focus. For junior developers, it's the disappearance of the informal learning channel: the questions in the flow of work, the review comments that actually taught something, the proximity where judgment develops without anyone formally teaching it.

The temptation is to reconstruct 2019. Structured sessions with AI turned off, pair programming as it used to be. It's understandable and it won't work. You can't un-generate the tools, and pretending you can produces resentment, not learning.

The real problem isn't that junior developers are writing less code. It's that they're not developing the judgment to evaluate the code that gets generated. That's a different skill, and it requires a different training environment. Not blackboards, but deliberate exercises that assume generation and stress-test evaluation: why is this design wrong?, what does this code not handle?, where would this break at scale? Threat modeling, architecture review, debugging sessions that start from broken AI output. The goal isn't less AI. It's building the judgment that makes AI useful instead of dangerous.

Make AI use discussable in your team. Run a session where people share how they actually use it, what works, what doesn't. The stigma dissolves when the conversation is normalized. And when it's normalized, you get the signal early when something isn't working, before it surfaces as a performance issue or a production incident.

Own the open questions, and admit what you don't know

The discourse treats EMs as stable context. We're not.

How do you assess a PR when significant portions weren't written by a human? How do you calibrate expectations when the productivity baseline is moving? What does it mean to evaluate engineering judgment when the judgment being exercised is mostly about directing and validating an agent? How do you interview for technical competence when the candidate can solve any algorithm in real time with the right prompt?

These questions don't have industry consensus yet. But here's the honest version: they don't have good answers yet, and that includes the EMs who are closest to them. We're all improvising. The teams that will come out of this well won't be the ones who found the right framework first. They'll be the ones whose EMs were honest about the uncertainty, made their working assumptions explicit, and updated them as they learned, instead of projecting confidence they didn't have.

Add the budget question to the list. Individual developer AI usage can spike to thousands of dollars per month in agentic workflows without governance ⁸. Tokenmaxxing (high-volume consumption driven by visibility rather than output) became widespread enough that Meta built public leaderboards, awarded "Session Immortal" titles to top spenders, and abolished the system after it contributed to production incidents. A simple policy built now (approved tools, token expectations, review standards for AI-generated code) is infinitely better than no policy. You don't need a perfect framework. You need a starting point your team can reason against.

On governance more broadly: Fowler describes AI assistants as "junior developers with infinite energy but zero context." ⁴ The frustration loop most teams hit (generate code, review it, find it doesn't fit the codebase, regenerate with corrections, repeat) happens because the AI defaults to generic internet patterns instead of your team's standards. The fix is encoding team knowledge explicitly: architecture decisions documented, conventions written down, review standards articulated.

That's real and valuable. It's also a lot of work that competes directly with performance cycles, hiring, alignment meetings, and the production incidents that AI is simultaneously generating more of. There's no clean answer here. The realistic version isn't "map everything." It's "pick one standard, document it, use it as AI context, and treat every documented convention as a compounding investment." It never feels urgent enough. It always matters.

The job nobody prepared you for

This is a harder version of engineering management than the one most people signed up for. Closing the gap between what leadership believes about AI productivity and what's actually happening at the team level, holding space for engineers going through a real identity shift, building governance structures that didn't exist before, navigating your own transition at the same time, and doing all of it without a playbook, because the playbook doesn't exist yet.

That's not a complaint. It's a more interesting job than the previous version. And it's worth being honest about what it actually is, instead of either pretending it's manageable or waiting for someone to acknowledge how hard it is.

Nobody will. That's not how this works.

Invisible work doesn't get resourced unless you name it in terms that move decisions. The view from the middle of this transition is the most complete one in the organization. Use it. And be honest about what you don't know yet.

Originally published on pixari.dev

"Why developers and their bosses disagree over generative AI", LeadDev. https://leaddev.com/ai/why-developers-and-their-bosses-disagree-over-generative-ai ↩
Atlassian State of Developer Experience 2025. https://www.atlassian.com/teams/software-development/state-of-developer-experience-2025 ↩
Google Cloud / DORA, "Announcing the 2024 DORA Report." https://cloud.google.com/blog/products/devops-sre/announcing-the-2024-dora-report. 2025 data shows throughput improved with stronger practices; stability remains negatively correlated. https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report ↩
Martin Fowler, "Patterns for Reducing Friction in AI-Assisted Development." https://martinfowler.com/articles/reduce-friction-ai/ ↩
Harness State of Software Delivery 2025. https://www.harness.io/state-of-software-delivery ↩
Reif, Larrick & Soll, "Evidence of a social evaluation penalty for using AI," PNAS, May 2025. https://www.pnas.org/doi/10.1073/pnas.2426766122 ↩
"The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot," arXiv:2410.02091. https://arxiv.org/pdf/2410.02091 ↩
Enterprise AI spending patterns, industry sources (2025/2026). Meta leaderboard detail cited in secondary research; primary source not independently verified. ↩

AI-Assisted Product Engineering: Orchestrating Claude Code Across the Software Development Lifecycle

Raffaele Pizzari — Mon, 04 May 2026 23:39:10 +0000

Most LLM coding tools live inside a single editor session. They suggest, complete, and refactor inside one file at a time. That is useful, but it is not where real product engineering happens.

Real engineering spans ticket breakdown, cross-repository implementation, code review, merge request management, and the knowledge that has to survive between sessions. None of that fits in one tool window.

I built a system that orchestrates Claude Code across that full lifecycle. It has been running daily for months. This post describes how it works, why it is structured the way it is, and what I have learned from the parts that broke.

The core thesis is one sentence.

The right unit of agent invocation is the judgment step, not the workflow.

Mechanical steps, the API calls, the test runs, the git operations, do not need an LLM. They need deterministic code. The agent should be invoked only when something genuinely requires judgment: writing the code, evaluating a review finding, choosing between two architectural options. Conflating these two categories is the most expensive mistake I see in agent systems.

The architecture

A terminological note before going further. Claude Code is not a raw API. It is an agent runtime: an LLM with tool use (file reads, shell commands), file system access, and a multi-turn loop. When the orchestrator "hands off to Claude Code", it is not a single API call. It is transferring control to an autonomous process that may read dozens of files, run commands, and iterate before returning. I will use "the agent" or "Claude Code" for what the system invokes, and "LLM" only when discussing the underlying model's behavior.

Three principles guide the design.

Python orchestrates, the agent reasons. Every workflow is split into phases. Phases that involve API calls, file operations, test execution, or data transformation are deterministic Python scripts. Claude Code is invoked only when the task requires judgment. This separation reduces token consumption, improves latency (mechanical phases complete in under two seconds), and makes the system auditable.

Propose, do not execute. The system never performs irreversible external actions (merging code, closing tickets, sending messages) without explicit human approval. It creates structured proposals that surface in a dashboard for review. This makes the system safe to leave running unattended.

Compound knowledge, do not re-derive it. Engineering context (architectural decisions, team ownership, ticket history) is captured in a persistent wiki and an operational database. Each session starts with this accumulated context rather than re-deriving it from scratch.

The six layers

┌─────────────────────────────────────────────────────────┐
│  1. User          CLI + Dashboard                       │
├─────────────────────────────────────────────────────────┤
│  2. Skill         Command → orchestrator routing        │
├─────────────────────────────────────────────────────────┤
│  3. Orchestrator  Python, phased, JSON I/O              │
├─────────────────────────────────────────────────────────┤
│  4. Agent         Claude Code + specialized subagents   │
├─────────────────────────────────────────────────────────┤
│  5. Data          SQLite + Markdown wiki + ChromaDB     │
├─────────────────────────────────────────────────────────┤
│  6. External      Jira, GitLab, Confluence, K8s         │
└─────────────────────────────────────────────────────────┘

Layers 1 to 3 are deterministic. Layer 4 is where Claude Code operates. Layers 5 and 6 are stateful backends. The skill layer maps user commands to orchestrators via a YAML manifest, so the system's capabilities are explicit. Specialized agents (code review, knowledge synthesis, planning) run in isolated context windows with explicitly scoped tool permissions. The code review agent, for instance, cannot edit files.

When a skill needs an orchestrator

Not every skill needs the full structure. The deciding factor is side effects.

Orchestrated skills have multi-step workflows with external side effects: ticket implementation, MR creation, CI analysis, code review remediation. They need deterministic coordination (create branch, run tests, push code) interleaved with agent judgment.

Agent-native skills are single-turn reasoning tasks: debugging a service issue, classifying an unknown input, generating a standup summary. The agent reads context and produces an output. There is nothing mechanical worth extracting.

If a skill creates branches, runs tests, calls external APIs, or modifies shared state, it gets an orchestrator. If it only reads and reasons, the agent handles it directly. Adding an orchestrator has a real cost: more code to maintain, more failure modes, more surface area to test. It is justified only when the mechanical steps are complex enough that the agent would be unreliable executing them.

A ticket from start to finish

To make this concrete, here is the lifecycle of a single ticket implementation.

                    ┌──────────────────────┐
                    │    User: /ticket      │
                    │    <ticket-id>        │
                    └──────────┬───────────┘
                               │
              ┌────────────────▼────────────────┐
              │  Phase 1: Context Assembly       │
              │  (Python orchestrator)           │
              │                                  │
              │  • Fetch Jira ticket             │
              │  • Search wiki for decisions     │
              │  • Create worktree + branch      │
              │  • Extract implementation brief  │
              │  • Return JSON bundle            │
              └────────────────┬────────────────┘
                               │
              ┌────────────────▼────────────────┐
              │  Phase 2: Implementation         │
              │  (Claude Code)                   │
              │                                  │
              │  • Read brief + standards        │
              │  • Write / modify code           │
              └────────────────┬────────────────┘
                               │
              ┌────────────────▼────────────────┐
              │  Phase 3: Validation             │
              │  (Orchestrator + Review Agent)   │
              │                                  │
              │  • Run tests, lint, format       │
              │  • If fail → back to agent (3x)  │
              │  • Dispatch code review agent    │
              │  • If blockers → back to agent   │
              └────────────────┬────────────────┘
                               │
              ┌────────────────▼────────────────┐
              │  Phase 4: Proposal + Ship        │
              │  (Orchestrator → Human → Orch.)  │
              │                                  │
              │  • Create exchange proposal      │
              │  • ── HUMAN DECISION POINT ──    │
              │  • On approve: push + create MR  │
              │  • Log to activity trail         │
              └────────────────────────────────┘

Claude Code is invoked only in Phase 2 and during fix iterations in Phase 3. Everything else is deterministic Python.

Before and after

The first version of the system did not look like this. The agent orchestrated everything. It read 150 to 200 line configuration files, made API calls through tool use, managed git operations, and tracked its own progress.

That version had three problems.

Latency. A complete ticket workflow took several minutes, dominated by the agent parsing configuration and deciding which API call to make next.

Token consumption. The agent's context window filled with mechanical details (API responses, git output, test logs) that displaced the actual implementation context.

Brittleness. The agent would skip steps, hallucinate API parameters, or lose track of which phase it was in. These failures were non-deterministic and hard to reproduce.

After moving mechanical steps to Python orchestrators, Claude Code receives a 30 to 50 line context brief instead of navigating 200 lines of configuration. Workflow latency dropped by roughly an order of magnitude. Mechanical phases now complete in under two seconds. Failures produce deterministic error messages instead of vague agent confusion. Token consumption dropped substantially, because the agent no longer processes responses it only needs to pass through.

A second-order benefit is testability. Python orchestrators can be unit-tested with mock data, so I can verify the mechanical pipeline independently of the agent. That is not possible when the agent is the orchestrator.

Separation pays off immediately. It is the single most impactful design decision in the system.

Memory and observability

A system that acts on your behalf needs two things: memory, so it does not re-derive context every session, and transparency, so you can trust what it is doing. These are deeply intertwined.

The semantic wiki

Long-term memory is a collection of Markdown pages organized by category (features, tickets, teams, decisions, architectural concepts). Each page follows a structured template with metadata, cross-references, confidence tiers, and a changelog.

A specialized knowledge agent creates and maintains the pages, synthesizing information from Jira, Confluence, GitLab, and prior conversations. The wiki distinguishes between three kinds of facts:

Verified facts: directly cited from an authoritative source with a reference ID.
Inferred facts: synthesized from patterns across multiple sources.
Human-provided facts: explicitly stated by a user in an exchange response.

This provenance tracking matters more than I expected. It prevents the most common failure mode of LLM-driven knowledge bases: the model fabricates context, the system stores it, and a week later that fabrication is being cited as truth.

Wiki pages have field-level staleness thresholds. Team ownership becomes stale after 14 days. Architectural decisions remain fresh for 90 days. Ticket status is never cached, because it changes too often. When a stale page is queried, the knowledge agent silently re-ingests it before using it.

After sustained use, the wiki has become one of the most valuable parts of the system. It contains synthesized knowledge about ownership, decisions, and cross-repository dependencies that would take hours to reconstruct from scratch. The confidence tiers are essential. Without them, agents treat inferred knowledge as if it were verified, and you compound hallucinations into authoritative-looking documentation.

The operational database

Short-term state lives in SQLite and tracks four things:

Work items: tickets, MRs, and plans with current status, CI state, and cross-repo dependencies.
Exchange items: structured proposals from agents to humans (more on these below).
To-do items: a prioritized task queue with urgency levels and ownership.
Activity log: an append-only audit trail of every external action.

This database is the substrate for the dashboard and the heartbeat process.

The dashboard

A lightweight web dashboard, a single-file application with no external dependencies, gives real-time visibility into active work, pending proposals, the to-do queue, recent activity, knowledge health (stale pages, open questions, broken cross-references), and a heartbeat indicator.

The dashboard is also the primary approval surface for exchange items, with controls for approve, defer, and dismiss. It refreshes every five seconds.

The heartbeat indicator turned out to be unexpectedly important. Knowing that the background process is alive and polling gives me confidence that the system is aware of its environment. A stale heartbeat is an immediate signal that something needs attention.

Activity logging

Every external write is logged on success. The log captures the action type, the affected resource, the associated ticket, the skill that triggered it, and the target repository. Reads and internal state changes are not logged, which keeps the trail focused on externally visible effects.

The activity log powers the dashboard's feed, generates standup reports ("what did the system do yesterday?"), prevents duplicate work (the heartbeat checks the log before re-proposing), and gives me a forensic trail when I need to debug something unexpected.

Human-in-the-loop controls

Hard limits

Some operations are never performed without explicit human confirmation:

Merging an MR (the system creates MRs, humans merge them).
Transitioning a ticket to "Done".
Deleting branches, files, or database records.
Creating tickets in protected project spaces.
Force-pushing to protected branches.
Running database schema migrations.
Sending messages to external communication channels.

These are enforced at the agent level through explicit constraints. The code review agent, for example, has a deny-list of tools (Edit, Write) so it cannot modify code. Review and implementation are separate by construction.

Labeling

All agent-created artifacts carry explicit labels:

Tickets created by agents include an ai-generated label.
MRs created by agents include an ai-automated label.
Commit messages follow a conventional format that includes the originating ticket key.

This means human team members can identify agent-produced work at a glance during review, triage, and audit. No guessing.

The exchange protocol

The governance model rests on the exchange protocol: a structured format for agent-to-human communication that replaces ad-hoc permission checks with explicit proposals.

Each exchange item has an intent (approval, decision, question, blocker, or flag), an urgency level, a Markdown body with relevant links, and a human answer field. There is no informational intent. Every exchange item requires human action. If the system cannot ask for something, it should not be telling you about it.

Items move from open to answered (when the human responds) to done (when execution completes). If execution fails, the system retries up to three times, preserving the original approval. After three failures it escalates by creating a new blocker item. Users can defer proposals for 24 hours; deferred items re-surface when the deferral expires.

I tried building a permission model first. Define what the system can do autonomously, define what needs approval. It was fragile. The risk of an action depends on context. Pushing to a feature branch is routine. Pushing to main is dangerous. Same operation, different risk.

The proposal-approval model sidesteps this entirely. The system proposes everything and executes nothing without approval, with a small list of hard-coded exceptions (like creating a to-do for CI failure triage). Simpler, easier to reason about, more trustworthy.

It also solves the asynchrony problem. Proposals created during a heartbeat cycle, when no user session is active, are queued and presented at the next session start. Every decision has a timestamp, a human answer, and an execution outcome. The whole system is auditable.

Pre-commit safety

A hook system intercepts operations before execution. Before any commit, the system runs the linter and formatter. Commits that would introduce lint violations are blocked. This prevents the agent from introducing code quality regressions even when its generated code is syntactically correct.

Proactive behavior

The heartbeat

A background process runs every five minutes, independent of any active user session. It polls external systems for state changes and creates exchange items when it detects actionable events:

A blocked ticket becomes unblocked: propose starting implementation.
An MR receives a review comment: propose investigating.
A CI pipeline fails: create a to-do for triage.
An MR has been awaiting review for more than 24 hours: flag staleness.

The heartbeat is deliberately conservative. It proposes but never executes. Its job is to keep the system aware of the engineering environment even when nobody is actively working with it.

Session initialization

Every new session begins with a checklist:

Verify the dashboard and heartbeat are running.
Fetch backlog items created since the last session.
Scan open exchange items by urgency.
List pending to-do items.
Present a concise summary before accepting input.

Continuity matters. The system picks up where it left off, instead of starting from zero every morning.

Where it breaks

Plenty.

Hallucinated references

The LLM can hallucinate a ticket key, a file path, an API endpoint. The orchestrator validates external references before acting on them. Ticket keys are checked against Jira, branch names against git, file paths against the file system. When validation fails, the orchestrator returns a structured error rather than propagating the hallucination.

Stale knowledge acting as truth

Despite staleness management, a window exists between a real-world change and the next re-ingestion. I mitigate this by never caching fast-changing data and by marking inferred knowledge with lower confidence. Agents are instructed to treat inferred facts as context, not constraint. This is not a perfect defense. It is defense in depth.

Proposal flooding

During active development, the heartbeat can generate a high volume of exchange items. Review fatigue follows: I start approving things without reading them carefully. Urgency levels and 24-hour deferral reduce the volume, but the underlying tension between proactivity and cognitive load is real and unsolved.

Scope creep

Given an implementation brief, the agent will sometimes implement more than requested. Error handling for impossible cases. Refactoring adjacent code. Abstractions for hypothetical future requirements. I mitigate this with explicit coding standards ("don't add features beyond what was asked") and the code review agent flags scope creep as a blocker. It is still one of the most common failure modes. Constant calibration.

Operational reality

Cost

The whole system runs on a single laptop. No GPU, no dedicated server, no cloud infrastructure beyond the team tools that already exist (Jira, GitLab, Confluence). The only operational cost is Claude Code's API usage, which scales with the number of tickets processed.

Mechanical phases consume zero API tokens. The knowledge agent and heartbeat consume modest amounts during re-ingestion and polling. The bulk of consumption comes from implementation and code review, which are exactly the steps where agent reasoning is genuinely needed.

The SQLite database, the Markdown wiki, and the ChromaDB vector store all run locally. The dashboard is a single-file Node.js app. This minimalism was deliberate. Every external dependency is a maintenance burden and a failure mode.

Maintainability

Three things need ongoing maintenance.

Orchestrators. When external APIs change (a new Jira field, a GitLab API deprecation), the affected orchestrator needs updating. Plain Python with structured JSON I/O makes this straightforward to test and deploy. A few hours per month.

Standards. The coding standards file is a living document. When I notice a new failure mode (the agent over-engineers, a test pattern is fragile), I update the standards. This is not different from maintaining a team style guide, except that the primary consumer is an LLM. The standards evolve through the same proposal mechanism as everything else: the code review agent flags a recurring pattern, and it becomes a candidate for a new standard.

Wiki schema. As the engineering environment evolves, the wiki's category structure and staleness thresholds need adjustment. The schema is a single YAML file, so changes are low-risk.

What does not need maintenance: the exchange protocol, the dashboard, and the activity log. Stable across months of use. The layered architecture pays off here. Stable components (governance, observability) are decoupled from evolving ones (orchestrators, standards, wiki schema).

What breaks if you stop maintaining it

If the orchestrators fall behind external API changes, mechanical phases start failing with deterministic errors. The system degrades gracefully. The agent can still reason, but the automated context assembly stops working and you have to provide context manually. Annoying, not catastrophic.

If the standards stop evolving, the code review agent keeps enforcing stale rules. They drift from what the team actually wants. The system still works, but its output becomes increasingly misaligned with reality. Subtler failure.

If the wiki stops being maintained, it becomes unreliable. Staleness thresholds mitigate this, but if the underlying sources change in ways the schema does not anticipate, the wiki compounds outdated information. This is the most dangerous failure mode, because it is silent.

Lessons

The silent overwrite

Early in deployment, the system implemented a ticket that required modifying a shared utility function. Claude Code correctly identified the function and modified it to satisfy the new ticket's acceptance criteria. In doing so, it broke three other features that depended on the function's original behavior. The test suite caught one regression. The other two had no test coverage.

The root cause was not the agent's code quality. The modification was locally correct. The root cause was the orchestrator's context assembly. Phase 1 had provided the ticket's acceptance criteria and the target file, but not the list of callers. The agent did not know what else depended on that function.

The fix was straightforward. The orchestrator now includes a dependency analysis step that identifies all callers of modified functions and adds them to the implementation brief. The code review agent was updated to explicitly check for behavioral changes in shared code.

The broader lesson is the most useful one I have.

The agent's failure modes are usually upstream, in the context it receives, not in its reasoning.

Improving context assembly has had a larger impact on output quality than any prompt engineering I have ever done.

Proposals beat permissions

The proposal-approval model replaced a fragile permission system with a simple rule: the system proposes, the human decides. Easier to implement, easier to reason about, easier to trust. The only ongoing challenge is proposal volume during active development.

Where agents still fall short

Even with the architectural mitigations, certain limitations remain.

Cross-repository reasoning. When a feature spans multiple services, the agent struggles to maintain a coherent mental model of the full change set. Structured tracking helps, but does not solve it.
Ambiguous acceptance criteria. When ticket descriptions are vague, the agent produces reasonable but often wrong implementations. The system flags ambiguous tickets as blockers rather than guessing.
Scope creep. The agent's tendency to over-engineer requires constant calibration through standards and review.
Stale context windows. In long sessions, earlier context falls out of the underlying LLM's effective attention. Session-start re-initialization mitigates but does not eliminate this.

Bounded autonomy beats the demo

Most autonomous coding agents on the market optimize for the demo. End-to-end issue resolution. Watch the agent work. Marvel at the autonomy.

I am not interested in the demo. I am interested in Tuesday morning, when someone has to debug why a merge broke staging.

Bounded autonomy with explicit human decision points is less impressive in a screencast and far more useful in practice. The system I built is deliberately the opposite of an autonomous agent. It is a tool with a strong opinion about what humans should still do.

If I had one piece of advice for someone building something similar, it would be this.

Start with the orchestrator, not the prompt. Figure out what context the agent actually needs, assemble it mechanically, and hand it over in a clean bundle. The agent will do the rest.

The hard part is not getting the agent to reason well. It is giving it the right things to reason about.

Originally published on pixari.dev

I'm Bullish on AI-Assisted Coding. That's Exactly Why I Take the Risks Seriously.

Raffaele Pizzari — Mon, 04 May 2026 08:59:48 +0000

I use AI coding agents every day. I believe they are reshaping how we build software, and I think the teams that adopt them deliberately will outperform those that don't.

I am not writing this to warn you away from AI-assisted development.

I am writing this because the loudest voices in the AI enthusiasm camp are also the most allergic to discussing what can go wrong. And that worries me more than the risks themselves.

The productivity gains are real

Let's start with what is undeniable.

By 2024, LangChain's State of AI Agents report already showed 51% of surveyed organizations running agents in production. By 2026, that number has only grown. The global AI agent market is projected to expand from $7.8 billion to over $50 billion by 2030.

This is not a hype cycle anymore. This is infrastructure.

The case studies are equally striking.

Rakuten engineers used a CLI-based agent to implement a complex activation vector extraction method within vLLM, a codebase of roughly 12.5 million lines. A task that would have taken weeks of onboarding and implementation was completed in seven hours with 99.9% numerical accuracy.

TELUS reported shipping code 30% faster with agents, saving over 500,000 hours across the organization.

These are not toy demos. This is production-grade acceleration at enterprise scale.

I find this genuinely exciting. And none of it changes what I am about to say next.

The risks are equally real

Lars Faye's Agentic Coding is a Trap struck a nerve because it named something many of us were feeling but not saying out loud. The core argument: the skills you need to supervise AI agents are the exact skills that atrophy when you over-rely on them.

The trade-offs that need honest discussion are already quantifiable:

Skill atrophy at scale. The debugging and reasoning abilities required to supervise agents degrade measurably when you stop exercising them.
System complexity to compensate for non-determinism. AI outputs are probabilistic. The guardrails, review layers, and validation infrastructure required to make them production-safe add real engineering overhead.
Vendor dependency for individuals and entire teams. Claude Code outages have already left teams at a standstill. When your workflow depends on a third-party model, their downtime becomes yours.
Unpredictable and rising costs. An employee's cost is fixed. Token pricing is a constantly moving target, dictated unilaterally by providers who can "nerf" a model and force you to burn two to three times more tokens for the same result.
A widening security attack surface. Autonomous agents with broad permissions introduce threat categories that traditional security controls were never designed to handle.
Regulatory exposure most teams are not preparing for. The EU AI Act's high-risk obligations take effect in August 2026, and many agentic workflows are closer to the compliance line than their operators realize.

These are not hypothetical concerns. Let me expand on the ones that matter most.

Cognitive debt

Lars Faye calls this the "paradox of supervision." Anthropic's own research on how AI assistance impacts coding skill formation backs it up: in a controlled study, developers using AI scored 50% on average versus 67% for those coding manually, with the largest gap appearing specifically in debugging questions.

Senior developers with decades of experience report being unable to explain systems they technically "built" with agents. I have written before about the gap between perceived velocity and actual throughput. The pattern here is the same: the metric that looks good on the dashboard is hiding a cost that only surfaces later.

The cognitive friction of writing code, hitting errors, reading documentation, and resolving conflicts manually is not wasted effort. It is the mechanism through which engineers actually understand what they are building.

As I argued in From Attention Economy to Thinking Economy, the challenge is not whether AI eliminates jobs. It is whether we protect the cognitive abilities that make us valuable in the first place.

Security surface expansion

Autonomous agents translate a single instruction into long chains of API calls, database queries, and data manipulations. If an adversary compromises an agent's input, the blast radius is exponentially larger than a traditional exploit.

Research from 2026 shows an 88% success rate in bypassing guardrails on open-source models using automated probing techniques. Indirect prompt injection, where malicious instructions hide in external content the agent reads, requires far fewer attempts than direct attacks.

Dependency poisoning can inject zero-day vulnerabilities straight into your CI/CD pipeline. A CVSS 10.0 remote code execution vulnerability discovered in Google's Gemini CLI in early 2026, exploitable specifically in CI/CD pipeline environments, made this supply-chain risk impossible to ignore.

Regulatory pressure

On August 2, 2026, the EU AI Act's high-risk obligations take effect. Under Annex III, AI systems used to allocate tasks based on individual behavior or to monitor and evaluate worker performance in employment contexts are classified as high-risk.

Coding agents do not automatically fall under this scope, but the line gets blurry fast when orchestrator systems start auto-assigning tickets, ranking PR quality, or feeding into performance reviews.

Article 14 requires that human supervisors understand the system's capabilities and limitations, remain aware of automation bias, correctly interpret outputs, and retain the ability to override them.

Organizations that let engineers rubber-stamp massive AI-generated pull requests without genuine comprehension are building a compliance liability, whether or not they realize it yet.

The real problem is not the risks. It is the denial.

Here is where I part ways with both camps.

The skeptics read all of this and conclude: stop using agents. Go back to writing everything by hand. Agentic coding is a trap, full stop.

The enthusiasts read all of this and shrug. They treat any discussion of downsides as FUD from people who "don't get it." They dismiss cognitive atrophy as a skill issue. They wave away security concerns as solvable later.

Both responses are wrong, but the second one is more dangerous.

In engineering, we do not ship without testing. We do not deploy without monitoring. We do not scale without load testing.

We never adopt a technology by pretending it has no failure modes. That is not engineering. That is wishful thinking.

The people who refuse to discuss the risks of AI-assisted development are not optimists. They are in denial.

And denial is how promising technologies get killed. Not by their limitations, but by the backlash that follows when those limitations are discovered too late by people who were told everything was fine.

I have seen this pattern play out across two decades in this industry. The technologies that survived had honest advocates. The ones that did not were oversold by people who confused enthusiasm with recklessness.

What honest adoption looks like

Anthropic's own data reveals what they call the "Delegation Paradox": engineers use AI in 60% of their workflows but can fully delegate only 0-20% of actual tasks.

This is not a failure of the tools. It is the reality that high-stakes architectural work resists probabilistic automation. Accept it and plan around it instead of fighting it.

That means building deliberate constraints into how you and your team use these tools.

Maintain your skills deliberately. Use agents where they genuinely accelerate: boilerplate, exploration, context retrieval, test scaffolding. The scaffolding use case remains the healthiest relationship most engineers can have with AI right now.

But regularly write core logic yourself. Run pair programming sessions where AI is off. During code reviews, trace the logic manually.

If you do not exercise the debugging and reasoning muscles, they atrophy within months. This is not a metaphor. It is what the data shows.

Respect the context limits. Agents suffer from measurable "context rot." A Databricks study found that model correctness drops significantly around the 32,000-token mark, well before theoretical limits.

The "lost in the middle" phenomenon means agents routinely miss critical guidelines buried in large context windows. Agents confidently invent non-existent variables, mix incompatible framework versions, or hallucinate API calls because they failed to parse intermediate contextual data.

This is not a bug that will be fixed next quarter. It is a fundamental characteristic you need to design around.

Never generate more code than you can review. If your agent produced a 10,000-line pull request overnight and your team approved it in 20 minutes, you did not ship faster. You shipped blindly.

The volume mismatch between machine generation speed and human comprehension speed is the single biggest enabler of the "LGTM" culture that is quietly degrading code quality across the industry.

Strict volume constraints are not a productivity bottleneck. They are what keeps your codebase deterministic instead of probabilistic.

Invest in specification as the primary artifact. When implementation is nearly free, the specification becomes the real engineering work.

Formal, machine-readable specs with explicit non-goals, hard constraints, and testable acceptance criteria prevent agents from filling ambiguity with hallucinated assumptions. Spec-driven development is not overhead. It is the structural response to a world where generating code is trivial and verifying it is expensive.

Watch for the junior developer trap. When confronted with bugs in generated code, many junior developers treat the problem as a "prompt engineering issue" rather than a logic flaw. They tweak prompts repeatedly instead of reading the code.

In this dynamic, the agent delivers the results, the developer takes the credit, and nobody builds real engineering skills. If you lead a team, you have a responsibility to ensure your junior engineers build foundations, not just prompting habits. Their long-term career depends on it.

Prepare for regulatory compliance now. The EU AI Act's August 2026 enforcement date is not far away. If your agentic workflows touch task allocation or performance evaluation, you may already be in high-risk territory under Annex III.

Even outside that scope, Article 12 requires continuous logging over the system's lifetime, and Article 14 requires human overseers who genuinely understand the system, not just approve its output.

If your current workflow is "agent generates, junior approves, code ships," start asking whether that process would survive regulatory scrutiny. The organizations that treat governance as infrastructure rather than bureaucracy will be the ones that scale AI adoption sustainably.

The technology deserves better advocates

The cognitive debt is real. The security surface expansion is real. The regulatory pressure is real. The skill atrophy is measurable and documented.

None of this means we should stop using these tools.

All of it means we should use them like engineers: with eyes open, with guardrails in place, and with the humility to admit what we do not yet fully understand.

The enterprises that will thrive are those that explicitly instrument their workflows to prevent human cognition from atrophying. That treat the agent as a tool of the intellect rather than a replacement for it.

The engineers who will thrive are those who master what the probabilistic agent inherently lacks: systemic architectural vision, contextual judgment, and the willingness to take responsibility for what ships.

I am betting on AI-assisted development. And that bet means taking its risks seriously enough to contain them.

Because the best thing you can do for a technology you believe in is to be honest about it.

Originally published on pixari.dev

You Cannot Mandate Your Way to AI Adoption

Raffaele Pizzari — Sun, 19 Apr 2026 12:34:18 +0000

Most AI adoption strategies in engineering organizations are failing for one of three reasons: leadership mandates tool usage, tracks individual adoption rates, or does neither and hopes something changes.

Each fails differently. Together, they explain most of the friction between executive expectations and engineering teams right now.

The polarization lives inside your organization

I have written before about the gap between AI discourse and AI reality. But there is a version of that gap that lives inside your organization, and it is more expensive than the one on LinkedIn.

Executives — often validly — see AI tools demonstrating real velocity gains in controlled environments. They see competitors moving faster. They read the reports. They push for adoption.

Engineers — also often validly — see AI-assisted pull requests failing review more often, debug time rising, and new categories of subtle bugs appearing in production. They know that the person professionally accountable for the code that ships is them, not the tool. The gains in the demos are real. So is the debugging cost that does not appear in the demos.

Both observations are correct. The problem is structural: the benefits appear where executives measure, and the costs appear where engineers work. The data confirms this split. AI-assisted pull requests contain on average 1.7 times more issues than human-authored ones. Experienced developers on complex brownfield tasks took 19% longer with AI than without. Not because AI is useless, but because it shifts the bottleneck from writing to verifying, and verification is expensive.

When those two realities meet in the same organization without a coherent strategy, you get polarization. And then you get one of three bad responses.

Mandating adoption

The most common response from leadership is the most destructive: set adoption targets, mandate specific tools, and track whether engineers are meeting the numbers.

This fails for a reason that goes beyond morale. Developers know they own the code that ships. When you mandate a tool they distrust, you are asking them to stake their professional reputation on outputs they cannot fully verify. That is not resistance to change. That is a rational risk calculation.

Boston Consulting Group has identified a ceiling for this dynamic. Only half of frontline employees effectively apply AI tools in practice when forced, because the tools are not integrated into how they actually work. Adoption numbers look acceptable on a dashboard. Actual behavior changes minimally.

What mandates reliably produce: surface compliance, metric gaming, and resentment. The developers who would have experimented most productively — the senior engineers with the institutional knowledge to evaluate AI outputs critically — become the most resistant. They recognize the pattern.

AI adoption happens because the tool is demonstrably useful to the person using it. That is not idealism. It is the only path that produces real behavior change.

Monitoring individual adoption

The second response is subtler: do not mandate, but measure. Track adoption rates, count AI-assisted commits, monitor prompt volume per engineer. Use the data to understand who is using the tools.

The intention is reasonable. The execution creates what researchers call "surveillance allergy." When AI usage becomes an individual performance signal, developers optimize for the metric instead of for the outcome. They accept AI suggestions they would otherwise reject. They avoid flagging AI-generated code they are uncertain about, because doing so creates a visible record of uncertainty.

This is exactly the wrong direction. Good AI usage depends on engineers being critical evaluators of AI output. Surveillance incentivizes uncritical acceptance — which is what drives the code quality problems in the first place.

The principle that fixes this: AI metrics should never feed into individual performance evaluations or compensation decisions. Communicate this explicitly, not just once. Measure at the system level instead. Adoption rates against change failure rates. AI-assisted PR percentages against incident volume. If quality drops as adoption rises, the process needs structural adjustment. That is a systemic diagnosis, not an individual one.

Doing nothing

The third response is laissez-faire: no policy, no approved tools, no guidance. Let engineers figure it out.

What this produces is shadow AI. Not because developers are reckless, but because they are solving real problems with the tools available to them, in the absence of anything better. It looks like individual productivity. It is actually unmanaged data risk.

When engineers feed proprietary source code, internal architecture, or customer data into unvetted public LLMs, the organization loses control of its most sensitive assets without a trace in any audit log. The risk is not that AI exists. It is that unregulated AI multiplies data paths faster than security teams can map them. Fragmented adoption across hundreds of individual tool choices makes uniform governance impossible and ROI measurement meaningless.

Shadow AI is a symptom of governance failure. The only remedy is providing a real alternative: a centralized platform of approved, enterprise-licensed tools with clear security boundaries, within which developers have genuine autonomy to choose what works for their workflow.

The identity dimension most strategies miss

Underneath all of this is a human problem that most adoption playbooks do not name: the developer identity crisis.

Senior engineers did not choose this profession to orchestrate AI. They chose it to build things. The satisfaction of tracking down a production bug, of optimizing a slow query until response times drop from seconds to milliseconds, of understanding a system at a level few others do — these are not peripheral to engineering identity. They are central to it.

Annie Vella, a Distinguished Engineer and AI researcher at Westpac, found in her research that 77% of engineers report spending less time writing code. Her blog post on this went viral with over 65,000 views — not because it was controversial, but because it named something engineers had been carrying without language for it.

The developers most valuable for AI adoption — the seniors with the contextual knowledge to catch what AI gets wrong — are the ones for whom the role shift is most disorienting. This is not a coincidence. Treating their skepticism as simple resistance misses the actual problem.

The reframe that works: the craft does not disappear, it scales. What matters now is how code is architected, how robust it is, how testable it is, how secure it is. The ability to affect quality and outcomes without typing every line is still engineering — it is a more leveraged version of the same discipline. Making this case explicitly, and creating individual integration paths based on where each engineer derives meaning from their work, is more effective than any uniform rollout policy.

What actually works

The organizations seeing durable AI adoption share a common structure.

A centralized platform team evaluates, procures, and security-validates AI tools. They produce an approved toolkit — enterprise-licensed options — and developers choose within that toolkit. No single vendor mandate. But all outputs conform to the same architectural standards and review processes, regardless of which tool generated them. The AI adapts to the organization's conventions, not the reverse.

Measurement is systemic. Adoption rates are tracked against change failure rates and incident volume at team and org level. When quality drops as adoption rises, the pace slows and governance catches up before continuing.

Integration paths are individual. Senior engineers get roadmaps based on where AI genuinely reduces friction in their specific work. Junior engineers get AI literacy training — critical evaluation of outputs, system design fundamentals — before unrestricted tool access.

The staged approach that works: start with low-risk work and no metric pressure. Let engineers discover what is genuinely useful. Then, once there is organic pull, remove the friction — documentation, environment setup, tooling integration — that slows everyday use.

The governance stakes

One more thing worth naming directly: regulatory scrutiny of AI usage in software engineering is coming. In some sectors it is already here.

The organizations with centralized platforms, audit trails, and systemic measurement will be able to answer the questions that compliance, legal, and regulators will ask. The organizations with fragmented, ungoverned shadow AI will not.

Governance is not a constraint on AI adoption. Done correctly, it is the infrastructure that makes adoption sustainable. The organizations treating it as bureaucratic overhead will spend far more time explaining their data incidents than they saved by skipping the process.

Build the governance first. The adoption follows.

Originally published on pixari.dev

The Gap Between Estimated and Delivered

Raffaele Pizzari — Wed, 08 Apr 2026 22:03:25 +0000

Here's a pattern I've seen play out dozens of times. A team estimates a feature at 5 story points. Low complexity, clear requirements, well-understood domain. By every estimation framework, it's a small task.

It ships three weeks later.

The team gets blamed for bad estimation. Leadership pushes for better grooming, more detailed breakdowns, tighter story points. The team tries harder. Next sprint, the same thing happens.

The estimate was never the problem.

Estimation does what it's supposed to do

Modern estimation frameworks are actually good at what they measure. The best ones decompose work into multiple dimensions: complexity (how hard is this to understand), effort (how much raw work), uncertainty (how many unknowns), and risk (what external factors could derail it). Each dimension gets scored, and the combination drives the story points.

This works. A 5-point story really is a 5-point story. The team correctly assessed the complexity of the code, the effort required, the technical unknowns. They weren't wrong.

The problem is that estimation measures the work. It was never designed to measure the environment the work has to travel through.

The gap has a name

Between "estimated" and "delivered" sits everything the estimation framework doesn't capture. I call it org friction: the invisible overhead that organizational structure, processes, and cross-team dependencies impose on every piece of work.

That 5-point story took three weeks not because the team misjudged the complexity. It took three weeks because a schema change needed approval from another team. The design review sat in a queue for days. Security wanted to sign off because it touches user data. The one person who understood the legacy service was unavailable. And the engineer doing the work got two focused hours per day between meetings, incidents, and "quick questions."

None of that is estimation error. It's organizational drag. And unlike technical debt, which at least gets mentioned in retros, org friction is untracked and unowned. It doesn't show up in Jira. Nobody has "reduce org friction" in their OKRs. But it's eating 30-40% of your team's capacity.

Seven sources of org friction

After paying attention to this for a while, I see the same patterns everywhere.

Cross-team dependencies. The work itself takes hours. The waiting takes days. Waiting for the Platform team to review your PR. Waiting for Design to finalize a spec they promised last sprint. The estimate captured the work. Nobody captured the queue time.

Process overhead. Change management boards, architecture review committees, compliance gates. Each one was created for a good reason. None of them were ever removed when the reason went away. They accumulate like sediment.

Knowledge silos. That one engineer who understands the billing service. That one PM who knows the historical context behind a weird product decision. When they're unavailable, work stops. Not because it's technically blocked, but because nobody else has the context to make the call.

Legacy process debt. This is different from technical debt. It's the outdated deployment process that requires manual steps. The testing pipeline that takes 45 minutes because nobody prioritized making it faster. The onboarding doc that hasn't been updated in two years. Not broken enough to fix, but slowing everything down.

Decision latency. No formal gate, just nobody with clear authority to make the call. Or the person who does keeps deferring. The feature is technically unblocked, but the team is waiting for someone to say "yes, build it this way." Estimates assume decisions happen instantly. They never do.

Context switching tax. The estimate assumed focused time. Reality: support rotations, incident responses, Slack threads from other teams, syncs that could have been async. The work takes three days of focused effort, but your engineers get two focused hours per day. The calendar is the friction nobody accounts for.

Misaligned incentives. Team A is measured on shipping features. Team B is measured on system stability. Team A needs Team B to deploy a breaking change. Team B has zero motivation to help. This isn't a technical problem. It's an organizational design problem. And it shows up as a "missed estimate" on Team A's board.

Stop improving estimation. Reduce the drag.

When teams consistently miss delivery targets, the default response is to push for better estimation. More grooming sessions. More detailed breakdowns. More precise story points.

This misses the point entirely.

The estimates are fine. A 5-point story is still a 5-point story. What changed between "estimated" and "delivered" wasn't the complexity of the work. It was the friction the work encountered on its way to production. Making teams better at predicting friction doesn't reduce it. It just makes everyone more accurately pessimistic.

The real lever is reducing the drag. And that's a leadership problem, not an engineering one.

Run a Friction Log

Here's the most useful thing I've done as an EM to make org friction visible.

For one sprint, ask your team to keep a shared doc. Call it a Friction Log. The rules are simple: every time someone is blocked, delayed, or slowed down by something that isn't the code itself, they log it. One line. What happened, how long they waited, which friction source it was.

No analysis during the sprint. Just logging. Keep it low effort.

After two weeks, read it together. What you'll see is a document that's almost impossible to argue with. Not opinions about process. Not complaints. Just a factual record of where time went that had nothing to do with the work your team estimated.

The first time I ran one, the log showed that 40% of the sprint's elapsed time was spent waiting on things outside the team's control. Cross-team reviews, decision latency, a compliance gate that took four days for a one-line config change. The estimates had been accurate. The org had been expensive.

A Friction Log does three things:

It separates estimation from delivery. You can see that the 5-point story really was a 5-point story. The extra two weeks was org friction, not estimation error. Once you can see the gap, you can talk about it.

It gives you data, not complaints. "Our estimates are always off" gets you a shrug. "Here's a log showing we spent 11 days this sprint waiting on cross-team reviews" gets you a conversation. Leaders respond to patterns with numbers.

It picks your battles for you. After one sprint, the top friction source is obvious. You don't have to fix everything. Pick the one that shows up the most and work on reducing it for the next quarter. Maybe it's getting embedded in the platform team's sprint review so your PRs don't sit in a queue. Maybe it's documenting the legacy service so you're not dependent on one person. Small, compounding improvements.

Nobody designed it this way

Nobody designed org friction into your company. It accumulated. One reasonable process at a time. One well-intentioned approval gate at a time. One team boundary at a time. Each decision made sense in isolation. Together, they created an invisible tax on every piece of work your team ships.

Your estimates aren't the problem. Your org is just more expensive than anyone's willing to admit.

The question isn't whether your team can estimate better. It's whether you, as a leader, are willing to name the drag and do something about it. Because the gap between "estimated" and "delivered" isn't a measurement error. It's a leadership opportunity.

Originally published on pixari.dev

The Engineering Manager Role Is Mutating

Raffaele Pizzari — Mon, 06 Apr 2026 07:44:34 +0000

I read Gregor Ojstersek's piece the other day. "Would I Still Go The Engineering Manager Route in 2026?". And it hit me in a strange way. Not because of the answer but because of the question. The fact that someone who's been doing this for years is publicly asking whether he'd do it again says something.

I've been in this role for almost three years. And I'd be lying if I said I hadn't asked myself the same thing.

But here's the thing. I've done this loop before.

Before I was an EM, I was an IC. Before I was an IC, I ran my own web agency for eight years. I was the CEO. I did sales, code, hiring, client management, architecture, marketing. Full ownership, full autonomy. And then I deliberately walked away from that, went back to building as an individual contributor, and worked my way through Lead and Principal before choosing management again.

I chose this role knowing what I was giving up. I'd already tasted the other side. Twice, actually.

That context shapes how I see what's happening to the EM role right now.

The job changed

When I moved into management, the deal was roughly this: you stop writing code full-time, you spend your days on people, process, and delivery. You run 1:1s. You shield the team from organizational noise. You hire, you coach, you make sure things ship. In return, you get a seat at the table and the satisfaction of watching people grow.

That deal still technically exists. But the fine print keeps growing.

Somewhere along the way, the expectations started stacking. I've seen it in my career, I've seen it in job postings, I've heard it from every EM I talk to. Be technical enough to review architecture. Be strategic enough to present to leadership. Be empathetic enough to handle burnout and conflict. Own delivery metrics. Own hiring pipelines. Own the roadmap conversation with product. And increasingly: pick up some coding too.

I don't think anyone designed it this way. It just accumulated. Browse any EM job listing from 2026 and count the responsibilities. Technical leadership, people management, delivery ownership, hiring, strategy, stakeholder alignment. Each one makes sense on its own. But the list keeps getting longer and nothing ever falls off.

I've seen this movie before

When I was running my agency, mobile was going to change everything. We had to rethink our entire business. Some agencies died. We adapted.

When I came back to IC, cloud and microservices were going to change how teams work. Some companies over-rotated, split everything into tiny services, and spent years untangling the mess. The ones who kept their heads did fine.

Now it's AI.

I've watched three cycles where a technology was supposed to make a role obsolete or fundamentally different. And the pattern is always the same: the technology matters, it does change things, but it doesn't change the things people think it will.

Mobile didn't kill web agencies. It made them busier. Cloud didn't eliminate ops. It renamed them. And AI won't replace engineering managers. But it is changing the math around the role in ways that are worth paying attention to.

What AI actually changes

AI doesn't build trust across a team over months. AI doesn't read the room in a planning meeting and realize the real problem isn't the estimate, it's that people don't believe in the project. AI doesn't tell a PM that the deadline is unrealistic and hold the line.

But AI does make small teams more productive. And when small teams are more productive, companies start asking why they need so many managers. The ratio shifts. Where you had one EM for five or six engineers, now it's eight. Ten. Twelve. The scope grew but the support didn't.

That's the actual pressure. Not "AI replaces managers." Just "we need fewer of them, and the ones we keep do more."

Combine that with the IC career track getting better, real Staff and Principal roles with real compensation, and you've got a situation where the best engineers don't need management to advance. The people who want to be EMs used to be the best engineers who also cared about people. Now some of them are choosing Staff instead. I don't blame them.

The player-coach myth

I keep hearing "player-coach" as if it's aspirational. Like we should all want to be the EM who also ships features.

I've been a player-coach. It means you do both jobs badly. You context-switch between a PR review and a difficult conversation about someone's performance. You write code in the gaps between meetings, which means you write code in 25-minute windows, which means you write bad code. Or you stay up late to get the focused time, which means you burn out faster.

The industry uses "player-coach" like it's a compliment. It's usually a budget decision disguised as a philosophy. Someone needed to cut a headcount and decided the EM could absorb the work.

I'm not saying it can't work. In early-stage startups, in small teams, when the scope is tight, sure. But in a 40-person org with multiple squads? If your EM is regularly shipping features, something is wrong with your staffing, not right with your EM.

The hard problems don't change

Here's what I keep coming back to, across all the cycles I've lived through. The technology changes. The hard problems don't.

Getting people aligned on what matters. Making decisions with incomplete information. Knowing when to push and when to protect the team. Helping someone grow when they don't see their own potential yet. Having the conversation everyone's avoiding.

Those problems looked the same when I was running my company in 2015. They look the same now. AI didn't create them and AI won't solve them. They're human problems, and the EM role exists because someone needs to own them.

That's why I'm not worried about the role dying. I've watched enough cycles to know it won't. But it will mutate. It always does. And the EMs who get left behind won't be the ones who ignored AI. They'll be the ones who forgot that the human stuff is the actual job.

It's one title, but it's not one job

I think a lot of the confusion comes from a naming problem. We all call ourselves "Engineering Manager" but we're doing wildly different jobs depending on the company, the stage, and the org.

I've talked to EMs who spend 80% of their time on architecture and code review. I've talked to EMs who haven't opened an IDE in two years and spend their days on coaching, hiring, and cross-team alignment. I've talked to EMs who are basically program managers with a different title, owning delivery timelines and stakeholder updates. All of them have the same title on LinkedIn.

This matters because most of the frustration I hear from EMs isn't really about the role. It's about the mismatch. You took the job expecting to be a people leader and you ended up being a delivery lead. Or you wanted to stay close to the technical decisions and instead you're spending your weeks in stakeholder meetings. The role didn't let you down. The expectations were just never made explicit.

If you're an EM and something feels off, before you question whether you want to be a manager at all, try a simpler question first: which version of this job is your company actually asking you to do? And is that the version you want?

Sometimes the answer is "I'm doing the wrong version of EM at the wrong company." That's a very different problem than "I don't want to be an EM anymore," and it has a very different solution.

If you're hiring an EM or about to become one: have that conversation early. Don't just talk about the team size and the tech stack. Talk about what the job actually looks like on a Tuesday. Where does this EM spend most of their time? What does success look like in six months? The clearer that picture is, the fewer EMs will burn out wondering why the job doesn't feel like what they signed up for.

The feedback loop problem

Something I've noticed talking to other EMs. A lot of us still build things on the side. Side projects, open source, weekend hacks. And when you ask why, the answer is almost never "to stay technical." It's because the feedback loop is different. You write code, you see it work, you feel something. It's immediate.

Management has its own rewards. Watching someone you coached nail a presentation. Seeing a team you built ship something complex without drama. Those moments are real. But they're slow. They happen over months. The feedback loop in management is measured in quarters, not commits. You have to learn to find satisfaction in that pace, and some weeks it's easier than others.

So would I do it again?

Yes. Without hesitation. I already did it twice.

I walked away from running my own company. I went back to being an IC. I had the full picture, the autonomy, the ownership, and I chose to come back to building first and then to management. Not because I had to. Because I'd seen enough of both sides to know which problems I actually wanted to spend my days on.

I love this job. The role is broader and harder than when I started. The industry hasn't settled on what it wants EMs to be. But that ambiguity is part of what makes it interesting. You get to shape it.

If you're an IC thinking about management: go in with your eyes open. The role is worth doing. It's also messier and more ambiguous than it looks from the outside. Talk to EMs. Ask them what their actual week looks like, not their LinkedIn version of it.

And if you're already an EM: the role is changing fast. It always has been. The technology driving the change is new but the pattern isn't. The EMs who'll thrive are the ones who can sit with the ambiguity and shape the role instead of waiting for someone to define it for them. That's always been the interesting part of this job anyway.

Originally published on pixari.dev

I Built a Game That Teaches Git by Making You Type Real Commands

Raffaele Pizzari — Thu, 02 Apr 2026 00:00:39 +0000

I work in IT, and there's one scene I keep witnessing. A developer joins the team, they're sharp, they ship features, they write clean code. And then someone asks them to rebase, and you can see the panic set in.

It's not their fault. Git is taught badly.

Every git tutorial I've ever seen follows the same formula: here's a diagram of branches, here's a table of commands, now go practice on your own repo and try not to destroy anything. It's like learning to drive by reading the car manual. Technically accurate. Practically useless.

I've watched junior developers memorize git add . && git commit -m "fix" && git push like an incantation, terrified to deviate because the one time they tried git rebase they ended up in a state that required a senior engineer and 45 minutes of git reflog to unfurl.

And I've watched senior developers, people with a decade of experience, avoid git bisect entirely because nobody ever showed them what it actually does in a safe environment.

So I built one.

Gitvana

Gitvana is a browser game. You play a monk climbing toward "git enlightenment" at the Monastery of Version Control. There's a Head Monk who assigns you tasks, a judgmental cat, and pixel art that looks like it belongs on a Game Boy.

But underneath the retro charm, there's a real git engine. When you type git init in the terminal, it runs git init. When you type git commit, it creates an actual commit in an actual repository. The repository lives in your browser, powered by isomorphic-git and an in-memory filesystem, but it's real. Every command, every SHA, every ref.

35 levels. 6 acts. 21 git commands. From git init to git bisect. No slides, no diagrams, no hand-holding. Just you and a terminal.

Play it at gitvana.pixari.dev. It's free, it works offline, and it doesn't want your email.

Why a Game

I could have written another tutorial. I could have built a sandbox. But I've been thinking a lot about how people actually learn, and the answer isn't "reading."

People learn by doing things that are slightly too hard, failing, figuring out why, and trying again. That's what games are. They're structured failure environments with feedback loops.

Every level in Gitvana has a target state, a set of conditions that the git repository must satisfy. "There must be exactly 3 commits on main." "The branch feature must be deleted." "The file config.yml must not contain the API key in any commit." The game validates these conditions in real time as you type commands. You see the checklist turn green, one objective at a time.

This isn't gamification bolted onto a tutorial. The game is the learning.

The Journey: 6 Acts

The structure mirrors how a developer actually encounters git:

Act 1: Awakening — The basics. init, add, commit, status, log, diff. You're a new monk. The Head Monk is patient. The cat is skeptical.

Act 2: The Middle Path — Branching, merging, cherry-pick, revert, stash. Things start getting interesting. You begin to understand that git isn't a linear timeline, it's a tree.

Act 3: Rewriting Reality — rebase, amend, squashing commits, purging secrets from history. This is where most developers tap out in real life. In Gitvana, you can't tap out. The monastery doors are locked.

Act 4: The Safety Net — reflog, blame, bisect, disaster recovery. The levels where you learn that git never truly forgets, and that reflog is the "undo" button nobody told you about.

Act 5: Advanced Techniques — Surgical staging, dependency chains, the operations that separate "uses git" from "understands git."

Act 6: Gitvana — The final trial.

Each act introduces new commands gradually, with in-game documentation you can pull up without leaving the terminal.

The Tech

The stack is deliberately minimal. Svelte 5 for the UI, xterm.js for the terminal, isomorphic-git for the git engine, and lightning-fs for the in-memory filesystem. No backend. No database. No accounts. Everything runs in your browser and your progress saves to localStorage.

The interesting engineering problems were all in the details:

Rebase was the hardest command to implement. The real git rebase is a multi-step, stateful operation. It collects commits, replays them one by one, and can pause mid-way for conflict resolution. I had to build a state machine that saves rebase progress to .git/rebase-merge/, handles --continue and --abort, and writes proper conflict markers when files clash.

Bisect maintains its own state files in .git/, just like real git. It performs an actual binary search across commits to find where a bug was introduced. In one level, you have to find which commit broke a test by using git bisect start, marking commits as good or bad, and letting the algorithm narrow it down.

The blame algorithm walks the entire commit history, builds a content-at-commit map, and attributes each line to the oldest commit where it appeared unchanged. It's not efficient. It doesn't need to be, these repos are tiny. But it's correct.

The level validator checks 12 types of conditions in real time: file existence, file content, branch existence, HEAD position, commit count, commit message patterns, merge commits, conflict state, staging area state, and tag existence. Every keystroke can potentially satisfy an objective, and the UI updates instantly.

Sound effects are procedurally generated with the Web Audio API. No audio files. Just oscillators, frequency envelopes, and square waves. Every commit gets a satisfying chiptune beep. Every merge conflict gets an ominous buzz.

What I Learned Building It

Building an educational game taught me more about git than 15 years of using it.

I had to read the git internals documentation to implement commands correctly. I discovered that git stash is essentially syntactic sugar over a specific commit-and-reset workflow. I learned that the reflog is just a flat file of HEAD movements. I finally understood, at the implementation level, why a detached HEAD happens and what it actually means in terms of refs.

There's a difference between using a tool and understanding it deeply enough to rebuild it. This project forced the second.

The Pixel Art Problem

I can't draw. At all. My artistic ability peaked at stick figures in 1993. But I wanted Gitvana to have a specific aesthetic: 16-bit monastery vibes, cherry blossoms, monks in robes, a cat that judges your commits.

I used PixelLab to generate the sprites. I'd describe what I wanted: "pixel art monk in grey robes, standing, 64x64, side view, retro game style" and iterate until it felt right. The landing page monastery, the mountain progression map, the four monk tiers (grey, brown, blue, golden) were all generated this way.

It's not hand-crafted pixel art. But it has soul. And it's consistent, which matters more than perfection when you're a solo developer trying to ship something.

Why It's Free

Because I built it for fun. That's the honest answer.

I had a problem: I wanted to understand git at the implementation level, not just the "copy this command from Stack Overflow" level. Building a game that teaches it forced me to actually learn it. Selfish motivation, great side effect.

And maybe other people have the same problem. Maybe there's a developer out there who's been using git for five years and still gets nervous when someone says "rebase." If Gitvana helps them, great. If not, I still had a blast building it.

There's no paywall, no signup, no "premium" tier. The source code is on GitHub. It's MIT licensed. Fork it, improve it, translate it, add levels.

Try It

gitvana.pixari.dev

35 levels. Real terminal. Real git. Zero setup.

Start at Act 1. Get to Gitvana. The cat is waiting.

Originally published on pixari.dev

Why Your Team's Docs Are a Strategic Asset (Not an Afterthought)

Raffaele Pizzari — Wed, 01 Apr 2026 22:34:43 +0000

Good documentation is more than just a chore; it's a strategic asset that can transform how your engineering team operates.

Far too often, documentation is viewed as an afterthought, something to be done only when absolutely necessary, or worse, not at all.

But what if we reframed our perspective?

The Hidden Power of Good Documentation

Think of clear, concise documentation not as a task, but as an act of professional kindness.

Kindness to your future self: We've all been there, staring at old code or a system we designed months ago, wondering about a specific decision or implementation detail. Good documentation acts as a reliable memory, saving you hours of head-scratching and reverse-engineering your own work.
Kindness to your teammates: Onboarding new team members can be a time-intensive process. Comprehensive documentation allows them to get up to speed faster, understand existing systems, and contribute effectively without constantly interrupting others for explanations. It empowers them to find answers independently, fostering a more autonomous and efficient team.

Beyond the Basics: Why Documentation is a High-Leverage Activity

When documentation is prioritized and well-executed, it significantly reduces friction within a team. It clarifies processes, defines responsibilities, and provides a single source of truth for critical information. This, in turn, increases individual and team autonomy. Engineers can make informed decisions and solve problems without constant oversight, leading to faster development cycles and fewer roadblocks.

Ultimately, robust documentation forms the bedrock of a maintainable system. It ensures that knowledge isn't siloed in individual minds but is instead shared and accessible, making systems more resilient and easier to evolve. In the grand scheme of engineering activities, creating good documentation is one of the highest-leverage tasks you can perform, yielding disproportionate returns in team efficiency, system stability, and overall project success.

Making Documentation an Asset, Not an Afterthought

So, how can you shift your team's mindset and practice to make documentation an asset? It starts with recognizing its value and integrating it into your workflow, rather than relegating it to an optional, last-minute item.

What are your team's biggest struggles when it comes to documentation, and what strategies have you found most effective in overcoming them?

Originally published on pixari.dev

Why Empathy is a Hard Skill

Raffaele Pizzari — Wed, 01 Apr 2026 22:34:13 +0000

The modern workplace is full of buzzwords, and few are as overused as "empathy." We hear it in every leadership seminar and read about it in every management book. But what does it truly mean to lead with empathy, and how does it translate from a lofty concept into a practical, day-to-day skill.

For many years, I believed that leadership was about competence and results. My focus was on the numbers, the deadlines, and the outputs. I challenged my teams directly, but I often missed the "caring personally" part of the equation. It felt soft, almost like a distraction from the real work. What I eventually learned, often the hard way, is that empathy isn't a distraction, it’s a prerequisite for durable, high-performing teams.

The Problem with Overly Direct Feedback

Many of us were raised on the idea that direct, sometimes harsh, feedback is the only way to drive performance. While challenging people directly is crucial for growth, when it’s delivered without a foundation of empathy, it often lands as criticism. This creates a defensive environment where people shut down, stop taking risks, and fear making mistakes.

This is where the principles of Kim Scott’s "Radical Candor" became a powerful compass for me. The idea is simple: care personally and challenge directly. The two parts are not independent; they are a symbiotic relationship. You can't have one without the other and expect to build a team that thrives. The "caring personally" part is the empathy. It's the engine that makes the "challenging directly" part effective.

Empathy isn't Weakness, It's Leverage

Leading with empathy isn't about being everyone's friend or avoiding tough conversations. It's about taking the time to truly see and hear the people on your team. It means understanding their professional and personal aspirations, recognizing their struggles, and genuinely celebrating their wins. It’s about creating an environment where they feel safe enough to be vulnerable.

When you invest in that foundation of psychological safety, challenging feedback becomes a gift, not a threat. Your team knows that your feedback comes from a place of support for their growth, not a place of judgment. This is the ultimate form of leverage. You can demand excellence and push boundaries because your people trust that you are pushing them for their own benefit.

Practical Ways to Apply Empathy as a Leader

So, how do you make empathy a tangible part of your leadership toolkit? It starts with small, consistent actions:

Listen Actively: Put away your phone and give your full attention. Ask open-ended questions and listen to understand, not just to respond.
Show Vulnerability: Acknowledge your own mistakes. It shows your team that imperfection is normal and creates a safe space for them to do the same.
Recognize Effort, Not Just Results: Celebrate the hard work and resilience, especially when a project doesn't go as planned. This builds trust and encourages risk-taking.
Remember the Person: Ask about their weekend, their family, or their hobbies. These small moments can build a powerful personal connection.

Empathy is not a soft skill; it is a hard, pragmatic skill that directly impacts a team’s performance. It’s the difference between managing a group of individuals and leading a unified, high-performing team.

Originally published on pixari.dev

When the hard part was the point

Raffaele Pizzari — Wed, 01 Apr 2026 22:33:42 +0000

I still remember the weight of the book.

It was 2003. I was building a text search engine in Perl. I was trying to write a recursive function to traverse a directory tree without blowing up the server’s memory.

I didn't have Copilot. I didn't have ChatGPT. I didn't even have StackOverflow open, it didn't exist. I just had a heavy, physical Perl manual with a cracked spine, an open space with many loud colleagues and the hum of the computer fans.

I spent days on that function. I remember the frustration. I remember the panic of staring at a cursor that wouldn't move. But mostly, I remember the texture of the moment the logic finally clicked. It was a physical sensation, a headache dissolving into pure clarity.

For the last 20 years, I have defined my professional worth by my ability to endure that friction. I was a watchmaker, and I took an immense amount of pride in the fact that the gears were incredibly small, the manual was hard to read, and my hands were the only ones steady enough to place them.

And now, I am grieving.

The Robbery

Last weekend, I sat down to work on a personal side project. This wasn't for work; this was for me. I hit a roadblock with a particularly nasty piece of logic involving data synchronization.

A few years ago, this would have been the best part of my Saturday night. It would have been a ritual: a fresh pot of tea, a blank notebook, and three hours of deep work until I cracked the code.

This time, almost out of muscle memory, I pasted the error into a prompt window.

I didn't even get to take a sip of my tea.

Four seconds. That’s how long it took. The code appeared. It handled the edge cases. It was cleaner than what I would have written. I pasted it in. It worked perfectly.

I didn't feel efficient. I felt robbed.

I had robbed myself of the flow state. I had robbed myself of the "Aha!" moment. It was like taking a helicopter to the summit of Everest. Yes, the view is the same. But the person standing at the top isn't the person who climbed the mountain. They haven't been changed by the ascent.

The Identity Crisis

We talk about AI velocity. We talk about "10x engineers." But we aren't talking about the silence that comes after the code is written.

For many of us, engineering wasn't just a trade; it was an identity built on suffering. We were the magicians who knew the secret spells. We were the ones willing to read the documentation at 2 AM.

When the "hard parts" become trivial, when the struggle is removed, it forces us to ask a terrifying question:

If I am not the one struggling, who am I?

If my value isn't in my ability to grind through the logic, and my value isn't in my encyclopedic knowledge of the Perl manual, then what have I been doing for the last 2 decades? Was I just a slow, biological text-generator waiting to be optimized?

The Wisdom of the Scar

But perhaps I am asking the wrong question. Because there is a fundamental difference between an LLM and a Senior Engineer, and it isn't intelligence. It’s trauma.

An AI has never been woken up at 3 AM by a critical production alert. An AI has never felt the cold sweat of realizing a migration script just dropped the wrong table in production. An AI has never had to sit in a post-mortem meeting and explain to a CEO why the site was down for four hours.

That terror? That is where wisdom comes from.

The "friction" we are mourning wasn't just annoying; it was educational. Every time we struggled, we were building a map of the minefield in our heads.

The AI is an eternal optimist. It assumes the happy path will work. It assumes the API will respond. It assumes the data is clean. It has infinite knowledge, but zero scar tissue.

And in our line of work, the scars are the only things that tell you where the ice is too thin to walk.

Where the Friction Lives Now

So, how do we navigate this? If the "writing" is gone, where do we put our obsession?

I’ve realized that we don't have to abandon our standards; we have to elevate them. We need to take that restless energy that used to go into syntax and pour it into verification and architecture.

Become the Editor, Not the Writer: The AI is an eager, hallucinations-prone engineer. Your new struggle is not creating the code, but having the taste to look at 50 lines of generated logic and spot the subtle architectural flaw that will haunt you in six months. The "friction" is now in the review.
Protect Your "Gym" Time: I have made a new rule for myself. Every day I turn the AI "off" for ~50% of my coding time. I force myself to write code by hand. Not because it’s efficient, it isn’t, but because my brain needs the gym. We lift weights not to move iron, but to keep our muscles strong. We must code manually to keep our intuition sharp.
From Photorealism to Impressionism: Photography didn't kill painting. It forced painters to stop trying to be photorealistic. When the camera arrived, painters realized: “The machine can capture the light better than I can. So I must capture the feeling.”

We are at that same juncture. The era of the "Photorealistic Coder", the one who takes pride in memorizing syntax, is over. The era of the Impressionist Engineer is beginning.

Our value is no longer in how we build the wall. Our value is in knowing where to put the window so the light hits the room just right.

The "Good" Struggle

If you are reading this and feeling a twinge of sadness, I want you to know it’s okay.

It’s okay to miss the blinking cursor. It’s okay to miss the frustration of the physical manual. It’s okay to miss the era where the barrier to entry was high, because that height made us feel safe.

We can accept the new tools. We can use the helicopter when the destination is all that matters. But every once in a while, for the sake of our souls, we should still climb the mountain on foot.

I suspect I'm not the only one feeling this phantom limb.

Originally published on pixari.dev

Using AI in the Software Development Lifecycle (Without Slowing Shipping)

Raffaele Pizzari — Wed, 01 Apr 2026 22:33:11 +0000

AI is everywhere in the software development lifecycle: code completion, test generation, docs, and even design. The promise is faster, better output. The risk is typing faster but shipping slower—more generated code to review, more wrong abstractions, and more time debugging AI output. Here’s how to use AI in the software development lifecycle in ways that actually speed you up instead of slowing you down.

Where AI Helps in the SDLC

Scaffolding and boilerplate.

Generating CRUD, config, tests stubs, and repetitive code from a clear spec or prompt is where AI shines. You stay in control of design; AI fills in the tedious parts. Think “build the scaffolding, not the whole house.”

Documentation and comments.

Turning code into docs, or keeping comments in sync with behavior, is a good fit. So is generating runbooks or API descriptions from existing code—as long as someone reviews for accuracy.

Tests and validation.

Generating unit tests, edge cases, or example data can improve coverage quickly. The key is running the tests and fixing failures; don’t trust generated tests without review.

Exploration and learning.

Asking “how does X work in this codebase?” or “what’s the pattern for Y?” can speed onboarding and investigation. Treat answers as a starting point, not gospel.

Refactoring and small, mechanical changes.

Renaming, formatting, or applying a pattern across many files can be suggested or partially done by AI. Again, review and tests are essential.

Where AI Tends to Slow You Down

Big, greenfield features from a single prompt.

AI can produce a lot of code that looks plausible but is wrong, over-engineered, or doesn’t fit your system. You spend more time fixing and aligning than if you’d written a smaller, targeted slice yourself.

Critical paths and subtle logic.

Security, correctness, and performance need human judgment. Use AI to suggest or draft, then verify carefully.

When context is missing.

AI doesn’t know your product decisions, your constraints, or your team’s conventions. The more you provide (specs, examples, ADRs), the better the output—and the more you rely on “just generate it,” the more rework you get.

Practices That Keep Shipping Fast

1. Invest in context AI can use.

Good docs, clear APIs, and up-to-date specs improve AI output and reduce back-and-forth. Treat documentation as the “context window” for both humans and tools.

2. Prefer small, verifiable steps.

Use AI for small units of work (a function, a test, a doc section) that you can review and test immediately. Avoid “generate the whole feature” unless you’re willing to treat it as a draft to heavily edit.

3. Tighten the feedback loop.

Strong test coverage and fast CI mean you catch AI mistakes quickly. Without that, you risk merging broken or brittle code.

4. Set team norms.

Decide what’s acceptable to generate (e.g. tests, boilerplate, comments) and what always needs a human design (e.g. security, APIs, data models). Review generated code like any other code.

5. Measure impact.

Track cycle time, bug rate, and rework. If “AI-assisted” work takes longer or introduces more incidents, adjust where and how you use AI.

Using AI in the Software Development Lifecycle: Summary

Use AI for: scaffolding, boilerplate, docs, test generation, exploration, mechanical refactors.
Be cautious with: large features from one prompt, security/correctness-critical code, and anything where context is vague.
Keep shipping: small steps, strong tests, clear context, and team norms so AI speeds you up instead of burying you in rework.

AI in the SDLC is most valuable when it handles the repetitive, well-defined work and leaves you in control of design and quality. Use it there, and you can ship faster without slowing down.

Originally published on pixari.dev

The Year I Stopped Apologizing for the Chaos

Raffaele Pizzari — Wed, 01 Apr 2026 22:32:40 +0000

Let’s look at the data, because that’s usually where I feel safe.

My oldest is three years old.

My twins turned one just three months ago.

For the last 15 months, I have been attempting to function as a human being while outnumbered by toddlers. I work from home as an engineering manager, a role I am incredibly grateful for, but one that requires a brain that isn’t constantly running on fumes.

If you looked at my calendar this year, you saw neat blocks of meetings and deep work.

If you looked inside my room, you saw a father muting his mic to gently negotiate with a crying toddler, wiping oatmeal off his shirt five minutes before a call, and functioning on a sleep schedule that simply doesn’t add up.

It has been the toughest year of my life.

I love my children more than I thought possible. They are my world. But love doesn’t replace sleep, and deep affection doesn’t pause the backlog.

It is possible to be incredibly grateful for your family and completely broken by the logistics of raising it at the same time.

The Weight We Carry

We need to talk about the guilt. The specific, heavy kind that sits on your chest at 2 AM.

I am lucky. I work for a company that is incredibly empathetic. They support families. They understand flexibility. I am not fighting a toxic employer. I am fighting an internal narrative that tells me I should be able to do it all perfectly.

But let’s be clear: that narrative didn’t appear out of thin air. It was installed by a culture that equates human value with constant output. Even in a supportive environment, the societal pressure remains.

There are days when I close my laptop and feel I didn’t give enough to my team. Then I step away from my desk and feel I didn’t give enough to my kids.

We are constantly told, implicitly or explicitly, that if we just organized our time better, woke up earlier, or had more discipline, we could balance it all.

That is a lie.

We are living through a time where we are expected to work like we don’t have children, and parent like we don’t have jobs. That isn’t a puzzle you can solve if you just try harder. It is simply impossible.

Permission to Be Human

If you are reading this and feeling that same shame, I want to tell you something I wish someone had told me six months ago: You are not wrong.

You are not broken. You are not “bad at this.”

We have been taught that vulnerability is a weakness, that as professionals, and especially as men, we need to present a facade of calm competence. We hide the chaos. We apologize when “life” interrupts the background blur. We treat our exhaustion like a shameful secret.

But the exhaustion isn’t a sign of failure. It is the only logical response to the situation.

The shame you feel? That doesn’t belong to you. That belongs to a culture that demands the impossible and then blames you when you can’t deliver.

We need to stop hiding the cracks in the armor. We need to stop pretending that we are untouched by the chaos. Real strength isn’t about carrying the weight without stumbling; it’s about admitting that the weight is too heavy to carry alone. It is about saying, “I am struggling,” and realizing that this admission doesn’t make you less of a leader,

It makes you human.

The Privilege of Vulnerability

I just argued that we should stop hiding the cracks in our armor. But I need to be honest about why it is safe for me to drop my shield.

I am a white man with a supportive partner, a stable salary, and a company that treats me like a human being. I have every safety net money and social status can buy.

This mountain of privilege doesn’t just buy me safety; it buys me the permission to be human.

When I write about my chaos, I get applauded for my “authenticity.” When I admit to being overwhelmed, I am seen as a “relatable leader.” I can wear my vulnerability like a badge of honor because my competence is rarely questioned.

But I know that for many, especially women and mothers, showing those same cracks isn’t a badge of honor. It’s a professional liability.

Women have been carrying this load (and often a much heavier one) for generations. Yet, when they show the strain, society doesn’t offer them the same “permission slip” it hands to me. It offers judgment. It perceives their exhaustion not as a systemic failure, but as a lack of commitment.

So while I am proud to share my struggle, I am acutely aware that the ability to do so without fear of professional penalty is, in itself, the ultimate privilege.

To the men reading this: We possess the “political capital” to change this narrative, and we have a duty to spend it. We need to be the ones to break the facade. When we say, “I can’t make that 5 PM meeting, I have childcare duties,” we create a blast radius of safety for everyone else who is terrified to ask for what they need.

We have to stop pretending everything is fine just because we are scraping by.

You Are Doing Enough

If you are just surviving right now, you are doing enough. The work will always be there. The deadlines can move. But you, and your family, are the only version that exists.

This season of life is relentless, but it is also finite. Don’t measure your worth by your productivity during a crisis. Measure it by your ability to be kind to yourself when the world demands you be a machine.

We need to stop hiding the scars and start supporting the people. We need to build a culture where “I am tired” is a valid status update, and where asking for help is recognized as an act of courage, not a confession of weakness.

Resilience isn’t about never breaking; it’s about knowing that you shouldn’t have to carry the weight alone.

I am still navigating this myself. I don’t have a roadmap for how to be the perfect parent in this chaos. I am also actively trying to be a better ally to listen to the experiences I don’t share and to speak up when silence would be easier. I’m just trying to draw the map as I go. But I know that navigating it alone is the hardest way to travel.

If this resonated with you, I’d love to hear your story. Whether you want to vent about the impossible schedule, share a small win, or just tell me how you’re keeping the lights on, my inbox is open. We can’t fix the whole system today, but we can start by making sure no one has to debug this mess alone.

Originally published on pixari.dev