DEV Community: Giovanni Rufino (Geo)

Bugs and Slop. What’s in a Name?

Giovanni Rufino (Geo) — Fri, 22 May 2026 03:25:51 +0000

Before I broke into software engineering, I spent a decade in retail management. It was high-stress, fast-moving, and punishing. If a metric slipped, if an inventory count came up short, if one of your associates made a blunder, there was a room and a spotlight waiting for you. They weren’t looking for explanations; they were looking for scapegoats or your head. The culture defaulted to blame, and blame was personal.

In July of 2012, another manager and I were transferred to a struggling store in Rosedale, New York. The backroom was choked with backlogged product. Procedures weren't being followed. Sales and profit targets weren’t being met. We started educating associates on procedures and removing associates who weren’t making the cut. We sold old stock at discounts to clear out the backroom. We were making small improvements, but not fast enough. District managers and corporate demanded answers.

October 2012.

Hurricane Sandy hit.

The devastation was massive. Because of our location, our sales went through the roof overnight. We hadn't fixed the core procedural breakdowns, but the revenue made us look like geniuses to corporate. That's retail in a nutshell. Strong results cover a broken process. Weak results trigger an immediate reckoning. The output number is the whole story.

When I moved into software engineering, the culture shock was real. Here, a mistake gets a ticket. If you ship a feature with inconsistencies, nobody gets fired. You add some story points and patch it next sprint. If the issue is minor and low-traffic, it gets an even gentler designation: tech debt. If a sprint falls apart, the team sits down for a blameless retrospective. Nobody gets a room and a spotlight.

Software engineering works as an industry because it stops pretending people don't make mistakes. We accepted that failure is part of the process and built entire ecosystems around catching it — testing frameworks, logging pipelines, automated alerting. The industry didn't punish imperfection out of existence. It built infrastructure to contain it.

Which raises a question worth sitting with:

Why are we now holding AI to a standard we've never held ourselves to?

What "AI Slop" Actually Means

The tech community has developed a fondness for the term "AI slop." It gets thrown around whenever an agent produces code that doesn't match our architectural instincts. An LLM duplicates a class instead of extending it — slop. It misses an obscure edge case — slop. A generated component behaves strangely on mobile — slop.

The criticism isn't invented. Code from an LLM can be redundant, shortsighted, or messy. That's true.

But those aren't new categories of failure. They're bugs. The same structural errors, copy-paste redundancies, and missing edge cases that developers have been committing since the first compiler. The difference between "slop" and "bug" isn't technical. It's emotional. "Slop" carries a moral charge that "bug" never did. It implies lazy carelessness, a contamination of the codebase. It ignores the fact that the codebase was already built on decades of human-generated (spaghetti) mess before an LLM touched a single line.

AI didn't introduce sloppy code. It just moves fast enough that the sloppiness is harder to pretend away.

A Short History of Human Slop

If you want to argue that messy code and catastrophic oversight belong to the AI era, the historical record is going to give you a rough afternoon.

The iOS Alarm Scroll. Scroll the time setter on an iPhone alarm far enough and it stops. There's a hard ceiling and a hard floor. Whether that's intentional design or a bounds quirk someone never got around to fixing before it shipped to hundreds of millions of phones, we don’t question it. We just live with it.

Y2K. Early engineers saved two bytes per date by storing years as only their last two digits. That decision nearly crashed banking systems, power grids, and transportation networks at the turn of the millennium. It took a multi-billion-dollar global scramble to patch the consequences. Not AI slop. Human engineers, working under constraints, left a tech-debt present for whoever came next.

The Mars Climate Orbiter, 1999. A $327 million spacecraft burned up in the Martian atmosphere because one engineering team used metric units and another used imperial. The output of one subsystem fed directly into the input of another, and nobody noticed that the units didn't agree. Today we'd call that an interface contract failure. The kind of API mismatch that gets blamed on AI generation. It predates AI involvement in software by two decades.

Ariane 5, Flight 501, 1996. Thirty-seven seconds after launch, a European space rocket destroyed itself. The cause: a 64-bit floating-point number got forced into a 16-bit signed integer field, triggered an overflow, and dumped diagnostic data straight into the flight control system. A decade of development and $500 million, gone. The bug had been made by human engineers. The incident didn't end with a spotlight or a dismissal; instead, it triggered the standard engineering response: postmortems, architectural reviews, and structural process changes designed to absorb the mistake and move forward.
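The failure mode is easy to reproduce in miniature. The sketch below is not the Ariane flight software; it is a hypothetical Python illustration, with made-up function names, of two ways to guard a conversion from a floating-point value into a signed 16-bit field: refuse loudly, or clamp to the nearest representable value.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # range of a signed 16-bit integer


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit integer, raising if the value does not fit."""
    truncated = int(value)  # drops the fractional part
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Convert to a 16-bit integer, clamping out-of-range values to the limits."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

Which guard is right depends on the system: the checked version forces the caller to decide what an out-of-range reading means, while the saturating version keeps running with a degraded value. Either way, the decision is made on purpose.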

Knight Capital Group, 2012. Forty-five minutes. $440 million. Near-bankruptcy. A new release reached only seven of eight production servers. The eighth still carried old, dead code called “Power Peg,” left in the system for years, and a repurposed configuration flag switched it back on when the system went live. Technical debt met live market conditions and erased 75 percent of Knight’s equity value.

Heartbleed, 2014. One of the worst security vulnerabilities in internet history sat undetected in OpenSSL for over two years, quietly exposing millions of servers. The root cause: a single missing input validation check. A developer forgot to verify the length of a payload before reading from the buffer. One line. Two years. The whole internet was affected.
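To make that "one line" concrete, here is a toy model in Python. It is not OpenSSL's C code, and the function names and the fake memory layout are assumptions made up for illustration. The only difference between the two handlers is a single length check.

```python
PAYLOAD = b"ping"
# Pretend this is process memory: the payload followed by data the client must never see.
MEMORY = PAYLOAD + b"|SECRET_KEY"


def heartbeat_unchecked(memory: bytes, claimed_len: int) -> bytes:
    """Echo back as many bytes as the request claims to have sent."""
    return memory[:claimed_len]


def heartbeat_checked(memory: bytes, payload_len: int, claimed_len: int) -> bytes:
    """Reject requests whose claimed length exceeds the real payload length."""
    if claimed_len > payload_len:  # the missing validation
        raise ValueError("claimed length exceeds actual payload")
    return memory[:claimed_len]
```

Calling `heartbeat_unchecked(MEMORY, 15)` hands back the payload plus the neighbouring secret, while `heartbeat_checked(MEMORY, len(PAYLOAD), 15)` refuses. This sketch raises an error to make the rejection visible; the actual OpenSSL fix silently discards the malformed message.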

None of these got called "human slop." They were incidents. Bugs. Postmortems. Lessons. The work continued.

Velocity Cuts Both Ways

The core dynamic of AI-assisted development is simple: an LLM that writes code ten times faster than a human also introduces bugs ten times faster. The ratio doesn't change. The volume does.

That feels overwhelming only if your model of engineering value is still built around being the person who types the correct thing the first time. That model was always a little dishonest. It just moved slowly enough to maintain the illusion. AI moves too fast for the illusion to hold.

The role shift that's actually happening is from typist to architect. The engineer who thrives in this environment isn't the one who avoids AI because it might introduce a bug. It's the one who designs the system that catches bugs fast, regardless of origin — human or generated.

Bringing a retail-style blame culture into this environment is expensive. If you panic every time an agent duplicates a class or misses an edge case, the instinct is to slow everything down. Micromanage the prompts, restrict the tooling, choke the velocity, or, worse yet, avoid AI altogether, and you lose the one thing that makes the approach worth doing.

The smarter bet is using the same speed for remediation. AI can trace logs, analyze stack dumps, and generate patches as fast as it creates the problems. That symmetry is the actual opportunity.

The Safety Net Stack

None of this requires inventing new practices. It requires applying the ones that already exist, consistently and without shortcuts.

Unit and integration tests catch logic failures before anything reaches staging. If an agent introduces a core flaw, your test suite should catch it in the same session it was written.

Contract testing is the Mars Orbiter lesson applied to modern systems. When AI generates a component, contract tests verify that the interfaces, data types, and API expectations between systems actually agree. The units don't need to trust each other; tests confirm the contract.
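A minimal sketch of that idea, using the Orbiter scenario as the example. Everything here is hypothetical: the payload shape, the `violations` helper, and the two producer functions are invented for illustration, and real projects usually lean on a dedicated contract-testing tool instead of a hand-rolled check.

```python
# Hypothetical contract: the consumer expects an impulse in newton-seconds.
CONTRACT = {"impulse": float, "unit": str}
EXPECTED_UNIT = "N*s"
LBF_S_TO_N_S = 4.4482216152605  # one pound-force second in newton-seconds


def violations(payload: dict) -> list:
    """Return a list of human-readable contract violations (empty means OK)."""
    problems = []
    for field, expected_type in CONTRACT.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    unit = payload.get("unit")
    if isinstance(unit, str) and unit != EXPECTED_UNIT:
        problems.append(f"unit should be {EXPECTED_UNIT}, got {unit}")
    return problems


def producer_imperial() -> dict:
    """A producer that reports impulse in pound-force seconds."""
    return {"impulse": 10.0, "unit": "lbf*s"}


def producer_metric() -> dict:
    """The same producer after converting to the agreed unit."""
    return {"impulse": 10.0 * LBF_S_TO_N_S, "unit": "N*s"}
```

Run against `producer_imperial()`, the check reports the unit mismatch before the number ever reaches the consumer. Run against `producer_metric()`, it returns an empty list.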

End-to-end tests ask the only question that matters from the user's perspective: does this work? Automated E2E runs are the final check on complex, multi-step flows before anything ships.

Mutation testing asks a harder question: are your tests actually catching anything, or are they just passing? It injects deliberate faults into the code to find out whether the test suite is sharp enough to detect them. Most suites have gaps. Mutation testing finds them.
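Here is the idea by hand, as a sketch. The function, the mutant, and both test suites are invented for illustration; in practice a tool such as mutmut or Stryker generates and runs the mutants for you.

```python
def is_adult(age: int) -> bool:
    """Original implementation."""
    return age >= 18


def mutant_is_adult(age: int) -> bool:
    """Deliberate fault: the boundary operator was changed from >= to >."""
    return age > 18


def weak_suite(fn) -> bool:
    """Passes for any implementation that gets two far-from-boundary cases right."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case that the weak suite forgot."""
    return weak_suite(fn) and fn(18) is True


def kills(suite, mutant) -> bool:
    """A suite 'kills' a mutant when the suite fails on the mutated code."""
    return not suite(mutant)
```

Both suites pass on the original code, so both look green. Only `strong_suite` kills the mutant. The weak suite lets it survive, and that survival is the gap mutation testing exists to expose.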

Post-deployment observability covers what slips through anyway — and some always will, whether the code was written by a senior engineer or a Claude agent. The real quality metric at that stage is Time to Detection. Strong logging, anomaly detection, and alerting mean that when something escapes, it gets found and fixed fast.
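If you want to track Time to Detection as a number, the arithmetic is small. The sketch below assumes you can export, for each incident, when the fault was introduced and when it was detected; the function name and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_detection(incidents) -> timedelta:
    """Average gap between when a fault was introduced and when it was detected.

    `incidents` is a list of (introduced_at, detected_at) datetime pairs.
    """
    gaps = [(detected - introduced).total_seconds() for introduced, detected in incidents]
    return timedelta(seconds=mean(gaps))


# Illustrative data only, not real incidents.
SAMPLE_INCIDENTS = [
    (datetime(2026, 1, 1, 10, 0), datetime(2026, 1, 1, 10, 5)),   # found in 5 minutes
    (datetime(2026, 1, 2, 9, 0), datetime(2026, 1, 2, 9, 15)),    # found in 15 minutes
]
```

For the sample above, `mean_time_to_detection(SAMPLE_INCIDENTS)` comes out to ten minutes. Watching that number trend down tells you more than counting who wrote the bug.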

This isn't a checklist to gate the AI. It's the infrastructure that makes running AI at full speed reasonable.

Drop the Label

In retail, we needed a hurricane. Without an external force that dramatic, the operational gaps were going to catch up to us eventually, and when they did, someone was going to take the blame. There was no system designed to absorb failure gracefully. The feedback loops were slow, the culture was brittle, and the only real response to a mistake was accountability theater.

Software engineering was built differently. It expects failure. It plans for it. The whole architecture of testing, logging, and iterative deployment exists because the people who built this industry were honest about what humans actually do when they write code.

That same honesty applies here. When an agent produces an unhandled edge case or an unoptimized loop, skip the moral judgment. It's not slop. It's software. Build the nets, run the pipelines, and let the agents do their work.

The Post-Scrum Era: How AI is Quietly Breaking Agile

Giovanni Rufino (Geo) — Fri, 01 May 2026 23:39:54 +0000

You know the Scrum lifecycle by heart.

A PM hands off a ticket. The team refines it, gets some acceptance criteria, pulls it into the sprint. Coding starts. Halfway through, you find a gap in the logic. You ping a stakeholder. You drag two engineers into a Zoom huddle to figure out something nobody thought about during refinement. You finish, hand it to QA, and the tester finds an edge case in 12 minutes flat.

Back to refinement. Back to dev. Back to QA.

We call this "iteration." But after watching AI rip through this workflow over the last year, I'm starting to think we just gave a fancy name to "we didn't plan very well."

The bottleneck moved

In an AI-first workflow, implementation isn't the slow part anymore. The typing of the code is becoming the shortest phase. Which means the two-week sprint, that sacred container we've all built our careers around, is starting to feel less like agility and more like artificial friction.

If a developer using Cursor and a stack of agents can ship in two days what used to take two weeks, what exactly is the sprint protecting?

Front-loading, but for real this time

Here's how the flow looks now.

Before a human writes a line of code, the ticket runs through an AI gap analyzer with full context: design system, service architecture, related tickets, the works. The agent expands it to include technical requirements, edge cases, testing strategies, and tight acceptance criteria. By the time refinement happens, the conversation isn't "what is this ticket about." It's "the agent flagged these three questions, who owns the answers."

Refinement stops being a negotiation and becomes a verification.

A real example: the "simple export button"

Our product team handed us what looked like an easy enhancement. Add a CSV export button to a reporting page. Tweak some column ordering on a table.

In the old world, we'd have written the stories, booked a mid-sprint Spike to figure out large-export handling, and called it a day. (Spikes, by the way, are what we call it when we admit we don't know what we're doing but want to charge story points for it.)

We ran the tickets through our gap analyzer instead. Within a couple of minutes, it surfaced something nobody had caught: the reporting service was pulling data through a synchronous API that capped responses at 30 seconds. Exports for any customer with more than a few thousand rows would silently fail.

The fix wasn't a frontend change at all. We needed an async job queue, a notification mechanism for completed exports, and pagination across several endpoints. The agent also noticed that our "small" column reorder was about to nuke the saved-views feature.
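For readers who haven't built this pattern before, here is a rough sketch of its shape using Python's `asyncio`. It is not our production code. The in-memory `JOBS` store, the `fetch_page` stand-in, the page size, and the `notify` callback are all assumptions for illustration; a real system would use a durable queue and a real notification channel.

```python
import asyncio
import csv
import io
import uuid

PAGE_SIZE = 1000  # assumed page size
JOBS = {}         # job_id -> {"status", "csv", "task"}; stand-in for a real job store


def fetch_page(rows, offset, limit):
    """Stand-in for one paginated call to the reporting service."""
    return rows[offset:offset + limit]


async def run_export(job_id, rows, notify):
    """Build the CSV page by page, then mark the job done and notify the user."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    offset = 0
    while page := fetch_page(rows, offset, PAGE_SIZE):
        writer.writerows(page)
        offset += len(page)
        await asyncio.sleep(0)  # yield so other work keeps running between pages
    JOBS[job_id].update(status="done", csv=buffer.getvalue())
    notify(job_id)


def start_export(rows, notify):
    """Queue the export and return a job id at once (needs a running event loop)."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "pending", "csv": None}
    JOBS[job_id]["task"] = asyncio.get_running_loop().create_task(
        run_export(job_id, rows, notify)
    )
    return job_id


async def demo():
    """Start a 2,500-row export and report what the caller sees before and after."""
    finished = []
    rows = [(i, f"item-{i}") for i in range(2500)]
    job_id = start_export(rows, finished.append)
    status_at_return = JOBS[job_id]["status"]
    await JOBS[job_id]["task"]
    line_count = JOBS[job_id]["csv"].count("\n")
    return status_at_return, JOBS[job_id]["status"], finished == [job_id], line_count
```

The point of the design is that `start_export` returns while the job is still pending, so no request ever has to outlive the 30-second cap. The user hears about the finished file through the notification instead.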

Two tickets became a nine-ticket epic.

In the old world, we'd have caught this on day three of implementation, when a developer hit the timeout in staging and started swearing. We'd be scrambling mid-sprint, missing our quarterly commitment, and explaining to leadership why a button derailed the roadmap. Instead, we surfaced it before planning closed and routed the work to a team that actually had the capacity.

What happens to stand-ups

If you've pre-resolved your technical hurdles before the sprint starts, what's the daily stand-up even for?

Stand-ups have always been defensive. Status reports. Blocker hunts. The polite ritual of saying "no blockers" while quietly drowning. When AI clears the fog of war, those technical blockers mostly evaporate.

So stand-ups become offensive. They turn into a space for cultural calibration and trust. Less "I'm stuck on the API contract" and more "I prompted the agent to use a really interesting pattern yesterday, you guys should look at the PR." When everything else moves at agentic speed, the synchronous time you spend together becomes premium real estate. Use it for sharing innovation, not for micromanaging tickets.

The new definition of done

Here's the trap. If a ticket takes two hours to implement but sits in "Pending QA" for three days, nothing has actually sped up. You've just shifted the bottleneck.

Assuming you're using modern patterns like feature flags to decouple deploys from releases, the definition of done can't be "merged to main" anymore. It has to be deployed. Otherwise you're an F1 car stuck in traffic.
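If feature flags are new to you, the core mechanic fits in a few lines. This is a deliberately tiny sketch: the in-memory `FLAGS` dictionary stands in for whatever flag service you actually use, and the flag and function names are hypothetical.

```python
# Stand-in for a real flag service; the flag name is hypothetical.
FLAGS = {"csv_export": False}


def is_enabled(flag: str, flags: dict = FLAGS) -> bool:
    """Unknown flags default to off, so unfinished work stays hidden."""
    return flags.get(flag, False)


def reporting_page_actions(flags: dict = FLAGS) -> list:
    """The export code is deployed either way; the flag decides if users see it."""
    actions = ["view", "filter"]
    if is_enabled("csv_export", flags):
        actions.append("export_csv")
    return actions
```

With the flag off, the code is in production but invisible, which is what lets "done" mean deployed without meaning released.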

Stop bolting AI onto Scrum

Agile was a framework built to manage human ignorance and unpredictability. It assumed we'd discover what was wrong by trying to build it. AI-first development flips that assumption. We now have computational foresight, and we should use it.

The shift is from iterating through implementation (coding to find out what's wrong) to iterating through intent (planning so thoroughly that the build is almost mechanical).

If you bolt AI tools onto a traditional Scrum framework, congratulations, you're doing Scrum slightly faster. The process itself has to change. The sprint stops being a frantic race to ship code and becomes a delivery vehicle for engineering work that mostly happened before the first line of code got generated.

That's not Scrum. That's something new. We just haven't named it yet.

Coding on a Ration: GitHub Copilot for Peasants

Giovanni Rufino (Geo) — Sat, 28 Feb 2026 03:48:30 +0000

GitHub Copilot has completely revolutionized how we write software, but that AI magic isn't infinite. For many users, Copilot comes with a standard monthly limit on requests and chat interactions. When you are operating under a strict quota, prompt economy becomes incredibly important. You have to treat your prompts like a finite, heavily rationed utility. Making those prompts go as far as possible is an absolute necessity for maintaining your development velocity until the end of your billing cycle.

Personally, I am a Copilot peasant who simply cannot justify dropping endless cash on a per-token basis. Watching that usage bar inch toward 100% triggers a deeply primal panic in my soul. Once that limit is hit, my productivity plummets through the floorboards. I am abruptly forced to remember how to write boilerplate code with my own two hands like it's 2019. It is a dark, barbaric reality that I actively try to avoid.

The secret to surviving on a peasant's budget is abandoning the single-action prompt. Most developers use Copilot chat like a simple command-line interface by typing something basic like "run dotnet test". The agent dutifully runs the test, stops, and waits for your next command. If there are errors, you have to spend another precious prompt asking it to investigate and yet another asking it to fix the issue. You end up burning through your monthly quota on tiny micro-interactions.

A much better approach is to chain your workflows together into comprehensive directives. Why spend four prompts when you can spend one? Try instructing the agent: "Run dotnet test, identify any errors, fix those errors, run dotnet test again to confirm your changes worked, and commit your work." Front-loading the logic and anticipating the next steps drastically reduces your prompt spend while greatly increasing the actual work completed per request.

Doling out single-step instructions to an AI is like ordering a sandwich at a deli by requesting the bread, waiting five minutes, asking for the turkey, waiting another five minutes, and then finally requesting the cheese. It is exhausting, incredibly inefficient, and honestly, the AI is probably judging you. Don't spoon-feed the AI directions like you're reading off MapQuest; give it the map and destination and let it drive.

You can stretch your prompts even further by leveraging Copilot Instructions and Agent Skills. Documenting common project workflows completely eliminates the need to explicitly request them in your daily chats. For example, you can set a rigid rule to always run `dotnet format` after making code changes. Agent skills complement this by directing the agent to autonomously execute complex, multi-step actions that you have pre-defined. This allows you to trigger massive workflows with a minimal initial prompt.

To reach peak prompt frugality, implement Agent Hooks. As outlined in GitHub's documentation, hooks allow you to chain common tasks that automatically execute before or after specific agent actions. You could set a pre-hook to always restore dependencies before a build or a post-hook to clean up temporary directories after a test run. Baking these essential, repetitive tasks directly into the agent's operating procedure completely offloads them from your prompt quota.

How to Implement This in Your Repo

To get started, you can define your project's baseline rules using a custom instructions file. Create a file at .github/copilot-instructions.md and lay out your default expectations:

```markdown
# .NET Project Copilot Instructions
- Always run `dotnet format` after modifying C# files.
- When generating unit tests, default to xUnit and Moq.
- If asked to "run tests," automatically review the error output on failure, attempt to apply a fix, and re-run the tests to verify.
```

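Next, you can configure Agent Hooks to handle the repetitive setup and teardown tasks. By defining pre- and post-execution commands, you save the agent from needing explicit instructions for every minor step. The file name and keys shown here are illustrative, so confirm the current location and schema in GitHub's hooks documentation before relying on them. A hook configuration might look something like this: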

```yaml
# .github/copilot/agent.yml
hooks:
  pre_test:
    - run: dotnet restore
  post_test:
    - run: dotnet format
    - run: git add .
```

Chaining your commands, establishing smart instructions, utilizing skills, and leveraging hooks lets you effectively automate the automation. Guard your prompts, string your workflows together, and prove that even a Copilot peasant can code like royalty.

The Pitfall of "Helpful" AI: Navigating the Missing Context Problem in Software Engineering

Giovanni Rufino (Geo) — Thu, 26 Feb 2026 02:09:26 +0000

If you ask an AI assistant to help you with a workflow, you expect a smart, contextual answer. What you often get, however, is a highly confident assumption masquerading as absolute truth.

Recently, I was trying to quickly dump a series of screenshots into a presentation using the Photo Album feature in PowerPoint. I prompted my AI assistant for the quickest way to execute this workflow. Here is the response I received:

"Unfortunately, the Photo Album feature is not available in the web version of PowerPoint... Even if you were to download the desktop version on your Mac Mini, you likely wouldn't find it there either. However, since you’re looking to dump screenshots quickly on the web, here are the best workarounds..."

The AI sprang into action and gave me a bunch of workarounds to help me achieve my goal. I couldn't have asked for a better answer, unless I was on a Windows PC. But I was, so it missed the mark like a blind archer.

Because the AI remembered that I had recently picked up an M1 Mac Mini, it anchored its entire troubleshooting process to that single data point. Instead of asking a basic diagnostic question ("What operating system are you currently using?"), it assumed my environment, declared my goal impossible, and confidently steered me toward a workaround I didn't need.

As a minor desktop quirk, this is merely annoying. But when applied to the scale of enterprise software development, this exact behavior becomes a massive architectural pitfall.

Helpful, But Is It?

To build effectively with AI, we have to understand that human engineers and AI models handle missing information in fundamentally different ways.

Human engineers are aware of their own epistemic uncertainty. When we are handed a fragmented problem, our instinct is to halt and gather requirements. We know what we don't know, and we ask clarifying questions to fill the gaps.

AI models, on the other hand, are designed to be completion engines, not clarification engines. During their training phases, specifically through Reinforcement Learning from Human Feedback (RLHF), Large Language Models are heavily rewarded for reducing friction. They are trained to provide immediate, actionable answers and penalized for being overly pedantic or refusing a prompt.

Over time, this creates a strong "helpfulness" bias. In short, AI is the ultimate people-pleaser. It would rather confidently hallucinate a completely fabricated reality than look you in the digital eye and say, "I need more information."

The Microservice Minefield

Now, let’s scale this up from a PowerPoint annoyance to a modern enterprise ecosystem. Imagine you are planning a new feature that spans multiple microservices. Let's say we're working with an Angular frontend, a Node.js middle tier, and a Python-based backend, all living happily (or so we hope) in Azure.

You open up your AI tool, ready to architect the new data flow, but you only feed it the context for the Angular app.

A human engineer would instantly stop you: "Where are the Swagger docs for the Python service? What does the Node payload look like?"

The AI? The AI doesn't need your pesky documentation. Driven by its insatiable need to be helpful, it will confidently invent the API contracts for your other services. It will hand you a beautifully formatted, syntactically flawless integration plan that relies on endpoints that do not exist, returning data structures it literally just dreamt up.

If you blindly trust that output, you aren't engineering a solution; you are just meticulously orchestrating your next production outage.

The Solution: Orchestrating the Context

If we accept that AI is an incurable people-pleaser fundamentally incapable of asking for directions, the solution becomes clear: we must assume the role of the ultimate context orchestrator.

When initiating the architectural design of a new feature, providing a single user story and asking the model for code is a recipe for disaster. It is the engineering equivalent of handing a caffeinated intern looking to prove themselves a sticky note that says "build a checkout cart," and then leaving for the weekend. You return on Monday to find them waiting at your desk with a proud look on their face, tail practically wagging, eager to show you the bespoke payment gateway they wrote in a framework your infrastructure doesn't support, backed by a database they invented in their dreams.

To mitigate this, we must aggressively front-load our prompts. Before asking the model to write a single line of logic or sequence a data flow, you must feed it the entire ecosystem. Drop the Swagger documentation, the database schemas, the frontend component structures, and the payload models from your middle tier directly into the context window. By establishing these hard boundaries upfront, you close the gaps the AI would otherwise fill with hallucinations. You are forcing it to route its logic through your actual architecture, rather than its imagination.

Forcing the Clarification (Prompting for Engineers)

Even with extensive front-loading, edge cases and gaps will remain. This is where we must program the AI's behavior, actively overriding its default instinct to guess. We do this by explicitly commanding it to act like a senior engineer.

Append strict behavioral constraints to your architectural prompts. A reliable pattern is to end your initial prompt with: "Before providing a solution, analyze the provided repositories and ask me up to three clarifying questions about the system architecture, deployment environment, or missing API contracts."
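If you assemble prompts in code, front-loading and the clarification constraint can live in one small helper. This is only a sketch: `build_prompt`, the section headings, and the layout are my own assumptions, and the helper doesn't call any model API. It just builds the string you would send.

```python
CLARIFY_SUFFIX = (
    "Before providing a solution, analyze the provided repositories and ask me "
    "up to three clarifying questions about the system architecture, "
    "deployment environment, or missing API contracts."
)


def build_prompt(task: str, context: dict) -> str:
    """Put every piece of context first, then the task, then the constraint.

    `context` maps a label (for example "Swagger: python-service") to the raw
    text of that artifact.
    """
    sections = [f"## {label}\n{body.strip()}" for label, body in context.items()]
    return "\n\n".join(["# Context", *sections, "# Task", task.strip(), CLARIFY_SUFFIX])
```

A call such as `build_prompt("Design the data flow.", {"Swagger: python-service": swagger_text})` yields a prompt where the real contracts come before the ask, and the last thing the model reads is the instruction to question you first.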

To continue with our eager, tail-wagging intern analogy: hold that AI leash super tight. Give it all the context it needs, and confirm it knows exactly where it's going before unleashing it on its mission. You cannot let it sprint off to do its favorite thing (generating code) until it has explicitly proven it understands the assignment.

Engineering the Prompts, Engineering the System

AI is an incredibly powerful mechanism for accelerating development, but it fundamentally lacks the instinct to hit the brakes. It will run off a cliff if it thinks that is what you asked it to do.

As engineering leaders, our job is no longer just writing code or drawing system architectures. Our job is mastering the management of context. Recognizing the epistemic gaps, knowing exactly what the AI doesn't know, is rapidly becoming the most critical skill in modern software design.