DEV Community: fjavierm

Compliance Is Becoming a Software Engineering Problem

fjavierm — Tue, 23 Jun 2026 11:08:10 +0000

I try to spend as much time as I can talking to software engineers – some I work with, others I meet through meetups or conferences. Over time, I’ve started to notice something curious.

Most are absolute experts at their craft. I am constantly amazed by the mountain of information and acronyms we carry in our minds without even realising it. But almost all of it revolves around things we find interesting, immediate problems we need to solve, or the “new shiny thing”. Rarely, when talking with engineers, do compliance or governance come up.

I totally get it. Compliance is universally perceived as less fun. But despite that reputation, regulatory shifts are happening right now that will directly affect how we build software, and we need to be aware of them.

Many engineers can elegantly explain Raft consensus, debate the merits of eBPF, or spend hours discussing the subtleties of eventual consistency. Mention a CVE and most of them will recognise the term, usually because they encounter it through scanner reports, Dependabot alerts, Renovate PRs, or Jira tickets. Go a step further and mention a CWE or a CRE, and familiarity drops dramatically, even among experienced developers.

For many engineers, vulnerability identifiers belong to security teams, auditors, or compliance departments. They are things that appear in scanner reports, Jira tickets, Dependabot alerts, or Renovate PRs. They are somebody else’s problem.

That perception is becoming increasingly difficult to sustain.

Modern software development is inseparable from security. Every application is built atop layers of frameworks, libraries, containers, operating systems, and cloud services. Vulnerabilities emerge continuously throughout that supply chain. More importantly, governments and regulators have started treating vulnerability management not as a polite recommendation, but as a legal obligation.

In Europe, the Cyber Resilience Act (CRA) marks a massive shift in thinking. Beginning in September 2026, manufacturers of digital products face mandatory reporting obligations for actively exploited vulnerabilities and severe incidents. In the years that follow, broader security requirements will become legally enforceable, with penalties reaching millions of euros or a significant percentage of global turnover.

We can no longer live in a restricted world where our sole purpose is to resolve problems in clever, performant ways. Understanding vulnerabilities is no longer exclusively the domain of security specialists – it is becoming a core part of software engineering itself.

After years of reading literature around shift-left, DevOps, DevSecOps, and SRE, I used to assume everyone was on the same page. I’ve since realised that was just my personal bias as a cybersecurity hobbyist.

Before we can dive into the legislation that will soon affect us, we need to understand the vocabulary that underpins modern vulnerability management.

Common Vulnerabilities and Exposures (CVE)

Imagine trying to coordinate a fix for a defect without a common naming system. One security researcher publishes a blog post describing a flaw in OpenSSL, a cloud provider releases an advisory using a different title, vulnerability scanners invent their own proprietary identifiers, and operating system vendors create yet another naming scheme. Chaos follows.

The CVE program exists to prevent precisely that. A CVE identifier answers a simple question: “Which specific vulnerability are we talking about?” For example, Heartbleed became CVE-2014-0160, and Log4Shell became CVE-2021-44228.

These identifiers create a shared language used by researchers, vendors, scanners, incident response teams, and governments. Once a vulnerability receives a CVE number, everyone can refer to exactly the same issue. However, CVEs describe symptoms, not causes. They tell us what is broken, but they don’t explain why.
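Because the identifier format is fixed (the prefix CVE, a four-digit year, and a sequence number of at least four digits), identifiers are easy to pull out of free text such as advisories or commit messages. A minimal sketch in Python; the function name `extract_cve_ids` and the sample advisory text are illustrative, not part of any tool:

```python
import re

# CVE IDs look like CVE-<year>-<sequence>, where the sequence has four or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b")


def extract_cve_ids(text: str) -> list[str]:
    """Return the distinct CVE identifiers mentioned in text, in order of appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for match in CVE_PATTERN.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


advisory = "Upgrade now: fixes CVE-2021-44228 (Log4Shell). See also CVE-2014-0160 and CVE-2021-44228."
print(extract_cve_ids(advisory))  # ['CVE-2021-44228', 'CVE-2014-0160']
```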

Common Weakness Enumeration (CWE)

A CWE represents a category of software weakness rather than a specific instance of a bug. For example, CWE-89 refers to SQL Injection, and CWE-79 represents Cross-Site Scripting.

Individual CVEs usually map, sometimes not without controversy, back to one or more CWEs. Log4Shell, for example, was ultimately traced back to unsafe lookup behaviour and improper handling of untrusted input. The specific vulnerability was unique, but the underlying engineering weakness had been known for decades.

Unfortunately, this is where the gap between vulnerability management and daily engineering lives. Software engineers rarely think in numeric identifiers; they think in architectural practices: input validation, output encoding, authentication, authorisation, and dependency management.

Common Requirement Enumeration (CRE)

The CRE attempts to close that exact gap. Where CVEs answer “what happened” and CWEs explain “why it happened”, CREs focus on the most practical question of all: “What should engineers do to prevent it from happening again?”

When you put the three frameworks together, they form a complete defensive picture:

A CVE describes a specific vulnerability.
A CWE describes the underlying weakness.
A CRE describes the engineering practices that reduce the probability of that weakness appearing in the first place.

Organisations trapped entirely at the CVE layer spend their lives reactively patching. Organisations that understand CWEs start eliminating recurring technical debt. But organisations that embrace CREs and secure engineering requirements begin preventing vulnerabilities before they ever exist – which is exactly what regulators are now expecting.
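To make the step from the CVE layer to the CWE layer more tangible, here is a small sketch that groups scanner findings by weakness class and surfaces the ones that keep recurring. Everything in it is an assumption for illustration: the `Finding` type, the finding IDs, the component names, and the practice descriptions are made up; only CWE-89 and CWE-79 are real identifiers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    finding_id: str  # hypothetical scanner finding, not a real CVE
    component: str
    cwe: str


# Illustrative mapping from weakness class to a preventive practice a team might adopt.
PRACTICES = {
    "CWE-89": "Use parameterised queries for all database access",
    "CWE-79": "Apply context-aware output encoding in templates",
}


def recurring_weaknesses(findings: list[Finding], threshold: int = 2) -> list[tuple[str, int, str]]:
    """Return (cwe, count, practice) for every weakness seen at least `threshold` times."""
    counts = Counter(f.cwe for f in findings)
    return [
        (cwe, count, PRACTICES.get(cwe, "No practice recorded yet"))
        for cwe, count in counts.most_common()
        if count >= threshold
    ]


findings = [
    Finding("F-1", "orders-api", "CWE-89"),
    Finding("F-2", "billing-api", "CWE-89"),
    Finding("F-3", "web-frontend", "CWE-79"),
    Finding("F-4", "reports-api", "CWE-89"),
]
for cwe, count, practice in recurring_weaknesses(findings):
    print(f"{cwe} seen {count} times -> {practice}")
```

With this sample data only CWE-89 crosses the threshold, which is the signal that one engineering practice, rather than three separate patches, is what needs to change.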

For decades, security failures were primarily treated as business risks. Companies suffered reputational damage, customers temporarily lost trust, and major breaches led to lawsuits or expensive remediation. But general regulatory intervention remained light.

That world is gone. The European Union’s Cyber Resilience Act is one of the most ambitious pieces of software legislation ever written. Its premise is straightforward: products containing digital components must be secure by design and maintained throughout their entire lifecycle.

Under this framework, non-compliance penalties can reach up to €15 million or 2.5% of global annual turnover, whichever is higher. While these numbers inevitably grab the attention of executives and legal departments, compliance needs to be solved by software engineers:

Lawyers cannot produce a Software Bill of Materials (SBOM).
Finance departments cannot determine if an exposed container image contains a vulnerable dependency.
Executives cannot decide whether a newly reported exploit affects production infrastructure.

Only engineering organisations possess that knowledge. Because of that, engineers must understand the language used by scanners, advisories, regulators, and vulnerability databases.
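As a rough idea of what that knowledge looks like in practice, the sketch below checks a tiny SBOM against a single advisory. The SBOM is hand-written in the style of CycloneDX JSON, and the advisory structure (a package name plus a set of exact affected versions) is a simplification invented for this example; real advisories express version ranges and real matching should be left to a proper scanner.

```python
import json

# A trimmed, hand-written SBOM in the style of CycloneDX JSON (illustrative data).
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "log4j-core", "version": "2.14.1"},
    {"type": "library", "name": "jackson-databind", "version": "2.17.0"}
  ]
}
"""

# Simplified, hypothetical advisory shape: one package and the exact versions it affects.
ADVISORY = {
    "id": "CVE-2021-44228",
    "package": "log4j-core",
    "affected_versions": {"2.14.0", "2.14.1"},
}


def affected_components(sbom: dict, advisory: dict) -> list[str]:
    """Return 'name@version' for every SBOM component the advisory lists as affected."""
    return [
        f"{c['name']}@{c['version']}"
        for c in sbom.get("components", [])
        if c.get("name") == advisory["package"]
        and c.get("version") in advisory["affected_versions"]
    ]


print(affected_components(json.loads(SBOM_JSON), ADVISORY))  # ['log4j-core@2.14.1']
```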

Additionally, with the arrival of advanced AI tools, discovering vulnerabilities is no longer the hard part. Automated scanners can identify thousands of CVEs in minutes. The actual challenges moving forward are entirely context-driven:

Which vulnerabilities actually matter to our architecture?
Which systems are genuinely exposed?
Which structural weaknesses keep recurring in our codebase?
Which engineering practices need to change?
Which incidents legally require regulatory reporting?

Organisations are no longer overwhelmed by an absence of information; they are overwhelmed by an abundance of it. A single application may depend on thousands of open-source packages, each potentially introducing risk. Security has transformed from a periodic, pre-release gate into a continuous operational discipline.

As engineers, we have to evolve with it. Understanding a CVE is no longer just a task for a security researcher. Vulnerability management is becoming a core part of everyday software engineering, and for many organisations operating in Europe, it will soon become a legal obligation.

Implementing Durable Execution

fjavierm — Sat, 16 May 2026 11:19:08 +0000

Reading and writing about the topics we are learning is great, but nothing beats a hands-on approach. So let’s build a couple of implementations of the pizza example described in yesterday’s article.

The first implementation uses the Temporal SDK, which lets us get more familiar with the details of Durable Execution and consolidate what we have been reviewing. In the second example, we will try to implement our own tiny, heavily reduced version of the whole thing.

First example: Using the Temporal SDK

The full implementation of this example can be found on GitHub in the repository pizza-durable-execution. The code and the repository have been heavily documented, which I think should be enough information to understand the example. But a quick overview is:

PizzaActivities: Activities are the side-effecting operations in Durable Execution.
PizzaActivitiesImpl: Concrete implementation of the activities.
PizzaOrderStarter: Starts a new pizza order workflow instance.
PizzaOrderWorkflow: The workflow interface defines the durable process contract.
PizzaOrderWorkflowImpl: The durable workflow implementation with pure orchestration, zero side effects.
PizzaWorker: The Worker process.

Once we run it, we should see something like:

The pizza worker


=================================================
 Pizza Worker started. Polling: pizza-order-queue
 Now run PizzaOrderStarter to place an order.
=================================================
10:50:09.222 [workflow-method-pizza-order-margherita-001-019e3031-78f6-7222-9627-e9f11a3367de] INFO d.b.pizza.PizzaOrderWorkflowImpl - [WORKFLOW] Starting pizza order for: margherita
10:50:09.247 [Activity Executor taskQueue="pizza-order-queue", namespace="default": 1] INFO d.b.pizza.PizzaActivitiesImpl - [ACTIVITY] Taking order for pizza: margherita
10:50:09.247 [Activity Executor taskQueue="pizza-order-queue", namespace="default": 1] INFO d.b.pizza.PizzaActivitiesImpl - [ACTIVITY] Order created → ORDER-MARGHERITA
10:50:09.256 [workflow-method-pizza-order-margherita-001-019e3031-78f6-7222-9627-e9f11a3367de] INFO d.b.pizza.PizzaOrderWorkflowImpl - [WORKFLOW] Order accepted → ORDER-MARGHERITA
10:50:09.259 [Activity Executor taskQueue="pizza-order-queue", namespace="default": 1] INFO d.b.pizza.PizzaActivitiesImpl - [ACTIVITY] Kitchen preparing pizza for order: ORDER-MARGHERITA
10:50:09.260 [Activity Executor taskQueue="pizza-order-queue", namespace="default": 1] INFO d.b.pizza.PizzaActivitiesImpl - [ACTIVITY] Pizza ready → PIZZA-ORDER-MARGHERITA
10:50:09.262 [workflow-method-pizza-order-margherita-001-019e3031-78f6-7222-9627-e9f11a3367de] INFO d.b.pizza.PizzaOrderWorkflowImpl - [WORKFLOW] Pizza prepared → PIZZA-ORDER-MARGHERITA
10:50:09.262 [workflow-method-pizza-order-margherita-001-019e3031-78f6-7222-9627-e9f11a3367de] INFO d.b.pizza.PizzaOrderWorkflowImpl - [WORKFLOW] Waiting 5 seconds for delivery window (durable timer)...
10:50:14.291 [Activity Executor taskQueue="pizza-order-queue", namespace="default": 1] INFO d.b.pizza.PizzaActivitiesImpl - [ACTIVITY] Dispatching delivery for: PIZZA-ORDER-MARGHERITA
10:50:14.292 [Activity Executor taskQueue="pizza-order-queue", namespace="default": 1] INFO d.b.pizza.PizzaActivitiesImpl - [ACTIVITY] Delivery confirmed → DELIVERED-PIZZA-ORDER-MARGHERITA
10:50:14.297 [workflow-method-pizza-order-margherita-001-019e3031-78f6-7222-9627-e9f11a3367de] INFO d.b.pizza.PizzaOrderWorkflowImpl - [WORKFLOW] Pizza delivered → DELIVERED-PIZZA-ORDER-MARGHERITA
10:50:14.301 [Activity Executor taskQueue="pizza-order-queue", namespace="default": 1] INFO d.b.pizza.PizzaActivitiesImpl - [ACTIVITY] Sending receipt for delivery: DELIVERED-PIZZA-ORDER-MARGHERITA
10:50:14.301 [Activity Executor taskQueue="pizza-order-queue", namespace="default": 1] INFO d.b.pizza.PizzaActivitiesImpl - [ACTIVITY] Receipt sent. Workflow complete.
10:50:14.305 [workflow-method-pizza-order-margherita-001-019e3031-78f6-7222-9627-e9f11a3367de] INFO d.b.pizza.PizzaOrderWorkflowImpl - [WORKFLOW] Workflow complete for order: ORDER-MARGHERITA

The pizza order starter


=================================================
 Starting pizza order workflow...
=================================================
=================================================
 Workflow finished. Pizza delivered!
=================================================

We can also check the Temporal UI to see our execution.

All necessary instructions for running it are present in the README of the project.

Second example: Implementing our own

The full implementation of this example can be found on GitHub in the repository mini-durable-execution-platform. The code and the repository have been heavily documented, which I think should be enough information to understand the example. But a quick overview is:

MiniTemporal: Single class containing the whole project.
PizzaWorkflow: Durable orchestration of a pizza order.
WorkflowContext: The replay engine: the heart of durable execution.

Once we run it, we should see something like:

The pizza worker


worker started
[EXECUTING] prepare-dough
[EXECUTING] add-toppings
[EXECUTING] bake-pizza
[EXECUTING] prepare-dough
[EXECUTING] add-toppings
[EXECUTING] bake-pizza
[EXECUTING] deliver-pizza

The pizza order starter


[WAITING] prepare-dough
[WAITING] add-toppings
[WAITING] bake-pizza

=== JVM CRASH SIMULATED ===

workflowId=d2a8dc13-e428-4bc2-996e-c721d71592f7
...

[WAITING] prepare-dough
[WAITING] add-toppings
[WAITING] bake-pizza
[WAITING] deliver-pizza

===== ORDER COMPLETED =====
dough=dough-ready
toppings=toppings-added
baked=pizza-baked
delivery=pizza-delivered

=== WORKFLOW COMPLETED ===

All necessary instructions for running it are present in the README of the project.

Durable Execution: The Runtime for Distributed Systems

fjavierm — Fri, 15 May 2026 18:15:43 +0000

Note: This article has two main sections. The first one is an abstract explanation of the Durable Execution concept. The second one is a simple workflow example to try to reduce the abstraction, and show a more realistic view to anchor the explanation. Depending on how you like to learn, feel free to read the explanation first, the example first, or even alternate between them while reading.

There is a quiet but important change happening in how we build software. It isn’t a sudden “revolution”, but rather a new way of thinking about how programs run across multiple servers. We call this Durable Execution.

Durable Execution could be described as a simple inversion of responsibility: instead of treating failure as something applications must anticipate and recover from, durable execution systems assume that failure is constant and design the execution model itself to survive it. This means we are no longer just coordinating work across services; we are starting to treat execution itself as a persistent, stateful entity.

Most modern backend systems are built on an architecture that, on paper, looks clean and composable. A request enters a system, an orchestrator decomposes it into tasks, and a fleet of stateless workers executes those tasks independently. For example, when you click “buy” on a website, a request goes to a server, which then talks to a database, a payment processor, a storage system, and a shipping service, among other things. In this setup, each service is responsible for doing one thing well, and persistence is delegated to databases and queues.

This model scales remarkably well in terms of throughput and organisational clarity. It is the backbone of microservices architecture. But as systems grow in complexity, something inevitable and subtle happens: workflows begin to leak across boundaries.

A “simple” business process such as processing a payment, or fulfilling an order, quietly evolves into a distributed orchestration of services. Each step is straightforward in isolation, yet the overall process becomes fragile not because any single component is complex, but because no single component owns the lifecycle of the workflow itself. The orchestration of these systems eventually involves:

retry policies embedded in clients and workers
state stored in databases with evolving schemas
queues that act as implicit progress trackers
compensating logic scattered across services
and operational heuristics encoded in dashboards and alerts

Eventually, the boundaries of the workflow start blurring. The workflow ceases to exist as a distinct entity and becomes highly entangled with the infrastructure. This is the root tension that durable execution tries to address.

To understand the shift, it helps to contrast two models of thinking about distributed systems.

In the traditional worker-based architecture, the system guarantees delivery of work. A message will eventually reach a worker. A job will eventually be retried. A task will eventually be processed. A completed or failed result will be published. But what is not guaranteed is the continuity of execution. If a process begins, partially completes, and then fails mid-way, nothing in the infrastructure inherently remembers what step it was on or what should happen next. That responsibility is pushed upward into application code and external state stores.

Durable execution flips this assumption: instead of treating an execution as ephemeral, it treats it as a persistent object. A workflow is not something that is “run”; it is something that exists over time. It has a history, a state, and a deterministic progression that can be paused, resumed, replayed, or migrated. This is the core idea behind systems such as Temporal, which model workflows as durable state machines whose execution history is recorded and reconstructed as needed. The runtime becomes responsible not only for executing steps, but for preserving the identity of the execution itself.

At the centre of durable execution lies a constraint that initially feels unnatural to most engineers: workflow code must be deterministic. This does not mean the system itself is deterministic in the mathematical sense. It means that given the same recorded history of events, the workflow must always reconstruct the same state and make the same decisions.

This requirement exists because durable systems often rely on replay. When a workflow resumes after a failure, the runtime does not “continue” execution in the traditional sense. Instead, it reconstructs the workflow by replaying prior decisions and rehydrating state from a persisted event history. This has an important consequence: side effects cannot be executed freely during replay. External interactions, such as API calls, database writes, and message emissions, must be carefully separated from the logical flow of the workflow. In practice, this introduces a separation between:

activities (the side-effecting operations performed by workers)
workflow logic (the durable orchestration layer)

This separation is what allows executions to be safely paused and resumed without ambiguity. While this may feel restrictive, it is precisely this constraint that enables durability.
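The split between workflow logic and activities can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions, not how Temporal or any other engine is implemented: `ReplayContext`, `charge_card`, and `workflow` are hypothetical names, and the "history" is a plain list rather than a persisted event log.

```python
from typing import Any, Callable, Optional


class ReplayContext:
    """Runs each activity once and answers from recorded history on replay."""

    def __init__(self, history: Optional[list] = None) -> None:
        self.history: list = list(history or [])
        self._cursor = 0  # position of the next activity call in this run

    def activity(self, fn: Callable[..., Any], *args: Any) -> Any:
        if self._cursor < len(self.history):
            result = self.history[self._cursor]  # replay: reuse the recorded result
        else:
            result = fn(*args)                   # first execution: perform the side effect
            self.history.append(result)          # and record what it returned
        self._cursor += 1
        return result


calls: list[str] = []  # stands in for the outside world, so real side effects can be counted


def charge_card(amount: int) -> str:
    calls.append(f"charge:{amount}")
    return f"receipt-{amount}"


def workflow(ctx: ReplayContext) -> str:
    # Deterministic orchestration only: no I/O, clocks, or randomness in here.
    receipt = ctx.activity(charge_card, 42)
    return f"done:{receipt}"


first = ReplayContext()
print(workflow(first))                   # done:receipt-42 (charge_card really runs)
replayed = ReplayContext(first.history)  # e.g. a fresh process after a crash
print(workflow(replayed), calls)         # done:receipt-42 ['charge:42']
```

The second run reaches the same result without charging the card again, which is only safe because the workflow function makes the same calls in the same order every time.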

Let’s put some of the characteristics of traditional systems side by side with those of Durable Execution systems.

In traditional systems, retries are usually an implementation detail. A worker fails, a message is requeued, and eventually the task is attempted again. But retries quickly become more complex than they first appear, especially when failures happen mid-workflow rather than at the boundaries of tasks. What should happen if a payment succeeds but inventory reservation fails? Should the system retry the inventory step, or compensate the payment? What if compensation itself fails? What if the system crashes between deciding to compensate and actually doing so? These questions are not edge cases; they are the natural consequence of long-running distributed coordination.

Durable execution systems turn retries into a first-class runtime concept. Instead of scattering retry logic across services, the workflow engine tracks execution attempts as part of its history. Time itself becomes a managed dimension of the system, with timers, delays, and waiting periods becoming durable constructs rather than external scheduling hacks. Even waiting for days becomes structurally simple, because the workflow state is persisted independently of process memory. In this sense, durable execution is not just about reliability under failure. It is about treating time as a durable resource.
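One way to picture a durable timer is as a persisted deadline rather than a sleeping thread. The sketch below assumes a plain dictionary standing in for a database and passes the current time in explicitly; `schedule_timer` and `seconds_remaining` are hypothetical helper names, not part of any SDK.

```python
import json


def schedule_timer(store: dict, timer_id: str, now: float, delay_seconds: float) -> None:
    """Persist the absolute time at which the timer should fire."""
    store[timer_id] = json.dumps({"fire_at": now + delay_seconds})


def seconds_remaining(store: dict, timer_id: str, now: float) -> float:
    """Read the persisted deadline and report how long is left (0 if already due)."""
    fire_at = json.loads(store[timer_id])["fire_at"]
    return max(0.0, fire_at - now)


store: dict = {}  # stands in for a database table
schedule_timer(store, "pizza-delivery-window", now=1000.0, delay_seconds=5.0)

# Imagine the process crashing here: only `store` survives, and a new process picks up later.
print(seconds_remaining(store, "pizza-delivery-window", now=1003.0))  # 2.0
print(seconds_remaining(store, "pizza-delivery-window", now=1010.0))  # 0.0
```

Because only the deadline is stored, it makes no difference whether the wait is five seconds or five days, or how many times the process restarts in between.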

Moving deeper, once a workflow spans multiple services, failure is no longer binary, it is often partial. Some steps succeed, others fail, and the system must reconcile an inconsistent reality. This is where compensation logic enters the picture, often through patterns such as sagas.

In case the term is new to you, a saga is essentially a distributed transaction without atomicity. Instead of rolling everything forward or backward as a single unit, the system defines compensating actions that attempt to undo completed steps when later steps fail. In traditional architectures, sagas are notoriously difficult to implement correctly because their logic is distributed across services and tightly coupled to operational state.

Durable execution brings sagas into the workflow layer itself. Compensation is no longer a scattered concern but part of the execution model. The workflow runtime knows what has succeeded, what has failed, and what needs to be undone. This does not eliminate complexity, but it changes where complexity lives. Instead of being embedded in infrastructure glue code, it becomes explicit in the structure of the workflow.
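A minimal sketch of that idea follows: each step carries its own compensation, and when a step fails the already completed steps are undone in reverse order. The `run_saga` function and the step names are assumptions made for this example, and the log lives in memory; a durable engine would persist it so that compensation itself survives a crash.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False
    return True


def noop() -> None:
    pass


def fail() -> None:
    raise RuntimeError("inventory unavailable")


log: list[str] = []
ok = run_saga([("charge-payment", noop, noop), ("reserve-inventory", fail, noop)], log)
print(ok, log)
# False ['done:charge-payment', 'failed:reserve-inventory', 'compensated:charge-payment']
```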

Perhaps the most important conceptual shift introduced by durable execution is the normalisation of long-running processes. In traditional request-driven systems, time is implicitly assumed to be short. A request is expected to complete within milliseconds or seconds. Anything longer is pushed out of band into queues, schedulers, or cron jobs. But many real-world processes do not fit this model. They are inherently extended in time:

waiting for human approval
integrating with third-party systems
coordinating multi-stage financial flows
handling asynchronous physical-world processes

Durable execution embraces this directly. A workflow can span seconds, hours, or weeks without requiring external orchestration mechanisms to simulate persistence. These systems treat long-running execution not as an anomaly, but as a primary use case.

An increasingly important driver of interest in durable execution comes from the domain of AI agents. Modern agentic systems are not stateless request handlers. They maintain evolving context, interact with external tools, retry operations, and often run through multi-step reasoning and action loops that can span long periods of time. Without durable execution, these systems are fragile in predictable ways:

a crash loses context
a timeout breaks continuity
partial tool execution leads to inconsistent state
retries can duplicate side effects

What is emerging is a recognition that AI agents are, structurally, workflows. They are not single computations; they are long-running, stateful processes that require persistence, replayability, and controlled side effects. This is why durable workflow engines are increasingly being explored as the underlying runtime for agent orchestration, not just business process automation.
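The "retries can duplicate side effects" problem is commonly handled with idempotency keys: the caller attaches a stable key to each logical operation, and the receiving side remembers the outcome per key. A small sketch under that assumption; `ToolGateway` and its in-memory dictionary are hypothetical stand-ins for a real tool wrapper backed by durable storage.

```python
class ToolGateway:
    """Wraps a side-effecting tool call so that repeats with the same key are no-ops."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}  # would be durable storage in practice
        self.emails_sent = 0

    def send_email(self, idempotency_key: str, to: str) -> str:
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # retry: return the earlier outcome
        self.emails_sent += 1                      # the real side effect happens once
        result = f"sent-to-{to}"
        self._results[idempotency_key] = result
        return result


gateway = ToolGateway()
first = gateway.send_email("order-128:receipt", "customer@example.com")
second = gateway.send_email("order-128:receipt", "customer@example.com")  # agent retried
print(first == second, gateway.emails_sent)  # True 1
```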

It is tempting to view durable execution systems as simply better workflow engines, but that framing underestimates what is actually changing. Traditional orchestrators coordinate tasks across workers. They are, in essence, message routers with state tracking bolted on. Durable execution systems, by contrast, begin to resemble execution environments. They manage:

state persistence
execution history
scheduling and timers
retries and failure recovery
deterministic replay
coordination across services

This is why a useful mental model is to think of them as a kind of distributed operating system. Not for hardware resources, but for business logic execution over time. The comparison is not perfect, but it is instructive. Just as operating systems abstracted away hardware complexity to allow applications to run reliably on unstable machines, durable execution abstracts away distributed failure to allow workflows to run reliably on unstable networks.

Durable execution is still an emerging paradigm. It is not yet the default model for building distributed systems, and many teams continue to rely successfully on queues, workers, and stateless services. But the pressure that led to its development is becoming more pronounced. Systems are becoming more distributed, workflows are becoming longer-lived, and AI systems are introducing a new class of stateful computation that does not fit neatly into request/response paradigms.

What durable execution offers is not a new tool, but a new boundary. It draws a line around execution itself and says: this is something worth making persistent. If databases taught us how to make data reliable, durable execution is attempting something analogous for computation. And if that trajectory continues, workflow engines may evolve from infrastructure components into something closer to what operating systems became for the hardware era: a foundational layer that quietly defines how everything else runs.

Simple workflow example

If you are like me, after reading this you are probably thinking “Yeah, that sounds good, but what does it actually look like?”. For that reason, let’s try to make the abstraction more concrete, and look at a single workflow.

Imagine a simple order fulfilment process: ordering a pizza. In a traditional system, this would be split across services, queues, and callbacks. In a durable execution system, it is expressed as a single continuous workflow, but importantly, it does not execute like a function call. It behaves more like a stateful timeline.

The workflow (conceptual view)


Workflow: PizzaOrder

Step 1 → Take order ("margherita")
Step 2 → Prepare pizza (orderId)
Step 3 → Wait for 5 seconds (or 5 hours, or 5 days)
Step 4 → Deliver pizza
Step 5 → Send receipt

At first glance, this looks trivial. The important part is what happens under the hood.

What actually happens at runtime

When the workflow starts, the runtime does not “execute everything”. It begins a controlled sequence of recorded decisions.

1. Start of execution

The system creates a durable record:


WorkflowInstance: PizzaOrder-128
State: STARTED
History: []

It then executes Step 1.


→ Schedule activity: TakeOrder("margherita")

This is not executed inline. It is dispatched to a worker. The workflow pauses.

2. First suspension point (important idea)

At this moment, the workflow is not “running”. It is:

persisted
waiting
fully safe to crash
resumable from history


WorkflowInstance: PizzaOrder-128
State: WAITING_FOR(TakeOrder)
History:
  - Scheduled TakeOrder("margherita")

A worker eventually responds:


Result: ORDER-MARGHERITA
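The reason the workflow is "fully safe to crash" at this point is that everything it needs already lives outside process memory. The sketch below mimics that with a JSON blob; the `WorkflowInstance` class and the event dictionaries are an illustrative shape chosen for this example, not the storage format of any real engine.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WorkflowInstance:
    workflow_id: str
    state: str = "STARTED"
    history: list = field(default_factory=list)


def save(instance: WorkflowInstance) -> str:
    """Serialise the whole instance; in a real system this would go to a database."""
    return json.dumps(asdict(instance))


def load(blob: str) -> WorkflowInstance:
    return WorkflowInstance(**json.loads(blob))


instance = WorkflowInstance("PizzaOrder-128")
instance.history.append({"event": "Scheduled", "activity": "TakeOrder", "input": "margherita"})
instance.state = "WAITING_FOR(TakeOrder)"
blob = save(instance)
del instance  # simulate the process going away

# A different process later receives the worker's result and picks the workflow back up.
restored = load(blob)
restored.history.append({"event": "Completed", "activity": "TakeOrder", "result": "ORDER-MARGHERITA"})
print(restored.state, len(restored.history))  # WAITING_FOR(TakeOrder) 2
```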

3. Replay begins (the non-obvious part)

Now comes the key durable execution concept. If the workflow needs to continue (or recover from failure), the runtime does not simply “continue execution”. Instead, it replays the workflow from the beginning using recorded history:


Replay:
  - Step 1: TakeOrder → already completed (from history)
  - Step 2: PreparePizza(orderId=ORDER-MARGHERITA)

This is where determinism matters: the workflow code must behave consistently during replay.
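To see what goes wrong without determinism, consider a runtime that compares the commands a workflow issues during replay with the ones recorded in history. The sketch below is a simplified illustration: `check_replay`, `NonDeterminismError`, and the `coin` parameter (standing in for any input that is not part of history, such as a random number or the wall clock) are all hypothetical.

```python
class NonDeterminismError(Exception):
    """Raised when replay produces different commands than the recorded history."""


def check_replay(recorded: list[str], replayed: list[str]) -> None:
    """Raise if the replayed commands diverge from the recorded ones."""
    for position, (was, now) in enumerate(zip(recorded, replayed)):
        if was != now:
            raise NonDeterminismError(
                f"step {position}: history has {was!r}, replay produced {now!r}"
            )


def commands(order: str, coin: bool) -> list[str]:
    # Branching on something that is not in history (here `coin`) is non-deterministic.
    steps = [f"TakeOrder({order})"]
    steps.append("PreparePizza" if coin else "CancelOrder")
    return steps


recorded = commands("margherita", coin=True)
check_replay(recorded, commands("margherita", coin=True))  # passes silently
try:
    check_replay(recorded, commands("margherita", coin=False))
except NonDeterminismError as err:
    print(err)  # step 1: history has 'PreparePizza', replay produced 'CancelOrder'
```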

4. Another suspension

The system schedules the next activity:


→ Schedule activity: PreparePizza(ORDER-MARGHERITA)

Again, execution pauses.


State: WAITING_FOR(PreparePizza)
History:
  - TakeOrder completed
  - PreparePizza scheduled

A worker completes it:


Result: PIZZA-ORDER-MARGHERITA

5. Time becomes a first-class construct

Now the workflow reaches an unusual step:


WAIT 5 seconds

In a traditional system, this would require:

cron jobs
timers
sleep threads
external schedulers

In a durable system, time itself is persisted:


Workflow state:
  Timer scheduled: +5 seconds

The workflow is now completely idle, but still alive as a durable object. It can survive:

process crashes
machine restarts
deployments
network partitions

Nothing is lost.

6. Resumption after time passes

When the timer fires:


→ Resume workflow

Replay reconstructs state again:


History:
  - TakeOrder ✓
  - PreparePizza ✓
  - Wait(5s) ✓

Now execution continues:


→ Schedule activity: DeliverPizza(PIZZA-ORDER-MARGHERITA)

7. Completion

Finally:


→ Schedule activity: SendReceipt(deliveryId)
→ Workflow COMPLETE

At the end, the system has not just executed steps. It has produced a durable execution trace:


Workflow History:
  1. TakeOrder → ORDER-MARGHERITA
  2. PreparePizza → PIZZA-ORDER-MARGHERITA
  3. Timer → 5s elapsed
  4. DeliverPizza → DELIVERED
  5. SendReceipt → RECEIPT SENT

The key insight this example is trying to surface is that what matters is not the steps themselves; any system can execute steps. What matters is that the workflow is not stored as state we manage, but as history the runtime owns. This is why durable execution feels different from:

queues
cron jobs
orchestration services
worker pools

It is not just “a better way to coordinate tasks”, it is a system where execution itself becomes a recoverable data structure. Once we internalise this model, several earlier concepts become much clearer:

retries are not logic → they are replay
state is not stored → it is reconstructed
time is not external → it is persisted
workflows are not running → they are waiting, resuming, continuing
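The "state is not stored, it is reconstructed" point can be sketched as a fold over the history from the pizza example. The event tuples mirror the trace shown earlier; `apply_event` and `reconstruct` are hypothetical helpers, and a real engine records far richer events than a name and a result.

```python
from functools import reduce

HISTORY = [
    ("TakeOrder", "ORDER-MARGHERITA"),
    ("PreparePizza", "PIZZA-ORDER-MARGHERITA"),
    ("Timer", "5s elapsed"),
    ("DeliverPizza", "DELIVERED"),
    ("SendReceipt", "RECEIPT SENT"),
]

STEPS = [name for name, _ in HISTORY]  # the fixed order of the workflow's steps


def apply_event(state: dict, event: tuple[str, str]) -> dict:
    """Fold one history event into the state, returning a new state."""
    name, result = event
    return {"completed": state["completed"] + [name], "last_result": result}


def reconstruct(history: list[tuple[str, str]]) -> dict:
    """Rebuild the workflow state purely from its history."""
    state = reduce(apply_event, history, {"completed": [], "last_result": None})
    remaining = [step for step in STEPS if step not in state["completed"]]
    state["next_step"] = remaining[0] if remaining else None
    return state


print(reconstruct(HISTORY[:2])["next_step"])  # Timer
print(reconstruct(HISTORY)["next_step"])      # None
```

Given only the first two events, the fold already knows the next thing to do is the timer; nothing else needs to have been saved.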

And this is why systems like Temporal feel less like task schedulers and more like a runtime layer for distributed computation over time.

Charging for the ink, not the ideas

fjavierm — Mon, 04 May 2026 13:08:48 +0000

I am sure most of you are familiar with the story of Charles Steinmetz, or one of its many variations. Steinmetz was a brilliant engineer at General Electric, and the story goes like this.

Henry Ford was having trouble with a massive generator and called in Steinmetz. After listening to the machine for a few moments, Steinmetz took out a piece of chalk and made a small ‘X’ on a specific metal casing. Ford’s engineers opened it up and found the defect exactly where the mark was. When Steinmetz sent a bill for $1,000, Ford, ever the businessman, asked for an itemised invoice. Steinmetz replied:

Making chalk mark: $1
Knowing where to mark: $999

Ford paid the bill without further question. He understood that the physical act was trivial; the value lay in the decades of experience required to know exactly where that one-dollar mark belonged.

We are living through a remarkably similar moment, yet we may be heading in the opposite direction. Many AI companies are attempting to persuade us that the act of generation is more valuable than what is generated. They are moving away from simple subscription models towards pay-per-token billing. Even those that have not fully transitioned are clearly pivoting that way. In doing so, they are, perhaps unintentionally, asking us to pay for the weight of the chalk.

A token is a mathematical fragment, the raw material of a response. Billing by the token reflects real computational costs, but it also participates in a market that can prize volume over validity. Intelligence starts to resemble a metered utility, much like water or electricity. But intelligence is not a liquid; it is a coordinate. In software and engineering, the most elegant solution is rarely the longest. A thousand lines of generated code may solve a problem, but they are often a liability, a burden of technical debt. Ten lines of precise logic can be a masterpiece.

Under the current model, however, the thousand-line mess can end up being priced as though it were a hundred times more valuable than the ten-line stroke of genius. We can find ourselves paying for the stuttering of the machine, the computational friction it incurs while searching for an answer, rather than for the answer itself. We are, quite literally, paying for the ink and ignoring the idea.

This push towards ever greater output is accelerating beyond the limits of human review. Where an engineer might once have produced a single page of clear, concrete documentation, a model now generates thousands. This is not necessarily progress; it can become a flood. It creates an artificial demand for even more powerful and expensive models, just to process the noise produced by earlier ones. We can end up in a loop in which we need AI to summarise the verbosity of other AI.

This begins to reveal a structural tension in the current AI arms race. The incentives do not always point towards efficiency; they often reward scale. Flooding the ecosystem with information makes the need for larger contexts and more powerful reasoning easier to justify. These, in turn, support higher price points and more ambitious positioning. The result is not a map to the ‘X’, but an ever-growing supply of chalk, along with the tools to manage it.

This creates a perverse incentive for the future of technology. If we measure the value of AI by the number of tokens it produces, we encourage a digital world of bloat. We risk being buried under a mountain of cheap, generated noise, where quantity is mistaken for quality. It is a system that can reward the machine for being chatty rather than correct.

The true revolution of artificial intelligence should not be that it makes the chalk mark easier to produce. The revolution is that it should help us find the ‘X’ faster. But as long as the price remains closely tied to the token, the industry will tend to focus on the tool rather than the result.

We should eventually demand a different kind of invoice from the architects of these models. We should stop subsidising the cost of digital ink and start valuing the precision of knowledge. Until we shift our perspective, we are not fully purchasing intelligence; we are still largely paying for the act of writing. Steinmetz’s ‘X’ was valuable because it was singular and precise. If he had covered the entire generator in chalk, his bill would not have been worth a penny, regardless of how much chalk he used.

A deeper question follows. What if AI does not consistently deliver what is promised, not because it cannot, but because the incentives are misaligned? At present, many incentives favour the production of large volumes of content that require millions of tokens and iterations to process. They do not always favour the creation of systems that can deal with complexity in a genuinely intelligent way. It is worth asking whether the pursuit of revenue might, at times, be steering us away from the outcome we actually want.

Fact-checking GitHub controversy

fjavierm — Thu, 30 Apr 2026 16:27:11 +0000

In recent days, a familiar kind of narrative has swept across developer circles: the claim that GitHub is ‘dying’. It has appeared in videos, threads, and blog posts with the usual hallmarks of online virality: strong opinions, selective facts, and a tone of urgency that suggests an imminent collapse. Given how central GitHub remains to modern software development, it is hardly surprising that such claims attract attention. Yet, as is often the case, the reality is both less dramatic and more interesting than the headline.

The current wave of criticism did not emerge from nowhere. It was catalysed, in part, by a public critique from Mitchell Hashimoto, a figure whose opinions carry weight within the developer community. His frustration centred on reliability: repeated outages, degraded performance, and an overall experience that, in his estimation, fell well short of what one expects from a platform of GitHub’s stature. His decision to move his terminal project, Ghostty, away from GitHub was not merely a personal choice but a symbolic gesture that resonated with others who had experienced similar issues. It is important, however, to interpret this moment with care. High-profile departures can shape perception disproportionately; they signal discontent, certainly, but they do not in themselves constitute evidence of a broader exodus.

Reliability concerns are nonetheless real. GitHub has experienced intermittent instability in recent months, and while no large-scale platform is immune to outages, expectations in this domain are exceptionally high. Developers rely on such services not only for storage but for collaboration, automation, and deployment pipelines. When interruptions occur, they ripple through entire workflows. The perception that reliability has slipped, even if only temporarily, can therefore have an outsized impact on trust. What remains less clear is whether these incidents represent a systemic decline or a series of unfortunate but ultimately transient issues. At present, the evidence supports frustration, but not collapse.

Source: GitHub’s own status logs for late April 2026

Alongside these operational concerns sits a more subtle, yet arguably more consequential, shift: changes to data usage policies for GitHub Copilot. As of late April 2026, GitHub moved to an opt-out model for certain forms of data collection used in training its AI systems. In practical terms, this means that interactions, such as prompts and generated code, may be used to improve models unless the user explicitly disables this behaviour. For many developers, particularly those working across personal and professional contexts, this introduces a degree of ambiguity that did not previously exist. The concern is not that entire repositories are being indiscriminately absorbed into training datasets, as some commentary has suggested, but rather that the boundary between private work and aggregated learning has become less immediately transparent.

It is worth noting that these concerns are not without qualification. Organisational and enterprise tiers are excluded from such data usage, and the opt-out mechanism remains available. Nevertheless, defaults matter. In software, as in many other domains, what is enabled by default often defines the practical reality of a system. The shift, therefore, is less about technical risk in the strictest sense and more about a recalibration of expectations around control and consent.

A further source of unease arises from changes to pricing. GitHub’s move towards a more usage-based model for Copilot reflects a broader industry trend: the recognition that AI-assisted tooling incurs substantial and uneven costs. Under such a model, light users may see little difference, while heavier users, those running extended sessions or integrating AI deeply into their workflows, may encounter higher and less predictable expenses. It is not difficult to understand why this has been received with scepticism. Developers tend to value clarity and stability in pricing, and any departure from that can feel, rightly or wrongly, like a shifting of the goalposts.

Compounding these issues was a temporary pause on new Copilot sign-ups, justified by GitHub as a measure to maintain service quality. Although this decision was framed in pragmatic terms, it inevitably fuelled speculation about underlying capacity constraints. Whether such speculation is warranted remains unclear; what can be said is that the optics of limiting access, even temporarily, sit uneasily alongside narratives of rapid expansion and technological progress.

Taken together, these developments form the basis of the current backlash. Yet it is equally important to consider what has been overstated or misrepresented in the process. Claims of a mass departure from GitHub, for instance, are not supported by credible evidence. The platform continues to dominate its space, and while alternatives such as GitLab attract periodic attention, there is no indication of a large-scale migration. Similarly, the suggestion that GitHub’s focus on artificial intelligence has directly caused reliability issues remains speculative. Correlation, in this case, has been readily interpreted as causation without sufficient proof.

Adding to the general unease, a number of more sensational claims began to circulate online, including suggestions that Copilot had, at one point, inserted what resembled promotional content into pull requests. These reports are difficult to substantiate and have not been confirmed by reliable sources, yet their spread is telling in itself. They reflect a growing suspicion among some developers that the platform’s priorities may be shifting in ways that are not entirely aligned with user interests. Even when such claims prove unfounded, they tend to gain traction in an environment where trust is already under strain.

What, then, should one make of the situation as a whole? Rather than signalling decline, it seems more accurate to view this moment as a period of adjustment. GitHub, like many technology platforms, is navigating the complex transition towards AI-integrated workflows while attempting to balance cost, performance, and user trust. Each of the current points of contention, whether reliability, data usage, or pricing, reflects a facet of that broader challenge.

For developers, the appropriate response is neither alarm nor indifference, but informed attention. The concerns being raised are not trivial; they touch on fundamental aspects of how tools are built, maintained, and monetised. At the same time, the more dramatic narratives obscure as much as they reveal. GitHub is not ‘dying’, but it is changing, and not all of those changes will be universally welcomed.

In the end, the significance of this episode lies less in any single policy or outage, and more in what it reveals about the relationship between developers and the platforms they depend on. Trust, once established, can be surprisingly resilient, but it is not immutable. Moments like this serve as a reminder that even the most entrenched tools must continually justify that trust, not through promises or positioning, but through consistent, transparent, and reliable behaviour over time.

The Zero Knowledge Era

fjavierm — Tue, 28 Apr 2026 19:14:23 +0000

This article is going to be a bit controversial. So let me start by saying that I have nothing against AI. I think it is an amazing tool with plenty of use cases where it is useful and helpful, but like many tools before it, it is simply a tool. It is up to us, as professionals, regardless of the field, to decide when to use it and how to use it. Making this decision should be a conscious action based on knowledge and experience.

With that said, I must add that we are taking the wrong approach. Let’s see a few scenarios:

Scenario 1

An engineer is reviewing a pull request created by another engineer to fix a performance problem. While reviewing it, one of the changes looks ‘weird’. At this point, the reviewer decides to ask the author what the logic behind the change is. The reviewer wants to know why they think the change will offer better performance and solve the problem. Surprisingly, the author responds with ‘I don’t know, the AI suggested that’.

Scenario 2

An engineer is digging into a project and finds some code belonging to a not-very-popular framework. While the engineer is not very experienced with this particular framework, they know that their teammates have been battling with it for some time now. They turn around and ask the rest of the team for help. Most of them look at the code and come back with ‘I have no idea’, but one of them provides what looks like a very solid answer. As a follow-up, and trying to learn a bit more, the initial engineer asks some further questions and wonders whether references to documentation can be provided. At some point in that conversation, the second engineer ends up admitting that they have no idea either; the first response was simply what the AI told them.

Scenario 3

Two engineers have been working together for a very long time. After all this time, they know each other and they know their code styles: how each one structures code, what constructions they favour, and so on. One of them, while reviewing a PR, realises that the style in which the code is written deviates from what their colleague usually writes. Additionally, they see some constructions that they have never seen in the codebase, such as some ‘clever’ bitwise logic. After thinking hard to understand the logic, the reviewer realises that some edge cases are not covered. With that information, they go to the author and ask about it. The author replies with something similar to: ‘I have no idea; the AI wrote it, and the code looks elegant and efficient.’

Individually, these seem harmless. Collectively, they point to something more concerning.

I am sure that if you work regularly with other engineers, you will recognise some, if not all, of those scenarios. And that is the problem that I am finding lately: people applying modifications or creating new code in complex and critical codebases without having an understanding of what they are doing, just trusting AI responses without double-checking why the response was suggested, or why something was implemented in this or that way.

As I said at the beginning of the article, AI is a fantastic tool. If you want to implement scripts, one-off tools, or anything where you care only about the final result and not about how it was built, you can do in hours what used to take days. But I think we need to have more discipline when we are modifying codebases that run in production, that are complex and critical, that need troubleshooting at 3 a.m. when we are on call. Making changes without understanding does not seem like the right way to go, or to survive in the long run.

Maybe, one day, AI will be reliable and trustworthy enough to write code without human supervision (relying on AI reviews, for example), even for mission-critical projects. Until we get there, we need to keep humans in the loop, and not only to push the ‘Approved’ button to comply with SOC 2 requirements, but to make the effort to understand what we submit for review and what we review.

It seems that the pressure to be more productive, and especially the FOMO (fear of missing out), is pushing us to be less effective, less disciplined, and less knowledgeable. How long will it take for a project to turn into a beast that can only be modified by using AI because no one knows anymore what is under the hood? How long will it take to troubleshoot it when the AI cannot do it, which is not uncommon nowadays?

Let me be clear: the problem is not AI; it is engineers outsourcing understanding. It is engineers pushing changes without understanding what they are pushing. This is why, unless we stop it, a ‘zero-knowledge era’ is emerging: an environment where code is written, modified, and deployed without anyone fully understanding it, and where systems continue to function until they suddenly don’t.

What do you think?

Think in Tradeoffs, Not Best Practices

fjavierm — Wed, 22 Apr 2026 18:24:06 +0000

“Best practices” is one of the most popular phrases in software engineering, and also one of the most misleading. It carries an air of safety and responsibility, suggesting that difficult decisions have already been settled elsewhere by wiser people through experience, or communities of people by consensus, or technical maturity over time, and that a careful team only needs to identify the correct practice and apply it consistently.

Sometimes that assumption holds. There are areas of software where reinvention is wasteful, where certain defaults are demonstrably safer, and where repeated failure has already taught the lessons worth preserving. Some habits are justified often enough that ignoring them is simply inefficient. But the phrase becomes dangerous when it obscures the real nature of engineering work. Most meaningful decisions in software are not about selecting a “best” practice in isolation; they are about choosing a trade-off within a specific context. And context changes everything.

The right testing strategy depends on risk, team size, system shape, and release pressure. Architecture is shaped as much by domain complexity and operational maturity as by any abstract principle. Delivery processes reflect failure cost, regulatory expectations, rollback capability, and trust in automation. Even practices that seem straightforward, such as code review, abstraction, documentation, or service decomposition, shift in value depending on environment and consequence. This is why experienced engineers grow cautious around advice that sounds universal. Not because experience is unhelpful, but because it reveals where general guidance stops being general.

The idea of best practices exists for a reason. Engineering teams cannot rediscover every lesson from first principles; they need shared heuristics, conventions, and defaults that work well often enough to reduce unnecessary debate. In that sense, many so-called best practices are simply compressed experience. Advice such as validating inputs, using version control properly, automating builds and tests, avoiding hardcoded secrets, keeping dependencies updated, reviewing production changes carefully, monitoring systems, and limiting privilege is broadly sound. Much of it is essential.

The problem begins when this compressed experience is mistaken for complete reasoning. A principle can be widely useful and still be applied poorly if the team stops asking what it is trying to achieve, what assumptions the practice depends on, and what costs it introduces in a particular situation. Guidance is most valuable when it supports thought; it becomes harmful when it replaces it.

At the heart of this is a simple reality: every non-trivial engineering decision buys something and costs something. More abstraction may improve reuse but reduce clarity. More process may reduce accidental risk but slow change. More services may increase team autonomy while introducing operational complexity. More tests can improve confidence while adding maintenance overhead. Stronger security controls reduce exposure but often introduce friction and recovery costs. Flexibility can reduce lock-in but increase design burden.

This is not a flaw in engineering; it is the work itself. Good engineers learn to evaluate decisions in terms of consequences: what is gained, what becomes more difficult, who benefits, who pays later, and what must remain true for the decision to continue working well. These questions tend to be more valuable than asking whether a practice is modern, popular, or widely recommended. A best practice usually captures a remembered benefit; a trade-off analysis accounts for the cost as well.

One of the more subtle mistakes teams make is confusing “good in general” with “right now”. Strong testing discipline is valuable, but a team may still need to decide whether its next hour is better spent increasing unit coverage, fixing a failing deployment pipeline, or addressing a visibility gap that repeatedly causes production uncertainty. Documentation is important, yet not all documentation carries equal value, as probably every engineer has seen; some supports operational continuity, while some quickly becomes stale and adds maintenance noise. Loose coupling is desirable, but pursuing it too early can result in abstractions that serve hypothetical futures, which may never arrive, better than they serve present understanding.

The same applies at larger scales. Microservices may eventually be appropriate, but a modular monolith is often the better choice while a team is still clarifying the product and stabilising its delivery practices. Even code review, one of the most widely defended practices, does not deliver equal value in all contexts; its effectiveness depends on risk, team trust, system criticality, and release cadence. A shallow, ritualised review can be less useful than fewer, more deliberate reviews on meaningful changes. The relevant question is not whether a practice is respectable, but whether it represents the best use of time, complexity, and attention in the current situation.

Disagreements in engineering often reveal another limitation of the “best practice” framing: it tends to hide assumptions. Teams can argue passionately while invoking the same language. One group may describe microservices as best practice for scalability, while another argues that simpler monoliths are best practice for maintainability. One engineer may advocate strict test pyramids; another may favour end-to-end verification. One architect may emphasise standardisation; another, team autonomy.

These conflicts are rarely about the practices themselves. They are about the conditions those practices assume: expected scale, number of teams, failure tolerance, regulatory burden, tooling maturity, cost of change, team skill distribution, operational support quality, and the stability of the domain. Once those assumptions are made explicit, disagreements become easier to understand and often easier to resolve. The conversation shifts from competing claims of correctness to differing views of the environment being optimised for. Precise teams therefore spend less time appealing to abstract best practices and more time discussing constraints, risks, and desired outcomes.

What distinguishes a mature engineer is not a longer list of approved practices, but a stronger ability to trace consequences. Questions such as “What operational load will this introduce?”, “What delivery friction will this create?”, “What failures become easier if we relax this control?”, or “What debt becomes more expensive if we take the faster path now?” lead to better decisions than appeals to convention. This way of thinking is slower than slogan-driven decision-making, but far more reliable, and it produces healthier forms of disagreement. Instead of arguing at the level of identity, teams can argue at the level of impact: which risks are reduced, which costs are increased, and whether that exchange is worthwhile.

This clarity also explains why trade-off-aware teams are not necessarily more cautious. In some cases, they move faster than others precisely because they understand which risks are acceptable and which costs are not worth paying. Their speed comes from deliberate choice rather than adherence to fashion.

Another practical test of any practice is whether it can be sustained under ordinary conditions. Much engineering advice sounds compelling in ideal circumstances, but real systems operate under pressure: deadlines, fatigue, incomplete information, and evolving requirements. A review process that collapses under time pressure, a testing strategy that becomes unmanageable as the system grows, or a documentation model that cannot survive team turnover may not be best practice at all. It may simply be aspirational. Good teams therefore optimise not only for technical correctness, but for durability by choosing approaches that remain functional when systems are messy and time is limited. Sustainability is part of technical quality.

There is also a quieter benefit to thinking in trade-offs: it encourages honesty. When teams rely on the language of best practices, they can present decisions as if they were externally validated, borrowing certainty from the industry instead of owning the consequences themselves. Trade-off thinking removes that cover. It leads to more explicit reasoning: accepting certain risks because delivery speed matters more in a given context, introducing complexity because coordination costs have already become too high, deferring improvements because current failure modes are tolerable, or deliberately avoiding flexibility because the domain is not yet well understood.

This kind of clarity makes decisions easier to revisit and easier for future engineers to understand. It captures not just what was chosen, but why it made sense at the time, which is a far more durable form of knowledge than a claim that something was “best practice”.

Over the course of a technical career, many engineers move from a desire for certainty to a greater appreciation of nuance. Early on, best practices are reassuring; they provide direction and reduce ambiguity. With experience, working through projects, outages, migrations, failed abstractions, and conflicting constraints, confidence in universal answers tends to soften. Ideally, this does not lead to cynicism, but to precision. Experience should widen judgement, not harden it into dogma.

This does not mean that everything is relative or that no principles are worth defending. Some practices are strongly justified, and some trade-offs consistently favour one side. The difference is that experienced engineers tend to understand the boundary conditions more clearly: when a principle holds, when an exception is dangerous, and when competing concerns deserve more weight than usual. Trade-off thinking is not an excuse for vagueness; it is a discipline that requires attention to consequences, constraints, and priorities.

In practice, best practices remain useful as starting points. They help teams prevent avoidable mistakes, preserve lessons that should not need to be relearned, and provide shared defaults that reduce chaos. But they are not substitutes for engineering judgement. Good engineers do not ignore them; they interrogate them. They ask what a practice is protecting, what it costs, what assumptions it carries, and whether those assumptions hold in the system in front of them.

Software engineering is not a search for approved answers. It is a discipline of constrained choices, where every meaningful improvement competes with costs in complexity, speed, flexibility, or operational burden. The teams that understand this tend to build better systems, not because they know more slogans, but because they know how to think in trade-offs.

The Security Lessons Developers Keep Learning Too Late

fjavierm — Mon, 20 Apr 2026 19:05:18 +0000

Software teams do not usually ignore security because they are careless. More often, they learn certain lessons late. They learn them after an incident, after a rushed audit, after a customer asks a difficult question, after a secret leaks, after an internal tool turns out to be less internal than everyone assumed, or after a system that once felt “good enough” is suddenly subjected to a level of scrutiny it was never designed to survive.

That pattern matters because many security problems are not obscure. They are recurring, ordinary, and structurally familiar. Teams rediscover them because delivery is rewarded early while discipline is often deferred until later, as though the harder lessons can be absorbed once the product is already moving. Sometimes that works. Often it means the lesson arrives at the worst possible moment.

What follows is not a catalogue of exotic attacks. It is a set of lessons that keep becoming important only after a team has already paid for postponing them.

Internal systems are part of the attack surface

One of the most expensive assumptions in software is that “internal” means “safe”. Internal dashboards, admin tools, metrics panels, staging systems, support utilities, background endpoints, and service-to-service interfaces often receive weaker scrutiny because they are not perceived as public. They may have looser authentication, broader permissions, weaker logging, or older dependencies. Sometimes they become reachable by accident. Sometimes they are reachable through VPN access, compromised accounts, misconfigured proxies, jump hosts, or adjacent systems that were never meant to carry real trust. Have you heard of pivoting?

Attackers do not respect the boundaries teams imagine. They work with the ones that actually exist. That is why internal tooling deserves the same seriousness as external-facing systems when it has real power. Strong authentication, constrained privilege, careful exposure, useful audit trails, and explicit ownership are not luxuries here. Many teams only discover that after learning, too late, that the “non-public” system had production consequences all along.

Secrets spread faster than expected

Most developers already know not to commit secrets directly to source control, or so we hope, at least. The trouble is that secrets do not leak only through repositories. They leak through logs, screenshots, copied configuration files, support bundles, CI output, browser storage, shell history, chat messages, crash reports, test fixtures, and shared documents. Once a credential has appeared in enough places, rotation becomes harder, ownership becomes blurrier, and confidence in exposure status begins to collapse.

Teams often learn too late that secret management is not mainly a storage problem. It is a lifecycle problem. Where are secrets created? Who can read them? Where are they copied automatically? How are they rotated? How do you know an old credential is truly dead? Can ordinary debugging happen without exposing them? The longer a team waits to treat secrets as operational hazards instead of configuration trivia, the messier the eventual cleanup becomes.
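As a rough illustration of treating secrets as a lifecycle concern, here is a minimal Python sketch of a credential inventory that answers one of the questions above: which live credentials are overdue for rotation? The `SecretRecord` type, its fields, the sample names, and the 90-day threshold are all assumptions made for this example, not a recommendation or any real tool's data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SecretRecord:
    """Hypothetical inventory entry: one row per issued credential."""
    name: str
    owner: str
    created: date
    revoked: bool = False


def overdue_for_rotation(records, today, max_age_days=90):
    """Return names of credentials that are still live and older than the limit."""
    return [
        r.name
        for r in records
        if not r.revoked and (today - r.created).days > max_age_days
    ]


# Sample data, invented for illustration.
inventory = [
    SecretRecord("ci-deploy-key", "platform", date(2025, 10, 1)),
    SecretRecord("old-api-token", "billing", date(2025, 6, 1), revoked=True),
    SecretRecord("db-password", "backend", date(2026, 3, 15)),
]

overdue = overdue_for_rotation(inventory, today=date(2026, 4, 20))
```

An inventory like this only helps if every credential has an owner and a creation date recorded at the moment it is issued, which is exactly the lifecycle discipline the questions above point at.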

Validation does not replace authorisation

Many teams invest serious effort in input validation and still miss one of the more consequential questions: who is allowed to do this at all? Security defects are often framed as malformed-input problems, and sometimes they are. But a large class of failures comes from requests that are valid in shape and dangerous in context. A user can access another user’s data by changing an identifier (an Insecure Direct Object Reference, or IDOR). An internal tool exposes actions beyond its intended role. A background job can trigger privileged behaviour without the right checks. An API verifies structure but not entitlement.

This is why authorisation bugs are so persistent. The request may look legitimate in every superficial sense. The danger lies in the relationship between actor, action, and state. Strong validation does not compensate for a weak authorisation model. Teams need explicit answers to questions such as who can do this, under what conditions, according to which source of truth, enforced where, and logged how. When those answers are vague, the system usually has more authority than the interface suggests.
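The difference between the two checks can be sketched in a few lines of Python. Everything here is hypothetical: the `Invoice` type, the in-memory `INVOICES` store, and the `Forbidden` exception stand in for whatever a real service would use. The point is only that a well-formed request still has to pass an ownership check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    total: float


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, total=42.0),
    2: Invoice(invoice_id=2, owner_id=200, total=99.5),
}


class Forbidden(Exception):
    """Raised when the actor is not entitled to the requested object."""


def validate_invoice_id(raw: str) -> int:
    # Validation: is the request well-formed?
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("invoice id must be a decimal integer")
    return int(raw)


def get_invoice(actor_id: int, raw_invoice_id: str) -> Invoice:
    invoice_id = validate_invoice_id(raw_invoice_id)
    invoice = INVOICES.get(invoice_id)
    # Authorisation: is this actor allowed to read this object?
    # A missing invoice and someone else's invoice get the same answer,
    # so the response does not reveal which identifiers exist.
    if invoice is None or invoice.owner_id != actor_id:
        raise Forbidden(f"actor {actor_id} may not read invoice {invoice_id}")
    return invoice
```

With this sketch, `get_invoice(100, "1")` returns the actor's own invoice, while `get_invoice(100, "2")` passes validation and is still refused. Removing the ownership check would leave a function that validates perfectly and leaks other users' data.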

Dependencies are trust decisions

Modern software is assembled rather than handcrafted from scratch. That is normal and often necessary, but it means every library, plugin, container image, CI action, SDK, and transitive package introduces more than convenience. It introduces trust. A dependency brings someone else’s release discipline, someone else’s vulnerability response, someone else’s maintenance quality, someone else’s design assumptions, and someone else’s compromise risk into your own environment. Have you noticed the recent increase in supply chain attacks?

The mistake is not using dependencies. The mistake is treating them as costless. Teams do this when they adopt libraries for marginal convenience, leave old packages unreviewed for years, or pull in tooling without understanding what it can access in CI, production, or developer machines. By the time a dependency problem becomes visible, the supply chain is often already tangled. The lesson many teams learn late is that dependency discipline is part of security architecture, not just package management.
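One small, concrete piece of dependency discipline is knowing exactly which versions you have agreed to trust. The Python sketch below flags entries in a pip-style requirements listing that are not pinned to an exact version. It is a deliberately naive check, written for illustration: it does not parse the full requirement syntax (URLs, extras, and environment markers are not handled), and the package names in the sample are invented.

```python
def unpinned(requirements: str) -> list[str]:
    """Return requirement entries that do not pin an exact version with '=='."""
    flagged = []
    for line in requirements.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry or entry.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in entry:
            flagged.append(entry)
    return flagged


# Sample input, invented for illustration.
sample = """\
examplelib==1.4.2
webthing>=2.0  # floats to whatever is newest
-r base.txt

yamltool
"""

loose = unpinned(sample)
```

Pinning does not make a dependency trustworthy; it only makes the trust decision explicit and reviewable, so that a change in what you run is a change someone chose to make.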

Observability can become exposure

Observability is essential, but more data is not automatically better. Teams often improve logs, traces, and error reporting to help operations, then discover later that they have created a shadow data store full of credentials, personal data, internal identifiers, stack traces, or request content that was never meant to persist. This happens because diagnostic value and security value do not always align.

A useful question is not only whether a log entry will help during debugging. It is also whether the data should exist in logs at all, who can read it, how long it will be retained, and how easily it could be abused if copied. Many organisations learn late that logs often end up with broad access and weak review. Good observability requires discretion as well as detail.
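One way to apply that discretion at the source is to redact sensitive-looking values before a log record is written. The sketch below uses Python's standard `logging.Filter` hook. The pattern list is an assumption made for this example and is far from complete; the logger name and the sample message are invented.

```python
import io
import logging
import re

# Illustrative pattern only; a real deployment would need a reviewed list.
SENSITIVE = re.compile(r"(?i)\b(password|token|api_key)=([^\s&]+)")


class RedactingFilter(logging.Filter):
    """Mask the values of sensitive-looking key=value pairs before emission."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg with its args
        record.msg = SENSITIVE.sub(r"\1=[REDACTED]", message)
        record.args = ()  # args are already merged into msg
        return True  # keep the record, in redacted form


# Demonstration: log to an in-memory stream so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())

logger = logging.getLogger("redaction_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("login password=%s user=%s", "hunter2", "bob")
output = stream.getvalue()  # "login password=[REDACTED] user=bob\n"
```

Pattern-based redaction is a safety net, not a guarantee. It catches the formats someone thought of in advance, so it works best alongside the habit of not logging request bodies and credentials in the first place.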

Least privilege fails through drift

Almost everyone agrees with least privilege in theory. In practice, systems drift away from it constantly. Permissions expand because removing them feels risky. Service accounts get broad access because it is quicker. Temporary admin rights become routine. Internal tools inherit production capabilities they do not really need. Old roles survive long after the product that justified them has changed. Tokens remain valid because rotation is inconvenient.

This drift is rarely dramatic, which is precisely why it becomes dangerous. By the time someone reviews the privilege model seriously, a large amount of access may already exist without clear justification. Teams tend to learn this during compromise or audit pressure, when reducing privilege is suddenly urgent and much harder than it would have been earlier. Least privilege is not just a design principle. It is an ongoing maintenance discipline.

Controls that depend on heroics do not last

Some security measures look good on paper because they assume unusually disciplined people will always do the right thing. Manual secret rotation with no automation, patching processes that depend on exceptional coordination, release approvals that only one overloaded person really understands, and incident procedures preserved mostly in tribal memory can all work for a while. They especially seem workable in small teams with strong individuals.

But systems grow, people change roles, time pressure rises, and fatigue accumulates. Controls that depend on heroics eventually fail because ordinary conditions are more common than ideal ones. A practice that only works when everyone is careful, rested, available, and fully informed is not yet robust. Teams often realise that too late.

Attackers do not need defender-level consistency

Engineering teams naturally think in terms of repeatability, correctness, and clean design. Attackers do not need the same level of consistency. They may only need one forgotten credential, one overprivileged account, one stale endpoint, one inconsistent authorisation check, one overlooked internal tool, or one dependency that was trusted too casually. That asymmetry is one reason security often feels unfair. Teams can do many things right and still be exposed through a single weakly governed corner.

The point is not despair. It is prioritisation. Weakly governed edges often matter more than polished centres. Security improves when teams stop assuming that the parts of the system receiving less attention are therefore receiving less risk.

Retrofitting security is expensive

Many teams still treat security as something added after the main engineering decisions are settled. The product is designed, the interfaces are defined, the privileges exist, the data flows are chosen, and the integrations are approved. Only later does someone ask how the whole thing will be secured. By then, much of the answer is already constrained.

This is why late-stage security work so often feels painful. The issue is not only missing controls. It is that the architecture did not leave room for simple controls to work cleanly. No clear trust boundaries, poor separation of duties, weak auditability, shared data ownership, unclear service identities, or fragile secret distribution all become more expensive once the system is already established. The earlier security enters design (the idea behind shift-left security), the more it looks like good engineering. The later it enters, the more it looks like compensation.

What teams usually need first

Security produces a large vocabulary: zero trust, shift left, defence in depth, supply chain risk, secure by design, posture, resilience. These ideas can be useful, but teams often absorb the language faster than they improve the habits. The recurring habits that matter are much less glamorous. Clarify trust boundaries. Constrain privilege. Choose fewer unnecessary dependencies. Keep sensitive data out of logs. Authenticate internal tools seriously. Review architectural exposure early. Make controls sustainable. Understand where authority is actually enforced.

These habits do not sound novel, which is part of why they are learned so late. They are not exciting enough to attract attention early, but consequential enough to become obvious after failure.

What matters in practice

Teams usually remember the security lessons that hurt. The challenge is to learn some of them before the expensive reminder arrives. Across many incidents, the pattern is not that teams lacked intelligence. It is that they postponed discipline in places that did not feel urgent yet: internal tooling, authorisation, secrets, dependency governance, privilege boundaries, sustainable controls, architectural clarity.

That postponement is understandable. It is also costly. The most valuable security lessons are often the least theatrical ones. They depend less on exotic attackers than on whether a team treats trust, access, exposure, and operational reality as first-class engineering concerns before something forces the issue. That is usually the difference between a hard lesson and a late one.

When Engineering Performance Starts Looking Like Capital

fjavierm — Sat, 18 Apr 2026 10:35:49 +0000

There has always been an uncomfortable truth in software engineering: performance is never measured in a vacuum.

What is changing is how directly that performance is now mediated by tools that are unevenly distributed, commercially controlled, and increasingly central to the work itself.

The industry likes to talk as if engineers are evaluated in clean, individual terms. Talent. Skill. Productivity. Output. Technical depth. But anyone who has worked across different environments knows the picture has never been that pure. Some engineers work with strong infrastructure, good internal tooling, realistic deadlines, thoughtful leadership, and enough slack to think clearly. Others work with brittle systems, thin budgets, weak tooling, constant interruption, and an atmosphere where every shortcut becomes tomorrow’s technical debt.

Now AI is intensifying that difference.

Not because unequal conditions are new, but because of the speed, visibility, and scale at which they now shape perceived engineering performance.

An engineer with access to the best models, the highest limits, the best integrations, and a company willing to spend freely on AI assistance can look dramatically more productive than the same engineer in a more constrained environment. Faster drafts. Faster debugging. Faster summarisation. Faster code review. Faster scaffolding. Faster documentation. Faster exploration of unfamiliar systems. The same person can appear sharper, more responsive, more prolific.

And that should make us uneasy. Because the closer engineering output gets to being mediated by paid cognitive leverage, the easier it becomes to confuse judgment with access.

The same engineer, different budget

This is the part people should sit with more seriously.

Take one competent engineer. Put them in a company with generous AI budgets, premium models, high usage caps, strong internal automation, well-funded infrastructure, and managers who encourage experimentation. Now put that same engineer in a company where model access is restricted, rate limits are low, approvals are slow, context windows are smaller, integrations are weak, and every token of usage is treated like a cost centre to be justified.

Those are not equivalent working conditions, and they do not produce equivalent visible performance.

In the first environment, that engineer may look unusually fast. In the second, the same engineer may look merely solid, or even slow relative to peers who have learned to optimise for scarcity in less visible ways.

This matters because the industry is beginning to talk about AI-augmented output as if it were simply personal productivity. But a growing part of that productivity is purchased.

Not entirely purchased. Judgment still matters. Taste still matters. Verification still matters. Knowing how to decompose a problem still matters. Access is a multiplier, not a substitute. But multipliers do not only amplify real strengths. They also change how those strengths are seen, and how quickly weaker work can pass as stronger than it is.

That means the performance gap we are observing is not always a pure skill gap. Sometimes it is a capital gap wearing the language of merit.

What better model access looks like in practice

The abstraction matters less once you reduce it to day-to-day engineering work.

An engineer debugging a failing production issue with a high-context model, generous limits, and repository integration can iterate through logs, traces, likely failure paths, and code history in minutes. Another engineer with a weaker model and a copy-paste workflow spends the same hour reconstructing context by hand.
An engineer reviewing a pull request with AI assistance can get fast summaries, risk prompts, and test suggestions before leaving a comment. Another engineer without that support may still reach the better conclusion, but more slowly and with less visible throughput.
An engineer onboarding into an unfamiliar codebase with large context windows and solid integrations can ask better questions sooner. Another engineer may have the same underlying ability but appear less fluent simply because their tooling keeps breaking the thread of understanding.

None of this invalidates the first engineer’s output. It means the visible difference is no longer easy to read as an individual property.

Richer environments make mediocre engineers look better too

There is another uncomfortable point here.

Better models do not only help great engineers. They also help mediocre ones appear more capable for longer.

A weaker engineer with powerful tools can produce cleaner-looking drafts, more plausible explanations, more complete-seeming implementations, and more polished communication than they could have produced on their own a few years ago. Sometimes that lift is genuinely useful. Sometimes it makes them operationally better. But sometimes it mainly improves surface performance.

This creates a new problem of legibility.

If better access can elevate everyone’s visible output, how do we distinguish between:

strong judgment and strong assistance
real depth and fluent assembly
durable engineering ability and temporary model-backed competence
someone who understands the system and someone who is good at navigating augmented workflows around the system

This is not an argument against augmentation. It is an argument against naivety.

AI compresses visible differences in some layers of work while amplifying hidden differences in others. Engineers who can decompose problems well, ask better questions, and verify aggressively often get far more leverage than engineers who cannot. So the effect is not that skill disappears. It is that skill becomes harder to read quickly, because access and ability are now more entangled on the surface.

And in a fast industry, harder to read quickly is a serious problem.

The old signals are getting noisier

Software engineering has never had perfect performance metrics. But many of the usual signals are becoming less reliable in an AI-heavy environment.

Large output volume? That may reflect good model access.
Fast prototype delivery? That may reflect better tooling and paid inference.
Polished internal docs? That may reflect better AI assistance.
Quick onboarding into a codebase? That may reflect premium context windows and better integrations.
Strong pull request throughput? That may reflect a well-funded augmentation stack as much as personal efficiency.

None of this means those outputs are fake. It means they are no longer easily interpretable as individual signals.

The risk is that companies, managers, and even engineers themselves start mistaking tool-mediated acceleration for personal superiority. Once that happens, the mythology rebuilds itself very quickly. The high-performing engineer. The ten-times team. The unusually productive hire. The engineer who just gets more done.

Maybe. Or maybe they sit closer to capital.

The industry already knows this pattern, but prefers not to say it plainly

This is where the broader economic parallel becomes hard to ignore.

Modern industries are very good at turning environmental advantage into stories about individual merit. Better schools become talent. Better networks become hustle. Better tools become personal brilliance. Software engineering has never been immune to that pattern. People already know that prestige companies, strong peers, better hardware, stronger internal platforms, and healthier engineering cultures can radically change how capable someone appears.

With LLMs the effect becomes more immediate and more intimate, because the augmentation sits so close to the act of thinking and producing.

The tool is no longer just around the work. It is embedded directly in it: drafting, exploring, reframing, and accelerating.

So unequal access to the tool starts to shape not only output, but the social interpretation of intelligence and competence.

That should make us more cautious about how quickly we attach status to visible performance.

Great engineers still exist, but it may take longer to recognise them

I do not think this makes engineering judgment less important. If anything, it makes it more important.

The best engineers will still be the ones who can evaluate bad suggestions, spot hidden risk, reason about tradeoffs, notice when the model is confidently wrong, preserve simplicity, and make decisions that hold up after the first draft. They will still be the people who can operate under ambiguity, understand consequences, and see the system rather than just the generated artifact.

But here is the catch: those qualities may become slower to detect.

When tools make mediocre work look more competent on first impression, the gap between a great engineer and an average one may reveal itself less in raw speed and more in second-order effects:

what breaks later
what remains maintainable
what scales cleanly
what survives ambiguity
what still makes sense six months after the sprint demo
who can still deliver when the model is wrong, limited, unavailable, or misleading

In other words, the difference may become less obvious in immediate output and more obvious in accumulated consequences.

That is a harder kind of excellence to measure, and a less convenient kind for an industry that increasingly wants instant signals.

Big tech benefits from blurred attribution

This part is less about conspiracy than incentive structure.

The companies selling the strongest models benefit from a story in which access to their systems looks increasingly similar to access to intelligence itself. The more engineering success is narrated through model quality, model tier, model integration, and model scale, the easier it becomes to normalise an industry where cognition is rented through large platforms.

That does not mean the tools are useless. Many of them are genuinely useful. It means the surrounding incentives are not neutral.

If the dominant story becomes the best engineers are the ones who perform best with the best paid models, then providers are not merely selling tools. They are also benefiting from a standard of competence that becomes harder to separate from subscription tier, integration depth, and spending power.

That should concern people even if they are optimistic about AI.

Because once a profession starts tying visible excellence too tightly to expensive proprietary augmentation, it becomes easier for power to consolidate around whoever controls the highest leverage.

Open access is not a complete answer, but it matters

One reason open models and more accessible tooling matter is not only competition. It is legibility and fairness.

If meaningful AI leverage is available only through the most expensive systems, then the profession drifts toward a world where capital increasingly determines who gets to look fast, polished, and capable. Wider access does not remove inequality, but it does reduce one layer of artificial advantage.

That matters for individuals. It matters for smaller companies. And it matters for the health of the engineering profession.

A field that cannot distinguish between deep skill and expensive augmentation becomes easier to manipulate, easier to stratify, and harder to trust.

What we should be more honest about

We are entering a period where some engineers will genuinely become better because of AI, and some will merely become better-presented.

Those are not the same thing.

The distinction will often be hard to see early. It will take better judgment from managers, better skepticism from teams, and more patience from the industry than the industry usually likes to show.

We should be more honest that money now buys not only infrastructure and labor, but increasingly buys cognitive amplification. We should be more honest that this will distort performance comparisons. We should be more honest that some of the new language of merit will quietly be language of access. And we should be more honest that the market has a direct interest in making this dependence feel natural, inevitable, and desirable.

This does not mean rejecting AI. It means rejecting the fantasy that a heavily capitalised form of augmentation is the same thing as neutral progress.

The question underneath the question

The real issue is not whether tools improve engineers. Of course they do.

The real issue is what happens when the means of improvement are unevenly distributed, commercially controlled, and socially misread as proof of individual superiority.

At that point we are no longer just talking about productivity.

We are talking about who gets to appear excellent, who gets mistaken for mediocre, who gets hired faster, who gets promoted sooner, whose judgment is trusted, and whose limits are treated as personal failure when they may actually be budgetary conditions imposed from above.

That is not a side issue. It is the issue.

The earlier the industry learns to say that plainly, the better its chances of using these tools responsibly rather than drifting into a future where engineering merit becomes difficult to separate from purchased advantage.

Resilience. Keep Distributed Systems Alive

fjavierm — Sun, 01 Mar 2026 19:39:44 +0000

Talk to enough backend engineers and you will eventually hear some version of this story:

“Nothing actually broke. Everything just got slower… until it stopped working.”

Distributed systems rarely fail with a bang. A service times out, clients retry, queues fill, latency spreads, and suddenly the entire platform behaves like a crowded motorway where every driver keeps tapping the brakes.

What’s striking is not that this happens. It’s that many engineers building production systems have never been formally introduced to the ideas designed to prevent it. Terms like exponential backoff, circuit breaker, bulkhead, token bucket, or load shedding sound esoteric, even though they describe mechanisms as fundamental as memory management or indexing. These are not implementation details. They are the control theory of modern software.

And as AI makes it trivial to generate functioning services, this kind of systems thinking is becoming the real differentiator between software that works and software that survives. And, probably, the difference between engineers who design durable systems and those who unknowingly ship fragile ones.

In traditional software, failure was often discrete. A process crashed, a machine went offline, a database was corrupted. You debugged, fixed, restarted (the good old times).

Cloud-native systems introduce an entirely new class of failure modes. They are alive with partial availability:

A dependency slows but does not fail
A region degrades but still responds
Requests succeed… just too slowly
Retries amplify load
Healthy components become collateral damage

This phenomenon is explored deeply in works like Release It! and Designing Data-Intensive Applications, but many engineers encounter it only during their first major incident. The core danger is not failure itself. It is uncontrolled reaction to failure. The following ideas didn’t emerge from theory. They emerged from postmortems on systems that failed in exactly these ways.

Exponential Backoff

Let’s imagine we are on-call, and a service call times out. In this scenario, the most instinctive answer is to try again. While it is not wrong, it can be incomplete. If thousands of clients retry immediately, the struggling service receives a sudden surge of new requests precisely when it is least capable of handling them. The system is not recovering; it is being hammered.

This is where exponential backoff enters the picture. The idea is simple: the more failures you observe, the longer you wait before trying again. Crucially, when a random jitter is added to each wait, different callers wait for different lengths of time, so they don’t all stampede back at once. Conceptually, this mirrors real-world congestion control. When traffic jams form, metered ramps and staggered entry prevent waves of cars from worsening the blockage.

While the pattern doesn’t fix the underlying issue, it prevents panic from making it worse.
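As a rough illustration, here is a minimal Python sketch of retrying with exponential backoff and random jitter. The function names, the default base and cap values, and the choice of picking the wait uniformly between zero and the current ceiling are assumptions for this example, not a prescription.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Return a randomised delay in seconds for a zero-based retry attempt."""
    # The ceiling doubles with every failure, up to a fixed cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Jitter: each caller picks a different point below the ceiling.
    return random.uniform(0, ceiling)


def call_with_retries(operation, max_attempts: int = 5, sleep=time.sleep):
    """Call operation(), waiting longer after each failure; re-raise the last error."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(backoff_delay(attempt))
```

The sleep parameter is injectable only so the behaviour can be tested without real waiting. In real code you would also want to retry only on errors that are actually transient.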

Circuit Breaker

But retries alone cannot solve everything. If a dependency is failing consistently, continuing to call it at all may be wasteful or dangerous.

This is where the circuit breaker comes in. Borrowed from electrical systems, the idea is almost philosophical: after enough failures, stop trying. Fail fast. Give the system space to recover. Instead of waiting on timeouts that tie up resources, the application immediately returns an error or fallback response. After a cooling-off period, it cautiously tests whether the dependency has recovered. This behaviour feels counterintuitive because engineers are trained to maximise success rates, but in distributed systems, refusing work can be the act that preserves the ability to do any work at all.
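To make the states concrete, here is a deliberately small, single-threaded sketch in Python. The class name, the thresholds, and the injectable clock are all assumptions for illustration. A production breaker would also need locking and a limit on concurrent trial calls.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without touching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cooling-off period is over: let a trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers get an immediate CircuitOpenError instead of waiting on a timeout, which is the "fail fast" part of the pattern.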

Bulkhead Pattern

Even with smart retries and fast failure, trouble in one part of a system can spread through shared resources.

Consider a service that talks to multiple downstream systems. If one of them becomes slow, threads accumulate waiting for responses. Eventually, there are no threads left for anything else, including healthy dependencies.

The bulkhead pattern addresses this by isolating resources. Just as ships are divided into watertight compartments, systems allocate separate pools for different activities. One flooding compartment does not sink the vessel. This principle appears everywhere once you start looking for it: separate queues, isolated worker groups, per-tenant limits, even independent microservices.
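One simple way to express a compartment in code is a concurrency limit per dependency. The sketch below uses a semaphore from Python's standard library. The Bulkhead class, the limit of two, and the dependency names are hypothetical.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a compartment has no free slots left."""


class Bulkhead:
    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, operation):
        # Reject immediately rather than queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("compartment is full")
        try:
            return operation()
        finally:
            self._slots.release()


# One compartment per downstream dependency (names are hypothetical).
bulkheads = {"payments": Bulkhead(2), "search": Bulkhead(2)}
```

If calls to the hypothetical payments dependency stall, only its two slots are consumed, and calls routed through bulkheads["search"] still have capacity.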

Rate-limiting

So far we’ve discussed reactions to failure. But many outages are caused not by faults, but by sheer volume. Every system has a finite processing capacity. When incoming requests exceed that capacity, queues grow, latency spikes, and eventually the system collapses under its own backlog.

Rate-limiting mechanisms enforce a simple rule: requests are allowed at a sustainable pace, with limited tolerance for bursts. Excess traffic is delayed or rejected. This is not just about protecting infrastructure. It’s about fairness and predictability. Without limits, a single noisy client can degrade service for everyone.

Large platforms use these mechanisms not as emergency tools but as everyday traffic shaping, the software equivalent of speed limits and traffic lights.
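The token bucket mentioned earlier is one common way to implement "a sustainable pace with limited tolerance for bursts". This is a minimal single-threaded sketch in Python. The class shape, parameter names, and injectable clock are assumptions for the example.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second (sustained pace)
        self.capacity = capacity  # maximum tokens stored (burst tolerance)
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits, else False."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller that gets False can delay or reject the request. Keeping one bucket per client is a straightforward way to stop a single noisy client from consuming everyone's share.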

Load Shedding

Load shedding may be the most counterintuitive pattern of all. When a system is overwhelmed, the instinct is to try harder: spin up more workers, process faster, squeeze every ounce of throughput from the hardware. But beyond a certain point, this effort becomes self-destructive. The system spends more time managing overload than serving useful work.

Load shedding flips the perspective. Instead of attempting to serve everyone poorly, the system deliberately refuses some requests so it can serve others well. Nonessential features may be disabled. Expensive operations deferred. Low-priority traffic rejected.

Airlines do this. Power grids do this. Even the human body does this under stress. Graceful degradation is not failure. It is survival.
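A sketch of what "deliberately refusing some requests" could look like: shed the lowest-priority traffic first as utilisation climbs. The priority classes and the threshold values below are hypothetical, and real systems often use latency or queue depth as the overload signal rather than a simple in-flight count.

```python
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0
    NORMAL = 1
    LOW = 2


# Hypothetical thresholds: the fraction of capacity at which each class is shed.
SHED_AT = {Priority.LOW: 0.7, Priority.NORMAL: 0.9, Priority.CRITICAL: 1.0}


def should_shed(in_flight: int, capacity: int, priority: Priority) -> bool:
    """Decide whether to reject a new request given the current load."""
    utilisation = in_flight / capacity
    return utilisation >= SHED_AT[priority]
```

With these numbers, low-priority requests are refused once the service is 70% busy, while critical ones are refused only when it is completely full.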

Individually, each technique addresses a specific problem. Together, they express a deeper principle: distributed systems must regulate themselves under stress. One pattern slows demand. Another isolates damage. Another prevents futile work. Another enforces fairness. Another sacrifices noncritical functionality to preserve core operations. Seen this way, resilience engineering begins to resemble ecology or economics more than programming. You are designing feedback loops, not just writing code.

AI tools can now generate working services in seconds (let’s not dig any deeper into that claim here). They can scaffold APIs, configure deployments, and even suggest architecture diagrams. What they do not yet do reliably is reason about emergent behavior under failure. As software creation accelerates, two trends emerge:

Systems become more interconnected
Failure modes become more complex

The bottleneck shifts from writing code to designing systems that remain stable under unpredictable conditions. In that environment, understanding resilience patterns is less like knowing a framework and more like understanding physics. It shapes every design decision, even when invisible. Engineers who internalise these ideas will build platforms that feel calm and dependable. Those who don’t will unknowingly construct systems that work beautifully, right up until they don’t.

Users rarely notice resilience when it works. They only experience its absence. Behind every highly available service is not just redundancy or scaling, but a network of small, deliberate decisions about how the system behaves when things go wrong.

Retry: but not too quickly
Call dependencies: but not blindly
Share resources: but not indiscriminately
Accept traffic: but not endlessly
Serve features: but not at the cost of survival

These decisions are not implementation details. They are the difference between a platform that collapses under pressure and one that bends without breaking. In the end, resilience is not a component you install. It is a mindset you design into the system from the start, a quiet architecture of restraint, isolation, and controlled imperfection that keeps everything running when the world inevitably catches fire.