DEV Community: Doogal Simpson

The To-Do Method: Control AI Coding Agents in Your IDE

Doogal Simpson — Sat, 25 Jul 2026 10:52:26 +0000

TL;DR: Tired of AI agents writing sloppy, off-target code? The "To-Do" method restores fine-grained control inside your IDE. By dropping explicit inline comments (like // TODO: agent - refactor this) directly into your files, you can guide agents through iterative, high-standard implementations while you review and queue up tasks in real time.

When you delegate a feature to an AI coding agent, you often end up spending more time fixing its architectural shortcuts than you would have spent writing the code from scratch. The agent produces code that technically runs but falls short of your standards for naming, structure, or performance. Instead of fighting with complex prompts in a chat window, you can manage agent code quality directly in your files using the To-Do Method. It treats the agent like a hyper-fast pair programmer who just needs constant, highly localized steering inside your editor.

What is the To-Do Method for AI Agents?

The To-Do Method is a workflow where you direct an AI agent entirely through structured comments left directly inside your codebase. Instead of writing external prompts, you write inline notes, tell the agent to run, and then review and iterate on its output in place.

Think of it like leaving red-pen edits on a draft. You drop highly targeted instructions right where the work needs to happen. For example, you can write:

// TODO: agent - Write a function that returns true if email contains an @ symbol.
export function isValidEmail(email: string): boolean {
  return email.includes('@');
}

You point the agent to your file and say, "Go fix the comments." The agent scans the code, finds your markers, replaces them with code, and stops.
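
To make the scanning step concrete, here is a minimal sketch of how a tool might collect those markers from a file. The `findAgentTodos` helper and the `AgentTodo` shape are hypothetical names invented for illustration; real agents read files through their own tooling and may interpret comments differently.

```typescript
// Hypothetical helper: lists the `TODO: agent` markers an agent would act on.
interface AgentTodo {
  line: number;        // 1-based line number of the comment
  instruction: string; // text after the "TODO: agent -" prefix
}

// Matches `// TODO: agent - <instruction>`, the prefix convention used in this post.
const AGENT_TODO = /\/\/\s*TODO:\s*agent\s*-\s*(.+)$/;

function findAgentTodos(source: string): AgentTodo[] {
  const todos: AgentTodo[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    const match = AGENT_TODO.exec(text);
    if (match) {
      todos.push({ line: index + 1, instruction: match[1].trim() });
    }
  });
  return todos;
}
```

Ordinary `// TODO:` notes without the `agent` prefix are ignored, which is the point of using a distinct marker.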

How Do You Keep AI Code Up to Your Engineering Standards?

You maintain high standards by using tight, iterative feedback loops and treating agent-generated code as a draft that you ruthlessly critique. When an agent delivers code that isn't up to par, you do not rewrite it yourself; you write a comment telling the agent exactly why it failed and how to refactor it.

For example, if an agent writes a sprawling, unreadable nested conditional, don't accept the PR. Drop a comment directly above it:

// TODO: agent - This implementation is too complex. Refactor this using guard clauses and extract the validation logic into a helper function.

The To-Do Method Lifecycle

Phase	Developer Action	Agent Action
1. Direct	Write explicit `TODO: agent` comments in the codebase.	Idle.
2. Execute	Trigger the agent and point it to the modified files.	Reads comments, replaces them with code, and removes the TODOs.
3. Critique	Review the diff. If it's messy, write a new `TODO: agent` to refactor.	Idle.
4. Parallelize	Write TODOs in File B while the agent works on File A.	Implements File A.

This method pairs beautifully with Test-Driven Development (TDD). You can write your test cases first, leave a comment like // TODO: agent - Make these tests pass using the strategy pattern, and let the agent do the manual labor while you watch the test runner.

Can You Work in Parallel with an AI Agent?

Yes, you can work in parallel with an AI agent by queueing up tasks in different files while the agent is executing your previous instructions. Because the agent is confined to the specific files and comments you tell it to watch, you can stay focused in another part of the editor.

Imagine you are building a complex checkout system. While the agent is busy implementing a calculated tax helper in your service layer, you can move ahead to the controller or the frontend components, writing your next set of TODO: agent comments. By the time you finish mapping out the architecture, the agent has finished the boilerplate, and you are ready to review its work. It keeps your hands on the keyboard and your mind focused on system design.

FAQ

How do I prevent the AI agent from deleting my manual comments?

Clearly prefix your instructions (e.g., // TODO: agent - [instruction]) and specify in your system instructions or agent settings that it should only remove a comment once the requested implementation is fully complete and verified.

Does this method work with all AI coding assistants?

Yes, this workflow works with any agentic IDE tool (like Cursor, Copilot Workspace, or Windsurf) that can read your workspace context and modify files based on natural language commands.

Isn't writing inline comments slower than just chatting with the AI?

While it takes a few extra seconds to type comments in-file, you save time by keeping the agent focused on precise context, eliminating the need to copy-paste code back and forth between your IDE and a browser.

Game Theory in Software Engineering: How Cooperation Wins

Doogal Simpson — Thu, 23 Jul 2026 10:56:16 +0000

In software engineering, optimizing for individual metrics at the expense of your team is a losing strategy. I believe that while "selfish" coding yields quick short-term wins, long-term career compounding and technical success come exclusively from consistent, reputation-building cooperation.

I've always found game theory fascinating, especially when it maps directly to how we build software. Imagine walking into a room where you can grab a quick fifty quid by taking advantage of someone, or earn a single pound by cooperating. If you are only playing this game once, the selfish move makes mathematical sense.

But in software engineering, I see devs make the mistake of treating their daily work like a single-play game when it is actually a long, repeated interaction. The codebase is the room, and your engineering team is the crowd watching your every commit.

Why does game theory matter in software development teams?

I've observed that game theory, specifically the Iterated Prisoner's Dilemma, directly models how trust and collaboration scale in technical teams. While a developer can temporarily boost their personal metrics by ignoring team needs, I find that this selfish behavior ultimately destroys the trust required for long-term career growth. Consistent cooperation creates a compounding network effect where everyone wins.

Let’s look at how this plays out in a normal sprint. If you decide to work in a complete silo—cranking out features, ignoring junior developers, and skipping thorough code reviews to inflate your personal ticket count—you are playing a single-round game. You might get the temporary satisfaction of looking highly productive to a manager who only looks at Jira charts.

But I often see this backfire when you need help debugging a complex production issue, or when you need a crucial pull request reviewed quickly. Your team remembers your past choices, and they naturally adapt. Cooperating with a non-cooperative player is a losing strategy, so your teammates will eventually stop prioritizing your requests, leaving your work stalled.

What is the difference between single-round and iterated games in engineering?

From what I've seen, a single-round game assumes you will never interact with the other party again, making selfish optimization highly rational. In contrast, an iterated game relies on repeated interactions where players remember past choices, making cooperation the only strategy that yields compounding, long-term rewards.
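
The arithmetic behind that claim can be sketched with the payoffs from the anecdote earlier (fifty for exploiting someone, one for cooperating). The big assumption here, which is mine and a deliberate simplification, is that after a single selfish move everyone stops cooperating with you, so you earn nothing afterwards.

```typescript
// Payoffs borrowed from the anecdote: 50 for exploiting someone, 1 for cooperating.
const EXPLOIT_PAYOFF = 50;
const COOPERATE_PAYOFF = 1;

// Assumption: after one selfish move the team stops cooperating,
// so the selfish player earns nothing in later rounds.
function selfishTotal(rounds: number): number {
  return rounds >= 1 ? EXPLOIT_PAYOFF : 0;
}

// A cooperative player earns the small payoff every round.
function cooperativeTotal(rounds: number): number {
  return rounds * COOPERATE_PAYOFF;
}

// First round count at which cooperating has earned strictly more than exploiting.
function breakEvenRound(): number {
  let rounds = 1;
  while (cooperativeTotal(rounds) <= selfishTotal(rounds)) {
    rounds += 1;
  }
  return rounds;
}
```

Under these assumptions the selfish move wins a single round (50 versus 1), and cooperation pulls ahead from round 51 onward. Real teams do not have payoffs this clean, but the shape of the curve is the argument.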

I like to break down specific engineering choices to show how they play out across these two strategies:

Engineering Action	Single-Round Strategy (Selfish)	Iterated Game Strategy (Cooperative)	Long-Term Career Impact
Pull Request Reviews	Rush through with a generic "LGTM" to get back to your own tasks.	Leave thoughtful, constructive feedback and help mentor the author.	High trust, cleaner codebase, and faster reviews when you submit your own PRs.
Knowledge Sharing	Hoard domain knowledge to make yourself seem indispensable.	Document systems, write clean runbooks, and actively pair-program.	Reduced burnout, seamless delegation, and clear readiness for leadership roles.
Technical Debt	Ship hacky code quickly to hit individual sprint deadlines.	Write clean, maintainable code and refactor nearby debt where possible.	Shorter lead times, fewer production bugs, and a team that actually enjoys working with you.

How does team-level cooperation drive individual developer success?

I believe that while cooperating reduces your personal daily output slightly, it raises the productivity of the entire team by far more. Because engineering is a team sport, I am convinced that being part of a highly successful team advances your career far more than being a high performer on a failing team.

I think of this as a compounding interest rate. Earning that "one pound" of mutual trust every day doesn't look like much on a random Tuesday. But over a year, a team operating on high trust moves incredibly smoothly.

If you choose to help a teammate unblock their local environment instead of writing your next function, you might close one fewer ticket this week. But you have built a critical node in a resilient network. In my experience, when the team succeeds, everyone's stock rises. Hiring managers and tech leads do not look for the lone wolf who wrote thousands of lines of unmaintainable code; they look for the force multiplier who made the entire department better.

FAQ

How do you handle a teammate who is consistently playing a selfish game?

I recommend using the classic game theory strategy of "tit-for-tat with forgiveness." You should mirror their boundary by not over-extending to bail them out of self-inflicted blocks, but immediately resume cooperation the moment they show collaborative behavior.

What are the risks of over-cooperating in a low-trust engineering culture?

I've noticed that over-cooperating in an environment where leadership only rewards individual output can lead to severe burnout and low performance ratings. If your organization's performance rubrics do not value collaboration, I suggest balancing team support with visible individual contributions while searching for a healthier team culture.

Can you automate cooperation in a software delivery pipeline?

I believe you can bake cooperative behavior into your team's workflow through automated guardrails like linting, automated testing, and branch protection. By codifying standard practices, you remove the friction of personal negotiations and make cooperative coding the default path of least resistance.

8 Levels of AI Autonomy in Software Engineering

Doogal Simpson — Thu, 23 Jul 2026 10:25:20 +0000

TL;DR: AI tooling in software engineering is a spectrum, not a binary switch. We are moving from simple inline autocompletion to managing parallel agents, building hierarchical "Gas Town" setups, and eventually operating "dark factories" where autonomous systems write, deploy, and self-heal codebases based directly on production metrics.

Working in software right now is weird. The industry is obsessed with AI, but most developers are stuck thinking about it as a binary choice: either you use Copilot to write boilerplate slightly faster, or you get replaced by a bot.

It is not that simple. We are looking at a clear spectrum of autonomy—and understanding where you sit on it is the only way to avoid drowning in the noise.

How do AI tools differ in software development?

AI tools differ primarily by their degree of autonomy and the scope of context they can access. At the low end, they operate as passive assistants on a single line of code; at the high end, they act as autonomous systems capable of diagnosing issues, writing code, and deploying software without human intervention.

To map out this landscape, we can categorize AI integration into eight distinct levels of maturity:

Level	Name	What it does	What you actually do
1	Spicy Autocomplete	Tab-completion	Type code, hit tab
2	Agent Advisor	Smart search & codebase querying	Ask questions, read output
3	Pair Programmer	Collaborative task execution	Code and review line-by-line
4	Agent Manager	Spins up parallel worker bots	Feed tasks, triage PRs
5	Manager of Managers	Hierarchical agent networks	Rule from the top (Gas Town)
6	Light Factory	Ticket-to-PR automated pipeline	Manually sign off on changes
7	Dark Factory	Fully automated deployments	Watch the pipeline run
8	Closed-Loop System	Self-healing production loops	Design the system's guardrails

What are the different levels of AI integration for developers?

Integration levels scale by delegating decision-making power away from the human keyboard. It begins with micro-tasks like writing helper functions and climbs to structural orchestration where you manage multiple agent workloads at once.

Is there a difference between AI pair programming and AI agent management?

Yes. AI pair programming involves a single developer and a single AI agent collaborating on a single task in real time. Agent management, however, elevates the developer to a coordinator role, feeding distinct tasks to multiple agents working simultaneously across different parts of a codebase.

Level 1: Spicy Autocomplete. This is the baseline. You write code, hit tab, and get a single-line or single-block suggestion. It is low-risk and has virtually zero global context.
Level 2: Agent Advisor. The AI isn’t writing code here. Instead, it is a search engine on steroids. You use it to parse legacy codebases or figure out how to wire up a tricky integration, saving you thirty minutes of digging through outdated documentation.
Level 3: Pair Programmer. You and a single agent tag-team a ticket. You write the interface, it stubs out the implementation, you fix its hallucinations, and it generates the tests. You are both actively working on the same branch at the same time.
Level 4: Agent Manager. You stop coding in the traditional sense. You coordinate multiple agents simultaneously. One refactors a utility library, another writes API docs, and a third patches a dependency. You aren't writing code; you are feeding them work and triaging their output.
Level 5: Manager of Managers (The "Gas Town" Pyramid). This is the Gas Town setup. Instead of micromanaging five individual worker bots, you sit at the top of the pyramid and manage a single "boss" agent. This manager agent spins up, coordinates, and reviews its own sub-agents to execute broad project goals. You just hand down the decree from above.

What is a "dark factory" in software engineering?

A "dark factory" is an entirely automated production line where issues or features go in, and deployable code comes out with zero human intervention. The final iteration is a closed-loop system that monitors its own errors and self-heals in production.

Level 6: Light Factory. A ticket lands in your project tracker, agents pick it up, run the builds, write the code, and open a PR. A human still sits at the gate, reading the diff and hitting the "merge" button.
Level 7: Dark Factory. The human gatekeeper is removed. The pipeline is robust enough to autonomously test, build, and deploy the agent's work directly to production. If it passes the automated tests, it goes live.
Level 8: Closed-Loop Systems. The final stage closes the feedback loop. Agents watch APM metrics and log pipelines. If an endpoint starts throwing 500s or latency spikes, the observing agent automatically generates a ticket, assigns it to a worker agent, builds a hotfix, tests it, and deploys it. It is self-repairing software.
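
The difference between Levels 6, 7, and 8 comes down to who or what sits at the gate. Here is a rough sketch of that idea; `canDeploy`, `shouldOpenTicket`, and the 5% error-rate threshold are all hypothetical, chosen only to illustrate the distinction.

```typescript
// Hypothetical model of the release gate at Levels 6 and 7.
type FactoryMode = "light" | "dark";

interface ChangeStatus {
  testsPassed: boolean;   // result of the automated test suite
  humanApproved: boolean; // whether a person signed off on the diff
}

function canDeploy(mode: FactoryMode, change: ChangeStatus): boolean {
  if (!change.testsPassed) {
    return false; // neither mode ships a change with failing tests
  }
  // Light factory: a human still sits at the gate.
  // Dark factory: the automated tests are the only gate.
  return mode === "dark" ? true : change.humanApproved;
}

// Hypothetical Level 8 trigger: open a ticket when the 5xx rate crosses a threshold.
function shouldOpenTicket(
  serverErrors: number,
  totalRequests: number,
  maxErrorRate: number = 0.05
): boolean {
  if (totalRequests === 0) {
    return false; // no traffic, nothing to judge
  }
  return serverErrors / totalRequests > maxErrorRate;
}
```

In a dark factory the test suite carries the full weight of the decision, which is why its quality matters so much more at Level 7 than at Level 6.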

Knowing where you sit on this spectrum helps you prepare for what's next. Whether you are currently leveraging spicy autocomplete or building hierarchical agent networks, the goal remains the same: knowing when to write the code yourself and when to hand over the keys.

FAQ

How do I transition my team from Level 3 (Pair Programming) to Level 4 (Agent Management)?

You have to stop writing code and accept that your job is now writing specs and QA tests. If you cannot write a perfectly clear, unambiguous API specification and pair it with rigorous integration tests, Level 4 will just generate massive amounts of parallel technical debt that you will have to manually clean up later.

Are "dark factories" actually realistic, or are they just vaporware?

They are realistic for trivial, highly-templated tasks, but incredibly risky for complex business domains. To make them work without breaking your product, you have to invest heavily in bulletproof integration suites, sandbox environments, and instant rollback triggers—otherwise, the bot will deploy broken code at machine speed.

Which codebases are the easiest to automate with agents?

Rigid, heavily typed codebases with loud compiler errors. Statically typed languages like Go or Rust give agents immediate feedback loops when they break things. If you try to run an advanced agent pipeline on a massive, dynamic, weakly typed legacy JavaScript codebase, it will hallucinate itself into a corner in minutes.

ETL vs ELT: Why Cheap Cloud Storage Changed Everything

Doogal Simpson — Wed, 22 Jul 2026 10:52:06 +0000

TL;DR: Traditional ETL (Extract, Transform, Load) processed data before saving it to minimize expensive storage. Today, dirt-cheap cloud storage has flipped this paradigm to ELT (Extract, Load, Transform). By loading raw data first and transforming it later, engineers prevent data loss, simplify pipeline debugging, and easily adapt to changing business requirements.

Imagine brewing a giant pot of coffee for a crowded room, but you are forced to mix in the exact ratio of milk and sugar for every single person before you pour it into their mugs. If you miscalculate someone's preference, or if the milk curdles mid-pour, you have to dump the entire batch down the sink and start over.

That is traditional ETL. You extract raw data, force it through a rigid transformation middleman, and only then load it into your database.

Today, we do the opposite. We pour everyone a cup of raw, black coffee first, and let them customize it at the table. This is ELT (Extract, Load, Transform). This architectural shift isn't just a trend; it is a direct consequence of how changing cloud economics reshape system design.

What is the difference between ETL and ELT?

ETL (Extract, Transform, Load) cleanses and structures data on an external staging server before saving it to a destination database. ELT (Extract, Load, Transform) dumps raw, unmodified data directly into a cloud data warehouse first, postponing all transformations until they are actually needed. This shift preserves the original data state and offloads processing to modern, highly scalable database engines.

Historically, our system architectures were constrained by the high cost of spinning disks. Storage was expensive, so databases had to be lean. We couldn't afford to store messy, redundant, or uncompressed raw JSON payloads. The transformation step was a gatekeeper, stripping away "unnecessary" data to keep the database footprint as small as possible.

Today, cloud data warehouses like Snowflake, BigQuery, and Redshift can scale compute separately from storage, and raw storage is cheap enough that it rarely drives design decisions. Because we no longer need to be conservative with space, we can dump raw, unstructured datasets straight into our environments and transform them on demand using SQL or tools like dbt.

Feature	ETL (Extract, Transform, Load)	ELT (Extract, Load, Transform)
Primary Driver	High storage costs; limited DB compute	Cheap cloud storage; massive DB compute
Transformation Location	Separate middleman/staging server	Target data warehouse/lakehouse
Data Loss Risk	High (raw data is discarded pre-load)	Minimal (raw data is permanently preserved)
Flexibility	Rigid (schema changes require pipeline rewrites)	Agile (queries are rewritten, data stays intact)

Why did cheap cloud storage kill ETL?

Historically, high database storage costs forced engineers to compress and transform data before saving it to avoid ballooning bills. As cloud object storage made storage cheap, the economic bottleneck shifted from hardware constraints to developer time and data flexibility. It became significantly cheaper to store everything first and define schemas later.

When storage was a premium resource, we spent countless engineering hours designing complex staging pipelines. If business requirements changed and you needed a field you discarded six months ago, you were out of luck. The data was gone.

Now, the math has changed. It is far more cost-effective to store petabytes of raw, unstructured log files in cheap object storage than to pay engineers to maintain fragile preprocessing scripts. If you decide next year that you need to analyze a nested JSON property you ignored today, the raw data is sitting there, waiting to be queried.

How does ELT prevent permanent data loss during pipeline failures?

Under an ELT architecture, a buggy transformation query never results in data loss because the raw, unmodified data remains safely stored in the destination warehouse. If a transformation fails, you simply fix the SQL query and re-run it against the existing raw data. There is no need to ask upstream teams to replay events or re-send lost files.

Imagine a scenario where your team is building a modern SaaS platform. You have a microservice emitting user interaction events.

Under an ETL model, if your transformation server encounters an unexpected null value or an undocumented API schema change, the pipeline crashes. The data currently in flight can easily be corrupted or dropped before it ever reaches the database.

With ELT, those raw event payloads are written to your warehouse before any transformation runs. Even if your downstream transformation queries fail and your analytics dashboard breaks, your source data is safe. You don't lose history; you just redeploy your transformation code, run the query again, and your dashboards recover as soon as it completes.
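
Here is a small in-memory sketch of that recovery path. An array stands in for the raw table, and the event shape (`userId`, `button`) is a hypothetical example of mine. In a real warehouse the transformations would be SQL, but the flow is the same: load raw, transform, fix, re-run.

```typescript
// Hypothetical raw event: in ELT the payload is stored exactly as received.
type RawEvent = string;

interface ClickRow {
  userId: string;
  button: string;
}

// "Load" step: append the untouched payload to the raw store.
function loadRaw(rawStore: RawEvent[], payload: RawEvent): void {
  rawStore.push(payload);
}

// First transformation: assumes `button` is always a string, so it throws on null.
function transformV1(rawStore: RawEvent[]): ClickRow[] {
  return rawStore.map((payload) => {
    const event = JSON.parse(payload);
    return { userId: String(event.userId), button: event.button.toUpperCase() };
  });
}

// Fixed transformation: tolerates a missing or null `button`.
// It reads the same raw store, which the failed run never modified.
function transformV2(rawStore: RawEvent[]): ClickRow[] {
  return rawStore.map((payload) => {
    const event = JSON.parse(payload);
    const button =
      typeof event.button === "string" ? event.button.toUpperCase() : "UNKNOWN";
    return { userId: String(event.userId), button };
  });
}
```

When `transformV1` blows up on an unexpected null, nothing upstream needs to be replayed. You ship `transformV2` and run it over the raw rows you already have.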

FAQ

Is ETL completely dead, or are there still valid use cases for it?

ETL is still highly relevant when dealing with strict compliance and privacy laws. If your data contains sensitive PII (Personally Identifiable Information) that cannot legally be stored in its raw form on cloud servers, you must use ETL to mask, anonymize, or redact that data before loading it.
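
As a rough illustration of transforming before loading, the sketch below masks an email address so the raw value never reaches the warehouse. The `SignupEvent` shape and the masking rule are hypothetical; whether keeping the domain is acceptable depends on your own compliance requirements.

```typescript
// Hypothetical record shape for illustration.
interface SignupEvent {
  email: string;
  plan: string;
}

// Masks the local part of an email, keeping only the domain.
function maskEmail(email: string): string {
  const at = email.lastIndexOf("@");
  if (at < 0) {
    return "***"; // not an email address: redact the whole value
  }
  return "***" + email.slice(at);
}

// "Transform" runs before "Load", so the raw address is never stored.
function transformBeforeLoad(event: SignupEvent): SignupEvent {
  return { ...event, email: maskEmail(event.email) };
}
```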

Does shifting to ELT significantly increase cloud compute costs?

While ELT leverages the warehouse's compute engine to do heavy lifting, it is often more cost-effective than running dedicated, idle integration servers. Modern cloud warehouses optimize query execution plans and let you scale compute up or down dynamically, so you pay mainly for the time your transformations are actually running; exact billing granularity varies by vendor.

How does ELT affect data governance and organization?

Because ELT allows dumping raw data directly into the warehouse, it can lead to a "data swamp" if left unmanaged. Successful ELT implementations rely on clear schema separation: an untouched raw schema kept isolated from the cleaned, transformed production views that business intelligence tools query.

How to Fix the Thundering Herd (Cache Stampede) Problem

Doogal Simpson — Mon, 20 Jul 2026 12:38:24 +0000

The thundering herd problem happens when a heavily requested cache entry expires, forcing millions of concurrent requests to bypass the cache and slam your database at once. You can solve this by implementing cache locking (or single-flighting), which permits only the first request to rebuild the cache while the others wait in a queue.

When I browse Reddit or a major news platform and suddenly get a 504 Gateway Timeout, my mind immediately jumps to system architecture. It is often not a coordinated DDoS attack causing the outage, but rather a self-inflicted system failure: the thundering herd problem. Let's look at how this failure mode works and how I approach fixing it.

What is the thundering herd problem?

The thundering herd problem occurs when a heavily requested cache key expires, forcing all incoming traffic to query your database simultaneously. Instead of reading fast, temporary data from memory, millions of concurrent reads bypass the cache and hit the persistent storage layer all at once.

I visualize this using a physical analogy. Imagine a stadium concourse as your cache and the database as a single turnstile gate. While the concourse is open, thousands of visitors move freely. But the moment that cache key expires, it is as if the concourse closes and the turnstile becomes the only way through.

The entire stadium crowd tries to squeeze through that single turnstile at the exact same millisecond. The database's CPU spikes to 100%, and it stops accepting any connections—taking the whole application down.

Why does cache expiration cause database failure?

When a cache key expires, the application registers a cache miss and routes the query to the database to rebuild the cache. If thousands of requests arrive during the fraction of a second it takes to fetch and write that data back, they will all hit the database at the same time.

I always remind engineers that caches like Redis or Memcached serve data in microseconds because they operate in memory. Databases, being the disk-backed source of truth, take milliseconds to process queries.

If you have a high-traffic microservice, even a tiny 50-millisecond window where the cache is empty is plenty of time for thousands of concurrent requests to pile up. The database gets overwhelmed trying to run the exact same heavy query thousands of times concurrently, resulting in connection pool exhaustion.
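
A quick back-of-the-envelope check shows how fast requests pile up. The 50 ms window is the one from the paragraph above; the traffic figure of 40,000 requests per second is a number I picked purely for illustration.

```typescript
// Rough estimate: how many requests arrive while the cache entry is missing.
function requestsDuringMiss(requestsPerSecond: number, missWindowMs: number): number {
  return Math.floor((requestsPerSecond * missWindowMs) / 1000);
}

// Hypothetical traffic level; 50 ms is the window from the text.
const piledUp = requestsDuringMiss(40000, 50); // 2000
```

Without protection, each of those 2,000 requests would run the same expensive query against the database.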

How do you solve the thundering herd problem?

You solve this by implementing cache locking, which is also known as mutex locking or the single-flight pattern. When a cache miss occurs, the system locks that key so only the first request queries the database, while forcing all subsequent concurrent requests to wait until the cache is repopulated.

Instead of letting every request hit the database, I establish a gatekeeper. The first request that encounters the expired cache grabs a lock, queries the database, writes the fresh data to the cache, and releases the lock.

Meanwhile, the other thousands of pending requests wait for a few milliseconds. Once the lock is released, they are served directly from the newly updated cache without ever touching the database.
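
Inside a single process, the gatekeeper can be sketched with a map of in-flight promises. This is a minimal illustration, not a production implementation: `SingleFlight`, `getCached`, and the in-memory `cache` map are hypothetical names, and the sketch only deduplicates callers within one process. Coordinating several application nodes needs a shared lock or a proxy-level feature.

```typescript
// Minimal in-process single-flight sketch.
class SingleFlight<T> {
  private inFlight = new Map<string, Promise<T>>();

  // Runs `load` for the first caller of a key; later callers share the same promise.
  run(key: string, load: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing) {
      return existing;
    }
    const pending = load();
    this.inFlight.set(key, pending);
    // Release the "lock" whether the load succeeded or failed.
    const release = () => {
      this.inFlight.delete(key);
    };
    pending.then(release, release);
    return pending;
  }
}

// Hypothetical stand-ins for a real cache and a real database query.
const cache = new Map<string, string>();
const flights = new SingleFlight<string>();

function getCached(
  key: string,
  queryDatabase: (key: string) => Promise<string>
): Promise<string> {
  const hit = cache.get(key);
  if (hit !== undefined) {
    return Promise.resolve(hit);
  }
  return flights.run(key, async () => {
    const value = await queryDatabase(key);
    cache.set(key, value); // repopulate the cache before waiters are released
    return value;
  });
}
```

If a thousand requests call `getCached("home", ...)` during the same miss, `queryDatabase` runs once and every caller receives the same result.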

Which tools support thundering herd protection?

Many modern reverse proxies and caching layers have built-in directives designed to serialize cache-miss requests automatically. Configuring these native features is usually much safer and simpler than trying to write your own custom distributed locking mechanisms.

Technology	Mechanism	Configuration Strategy
NGINX	`proxy_cache_lock` / `proxy_cache_use_stale updating`	Lets a single backend request repopulate a cache entry, and can serve stale cached data to other requests while that update is in progress.
Redis	Distributed Locks (Redlock)	Allows application nodes to synchronize cache population using lightweight keys.
Go (Singleflight)	`singleflight.Group` (`golang.org/x/sync/singleflight`)	Merges duplicate concurrent requests into a single execution whose result is shared by all callers.

FAQ

Is the thundering herd problem the same as a cache stampede?

Yes, "cache stampede" and "thundering herd" are different names for the same architectural failure mode where expired keys trigger an unmanageable wave of database queries.

Can we prevent this by using longer cache TTLs?

No, increasing the TTL only reduces how often the event occurs; it does not solve the underlying vulnerability. When the key eventually expires under load, the system will still crash unless you have locking or background updates in place.

What is background cache regeneration?

It is an alternative strategy where background workers or cron jobs proactively update cache keys before they expire. Active traffic always reads from the warm cache and never triggers a synchronous database query on a miss.
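
A minimal sketch of that idea, assuming a hypothetical `RefreshAheadCache` class: the request path only ever reads, and a scheduled job is the only thing that calls the loader.

```typescript
// Refresh-ahead sketch: readers only read; a scheduled job calls refresh().
class RefreshAheadCache<T> {
  private value: T | undefined = undefined;
  private readonly load: () => Promise<T>;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  // Called by a background worker or cron job on a schedule (hypothetical wiring).
  async refresh(): Promise<void> {
    // The old value keeps being served until this assignment happens.
    this.value = await this.load();
  }

  // Request path: never calls the loader, so a burst of reads cannot reach the database.
  read(): T | undefined {
    return this.value;
  }
}
```

The trade-off is that `read()` returns `undefined` until the first refresh completes, so the cache needs warming before it takes traffic, and the refresh interval has to be shorter than the staleness you can tolerate.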

Automate Video Editing with Whisper, LLMs, and FFmpeg

Doogal Simpson — Sun, 19 Jul 2026 22:02:27 +0000

TL;DR: To stop viewers from getting bored staring at my face during short technical videos, I built a system that parses my transcripts, generates relevant illustrations using AI, and splices them into the final video. This pipeline automates the visual storytelling, keeping things engaging without requiring manual editing.

Let’s be honest: staring at my face for 90 seconds while I explain abstract software architecture is a big ask. I get it. I’ve seen myself in the mirror.

When we explain complex engineering ideas—like database sharding or async loops—visuals make everything click. But manually editing graphics into a timeline is incredibly tedious. To fix this, I built a system that listens to what I say, figures out what to show, generates the visuals, and drops them into the video.

Why is manual video editing so inefficient for technical content?

Manual editing requires you to storyboard, design graphics, and line them up on a timeline just to explain a simple 90-second concept. Automating this process gets rid of the editing bottleneck entirely, letting you focus on the explanation while software handles the asset pipeline.

Imagine you are explaining how a Redis cache sits in front of a database. In a traditional workflow, you write the script, record the video, open up an editor like Premiere or DaVinci Resolve, draw a diagram, and manually align it with your voice.

Moving this logic into a software pipeline automates the tedious parts of video post-production. The code parses what is being said, determines where a visual cue should go, and handles the composition programmatically.

How can you build an automated transcript-to-image pipeline?

You build this by parsing timestamps from your video transcript, passing those segments to an LLM to generate image prompts, generating the assets via an image API, and splicing them into the video using FFmpeg. This connects your spoken words directly to matching visuals.

Here is how the automated workflow breaks down from raw video to a visually augmented final cut:

Stage	Action	Tools Used
1. Parse Transcript	Extract words and timestamp offsets from the audio	Whisper API / SRT Parser
2. Identify Beats	Find the moments that need visual explanation	LLM API
3. Generate Visuals	Turn the concept prompts into clean illustrations	Stable Diffusion / DALL-E
4. Splicing	Overlay the generated images onto the video	FFmpeg

By matching the transcript timestamps with our image generator, we can pinpoint exactly when a graphic needs to fade in and out without touching a timeline editor.
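
For the splicing stage, one way to do this is FFmpeg's `overlay` filter with its `enable='between(t,start,end)'` timeline option, which shows an image only during a given window. The sketch below builds the argument list; `OverlayCue` and `buildFfmpegArgs` are hypothetical names, and this version cuts images in and out rather than fading them, so treat it as a starting point rather than my exact pipeline.

```typescript
// Hypothetical description of one generated image and when it should be visible.
interface OverlayCue {
  imagePath: string;
  startSeconds: number;
  endSeconds: number;
}

// Builds ffmpeg arguments that overlay each image only during its time window.
function buildFfmpegArgs(
  videoPath: string,
  cues: OverlayCue[],
  outputPath: string
): string[] {
  const inputs: string[] = ["-i", videoPath];
  const steps: string[] = [];
  let previous = "[0:v]"; // start from the video stream of the first input
  cues.forEach((cue, index) => {
    inputs.push("-i", cue.imagePath);
    const label = `[v${index + 1}]`;
    // Chain overlays: each step draws image N on top of the previous result.
    steps.push(
      `${previous}[${index + 1}:v]overlay=0:0:` +
        `enable='between(t,${cue.startSeconds},${cue.endSeconds})'${label}`
    );
    previous = label;
  });
  if (steps.length === 0) {
    return [...inputs, "-c", "copy", outputPath]; // nothing to overlay
  }
  return [
    ...inputs,
    "-filter_complex", steps.join(";"),
    "-map", previous,  // final composited video
    "-map", "0:a?",    // original audio, if the input has any
    "-c:a", "copy",
    outputPath,
  ];
}
```

The returned array is meant to be passed to a process spawner as-is, without a shell, so the single quotes reach FFmpeg's filter parser untouched.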

Does automated B-roll actually improve technical comprehension?

Automation speeds up production, but the real test is whether the generated visuals actually make the concepts easier to understand. If the images match the architecture you are describing, they help; if they are too random, they just become distracting.

Imagine you are explaining how garbage collection works. A simple, timed graphic showing memory blocks being freed is miles ahead of watching a talking head.

I want to find the sweet spot between automation and clarity. That is why I need your feedback: do these images actually help you grasp the concepts, or are they just making things more confusing? Drop a comment on my latest videos and let me know your thoughts.

FAQ

What are the best tools for programmatically editing images into videos?

FFmpeg is the industry standard for rendering, scaling, and overlaying images onto video streams via the command line. If you prefer working in Python, libraries like MoviePy offer simple wrappers over FFmpeg to handle the overlays.

How do you sync generated images with specific spoken words?

By using speech-to-text engines like Whisper, you get timestamped JSON outputs for every word. The pipeline matches these timestamps with the generated image files to calculate the start frame and duration for the video overlay.
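
The matching step can be sketched as a small function that finds a phrase in the word list and converts its time span to frames. The `TimedWord` shape (a word with `start` and `end` in seconds) is my assumption about the transcript format, and `frameWindowForPhrase` is a hypothetical helper; adjust both to whatever your speech-to-text engine actually returns.

```typescript
// Assumed shape of one word-level transcript entry (times in seconds).
interface TimedWord {
  word: string;
  start: number;
  end: number;
}

interface FrameWindow {
  startFrame: number;
  durationFrames: number;
}

// Finds the first occurrence of a phrase and converts its span to frames.
function frameWindowForPhrase(
  words: TimedWord[],
  phrase: string[],
  fps: number
): FrameWindow | undefined {
  // Compare case-insensitively and ignore punctuation such as trailing commas.
  const normalize = (text: string): string =>
    text.toLowerCase().replace(/[^a-z0-9]/g, "");
  const target = phrase.map(normalize);
  if (target.length === 0) {
    return undefined;
  }
  for (let i = 0; i + target.length <= words.length; i++) {
    const matches = target.every((t, j) => normalize(words[i + j].word) === t);
    if (matches) {
      const startFrame = Math.round(words[i].start * fps);
      const endFrame = Math.round(words[i + target.length - 1].end * fps);
      return { startFrame, durationFrames: Math.max(1, endFrame - startFrame) };
    }
  }
  return undefined; // phrase not spoken in this transcript
}
```

In practice you would probably pad the window so the image lingers a little after the phrase ends, but that is a taste decision rather than part of the matching.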

How do you prevent AI image generators from outputting gibberish text in diagrams?

Tell the LLM in your prompt to generate abstract, non-textual diagrams and keep words out of the image entirely. Focus on shapes, flows, and clean icons rather than literal text labels.

How Quorum Prevents Split-Brain in Distributed Systems

Doogal Simpson — Sun, 19 Jul 2026 22:01:55 +0000

Quick Answer: A split-brain scenario occurs when a network partition isolates server nodes, leading multiple nodes to assume the "leader" role and accept conflicting writes. To prevent this data corruption, distributed systems enforce quorum: any write must be approved by a strict majority of all nodes in the cluster, so isolated minority nodes gracefully reject writes.

Imagine a cable gets cut between two servers. Both are still alive, but they can't talk to each other. What happens next? Left to their own devices, both servers assume the other has died, crown themselves the "master," and start accepting writes. This is the split-brain nightmare, and it's one of the fastest ways to corrupt your database.

What is a split-brain scenario in distributed systems?

A split-brain scenario occurs when a network partition cuts off communication between nodes, causing isolated groups to independently elect their own master nodes. Because multiple masters now process writes simultaneously without coordination, it leads to conflicting data updates and severe state corruption.

Let's look at a concrete example. Imagine we are building a distributed system to sell airline seats using two servers: Server A and Server B. Under normal conditions, they coordinate smoothly. But then, a network partition occurs.

Server A can no longer communicate with Server B. It assumes Server B has crashed, so Server A declares itself the sole leader to keep the service running. Across the network split, Server B goes through the exact same thought process, assuming Server A has died, and also promotes itself to master.

Now, you have two master nodes operating independently. If two different users concurrently try to book seat 12A—one hitting Server A and the other hitting Server B—both servers will accept the write. Once the network heals and the servers attempt to sync, you are left with a double-booked seat and corrupted, irreconcilable data.

How does quorum prevent split-brain data corruption?

Quorum prevents split-brain by requiring that any write operation or leader election must be approved by a strict majority of the total nodes in the cluster. If a network partition occurs, the sub-cluster containing the minority of nodes will fail to reach this threshold and gracefully reject write requests, preventing conflicting state updates.

Instead of relying on a fragile two-node setup, distributed systems use an odd number of nodes to establish a majority rule. For any cluster of N nodes, a quorum requires at least floor(N/2) + 1 active nodes to agree before any write is committed.

Let's scale our ticketing system up to 9 nodes. If a network partition occurs, splitting the cluster into one group of 5 nodes and another group of 4 nodes, the quorum math protects us:

The majority partition (5 nodes): It can still reach the required quorum of 5 nodes (since 5 out of 9 are present). This side continues to accept writes normally.
The minority partition (4 nodes): If a write request hits this side, the nodes try to coordinate but realize they can only get 4 votes. Since 4 is less than the required 5, they reject the write.
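The quorum math for the 9-node example can be sketched in a few lines. The function names here are made up for illustration, and a real system would count acknowledgements from actual nodes rather than take a number as input:

```typescript
// Smallest number of nodes that forms a strict majority of the cluster.
function quorumSize(totalNodes: number): number {
  return Math.floor(totalNodes / 2) + 1;
}

// A partition may accept writes only if it can reach a full quorum.
function canAcceptWrites(reachableNodes: number, totalNodes: number): boolean {
  return reachableNodes >= quorumSize(totalNodes);
}

const clusterSize = 9;

console.log(quorumSize(clusterSize));         // 5
console.log(canAcceptWrites(5, clusterSize)); // true  (majority partition)
console.log(canAcceptWrites(4, clusterSize)); // false (minority partition)
```

Note that the threshold is computed from the total cluster size, not from the number of nodes a partition can currently see. That is what stops both sides from declaring themselves the majority.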

The table below highlights how clusters behave during a partition depending on whether quorum is enforced:

Attribute	Without Quorum (Split-Brain)	With Quorum (Majority Rule)
Active Leaders	Multiple active leaders (one per isolated partition)	Exactly one leader (only in the majority partition)
Data Consistency	High risk of split-brain corruption and lost writes	Guaranteed consistency; conflicting writes are blocked
Minority Partition Behavior	Accepts uncoordinated writes, leading to state drift	Rejects writes or falls back to safe read-only mode
Recovery Process	Complex, manual, and painful database reconciliation	Automatic; minority nodes catch up once the network heals

Why is failing gracefully better than accepting every write?

Failing gracefully protects the integrity of your system's state. The CAP theorem says that during a partition a system has to give up either consistency or availability, and quorum-based systems choose consistency. Accepting unverified writes during a network split invites data corruption, which is significantly harder and more expensive to recover from than a temporary, localized outage.

As software engineers, we often obsess over 100% uptime. However, there is a massive difference between a system that is temporarily unavailable and a system that actively writes bad data.

If your system rejects a booking because it cannot reach a quorum, you might lose a transaction or frustrate a user in the short term. But if your system accepts the booking on both sides of a partition, you now have two customers showing up for the exact same physical seat. Reconciling corrupted data post-incident requires complex compensation logic, manual support intervention, and ruins user trust. Using quorum forces the minority partition to fail gracefully, keeping your data source of truth pristine.

Frequently Asked Questions

What happens if a cluster splits into two equal halves?

If a cluster splits into two equal halves (for example, a 4-node cluster splitting into groups of 2 and 2), neither side can achieve a strict majority of 3 nodes. Consequently, both sides fail to achieve quorum, and the entire cluster stops accepting writes until connectivity is restored. This is why distributed databases are almost always deployed with an odd number of nodes (3, 5, or 7): adding a fourth node to a three-node cluster raises the quorum from 2 to 3 without letting the cluster survive any additional failures.
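To see why even-sized clusters buy you nothing, here is a small sketch that compares the majority size with the number of node failures each cluster size can survive (the helper names are illustrative):

```typescript
// Strict majority for a cluster of n nodes.
const majority = (n: number): number => Math.floor(n / 2) + 1;

// Nodes that can be lost while a majority is still reachable.
const toleratedFailures = (n: number): number => n - majority(n);

for (const n of [3, 4, 5, 6, 7]) {
  console.log(`nodes=${n} majority=${majority(n)} tolerated=${toleratedFailures(n)}`);
}
// nodes=3 majority=2 tolerated=1
// nodes=4 majority=3 tolerated=1
// nodes=5 majority=3 tolerated=2
// nodes=6 majority=4 tolerated=2
// nodes=7 majority=4 tolerated=3
```

Each even size tolerates exactly as many failures as the odd size below it, so the extra node adds cost and one more thing that can break.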

How do consensus protocols like Raft and Paxos leverage quorum?

Consensus protocols like Raft and Paxos use quorum to ensure a single, consistent state across the cluster. During a leader election, a candidate node must win votes from a majority of the cluster to become the leader. Similarly, when writing data, the leader must replicate the log entry to a majority of nodes before committing the write, ensuring that a minority partition can never overwrite committed data.

Can a minority partition still serve read requests?

Yes, depending on how you configure your database's consistency model. Many distributed databases allow the minority partition to serve "stale" or eventually consistent reads, even if they reject writes. However, if your application requires strict linearizable reads (ensuring you always read the absolute latest write), even read requests must query a quorum of nodes, meaning the minority partition would have to reject reads as well.

Why You Should Never Use a Database as an API

Doogal Simpson — Sun, 19 Jul 2026 22:01:38 +0000

TL;DR: Sharing databases across team boundaries bypasses service interfaces, turning database schemas into public APIs. This tightly couples services, halts schema migrations, and stalls development. To maintain architectural agility, I always advise teams to hide their databases behind well-defined APIs, decoupling internal implementation from external integration.

I'm no big fan of Jeff Bezos, but his early Amazon API mandate was absolute gold. The story goes that he sent a company-wide email warning that if anyone used another team's database directly as an API, they would be fired. While threatening to sack your engineering team is a bit extreme, the technical wisdom behind it is something I advocate for constantly. I see teams take this shortcut all the time, only to watch it blow up in their faces later. Here is my breakdown of why we need to stop doing this and how we can build better boundaries.

Why is direct database sharing bad for microservices?

In my experience, sharing databases directly across services destroys boundaries because it turns your internal storage schema into your public interface. When external teams query your database directly, you lose the ability to rename columns, change data types, or refactor your schema without breaking their systems.

I like to think of this using a physical analogy. Imagine you are setting up a smart home. If your smart lights plug into standard wall outlets, they work perfectly because they rely on a stable, standardized interface. But if those lights required you to splice raw copper wires directly into your home's main electrical panel, you could never upgrade your wiring or swap a circuit breaker without risking a blackout.

When Team B queries Team A's database directly, they bypass the API layer. The database schema—the raw copper wire—becomes the interface. If Team A wants to rename a column from user_id to account_id or migrate from PostgreSQL to MongoDB, they are stuck. They cannot make changes because they no longer own their data model.

How do API interfaces solve the database coupling problem?

From what I've seen, APIs solve the coupling problem by introducing a layer of abstraction between how data is stored and how it is consumed. By exposing data through HTTP endpoints, gRPC, or event streams, the owning team can freely modify their database schema as long as the API payload remains contract-compliant.

Let's say you have a billing service that needs data from a user profile service. Instead of writing a direct SQL query against the users table, the billing service should call GET /users/{id}.

Behind that HTTP endpoint, the profile service can restructure its tables, split a single table into three normalized tables, or even change its caching strategy. As long as the JSON response format does not break, the billing service does not know and does not care. This separation of concerns keeps teams operating independently and moving fast.
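Here is a minimal sketch of that boundary, assuming the profile service has already renamed user_id to account_id internally. The row and response shapes are hypothetical; the point is that only the mapping function changes when the schema does:

```typescript
// Hypothetical internal row shape, private to the profile service.
// The column was renamed from user_id to account_id.
interface UserRow {
  account_id: string;
  full_name: string;
}

// Public contract returned by GET /users/{id}. This is what consumers depend on.
interface UserResponse {
  id: string;
  name: string;
}

// The only place that knows about both shapes.
function toUserResponse(row: UserRow): UserResponse {
  return { id: row.account_id, name: row.full_name };
}

const responseBody = JSON.stringify(
  toUserResponse({ account_id: '42', full_name: 'Ada Lovelace' })
);
console.log(responseBody); // {"id":"42","name":"Ada Lovelace"}
```

The billing service keeps reading id and name, no matter what the columns are called this week.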

What are the trade-offs of direct database access versus API integration?

I always warn developers that while direct database access offers lower initial latency and faster setup, it leads to tight coupling and unmaintainable technical debt. API integration requires upfront design and introduces minor network overhead but ensures team autonomy, independent scaling, and safe schema evolution.

Metric / Feature	Direct Database Access	Service Interface (API)
Coupling Level	Extremely Tight (Schema is public)	Loose (Schema is private)
Development Speed	Fast initially, degrades rapidly over time	Consistent, predictable long-term velocity
Schema Migrations	High-risk (will break external consumers)	Safe (hidden behind abstraction layer)
Security & Auditing	Difficult (requires database-level permissions)	Simple (handled at the application layer)
Technology Choice	Locked into a single database technology	Free to choose different DBs per service

How do we transition away from shared databases?

To transition away from a shared database model, I recommend establishing clear data ownership and creating wrappers around legacy data paths. Teams must stop writing cross-schema joins, build lightweight API wrappers over the data, and migrate consumers to the new endpoints.

If you are currently stuck in a "shared database culture," I do not expect you to rewrite everything overnight. Start by treating the legacy tables as deprecated contracts. Write a thin service layer that queries those tables and exposes the data via standard HTTP endpoints. Have other teams transition their read queries to these new endpoints one by one. Once the direct database queries drop to zero, you finally own your schema again and can begin refactoring with confidence.

FAQ

Is sharing a database schema ever acceptable?

I only allow it within the boundary of a single, deployable microservice owned by one team. If multiple teams or services deploy independently but touch the same database schema, you have a distributed monolith, not microservices.

Does this rule apply to read-only database replicas?

Absolutely. Even if another team is only reading from a read replica of your database to avoid performance degradation, they are still coupled to your underlying schema. If you rename a column on the primary database, the replica schema changes and their queries will fail.

How do we handle high-performance reporting without direct database access?

Instead of querying another team's database directly for reporting, I recommend using asynchronous event streaming (like Kafka or RabbitMQ) to replicate the necessary data to a dedicated reporting database, or utilizing an ETL pipeline to move data to a central data warehouse.

Should You Use Custom ID Types or Plain Strings?

Doogal Simpson — Sun, 19 Jul 2026 22:00:26 +0000

Should you use a custom ID type or just stick to strings? Sticking to raw strings is simpler and faster to write, but wrapping your IDs in a custom domain type prevents developer mistakes—like running substring or split operations on a UUID. The right choice depends on your team's preference for speed versus absolute correctness.

I have lost count of the times I have seen developers treat IDs as generic strings, only for it to backfire. Imagine debugging a production crash only to find someone ran a .split('-') on a database UUID to "extract a prefix" for a lookup. It sounds absurd, but when IDs are passed around as raw strings, they inherit all of string's behaviors. I have noticed that developers have incredibly strong opinions on this, so I want to talk about the age-old debate: primitive strings versus custom ID types.

Why is using primitive strings for database IDs considered bad practice?

I find that using raw strings for database IDs is risky because it exposes a massive API surface of string manipulation methods that make no sense for an identifier. It allows developers to treat a unique domain identifier as arbitrary text, which often leads to silent bugs and loose domain boundaries.

In my view, an ID exists for one primary reason: equality comparison. You want to know if RecordA.id === RecordB.id. You never need to pad an ID, slice it, or regex-match it in normal business logic.

Think of an ID like a physical key to a house. You do not slice a physical key in half to see if it fits a smaller lock; you either use it to open the door, or you do not. When we type an ID as a string, I feel like we are giving developers a Swiss Army knife when they only needed a key.

How do you implement a custom ID type in code?

When I implement a custom ID type, I wrap the primitive value inside a dedicated class or value object. This encapsulates the internal representation and restricts the available operations solely to creation and equality checks.

Here is a lightweight example of how I like to wrap a primitive string ID in TypeScript:

class UserId {
  constructor(private readonly value: string) {}
  equals(other: UserId): boolean {
    return this.value === other.value;
  }
  toString(): string {
    return this.value;
  }
}

What are the trade-offs of custom ID types vs primitive strings?

From what I have seen, this trade-off is a classic battle between cognitive overhead and type safety. While custom ID types prevent illegal operations and accidental type mixing, they require extra boilerplate, serialization handling, and onboarding effort for new team members.

Here is how I break down the architectural and developer experience impacts of both approaches:

Raw Strings (The Fast Path)
- Pros: Zero boilerplate; works out of the box with JSON serialization and ORMs; lower cognitive load.
- Cons: Allows nonsensical operations (like id.split()); makes it easy to accidentally pass a ProductId into a function expecting a CustomerId because both are just strings.
Custom ID Types (The Safe Path)
- Pros: Prevents domain errors at compile time; restricts API surface to equality checks only; highly self-documenting.
- Cons: Requires custom serializers for database and API boundaries; adds slight friction when onboarding new developers.

How should a team decide between custom IDs and strings?

I always advise teams to choose based on their operational culture and domain complexity. Fast-moving startups or prototyping teams usually prefer the speed of raw strings, while teams building long-lived, high-stakes systems benefit heavily from the strict correctness of custom types.

Ultimately, I do not think either approach is objectively wrong. If your team culture values shipping fast and trusts team discipline to keep things clean, strings are fine. But if you are building a system where a mixed-up ID could mean transferring money to the wrong account, I believe that compile-time safety is worth every line of boilerplate.

FAQ

Does using a custom ID type hurt database query performance?

No, it does not. Custom ID types only exist at the application layer for compile-time or runtime safety. Once the data is compiled down or serialized, it is sent to the database as a standard primitive (like a string or binary UUID), leaving query performance completely unaffected.

How do you handle JSON serialization with custom ID types?

Most modern languages and frameworks allow you to write custom serialization adapters. For instance, I configure my JSON parser to automatically serialize the custom type into its raw string value when sending payloads over the network, and deserialize it back into the class upon receipt.
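In TypeScript, one low-ceremony way to sketch this is to lean on the fact that JSON.stringify calls an object's toJSON method if it has one. The UserPayload shape and parseUserPayload helper below are hypothetical, and the parser trusts its input, which a real boundary should not:

```typescript
class UserId {
  constructor(private readonly value: string) {}
  equals(other: UserId): boolean {
    return this.value === other.value;
  }
  toString(): string {
    return this.value;
  }
  // JSON.stringify calls toJSON automatically, so the ID goes out as a plain string.
  toJSON(): string {
    return this.value;
  }
}

// Hypothetical payload shape used only for this example.
interface UserPayload {
  id: UserId;
  name: string;
}

// Rebuild the custom type at the boundary when reading JSON back in.
function parseUserPayload(json: string): UserPayload {
  const raw = JSON.parse(json) as { id: string; name: string };
  return { id: new UserId(raw.id), name: raw.name };
}

const wire = JSON.stringify({ id: new UserId('u-123'), name: 'Ada' });
console.log(wire); // {"id":"u-123","name":"Ada"}

const parsedPayload = parseUserPayload(wire);
console.log(parsedPayload.id.equals(new UserId('u-123'))); // true
```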

Can TypeScript's "branded types" pattern solve this without classes?

Yes. Branding is a community pattern rather than a built-in language feature: you intersect string with a phantom marker property to emulate nominal typing, which differentiates IDs at compile time without creating runtime classes. This gives you type safety that stops you from passing a CustomerId to a ProductId parameter, while keeping the runtime value a plain string.
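A minimal sketch of the branding pattern looks like this. The Brand helper, the __brand property name, and the factory functions are conventions chosen for this example, not anything TypeScript ships with:

```typescript
// Intersect the base type with a phantom property that only exists at compile time.
type Brand<T, B extends string> = T & { readonly __brand: B };

type CustomerId = Brand<string, 'CustomerId'>;
type ProductId = Brand<string, 'ProductId'>;

// The only places where a raw string is allowed to become an ID.
const asCustomerId = (raw: string): CustomerId => raw as CustomerId;
const asProductId = (raw: string): ProductId => raw as ProductId;

function loadCustomer(id: CustomerId): string {
  return `customer:${id}`;
}

const customerId = asCustomerId('c-1');
const productId = asProductId('p-1');

console.log(loadCustomer(customerId)); // customer:c-1
console.log(typeof productId);         // string

// loadCustomer(productId); // compile-time error: ProductId is not assignable to CustomerId
```

One caveat: a branded ID is still a string to the type checker, so methods like split remain callable. Branding stops you mixing up IDs, but it does not shrink the API surface the way a wrapper class does.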

When to Use an ORM vs SQL and Query Builders

Doogal Simpson — Sun, 19 Jul 2026 22:00:09 +0000

ORMs like Prisma and Hibernate are excellent for simplifying basic CRUD operations. However, for complex queries involving window functions, CTEs, or deep joins, ORMs often get in the way. For advanced use cases, learning SQL first and using a modern query builder like Drizzle offers the perfect balance of static typing and control.

We have all been there. You start a new project, pull in a heavy Object-Relational Mapper (ORM) to make database interactions "easier," and everything goes smoothly while you are just writing basic endpoints. But the moment you need to fetch nested data with a complex join, a window function, or a common table expression (CTE), that helpful abstraction suddenly feels like a brick wall. You end up spending more time fighting the ORM's custom syntax than actually writing the query.

Here is how I approach the decision of when to use an ORM, when to drop down to a query builder, and why learning SQL first is non-negotiable.

When is an ORM actually useful?

ORMs are highly effective when your application primarily performs standard Create, Read, Update, and Delete (CRUD) operations. They eliminate repetitive boilerplate code by letting you interact with database rows as native programming language objects.

If you are building an application where you just need to get a record by ID, update a single column, or fetch list data based on simple filters, an ORM is a massive time-saver. It handles the mapping automatically, keeps your database schema in sync with your application models, and lets you move incredibly fast during the early stages of development.

Why do ORMs fail on complex queries?

ORMs struggle with complex queries because their object-oriented abstractions do not map cleanly to set-based SQL features like window functions or common table expressions (CTEs). Trying to force these operations through an ORM wrapper often leads to unoptimized database calls or incredibly convoluted code.

Imagine you are building an analytics dashboard that needs to calculate a running total of transactions grouped by category. In SQL, this is a straightforward window function. In a traditional ORM, you either have to write a highly complex, unreadable method chain, or worse, retrieve thousands of rows into your application memory to calculate the sum in code. When the abstraction gets in the way of writing clean, performant queries, it ceases to be useful.
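For reference, the running total described above can be written as a single window function. The sketch below just holds the SQL in a TypeScript constant; the transactions table and its category, created_at, and amount columns are assumed names for illustration:

```typescript
// Hypothetical schema: transactions(category, created_at, amount).
// SUM(...) OVER (...) keeps every row and adds a per-category running total.
const runningTotalByCategorySql = `
  SELECT
    category,
    created_at,
    amount,
    SUM(amount) OVER (
      PARTITION BY category
      ORDER BY created_at
    ) AS running_total
  FROM transactions
  ORDER BY category, created_at;
`;

console.log(runningTotalByCategorySql.trim());
```

The database does the accumulation in one pass, so the application never has to pull thousands of rows into memory just to add them up.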

Comparing Database Access Methods

Database Task	ORMs (e.g., Prisma / Hibernate)	Query Builders (e.g., Drizzle)	Raw SQL
Basic CRUD	Excellent (Fastest setup)	Good (Type-safe but manual)	Tedious (High boilerplate)
Complex Joins & CTEs	Poor (Clunky API)	Excellent (Matches SQL layout)	Perfect (Full control)
Performance Overhead	High (Heavy abstraction)	Minimal (Thin wrapper)	Zero (Direct execution)

Should you learn SQL before learning an ORM?

Yes, you should absolutely master raw SQL before relying on an ORM. Understanding how relational databases handle queries under the hood gives you the mental model needed to write performant database interactions and debug ORM-generated queries when they fail.

If you are new to software engineering, learning how an ORM works before understanding SQL is a trap. ORM libraries and frameworks go out of style, but SQL is an industry standard that has persisted for decades. When you understand SQL, you have total control; you know exactly what your database is doing, and you can easily evaluate whether an ORM is helping or hurting your application's performance.

What is the best alternative to a heavy ORM?

A lightweight query builder is the ideal middle ground between raw SQL and a heavy ORM. Tools like Drizzle in the Node.js ecosystem allow you to write type-safe queries that closely resemble real SQL, giving you full control without the overhead of an object-relational mapping layer.

Instead of writing a raw string of SQL that your editor cannot validate, query builders allow you to use static typing to catch errors at compile time:

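import { desc, eq } from 'drizzle-orm';

// Fetch one user's transactions, newest first.
// `db`, `transactionsTable`, and `userId` are assumed to be defined elsewhere.
const transactions = await db
  .select({
    id: transactionsTable.id,
    amount: transactionsTable.amount,
  })
  .from(transactionsTable)
  .where(eq(transactionsTable.userId, userId))
  .orderBy(desc(transactionsTable.createdAt));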

This approach gives you the predictability of SQL with the developer experience and safety of modern TypeScript.

FAQ

Can you mix raw SQL with an ORM?

Yes. Most modern ORMs provide an "escape hatch" that allows you to write raw SQL queries for complex operations while still using the ORM for standard CRUD tasks.

Do ORMs hurt database performance?

They can. ORMs often produce inefficient query patterns (such as N+1 queries) or fetch more columns than necessary because they default to selecting entire rows rather than specific fields.

What is the difference between an ORM and a query builder?

An ORM maps database tables directly to application objects and manages relations automatically, whereas a query builder simply provides a programmatic, type-safe way to construct raw SQL queries.

Solving the N+1 Query Problem in Databases and ORMs

Doogal Simpson — Sun, 19 Jul 2026 21:59:52 +0000

TL;DR: The N+1 query problem occurs when an ORM executes one initial database query followed by N subsequent queries to fetch related child data. I solve this by using SQL joins, batch fetching (IN queries), or data denormalization. This guide shows you how to implement each fix.

Let’s be honest: ORMs are a double-edged sword. Tools like Hibernate, Prisma, or TypeORM sucker you in with incredibly slick, beautiful interfaces. They make database interactions feel like pure poetry—until they turn around and beat you down with nasty performance bugs.

I see this happen all the time. You build a clean-looking feature, push it to production, and suddenly your database performance falls off a cliff. Nine times out of ten, you’ve quietly introduced the classic N+1 query problem without even realizing it.

What is the N+1 query problem?

To put it simply, the N+1 query problem is what happens when your application makes one query to fetch parent records, and then makes N individual queries to fetch related child data. Instead of hitting the database once, you end up hitting it N+1 times. If you have 100 records, you are making 101 queries, which will absolutely melt your database under load.

Let's look at a hypothetical scenario. Imagine your team is building a social media feed. You want to display a list of posts, and each post needs to show the author’s username.

Under the hood, your ORM might make one query to grab the posts:

SELECT * FROM posts LIMIT 100;

Then, to display the usernames, it loops through those posts and fires off a separate query for every single post:

SELECT username FROM users WHERE id = [post.userId];

Suddenly, what should have been a single fast database roundtrip turns into a barrage of 101 queries. I promise you, doing this repeatedly when users load their feeds is a fast track to a production outage.
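You can reproduce the pattern without a real database. The sketch below uses a hypothetical in-memory FakeDb that counts every call as one roundtrip; it stands in for whatever your ORM does when it lazily loads a relation inside a loop:

```typescript
interface Post { id: number; userId: number; }
interface User { id: number; username: string; }

// Hypothetical in-memory stand-in for a database; each method call is one "query".
class FakeDb {
  queryCount = 0;
  constructor(private posts: Post[], private users: User[]) {}

  findPosts(limit: number): Post[] {
    this.queryCount++;
    return this.posts.slice(0, limit);
  }

  findUserById(id: number): User | undefined {
    this.queryCount++;
    return this.users.find((user) => user.id === id);
  }
}

const users: User[] = Array.from({ length: 100 }, (_, i) => ({ id: i + 1, username: `user${i + 1}` }));
const posts: Post[] = Array.from({ length: 100 }, (_, i) => ({ id: i + 1, userId: i + 1 }));
const db = new FakeDb(posts, users);

// One query for the posts, then one more query per post for its author.
const feed = db.findPosts(100).map((post) => ({
  postId: post.id,
  username: db.findUserById(post.userId)?.username,
}));

console.log(feed.length);   // 100
console.log(db.queryCount); // 101
```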

How do you identify N+1 queries in your code?

I find the easiest way to catch N+1 queries is by enabling SQL logging in development or using APM tools to monitor your database traffic. If you see a massive block of near-identical SELECT statements executing in rapid succession, you have an N+1 bug. You can also spot them by keeping a close eye on your ORM’s relationship loading behavior.

If you do not have APM tools set up, just watch your console output in your local dev environment. When a single page load triggers a continuous scroll of database logs that look identical except for the primary key, I guarantee you have a performance problem on your hands.

What are the best ways to solve the N+1 query problem?

I resolve N+1 queries using one of three main strategies: database JOINs, batch queries with an IN clause, or data denormalization. The right choice depends on your read-to-write ratios and how complex your relationships are. Here is how I break down these three approaches:

Solution	Database Roundtrips	Complexity	When I Use It
SQL JOIN	1	Low	Standard relational databases where tables are indexed.
Batching (IN clause)	2	Medium	Large datasets where large JOINs become too expensive.
Denormalization	1	High	Scale-heavy, read-intensive feeds where performance is everything.

How does a SQL JOIN solve N+1?

A SQL JOIN combines both tables into a single query, which allows the database engine to correlate and return all your data in one roundtrip. Instead of retrieving raw posts and then looking up users, you return an "enhanced" post that already contains the username details. This is usually my go-to fix because it leverages what SQL databases do best.

Most modern ORMs let you trigger this behavior with eager-loading options. You tell the ORM to explicitly include the relation, and it fetches the related rows up front, typically with a join, although some ORMs issue a batched follow-up query instead.

How does batching with an IN clause fix the issue?

Batch fetching solves the problem in exactly two database queries by utilizing a SQL IN clause to retrieve child data in bulk. First, the application fetches the parent records, extracts the foreign keys, and then runs a single query to get all related rows at once. This keeps the total database roundtrips to two, regardless of how many records you have.

In our feed scenario, this means you run one query for the 100 posts, extract all the userId values, and run a single follow-up: SELECT * FROM users WHERE id IN (1, 2, 3...). It is incredibly clean and prevents massive, slow joins on huge datasets.
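Here is the same feed built with batching, again against a hypothetical in-memory database that counts each call as one roundtrip. The findUsersByIds method plays the role of the WHERE id IN (...) query:

```typescript
interface Post { id: number; userId: number; }
interface User { id: number; username: string; }

// Hypothetical in-memory stand-in for a database; each method call is one "query".
class CountingDb {
  queryCount = 0;
  constructor(private posts: Post[], private users: User[]) {}

  findPosts(limit: number): Post[] {
    this.queryCount++;
    return this.posts.slice(0, limit);
  }

  // Equivalent of: SELECT * FROM users WHERE id IN (...)
  findUsersByIds(ids: number[]): User[] {
    this.queryCount++;
    const wanted = new Set(ids);
    return this.users.filter((user) => wanted.has(user.id));
  }
}

const allUsers: User[] = Array.from({ length: 100 }, (_, i) => ({ id: i + 1, username: `user${i + 1}` }));
const allPosts: Post[] = Array.from({ length: 100 }, (_, i) => ({ id: i + 1, userId: i + 1 }));
const countingDb = new CountingDb(allPosts, allUsers);

// Query 1: the posts.
const feedPosts = countingDb.findPosts(100);

// Query 2: every distinct author in one go.
const authorIds = Array.from(new Set(feedPosts.map((post) => post.userId)));
const authorsById = new Map(
  countingDb.findUsersByIds(authorIds).map((user): [number, User] => [user.id, user])
);

// Stitch the two result sets together in memory.
const batchedFeed = feedPosts.map((post) => ({
  postId: post.id,
  username: authorsById.get(post.userId)?.username,
}));

console.log(countingDb.queryCount); // 2
```

Deduplicating the IDs before the second query keeps the IN list short when one author wrote many of the posts.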

When should you resort to data denormalization?

Data denormalization is the right choice when you are working with non-relational structures or read-heavy systems where you simply cannot afford joins. You solve N+1 by duplicating data at write time, for example by storing the username directly inside the posts table, so everything you need is in one place. Just remember that you take on the persistent headache of keeping that duplicated data in sync when a user changes their username.

If a user updates their name, you have to run a background job to cascade that update to every single post they've ever written. It's a trade-off: you get lightning-fast reads, but your write logic gets significantly more complex.

FAQ

Why do ORMs default to lazy loading if it is so dangerous?

Lazy loading is the default because it is convenient and saves memory by only fetching what you ask for. But if you ask me, it is a total trap; it assumes developers will manually specify eager loading whenever they handle collections, which we almost always forget to do.

Does eager loading completely eliminate the N+1 query problem?

For the relations you eager-load, yes: the ORM fetches the related data upfront, usually through a JOIN or an IN query, instead of issuing one query per record. However, you have to be careful not to eager-load every single relation on a model, or you will end up pulling half your database into application memory.

How do you handle denormalized data sync issues?

If you go the denormalization route, you have to accept the trade-off of maintaining that data. I usually handle this by triggering asynchronous background workers or event listeners to update the duplicated username across all post records whenever a user changes their name.

How Collaborative Docs Work: An Introduction to CRDTs

Doogal Simpson — Sun, 19 Jul 2026 21:59:35 +0000

TL;DR: Collaborative documents avoid synchronization conflicts by using Conflict-free Replicated Data Types (CRDTs) instead of index-based positioning. By assigning every character a unique, immutable identifier and referencing edits to those IDs rather than numeric array indexes, concurrent changes merge predictably regardless of the order they arrive at the server.

If you have ever worked on a shared document in Google Docs or Notion, you have probably taken real-time collaboration for granted. But under the hood, syncing text across multiple distributed clients is a notoriously difficult problem. I want to dive into why naive solutions break, and explain how I think about resolving this conflict using an elegant data structure called a CRDT.

Why does index-based positioning fail in collaborative editing?

Index-based positioning fails because concurrent edits shift the character offsets of the document in real time. When multiple users send edits based on their local view of the document, the absolute indexes become desynchronized, leading to corrupt text when those edits are merged on other clients.

Imagine you are building a collaborative text editor. The document currently contains the word cat. Two users try to edit it simultaneously:

User 1 wants to change it to chat by adding an h at index 1.
User 2 wants to change it to cats by adding an s at index 3.

If you naively apply these edits using array indexes, the final state depends entirely on which edit runs first.

If User 1's edit runs first, the word becomes chat. When you then apply User 2's edit (insert s at index 3), you end up with chast—which is completely wrong. Conversely, if User 2's edit runs first, you get cats, and applying User 1's edit results in chats.

Because the order of network packets dictates the final text, clients will end up with mismatched documents.
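The divergence is easy to reproduce with a naive index-based insert. This is a small sketch of the cat example, with insertAt as an illustrative helper:

```typescript
// Naive index-based insert: everything at or after `index` shifts right.
function insertAt(text: string, index: number, char: string): string {
  return text.slice(0, index) + char + text.slice(index);
}

const original = 'cat';

// User 1's edit (h at index 1) lands first, then User 2's (s at index 3).
const user1First = insertAt(insertAt(original, 1, 'h'), 3, 's');

// User 2's edit lands first, then User 1's.
const user2First = insertAt(insertAt(original, 3, 's'), 1, 'h');

console.log(user1First); // chast
console.log(user2First); // chats
```

Same two edits, two different documents, purely because of arrival order.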

How do CRDTs solve the collaborative text sync problem?

Conflict-free Replicated Data Types (CRDTs) solve this by assigning a globally unique, immutable identifier to every single character in a document. Edits are then declared relative to these static identifiers rather than volatile numeric indexes, ensuring that operations can be applied in any order with the same final state.

To solve this, I prefer to model the document not as a raw array of characters, but as a collection of metadata nodes. Every character node is assigned a unique ID consisting of the user's ID and a local counter.

For example, I would structure the word cat like this:

c -> ID: user0v1
a -> ID: user0v2
t -> ID: user0v3

When User 1 wants to insert h, they do not say "insert at index 1." Instead, they say "insert h (ID: user1v1) directly after user0v1." At the same time, User 2 inserts s (ID: user2v1) directly after user0v3.

Here is how I represent that metadata under the hood for a single edit:

{
  "id": "user1v1",
  "char": "h",
  "after": "user0v1"
}

This representation maps out how every character relates to its neighbors:

Character	Unique ID	Inserted After ID	Final Position
c	`user0v1`	Start	1st
h	`user1v1`	`user0v1`	2nd
a	`user0v2`	`user0v1`	3rd
t	`user0v3`	`user0v2`	4th
s	`user2v1`	`user0v3`	5th

Why is the order of applied edits irrelevant in a CRDT?

CRDTs are mathematically designed to be commutative and associative, meaning the order in which network packets arrive does not affect the final computed state. Because each insertion or deletion operation refers to static character IDs, the resulting document structure converges to the exact same state on all clients.

No matter what order the network delivers the packets, the relationship remains the same. Whether the local client processes the h insertion or the s insertion first, h will always sit after c (user0v1), and s will always sit after t (user0v3). I find this mathematical certainty incredibly satisfying because it guarantees that every client eventually converges on the exact same word: chats.
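Here is a deliberately simplified sketch of that idea, using the same IDs as the example above. It assumes every edit references a character that already exists locally, and it does not handle two users inserting after the same character at the same time, which production CRDTs resolve with a deterministic tie-break on the IDs. The applyInsert and render helpers are illustrative names:

```typescript
interface CharNode {
  id: string;
  char: string;
  after: string | null; // ID of the character this one follows; null means start
}

// Place the new character directly after the node it references.
function applyInsert(doc: CharNode[], op: CharNode): CharNode[] {
  if (doc.some((node) => node.id === op.id)) return doc; // already applied
  const refIndex = op.after === null ? -1 : doc.findIndex((node) => node.id === op.after);
  if (op.after !== null && refIndex === -1) {
    throw new Error(`unknown reference ${op.after}`);
  }
  const next = [...doc];
  next.splice(refIndex + 1, 0, op);
  return next;
}

const render = (doc: CharNode[]): string => doc.map((node) => node.char).join('');

const baseDoc: CharNode[] = [
  { id: 'user0v1', char: 'c', after: null },
  { id: 'user0v2', char: 'a', after: 'user0v1' },
  { id: 'user0v3', char: 't', after: 'user0v2' },
];

const insertH: CharNode = { id: 'user1v1', char: 'h', after: 'user0v1' };
const insertS: CharNode = { id: 'user2v1', char: 's', after: 'user0v3' };

const hThenS = render(applyInsert(applyInsert(baseDoc, insertH), insertS));
const sThenH = render(applyInsert(applyInsert(baseDoc, insertS), insertH));

console.log(hThenS); // chats
console.log(sThenH); // chats
```

Because each edit names its neighbor by ID, neither insert disturbs the position the other one is aiming for.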

FAQ

How do CRDTs handle character deletions?

To delete a character, CRDTs typically use "tombstones." Instead of completely removing the node from the tree (which would break other pending edits referencing its ID), the node is marked as invisible. This preserves the document's relational structure while hiding the character from the user.
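A tombstone can be sketched as nothing more than a deleted flag on each node. The helper names below are illustrative, and the example assumes edits arrive after the characters they reference:

```typescript
interface TextNode {
  id: string;
  char: string;
  deleted: boolean; // tombstone flag: the node stays in place but is hidden
}

// Mark the node as deleted instead of removing it from the list.
function markDeleted(doc: TextNode[], id: string): TextNode[] {
  return doc.map((node) => (node.id === id ? { ...node, deleted: true } : node));
}

// Insert directly after the referenced node, even if that node is a tombstone.
function insertAfter(doc: TextNode[], refId: string, node: TextNode): TextNode[] {
  const refIndex = doc.findIndex((existing) => existing.id === refId);
  if (refIndex === -1) throw new Error(`unknown reference ${refId}`);
  const next = [...doc];
  next.splice(refIndex + 1, 0, node);
  return next;
}

// Only live characters are shown to the user.
const renderVisible = (doc: TextNode[]): string =>
  doc.filter((node) => !node.deleted).map((node) => node.char).join('');

const catDoc: TextNode[] = [
  { id: 'user0v1', char: 'c', deleted: false },
  { id: 'user0v2', char: 'a', deleted: false },
  { id: 'user0v3', char: 't', deleted: false },
];

// One user deletes "a" while another inserts "u" after that same "a".
const afterDelete = markDeleted(catDoc, 'user0v2');
const afterInsert = insertAfter(afterDelete, 'user0v2', { id: 'user1v1', char: 'u', deleted: false });

console.log(renderVisible(afterDelete)); // ct
console.log(renderVisible(afterInsert)); // cut
```

If the deleted node had been physically removed, the second edit would have had nothing to anchor to.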

What is the difference between Operational Transformation (OT) and CRDTs?

Operational Transformation (OT) relies on a central server to rewrite the indexes of incoming edits before broadcasting them to other clients. CRDTs are peer-to-peer friendly; they do not require a central coordinating server to resolve conflicts because the data structures resolve themselves mathematically.

Do CRDTs make document file sizes too bloated?

Yes, CRDTs introduce metadata overhead because every single character requires an ID and positioning pointers. However, modern CRDT implementations use optimizations like run-length encoding and state-tree pruning to keep that overhead manageable.