<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mathew Dostal</title>
    <description>The latest articles on DEV Community by Mathew Dostal (@mdostal).</description>
    <link>https://dev.to/mdostal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3776309%2F2e6047ac-45ab-41a2-a86f-7014d71d3c86.jpg</url>
      <title>DEV Community: Mathew Dostal</title>
      <link>https://dev.to/mdostal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mdostal"/>
    <language>en</language>
    <item>
      <title>Why You Probably Don't Need a Full-Time CTO</title>
      <dc:creator>Mathew Dostal</dc:creator>
      <pubDate>Tue, 31 Mar 2026 15:22:25 +0000</pubDate>
      <link>https://dev.to/mdostal/why-you-probably-dont-need-a-full-time-cto-4k5l</link>
      <guid>https://dev.to/mdostal/why-you-probably-dont-need-a-full-time-cto-4k5l</guid>
      <description>&lt;p&gt;The work I do now as a fractional CTO isn't new. But it's different from consulting in ways that matter.&lt;br&gt;
I spent years at EY and Zilker Technology leading technical engagements — architecture direction, team building, platform migrations. Those were full engagements. Temporary, yes, but often 40+ hours a week. I'd make the architecture decisions, but I'd also be neck-deep in backlog grooming, PR reviews, data entry, and playing three roles because the engagement scoped it that way. The high-value work — the decisions, the direction, the "here's how to architect this and why" — would take maybe 10-15 hours a week. The rest was overhead that cost the client more money but wasn't the core focus.&lt;br&gt;
I also did work that looked a lot closer to what fractional actually is: technical audits where I'd assess a system in a week and hand back a roadmap. Training drops where I'd come in, walk a team through how to move from Angular 1 to Angular 2, discuss the new paradigm, and leave them to execute. Architecture reviews where the deliverable was a set of diagrams and a handoff, not a six-month residency.&lt;br&gt;
The difference with fractional is that you strip away the noise. No drowning in follow-up. No grooming the backlog or babysitting PRs. It's the high-level direction, the diagrams, the team handoff, the "I've got an hour during lunch — show me what's not working, I'll build you a quick prototype, and we'll walk through it." The decisions, not the data entry. And the &lt;a href="https://fractionus.com/blog/10-statistics-fractional-work-future" rel="noopener noreferrer"&gt;market agrees&lt;/a&gt; — demand for fractional C-suite roles grew 68% from 2023 to 2024, with the market now topping &lt;a href="https://fractionus.com/blog/fractional-work-statistics-2025-income-market-data" rel="noopener noreferrer"&gt;$5.7 billion globally&lt;/a&gt;.&lt;br&gt;
What changed for me is that I started doing it independently. And the first thing I learned is that the companies who need this the most aren't always the ones you'd expect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ki1voi35x9m0zxevaos.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ki1voi35x9m0zxevaos.jpg" alt="Consulting engagement vs fractional CTO - full-time overhead vs focused strategic decisions" width="800" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  It's Not Just Startups
&lt;/h2&gt;

&lt;p&gt;The default narrative is that fractional CTOs exist for pre-seed startups that can't afford a full-time hire. That's part of it. But the pattern is broader.&lt;br&gt;
At Hertz, DaVita, Kohl's, and Chick-fil-A, the highest-value parts of my engagements were always the same: come in, assess, make the critical decisions, build the roadmap, coach the team on how to execute it. The companies had hundreds of engineers. They didn't need another full-time executive. They needed someone who'd solved their specific problem before and could compress weeks of evaluation into days. Sometimes that turned into a full engagement where I built the team and led the delivery. But the direction-setting — the fractional part — was always the catalyst.&lt;br&gt;
On the other end, I'm working with &lt;a href="https://allthattechnology.com" rel="noopener noreferrer"&gt;All That Technology&lt;/a&gt;, a home tech services company in the Dallas-Fort Worth metro. A small team — Mike runs the business with a few techs, building something real from a weekend side hustle. He's competing against established players — companies with 100+ Google reviews, 40+ years in business, Yelp presence, the works. He didn't need a CTO. He needed someone who could rebuild his &lt;a href="https://shopify.com" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt; storefront into a service-first conversion engine, set up &lt;a href="https://posthog.com" rel="noopener noreferrer"&gt;PostHog&lt;/a&gt; analytics to separate bot traffic from real DFW customers, identify $53-119/month in wasted app spend, deliver 15+ strategy documents covering everything from an 8-week growth roadmap to pricing architecture to a 63-piece content calendar, build a growth strategy across Angie's List, Nextdoor, and BBB, and build him SOPs for what to sell and how — discount structures that double as marketing (10% off for a Google review, another 10% for a social media share), subscription models for recurring revenue, and upsell frameworks so every service call becomes an opportunity to grow the relationship.&lt;br&gt;
In 90 days, that business went from $81/month in revenue to $3,584/month. Average ticket size grew 2.3x. Orders started coming in from surrounding cities — customers finding him organically from towns he hadn't specifically targeted yet. That growth started with zero ad spend — pure organic and referral. Google Shopping free listings alone are driving 45% of his traffic at $0. The repeat customer base is building — some customers coming back three, four times. We're now layering in paid channels and the trajectory keeps climbing. The Shopify site generates leads that convert offline. He's taking market share from competitors who have a decade-plus head start, and he's doing it by being faster, more responsive, and better positioned online.&lt;br&gt;
As Mike at All That Technology put it: "Your advice and guidance has helped iron out so many kinks — I regularly keep thinking I gotta keep up." That's the signal that the engagement is working. He's not waiting for me to tell him what to do. He's running harder because the direction is clear and the results are showing up.&lt;br&gt;
That's not architecture. That's not CI/CD. That's GTM strategy, marketing channel optimization, analytics, competitive positioning, pricing, and knowing enough about e-commerce platforms to build what the business actually needs instead of what a dev shop would sell you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbw9g7j89w1r9e56zxw8y.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbw9g7j89w1r9e56zxw8y.jpg" alt="Small business growth through fractional CTO strategy - analytics, direction, and competitive positioning" width="800" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What You're Actually Buying
&lt;/h2&gt;

&lt;p&gt;The technical decisions — stack, infrastructure, database, cloud provider — are the easy part. I wrote about this in &lt;a href="https://dev.to/blog/what-does-a-cto-actually-do"&gt;What Does a CTO Actually Do?&lt;/a&gt;: any strong senior engineer can make those calls for $100 an hour. That's commodity work.&lt;br&gt;
What fractional gives you is the focused version of senior technical leadership — the decisions without the overhead. Your team, your offshore devs, your agentic workflows, or your three senior engineers can handle the execution. They need someone to set the direction, make the call on which way to go, and be available when they're stuck. Not someone sitting in their standups.&lt;br&gt;
What's harder to buy by the hour:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Translation between engineering and business. Turning "we have significant technical debt" into "we have a time bomb that will cost us a full sprint to defuse, and here's why now is cheaper than later." Making sure neither side is talking past the other.&lt;/li&gt;
&lt;li&gt;Pattern recognition from reps. I've done platform migrations, e-commerce rebuilds, &lt;a href="https://dev.to/blog/architecture-behind-6000-percent-throughput-hertz"&gt;real-time system architectures at scale&lt;/a&gt;, edge AI deployments, and mobile CI/CD before CI/CD was a thing — back when Chef was just becoming real and infrastructure meant being on a release call, not clicking a button in a cloud console. I've watched infra evolve from managing your own AS/400 to Infrastructure as Code with solid APIs and GUIs. When a client hits a problem I've seen before, the answer comes from experience across that entire arc, not research. That compresses timelines dramatically.&lt;/li&gt;
&lt;li&gt;Business strategy that a pure technologist won't give you. Marketing channels, pricing models, growth metrics, positioning against competitors. Not every fractional CTO does this. But the ones who've actually built and grown products — not just architected systems — bring a different lens. You might just need help with GTM, PMF, or figuring out which marketing channels are worth your time. I don't have all the answers, but I've been through enough launches, growth experiments, and close-the-deal moments to help with positioning.&lt;/li&gt;
&lt;li&gt;Honest assessment. I spend a lot of time saying "the data doesn't support this yet." It's an unpopular sentence. But when your conversion rate drops 8% over a weekend on 300 visitors, that's noise, not signal. Knowing when to act and when to wait is worth something.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Cost Math (Done Honestly)
&lt;/h2&gt;

&lt;p&gt;A senior CTO in a major metro commands &lt;a href="https://wellfound.com/hiring-data/r/cto-2" rel="noopener noreferrer"&gt;$200-300K base&lt;/a&gt; plus equity. All-in with benefits, equipment, and the inevitable 2-month ramp-up, you're looking at $250-350K annually. That's before they make a single decision.&lt;br&gt;
And that's the best case. &lt;a href="https://kitalent.com/article-hidden-cost-executive-hire" rel="noopener noreferrer"&gt;40% of externally hired executives leave or are terminated within 18 months&lt;/a&gt;. When a CTO hire fails, the total cost — recruiting fees, severance, stalled strategy, team disruption, backfill — runs &lt;a href="https://kitalent.com/article-hidden-cost-executive-hire" rel="noopener noreferrer"&gt;200-400% of their annual salary&lt;/a&gt;. On a $250K hire, that's $500K-$1M in real damage. And the &lt;a href="https://hiringsolutionsgroup.com/how-long-does-executive-recruiting-take-the-2026-executive-recruiting-timeline-guide/" rel="noopener noreferrer"&gt;executive search process takes 3-6 months&lt;/a&gt; before you even have someone in the seat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl1qrkisznjaiymhfeq5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl1qrkisznjaiymhfeq5.jpg" alt="Cost comparison between full-time CTO salary and fractional CTO engagement model" width="800" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fractional eliminates most of that risk. Bad fit? Adjust the engagement or part ways. No severance, no 6-month vacancy, no equity clawback drama.&lt;br&gt;
Fractional doesn't mean "the same job, fewer hours." It means a fundamentally different engagement model. &lt;a href="https://fractionalctoexperts.com/blog/fractional-cto-pricing-guide" rel="noopener noreferrer"&gt;Typical fractional CTO rates run $150-500/hour&lt;/a&gt; depending on specialization and stage. Here's what that actually looks like:&lt;br&gt;
A quick distinction that matters here: not every new company is a "startup" in the venture-backed, 10x-growth sense. If you're building a SaaS product and chasing product-market fit with the goal of raising a Series A, you're a tech startup. If you started a home services company, a boutique agency, or an e-commerce shop and you're building a real business that pays real bills — you're a small business. You might call yourself a startup because you're new, and you are, but the playbook is different. The technology needs are different. The budget is different. And the kind of help you need from a fractional CTO is different. Both are valid. But conflating them leads to bad advice.&lt;br&gt;
&lt;strong&gt;SaaS / tech startup (pre-Series A, 1-5 engineers):&lt;/strong&gt;&lt;br&gt;
10-15 hours/week at $150-200/hr = $78K-156K/yr. You get the critical architecture decisions, CI/CD setup, mentorship for junior devs, and someone who's done this before. You don't get someone sitting in standups or writing JIRA tickets — and you shouldn't want that at this stage.&lt;br&gt;
&lt;strong&gt;Growth-stage tech company (Series A-B, 5-20 engineers):&lt;/strong&gt;&lt;br&gt;
15-20 hours/week at $200-300/hr = $156K-312K/yr. More hands-on. Sprint planning, &lt;a href="https://dev.to/blog/how-to-hire-engineers"&gt;hiring process design&lt;/a&gt;, vendor negotiations, 1:1s with senior devs. This is where concentrated experience earns its premium.&lt;br&gt;
&lt;strong&gt;Non-technical founder / small business startup:&lt;/strong&gt;&lt;br&gt;
You started a real business — maybe it's a service company, a local brand, an e-commerce shop. You don't have engineers and might never need a full engineering team. But you need a website that converts, analytics that separate real customers from bot traffic, a growth strategy, and someone who can build the first version of whatever you need without charging you for a dev team. Project-based or low-hour retainer. A Shopify rebuild might be a fixed project. Ongoing growth strategy might be 5-10 hours/month. The cost is a fraction of hiring a technical co-founder, and you get someone who's also thinking about your marketing, your analytics, and your unit economics — not just your code.&lt;br&gt;
&lt;strong&gt;Enterprise / established companies:&lt;/strong&gt;&lt;br&gt;
Project-based or retainer. An architecture audit might be 40 hours total. A migration roadmap might be a 3-month engagement at 10 hours/week. A quarterly board-level technical review might be 8 hours per quarter. You have engineers. You need leadership for specific problems, not another VP.&lt;br&gt;
The real savings aren't in the hourly rate — they're in the fact that most companies don't need 2,080 hours of CTO per year. They need 500. Maybe 200. Maybe 40.&lt;/p&gt;
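
&lt;p&gt;If you want to sanity-check that math, it's a two-line calculation. A rough sketch with illustrative numbers (plug in your own rates, hours, and all-in salary):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Illustrative comparison only -- plug in your own numbers.
WEEKS_PER_YEAR = 52
FULL_TIME_ALL_IN = 300_000  # midpoint of the $250-350K all-in range above

def fractional_annual_cost(hours_per_week, rate_per_hour):
    return hours_per_week * rate_per_hour * WEEKS_PER_YEAR

for hours, rate in [(10, 150), (15, 200), (20, 300)]:
    annual = fractional_annual_cost(hours, rate)
    share = annual / FULL_TIME_ALL_IN
    print(f"{hours} hrs/wk at ${rate}/hr = ${annual:,}/yr ({share:.0%} of a full-time hire)")
&lt;/code&gt;&lt;/pre&gt;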

&lt;h2&gt;
  
  
  What Actually Fills Those Hours
&lt;/h2&gt;

&lt;p&gt;With a startup client, a typical week:&lt;br&gt;
&lt;strong&gt;Monday (3 hrs):&lt;/strong&gt; Architecture review. PR reviews for anything touching infra, auth, or data models. Set technical direction for the week.&lt;br&gt;
&lt;strong&gt;Wednesday (3 hrs):&lt;/strong&gt; 1:1s with senior devs. Unblock technical decisions. Vendor evaluations if needed.&lt;br&gt;
&lt;strong&gt;Friday (3 hrs):&lt;/strong&gt; Sprint planning with eng lead. Technical debt prioritization. Security review.&lt;br&gt;
&lt;strong&gt;Async (6 hrs):&lt;/strong&gt; Slack, code reviews, documentation, incident response if critical.&lt;br&gt;
With a small business client, it's different:&lt;br&gt;
&lt;strong&gt;Weekly check-in (1 hr):&lt;/strong&gt; Review metrics — real traffic vs bots, conversion rates, revenue trends. Adjust strategy. Talk through what's working in their outreach and what isn't.&lt;br&gt;
&lt;strong&gt;Async (3-5 hrs):&lt;/strong&gt; Site improvements, SEO updates, analytics review, growth channel adjustments. Research on positioning. Draft pricing strategies or promotional campaigns.&lt;br&gt;
&lt;strong&gt;Monthly:&lt;/strong&gt; Deeper analysis. Are we hitting the 30/60/90 day targets? Where's the next growth lever? What needs to change?&lt;br&gt;
With an enterprise client:&lt;br&gt;
&lt;strong&gt;Week 1:&lt;/strong&gt; Deep-dive audit. Interview the team, read the code, map the architecture, identify the three things that will hurt worst in 12 months.&lt;br&gt;
&lt;strong&gt;Week 2:&lt;/strong&gt; Deliver findings. Prioritized roadmap. Specific recommendations with tradeoffs — not a 60-page deck, a working document the team can actually use.&lt;br&gt;
&lt;strong&gt;Ongoing (if retained):&lt;/strong&gt; Monthly check-in. Review progress against roadmap. Adjust priorities. Available for escalations. The team does the work — I make sure they're doing the right work.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Fractional Works (and When It Doesn't)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fractional is ideal when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're a SaaS/tech startup pre-Series A with 1-5 engineers&lt;/li&gt;
&lt;li&gt;Your core technical decisions haven't been made yet&lt;/li&gt;
&lt;li&gt;You need someone who's done this before — not someone learning on your dime&lt;/li&gt;
&lt;li&gt;You can't yet justify a $250K+ salary&lt;/li&gt;
&lt;li&gt;Your existing team is capable but needs strategic direction&lt;/li&gt;
&lt;li&gt;You need an architecture audit, migration plan, or technical due diligence&lt;/li&gt;
&lt;li&gt;You're a non-technical founder who needs a technical translator&lt;/li&gt;
&lt;li&gt;You're a small business that needs both the tech and the growth strategy&lt;/li&gt;
&lt;li&gt;You're an enterprise that needs expert judgment for a specific initiative, not another FTE&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://fractionus.com/blog/rise-of-portfolio-careers-2025" rel="noopener noreferrer"&gt;72% of CEOs plan to increase their use of fractional executives in 2025&lt;/a&gt;. That's not startup desperation — that's enterprise strategy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go full-time when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Engineering is 10+ people and needs daily leadership&lt;/li&gt;
&lt;li&gt;Technical strategy is your primary competitive advantage&lt;/li&gt;
&lt;li&gt;You're post-Series B and have the runway&lt;/li&gt;
&lt;li&gt;You need someone in the room for every product decision&lt;/li&gt;
&lt;li&gt;You're in a heavily regulated industry (healthcare, fintech, defense) where compliance requires constant CTO-level oversight&lt;/li&gt;
&lt;li&gt;The coordination overhead of fractional starts exceeding the cost savings&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Hybrid Path
&lt;/h2&gt;

&lt;p&gt;Most companies don't need to make this an either-or decision. The smartest ones I've worked with start fractional and graduate to full-time when they actually need it.&lt;br&gt;
&lt;strong&gt;Months 1-3:&lt;/strong&gt; Establish architecture, CI/CD, coding standards. Make the big decisions. Build the foundation the team will live on for the next 2-3 years.&lt;br&gt;
&lt;strong&gt;Months 4-6:&lt;/strong&gt; &lt;a href="https://dev.to/blog/how-to-hire-engineers"&gt;Hire and mentor&lt;/a&gt; a senior engineer or eng manager. Transfer institutional knowledge. Start reducing your dependency on me.&lt;br&gt;
&lt;strong&gt;Months 7-9:&lt;/strong&gt; Reduce hours to advisory (5 hrs/week). The team should be self-sufficient for daily decisions.&lt;br&gt;
&lt;strong&gt;Month 10+:&lt;/strong&gt; Transition to board advisor or hire full-time CTO. Your architecture is solid, your team knows how to maintain it.&lt;br&gt;
The fractional CTO can help write the job description for their full-time replacement, interview candidates, and ensure the transition doesn't lose institutional knowledge. I've done this. It works because the incentives are aligned — my job is to make you successful, not to make myself permanent.&lt;br&gt;
Not every engagement follows this arc. Some clients stay on a retainer indefinitely — the 5-10 hours/month of strategic oversight and growth coaching is exactly what they need. Others ramp up temporarily during a fundraise or product launch. The model flexes because the point was never to fill a chair. It was to solve problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.demandsage.com/startup-statistics/" rel="noopener noreferrer"&gt;82% of startup failures trace back to leadership and management issues&lt;/a&gt;. The question isn't whether you need senior technical leadership — you absolutely do. The question is whether you need it 40 hours a week.&lt;br&gt;
Don't hire a full-time CTO because you think you should. Hire one because your engineering org is large enough to need daily strategic leadership. Until then, buy the decisions, not the seat.&lt;br&gt;
And if what you actually need isn't a CTO at all — if you need someone who can build the site, set up the analytics, figure out your growth channels, coach you on pricing, and help you go from side hustle to real business — that exists too. The title matters less than the outcome. You'll still do the groundwork. You'll still be the one doing the outreach, handing out the review cards, closing the deals. But having someone in your corner who's been through it — who can help you figure out what to measure, where to spend, and how to position — that changes the trajectory.&lt;br&gt;
The best people I work with treat these engagements the same way: you're paying for someone who's made these mistakes before, so you don't have to make them for the first time on your dime.&lt;br&gt;
If you're not sure which model fits, that's a conversation worth having. No pitch, no pressure — just an honest assessment of where you are and what you actually need.&lt;br&gt;
Book a call: &lt;a href="https://cal.com/mdostal/meet" rel="noopener noreferrer"&gt;https://cal.com/mdostal/meet&lt;/a&gt;&lt;/p&gt;

</description>
      <category>fractionalcto</category>
      <category>startup</category>
      <category>engineeringleadership</category>
      <category>cto</category>
    </item>
    <item>
      <title>The Architecture Behind a 6,000% Throughput Improvement at Hertz</title>
      <dc:creator>Mathew Dostal</dc:creator>
      <pubDate>Mon, 30 Mar 2026 20:22:51 +0000</pubDate>
      <link>https://dev.to/mdostal/the-architecture-behind-a-6000-throughput-improvement-at-hertz-4b09</link>
      <guid>https://dev.to/mdostal/the-architecture-behind-a-6000-throughput-improvement-at-hertz-4b09</guid>
      <description>&lt;p&gt;Hertz was a nearly $10 billion company running on technology that its own CEO would publicly call "30 to 40 years old." Underneath it: 1,800 IT systems, six database vendors, 30 rental processing systems, and a core built on IBM AS/400 mainframes running COBOL. Adding a single new product required 18 separate system changes. Meanwhile, Uber and Lyft had captured over 70% of corporate ground transportation spending on expense reports — up from near zero just a few years earlier. The legacy platform wasn't just slow. It was an existential liability.&lt;/p&gt;

&lt;p&gt;Hertz had already spent $32 million with Accenture on the digital transformation. The result was a website that never went live and code so riddled with defects that every line of frontend work had to be scrapped. Accenture's code couldn't even extend to the other brands — it was built specifically for Hertz when the whole point was a unified platform across Hertz, Dollar, Thrifty, and Firefly. When Accenture was fired, IBM came in through the Cloud Garage with business partners to pick up the pieces. I was there from day zero as a developer on the rate engine. When the system needed to go further — when it needed to actually scale — they realized I was the one to take it over. The rate engine became the piece I owned: one system serving all four brands, handling every pricing query across 10,000+ locations worldwide.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Death by a Thousand Queries
&lt;/h2&gt;

&lt;p&gt;We released the first version of the new rate engine and were told it was doing about 300 requests per second with a p90 of over a minute and a worst case often around 3 minutes — roughly the same as the legacy system we were replacing. That was the moment. Same throughput, cleaner code, but no actual improvement in capacity. So we went after it. The actual scale tells you why we had to — a global fleet of nearly 700,000 vehicles across 10,000+ locations, four brands, millions of pricing queries per day, with localized rates that change constantly based on market conditions, inventory, promotional windows, and regional demand curves. Hertz would ultimately commit more than $400 million to a multi-year technology transformation — and this was after the failed Accenture engagement.&lt;/p&gt;

&lt;p&gt;At that RPS, the math doesn't work. It never worked. The system had been held together by caching patches and operational workarounds long enough that the seams were showing everywhere. A single worst-case query could bog down the entire system when multiple instances hit it repeatedly, or it would backlog fetches behind it.&lt;/p&gt;

&lt;p&gt;The architecture was synchronous throughout. Every rate query hit the database directly. There was no meaningful tiered caching strategy — requests that came in a millisecond apart would both go all the way to storage rather than the first populating a cache and the second hitting it. During normal load, this was survivable. During holiday weekends or promotional events, it was catastrophic. The database would saturate, latency would spike, and the whole thing would cascade — a queue of requests backing up behind a storage layer that couldn't drain fast enough.&lt;/p&gt;

&lt;p&gt;The business impact was direct. Abandoned bookings during peak periods aren't recoverable revenue — a customer who can't complete a reservation on a Friday afternoon before a long weekend books with a competitor. There's no email drip campaign that fixes that. The window closes.&lt;/p&gt;

&lt;p&gt;That's what we were solving for. Not a bug fix. Not a performance tuning pass. A ground-up rearchitecture of how rates were stored, served, and kept current — replacing COBOL-era assumptions with patterns that could handle the actual demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture We Built
&lt;/h2&gt;

&lt;p&gt;The core insight that unlocked everything — the realization that changed the entire architecture — was this: you don't need strong consistency for rate shopping.&lt;/p&gt;

&lt;p&gt;When a customer is comparing rental prices, they're not executing a financial transaction. They're doing reconnaissance. The rate they see on the results page doesn't need to be the exact rate stored in the primary database at that precise millisecond. It needs to be accurate within a few seconds, reflect the correct pricing tier, and load fast enough that they don't leave. Eventual consistency with sub-second propagation across regions is indistinguishable from strong consistency to a human being browsing rental options.&lt;/p&gt;

&lt;p&gt;Once we accepted that — once we stopped treating the read path like it needed ACID guarantees — the constraints changed entirely. We could cache aggressively. We could separate the read path from the write path. We could build for the actual SLA the use case required rather than the theoretical SLA we'd been designing to by default.&lt;/p&gt;

&lt;p&gt;The architecture we landed on had three main components working together:&lt;/p&gt;

&lt;p&gt;Cloudant (IBM's distributed CouchDB) as the document store, with rate-related data sharded into documents by location, date, and discount code. Redis sat in front for the read path, with a CDC stream from Cloudant that pushed changes to Redis as they happened — no cache misses on rule data, no manual invalidation.&lt;/p&gt;

&lt;p&gt;An event streaming backbone via Kinesis for pricing propagation from the Rate Management System (RMS) into the write path.&lt;/p&gt;

&lt;p&gt;A clean separation between the read path and the write path — two distinct microservices (HRE and HRE-Update) that could be scaled, deployed, and optimized independently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEIKICAgIHN1YmdyYXBoICJSZWFkIFBhdGggKEhSRSkiCiAgICAgICAgT1RBWyJPVEEgUmF0ZSBSZXF1ZXN0PGJyLz4zLDAwMCsgcmVhZHMvc2VjIl0KICAgICAgICBIUkVbIkhSRTxici8-KFJlYWQgU2VydmljZSkiXQogICAgICAgIFJFRElTWyJSZWRpcyBDYWNoZTxici8-UnVsZXMgwrcgRGlzY291bnRzIMK3IFByb21vdGlvbnM8YnIvPkdyb3VwZWQgYnkgTG9jYXRpb24gKyBBY2NvdW50Il0KICAgICAgICBEQl9SRUFEWyJDbG91ZGFudDxici8-KFJhdGUgTG9va3VwKSJdCiAgICBlbmQKCiAgICBzdWJncmFwaCAiV3JpdGUgUGF0aCAoSFJFLVVwZGF0ZSkiCiAgICAgICAgUk1TWyJSYXRlIE1hbmFnZW1lbnQ8YnIvPlN5c3RlbSAoUk1TKSJdCiAgICAgICAgS0lOWyJLaW5lc2lzIFN0cmVhbXM8YnIvPn4yLDUwMCB3cml0ZXMvc2VjIl0KICAgICAgICBIUkVVWyJIUkUtVXBkYXRlPGJyLz4oV3JpdGUgU2VydmljZSkiXQogICAgICAgIERCX1dSSVRFWyJDbG91ZGFudDxici8-KERvY3VtZW50IFN0b3JlKTxici8-U2hhcmRlZCBieSBMb2NhdGlvbi9EYXRlL0Rpc2NvdW50Il0KICAgIGVuZAoKICAgIHN1YmdyYXBoICJDYWNoZSBTeW5jIgogICAgICAgIENEQ1siQ0RDIFN0cmVhbTxici8-KENoYW5nZSBEYXRhIENhcHR1cmUpIl0KICAgICAgICBDTVsiQ2FjaGUgTWFuYWdlciJdCiAgICBlbmQKCiAgICBPVEEgLS0-IEhSRQogICAgSFJFIC0tPnwiMS4gRmV0Y2ggcHJlLWZpbHRlcmVkIHJ1bGVzPGJyLz4ob25lIFJlZGlzIGZldGNoKSJ8IFJFRElTCiAgICBIUkUgLS0-fCIyLiBGZXRjaCBhY3R1YWwgcmF0ZTxici8-KG9uZSBEQiBjYWxsKSJ8IERCX1JFQUQKCiAgICBSTVMgLS0-IEtJTgogICAgS0lOIC0tPiBIUkVVCiAgICBIUkVVIC0tPiBEQl9XUklURQoKICAgIERCX1dSSVRFIC0tPnwiUnVsZS9kaXNjb3VudC9wcm9tbyBjaGFuZ2VzInwgQ0RDCiAgICBDREMgLS0-IENNCiAgICBDTSAtLT58IlVwZGF0ZSBncm91cGVkIGtleXMifCBSRURJUwoKICAgIHN0eWxlIFJFRElTIGZpbGw6I2I0NTMwOSxzdHJva2U6IzkyNDAwZSxjb2xvcjojZmZmCiAgICBzdHlsZSBLSU4gZmlsbDojMDM2OWExLHN0cm9rZTojMDc1OTg1LGNvbG9yOiNmZmYKICAgIHN0eWxlIENEQyBmaWxsOiM0ZDdjMGYsc3Ryb2tlOiMzZjYyMTIsY29sb3I6I2ZmZgogICAgc3R5bGUgSFJFIGZpbGw6IzFjMTkxNyxzdHJva2U6Izc4NzE2Yyxjb2xvcjojZDZkM2QxCiAgICBzdHlsZSBIUkVVIGZpbGw6IzFjMTkxNyxzdHJva2U6Izc4NzE2Yyxjb2xvcjojZDZkM2Qx" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEIKICAgIHN1YmdyYXBoICJSZWFkIFBhdGggKEhSRSkiCiAgICAgICAgT1RBWyJPVEEgUmF0ZSBSZXF1ZXN0PGJyLz4zLDAwMCsgcmVhZHMvc2VjIl0KICAgICAgICBIUkVbIkhSRTxici8-KFJlYWQgU2VydmljZSkiXQogICAgICAgIFJFRElTWyJSZWRpcyBDYWNoZTxici8-UnVsZXMgwrcgRGlzY291bnRzIMK3IFByb21vdGlvbnM8YnIvPkdyb3VwZWQgYnkgTG9jYXRpb24gKyBBY2NvdW50Il0KICAgICAgICBEQl9SRUFEWyJDbG91ZGFudDxici8-KFJhdGUgTG9va3VwKSJdCiAgICBlbmQKCiAgICBzdWJncmFwaCAiV3JpdGUgUGF0aCAoSFJFLVVwZGF0ZSkiCiAgICAgICAgUk1TWyJSYXRlIE1hbmFnZW1lbnQ8YnIvPlN5c3RlbSAoUk1TKSJdCiAgICAgICAgS0lOWyJLaW5lc2lzIFN0cmVhbXM8YnIvPn4yLDUwMCB3cml0ZXMvc2VjIl0KICAgICAgICBIUkVVWyJIUkUtVXBkYXRlPGJyLz4oV3JpdGUgU2VydmljZSkiXQogICAgICAgIERCX1dSSVRFWyJDbG91ZGFudDxici8-KERvY3VtZW50IFN0b3JlKTxici8-U2hhcmRlZCBieSBMb2NhdGlvbi9EYXRlL0Rpc2NvdW50Il0KICAgIGVuZAoKICAgIHN1YmdyYXBoICJDYWNoZSBTeW5jIgogICAgICAgIENEQ1siQ0RDIFN0cmVhbTxici8-KENoYW5nZSBEYXRhIENhcHR1cmUpIl0KICAgICAgICBDTVsiQ2FjaGUgTWFuYWdlciJdCiAgICBlbmQKCiAgICBPVEEgLS0-IEhSRQogICAgSFJFIC0tPnwiMS4gRmV0Y2ggcHJlLWZpbHRlcmVkIHJ1bGVzPGJyLz4ob25lIFJlZGlzIGZldGNoKSJ8IFJFRElTCiAgICBIUkUgLS0-fCIyLiBGZXRjaCBhY3R1YWwgcmF0ZTxici8-KG9uZSBEQiBjYWxsKSJ8IERCX1JFQUQKCiAgICBSTVMgLS0-IEtJTgogICAgS0lOIC0tPiBIUkVVCiAgICBIUkVVIC0tPiBEQl9XUklURQoKICAgIERCX1dSSVRFIC0tPnwiUnVsZS9kaXNjb3VudC9wcm9tbyBjaGFuZ2VzInwgQ0RDCiAgICBDREMgLS0-IENNCiAgICBDTSAtLT58IlVwZGF0ZSBncm91cGVkIGtleXMifCBSRURJUwoKICAgIHN0eWxlIFJFRElTIGZpbGw6I2I0NTMwOSxzdHJva2U6IzkyNDAwZSxjb2xvcjojZmZmCiAgICBzdHlsZSBLSU4gZmlsbDojMDM2OWExLHN0cm9rZTojMDc1OTg1LGNvbG9yOiNmZmYKICAgIHN0eWxlIENEQyBmaWxsOiM0ZDdjMGYsc3Ryb2tlOiMzZjYyMTIsY29sb3I6I2ZmZgogICAgc3R5bGUgSFJFIGZpbGw6IzFjMTkxNyxzdHJva2U6Izc4NzE2Yyxjb2xvcjojZDZkM2QxCiAgICBzdHlsZSBIUkVVIGZpbGw6IzFjMTkxNyxzdHJva2U6Izc4NzE2Yyxjb2xvcjojZDZkM2Qx" alt="Architecture Overview — Read/Write Path Split" width="1007" height="1203"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Read path sustained &amp;gt;3,000 reads/sec at p95 around 30ms. Write path handled &amp;gt;2,500 pricing writes/sec with sub-second cross-region propagation. The 6,000% throughput improvement wasn't from one optimization — it was from attacking the architectural constraints that had artificially capped everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Read Path: 3,000+ Operations Per Second
&lt;/h2&gt;

&lt;p&gt;Redis sat in front of everything on the read path, but it wasn't a passive cache waiting to be populated by requests. We used Cloudant's Change Data Capture (CDC) stream — a built-in feature of CouchDB-based databases — to push updates into Redis proactively. When the data around a rate changed in Cloudant — a discount code, a corporate agreement, a promotion timeframe, a sell rule — the CDC stream fired, and a cache manager process picked up the change and updated the corresponding Redis keys. The read path never had to wait for a cache miss to discover new data.&lt;/p&gt;
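
&lt;p&gt;In practice the cache manager was a small worker tailing that changes feed. A minimal sketch of the idea (not the production code; the document fields, key shapes, and grouping logic here are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch: tail the Cloudant/CouchDB _changes feed and push rule,
# discount, and promotion changes into Redis so the read path never
# waits on a cache miss. Field names and key shapes are illustrative.
import json
import requests
import redis

CLOUDANT_DB = "https://example.cloudant.com/rates"  # hypothetical URL
cache = redis.Redis(host="localhost", port=6379)

def group_key(doc):
    # Group by location plus corporate discount code (simplified).
    return f"rules:{doc['location']}:{doc.get('discount_code', '*')}"

def rebuild_group(doc):
    # The real cache manager re-assembled every rule, promo, and
    # benefit for the group; here we just store the changed document.
    cache.set(group_key(doc), json.dumps(doc))

resp = requests.get(
    f"{CLOUDANT_DB}/_changes",
    params={"feed": "continuous", "include_docs": "true"},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue  # heartbeat newlines on the continuous feed
    change = json.loads(line)
    doc = change.get("doc")
    if doc and doc.get("type") in ("rule", "discount", "promotion"):
        rebuild_group(doc)
&lt;/code&gt;&lt;/pre&gt;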

&lt;p&gt;The key design that made this work was how we structured the cache keys — and what we chose to cache versus what we didn't. Rates changed too frequently to cache. But rates were useless without knowing which of the tens of thousands of rules applied to a given request. That filtering — figuring out which rules, discounts, promotions, and eligibility criteria applied to a specific location, account, date, and car type — was the expensive part. Not the rate lookup itself.&lt;/p&gt;

&lt;p&gt;So we did that filtering work upfront. We grouped all applicable rules, benefits, promotions, and eligibility criteria by location and corporate discount code into hashed key structures in Redis. Everything for LAX went together. Everything for a corporate code like GMC or SLAYER went together. When a rate shop request came in, we pulled the pre-filtered rule set from Redis in one fetch, then made a single targeted database call for the actual rate. Instead of making hundreds of individual rule lookups per request, we'd already done that work when the underlying data changed — not when the customer was waiting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIHN1YmdyYXBoICJJbmNvbWluZyBSZXF1ZXN0IgogICAgICAgIFJFUVsiUmF0ZSBTaG9wIFJlcXVlc3Q8YnIvPkxvY2F0aW9uOiBMQVg8YnIvPkFjY291bnQ6IEdNQzxici8-RGF0ZTogMjAxOC0wMy0xNSJdCiAgICBlbmQKCiAgICBzdWJncmFwaCAiUmVkaXMgKFByZS1maWx0ZXJlZCBSdWxlIFNldHMpIgogICAgICAgIExPQ1siTEFYfip-Kjoqfip-Kjxici8-4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSAPGJyLz5BbGwgTEFYIHJ1bGVzPGJyLz5BbGwgTEFYIHByb21vdGlvbnM8YnIvPkFsbCBMQVggc2VsbCBydWxlczxici8-UmVnaW9uYWwgcHJpY2luZyBjb25maWciXQogICAgICAgIEFDQ1RbIip-Kn4qOip-Kn5HTUM8YnIvPuKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgDxici8-Q29ycG9yYXRlIGRpc2NvdW50IHRlcm1zPGJyLz5CZW5lZml0IGVsaWdpYmlsaXR5PGJyLz5SYXRlIGNvZGUgb3ZlcnJpZGVzPGJyLz5Db250cmFjdCByZW5ld2FsOiBhbm51YWwiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIkNsb3VkYW50IChTb3VyY2Ugb2YgVHJ1dGgpIgogICAgICAgIFJBVEVbIlJhdGUgTG9va3VwPGJyLz7ilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIA8YnIvPkN1cnJlbnQgcmF0ZSBmb3I8YnIvPkxBWCArIEdNQyArIDIwMTgtMDMtMTU8YnIvPisgY2FyIHR5cGUiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIlJlc3BvbnNlIEFzc2VtYmx5IgogICAgICAgIFJFU1BbIkFzc2VtYmxlZCBSYXRlIFJlc3BvbnNlPGJyLz5wOTUgPCAzMG1zIl0KICAgIGVuZAoKICAgIFJFUSAtLT58IjEgZmV0Y2gifCBMT0MKICAgIFJFUSAtLT58IjEgZmV0Y2gifCBBQ0NUCiAgICBSRVEgLS0-fCIxIHRhcmdldGVkIHF1ZXJ5InwgUkFURQogICAgTE9DIC0tPiBSRVNQCiAgICBBQ0NUIC0tPiBSRVNQCiAgICBSQVRFIC0tPiBSRVNQCgogICAgc3R5bGUgTE9DIGZpbGw6I2I0NTMwOSxzdHJva2U6IzkyNDAwZSxjb2xvcjojZmZmCiAgICBzdHlsZSBBQ0NUIGZpbGw6I2I0NTMwOSxzdHJva2U6IzkyNDAwZSxjb2xvcjojZmZmCiAgICBzdHlsZSBSQVRFIGZpbGw6IzFlNDBhZixzdHJva2U6IzFlM2E4YSxjb2xvcjojZmZmCiAgICBzdHlsZSBSRVNQIGZpbGw6IzE1ODAzZCxzdHJva2U6IzE2NjUzNCxjb2xvcjojZmZm" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIHN1YmdyYXBoICJJbmNvbWluZyBSZXF1ZXN0IgogICAgICAgIFJFUVsiUmF0ZSBTaG9wIFJlcXVlc3Q8YnIvPkxvY2F0aW9uOiBMQVg8YnIvPkFjY291bnQ6IEdNQzxici8-RGF0ZTogMjAxOC0wMy0xNSJdCiAgICBlbmQKCiAgICBzdWJncmFwaCAiUmVkaXMgKFByZS1maWx0ZXJlZCBSdWxlIFNldHMpIgogICAgICAgIExPQ1siTEFYfip-Kjoqfip-Kjxici8-4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSAPGJyLz5BbGwgTEFYIHJ1bGVzPGJyLz5BbGwgTEFYIHByb21vdGlvbnM8YnIvPkFsbCBMQVggc2VsbCBydWxlczxici8-UmVnaW9uYWwgcHJpY2luZyBjb25maWciXQogICAgICAgIEFDQ1RbIip-Kn4qOip-Kn5HTUM8YnIvPuKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgDxici8-Q29ycG9yYXRlIGRpc2NvdW50IHRlcm1zPGJyLz5CZW5lZml0IGVsaWdpYmlsaXR5PGJyLz5SYXRlIGNvZGUgb3ZlcnJpZGVzPGJyLz5Db250cmFjdCByZW5ld2FsOiBhbm51YWwiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIkNsb3VkYW50IChTb3VyY2Ugb2YgVHJ1dGgpIgogICAgICAgIFJBVEVbIlJhdGUgTG9va3VwPGJyLz7ilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIA8YnIvPkN1cnJlbnQgcmF0ZSBmb3I8YnIvPkxBWCArIEdNQyArIDIwMTgtMDMtMTU8YnIvPisgY2FyIHR5cGUiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIlJlc3BvbnNlIEFzc2VtYmx5IgogICAgICAgIFJFU1BbIkFzc2VtYmxlZCBSYXRlIFJlc3BvbnNlPGJyLz5wOTUgPCAzMG1zIl0KICAgIGVuZAoKICAgIFJFUSAtLT58IjEgZmV0Y2gifCBMT0MKICAgIFJFUSAtLT58IjEgZmV0Y2gifCBBQ0NUCiAgICBSRVEgLS0-fCIxIHRhcmdldGVkIHF1ZXJ5InwgUkFURQogICAgTE9DIC0tPiBSRVNQCiAgICBBQ0NUIC0tPiBSRVNQCiAgICBSQVRFIC0tPiBSRVNQCgogICAgc3R5bGUgTE9DIGZpbGw6I2I0NTMwOSxzdHJva2U6IzkyNDAwZSxjb2xvcjojZmZmCiAgICBzdHlsZSBBQ0NUIGZpbGw6I2I0NTMwOSxzdHJva2U6IzkyNDAwZSxjb2xvcjojZmZmCiAgICBzdHlsZSBSQVRFIGZpbGw6IzFlNDBhZixzdHJva2U6IzFlM2E4YSxjb2xvcjojZmZmCiAgICBzdHlsZSBSRVNQIGZpbGw6IzE1ODAzZCxzdHJva2U6IzE2NjUzNCxjb2xvcjojZmZm" alt="Read Path — Pre-filtered Rule Set Lookup" 
width="1082" height="724"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Actual Redis key format: &lt;code&gt;LOC~RLOC~Date:RC~CarType~DiscCode&lt;/code&gt; (e.g., &lt;code&gt;LAX~LAT~2020-01:RC001~CCAR~D&lt;/code&gt;). Diagram shows simplified grouping — keys are compound strings encoding 6 dimensions.&lt;/p&gt;
&lt;/blockquote&gt;
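
&lt;p&gt;Roughly what a read-path lookup reduces to, as a simplified sketch (the helper names and document fields are illustrative, not the production API):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Simplified read path: one Redis fetch for the pre-filtered rule
# set, one targeted DB call for the live rate. Helper names are
# stand-ins, not the production code.
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def rule_key(loc, rloc, month, rate_code, car_type, disc_code):
    # Compound key encoding six dimensions, e.g.
    # "LAX~LAT~2020-01:RC001~CCAR~D"
    return f"{loc}~{rloc}~{month}:{rate_code}~{car_type}~{disc_code}"

def shop_rate(loc, rloc, month, rate_code, car_type, disc_code):
    # 1. One Redis fetch: rules, discounts, and promos already
    #    filtered for this location/account grouping.
    raw = cache.get(rule_key(loc, rloc, month, rate_code, car_type, disc_code))
    rules = json.loads(raw) if raw else {}

    # 2. One targeted DB call for the fast-moving rate itself.
    rate = fetch_rate_from_cloudant(loc, month, car_type)

    # 3. Assemble the response in-process; no further round trips.
    return {"rate": rate, "applied_rules": rules}

def fetch_rate_from_cloudant(loc, month, car_type):
    ...  # single keyed document read against the rate shard (stand-in)
&lt;/code&gt;&lt;/pre&gt;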

&lt;p&gt;This grouping strategy was rooted in a frequency asymmetry that most caching designs miss. Corporate discount codes change maybe once a year, when the contract comes up for renewal. Rate codes change thousands of times per second across the fleet. By caching the slow-moving data (account rules, location config, benefits, sell rules) aggressively with longer TTLs, and treating the fast-moving data (individual rate prices) as the thing that came from the database at request time, we dramatically reduced the volume of work per request without sacrificing freshness where it mattered.&lt;/p&gt;
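
&lt;p&gt;Expressed as a cache policy, the asymmetry looks something like this (the TTL values are illustrative, not the production settings; the CDC stream did the real invalidation):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Illustrative policy: cache aggressively where change is rare,
# skip the cache entirely where change is constant.
CACHE_POLICY = {
    "corporate_discount": {"cache": True,  "ttl_seconds": 7 * 24 * 3600},  # changes ~1x/year
    "location_rules":     {"cache": True,  "ttl_seconds": 24 * 3600},      # changes ~weekly
    "promotions":         {"cache": True,  "ttl_seconds": 3600},           # changes ~daily
    "individual_rate":    {"cache": False, "ttl_seconds": 0},              # changes constantly: always hit the DB
}

def should_cache(doc_type):
    return CACHE_POLICY.get(doc_type, {"cache": False})["cache"]
&lt;/code&gt;&lt;/pre&gt;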

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIHN1YmdyYXBoICJTbG93LU1vdmluZyAoQ2FjaGVkIGluIFJlZGlzKSIKICAgICAgICBDT1JQWyJDb3Jwb3JhdGUgRGlzY291bnQgQ29kZXM8YnIvPkNoYW5nZXM6IH4xeC95ZWFyPGJyLz5UVEw6IExvbmciXQogICAgICAgIFJVTEVTWyJMb2NhdGlvbiBSdWxlczxici8-Q2hhbmdlczogfndlZWtseTxici8-VFRMOiBNZWRpdW0iXQogICAgICAgIFBST01PWyJQcm9tb3Rpb25zPGJyLz5DaGFuZ2VzOiB-ZGFpbHk8YnIvPlRUTDogU2hvcnQiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIkZhc3QtTW92aW5nIChMaXZlIERCIExvb2t1cCkiCiAgICAgICAgUkFURVNbIkluZGl2aWR1YWwgUmF0ZXM8YnIvPkNoYW5nZXM6IDEwLDAwMCsvc2VjPGJyLz5UVEw6IE5vdCBjYWNoZWQiXQogICAgZW5kCgogICAgQ0RDWyJDREMgU3RyZWFtPGJyLz5VcGRhdGVzIFJlZGlzPGJyLz5vbiBjaGFuZ2UiXQoKICAgIENPUlAgLS0-IENEQwogICAgUlVMRVMgLS0-IENEQwogICAgUFJPTU8gLS0-IENEQwogICAgUkFURVMgLS4tPnwiQWx3YXlzIGZyZXNoPGJyLz5mcm9tIENsb3VkYW50InwgREJbIkRpcmVjdCBEQiBRdWVyeSJdCgogICAgc3R5bGUgQ09SUCBmaWxsOiMxNTgwM2Qsc3Ryb2tlOiMxNjY1MzQsY29sb3I6I2ZmZgogICAgc3R5bGUgUlVMRVMgZmlsbDojYTE2MjA3LHN0cm9rZTojODU0ZDBlLGNvbG9yOiNmZmYKICAgIHN0eWxlIFBST01PIGZpbGw6I2I0NTMwOSxzdHJva2U6IzkyNDAwZSxjb2xvcjojZmZmCiAgICBzdHlsZSBSQVRFUyBmaWxsOiNkYzI2MjYsc3Ryb2tlOiNiOTFjMWMsY29sb3I6I2ZmZg%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIHN1YmdyYXBoICJTbG93LU1vdmluZyAoQ2FjaGVkIGluIFJlZGlzKSIKICAgICAgICBDT1JQWyJDb3Jwb3JhdGUgRGlzY291bnQgQ29kZXM8YnIvPkNoYW5nZXM6IH4xeC95ZWFyPGJyLz5UVEw6IExvbmciXQogICAgICAgIFJVTEVTWyJMb2NhdGlvbiBSdWxlczxici8-Q2hhbmdlczogfndlZWtseTxici8-VFRMOiBNZWRpdW0iXQogICAgICAgIFBST01PWyJQcm9tb3Rpb25zPGJyLz5DaGFuZ2VzOiB-ZGFpbHk8YnIvPlRUTDogU2hvcnQiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIkZhc3QtTW92aW5nIChMaXZlIERCIExvb2t1cCkiCiAgICAgICAgUkFURVNbIkluZGl2aWR1YWwgUmF0ZXM8YnIvPkNoYW5nZXM6IDEwLDAwMCsvc2VjPGJyLz5UVEw6IE5vdCBjYWNoZWQiXQogICAgZW5kCgogICAgQ0RDWyJDREMgU3RyZWFtPGJyLz5VcGRhdGVzIFJlZGlzPGJyLz5vbiBjaGFuZ2UiXQoKICAgIENPUlAgLS0-IENEQwogICAgUlVMRVMgLS0-IENEQwogICAgUFJPTU8gLS0-IENEQwogICAgUkFURVMgLS4tPnwiQWx3YXlzIGZyZXNoPGJyLz5mcm9tIENsb3VkYW50InwgREJbIkRpcmVjdCBEQiBRdWVyeSJdCgogICAgc3R5bGUgQ09SUCBmaWxsOiMxNTgwM2Qsc3Ryb2tlOiMxNjY1MzQsY29sb3I6I2ZmZgogICAgc3R5bGUgUlVMRVMgZmlsbDojYTE2MjA3LHN0cm9rZTojODU0ZDBlLGNvbG9yOiNmZmYKICAgIHN0eWxlIFBST01PIGZpbGw6I2I0NTMwOSxzdHJva2U6IzkyNDAwZSxjb2xvcjojZmZmCiAgICBzdHlsZSBSQVRFUyBmaWxsOiNkYzI2MjYsc3Ryb2tlOiNiOTFjMWMsY29sb3I6I2ZmZg%3D%3D" alt="Cache Strategy — Frequency Asymmetry" width="642" height="684"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The decision about what to cache and what not to cache matters as much as the caching infrastructure itself. We cached everything except the rates — the rules, discounts, promotions, corporate eligibility, and location-specific criteria that determined which rate applied. The rates themselves changed too frequently and needed to come from the source of truth. But by pre-computing and caching everything that surrounded the rate, we turned what had been hundreds of database calls per request into one Redis fetch plus one targeted DB lookup. That's where the 6,000% came from.&lt;/p&gt;

&lt;p&gt;At 10,000+ locations with localized pricing variations, the cache hit rate on rule data during steady-state operation was the key metric. When the cache is absorbing the filtering load, the underlying storage layer only handles the targeted rate lookups it was designed for. When the cache miss rate climbs, you're back to the old problem. We monitored that ratio carefully.&lt;/p&gt;
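
&lt;p&gt;That ratio is cheap to watch because Redis already exposes the counters. A minimal sketch (the alert threshold is illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal sketch: derive the cache hit ratio from Redis's own
# counters and flag when the read path starts falling through.
import redis

cache = redis.Redis(host="localhost", port=6379)

def cache_hit_ratio():
    stats = cache.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 1.0

ratio = cache_hit_ratio()
if ratio &amp;lt; 0.95:  # illustrative threshold, not the production alert line
    print(f"cache hit ratio degraded: {ratio:.2%}")
&lt;/code&gt;&lt;/pre&gt;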

&lt;h2&gt;
  
  
  Write Path: 2,500+ Operations Per Second
&lt;/h2&gt;

&lt;p&gt;The write path had its own evolution story, and understanding where it started makes the final architecture more meaningful.&lt;/p&gt;

&lt;p&gt;In the initial version, the Rate Management System (RMS) — the internal tool where Hertz's revenue team configured rates, bundled discounts, and promotional pricing — pushed updates to our write service (HRE-Update) via direct REST calls. Thousands of them. RMS would compute a new rate structure and fire off HTTP requests to HRE-Update, which had to accept them, validate them, and persist them to Cloudant. Under normal load this was manageable. During a rate restructuring event — say, adjusting pricing across all West Coast locations for a holiday weekend — the volume would spike to the point where HRE-Update couldn't keep up. I built a custom queuing system in Cloudant to buffer the backlog — two queue types (account-promo and generic), with the ability to spin up separate queue workers per document prefix. We deployed over 20 of them, partitioned by letter, location, or doc type, so they could process in parallel without parent/child conflicts. Under steady-state load, they kept up. The problem was burst scenarios — when a major corporate discount code changed, or during initial spin-up, locations like LAX and NYC that represent a disproportionate share of rate rules and rental volume would backlog badly. A "parked docs" mechanism handled failed inserts for retry, but the fundamental issue was that the queuing system was only as fast as the REST pipeline feeding it. Every update was a full HTTP request-response cycle — connection establishment, headers, serialization, acknowledgment, teardown. At thousands of updates per second, that overhead alone was eating a significant chunk of throughput.&lt;/p&gt;
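
&lt;p&gt;A heavily simplified sketch of what one of those V1 workers did (the helper methods, document shapes, and prefixes are illustrative, not the actual implementation):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch of a V1 queue worker: each worker owns a document-ID prefix
# so workers never contend on the same documents. Failed inserts get
# "parked" for a later retry pass. Helper methods are stand-ins.
import time

def run_queue_worker(db, prefix):
    while True:
        # Pull the next batch of queued updates whose IDs start with
        # this worker's prefix (by letter, location, or doc type).
        batch = db.fetch_queue_docs(prefix=prefix, limit=100)
        if not batch:
            time.sleep(0.5)
            continue
        for queued in batch:
            try:
                db.upsert_rate_document(queued["payload"])
                db.delete_queue_doc(queued["_id"])
            except Exception:
                # Park the document so a retry worker picks it up
                # without blocking the rest of the queue.
                db.park_doc(queued)
&lt;/code&gt;&lt;/pre&gt;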

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEIKICAgIHN1YmdyYXBoICJWMTogUkVTVCArIEN1c3RvbSBRdWV1ZXMiCiAgICAgICAgUk1TMVsiUk1TIl0gLS0-fCJUaG91c2FuZHMgb2Y8YnIvPmRpcmVjdCBSRVNUIGNhbGxzInwgSFJFVTFbIkhSRS1VcGRhdGUiXQogICAgICAgIEhSRVUxIC0tPnwiVHdvIHF1ZXVlcyBpbiBDbG91ZGFudDxici8-KGFjY291bnQtcHJvbW8gKyBnZW5lcmljKSJ8IFExWyIyMCsgUXVldWUgV29ya2Vyczxici8-UGFydGl0aW9uZWQgYnkgcHJlZml4PGJyLz4oYm90dGxlbmVjayBvbiBidXJzdCkiXQogICAgICAgIFExIC0tPnwiRG9jLXR5cGUgcm91dGluZyJ8IENMMVsiQ2xvdWRhbnQiXQogICAgICAgIFExIC0uLT58IkZhaWxlZCBpbnNlcnRzInwgUEFSS1siUGFya2VkIERvY3M8YnIvPihyZXRyeSBsYXRlcikiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIlYyOiBFdmVudCBTdHJlYW1pbmciCiAgICAgICAgUk1TMlsiUk1TIl0gLS0-fCJQdWJsaXNoInwgS0lOMlsiS2luZXNpcyBTdHJlYW1zIl0KICAgICAgICBLSU4yIC0tPnwiQ29uc3VtZXJzIHB1bGw8YnIvPmF0IG93biBwYWNlInwgSFJFVTJbIkhSRS1VcGRhdGU8YnIvPigzKyBpbnN0YW5jZXMpIl0KICAgICAgICBIUkVVMiAtLT58IkRvYy10eXBlPGJyLz5wYXJ0aXRpb25pbmcifCBDTDJbIkNsb3VkYW50Il0KICAgICAgICBDTDIgLS0-fCJDREMifCBSRURJUzJbIlJlZGlzIl0KICAgIGVuZAoKICAgIHN0eWxlIFExIGZpbGw6I2RjMjYyNixzdHJva2U6I2I5MWMxYyxjb2xvcjojZmZmCiAgICBzdHlsZSBLSU4yIGZpbGw6IzAzNjlhMSxzdHJva2U6IzA3NTk4NSxjb2xvcjojZmZmCiAgICBzdHlsZSBSRURJUzIgZmlsbDojYjQ1MzA5LHN0cm9rZTojOTI0MDBlLGNvbG9yOiNmZmY%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEIKICAgIHN1YmdyYXBoICJWMTogUkVTVCArIEN1c3RvbSBRdWV1ZXMiCiAgICAgICAgUk1TMVsiUk1TIl0gLS0-fCJUaG91c2FuZHMgb2Y8YnIvPmRpcmVjdCBSRVNUIGNhbGxzInwgSFJFVTFbIkhSRS1VcGRhdGUiXQogICAgICAgIEhSRVUxIC0tPnwiVHdvIHF1ZXVlcyBpbiBDbG91ZGFudDxici8-KGFjY291bnQtcHJvbW8gKyBnZW5lcmljKSJ8IFExWyIyMCsgUXVldWUgV29ya2Vyczxici8-UGFydGl0aW9uZWQgYnkgcHJlZml4PGJyLz4oYm90dGxlbmVjayBvbiBidXJzdCkiXQogICAgICAgIFExIC0tPnwiRG9jLXR5cGUgcm91dGluZyJ8IENMMVsiQ2xvdWRhbnQiXQogICAgICAgIFExIC0uLT58IkZhaWxlZCBpbnNlcnRzInwgUEFSS1siUGFya2VkIERvY3M8YnIvPihyZXRyeSBsYXRlcikiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIlYyOiBFdmVudCBTdHJlYW1pbmciCiAgICAgICAgUk1TMlsiUk1TIl0gLS0-fCJQdWJsaXNoInwgS0lOMlsiS2luZXNpcyBTdHJlYW1zIl0KICAgICAgICBLSU4yIC0tPnwiQ29uc3VtZXJzIHB1bGw8YnIvPmF0IG93biBwYWNlInwgSFJFVTJbIkhSRS1VcGRhdGU8YnIvPigzKyBpbnN0YW5jZXMpIl0KICAgICAgICBIUkVVMiAtLT58IkRvYy10eXBlPGJyLz5wYXJ0aXRpb25pbmcifCBDTDJbIkNsb3VkYW50Il0KICAgICAgICBDTDIgLS0-fCJDREMifCBSRURJUzJbIlJlZGlzIl0KICAgIGVuZAoKICAgIHN0eWxlIFExIGZpbGw6I2RjMjYyNixzdHJva2U6I2I5MWMxYyxjb2xvcjojZmZmCiAgICBzdHlsZSBLSU4yIGZpbGw6IzAzNjlhMSxzdHJva2U6IzA3NTk4NSxjb2xvcjojZmZmCiAgICBzdHlsZSBSRURJUzIgZmlsbDojYjQ1MzA5LHN0cm9rZTojOTI0MDBlLGNvbG9yOiNmZmY%3D" alt="Write Path Evolution — REST to Event Streaming" width="1904" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The breakthrough was moving to event streaming — and honestly, just dropping the full HTTP stack was a massive speed improvement on its own. No more connection establishment, header negotiation, serialization overhead, and acknowledgment round-trips on every single update. We'd originally designed around Kafka, but AWS cut a deal that made Kinesis the practical choice. The architecture shifted: RMS published rate changes to Kinesis streams, and HRE-Update consumers pulled from those streams at their own pace. This decoupled the write path from the source system entirely — RMS didn't need to care whether HRE-Update was keeping up, and HRE-Update didn't need to handle burst REST traffic anymore. The burst scenarios that had backlogged the custom queues — corporate discount code changes hitting LAX and NYC simultaneously — were now absorbed by the stream buffer instead of hammering application-level queues.&lt;/p&gt;
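
&lt;p&gt;In boto3 terms, the V2 shape looks roughly like this sketch (stream names, record shapes, and the single-shard consumer loop are illustrative; the real consumers were long-lived services with checkpointing):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch of the V2 write path. Stream names and record shapes are
# illustrative, not the production configuration.
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-west-2")

# Producer side (RMS): publish a rate change instead of a REST call.
def publish_rate_change(change):
    kinesis.put_record(
        StreamName="rate-updates-us-west",   # hypothetical stream name
        Data=json.dumps(change).encode("utf-8"),
        PartitionKey=change["location"],     # keeps one location's updates ordered
    )

# Consumer side (HRE-Update): pull at our own pace, not RMS's.
def consume(stream_name, shard_id):
    it = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="LATEST",
    )["ShardIterator"]
    while it:
        out = kinesis.get_records(ShardIterator=it, Limit=500)
        for record in out["Records"]:
            apply_rate_change(json.loads(record["Data"]))
        if not out["Records"]:
            time.sleep(0.2)  # nothing new on the shard yet
        it = out.get("NextShardIterator")

def apply_rate_change(change):
    ...  # validate and upsert into the regional Cloudant cluster (stand-in)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The important property is on the consumer side: it pulls, so a burst from RMS piles up in the stream buffer instead of in our application queues.&lt;/p&gt;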

&lt;p&gt;With Kinesis in place, we could do something the REST-based approach never allowed: geo-routed updates. Kafka would have given us partition-by-key routing and consumer groups natively — with Kinesis we had to build that ourselves, encoding the routing metadata in the messages and writing custom consumer logic to select regional streams. More work, but the economics made it the right call.&lt;/p&gt;
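
&lt;p&gt;The routing itself was just metadata in the message plus a stream-selection rule, roughly like this sketch (the region mapping and stream names are made up for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Illustrative geo-routing: pick the regional stream from the rental
# location so the cluster most likely to serve that rate gets the
# update first. The mapping and stream names are made up.
REGION_STREAMS = {
    "us-west":    "rate-updates-us-west",
    "us-central": "rate-updates-us-central",
    "us-east":    "rate-updates-us-east",
    "eu":         "rate-updates-eu",
}

LOCATION_REGION = {"LAX": "us-west", "ORD": "us-central", "JFK": "us-east", "DUB": "eu"}

def stream_for(location):
    region = LOCATION_REGION.get(location, "us-central")  # default home region
    return REGION_STREAMS[region]
&lt;/code&gt;&lt;/pre&gt;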

&lt;p&gt;The geo-replication was actually a two-layer strategy. Kinesis streams wrote to each regional cluster, targeting the nearest zone to the rental locations. But Cloudant also had its own built-in replication — CouchDB's multi-master replication protocol, which we used to sync data across regions as a second propagation path. We could control the replication direction and shard it, so EU and Asia data replicated independently from US data. The US was split into East, Central, and West zones. Ireland was one of the first international rollouts — we didn't do a full global deployment, focusing on US and EU.&lt;/p&gt;
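
&lt;p&gt;That second layer is plain CouchDB-style replication documents, which is what let us control direction and shard it by region. Roughly this shape, with illustrative database URLs and an assumed region field in the selector:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch: a directional, filtered replication (US West to US East)
# that only moves US-region documents. URLs, credentials, and the
# "region" field are illustrative.
import requests

replication_doc = {
    "_id": "rates-west-to-east",
    "source": "https://west.cloudant.example/rates",
    "target": "https://east.cloudant.example/rates",
    "continuous": True,
    "selector": {"region": "US"},  # shard the replication by region
}

requests.post(
    "https://west.cloudant.example/_replicator",
    json=replication_doc,
    auth=("replication_user", "secret"),
    timeout=30,
)
&lt;/code&gt;&lt;/pre&gt;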

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEIKICAgIFJNU1siUmF0ZSBNYW5hZ2VtZW50IFN5c3RlbTxici8-KFJldmVudWUgVGVhbSkiXQogICAgS0lOWyJLaW5lc2lzIFN0cmVhbXM8YnIvPihMYXllciAxOiBTcGVlZCkiXQoKICAgIHN1YmdyYXBoICJVUyBXZXN0IgogICAgICAgIFVXWyJIUkUtVXBkYXRlPGJyLz5XZXN0Il0KICAgICAgICBDV1siQ2xvdWRhbnQgV2VzdCJdCiAgICAgICAgUldbIlJlZGlzIFdlc3QiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIlVTIENlbnRyYWwiCiAgICAgICAgVUNbIkhSRS1VcGRhdGU8YnIvPkNlbnRyYWwiXQogICAgICAgIENDWyJDbG91ZGFudCBDZW50cmFsIl0KICAgICAgICBSQ1siUmVkaXMgQ2VudHJhbCJdCiAgICBlbmQKCiAgICBzdWJncmFwaCAiVVMgRWFzdCIKICAgICAgICBVRVsiSFJFLVVwZGF0ZTxici8-RWFzdCJdCiAgICAgICAgQ0VbIkNsb3VkYW50IEVhc3QiXQogICAgICAgIFJFWyJSZWRpcyBFYXN0Il0KICAgIGVuZAoKICAgIHN1YmdyYXBoICJFVSAoSXJlbGFuZCkiCiAgICAgICAgRVVbIkhSRS1VcGRhdGU8YnIvPkVVIl0KICAgICAgICBDRVVbIkNsb3VkYW50IEVVIl0KICAgICAgICBSRVVbIlJlZGlzIEVVIl0KICAgIGVuZAoKICAgIFJNUyAtLT4gS0lOCiAgICBLSU4gLS0-fCJMQVggdXBkYXRlPGJyLz7ihpIgV2VzdCBGSVJTVCJ8IFVXCiAgICBLSU4gLS0-fCJEVUIgdXBkYXRlPGJyLz7ihpIgRVUgRklSU1QifCBFVQogICAgS0lOIC0uLT58InByb3BhZ2F0ZXMifCBVQwogICAgS0lOIC0uLT58InByb3BhZ2F0ZXMifCBVRQoKICAgIFVXIC0tPiBDVwogICAgQ1cgLS0-IFJXCiAgICBVQyAtLT4gQ0MKICAgIENDIC0tPiBSQwogICAgVUUgLS0-IENFCiAgICBDRSAtLT4gUkUKICAgIEVVIC0tPiBDRVUKICAgIENFVSAtLT4gUkVVCgogICAgQ1cgPC0tPnwiQ2xvdWRhbnQgbXVsdGktbWFzdGVyPGJyLz5yZXBsaWNhdGlvbiAoTGF5ZXIgMjo8YnIvPkR1cmFiaWxpdHkpInwgQ0MKICAgIENDIDwtLT58IHwgQ0UKICAgIENXIDwtLT58IHwgQ0VVCgogICAgc3R5bGUgS0lOIGZpbGw6IzAzNjlhMSxzdHJva2U6IzA3NTk4NSxjb2xvcjojZmZmCiAgICBzdHlsZSBSTVMgZmlsbDojN2MzYWVkLHN0cm9rZTojNmQyOGQ5LGNvbG9yOiNmZmYKICAgIHN0eWxlIENXIGZpbGw6IzE1ODAzZCxzdHJva2U6IzE2NjUzNCxjb2xvcjojZmZmCiAgICBzdHlsZSBDQyBmaWxsOiMxNTgwM2Qsc3Ryb2tlOiMxNjY1MzQsY29sb3I6I2ZmZgogICAgc3R5bGUgQ0UgZmlsbDojMTU4MDNkLHN0cm9rZTojMTY2NTM0LGNvbG9yOiNmZmYKICAgIHN0eWxlIENFVSBmaWxsOiMxNTgwM2Qsc3Ryb2tlOiMxNjY1MzQsY29sb3I6I2ZmZg%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggVEIKICAgIFJNU1siUmF0ZSBNYW5hZ2VtZW50IFN5c3RlbTxici8-KFJldmVudWUgVGVhbSkiXQogICAgS0lOWyJLaW5lc2lzIFN0cmVhbXM8YnIvPihMYXllciAxOiBTcGVlZCkiXQoKICAgIHN1YmdyYXBoICJVUyBXZXN0IgogICAgICAgIFVXWyJIUkUtVXBkYXRlPGJyLz5XZXN0Il0KICAgICAgICBDV1siQ2xvdWRhbnQgV2VzdCJdCiAgICAgICAgUldbIlJlZGlzIFdlc3QiXQogICAgZW5kCgogICAgc3ViZ3JhcGggIlVTIENlbnRyYWwiCiAgICAgICAgVUNbIkhSRS1VcGRhdGU8YnIvPkNlbnRyYWwiXQogICAgICAgIENDWyJDbG91ZGFudCBDZW50cmFsIl0KICAgICAgICBSQ1siUmVkaXMgQ2VudHJhbCJdCiAgICBlbmQKCiAgICBzdWJncmFwaCAiVVMgRWFzdCIKICAgICAgICBVRVsiSFJFLVVwZGF0ZTxici8-RWFzdCJdCiAgICAgICAgQ0VbIkNsb3VkYW50IEVhc3QiXQogICAgICAgIFJFWyJSZWRpcyBFYXN0Il0KICAgIGVuZAoKICAgIHN1YmdyYXBoICJFVSAoSXJlbGFuZCkiCiAgICAgICAgRVVbIkhSRS1VcGRhdGU8YnIvPkVVIl0KICAgICAgICBDRVVbIkNsb3VkYW50IEVVIl0KICAgICAgICBSRVVbIlJlZGlzIEVVIl0KICAgIGVuZAoKICAgIFJNUyAtLT4gS0lOCiAgICBLSU4gLS0-fCJMQVggdXBkYXRlPGJyLz7ihpIgV2VzdCBGSVJTVCJ8IFVXCiAgICBLSU4gLS0-fCJEVUIgdXBkYXRlPGJyLz7ihpIgRVUgRklSU1QifCBFVQogICAgS0lOIC0uLT58InByb3BhZ2F0ZXMifCBVQwogICAgS0lOIC0uLT58InByb3BhZ2F0ZXMifCBVRQoKICAgIFVXIC0tPiBDVwogICAgQ1cgLS0-IFJXCiAgICBVQyAtLT4gQ0MKICAgIENDIC0tPiBSQwogICAgVUUgLS0-IENFCiAgICBDRSAtLT4gUkUKICAgIEVVIC0tPiBDRVUKICAgIENFVSAtLT4gUkVVCgogICAgQ1cgPC0tPnwiQ2xvdWRhbnQgbXVsdGktbWFzdGVyPGJyLz5yZXBsaWNhdGlvbiAoTGF5ZXIgMjo8YnIvPkR1cmFiaWxpdHkpInwgQ0MKICAgIENDIDwtLT58IHwgQ0UKICAgIENXIDwtLT58IHwgQ0VVCgogICAgc3R5bGUgS0lOIGZpbGw6IzAzNjlhMSxzdHJva2U6IzA3NTk4NSxjb2xvcjojZmZmCiAgICBzdHlsZSBSTVMgZmlsbDojN2MzYWVkLHN0cm9rZTojNmQyOGQ5LGNvbG9yOiNmZmYKICAgIHN0eWxlIENXIGZpbGw6IzE1ODAzZCxzdHJva2U6IzE2NjUzNCxjb2xvcjojZmZmCiAgICBzdHlsZSBDQyBmaWxsOiMxNTgwM2Qsc3Ryb2tlOiMxNjY1MzQsY29sb3I6I2ZmZgogICAgc3R5bGUgQ0UgZmlsbDojMTU4MDNkLHN0cm9rZTojMTY2NTM0LGNvbG9yOiNmZmYKICAgIHN0eWxlIENFVSBmaWxsOiMxNTgwM2Qsc3Ryb2tlOiMxNjY1MzQsY29sb3I6I2ZmZg%3D%3D" alt="Geo-Routed Updates — Kinesis + Cloudant Replication" width="1095" height="1108"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A rate update for LAX hit the West Coast cluster first via Kinesis, while Cloudant replication propagated it outward to other regions. A rate update for a Dublin location hit the Ireland cluster first. This meant the region most likely to serve that rate got the update fastest through two independent channels — stream routing for speed, database replication for durability.&lt;/p&gt;

&lt;p&gt;The eventual consistency model meant that a pricing update written to the stream would appear in the read-path cache within sub-second latency under normal conditions. Not immediately — but close enough that the gap was invisible to customers and acceptable to the business. When we framed it that way, the objection to eventual consistency disappeared. The alternative — synchronous writes propagating to every cache layer before acknowledging the update — would have strangled the write path at the throughput we needed.&lt;/p&gt;

&lt;p&gt;The consistency vs availability trade-off was explicit and documented. We chose availability for the read path and eventual consistency for writes. The system would serve slightly stale pricing data for a brief window after a price change rather than block reads while writes propagated. For rate shopping, that's the right call. For a final booking transaction, you validate against fresh pricing data before completing the reservation — different code path, different consistency requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  p95 Under 30ms: The Latency Story
&lt;/h2&gt;

&lt;p&gt;Averages lie. This is not new information, but it bears repeating because teams that optimize to average latency and ignore the tail will discover their mistake during peak traffic.&lt;/p&gt;

&lt;p&gt;At 3,000+ reads per second, a p95 of 30ms means 95% of requests complete in under 30ms. The remaining 5% — 150 requests per second at steady state — are the ones you need to understand. What causes them? Where's the time going? What does that tail distribution look like under load?&lt;/p&gt;

&lt;p&gt;For us, the tail latency was dominated by two things: cache misses that fell through to Cloudant, and connection establishment overhead during burst traffic. Both were solvable.&lt;/p&gt;

&lt;p&gt;Connection pooling eliminated the burst overhead. Instead of establishing new connections to Redis and Cloudant under load, we maintained warm connection pools sized for peak concurrency. The connection establishment latency — which is small in isolation but adds up when you're handling thousands of requests per second — stopped contributing to the tail.&lt;/p&gt;
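
&lt;p&gt;In practice that just means pre-sizing the client pools instead of opening connections under load. A sketch (the pool sizes are illustrative, not the production values):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Warm, pre-sized connection pools for Redis and for the HTTP layer
# in front of Cloudant, so bursts never pay connection-setup cost.
# Pool sizes here are illustrative.
import redis
import requests
from requests.adapters import HTTPAdapter

# Redis: one shared pool sized for peak concurrency.
redis_pool = redis.ConnectionPool(host="redis.internal", port=6379,
                                  max_connections=200)
cache = redis.Redis(connection_pool=redis_pool)

# Cloudant over HTTP: reuse keep-alive connections via a Session.
cloudant = requests.Session()
cloudant.mount("https://", HTTPAdapter(pool_connections=10, pool_maxsize=100))
&lt;/code&gt;&lt;/pre&gt;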

&lt;p&gt;Strategic denormalization did the most work. By storing precomputed rule and eligibility summaries at the cache layer — grouped by that location/account key structure I described earlier — we eliminated the assembly cost at query time. A request for LAX pricing data retrieved a single pre-built document containing all applicable rules and eligibility criteria rather than joining dozens of individual lookups under load. The p95 improvement from this alone was significant.&lt;/p&gt;

&lt;p&gt;Worst-case latency stayed well within 500ms even under extreme load — graceful degradation rather than cascading failure. The system had explicit shed-load behavior: under sustained overload, it would deprioritize less-time-sensitive work rather than blocking the entire request queue. That's the difference between a system that bends and one that breaks.&lt;/p&gt;
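&lt;p&gt;Shed-load behavior amounts to an explicit admission check instead of an unbounded queue. A simplified sketch; the threshold and priority labels here are invented:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;let inFlight = 0;
const MAX_IN_FLIGHT = 2000; // invented here; in practice derived from load testing

async function handle(request, respond) {
  if (inFlight &gt;= MAX_IN_FLIGHT &amp;&amp; request.priority === 'low') {
    // Shed the less time-sensitive work instead of blocking the whole queue.
    return respond(503, { retryAfterMs: 250 });
  }
  inFlight++;
  try {
    return await serve(request, respond);
  } finally {
    inFlight--;
  }
}
&lt;/code&gt;&lt;/pre&gt;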

&lt;h2&gt;
  
  
  The Team Behind It
&lt;/h2&gt;

&lt;p&gt;I need to say this clearly: I didn't do this alone. Not even close.&lt;/p&gt;

&lt;p&gt;The hashing strategy that made the entire cache key grouping work — the shared hash values that let CDC correctly update, delete, and maintain the grouped object lists underneath — that was Jerry (Gerardo Leon). He figured out how to structure the hashes so that related items could be fetched together and updated together consistently. That idea is the foundation the whole read path is built on. IBM's performance infrastructure engineers assisted as we built out the Kubernetes clusters and scaled the test harness to keep pace with the infrastructure it was meant to stress.&lt;/p&gt;

&lt;p&gt;We had two Aarons on the team (yes, two). Taffy was a beast of a software engineer and the metalhead who added SLAYER as a test discount code — it stuck. Aiden set up some of the toughest testing infrastructure I've worked with, building end-to-end performance verification on our event streams that let us prove the system was solid, not just fast. Don Matthews helped clean up the Kinesis streaming layer. Andrew, Dominika, and Layne were in the trenches through the hardest phases. The original build had another Matt, Ravi, Steve, and others who laid the groundwork before we took it to the next level. There are many I'm forgetting who deserve a callout. Feel free to add them in the comments!&lt;/p&gt;

&lt;p&gt;The testing deserves its own callout. We didn't just have unit tests — we had integration tests, end-to-end tests, and performance tests with full reporting at every step. I rewrote our entire e2e suite from JMeter — which was painfully slow, spinning up and tearing down JVMs for every run — into Taurus with BlazeMeter. Optimized, parallelized, maintained in YAML files instead of brittle XML. I did that rewrite on a flight to France because it bothered me that much; tests that take too long to run are tests nobody maintains.&lt;/p&gt;

&lt;p&gt;We had complexity checkers throughout the codebase and refactored often. We built a way to print full decision reports when someone shopped a rate — every rate that went into the calculation, what came out, and why. You don't take a system this complex live across four brands without being able to verify every decision it makes. The old system followed its own logic built over decades, and ours didn't replicate it exactly — it was a new architecture with new patterns. The only way to prove it was correct was to test it at every layer and make the reasoning visible.&lt;/p&gt;

&lt;p&gt;This was one of the best teams I've ever worked with. If I've gotten any of the details wrong here — it's been a while — feel free to call me out in the comments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons for Your Next Performance Overhaul
&lt;/h2&gt;

&lt;p&gt;There are a few generalizable things from this work that I've found useful across every system I've touched since.&lt;/p&gt;

&lt;p&gt;Measure before you optimize. The instinct when a system is slow is to start tuning the code. The actual first step is instrumenting the request path well enough to know where the time is going. At Hertz, the problem wasn't slow code — it was a synchronous architecture making too many round trips to storage. No amount of code optimization would have moved that number by 6,000%.&lt;/p&gt;

&lt;p&gt;Caching solves most read problems. Async solves most write problems. This is the 80/20 of performance work. Before you reach for sharding, horizontal scaling, or re-platforming, understand whether your read path has a caching strategy and whether your write path is blocking on things it doesn't need to block on. Most systems I've seen haven't exhausted either of those levers when they start talking about infrastructure investment.&lt;/p&gt;

&lt;p&gt;Look for frequency asymmetry. Not all data changes at the same rate. Corporate discount rules changed annually. Individual rates changed thousands of times per second. Caching everything with the same TTL wastes either freshness or compute. Match your invalidation strategy to how often the data actually moves.&lt;/p&gt;
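&lt;p&gt;Concretely, that means the invalidation policy is a property of the data class rather than one global TTL. A toy example, not the real configuration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Match TTLs to how often each class of data actually changes.
const TTL_SECONDS = {
  corporateDiscountRules: 24 * 60 * 60, // changes roughly annually; a day of staleness is fine
  locationMetadata: 60 * 60,            // changes occasionally
  liveRates: 0,                         // changes constantly; invalidated by CDC, not by TTL
};

function cacheSet(kind, key, value) {
  const ttl = TTL_SECONDS[kind];
  return ttl &gt; 0 ? redis.set(key, value, 'EX', ttl) : redis.set(key, value);
}
&lt;/code&gt;&lt;/pre&gt;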

&lt;p&gt;Know what consistency level your use case actually requires. Strong consistency is expensive. It's the right choice for financial transactions, inventory commits, and anything where two systems acting on stale data produces a real-world problem. It's not the right choice for read-heavy use cases where the cost of eventual consistency is a user seeing data that's two seconds old. Be explicit about which category your system falls into. Default assumptions here are what cost the Hertz legacy system its throughput ceiling.&lt;/p&gt;

&lt;p&gt;Graceful degradation is an architecture decision, not a fallback. Systems that fail catastrophically under load are systems where nobody made explicit decisions about what happens when limits are hit. The decision to shed load rather than cascade failures was made in the design phase, not after an incident.&lt;/p&gt;

&lt;p&gt;Let the architecture evolve. We didn't start with Kinesis and geo-routed updates. We started with REST calls and a custom queue. Each iteration solved the most pressing bottleneck and revealed the next one. The final architecture was the product of multiple phases — stabilizing the legacy system, building a parallel read path, migrating traffic gradually, and decommissioning the synchronous path once the new one had earned trust under production load. That sequence matters. You don't pull the old system before the new one has proven itself.&lt;/p&gt;

&lt;p&gt;If you're sitting on a system that's hitting its throughput ceiling — legacy rate engines, pricing systems, high-read APIs with inadequate caching — or if you're making an architectural bet right now that could use a second opinion, I do 30-minute architecture calls at cal.com/mdostal/meet. No pitch. Just a real conversation about the problem.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>performance</category>
      <category>caching</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Built a 35-Agent AI Coding Swarm That Runs Overnight</title>
      <dc:creator>Mathew Dostal</dc:creator>
      <pubDate>Fri, 20 Mar 2026 21:18:59 +0000</pubDate>
      <link>https://dev.to/mdostal/i-built-a-35-agent-ai-coding-swarm-that-runs-overnight-440</link>
      <guid>https://dev.to/mdostal/i-built-a-35-agent-ai-coding-swarm-that-runs-overnight-440</guid>
      <description>&lt;p&gt;&lt;em&gt;Follow-up to &lt;a href="https://dev.to/blog/the-week-i-stopped-coding"&gt;The Week I Stopped Coding&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A month ago, I wrote about the moment I stopped coding and started orchestrating AI agents. That post was about the shift — the emotional and philosophical pivot from developer to director.&lt;br&gt;
This post is about what happened after.&lt;br&gt;
I built a system. Two physical machines on my home network, 14 containers, 35 concurrent AI coding sessions, and a 5-layer memory architecture that teaches agents not to repeat each other's mistakes. It scans my project management board every two minutes, picks up tickets, creates isolated git worktrees, spawns Claude Code sessions, writes code, creates pull requests, and updates ticket statuses — all without me touching a keyboard.&lt;br&gt;
It processes 20-40 tickets overnight. I wake up to PRs.&lt;/p&gt;
&lt;h2&gt;
  
  
  I've Seen This Before
&lt;/h2&gt;

&lt;p&gt;In 2015, I was drawing CI/CD pipeline diagrams on whiteboards at Kohl's. Build → Test → Publish → QA → Deploy → Stabilization → Release. The full continuous delivery loop, managed by hundreds of people across multiple teams — developers, QA engineers, DevOps, project managers, release coordinators.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygivvva8guvyla7f4yfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygivvva8guvyla7f4yfn.png" alt="CI/CD lifecycle diagram from Kohl's MobileFirst project, 2015" width="800" height="556"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I look at that diagram now and I see my swarm's ticket lifecycle. The stages are identical. Linear scan is the ticket intake. The Claude Code session is the development cycle. Sub-agents run the build, test, and review gates. &lt;code&gt;gh pr create&lt;/code&gt; is the publish step. Vercel auto-deploys on merge. Prometheus watches the stabilization metrics. The only difference is who's executing each stage.&lt;br&gt;
In 2015, that loop required a branching strategy document that went through 11 revisions, a deployment runbook with IBM coordination steps, Chef cookbooks for node management, and a standing army of engineers. In 2026, it's one Node.js script and a vector database.&lt;br&gt;
I'm not saying the people don't matter — they built the institutional knowledge that's now encoded in CLAUDE.md files and Qdrant collections. But the execution layer has fundamentally changed. The same pipeline I designed for human teams a decade ago now runs autonomously on two machines in my house.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Hardware: Two Machines, One Network
&lt;/h2&gt;

&lt;p&gt;No cloud compute for the swarm itself. Two machines on my home network — a Linux desktop and a Mac Studio I picked up for mobile builds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2F2b3ec34875e37cab85d803ff829395ae52931ac8-900x420.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2F2b3ec34875e37cab85d803ff829395ae52931ac8-900x420.svg" alt="Physical topology — Dragon desktop and Hive Mac Studio connected over LAN" width="900" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dragon&lt;/strong&gt; (desktop) — Arch Linux, AMD ROCm GPU, 14 Podman containers. This is the primary orchestration node. It runs the director, the dashboard, Qdrant for vector search, Ollama for embeddings, and the full Prometheus/Grafana monitoring stack. 15 concurrent session slots.&lt;br&gt;
&lt;strong&gt;Hive&lt;/strong&gt; (Mac Studio) — The only machine with Xcode, iOS simulators, Android emulators, Fastlane, and Maestro for mobile QA. 20 concurrent session slots. Three macOS LaunchAgents handle its director, worker, and git-watcher processes.&lt;br&gt;
They talk over SSH and a shared Qdrant instance. The desktop pushes build tasks to the hive's Qdrant — I use the vector database as a task queue because macOS LaunchAgents get &lt;code&gt;EHOSTUNREACH&lt;/code&gt; when connecting back to the desktop IP. Flipping the direction solved it.&lt;br&gt;
Total capacity: 35 concurrent AI coding sessions across 6 repositories.&lt;/p&gt;
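&lt;p&gt;The flipped direction looks roughly like this: the desktop writes a task point into the hive's Qdrant over the LAN, and the hive worker polls locally for pending work. The collection name, payload shape, and dummy vector are assumptions; the endpoints are Qdrant's standard REST API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const HIVE_QDRANT = 'http://hive.local:6333'; // placeholder hostname

// Desktop side: enqueue a build task as a point with a status payload.
async function enqueueBuildTask(task) {
  await fetch(`${HIVE_QDRANT}/collections/build_tasks/points?wait=true`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      points: [{
        id: task.id,                    // Qdrant point ids must be integers or UUIDs
        vector: [0, 0, 0, 0],           // dummy vector; this queue doesn't need similarity
        payload: { ...task, status: 'pending' },
      }],
    }),
  });
}

// Hive side: poll locally for pending tasks instead of accepting inbound connections.
async function nextPendingTask() {
  const res = await fetch(`${HIVE_QDRANT}/collections/build_tasks/points/scroll`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      filter: { must: [{ key: 'status', match: { value: 'pending' } }] },
      limit: 1,
      with_payload: true,
    }),
  });
  const { result } = await res.json();
  return result.points[0] ?? null;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Claiming a task is then just another payload update on the same point.&lt;/p&gt;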
&lt;h2&gt;
  
  
  The Three Layers
&lt;/h2&gt;

&lt;p&gt;The architecture has three distinct layers, each operating independently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2F887d3807a112b3459cae9377fc252b1c63bf317b-900x620.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2F887d3807a112b3459cae9377fc252b1c63bf317b-900x620.svg" alt="Three-layer architecture diagram" width="900" height="620"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Ops Runner.&lt;/strong&gt; Infrastructure health. Zombie process detection every 15 minutes. TTL sweeps that mark tasks stuck over 30 minutes as failed. Metrics export to Prometheus — 43 metrics every 5 minutes. A React dashboard with 6 tabs that gives me visibility into what the swarm is doing. Without this layer, you don't know if the swarm is working or burning money.&lt;br&gt;
&lt;strong&gt;Layer 3: Directors.&lt;/strong&gt; This is the AI layer. Two directors — Dragon on the desktop handling backend and infrastructure (event-api, game-library, venues, monitoring, ops), Hive on the Mac Studio handling frontend and mobile (shindig, website). Each scans Linear every 2 minutes, routes tickets to the correct repo, creates git worktrees for isolation, and spawns Claude Code sessions.&lt;br&gt;
The director is about 2,400 lines of Node.js. It's the single most important piece of the system — and badly in need of refactoring. Now that I'm wrapping back around to making the ops swarm self-sufficient, it needs to follow its own linting and testing protocols. The irony of an autonomous coding system that doesn't enforce code quality on itself is not lost on me.&lt;/p&gt;
&lt;h2&gt;
  
  
  How a Ticket Becomes a PR
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2Fb0e3b3a5b1817228ecd408b9fd90d92a1ab962a4-900x1400.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2Fb0e3b3a5b1817228ecd408b9fd90d92a1ab962a4-900x1400.svg" alt="Ticket lifecycle flowchart — 8 steps from Linear scan to PR creation with success, failure, and rate limit paths" width="900" height="1400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Linear scan&lt;/strong&gt; (every 120 seconds). The director queries for tickets in Agent Queue, QA Queue, QA Testing, or In Review. It checks attempt history — each ticket gets 3 shots before it's skipped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ticket routing.&lt;/strong&gt; Title prefix &lt;code&gt;[website]&lt;/code&gt; goes to the website repo. No prefix? Keyword matching. The &lt;code&gt;detectRepo()&lt;/code&gt; function is blunt but effective:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;detectRepo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="c1"&gt;// Check explicit repo names first&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;shindig&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;venues&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;event-api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;game-library&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;website&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;monitoring&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// Mobile/app keywords → shindig&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;shindigKeywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;maestro&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;e2e test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;testflight&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;app store&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;play store&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fastlane&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ios&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;android&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mobile&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kotlin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;swift&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;xcode&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gradle&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;composable&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;jetpack&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kmp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;multiplatform&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;simulator&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;emulator&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;firebase auth&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deep link&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;push notif&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;in-app purchase&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;storekit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;billing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aab&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ipa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bundle id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;provisioning&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;signing key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;app group&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;widget&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;crashlytics&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kw&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;shindigKeywords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;shindig&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// Ops/infra keywords → ops&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;opsKeywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;swarm&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ops&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ci/cd&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pipeline&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deploy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;infrastructure&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;podman&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;container&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;github action&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;orchestrat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;director&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;grafana&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;prometheus&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;script&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;automation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;devops&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;memory&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;qdrant&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;terraform&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gcp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cloud run&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fly.io&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vercel&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kw&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;opsKeywords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ops&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// skip rather than routing to wrong repo&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;em&gt;Fair warning: this is quick, dirty, get-it-running code. Not pretty. It works, it's being refactored, and I'm showing it because the pattern matters more than the polish.&lt;/em&gt;&lt;br&gt;
This was the first pass. It's not sophisticated. It doesn't need to be. The title prefix catches 80% of tickets. The keyword fallback catches most of the rest. Unknown tickets get skipped — better to miss a ticket than route it to the wrong repo.&lt;br&gt;
Since then, the routing has gone through a full arc. I added Linear labels, repo-specific tags, and more complex matching rules. It got more sophisticated — and harder to maintain. Now I'm simplifying back down: labels and tooling handle the routing upstream, so the director doesn't need 50 keywords to figure out where a ticket belongs. The lesson is the same one every system learns — the first simple version works, the complex version works harder, and the final version is simple again on purpose.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Worktree creation.&lt;/strong&gt; &lt;code&gt;git worktree add&lt;/code&gt; creates an isolated directory with its own branch: &lt;code&gt;agent/fir-{ticketId}&lt;/code&gt;. Multiple tickets for the same repo can run in parallel without stepping on each other. This is essential — without worktree isolation, concurrent agents create merge conflicts on every commit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session spawn.&lt;/strong&gt; The director launches Claude Code with the ticket description, all comments, the repo's CLAUDE.md rules, and RAG context from prior agent learnings — all injected into the system prompt. Model starts at Sonnet. If it fails twice, it auto-escalates to Opus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution.&lt;/strong&gt; The Claude Code session reads code, plans an approach, and spawns its own sub-agents — a researcher (Haiku), a coder (Sonnet), a tester (Haiku), and a reviewer (Haiku). Each sub-agent gets a fresh context window. No context pollution from the parent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PR creation.&lt;/strong&gt; The session commits, pushes, creates a PR targeting the repo's base branch, and updates the Linear ticket status.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cleanup.&lt;/strong&gt; Worktree removed. Pool slot freed. Next scan picks up more work.
The director also batches up to 4 tickets per repo into a single session, so the agent has broader context and can address related issues together. A typical ticket takes 5-60 minutes depending on complexity. The director fills available pool slots continuously. By morning, I have a stack of PRs to review.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Inside the Director: processTicket()
&lt;/h2&gt;

&lt;p&gt;The heart of the system is &lt;code&gt;processTicket()&lt;/code&gt;. This is the function that takes a ticket from the queue and turns it into a coding session. Here's what happens under the hood — model escalation, worktree creation, rate limit detection, and the failure classification that took me weeks to get right:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processTicket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;worktree&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Model escalation: after N failed attempts, upgrade to stronger model&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;ticketsWorked&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;failedAttempts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;escalationThreshold&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;ESCALATION_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;escalated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;failedAttempts&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;escalationModel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ticketConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;escalated&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;escalationModel&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;escalated&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`MODEL ESCALATION: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; failed &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;failedAttempts&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;x `&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
          &lt;span class="s2"&gt;`on &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; → upgrading to &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;escalationModel&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;acquireLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;worktree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createWorktree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runClaudeSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;worktree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;worktreePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;worktree&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ticketConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="c1"&gt;// Detect rate limit OR auth failure errors&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isRateLimit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;RATE_LIMIT_PATTERNS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isAuthFailure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;AUTH_FAILURE_PATTERNS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isFastFail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;FAST_FAIL_THRESHOLD_MS&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timedOut&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="c1"&gt;// Fast failures (&amp;lt; 10s) are never real work — don't burn attempts&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isFastFail&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rateLimited&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`FAST FAIL: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; died in `&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;s — not counting as attempt`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isRateLimit&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;isAuthFailure&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;rateLimitState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;detected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;rateLimitState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;detected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="c1"&gt;// Parse reset time, default to 4am Central&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resetMatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sr"&gt;/resets&lt;/span&gt;&lt;span class="se"&gt;?\s&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;(\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)\s&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;am|pm&lt;/span&gt;&lt;span class="se"&gt;)\s&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\(([^&lt;/span&gt;&lt;span class="sr"&gt;)&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)\)&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;
          &lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="c1"&gt;// ... calculate sleep duration, auto-pause director&lt;/span&gt;
          &lt;span class="nf"&gt;notifySlack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`:warning: Director hit rate limit. Auto-pausing until reset.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;releaseLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;cleanupWorktree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;worktree&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Same caveat as above — this is working code, not clean code. The director script is 2,400 lines of Node.js that grew organically as failures taught me what it needed to handle. Refactoring it is on the roadmap.&lt;/em&gt;&lt;br&gt;
The escalation logic is simple: attempts 1-2 run on Sonnet. Attempt 3 auto-upgrades to Opus. The rate limit detection is what took the most iteration — the system had to learn the difference between "this ticket is hard" and "the infrastructure is down." That distinction saved me from permanently skipping dozens of viable tickets.&lt;/p&gt;
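&lt;p&gt;For reference, the worktree helpers that &lt;code&gt;processTicket()&lt;/code&gt; calls are thin wrappers over git. A hedged sketch; the paths and the base branch here are my placeholders, not necessarily what the script does:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const { execSync } = require('node:child_process');
const path = require('node:path');

// Create an isolated working copy with its own agent branch, e.g. agent/fir-123.
function createWorktree(repoPath, ticketId) {
  const branch = `agent/fir-${ticketId}`;
  const worktreePath = path.join(repoPath, '..', `wt-${ticketId}`);
  execSync(`git worktree add -b "${branch}" "${worktreePath}" origin/main`, { cwd: repoPath });
  return { repoPath, worktreePath, branch };
}

// Remove the worktree when the session ends, even if it left dirty files behind.
function cleanupWorktree(worktree) {
  if (!worktree) return;
  execSync(`git worktree remove --force "${worktree.worktreePath}"`, { cwd: worktree.repoPath });
}
&lt;/code&gt;&lt;/pre&gt;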

&lt;h2&gt;
  
  
  How Do You Solve AI Agent Memory Loss?
&lt;/h2&gt;

&lt;p&gt;AI agents are amnesiac by default. When I started, a Claude Code session got 200K tokens of context. Now Opus 4.6 has a 1M token window. Bigger helps — but the fundamental problem is the same. The moment the session ends, everything it learned disappears. The next agent assigned a similar ticket starts from zero and makes the same mistakes.&lt;br&gt;
This is the fundamental problem of autonomous AI coding at scale. I solved it with 5 layers of memory, each serving a different persistence scope.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2F8cd988d832f6aa271fede5627a9682e4eabb2f27-900x520.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2F8cd988d832f6aa271fede5627a9682e4eabb2f27-900x520.svg" alt="5-layer memory hierarchy" width="900" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: CLAUDE.md files.&lt;/strong&gt; Checked into git at each repo root. Every agent session in a repo automatically loads these rules — branch patterns, build commands, quality standards, things the agent must never do. This is institutional knowledge encoded as configuration.&lt;br&gt;
&lt;strong&gt;Layer 3: File memory.&lt;/strong&gt; Claude Code's auto-memory system at &lt;code&gt;~/.claude/projects/&lt;/code&gt;. Over 20 topic files covering build pipelines, secret management, deployment procedures, lessons learned. Survives across sessions on the local machine.&lt;br&gt;
&lt;strong&gt;Layer 4: Vector memory.&lt;/strong&gt; Qdrant stores nearly 16,000 knowledge points and all agent outcomes. Before starting work, agents query for relevant prior learnings using semantic search. After completing work, they record what happened — success or failure, with context. This is how the swarm learns. A failed Gradle build gets recorded. The next agent searching "shindig android build" finds it and doesn't repeat the same mistake.&lt;br&gt;
&lt;strong&gt;Layer 5: Linear tickets.&lt;/strong&gt; The human-agent interface. I embed decisions directly in ticket descriptions and comments. Agents read both before acting. This is the only layer where I actively participate in the memory system.&lt;br&gt;
The key insight: agents query Qdrant &lt;em&gt;before&lt;/em&gt; starting work and record outcomes &lt;em&gt;after&lt;/em&gt; finishing. A memory-agent container consolidates learnings every 6 hours. The swarm genuinely learns from its own history.&lt;/p&gt;
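&lt;p&gt;The query-before, record-after loop looks roughly like this. The embedding model, collection name, and payload shape are assumptions; the endpoints are Ollama's embeddings API and Qdrant's REST search and upsert (Node 18+ for global &lt;code&gt;fetch&lt;/code&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const { randomUUID } = require('node:crypto');

const QDRANT = 'http://localhost:6333';
const OLLAMA = 'http://localhost:11434';

// Embed text locally via Ollama (model name is an assumption).
async function embed(text) {
  const res = await fetch(`${OLLAMA}/api/embeddings`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'nomic-embed-text', prompt: text }),
  });
  return (await res.json()).embedding;
}

// Before work: pull the most relevant prior learnings into the session prompt.
async function priorLearnings(ticket, limit = 5) {
  const res = await fetch(`${QDRANT}/collections/agent_memory/points/search`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      vector: await embed(`${ticket.title} ${ticket.description}`),
      limit,
      with_payload: true,
    }),
  });
  return (await res.json()).result.map(hit =&gt; hit.payload);
}

// After work: record what happened so the next agent doesn't repeat it.
async function recordOutcome(ticket, outcome) {
  await fetch(`${QDRANT}/collections/agent_memory/points?wait=true`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      points: [{
        id: randomUUID(),
        vector: await embed(`${ticket.identifier}: ${outcome.summary}`),
        payload: { ticket: ticket.identifier, success: outcome.success, summary: outcome.summary },
      }],
    }),
  });
}
&lt;/code&gt;&lt;/pre&gt;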

&lt;h2&gt;
  
  
  The Failures
&lt;/h2&gt;

&lt;p&gt;This is the part that matters. Anyone can describe an architecture. The real story is in how it broke.&lt;/p&gt;

&lt;h3&gt;
  
  
  124 Pull Requests Overnight
&lt;/h3&gt;

&lt;p&gt;I woke up to 124 open PRs on one repo. About 90 were duplicates — the same tickets processed over and over because the director never checked whether a PR already existed. It would scan Linear, find tickets in Agent Queue, spawn sessions, create PRs — but never move tickets out of Agent Queue. Next scan cycle: same ticket, new branch, new PR.&lt;br&gt;
While I was closing duplicates, the hive director kept creating new ones. 7 more appeared during cleanup.&lt;br&gt;
&lt;strong&gt;The fix:&lt;/strong&gt; Dedup guard. Check for existing PRs before creating new ones. Move tickets to In Progress when work starts, In Review when the PR is created. Deterministic branch naming — no random suffixes.&lt;br&gt;
&lt;strong&gt;The lesson:&lt;/strong&gt; If your agent loop doesn't track what it already did, it will do everything twice.&lt;/p&gt;
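&lt;p&gt;With deterministic branch names, the dedup guard itself is tiny: one &lt;code&gt;gh pr list&lt;/code&gt; call answers whether the work already exists. A sketch, with helper names of my own invention:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const { execSync } = require('node:child_process');

// Deterministic branch name makes the check a lookup, not a search.
function branchFor(ticketId) {
  return `agent/fir-${ticketId}`;
}

function prAlreadyExists(repoPath, ticketId) {
  const out = execSync(
    `gh pr list --head "${branchFor(ticketId)}" --state all --json number`,
    { cwd: repoPath, encoding: 'utf8' }
  );
  return JSON.parse(out).length &gt; 0;
}

// In the scan loop, before spawning anything:
// if (prAlreadyExists(repo, ticket.identifier)) { moveTicketToInReview(ticket); continue; }
&lt;/code&gt;&lt;/pre&gt;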

&lt;h3&gt;
  
  
  Every Session Ran on Opus
&lt;/h3&gt;

&lt;p&gt;I noticed we were burning through the rate limit in hours instead of it lasting the full day. The director script never passed a &lt;code&gt;--model&lt;/code&gt; flag. Claude Code defaults to Opus — the most expensive model. 747+ sessions ran on Opus before I caught it.&lt;br&gt;
One missing CLI flag. Token consumption was 3-5x what it needed to be. A $15/day operation became $65/day.&lt;br&gt;
&lt;strong&gt;The fix:&lt;/strong&gt; Added &lt;code&gt;--model sonnet&lt;/code&gt; as default. Model escalation: attempts 1-2 use Sonnet, attempt 3 auto-escalates to Opus. Immediate 3-5x cost reduction.&lt;br&gt;
&lt;strong&gt;The lesson:&lt;/strong&gt; Defaults matter enormously at scale. A single missing flag multiplied by 1,000 sessions turned a cost-effective operation into a budget-burner.&lt;/p&gt;

&lt;h3&gt;
  
  
  46 out of 46 Sessions Failed in One Cycle
&lt;/h3&gt;

&lt;p&gt;The director launched 46 sessions. Every single one died within 1-20 seconds with "You've hit your limit." The director treated these as normal failures, incremented attempt counters, and after 3 "failures" permanently skipped each ticket.&lt;br&gt;
By morning, 37 tickets were stuck in limbo and dozens were marked as "max attempts exceeded" — even though the failures had nothing to do with the tickets themselves.&lt;br&gt;
&lt;strong&gt;The fix:&lt;/strong&gt; Rate limit pattern matching. Auth failure detection. Fast-fail threshold — sessions dying in under 10 seconds don't count as real attempts. If &amp;gt;80% of a cycle's sessions fail fast, auto-pause the director and sleep until the rate limit resets.&lt;br&gt;
&lt;strong&gt;The lesson:&lt;/strong&gt; Your orchestrator must distinguish between "the task failed" and "the infrastructure failed." Burning ticket attempts on rate limits is like marking a restaurant order as "rejected by customer" because the kitchen caught fire.&lt;/p&gt;
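&lt;p&gt;The cycle-level guard is just a ratio check after each scan. Roughly, with illustrative helper names around the director's existing &lt;code&gt;notifySlack&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// After a scan cycle, decide whether the failures look like infrastructure
// problems rather than hard tickets.
function checkCycleHealth(results) {
  const fastFails = results.filter(r =&gt; !r.success &amp;&amp; r.rateLimited).length;
  if (results.length &gt;= 5 &amp;&amp; fastFails / results.length &gt; 0.8) {
    notifySlack(':warning: Over 80% of sessions fast-failed. Pausing until rate limit reset.');
    pauseUntil(nextRateLimitReset()); // e.g. 4am Central
  }
}
&lt;/code&gt;&lt;/pre&gt;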

&lt;h3&gt;
  
  
  The Agent That Sent Me on a Wild Goose Chase
&lt;/h3&gt;

&lt;p&gt;A Haiku-model agent was assigned an Apple Sign-In bug. After a brief code scan, it confidently told me: "The Firebase Console needs &lt;code&gt;com.firefly.shindig.dev&lt;/code&gt; added as an authorized domain."&lt;br&gt;
I spent 20 minutes checking the Firebase Console. It was already correct. The &lt;code&gt;.dev&lt;/code&gt; bundle ID doesn't even exist — the app uses &lt;code&gt;com.firefly.shindig.ios&lt;/code&gt;.&lt;br&gt;
The real problem was three code bugs the agent never found.&lt;br&gt;
&lt;strong&gt;The fix:&lt;/strong&gt; Verification protocol. Agents can't tell humans to change external config without file-and-line code evidence. Memory files take precedence — if memory says it's configured and working, the agent needs strong evidence before claiming otherwise. Config-level claims get escalated to Opus for verification.&lt;br&gt;
&lt;strong&gt;The lesson:&lt;/strong&gt; An agent that confidently sends you on a wild goose chase is worse than one that says "I don't know." Cheaper models need guardrails that prevent authoritative claims about systems they can't inspect.&lt;/p&gt;

&lt;h3&gt;
  
  
  OAuth in Containers: Three Login Attempts Destroyed
&lt;/h3&gt;

&lt;p&gt;Claude Code authenticates via OAuth. Tokens expire and refresh automatically. I tried three approaches to get auth working in Podman containers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;claude login&lt;/code&gt; inside the container. Works until restart — credentials wiped.&lt;/li&gt;
&lt;li&gt;Base64-encode credentials as an env var. The entrypoint script always decodes on startup, overwriting any token refresh.&lt;/li&gt;
&lt;li&gt;Copy host credentials at build time. Stale by the time the container runs.
&lt;strong&gt;The fix:&lt;/strong&gt; Bind-mount the host credentials file directly into the container:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;${HOME}/.claude/.credentials.json:/home/swarm/.claude/.credentials.json:rw&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Login once on the host, container stays authenticated across restarts, rebuilds, and token rotations. Set the env var to empty string so the entrypoint skips injection.&lt;br&gt;
&lt;strong&gt;The lesson:&lt;/strong&gt; Live auth tokens must be shared, not copied. Any approach that snapshots a token creates a stale copy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Five Zombie Detectors, All Broken
&lt;/h3&gt;

&lt;p&gt;After the 124-PR incident, I discovered agent processes that finished work but never exited. They consumed memory and held file locks, preventing new sessions.&lt;br&gt;
I had five separate zombie detection mechanisms. Not one caught the problem.&lt;br&gt;
The bash script only checked for duplicates, not resource thresholds. The task-level checker only looked at output file freshness. The cleanup script explicitly &lt;em&gt;skipped&lt;/em&gt; daemon processes. The ops runner only checked Qdrant task state. The container-based detector couldn't see outside its own namespace.&lt;br&gt;
&lt;strong&gt;The fix:&lt;/strong&gt; One detection mechanism that actually works end-to-end. Process-level zombie detection in the ops runner. Session timeout as a catch-all.&lt;br&gt;
&lt;strong&gt;The lesson:&lt;/strong&gt; Five bad detectors are not better than one good one. Each was built for a specific variant of the problem but had blind spots. The result was false confidence.&lt;/p&gt;
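&lt;p&gt;The catch-all is deliberately dumb. A minimal sketch of the session-timeout check, assuming agent sessions show up as a &lt;code&gt;claude&lt;/code&gt; process (the process name and the two-hour limit are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Zombie catch-all (sketch): kill any agent process older than the hard session timeout.
MAX_SECONDS=7200

# etimes = elapsed seconds since the process started
ps -eo pid=,etimes=,comm= | awk '$3 == "claude"' | while read -r pid elapsed _; do
  if [ "$elapsed" -gt "$MAX_SECONDS" ]; then
    echo "zombie: pid=$pid alive for ${elapsed}s, killing"
    kill -TERM "$pid"
  fi
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;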

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Over 7 weeks of operation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2F66cb96c0b44fd7b85dbd9d0602ed7cc65ebfc6f5-900x420.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2F66cb96c0b44fd7b85dbd9d0602ed7cc65ebfc6f5-900x420.svg" alt="Daily agent runs over 7 weeks" width="900" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total agent runs&lt;/td&gt;
&lt;td&gt;6,500+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repos covered&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Peak day (March 5)&lt;/td&gt;
&lt;td&gt;1,047 runs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qdrant knowledge points&lt;/td&gt;
&lt;td&gt;nearly 16,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus metrics&lt;/td&gt;
&lt;td&gt;43, exported every 5 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrent session capacity&lt;/td&gt;
&lt;td&gt;35 (15 dragon + 20 hive)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The run counts are real — every Claude Code session triggers a lifecycle hook that logs the run to a daily JSON file. Those numbers come straight from the system.&lt;br&gt;
The cost numbers I'm deliberately leaving out. The cost tracker I built was a placeholder designed for direct API usage — it assumes fixed token counts per model and multiplies by Anthropic's published per-token pricing. That gives you a number, but not a real one. A Sonnet session that refactors 40 files burns a lot more tokens than one that fixes a typo, and my tracker treated them identically.&lt;br&gt;
I'm building the version that actually works — one that reads the local Claude Code session files where real token consumption is recorded. Those files exist on both machines. The data is there. I just haven't piped it into the dashboard yet.&lt;br&gt;
What I can tell you is the operational picture: model routing matters. Starting every session on Sonnet and only escalating to Opus after two failures means the cheap model handles the majority of tickets. Sub-agents running on Haiku handle research, testing, and review at a fraction of the cost of having the main session do everything. The exact dollar amount is less interesting than the pattern — route to the cheapest model that can do the job, escalate only on failure.&lt;br&gt;
All of this runs on Claude Code MAX, a flat-rate subscription. The rate limit resets at 4am Central. The director knows this and sleeps instead of churning.&lt;/p&gt;
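&lt;p&gt;The routing logic itself is simpler than it sounds. A minimal sketch of the escalate-on-failure loop, assuming the orchestrator shells out to the Claude Code CLI in print mode; the prompt and the pass/fail check are heavily simplified here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Escalation sketch: two attempts on the cheap model, then one on the expensive one.
# The real director does this in Node.js with retry context; this is just the shape of it.
PROMPT="Fix ticket SHIN-123 per the attached description"   # illustrative ticket

for model in sonnet sonnet opus; do
  if claude -p "$PROMPT" --model "$model"; then
    echo "completed on $model"
    break
  fi
  echo "attempt on $model failed, escalating"
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;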

&lt;h2&gt;
  
  
  What Would I Change About My AI Coding Swarm?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with dedup guards.&lt;/strong&gt; The 124-PR incident and the 36-PR overnight should have been prevented by Day 1 architecture. Any autonomous loop needs to track its own history before it gets turned loose.&lt;br&gt;
&lt;strong&gt;Log everything from the start.&lt;/strong&gt; The event bus saved me multiple times during debugging. I wish I'd had it from Day 1 instead of adding it after the first disaster.&lt;br&gt;
&lt;strong&gt;Test cleanup code like feature code.&lt;/strong&gt; The broken worktree cleanup and the five zombie detectors both failed because cleanup code got less testing rigor than feature code. It needs more.&lt;br&gt;
&lt;strong&gt;Auth health checks before session launch.&lt;/strong&gt; The 75-second sessions and the 46-session cascade both came from auth failures that the director didn't detect. A single pre-flight auth check would have saved hours.&lt;/p&gt;
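&lt;p&gt;That pre-flight check can be a few lines. A minimal sketch, assuming the director can shell out to the Claude Code CLI before claiming any tickets; the probe prompt is arbitrary, only the exit status matters:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Pre-flight auth probe (sketch): run one trivial prompt before launching real sessions.
# If this fails, skip the scan cycle instead of spawning sessions that die in 75 seconds.
if ! claude -p "Reply with the word ok." --model sonnet; then
  echo "auth pre-flight failed, skipping this cycle"
  exit 1
fi
echo "auth ok, launching sessions"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;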

&lt;h2&gt;
  
  
  What's Next: From 35 Agents to an Extensible Swarm
&lt;/h2&gt;

&lt;p&gt;35 concurrent sessions across two machines is where I am today. It's not where this is going.&lt;br&gt;
The architecture was built to be extensible from the start — the director doesn't care what machine it runs on, only that it can reach Linear, GitHub, and a Qdrant instance. Adding a third machine means adding another director with its own pool of session slots. The Qdrant task queue already handles cross-machine dispatch. The memory layer is shared. A new node joins the swarm by pointing at the same vector database and ticket board.&lt;br&gt;
The next step is making that real. Right now, spinning up a new director node requires manual configuration — env vars, repo paths, auth setup. I'm building a setup script that walks through first-run configuration and a &lt;code&gt;repos.yaml&lt;/code&gt; config file that replaces the hardcoded repo mappings. Project management adapters so it works with Linear, GitHub Issues, or eventually Jira. The goal is: clone the repo, run &lt;code&gt;setup.sh&lt;/code&gt;, point it at your ticket board and repos, and you have a director.&lt;br&gt;
But the bigger idea is the hive mind.&lt;br&gt;
Right now, each director operates independently. Dragon and Hive share a Qdrant instance for task dispatch and learnings, but they don't coordinate. If Dragon is rate-limited, Hive doesn't pick up the slack. If both directors claim the same ticket in the same scan cycle, the dedup guard catches it at PR creation time — which works, but it's reactive.&lt;br&gt;
The next evolution is a coordination layer. A lightweight orchestrator that sits above the directors and manages the global pool: which tickets are claimed, which machines have capacity, where rate limits are hit, and how to redistribute work when a node goes down. Think of it as the difference between two independent teams that happen to check the same task board, versus a project manager who assigns work to teams based on who's available.&lt;br&gt;
The memory architecture scales with this. Qdrant already supports multiple collections and namespaces. When a third machine joins, its agents query the same knowledge base. The learnings from Dragon's failed Gradle build are immediately available to a new node's agents. The swarm gets smarter as it gets bigger — every node contributes to the shared memory, and every node benefits from it.&lt;br&gt;
There's also a problem nobody talks about: pipeline starvation. The swarm was idle last night — not because anything was broken, but because Agent Queue was empty for 5 of 6 repos. The director was healthy, auth was working, but there was nothing to do. Autonomous execution means nothing without autonomous work generation. That's the next frontier — a PM swarm that identifies technical debt, writes tickets, and feeds the coding swarm. Execution without intake is just an expensive idle loop.&lt;br&gt;
I'm also closing the QA gap. Right now, agents write code and create PRs, but verification is still manual. Maestro flows for mobile, Playwright for web, triggered by the director after PR creation. The goal: ticket to verified PR with zero human intervention.&lt;br&gt;
And the cost tracking needs to be real. The current tracker was a placeholder built for API billing — fixed token estimates multiplied by published pricing. I'm building the version that reads actual token consumption from Claude Code's local session files. The data is already there on both machines. It just needs to be piped into the dashboard so I can see what this actually costs at the per-session level.&lt;br&gt;
I'm planning to open-source the director. The 2,400-line Node.js file is specific to my setup, but the pattern is general. The memory architecture is the piece I think has the most value for other teams — the 5-layer approach solves a real problem that every AI coding setup hits eventually. Agents forget. This system remembers. And when the extensibility layer is done, anyone with a spare machine and a Claude Code subscription can add a node to their swarm.&lt;br&gt;
If you're building something similar — or thinking about it — I'd genuinely like to compare notes. This is new territory for everyone. The tooling is primitive, the failure modes are novel, and the patterns are still emerging. I'm figuring it out as I go, one 124-PR disaster at a time.&lt;br&gt;
&lt;a href="https://cal.com/mdostal/meet" rel="noopener noreferrer"&gt;Book a conversation&lt;/a&gt;. No pitch, no pressure — just two people trying to figure out how to make AI agents stop creating duplicate pull requests.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>swarm</category>
      <category>devops</category>
    </item>
    <item>
      <title>Right-Sizing Your DevOps Stack</title>
      <dc:creator>Mathew Dostal</dc:creator>
      <pubDate>Fri, 06 Mar 2026 21:20:17 +0000</pubDate>
      <link>https://dev.to/mdostal/right-sizing-your-devops-stack-1f8e</link>
      <guid>https://dev.to/mdostal/right-sizing-your-devops-stack-1f8e</guid>
      <description>&lt;p&gt;Most DevOps problems aren't DevOps problems. They're people problems — someone got excited about Kubernetes before they had a second service, or someone hand-configured a production server and left on vacation.&lt;br&gt;
I've watched teams spend weeks building Helm charts with custom ingress rules, horizontal pod autoscaling, and dedicated namespaces — to serve a static SPA. I've watched corporations burn entire sprints getting minikube to behave like a production cluster so they could "test locally" before deploying to a managed service that would have handled everything for them.&lt;br&gt;
The tools exist. The cloud has matured. You don't need half of what you think you need. Here are the DevOps best practices I actually follow — the ones that keep my CI/CD pipelines fast, cheap, and out of my way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bind Your Code to a Cloud Provider and Walk Away
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fpf7i5ozirx5x9mxdbl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fpf7i5ozirx5x9mxdbl.jpg" alt="A glowing fiber-optic cable representing a direct, simple deployment path from developer to server." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://vercel.com" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt;, &lt;a href="https://cloud.google.com/run" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;, &lt;a href="https://aws.amazon.com/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;, &lt;a href="https://fly.io" rel="noopener noreferrer"&gt;Fly.io&lt;/a&gt; — pick one that fits your stack and connect it to your repository. Install the GitHub app. Point it at a branch. Push code, it deploys. That's continuous deployment without the ceremony.&lt;br&gt;
That's the whole setup for most projects. No build server. No artifact registry. No custom Docker images unless you actually need them. The provider handles builds, previews, rollbacks, and SSL.&lt;br&gt;
If you need full-stack deployment — databases, background workers, and cron jobs on one platform — &lt;a href="https://render.com" rel="noopener noreferrer"&gt;Render&lt;/a&gt;, &lt;a href="https://sevalla.com" rel="noopener noreferrer"&gt;Sevalla&lt;/a&gt;, or &lt;a href="https://northflank.com" rel="noopener noreferrer"&gt;Northflank&lt;/a&gt; handle that without forcing you into a separate managed database tier. &lt;a href="https://railway.com" rel="noopener noreferrer"&gt;Railway&lt;/a&gt; fits here too if you're comfortable wiring up workers and cron yourself — it doesn't have dedicated support for those out of the box, but it's fast and cheap for everything else.&lt;br&gt;
When I spin up a new project, this is step one. Not "how do I set up a CI/CD pipeline" — it's "which provider auto-deploys from my repo?" If the answer is Vercel or Cloud Run, I'm deployed before lunch.&lt;br&gt;
Scale comes later. Cost optimization comes later. Right now, you're shipping.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use GitHub Actions for Everything Else
&lt;/h2&gt;

&lt;p&gt;I'll be upfront: I don't love &lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;. The YAML gets unwieldy, debugging is painful, and the runners are slow. If you have the budget, &lt;a href="https://blacksmith.sh" rel="noopener noreferrer"&gt;Blacksmith&lt;/a&gt; is a drop-in replacement that runs your same workflows on bare-metal gaming CPUs — one line change in your YAML (&lt;code&gt;runs-on: blacksmith&lt;/code&gt;) and builds run 2-4x faster. Or skip the middleman entirely and wire up your own webhook-triggered build server, similar to how providers like Vercel tie into your repo. But for most teams at this stage? GitHub Actions is simple, you get a generous free tier, and it covers the next step. Don't overthink it.&lt;br&gt;
Your &lt;code&gt;package.json&lt;/code&gt; scripts, Gradle tasks, Makefiles, &lt;a href="https://vite.dev" rel="noopener noreferrer"&gt;Vite&lt;/a&gt; configs — these are your real build layer. GitHub Actions just triggers them. If your build works locally with &lt;code&gt;pnpm test &amp;amp;&amp;amp; pnpm build&lt;/code&gt;, it generally works the same way in Actions.&lt;br&gt;
Once your auto-deploy provider handles the basics, GitHub Actions fills the gaps. Run your tests on push. Lint on PR. Build validation before merge. That's 90% of what teams actually need from a CI/CD pipeline. If you're already on GitLab, their &lt;a href="https://docs.gitlab.com/ee/ci/" rel="noopener noreferrer"&gt;built-in CI&lt;/a&gt; does the same job.&lt;br&gt;
The mistake I see is teams building elaborate multi-stage pipelines with approval gates and artifact caching before they have more than one deploy target. It's never the solo founder doing this — it's the single dev team inside a larger org, or the product group with a few engineers who should have just grunted, Gradle'd, or Gulp'd their way through a simple pipeline. You're not Netflix. Run your tests, deploy your code, move on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Turn Your Terminal History into Shell Scripts
&lt;/h2&gt;

&lt;p&gt;Every DevOps pipeline starts the same way: someone typed a series of commands into a terminal that worked. Then they forgot them.&lt;br&gt;
Run &lt;code&gt;history&lt;/code&gt;. Look at the last 50 commands you ran to set up that server, configure that service, or deploy that release. Clean them up. That's your shell script.&lt;br&gt;
I write shell scripts that call other shell scripts. Setup scripts that install dependencies, configuration scripts that set env vars, deployment scripts that pull and restart. None of this is glamorous. All of it is repeatable. I've replaced entire "deployment runbooks" with a single &lt;code&gt;./deploy.sh&lt;/code&gt; that took an afternoon to write and saved the team hours every week.&lt;br&gt;
The difference between "I can deploy this" and "anyone on the team can deploy this" is a shell script with comments. Write it down. Automate the human element — because humans forget, skip steps, and get distracted. Your scripts don't.&lt;/p&gt;
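&lt;p&gt;A script like that doesn't need to be clever to earn its keep. A minimal sketch of the pattern, with placeholder host, path, and service names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# deploy.sh (sketch): the cleaned-up version of commands you already ran by hand.
# Host, path, and service names are placeholders.
set -euo pipefail

HOST="deploy@app-server"
APP_DIR="/srv/myapp"

# 1. Pull the latest code on the server
ssh "$HOST" "cd $APP_DIR; git pull --ff-only"

# 2. Install dependencies and build
ssh "$HOST" "cd $APP_DIR; pnpm install --frozen-lockfile; pnpm build"

# 3. Restart the service and confirm it came back
ssh "$HOST" "sudo systemctl restart myapp"
ssh "$HOST" "systemctl is-active myapp"
echo "deployed"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;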

&lt;h2&gt;
  
  
  Let Third-Party Providers Handle Scaling
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kmgjl3jwqunywfozycf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kmgjl3jwqunywfozycf.jpg" alt="Modular architectural blocks assembling in the sky to illustrate cloud-native automation and scalability." width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You don't need to manage your own load balancer. You don't need to run your own Postgres cluster. You definitely don't need a self-hosted Redis instance on a bare EC2 box because someone "wanted full control."&lt;br&gt;
Full control means your engineer is watching YouTube tutorials at midnight when it goes down on a Saturday.&lt;br&gt;
Vercel handles edge caching and auto-scaling for most workloads. Fly.io makes multi-region straightforward. Cloud Run scales to zero when nobody's using it (with cold-start trade-offs for latency-sensitive apps). &lt;a href="https://www.mongodb.com/atlas" rel="noopener noreferrer"&gt;MongoDB Atlas&lt;/a&gt; manages your database. For serverless Postgres, &lt;a href="https://neon.com" rel="noopener noreferrer"&gt;Neon&lt;/a&gt; or &lt;a href="https://supabase.com" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; give you a managed instance that scales to zero. For Redis, &lt;a href="https://upstash.com" rel="noopener noreferrer"&gt;Upstash&lt;/a&gt; gives you serverless Redis without the 2 AM EC2 incident.&lt;br&gt;
Use managed services until the bill is the problem. When the bill becomes the problem, that's a good problem — it means you have traffic. Optimize then, not before. The current generation of managed services makes this absurdly cheap at startup scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lock Down Credentials Before You Have an Incident
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cso8c2p5c7kp5chaoun.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cso8c2p5c7kp5chaoun.jpg" alt="An intricate digital fortress schematic showing the necessary complexity of security and access control." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I joined a team that passed SSH keys around in Slack for their dev jump box — the same tunnel that had been open for years. Employees came and went, and with nothing gating access beyond an IP check and an SSH cert, the database was effectively open to every former employee, to anyone who ended up with one of their old laptops, and to anyone else the keys had leaked to. Worse, clients had integrations bound to that connection, so rolling the keys meant re-attaching everything that touched the database. And the day-to-day users had been going through a root-access SSH box as the root DB user for years. It took far too long, and far too many interventions, just to start fixing it.&lt;br&gt;
That's the cost of not thinking about credentials early. Use a secrets manager — &lt;a href="https://aws.amazon.com/secrets-manager/" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt;, &lt;a href="https://cloud.google.com/security/products/secret-manager" rel="noopener noreferrer"&gt;GCP Secret Manager&lt;/a&gt;, &lt;a href="https://www.vaultproject.io/" rel="noopener noreferrer"&gt;Vault&lt;/a&gt;, whatever your cloud provides natively. &lt;a href="https://cloud.google.com/iam/docs/workload-identity-federation" rel="noopener noreferrer"&gt;Workload Identity Federation&lt;/a&gt; (WIF on GCP, OIDC roles on AWS) eliminates long-lived service account keys entirely. Your workload proves its identity to the cloud provider directly. No keys in environment variables. No credentials committed to repos. No shared Google Doc with passwords.&lt;br&gt;
I roll all keys from one location using bash scripts with RBAC controls. One script, one source of truth, one audit trail. If a key leaks, I know exactly where it was used and can rotate it in minutes, not hours.&lt;br&gt;
This is the hack that prevents the 2 AM incident. Everything else on this list makes you faster. This one keeps you employed.&lt;/p&gt;
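&lt;p&gt;The rotation script itself is boring, which is the point. A minimal sketch against AWS Secrets Manager as the example backend; the secret name and the key-generation step are placeholders, and the same shape works with GCP Secret Manager or Vault:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# rotate-key.sh (sketch): one script, one source of truth, one audit trail.
set -euo pipefail

SECRET_ID="prod/payments/api-key"   # placeholder

# 1. Generate the replacement credential (often a provider API call instead)
NEW_KEY=$(openssl rand -hex 32)

# 2. Store the new version; Secrets Manager keeps prior versions for rollback
aws secretsmanager put-secret-value --secret-id "$SECRET_ID" --secret-string "$NEW_KEY"

# 3. Record who rotated what, and when
echo "$(date -u +%FT%TZ) rotated $SECRET_ID by $(whoami)" | tee -a rotation.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;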

&lt;h2&gt;
  
  
  Infrastructure as Code — When You Actually Need It
&lt;/h2&gt;

&lt;p&gt;I've watched a senior engineer spend two weeks writing Terraform modules for a Next.js app running on a single EC2 instance. The app had one environment and no reason it couldn't have been deployed to Vercel or Netlify — where builds, previews, rollbacks, and SSL come free. Instead, the Terraform added nothing except a state file someone would eventually forget to lock.&lt;br&gt;
IaC earns its keep when you have multiple services that need to talk to each other, when you're managing cloud resources that drift if left unattended, or when compliance requires an audit trail of infrastructure changes. If your deployment is "push to main and Vercel handles it," you don't need Terraform yet.&lt;br&gt;
When you do need it, version control your infrastructure definitions alongside your application code. Submit PRs for infrastructure changes — this is what the industry calls GitOps, and it works. The same review discipline that catches bugs in your API will catch the IAM policy that grants too much access. If you want to write infrastructure in TypeScript or Python instead of HCL, &lt;a href="https://www.pulumi.com" rel="noopener noreferrer"&gt;Pulumi&lt;/a&gt; is the mature alternative. If &lt;a href="https://developer.hashicorp.com/terraform" rel="noopener noreferrer"&gt;Terraform's&lt;/a&gt; BSL licensing concerns you, &lt;a href="https://opentofu.org" rel="noopener noreferrer"&gt;OpenTofu&lt;/a&gt; is the community-maintained fork.&lt;br&gt;
The goal is reducing binding between services. IaC describes relationships and dependencies in code instead of in someone's head. When that person takes another job — and they will — the infrastructure is still documented.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local Clusters for Testing (But Know Their Limits)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m6uuqqt92bwjudw7z6m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m6uuqqt92bwjudw7z6m.jpg" alt="A software engineer analyzing system logs and monitoring metrics late at night to solve critical bugs." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://minikube.sigs.k8s.io/" rel="noopener noreferrer"&gt;Minikube&lt;/a&gt;, &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;kind&lt;/a&gt;, or &lt;a href="https://rancherdesktop.io/" rel="noopener noreferrer"&gt;Rancher Desktop&lt;/a&gt; are useful for one thing: making sure your containers actually start and behave correctly before you push them somewhere expensive.&lt;br&gt;
They are not useful for performance testing. A local cluster on a developer laptop doesn't tell you anything about how your service handles 10,000 concurrent connections. Performance tests need scaled environments that match production, or they're just giving you false confidence.&lt;br&gt;
Use local clusters to validate configuration: environment variables load correctly, services discover each other, health checks pass. Then deploy to a real environment for anything load-related. If you attempt load testing on the same machine running the service, you're testing your laptop's hardware, not your application's scalability. Your tests have to scale with the environment, and the load source has to be separate from the target — otherwise you're measuring your own ceiling.&lt;br&gt;
I've seen teams spend weeks tuning local Kubernetes configurations that had zero relevance to their actual production setup on EKS. The local cluster was a comfort blanket, not a testing tool.&lt;/p&gt;
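&lt;p&gt;Used that way, the local loop stays short. A minimal sketch with kind, assuming your manifests live in &lt;code&gt;k8s/&lt;/code&gt; and define a deployment called &lt;code&gt;myapp&lt;/code&gt; (all names are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Local validation only (sketch): does the container start, wire up, and pass its probes?
# This is not a performance test. Cluster, image, and manifest names are placeholders.
set -euo pipefail

kind create cluster --name smoke-test

# Build the image locally and load it into the cluster without a registry
docker build -t myapp:dev .
kind load docker-image myapp:dev --name smoke-test

# Apply manifests and wait for the rollout; this catches bad env vars and broken health checks
kubectl apply -f k8s/
kubectl rollout status deployment/myapp --timeout=120s

kind delete cluster --name smoke-test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;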

&lt;h2&gt;
  
  
  When the Complexity Is the Point
&lt;/h2&gt;

&lt;p&gt;Everything above is about cutting complexity you don't need yet. But sometimes the complexity isn't optional — it's the job.&lt;br&gt;
If you're on a government contract and you need to meet Section 508, GDPR, SOC 2, and PCI simultaneously — you're not over-engineering your pipelines. You're meeting the requirements that let you keep the contract. That level of compliance demands real DevOps infrastructure: automated audit trails, enforced access controls, reproducible builds, and deployment gates that prove you did what you said you did. There's no SCP-and-cron shortcut to PCI compliance.&lt;br&gt;
Same applies when the integrations have genuinely scaled. A product with 30 third-party integrations, four environments, and a deployment pipeline that touches three cloud providers isn't over-engineered — it's managing real complexity. The mistake isn't having sophisticated tooling at that point. The mistake is having sophisticated tooling when you have two developers and a single Vercel deployment. If this sounds like your situation, &lt;a href="https://dev.to/fractional-cto"&gt;I've helped companies at both ends of that spectrum&lt;/a&gt;.&lt;br&gt;
And then there's mobile. Mobile DevOps is famously under-done — even at companies that should know better. The signing credential dance with Apple alone is an entire discipline: provisioning profiles, distribution certificates, entitlements, and an App Store review process that treats your CI/CD pipeline as an afterthought. Most mobile releases still ship from someone's laptop because the alternative — properly automating the build-sign-upload chain — requires fighting Xcode's tooling at every step. Tools like &lt;a href="https://fastlane.tools" rel="noopener noreferrer"&gt;Fastlane&lt;/a&gt; and &lt;a href="https://bitrise.io" rel="noopener noreferrer"&gt;Bitrise&lt;/a&gt; have made it better, but "better" still means fragile. Android is more forgiving with Gradle and signing configs, but Play Store deployment automation has its own sharp edges. The whole space deserves more attention than it gets, and I'm planning a deeper write-up on mobile DevOps — including what it looked like before IBM tried to make "Mobile First" a platform with CI/CD baked in, and why most of those lessons still apply.&lt;br&gt;
Then there's the closet problem. You know the one. Over three years, someone spun up service accounts for every integration. SSH keys got passed over Slack. API tokens live in a GitHub wiki that five former employees still have access to. The IAM policy looks like it was written by committee — because it was. Nobody knows which service accounts are still active, and nobody wants to find out by turning one off.&lt;br&gt;
Ideally, you wipe them and re-roll every key. In practice, that's a project — sometimes a long one. You're untangling dependencies between accounts, services, and secrets that were never documented because "we'll clean this up later" is the most consistently broken promise in engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  RBAC and Documentation: The Two Things That Actually Scale
&lt;/h2&gt;

&lt;p&gt;When you're growing a team, two investments pay for themselves immediately: RBAC and documentation.&lt;br&gt;
Role-Based Access Control isn't glamorous. Nobody writes a blog post about how excited they are to set up IAM policies. But the alternative — everyone has admin access because it's easier — is a ticking clock. It works until the first time someone accidentally deletes a production database, or until your SOC 2 auditor asks who has write access to your payment service and the honest answer is "everyone."&lt;br&gt;
You already know about WIF from earlier in this post — the same pattern applies here. Enforce it with proper role boundaries and you've closed the biggest gap in most teams' access model.&lt;br&gt;
I'll be honest about something: I use a swarm of AI agents to set up RBAC with WIF and roll keys across environments. Is passing instructions through LLM context windows the most secure thing I've ever done? No. The messages could theoretically be leaked. But here's the trade-off math — that risk is meaningfully lower than the alternative I've walked into at a dozen companies: SSH keys shared in Slack DMs, API tokens in public GitHub repos, service account credentials in a shared Google Doc titled "DO NOT SHARE." The bar isn't perfection. The bar is better than what you're doing now.&lt;br&gt;
The other thing that actually scales is documentation. Not the 200-page runbook nobody reads. The kind where every pipeline has a README that answers three questions: what does this do, what credentials does it need, and what happens when it breaks. When someone new joins the team and can deploy to staging on day two because the docs told them how — that's the ROI. When your compliance auditor asks how deployments work and you can point them to a living document instead of scheduling a meeting — that's the ROI.&lt;br&gt;
RBAC keeps you from getting in trouble. Documentation keeps everyone on the same page. Neither is exciting. Both are the difference between a team that scales and a team that breaks at 15 people.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Biggest Hack
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5votqva07km5q1wscac.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5votqva07km5q1wscac.jpg" alt="A lighthouse beam cutting through digital noise, symbolizing clear monitoring signals and incident response." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everything above shares a common thread — reduce what a human has to remember, decide, or manually execute.&lt;br&gt;
The biggest hack isn't a tool. It's good documentation and focusing your automation on the thing that costs the most when it fails — or eats the most engineer-hours when it doesn't. If something requires three manual work days just to move things around, document it fully and then automate it piece by piece.&lt;br&gt;
Every manual step in your deployment is a step where someone can make a mistake, skip a check, or do something slightly different from last time. Automate the human element. Add monitoring and observability so you know when something breaks before your users tell you — even a simple health check endpoint and an uptime ping goes a long way.&lt;br&gt;
Simplify first. Keep it secure. Scale as your project scales — not before.&lt;/p&gt;
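&lt;p&gt;Even the monitoring can start as a script. A minimal sketch of an uptime ping against a health endpoint; the URL and the alert webhook are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# uptime-check.sh (sketch): run from cron every minute. URL and webhook are placeholders.
URL="https://app.example.com/healthz"

# --fail makes curl exit non-zero on HTTP errors; --max-time bounds a hung service
if ! curl --silent --fail --max-time 10 --output /dev/null "$URL"; then
  curl --silent -X POST -d "health check failed: $URL" "https://hooks.example.com/alert"
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;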




&lt;p&gt;If you're watching your team drown in infrastructure they didn't need to build yet, this is exactly the kind of decision I help companies get right. Right-sizing your DevOps for your actual stage is one of the highest-leverage calls a technical leader can make. &lt;a href="https://dev.to/fractional-cto"&gt;See how I work with startups and scaling companies.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>cloud</category>
      <category>architecture</category>
      <category>startup</category>
    </item>
    <item>
      <title>What Does a CTO Actually Do?</title>
      <dc:creator>Mathew Dostal</dc:creator>
      <pubDate>Fri, 06 Mar 2026 21:20:04 +0000</pubDate>
      <link>https://dev.to/mdostal/what-does-a-cto-actually-do-imh</link>
      <guid>https://dev.to/mdostal/what-does-a-cto-actually-do-imh</guid>
      <description>&lt;h1&gt;
  
  
  What Does a CTO Actually Do?
&lt;/h1&gt;

&lt;p&gt;I’ve made the Postgres versus MongoDB call more times than I can count. At Firefly Events, we chose MongoDB deliberately for its timeseries data capabilities, native geo-sharding, and the eventual consistency model we’d successfully applied in earlier work at Hertz for localized geo records. A document store gave us the schema flexibility we needed to ship MVPs and PoCs fast, with clear levers like sharding and tunable consistency we could pull later as we scaled. Postgres with TimescaleDB could have handled parts of it, but MongoDB gave us startup flexibility without locking us into premature schema decisions.&lt;br&gt;
I’ve also been in rooms at large companies where directors debated the same question for entirely different reasons: ease of local-first development, sync between distributed systems, replication strategies, or simple team familiarity. The right answer changed every time, but it was never actually a hard call. You gather context, you know the tradeoffs, you decide. Any senior engineer can give you a solid answer for $150 an hour.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The database question, the framework question, the cloud provider question—these are commodity decisions. What a CTO is actually there for is something entirely different.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And if your CTO is spending most of their time on that, you hired them wrong, or they haven’t figured out their real job yet.&lt;br&gt;
Let me tell you what the job actually is.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Translation Layer
&lt;/h2&gt;

&lt;p&gt;The single most important thing a CTO does is translate.&lt;br&gt;
On one side: the business. Revenue, burn rate, roadmap, competitive positioning, investor expectations. Business people speak in outcomes. “We need to launch by Q3.” “We need to cut costs by 20%.” “The competitor just shipped that feature; how long to match it?”&lt;br&gt;
On the other side: the engineering team. Trade-offs, technical debt, dependencies, architecture constraints. Engineers speak in constraints and possibilities. “We can ship that fast, but we’ll accrue significant debt.” “The current architecture doesn’t support that without a rewrite.” “If we rush this, we’re going to be fixing it for the next two years.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvqw28utwvkdktll4wqr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvqw28utwvkdktll4wqr.jpg" alt="Business and engineering teams speaking different languages" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These two groups can sit in the same room and completely fail to understand each other. I’ve watched it happen at EY, through client engagements, and at early-stage startups. Smart, well-intentioned people on both sides, genuinely trying, and nobody’s getting through.&lt;br&gt;
The CTO’s job is to bridge that gap, in both directions.&lt;br&gt;
Translating from engineering to business means turning “we have significant technical debt in the payments module” into “we have a time bomb that will cost us a full sprint to defuse, and here’s why now is cheaper than later.” It means explaining that taking three months isn’t a sign of incompetence; it’s the reality of maintaining a production system that thousands of customers depend on.&lt;br&gt;
Translating from business to engineering means protecting the team from the chaos and urgency that naturally flows from a growth-stage company. When the CEO decides at noon on Friday that we need a new feature by Monday, that’s a leadership moment. You either take the heat for saying no, or you watch your engineers burn out chasing an impossible deadline. One of those choices is the CTO’s job. The other is abdication.&lt;br&gt;
Good translation protects engineers from the business. It also protects the business from jargon, over-engineering, and engineers who build beautiful systems that solve the wrong problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Strategic Layer: Decisions, Data, and Dollars
&lt;/h2&gt;

&lt;p&gt;A CTO makes a lot of decisions. But treating them all as equal is a fast way to waste everyone’s time. The real work is in three key areas: strategic choices, data integrity, and cost architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build vs. Buy vs. Hire
&lt;/h3&gt;

&lt;p&gt;This is a constant calculation. Do we build this feature ourselves, buy a third-party tool, or bring on someone who’s already solved this problem?&lt;br&gt;
At Hertz, delivering a 6,000% throughput improvement—scaling from roughly 100 to 6,000 requests per second—was a clear “build-it-ourselves” situation. I recognized the event-driven architecture pattern from earlier work and adapted it to their specific context. You don’t outsource that kind of foundational systems work; the expertise and control are paramount.&lt;br&gt;
At Frontiers Market, needing image-recognition capabilities at the edge for RTSP streams on Raspberry Pi clusters running offline, we evaluated commercial tools and found they all failed our edge workload and cost constraints. We built it ourselves. Different context, same conclusion, but you have to actually do the evaluation to know.&lt;/p&gt;

&lt;h3&gt;
  
  
  Negotiations and Cost Architecture
&lt;/h3&gt;

&lt;p&gt;Every line item is a negotiation: cloud contracts, data providers, dev tooling. I’ve seen engineering leaders sign agreements with pricing structures that become existential threats eighteen months later. Understanding the cost architecture of what you’re building is not a finance function. It’s yours.&lt;br&gt;
Cloud infrastructure costs are not an afterthought. I’ve seen startups get surprise bills that could fund two engineer salaries. More efficient systems cost less to run—the Hertz throughput work wasn’t just a technical win, it was a cost optimization story. That matters.&lt;br&gt;
When you can articulate to your board exactly why the infrastructure budget went up 30% and tie it directly to growth, you’re a business leader. When you can’t, you’ve ceded control of a major cost center.&lt;br&gt;
The hiring vs. outsourcing vs. tooling decision is one I make repeatedly. Do we hire a data engineer, buy a data platform, or invest in training the team we have? Do we bring on contractors for a defined sprint, or build the capability in-house? There’s no universal answer. But the CTO needs to be the one doing that math, not the CEO, not the CFO, not the VP of Product.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zn7r2n85e7m3fxx4zlj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zn7r2n85e7m3fxx4zlj.jpg" alt="Data-driven decision making dashboard" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Becoming the Analytics Reality Check
&lt;/h3&gt;

&lt;p&gt;This one stings a lot of startups: most “data-driven decisions” are not actually data-driven. They’re gut-driven with a thin veneer of analytics.&lt;br&gt;
Here’s the pattern I’ve seen repeatedly: conversion rate drops 8% over a weekend. A nervous stakeholder wants to ship a dozen UX changes Monday morning to “fix it.” But if you had 300 visitors that weekend, you have no statistical signal. None. You’re looking at noise. A/B testing requires meaningful volume to reach statistical significance. Chasing a 10-user blip on a Saturday afternoon is how you introduce bugs to solve a problem that never existed.&lt;br&gt;
The CTO’s job is to build a culture that distinguishes between vanity metrics and real metrics. Vanity metrics feel good: total signups, page views, monthly active users at a surface level. Real metrics tell you if the business is actually working: activation rate, retention cohorts, revenue per user, feature adoption among paying customers.&lt;br&gt;
Even with real metrics, you have to ask the harder questions. If retention drops with 100 users, the question isn’t which feature to change; it’s whether you ever found product-market fit. At that scale, you genuinely don’t have the data to judge features. Your metrics spiking after a podcast mention only proves people will try your product, not that it delivers lasting value.&lt;br&gt;
I spend a lot of time saying, “The data doesn’t support this yet.” It’s an unpopular sentence, but it’s part of the job.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Ultimate Constraint: Executive Override
&lt;/h3&gt;

&lt;p&gt;At Frontiers Market, this was a constant optimization. Small team, ambitious product scope, finite capital. Every architecture decision had a cost dimension. We built for efficiency not because it was elegant, but because it had to work on limited hardware with limited budget. When those constraints were respected, they produced good engineering decisions.&lt;br&gt;
But I’ll be honest about something: not every recommendation survives contact with the person writing the checks. At a startup, the CEO has the final say, and they always do. The tension between sound technical judgment and executive override is real and constant. You make the case, you document the tradeoffs, and sometimes the decision goes the other way anyway. At Frontiers Market, our architecture decisions were driven by efficiency because they had to be. Whether that thinking was applied consistently is a different story. That friction is part of the job too.&lt;br&gt;
The efficiency curve matters too. There’s always a point where adding more engineers produces diminishing returns, sometimes negative returns as coordination overhead grows. A team of four shipping consistently beats a team of twelve arguing about architecture. Knowing when you’re on the wrong side of that curve, and being willing to say so, is a leadership skill.&lt;/p&gt;




&lt;h2&gt;
  
  
  The People Layer
&lt;/h2&gt;

&lt;p&gt;This is the one that matters most, and the one most technical leaders underinvest in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04btyw2y2glgxu3ubjyi.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04btyw2y2glgxu3ubjyi.jpg" alt="Team collaboration and people leadership" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Hiring Is Everything
&lt;/h3&gt;

&lt;p&gt;Team composition is the single biggest variable in your success. More than architecture, more than stack, more than process. Get the people right and everything else is solvable. Get them wrong, and no amount of technical excellence can save you.&lt;br&gt;
After hiring across Ascendant Technology, Avnet, Zilker Technology, EY engagements, Frontiers Market, and Firefly Events, what I’ve learned is that hiring the most technically impressive candidate in the room isn’t always right. I look for a specific combination: strong enough technical skills to get the work done, genuine problem-solving ability (not just pattern-matching), self-direction, and the ability to make the &lt;em&gt;team&lt;/em&gt; better.&lt;br&gt;
I’ve hired engineers who could whiteboard any algorithm you threw at them but were a nightmare to work with—hoarding knowledge and being dismissive in code reviews.&lt;br&gt;
I’ve also hired engineers who weren’t the top technical candidate but could reason clearly, take ownership, and elevate everyone around them. The second type builds companies. The first type builds resentment.&lt;br&gt;
Your first 10-15 hires define your culture for the next five years. That’s not an exaggeration. The norms that get established early—how you do code review, how you handle disagreements, how you respond when things break—these calcify. Hire careless people early and you’ll be fighting carelessness for years. Hire people who take ownership, and ownership becomes the default expectation.&lt;br&gt;
&lt;em&gt;For more on what I actually look for in engineering interviews: &lt;a href="https://dev.to/blog/how-to-hire-engineers"&gt;/blog/how-to-hire-engineers&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Talent Has a Price
&lt;/h3&gt;

&lt;p&gt;If people are working under market rate for your vision, that investment must be recognized and honored. Not with vague promises about the future, but in how you treat them day-to-day, how honest you are about where things stand, and whether the upside you’re selling is real.&lt;br&gt;
The cost of getting this wrong is high and compounding. Losing an engineer is expensive in recruiting costs, onboarding time, and lost institutional knowledge. More importantly, it sends a negative signal to everyone who stays. Hiring into a team with a bad culture reputation is extremely difficult. The best engineers have options, and they ask around.&lt;br&gt;
If you’re asking someone to take below-market compensation for equity and mission, you owe them honesty about where things stand. You owe them respect for what they’re giving up. You owe them a real shot at the upside you’re describing. The alternative is a revolving door that costs more in the long run—in money, in culture, in what you’re able to build—than paying closer to market would have in the first place.&lt;br&gt;
This isn’t just ethics. It’s math.&lt;/p&gt;

&lt;h3&gt;
  
  
  Protect People from Burnout
&lt;/h3&gt;

&lt;p&gt;This part of the CTO job doesn’t get said plainly enough: your job is to protect the team.&lt;br&gt;
Engineers break after too many weekends, too many all-nighters, too many sprints where “one more push” becomes the new normal. I’ve seen well-meaning founders say “funding is coming next month, just one more push” for twelve months straight. By the time the money arrives, half the team is gone or checked out.&lt;br&gt;
That’s a CTO failure. The CTO is the person who’s supposed to read that pattern and say stop before the damage is permanent.&lt;br&gt;
The mechanics are simple: real downtime between intense sprints, honoring commitments about scope and timeline, and noticing when someone is running on empty &lt;em&gt;before&lt;/em&gt; they quit or implode. If you have offshore team members covering time zones, remember that’s people spending their nights working. It’s easy to forget that when you’re not seeing them every day.&lt;br&gt;
One thing that’s also genuinely unfair and worth naming: equity at early-stage startups is often structured in ways that leave employees holding nothing. No pre-seed strike price locked in, or terms that shift after the fact. And if you’re asking people to grind for years on a promise of equity, make sure that equity is real. Engineers talk. And how you treat the people who built the thing matters, both morally and for your future ability to hire.&lt;br&gt;
&lt;em&gt;I’m writing a longer treatment of this topic in &lt;a href="https://dev.to/blog/startup-culture-and-the-human-element"&gt;Startup Culture and the Human Element&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Set Direction, Don’t Micromanage
&lt;/h3&gt;

&lt;p&gt;Engineers need to see the thread between their work today and the long-term vision. But “Build the most reliable real-time marketplace in agriculture” isn’t actionable at 9am on a Tuesday morning. You have to decompose that into milestones, quarter by quarter, sprint by sprint.&lt;br&gt;
Empowering creativity within guardrails is the balance every engineering leader struggles with. Too much guardrail and you stifle the innovation that good engineers are capable of. Too little and you get beautiful chaos—systems that don’t interoperate, tech choices made for personal interest rather than business need, scope that drifts indefinitely.&lt;br&gt;
My approach is to provide guardrails: here’s the business outcome we need, here’s the quality bar, here’s the timeline. How you get there is mostly up to you. Push back if you see a better path. Own your decisions.&lt;br&gt;
The best thing you can do as a CTO is build a team that doesn’t need you for the day-to-day. Your goal isn’t to be the smartest person in the room. It’s to hire people who eventually make you the least technical person in the room—and be completely fine with that.&lt;/p&gt;




&lt;h2&gt;
  
  
  Startup CTO vs. Large Company CTO
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ktl07ou53kl3lcelsr2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ktl07ou53kl3lcelsr2.jpg" alt="Startup vs corporate environment" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These are different jobs. Significantly different, and conflating them leads to misery.&lt;br&gt;
&lt;strong&gt;At a startup&lt;/strong&gt;, you’re doing everything: translating, hiring, architecture, vendor negotiation, budget, and maybe still writing code. You have limited process, high ambiguity, and decisions have outsized consequences. The role is about building structure from ambiguity.&lt;br&gt;
&lt;strong&gt;At a large company&lt;/strong&gt;, the structure already exists, and that’s both a feature and a constraint. Decisions are often political—which team gets the budget, which vendor relationship the company protects, which legacy system doesn’t get touched because it’s owned by someone with organizational power. Some are vendor-locked in ways that have nothing to do with technical merit. And some are genuinely good business decisions that just feel slow from the outside.&lt;br&gt;
A CTO expecting startup latitude in a large enterprise will fail, not because of technical issues, but because nobody wrote down what “success” looked like.&lt;br&gt;
Not every large company or industry actually needs a full-time CTO. Some need a VP of Engineering or a strong principal architect. The CTO title gets applied broadly in ways that don’t always reflect what the role actually requires.&lt;br&gt;
The fractional model often makes more sense than either extreme, providing senior judgment without the full-time overhead.&lt;br&gt;
&lt;em&gt;I’ve written more about this in &lt;a href="https://dev.to/blog/why-your-startup-doesnt-need-a-full-time-cto"&gt;Why Your Startup Doesn’t Need a Full-Time CTO&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Traps Every CTO Falls Into
&lt;/h2&gt;

&lt;p&gt;I’ve made most of these mistakes myself at some point. Recognizing them is the first step to avoiding them.&lt;br&gt;
&lt;strong&gt;Staying too technical, too long.&lt;/strong&gt; The hardest transition for most CTOs is giving up the work you’re good at. Coding is concrete and satisfying. Leadership is diffuse, slow-feedback, and often thankless. The temptation to stay in the code is real. But every hour you spend coding is an hour you’re not doing the things only you can do: translating, deciding, building the team, managing up.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If you’re coding 30 hours a week as a CTO, you’re probably leaving critical leadership work undone.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Measuring the wrong things.&lt;/strong&gt; Teams optimize for what you measure. Measure lines of code, get bloated code. Measure story points, get inflated estimates. Instead, measure customer outcomes, system reliability, time to ship, and cost to operate. Connect engineering activity to business results. This takes real work to set up correctly, but it’s the difference between an engineering org that knows if it’s succeeding and one that’s just busy.&lt;br&gt;
&lt;strong&gt;Underinvesting in communication.&lt;/strong&gt; Technical people often believe the work speaks for itself. It doesn’t. If you don’t tell the story of your team’s wins to the CEO, board, and investors, they won’t know. You made an architecture decision that will protect you from a class of security vulnerabilities for the next three years? Your engineering team knows. Nobody else does. The CTO has to communicate constantly—to the team, to leadership, to investors, to recruits. Communication is not soft skills garnish on top of the technical work. It’s the leverage that makes the technical work matter.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Job Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;Most of my days as a fractional CTO look nothing like building systems. They look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A call with a founder, explaining why their two-week feature is really a six-week project, and what we can realistically ship in two weeks that gets them 80% of the value.&lt;/li&gt;
&lt;li&gt;A budget review, walking a COO through cloud costs line by line.&lt;/li&gt;
&lt;li&gt;A recruiting call, selling a senior engineer on meaningful equity, real ownership, and the chance to build something from scratch because we can’t match a FAANG salary.&lt;/li&gt;
&lt;li&gt;A technical review of a vendor’s architecture before we sign an 18-month contract.&lt;/li&gt;
&lt;li&gt;An architecture review that takes two hours but prevents six months of work in the wrong direction.&lt;/li&gt;
&lt;li&gt;A hard conversation with a founder about whether it’s time to let someone go, and walking through what that actually costs in terms of team morale, replacement time, and the implicit signal it sends about standards.&lt;/li&gt;
&lt;li&gt;A conversation with an engineer who is technically brilliant but creating team friction.&lt;/li&gt;
&lt;li&gt;And yes, sometimes, picking a database in thirty seconds because I’ve already run through the decision framework in my head before the question is even fully asked.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the job. It’s not glamorous. It doesn’t always feel like engineering. But it’s the thing that determines whether a company actually ships the product it means to ship, and whether the team that builds it is still functional in two years.&lt;br&gt;
At Chick-fil-A, through an EY engagement, we rebuilt core POS systems that had to work offline-first in stores that can’t afford downtime during a lunch rush. That wasn’t just a technical problem; it was an operational, people, communication, and vendor coordination problem requiring deep technical judgment to solve. The DevOps layer—how systems get deployed, monitored, and recovered—was load-bearing.&lt;br&gt;
&lt;em&gt;More on this in &lt;a href="https://dev.to/blog/stop-wasting-time-on-devops-complexity"&gt;Stop Wasting Time on DevOps Complexity&lt;/a&gt;.&lt;/em&gt;&lt;br&gt;
That is what good CTO execution looks like. It’s not the genius who writes the best code; it’s the leader who holds the entire system—of people and technology—together.&lt;br&gt;
So if you’re building something and wondering whether you need a CTO, the question isn’t “Do we need someone who knows technology?” The real question is: “Do we have someone who can hold the line between the business and the engineers, protect the people doing the work, and make the hard decisions that neither side can make alone?”&lt;br&gt;
If the answer is no, you need one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Let’s Talk
&lt;/h2&gt;

&lt;p&gt;If any of this resonates—if you’re a founder trying to figure out when to bring in technical leadership, a startup hitting a scaling wall, or an enterprise looking for an outside perspective—I can help.&lt;br&gt;
I work with companies as a fractional CTO. You can see how I engage on my &lt;a href="https://dev.to/services"&gt;services page&lt;/a&gt;, or browse &lt;a href="https://dev.to/portfolio"&gt;case studies from Hertz, Wayfair, Frontiers Market, and others&lt;/a&gt;.&lt;br&gt;
If you want to talk through your specific situation, book a call directly: &lt;a href="https://cal.com/mdostal/meet" rel="noopener noreferrer"&gt;cal.com/mdostal/meet&lt;/a&gt;. No pitch, no pressure: just a real conversation about what you’re building and whether I can help.&lt;/p&gt;

</description>
      <category>cto</category>
      <category>fractionalcto</category>
      <category>engineeringleadership</category>
      <category>startup</category>
    </item>
    <item>
      <title>The Week I Stopped Coding: Orchestrating an Army of AI Agents</title>
      <dc:creator>Mathew Dostal</dc:creator>
      <pubDate>Mon, 16 Feb 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/mdostal/the-week-i-stopped-coding-orchestrating-an-army-of-ai-agents-4bbb</link>
      <guid>https://dev.to/mdostal/the-week-i-stopped-coding-orchestrating-an-army-of-ai-agents-4bbb</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2Fa01d84e18b80f41974d2b5e286f0880ada54c590-1536x1024.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fdafshiq1%2Fproduction%2Fa01d84e18b80f41974d2b5e286f0880ada54c590-1536x1024.jpg" alt="Man commanding army of robots" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Week I Stopped Coding: Orchestrating an Army of AI Agents
&lt;/h1&gt;

&lt;p&gt;Last week, a profound shift occurred in my work: I stopped coding. Let me be clear—I didn’t stop building software, but the act of &lt;em&gt;coding&lt;/em&gt; itself ceased to be my primary engagement. That distinction matters more than you might think: it signals a fundamental evolution in how we build software and, crucially, in the nature of technical leadership and architecture. I found myself back in a familiar role: a director overseeing numerous teams, or 'pods' (a term I recall from my time at EY), each with a cadre of workers underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: From Developer to Orchestrator
&lt;/h2&gt;

&lt;p&gt;A few weeks ago, curiosity led me to experiment with AI agent frameworks—complex, autonomous software programs designed to perform tasks, often by interacting with large language models (LLMs) and other tools. My journey began with &lt;code&gt;Claude Code&lt;/code&gt;, a CLI tool designed to "vibe" code and directly control your computer (&lt;a href="https://en.wikipedia.org/wiki/Claude_(language_model)" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Claude_(language_model)#Claude_Code&lt;/a&gt;). Building upon this, I integrated &lt;code&gt;claude-flow&lt;/code&gt;, an npm package that extends &lt;code&gt;Claude Code&lt;/code&gt;'s capabilities with a comprehensive tool suite to manage context, memory, agents, swarms (collections of these agents working collaboratively), and direction files. Soon after, I had added plugins, pulled down skillsets, integrated MCP servers and CLIs, completely handing over control of a localized environment to my growing swarm.&lt;/p&gt;

&lt;p&gt;The swarm no longer controlled just a single repository; it ran across an entire development ecosystem. Soon, this wasn't just about code assistance. The agents were iterating, improving, refactoring, and planning next steps—all while continuously engaging, running, and deploying. What began as a simple exploration rapidly evolved into a fundamentally different way of building software. Within days, I had 10 agents running simultaneously across different repos and projects, each humming away at its assigned tasks.&lt;/p&gt;

&lt;p&gt;I created a central repository that held context across my 8+ active projects. I started separating concerns, pushing context down as much as possible to keep each agent focused and effective. I added Linear for project management. I set up monitoring and dashboards to track progress across this distributed system.&lt;/p&gt;
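
&lt;p&gt;To make “pushing context down” concrete, here is a minimal sketch of the idea (the paths and project names are hypothetical, not my actual setup): a central context repo holds shared conventions and architecture notes, and a small script copies only the slices each project needs, so every agent starts from a narrow, relevant prompt surface.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# sync_context.py - illustrative sketch only; paths and project names are made up.
# A central "context" repo holds shared direction files; each project receives
# just the slices it needs, keeping every agent focused on a small surface.
import shutil
from pathlib import Path

CONTEXT_REPO = Path("context")        # the central repo of shared conventions
PROJECTS = {
    "billing-api":  ["conventions/python.md", "architecture/billing.md"],
    "web-frontend": ["conventions/typescript.md", "architecture/frontend.md"],
}

def sync():
    for project, slices in PROJECTS.items():
        target = Path(project) / ".agent-context"
        target.mkdir(parents=True, exist_ok=True)
        for rel in slices:
            src = CONTEXT_REPO / rel
            shutil.copy2(src, target / Path(rel).name)   # overwrite: central repo stays the source of truth
            print(f"synced {rel} into {project}")

if __name__ == "__main__":
    sync()
&lt;/code&gt;&lt;/pre&gt;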

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8v1wk5547l6zlwjrbzr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8v1wk5547l6zlwjrbzr.jpg" alt="Screens showing repository overview, git branches, project dashboard, and pull requests" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Déjà Vu: This Is Enterprise Digital Transformation
&lt;/h2&gt;

&lt;p&gt;The true scope of this shift became clear, and with it a realization: orchestrating these AI agents felt exactly like managing enterprise digital transformation. The 'workers' were now swarms of threads making requests to LLMs, stepping into roles traditionally filled by individuals in a major corporation. This wasn't just building; it was a microcosm of the massive project kickoffs and sprawling digital transformations I managed while consulting at Zilker Technology and Ernst &amp;amp; Young: the architecture and design sessions that kicked off projects rippling across entire organizations and consuming months of meticulous planning.&lt;/p&gt;

&lt;p&gt;I thought back to those engagements: the careful distribution of work across multiple teams, the constant iteration and refinement, the Proofs of Concept (PoCs) we’d run before committing to full implementation. The key difference? Those projects involved dozens or hundreds of human resources: architects, managers, product owners, developers, QA testers, DevOps engineers, infrastructure specialists, directors, and executive leaders.&lt;/p&gt;

&lt;p&gt;Now, all of that was being directed by me and my agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Orchestration: The Organizational Mirror
&lt;/h3&gt;

&lt;p&gt;The organizational structure mirrored the corporate one almost eerily. Instead of hundreds of people underneath me, I had hundreds of agent instances: groups of 3-5 agents, each with a “leader” agent. Those leaders reported to higher-level coordination agents, which in turn interacted directly with me.&lt;/p&gt;

&lt;p&gt;But here’s where it gets interesting: instead of scheduling meetings, sending calendar invites, and waiting for responses, I was answering questions and providing context through Linear tickets. I’d respond, hand it back down the chain, and watch the work continue to spin.&lt;/p&gt;
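
&lt;p&gt;A rough sketch of that reporting structure, with invented names, just to show the shape: workers roll up to a pod leader, leaders roll up to a coordinator, and anything the coordinator can’t resolve surfaces to me as a ticket rather than a meeting.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# agent_hierarchy.py - illustrative model of the pod structure; names are hypothetical.
# The escalation path goes worker, then pod leader, then coordinator, then me
# (as a Linear ticket, not a calendar invite).
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    role: str                                   # "worker", "leader", or "coordinator"
    reports: list = field(default_factory=list)
    manager: object = None

def escalate(agent, question):
    """Walk up the chain; whatever reaches the top becomes a ticket for a human."""
    path = []
    current = agent
    while current is not None:
        path.append(current.name)
        current = current.manager
    return {"question": question, "path": path, "action": "open Linear ticket"}

# One pod: three workers under a leader, the leader under a coordinator.
coordinator = Agent("coord-1", "coordinator")
leader = Agent("auth-pod-lead", "leader", manager=coordinator)
workers = [Agent(f"auth-worker-{i}", "worker", manager=leader) for i in range(3)]

coordinator.reports.append(leader)
leader.reports.extend(workers)

print(escalate(workers[0], "Which token lifetime should refresh tokens use?"))
&lt;/code&gt;&lt;/pre&gt;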

&lt;h2&gt;
  
  
  The Reality Check: Not Perfect, But Familiar
&lt;/h2&gt;

&lt;p&gt;Did things go off track sometimes? Absolutely. My agents occasionally veered into unexpected territory or misinterpreted context, leading to some hairy moments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zn7r2n85e7m3fxx4zlj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zn7r2n85e7m3fxx4zlj.jpg" alt="Meeting room with calendar, clock, and abstract data streams" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But here’s the thing: those multi-million dollar contracts with tens or hundreds of human resources? They went off track too. Miscommunications happened. Requirements got misunderstood. Technical debt accumulated. The difference wasn’t in the perfection; it was in the speed of correction.&lt;/p&gt;

&lt;p&gt;When an agent team veered off course, I could course-correct in minutes. In contrast, a human team hitting a blocker often meant scheduling a meeting days out, spending an hour discussing it, and then waiting for the implementation cycle to restart.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Speed Factor: Hours Instead of Months
&lt;/h2&gt;

&lt;p&gt;I’m now accelerating through the entire software development lifecycle at a pace that feels almost surreal. What once required months now unfolds in hours.&lt;/p&gt;

&lt;p&gt;To illustrate this velocity, in a single day, my agent army now delivers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RBAC implementation&lt;/li&gt;
&lt;li&gt;Complete DevOps pipelines&lt;/li&gt;
&lt;li&gt;CI/CD automation&lt;/li&gt;
&lt;li&gt;Terraform infrastructure as code&lt;/li&gt;
&lt;li&gt;API development and versioning&lt;/li&gt;
&lt;li&gt;Database schema migrations&lt;/li&gt;
&lt;li&gt;UI updates and refinements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow looks like this: I receive mockups and wireframes first. Then architecture reports land with decision points clearly outlined. I review, provide feedback, make calls on the architectural decisions, and respond—all in minutes rather than weeks.&lt;/p&gt;
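
&lt;p&gt;As a sketch of what those decision points look like when they land on my desk (the record shape and field names here are hypothetical), each one is basically a small, structured question: the options on the table, the agents’ recommendation, and a slot for my call and rationale that then flows back down as context for the implementing pod.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# decision_points.py - hypothetical shape of one decision point from an
# architecture report. The agents propose options; I choose and add rationale,
# and the decided record becomes context for the pod that implements it.
decision = {
    "id": "ADR-014",
    "question": "Session storage for the API gateway",
    "options": ["Redis with TTL", "JWT only, no server-side state", "Postgres table"],
    "recommendation": "Redis with TTL",
    "status": "open",
}

def resolve(record, choice, rationale):
    record["status"] = "decided"
    record["choice"] = choice
    record["rationale"] = rationale
    return record

print(resolve(decision, "Redis with TTL", "We need token revocation; stateless JWTs make that painful."))
&lt;/code&gt;&lt;/pre&gt;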

&lt;p&gt;The traditional back-and-forth that consumed entire sprint cycles now happens in near real-time. This hyper-accelerated pace fundamentally redefines the limits of what a single technical leader can achieve.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Full Stack Reality
&lt;/h2&gt;

&lt;p&gt;What’s particularly striking is the breadth of work happening simultaneously. I’m not context-switching between tasks the way a solo developer traditionally would. I’m orchestrating parallel workstreams across the entire stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqme24sm23d0ebq7izto.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqme24sm23d0ebq7izto.jpg" alt="Layered diagram of a full-stack system with abstract data/code blocks" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One agent cluster is refactoring the authentication system while another implements new API endpoints. A third group is updating the UI components while a fourth handles database optimization. DevOps agents are spinning up new environments and configuring monitoring.&lt;/p&gt;
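
&lt;p&gt;Mechanically, that parallelism is just fan-out and fan-in. Here is a minimal sketch with made-up workstream names (a real setup would dispatch to agent processes rather than sleeping): kick off the independent streams, then gather the results and review them together.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# parallel_streams.py - minimal fan-out/fan-in sketch of concurrent workstreams.
# The stream names are invented; sleep stands in for an agent cluster doing work.
import asyncio

async def run_stream(name, seconds):
    await asyncio.sleep(seconds)               # placeholder for an agent cluster's work
    return f"{name}: done"

async def main():
    streams = [
        run_stream("auth-refactor", 2),
        run_stream("api-endpoints", 3),
        run_stream("ui-components", 1),
        run_stream("db-optimization", 2),
    ]
    results = await asyncio.gather(*streams)   # fan-in: review everything at once
    for line in results:
        print(line)

asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;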

&lt;p&gt;Crucially, I am not removed from the technical decisions. I still understand the underlying code. I still diagnose complex issues, deep dive into performance problems, and call out architectural flaws. However, my role has fundamentally shifted from primary implementer to architect and director. This parallel orchestration across the entire stack allows for unprecedented velocity and breadth of development under unified strategic direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Emotional Complexity
&lt;/h2&gt;

&lt;p&gt;Beyond the technical marvel, this unprecedented acceleration brought with it a complex, even conflicting, emotional landscape. The world is changing at an unprecedented rate, and I’m watching it happen from the inside.&lt;/p&gt;

&lt;p&gt;This experience is amazing, eye-opening, disheartening, and scary. All at once.&lt;/p&gt;

&lt;p&gt;I’m excited about what’s possible. The acceleration is intoxicating. Building things that would have taken a team months to deliver, and seeing them come together in days—there’s a rush to that.&lt;/p&gt;

&lt;p&gt;But alongside the excitement, a profound shift is underway. I still remember the deep satisfaction of crafting elegant functions, of refactoring a particularly gnarly piece of logic, of naming variables just right. These cherished individual lines of code are now fading from my day-to-day reality. This shift isn't just personal; it represents a seismic change for many developers and technical professionals.&lt;/p&gt;

&lt;p&gt;I still appreciate good code. I can still spot a performance issue or an architectural misstep. But my relationship to the code itself has changed. It’s no longer my primary medium of expression. This personal transformation, I realized, wasn't just about me; it signaled a broader, profound inflection point for software development.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for the Industry: Evolving Technical Leadership
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ktl07ou53kl3lcelsr2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ktl07ou53kl3lcelsr2.jpg" alt="Orchestra stand with conductor baton and music sheet in front of glowing data streams" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This personal evolution signals a broader inflection point for software development. The role of the technical leader is rapidly evolving from the “best coder in the room” to the “best orchestrator of technical work.”&lt;/p&gt;

&lt;p&gt;Far from becoming irrelevant, my 20+ years of experience has become &lt;em&gt;more&lt;/em&gt; valuable. The architectural knowledge, the understanding of how systems fit together, the ability to spot risks and trade-offs—all of that matters more than ever.&lt;/p&gt;

&lt;p&gt;What changed is the leverage. Instead of that knowledge bottlenecking on my ability to type code, it now flows through an army of agents that can execute in parallel.&lt;/p&gt;

&lt;p&gt;For technical leaders, this is a paradigm shift. The skills that matter are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architectural vision and systems thinking&lt;/li&gt;
&lt;li&gt;Context management and information architecture&lt;/li&gt;
&lt;li&gt;Project orchestration and work breakdown&lt;/li&gt;
&lt;li&gt;Quality assessment and risk identification&lt;/li&gt;
&lt;li&gt;Strategic decision-making under uncertainty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The skills that matter less:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw coding speed&lt;/li&gt;
&lt;li&gt;Memorizing syntax&lt;/li&gt;
&lt;li&gt;Implementing boilerplate patterns&lt;/li&gt;
&lt;li&gt;Repetitive debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Path Forward
&lt;/h2&gt;

&lt;p&gt;I don’t know exactly where this leads. I suspect my experience is a preview of what many technical leaders will face in the next 12-24 months.&lt;/p&gt;

&lt;p&gt;What I do know is that software development as a discipline isn’t going away; it’s transforming. The leaders who can adapt, who can learn to work with AI agents as force multipliers rather than replacements, who can maintain their technical judgment while delegating implementation—they’re going to thrive.&lt;/p&gt;

&lt;p&gt;For me, this week marked a transition. From developer to orchestrator. From coder to conductor. The symphony is larger now, and the pace is faster, but the music still needs someone who understands what it should sound like.&lt;/p&gt;

&lt;p&gt;This setup might not be permanent (I might even wipe a whole repo), but the next logical step involves integrating Kubernetes: imagine pods spinning up not just to execute tasks, but to self-optimize, maintain, and evaluate the entire agent system, ensuring dynamic scaling and resilience. This is the frontier of truly autonomous orchestration. We’ll see where this goes.&lt;/p&gt;

&lt;p&gt;If you’re a technical leader navigating this shift, I’d urge you to reach out and compare notes on this brave new world we’re building. What challenges and opportunities are you encountering as you consider or implement AI agent orchestration in your own organization? Share your insights, contact me, or comment below—let’s discuss this future together.&lt;/p&gt;

&lt;p&gt;Mathew Dostal is a fractional CTO and technical leader with 20+ years of experience in enterprise software development, cloud architecture, and digital transformation. Learn more about his work at &lt;a href="https://mdostal.com" rel="noopener noreferrer"&gt;mdostal.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
