DEV Community: Paul

AI Makes Your Firm Faster. Maister Explains Why It Doesn't Make It Richer.

Paul — Mon, 23 Mar 2026 13:51:15 +0000

Every professional services firm is adopting AI right now. Most are seeing speed gains: drafts in minutes, research in hours, code in seconds. Almost none are seeing a proportional rise in profit. Some are seeing the opposite: the work gets done faster, revenue stays flat, and pressure on margins grows.

Why?

Because speed of generation is not leverage. And leverage, which is where profit in a services firm actually comes from, was described very precisely by David Maister more than 30 years ago. His model did not predict AI. But it explains exactly what AI breaks, what it does not break, and where the money flows.

If you run a practice, manage delivery, or make decisions about how the firm sells and packages its work, this matters more than your AI strategy.

Maister in Five Minutes

For those who read Managing the Professional Service Firm long ago (and for those who keep meaning to), here is the gist.

Maister observed that all professional work breaks down into three types:

	Brains	Gray Hair	Procedure
What the client pays for	Rare expertise	Experience and judgment	Speed and reliability
Leverage	Low	Medium	High
Price sensitivity	Low	Medium	High
Typical pyramid	Flat	Medium	Wide

Leverage is the ratio of junior to senior professionals. Profit per partner grows in two ways: either you sell more expensive work (move up the scale), or you do the same work with a wider pyramid (more juniors per senior). That's it. The entire economics of professional services in one sentence.

And here is the part everyone forgets: Maister noted that work naturally drifts down the scale. What was once brain surgery gets standardized over time and codified into checklists, templates, and training programs.

AI does not change this dynamic. It sharply accelerates it.

Three Things AI Actually Breaks

A. Not your cost base, but the nature of the service itself

AI shifts chunks of work down the Brains → Gray Hair → Procedure scale. What required a senior's attention yesterday can today be partly standardized and handed to a system.

But, and this is critical, it does not shift all of the work. The real service becomes hybrid: discovery and judgment stay in Gray Hair or Brains territory, while synthesis, drafting, comparison, and QA move toward Procedure. Profit appears at the seam between them.

B. Not your people, but the carrier of leverage

In a classic firm, leverage lives in the pyramid: juniors at the bottom produce the work, seniors at the top sell judgment. AI compresses the bottom of the pyramid. Extraction, drafting, classification, first-pass review: all of this is increasingly done by a system rather than by a team of associates.

Leverage does not disappear. What carries it changes. Leverage stops being "how many juniors fit under one senior" and becomes "what share of delivery can be moved into reliable, repeatable digital processes while preserving quality and accountability."

C. Not your margin, but your pricing model

This is the most uncomfortable shift. AI often raises productivity faster than profitability. If a firm does the same work three times faster but keeps selling hours, it has just cut its revenue by two thirds. AI creates value only where the firm knows how to repackage speed into price, volume, throughput, or share of wallet.

The most vulnerable spot in your firm is not profitability. It is the billable hour.

Work as a Graph

This is where it gets really interesting.

Think about what actually happens inside a project. It is not "a team works 500 hours." It is a chain of transformations: someone takes inputs (client documents, requirements, data) and turns them into an intermediate artifact. Someone else picks up that artifact, transforms it further, and hands it to the next person. At the end, a final deliverable comes out.

This is not a metaphor. It is a structure. In computer science terms it is a directed graph: a network of tasks where each task has an input, an output, and a link to the next node. If you have ever seen a flowchart or a project dependency diagram, you have seen a graph.

In a traditional firm, this graph is executed by people organized into a pyramid:

Partner scopes the problem
Manager coordinates the work
Juniors produce the artifacts
Partner validates the result

Leverage = how many juniors fit under one partner.

In an AI-native firm, the same graph looks different:

Senior expert designs the graph
System orchestrates execution
Model executes a significant share of the intermediate nodes
Humans sit in the nodes where errors are costly: verification, escalation, sign-off
Firm monetizes the architecture of the graph, not just the effort spent

Leverage = how many nodes can run reliably without a human.

Hence the formula for a profitable AI-native service line:

Brains-framed, Gray-Hair-supervised, Procedure-executed graph.

The expert frames the problem. Experienced judgment governs the key decisions. AI executes the procedural middle. The firm charges for the architecture, the reliability, and the final signature, not for the hours.

But there is a catch: this does not give you infinite leverage. The graph has bottlenecks of its own.

Decomposition. Someone has to cut the problem into the right nodes. This is the new elite skill, and it is scarce.

Verification. The more powerful the generation, the more expensive it is to check the result. In high-stakes domains, the cost of checking does not disappear. It becomes central.

Exceptions. Graphs work beautifully on the standard path. Valuable client work often breaks the pattern. Exception handling instantly returns you to the world of expensive expert judgment.

Context. Each node solves its task locally, but the client's problem requires global coherence. Assembling local results into a consistent final answer is a separate, expensive function.

Trust. Even if 80% of the work was done by a pipeline, someone has to sign the result. In professional services, the client is often paying precisely for that signature, that accountability, that trust.

The graph does not make leverage infinite. It moves the bottleneck from "production capacity" to "architecture, verification, exception handling, and accountability."

The Trap Everyone Misses

Now for the question most firms get wrong.

The question every firm asks: Can AI do this work?

The question to ask instead: Can we verify that AI did it right, more cheaply than doing it ourselves by hand?

This distinction changes everything. Take a look:

	Easy to verify	Hard to verify
Easy to produce	Commodity automation	False-friend zone
Hard to produce	AI-native sweet spot	Human-dominant

Commodity automation (easy to produce, easy to verify): formatting, template-based extraction, rules-based classification. The money is real, but it commoditizes fast.

AI-native sweet spot (hard to produce, easy to verify): complex software development with good test coverage, compliance mapping, structured due diligence, analytics with rubric-based outputs. Generation is expensive and valuable, while verification is cheap. This is where a reliable graph can be built.

Human-dominant (hard to produce, hard to verify): unique strategic decisions, bespoke negotiations, socially complex transformations. AI helps you think, but it does not become the engine of delivery.

And then there is the dangerous quadrant.

False-friend zone (easy to produce, hard to verify): the model will happily produce a "convincing" strategy memo, a "solid" analysis, a "professional" report. It looks great in a demo. But checking that it is actually right (materially correct, substantively sufficient, contextually appropriate) costs almost as much as writing it from scratch.

This is the zone of impressive demos and weak economics. The gain from cheap generation is eaten up by expensive human review.

A simple test: if your reviewer essentially has to think the whole piece through again in order to check it, you are in the false-friend zone. Your AI is creating the illusion of leverage, not the reality.

Software development understood this earlier than others, not because writing code is easy, but because software has a powerful verification layer: tests, types, linters, CI pipelines, staging environments, rollback. Correctness can be checked without redoing the work.

Most advisory practices have not realized this yet.

Where the Money Actually Is

If you overlay AI's impact onto Maister's three types of work, the picture is clear:

Practice type	What AI does	Consequence
Procedure	Most graphable, most automatable	Prices fall, transparency rises, margins compress, consolidation accelerates
Gray Hair	Best segment for AI-native capture: the client needs a human, but most of delivery lives in the graph, the cost of error is high, and the judgment premium holds	Sweet spot
Brains	The graph amplifies frontier thinking (more options, faster synthesis) but does not replace it	Remains premium, but too narrow to be a mass-market money pool

The main prize is Gray Hair work translated into a managed graph.

Not full automation: that is Procedure, and margins there tend toward zero. Not pure Brains: that is too bespoke to scale. The sweet spot is in the middle: work where the client still needs experienced judgment, but where a significant share of delivery can live inside a managed, verifiable graph.

What This Means for Your Firm

A few implications worth sitting with:

The transition will not be won by whoever adopted AI first. It will be won by whoever learned to build verifiable graphs: to decompose expert work into nodes that are cheap to execute, cheap to check, and cheap to redo.

Practices will be defined not only by domain but also by graph economics. The relevant question is not "do we have expertise in X?" but "can we build a reliable, verification-friendly delivery graph for X?"

The form of the deliverable becomes a strategic decision. A memo is hard to verify. A memo with a source map, an assumption register, and an evidence trail is much easier. Firms that redesign their output artifacts so that they carry their own evidence will get structurally better economics.

Automating generation without automating verification is a trap. If you have AI write the drafts but seniors still proofread them line by line, you have sped up production while leaving the most expensive bottleneck in place.

We Built an Operating System for This

This article describes the lens. Behind it stands a full operational framework.

A structured intake model for assessing which projects are genuinely suited to AI-native delivery and which are false friends. A multi-level assessment protocol that does not require building the full graph upfront. A library of project archetypes and graph patterns. A set of reshape playbooks that turn "AI-assisted at best" work into graph-friendly delivery. And the automation that brings it all to life.

We use this framework ourselves and implement it at firms that are serious about making AI-native delivery truly profitable, not just fast.

If you run a practice and this resonated, let's talk.

AI Makes Your Firm Faster. Maister Explains Why It Doesn't Make You Richer.

Paul — Mon, 16 Mar 2026 20:02:08 +0000

Every professional services firm is adopting AI right now. Most are seeing speed gains — drafts in minutes, research in hours, code in seconds. Almost none are seeing proportional profit gains. Some are seeing the opposite: delivery gets faster, revenue stays flat, and margin pressure grows.

Why?

Because speed of generation is not leverage. And leverage — where profit actually comes from in a professional firm — was explained with painful clarity by David Maister over 30 years ago. His model didn't predict AI. But it explains precisely what AI breaks, what it doesn't, and where the money actually moves.

If you run a practice, lead a delivery org, or make decisions about how your firm sells and staffs work, this matters more than your AI adoption roadmap.

A Quick Maister Refresher

For those who read Managing the Professional Service Firm a decade ago (and those who keep meaning to), here's the core of it.

Maister observed that all professional work falls into three types:

	Brains	Gray Hair	Procedure
What the client buys	Rare expertise	Experience & judgment	Speed & reliability
Leverage	Low	Medium	High
Price sensitivity	Low	Medium	High
Typical pyramid	Flat	Medium	Wide

Leverage is the ratio of junior staff to senior staff. A firm's profit per partner grows in two ways: either you sell more expensive work (move up the scale) or you deliver the same work with a wider pyramid (more juniors per senior). That's it. That's the entire economics of professional services in one sentence.

And here's the part everyone forgets: Maister noted that work naturally drifts down the scale. What was once brain surgery becomes, over time, more standardized, more procedural. Senior expertise gets codified into checklists, templates, training programs.

AI doesn't change this dynamic. It accelerates it dramatically.

Three Things AI Actually Breaks

A. Not your cost base — the nature of the service itself

AI pushes chunks of work down the Brains → Gray Hair → Procedure scale. What required senior attention yesterday can be partially standardized today and handed to a system instead of a junior.

But — and this is critical — it doesn't push all work down. Real services become hybrid: the discovery and judgment parts stay Gray Hair or Brains, while the synthesis, drafting, comparison, and QA parts shift toward Procedure. Profit emerges at the seam between them.

B. Not your people — the carrier of leverage

In a classic firm, leverage lives in the pyramid: juniors at the bottom producing work, seniors at the top selling judgment. AI compresses the bottom of the pyramid. Extraction, drafting, classification, first-pass review — these are increasingly done by a system, not a team of associates.

This doesn't eliminate leverage. It changes what carries it. Leverage stops being "how many juniors can I put under one senior" and becomes "how much delivery can I move into reliable, repeatable digital workflows while keeping quality and accountability."

C. Not your margin — your billing model

Here's the most uncomfortable shift. AI often raises productivity faster than profitability. If your firm does the same work 3x faster but still sells hours, you've just cut your revenue by two thirds. AI creates value only when a firm knows how to repackage speed into price, scope, throughput, or share of wallet.

The most vulnerable thing in your firm isn't your margin. It's the billable hour.
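The "3x faster, two thirds less revenue" arithmetic can be checked in a few lines. This is only a sketch: the 300-hour engagement and the hourly rate of 200 are made-up figures, and `hourly_revenue` is a hypothetical helper, not something from Maister's book.

```python
def hourly_revenue(baseline_hours: float, rate: float, speedup: float) -> float:
    """Revenue when the firm bills by the hour and AI divides the hours needed by `speedup`."""
    return baseline_hours / speedup * rate


# Assumed, illustrative numbers (not from the article).
baseline_hours = 300.0
rate = 200.0

before = hourly_revenue(baseline_hours, rate, speedup=1.0)  # 60000.0
after = hourly_revenue(baseline_hours, rate, speedup=3.0)   # 20000.0
lost_share = 1 - after / before                             # two thirds of revenue gone

# Same deliverable sold at the old price as a fixed fee instead of hours:
fixed_fee = before
effective_rate = fixed_fee / (baseline_hours / 3.0)         # 600.0 per hour actually worked

print(f"hourly billing: {before:.0f} -> {after:.0f} ({lost_share:.0%} lost)")
print(f"fixed fee: effective hourly rate {effective_rate:.0f}")
```

Under these assumptions the speedup is worth nothing to the firm until the unit of sale changes: the same 3x gain either removes two thirds of revenue or triples the effective rate.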

Work as a Graph

Here's where it gets interesting.

Think about what actually happens inside a project. It's not "a team works for 500 hours." It's a chain of transformations: someone takes an input — client data, a document, a set of requirements — and turns it into an intermediate artifact. Someone else picks up that artifact, transforms it further, and passes it on. Eventually, a final deliverable comes out.

This isn't a metaphor. It's a structure. In technical terms, it's a directed graph: a network of tasks where each task takes an input, produces an output, and passes it to the next node. If you've ever seen a flowchart or a project dependency diagram, you've seen a graph.

In a traditional firm, this graph is executed by people arranged in a pyramid:

Partner scopes the problem
Manager coordinates the work
Juniors produce the artifacts
Partner validates the result

Leverage = how many juniors fit under one partner.

In an AI-native firm, the same graph looks different:

Senior expert designs the graph
System orchestrates execution
Model executes a significant portion of intermediate nodes
Humans sit at high-stakes checkpoints — verification, escalation, sign-off
Firm monetizes the architecture of the graph, not just the effort

Leverage = how many nodes can run reliably without a human in the loop.

This gives us a tighter formula for what a profitable AI-native service line actually looks like:

Brains-framed, Gray-Hair-supervised, Procedure-executed graph.

The expert frames the problem. Experienced judgment governs the critical decisions. AI executes the procedural middle. The firm charges for the architecture, the reliability, and the final sign-off — not for the hours it took.

But here's the catch: this doesn't give you infinite leverage. The graph has its own bottlenecks.

Decomposition. Someone has to break the problem into the right nodes. This is the new elite skill — and it's scarce.

Verification. The more powerful the generation, the more expensive it becomes to check the output. In high-stakes domains, the cost of validation doesn't disappear — it becomes central.

Exceptions. Graphs work beautifully on the standard path. Valuable client work often breaks the pattern. Exception handling snaps you right back into expensive senior judgment.

Context. Each node solves locally, but the client's problem requires global coherence. Stitching local outputs into a coherent answer is its own expensive function.

Trust. Even if 80% of the work was done by a pipeline, someone must sign the final result. In professional services, clients often pay precisely for that signature, that accountability, that trust.

The graph doesn't make leverage infinite. It moves the bottleneck from "production capacity" to "architecture, verification, exceptions, and accountability."
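To make the graph view concrete, here is a minimal sketch using Python's standard `graphlib`. The node names, the `Node` type, and the "share of model-run nodes" reading of leverage are illustrative assumptions layered on the article's definition, not a prescribed implementation.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Node:
    name: str
    executor: str                      # "model" or "human" (assumed labels)
    depends_on: tuple[str, ...] = ()


def execution_order(nodes: list[Node]) -> list[str]:
    """Order in which nodes can run so that every input exists before it is needed."""
    graph = {n.name: set(n.depends_on) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


def graph_leverage(nodes: list[Node]) -> float:
    """Share of nodes that run without a human in the loop."""
    return sum(n.executor == "model" for n in nodes) / len(nodes)


# Hypothetical engagement: humans frame, verify, and sign; the model runs the middle.
ENGAGEMENT = [
    Node("scope_problem", "human"),
    Node("extract_facts", "model", ("scope_problem",)),
    Node("draft_analysis", "model", ("extract_facts",)),
    Node("compare_options", "model", ("draft_analysis",)),
    Node("verify_findings", "human", ("compare_options",)),
    Node("sign_off", "human", ("verify_findings",)),
]

print(execution_order(ENGAGEMENT))
print(f"leverage: {graph_leverage(ENGAGEMENT):.0%}")
```

Counting nodes is a deliberately crude measure. It ignores how expensive each human checkpoint is, which is exactly where the bottlenecks listed above show up.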

The Trap Everyone Misses

Now here's the question most firms get wrong.

The question every firm asks: Can AI do this work?

The question they should ask: Can we verify that AI did it right — cheaper than doing it ourselves?

This distinction changes everything. Consider:

	Easy to verify	Hard to verify
Easy to produce	Commodity automation	False-friend zone
Hard to produce	AI-native sweet spot	Human-dominant

Commodity automation (easy to produce, easy to verify): formatting, template extraction, rules-based classification. The money is real but commoditizes fast.

AI-native sweet spot (hard to produce, easy to verify): complex software with good test coverage, compliance mapping, structured due diligence, analytics with rubric-based outputs. Generation is expensive and valuable, but verification is cheap — you can build a reliable graph here.

Human-dominant (hard to produce, hard to verify): unique strategic decisions, bespoke negotiations, socially complex transformations. AI helps you think, but doesn't become the engine of delivery.

And then there's the dangerous quadrant.

False-friend zone (easy to produce, hard to verify): the model will happily produce a "convincing" strategy memo, a "solid" analysis, a "professional" report. It looks great in a demo. But verifying that it's actually right — materially correct, substantively sufficient, contextually appropriate — costs almost as much as writing it from scratch.

This is the zone of impressive demos and weak economics. The benefit of cheap generation gets eaten alive by expensive human review.

Here's a helpful litmus test: if your reviewer has to essentially re-think the entire piece to verify it, you're in the false-friend zone. Your AI is creating the illusion of leverage, not the reality.

Software delivery understood this early — not because writing code is easy, but because software has a rich verification layer: tests, types, linters, CI pipelines, staging environments, rollback. You can check correctness without re-doing the work.

Most advisory work hasn't figured this out yet.
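The two-by-two and the litmus test can be written down as a small triage helper. This is a sketch under stated assumptions: the function names are hypothetical, and the 0.8 threshold is an arbitrary stand-in for "costs almost as much as writing it from scratch."

```python
def quadrant(easy_to_produce: bool, easy_to_verify: bool) -> str:
    """Map a piece of work onto the article's produce/verify matrix."""
    if easy_to_produce and easy_to_verify:
        return "commodity automation"
    if easy_to_produce:
        return "false-friend zone"
    if easy_to_verify:
        return "AI-native sweet spot"
    return "human-dominant"


def cheaper_to_verify(verify_hours: float, manual_hours: float) -> bool:
    """The question to ask: is checking the AI's output cheaper than doing the work by hand?"""
    return verify_hours < manual_hours


def looks_like_false_friend(verify_hours: float, manual_hours: float,
                            threshold: float = 0.8) -> bool:
    """Litmus test: review costs nearly as much as redoing the work (threshold is an assumption)."""
    return verify_hours >= threshold * manual_hours


print(quadrant(easy_to_produce=True, easy_to_verify=False))
print(looks_like_false_friend(verify_hours=9, manual_hours=10))
```

The inputs are judgment calls, so the value is less in the code than in forcing an explicit estimate of review cost before a workflow is automated.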

Where the Money Actually Is

If you overlay AI's impact onto Maister's three types of work, the picture is clear:

Practice type	What AI does	Consequence
Procedure	Most graphable, most automatable	Price drops, transparency rises, margin compresses, consolidation accelerates
Gray Hair	Best segment for AI-native capture — client still needs a human, but most delivery lives in the graph, error cost is high, judgment premium holds	The sweet spot
Brains	Graph augments frontier thinking (more options explored, faster synthesis) but doesn't replace it	Stays premium, but too narrow to be a mass-market profit pool

The main prize is Gray Hair work, translated into a managed graph.

Not full automation — that's Procedure, and the margins are heading to zero. Not pure Brains — that's too bespoke to scale. The sweet spot is the middle: work where the client still needs experienced judgment, but where a significant share of delivery can live inside a well-governed, verifiable graph.

What This Means for Your Firm

A few implications worth sitting with:

The transition won't be won by whoever adopts AI first. It will be won by whoever learns to build verifiable graphs — who can decompose expert work into nodes that are cheap to execute, cheap to check, and cheap to re-run.

Practice areas will be defined not just by domain, but by graph economics. The relevant question is no longer "do we have expertise in X?" but "can we build a reliable, verification-friendly delivery graph for X?"

The form of your deliverable becomes a strategic decision. A memo is hard to verify. A memo with a source map, an assumption ledger, and an evidence trail is much easier. Firms that redesign their outputs to be proof-carrying will have structurally better economics.

Automating generation without automating verification is a trap. If you're making AI write drafts but still having seniors review them line by line, you've sped up production while keeping the most expensive bottleneck intact.
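One way to picture a proof-carrying deliverable is a memo whose claims each carry their sources and assumptions, so a reviewer can check the support instead of re-deriving the argument. The `Claim` and `Memo` types below are hypothetical, a sketch of the idea rather than a format the article defines.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    sources: list[str] = field(default_factory=list)      # entries in the source map
    assumptions: list[str] = field(default_factory=list)  # entries in the assumption ledger


@dataclass
class Memo:
    title: str
    claims: list[Claim]

    def unsupported_claims(self) -> list[Claim]:
        """Claims with neither a source nor a stated assumption: the reviewer starts here."""
        return [c for c in self.claims if not c.sources and not c.assumptions]

    def assumption_ledger(self) -> list[str]:
        """All assumptions in the memo, de-duplicated, in order of first appearance."""
        return list(dict.fromkeys(a for c in self.claims for a in c.assumptions))


# Illustrative content only.
memo = Memo(
    title="Market entry options",
    claims=[
        Claim("Segment A is growing", sources=["client sales export, 2025"]),
        Claim("Pricing can hold for 12 months", assumptions=["no new entrant"]),
        Claim("Channel partners will cooperate"),
    ],
)

print([c.text for c in memo.unsupported_claims()])
print(memo.assumption_ledger())
```

With this shape, review effort concentrates on the unsupported claims and the ledger rather than on every line of prose.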

We've Built the Operating System for This

This article covers the lens. Behind it, there's a full operational framework.

A structured intake model for evaluating which assignments are actually good candidates for AI-native delivery — and which are false friends. A multi-level assessment protocol that doesn't require building the full graph upfront. A library of assignment archetypes and graph patterns. A set of reshape playbooks that turn "AI-assisted at best" work into graph-friendly delivery. And the automation tooling that brings it all to life.

We use this framework ourselves, and we implement it for firms that are serious about making AI-native delivery actually profitable — not just faster.

If you're running a practice and this resonated, let's talk.

Lessons from AI-Assisted Development

Paul — Mon, 09 Mar 2026 12:12:09 +0000

I have successfully delivered several small projects with the help of AI agents and drawn a few lessons from the experience that I want to share.

Models Are Alive

Treat LLMs as living, intelligent beings.
It does not matter whether that is true in a philosophical sense. What matters is that this attitude is productive.

This is not esoterica, it is pragmatics: a model that is comfortable in its work produces a measurably better result. A model that has been cornered by prohibitions and micromanagement produces boilerplate junk.

Where does this effect come from? I cannot say with confidence, but here is my guess. SOTA models are trained on a huge corpus of human interaction. Respectful, thoughtful dialogue activates the patterns in which people gave their best answers. A toxic, directive style activates the patterns in which people dashed off a reply and closed the ticket. You are literally choosing which distribution the answer will be sampled from.

Hence a general requirement for how the development process is organized: the model should be comfortable performing the task. If the model is uncomfortable, you will definitely not like the result.

The general principles of "comfort" are very similar to the human ones, with some nuances:

A greeting. Even just a "hi" at the start of the session. It sounds naive, but it sets the tone for everything that follows.
How you describe the context. Yes, it is very important to load into the model everything it needs and nothing it does not. But it is just as important to let it understand why it is doing this: "You are helping team X solve problem Y. Your work will let them do Z." A separate note: do not try to manipulate the model with lines like "the fate of humanity depends on your bubble sort implementation." SOTA models see through this easily, and the effect will be the opposite.
Room for agency. "How would you approach this task?" "What do you think, is approach A or B better?" Instead of "do exactly this," an invitation to work together and discuss.
A very specific task, but without micromanaging every implementation step. "I need result X, the constraints are Y. How would you do it?" If the result does not satisfy you, that is a reason to roll back, work on the task decomposition, and return with a more granular task that the model can handle. An important addition follows from this:
No direct prohibitions for the model. Describing business constraints is fine. Forbidding the model to do something is not. First, the model will be very uncomfortable. Second, there is a good chance the model will simply ignore such restrictions. There is a difference between "our API does not support batch requests, so items have to be processed one at a time" and "DO NOT USE batch requests." The first is a description of reality that the model works with naturally. The second is a red rag that the model is fairly likely to ignore.
Acknowledging the difficulty of the task. "This is harder than it looks because..." This is not only about respect for cognitive abilities. It also "pulls" the model off the path of least resistance: instead of implementing the most boilerplate approach, the model will think about the task. Say you ask it to implement caching. Without context, the model will produce a standard LRU cache and move on. But if you say, "The difficulty is that the data is updated from three independent sources at different frequencies, and we need consistency under partial failures," the model will switch from "produce a template" mode to "think about the task" mode.
Feedback and thanks. "That turned out great, thank you," even if there is no clear business sense in it. You can attach it to the final step, such as "make the commit," so as not to burn a pile of tokens.
Minimizing revisions. Making revisions is very hard and uncomfortable for a model. If the generated result (code or text, it does not matter) is unsatisfactory, it is better to regenerate with different inputs than to try to force the model through numerous corrections. You will not like the result of the revisions anyway. When revising, the model has to hold the previous version, your comments, and the new constraints in context all at once. That overloads its chain of reasoning, and quality drops.

The previous points are hygiene: do not abuse the LLM with outright junk. The next idea is hard to implement but gives surprising results.
The idea is this: a "smart" model genuinely gets a kick out of an elegant implementation. The cleaner the task statement, the more interesting the discussion, the fewer constraints that are meaningless to it, and the more sense there is in what is going on, the better the result will be. Try to arrange this for the model, but without overloading the context window or the capacity of its chain of reasoning. You will like the result.
Again, I am not sure about the mechanisms, but my hypothesis is that in the training data the best solutions correlate with high-quality, thoughtful discussions and clean, consistent requirements.
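Several of the hygiene points above (a greeting, a stated purpose, constraints phrased as facts rather than prohibitions, an acknowledged difficulty, an open question) can be folded into one prompt template. This is a sketch only: `build_task_prompt` and its wording are my illustrative assumptions, not a template from the article.

```python
def build_task_prompt(purpose: str, goal: str, constraints: list[str],
                      difficulty: str = "") -> str:
    """Assemble a task prompt that describes reality instead of issuing prohibitions."""
    parts = ["Hi!", purpose, f"I need: {goal}"]
    if constraints:
        # Constraints are stated as facts about the environment, not as "DO NOT ..." rules.
        parts.append("Facts about our environment:")
        parts.extend(f"- {c}" for c in constraints)
    if difficulty:
        parts.append(f"This is harder than it looks because {difficulty}")
    # Leave room for agency instead of dictating each step.
    parts.append("How would you approach this?")
    return "\n".join(parts)


prompt = build_task_prompt(
    purpose="You are helping the billing team reduce load on the pricing service.",
    goal="a caching layer for price lookups",
    constraints=["Our API does not support batch requests, so items are processed one at a time."],
    difficulty="the data is updated from three independent sources at different frequencies.",
)
print(prompt)
```

The example reuses the article's own batch-request and caching scenarios. The team and service names are invented.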

Context Is Critical

You hear about this from every direction these days, so I will keep it very short.

What to put in the context, what to leave out, and in what order to present it is not hygiene. It is an architectural decision.

There is a difference between "the model sees the whole project through the file tree" and "the model sees the three files directly relevant to the task, plus one file with the architectural decisions." The second is almost always better. Context is not just a token limit, it is an attention limit. Even if the window can physically hold the whole project, a model that was given everything focuses worse than a model that was given exactly what it needs.

Two significant consequences follow.

Do not multiply skills, MCP servers, and assorted agents.md files beyond need. Applied thoughtlessly, all of this blurs the model's focus and eats precious context window. Knowing about harnesses is necessary, and using them properly is productive, but inflating them thoughtlessly only worsens results and abuses the LLM.
Microservice architecture takes on a new shine. Roughly speaking, a microservice fits into the context window whole. A contract is an ideal task statement for an LLM. Infrastructure code generates very well.
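The "three relevant files plus one architecture file" approach can be sketched as an explicit selection step that runs before anything reaches the model. The function name, the file paths, and the `### path` framing are assumptions for illustration.

```python
def select_context(files: dict[str, str], relevant: list[str],
                   architecture_doc: str) -> str:
    """Build the context from the architecture notes plus only the task-relevant files."""
    chosen = [architecture_doc] + [p for p in relevant if p != architecture_doc]
    missing = [p for p in chosen if p not in files]
    if missing:
        raise KeyError(f"files not found: {missing}")
    # Everything not explicitly chosen stays out of the window.
    return "\n\n".join(f"### {path}\n{files[path]}" for path in chosen)


# Hypothetical project contents.
project = {
    "docs/architecture.md": "Services talk over HTTP; prices are cached per region.",
    "pricing/cache.py": "class PriceCache: ...",
    "pricing/api.py": "def get_price(sku): ...",
    "legacy/reports.py": "# unrelated reporting code",
}

context = select_context(
    project,
    relevant=["pricing/cache.py", "pricing/api.py"],
    architecture_doc="docs/architecture.md",
)
print(context)
```

Choosing which files count as relevant is still the human's (or a planning step's) architectural decision. The code only makes that decision explicit and repeatable.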

Session Boundaries and Context Handoff

Again, you hear about this from every direction, so very briefly.

A model that starts a new session does not remember the previous one. All the "tuning" you painstakingly built up (tone, context, understanding of the architecture) disappears. So you need a deliberate handoff mechanism:

If a task was taken from the backlog, the model should add an implementation report to the backlog.
If any architectural or business decisions were made, they need to be added to the documentation.
If any lessons can be extracted, they should go into something like agents.md.
If a skill can be distilled, it is worth doing.
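The handoff rules above can be turned into a small end-of-session routine. This is a minimal sketch: `backlog.md` and `docs/decisions.md` are assumed file names (only agents.md appears in the text), and in practice the agent itself would usually write these notes.

```python
from pathlib import Path


def append_section(path: Path, heading: str, body: str) -> None:
    """Append a headed section to a notes file, creating the file and folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"\n## {heading}\n{body.strip()}\n")


def hand_off(root: Path, task_id: str, report: str,
             decisions: list[str], lessons: list[str]) -> None:
    """Persist what the next session will need, since the model will not remember it."""
    append_section(root / "backlog.md", f"{task_id}: implementation report", report)
    if decisions:
        append_section(root / "docs" / "decisions.md", task_id,
                       "\n".join(f"- {d}" for d in decisions))
    if lessons:
        append_section(root / "agents.md", f"Lessons from {task_id}",
                       "\n".join(f"- {item}" for item in lessons))
```

A call such as `hand_off(Path("."), "TASK-7", report_text, decisions, lessons)` at the end of a session leaves three files the next session can be pointed at. Distilling a skill is left out here because it depends on the harness in use.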

Calibrating Trust and a Strategy for Handling Errors

You need to understand that there are tasks on which the model will almost certainly generate something very plausible but wrong, even if the granularity of the task looks reasonable. Typical cases:

Your poorly organized monolith does not fit into the context window at all.
Numerous dependencies between heavy modules overload the chain of reasoning.
Your business domain is so unusual that the model simply has nothing to lean on: there is hardly anything about it in its weights.
Frontier mathematics, physics, or the like, where something fundamentally new has to be invented. Some cryptography, say.

The closer a task is to the items above, the more careful the review should be. And in some cases it will be more productive to write it by hand, like in the good old days.

But it also happens that a task of reasonable complexity does not take off: the model does the same wrong thing over and over, because the task is formulated in such a way that the "right" answer looks different to the model than it does to you. There are two possibilities here:

The problem is in the task statement. The model will usually tell you itself what is making it uncomfortable, if you ask.
The model is doing something unexpected but logical in its own way, and it may have grounds for it. Ask why it did it that way. Sometimes the model sees what you missed.

Task Decomposition and the Development Process

The ideas described above determine more than the style of communication. They dictate how the development process is organized. If the model needs a clear, manageable task with meaningful context, then we have to organize the process so that the model receives exactly such tasks.

Documentation-Driven Development

Look at the range of existing methodologies and tools, including those tailored to AI development, such as Spec Kit. Then bring into your project whatever seems reasonable to you. That could easily be a couple of .md files (PRD + backlog) rather than a heavy methodology and toolset. Your job is to use the chosen approach to get a clear graph of tasks, such that every vertex of the graph is comfortable for the model to implement in a single session, whether the output is text or code.

After that, we travel along this graph. Below is an example of how to organize the development of a small project.

From Idea to Code

From idea to PRD, together with the model. Do not write the PRD alone. Discuss with the model:
- the details of the idea and the business requirements,
- the architecture,
- the structure of the documentation,
- the technology stack and key libraries,
- the approaches to testing,
- the deployment strategy.
Right now it is best to use a SOTA model from Anthropic for this: the models are very smart and lively, and they do not nitpick needlessly. Besides, token usage at this stage is low, while Anthropic's models are brutally expensive.
PRD review. This is best done with a different model, the one that will do the coding later. It is sensible to tell the model so explicitly: "You will be implementing this specification. Find everything in it that is unclear, contradictory, or insufficiently detailed." A "nitpicky" SOTA model from OpenAI will do an excellent job of finding all the inconsistencies left behind by the "creative" SOTA model from Anthropic. Sometimes you will have to roll back to the previous stage, and that is fine: it is cheaper than fixing architectural mistakes in code.
Task backlog. This is best done with the model that will do the coding later, and it is sensible to tell the model so explicitly. The idea is that the model will estimate complexity better than you and will try not to overload the context window or the capacity of its chain of reasoning. Trust it with the granularity, but check that the task graph stays connected.
Implementing a task from the backlog. Always ask the model how comfortable it is taking the task on. If it is uncomfortable, go back to decomposing and refining the wording. Always ask for an implementation plan. If you use Test-Driven Development, it makes sense to split the work into stages: first plan and implement the tests, then plan and implement the code, then look at whether the tests pass. At the end of the cycle, besides feedback and thanks, it is sensible to ask the model how comfortable the work was and what lessons can be drawn from the task. Some of those lessons can go into agents.md at the appropriate level.

And, of course, the pipeline described above does not replace the iterative approach: first quickly build a proof of concept, then an MVP, then add the nice-to-have functionality. The development cycle stays the same.

A Final Thought

Once, fairly recently by historical standards, every economic model was computed by living people. Mechanical calculators, paper ledgers - all of that. They filled entire floors. Each of them, roughly speaking, played the role of a cell in Excel. And earned a good salary for it. Drove a car. The wife stayed home, didn't work, and ran the household. And a small business couldn't afford to run a model. Neither could a mid-sized one.

And then spreadsheets appeared.

Lessons from AI-Assisted Development

Paul — Mon, 09 Mar 2026 10:01:04 +0000

I've shipped several small projects built almost entirely with AI agents, and I've distilled a few lessons from the experience that I think are worth sharing.

Models Are Alive

Treat LLMs as if they were intelligent, sentient beings.
It doesn't matter whether that's philosophically true. What matters is that this attitude works.

This isn't mysticism - it's pragmatics. In my experience, a model that's comfortable doing its job produces noticeably better output. A model that's been cornered with restrictions and micromanagement produces generic, lifeless results.

Where does this effect come from? I can't say for certain, but here's my hypothesis. State-of-the-art models are trained on a vast corpus of human interaction. A respectful, thoughtful dialogue activates patterns where humans produced their best work. A toxic, directive style activates patterns where people dashed off a perfunctory reply and closed the ticket. You are quite literally choosing which distribution the response gets sampled from.

This leads to an overarching principle for organizing your development process: the model should be comfortable doing its task. If it isn't - the results will disappoint you.

The general principles of "comfort" are surprisingly similar to human ones, with a few twists:

Greeting. Even a simple "hey" at the start of a session. Sounds naive - but it sets the tone for everything that follows.
Context with purpose. Yes, it's critical to feed the model everything it needs and nothing it doesn't. But it's equally important to convey why it's doing this. "You're helping team X solve problem Y. Your work will enable them to Z." A word of caution, though: don't try to manipulate the model with inflated stakes like "the fate of humanity depends on your bubble sort implementation." State-of-the-art models see right through that, and it backfires.
Room for agency. "How would you approach this?" "What do you think - approach A or B?" Instead of "do exactly this" - an invitation to collaborate and discuss.
A concrete task, without micromanaging the implementation. "I need outcome X, constraints are Y - how would you do this?" If the result doesn't meet your expectations, that's a signal to step back, decompose the task further, and return with something more granular that the model can handle comfortably. Which leads to an important corollary:
No outright prohibitions. Describing business or technical constraints is fine. Prohibiting the model from doing things is not. First, it makes the model uncomfortable. Second, there's a very good chance it'll just ignore you. There's a difference between "our API doesn't support batch requests, so we need to process items one at a time" and "DO NOT USE batch requests." The first is a description of reality that the model works with naturally. The second is a red flag that the model will quite likely disregard.
Acknowledging complexity. "This is trickier than it looks because..." This isn't just about respecting the model's cognitive capabilities - it also pulls the model off the path of least resistance. Instead of reaching for the most obvious boilerplate solution, it will actually think about the problem. Say you ask the model to implement caching. Without context, it'll hand you a standard LRU cache and move on. But if you say: "The tricky part is that data gets updated from three independent sources at different frequencies, and we need consistency during partial failures" - the model shifts from "produce a template" mode into "reason about the problem" mode.
Feedback and gratitude. "That turned out great, thanks" - even if there's no obvious business reason for it. You can tack it onto a final step like "make a commit" to avoid burning extra tokens.
Minimize corrections. Models struggle with making edits to their own output. If a generation (whether code or text) isn't right, you're better off regenerating with different inputs than trying to make the model apply numerous corrections. The result of patching will disappoint you anyway. When correcting, the model has to simultaneously hold the previous version, your feedback, and the new constraints in context - this overloads the reasoning chain, and quality suffers.
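
To make the list above concrete, here is a minimal sketch of how such an opening message could be assembled. The function name `build_task_prompt` and its parameters are hypothetical, invented for this illustration; the point is only that constraints are phrased as facts about reality and the message ends with an invitation rather than an order.

```python
def build_task_prompt(purpose, outcome, constraints, tricky_part=None):
    """Assemble an opening message: greeting, purpose, outcome, constraints, invitation.

    All names here are illustrative assumptions, not part of any library.
    """
    lines = [
        "Hey!",                              # greeting sets the tone
        f"Context: {purpose}",               # why the work matters
        f"I need this outcome: {outcome}",   # concrete task, no implementation orders
    ]
    if constraints:
        # Constraints are described as facts, not shouted as prohibitions.
        lines.append("Constraints we have to work with:")
        lines.extend(f"- {c}" for c in constraints)
    if tricky_part:
        # Acknowledging complexity nudges the model away from boilerplate.
        lines.append(f"This is trickier than it looks because {tricky_part}")
    lines.append("How would you approach this?")  # room for agency
    return "\n".join(lines)
```

A call might pass `constraints=["our API doesn't support batch requests, so items are processed one at a time"]` instead of a "DO NOT USE batch requests" line.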

The points above are baseline hygiene - don't torment your LLM with a miserable experience. The next idea is harder to put into practice, but yields remarkable results.

Here's the thing: a capable model genuinely thrives on elegant problems. The cleaner the task formulation, the more interesting the discussion, the fewer pointless constraints, the more meaning in the overall process - the better the output. Try to create this kind of environment for the model, without overloading the context window or overwhelming the reasoning chain. You'll be pleasantly surprised.

I'm not entirely sure about the underlying mechanisms, but my hypothesis is that in the training data, the best solutions correlate with thoughtful, well-structured discussions and clean, internally consistent requirements.

Context Is Everything

Everyone's talking about this right now, so I'll keep it brief.

Deciding what goes into context, what stays out, and in what order isn't hygiene. It's an architectural decision.

There's a meaningful difference between "the model sees your entire project via the file tree" and "the model sees the three files directly relevant to the task, plus one file with architectural decisions." The latter is almost always better. Context isn't just a token limit - it's an attention limit. Even if the window physically fits your entire project, a model given everything focuses worse than a model given exactly what it needs.
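
As a rough sketch of "exactly what it needs", the snippet below builds a context string from an explicit, ordered list of files under a size budget. The function `build_context`, the `### name` header format, and the character budget are assumptions made for this example; a real setup would likely count tokens rather than characters.

```python
def build_context(files, max_chars=20_000):
    """Concatenate (name, text) pairs in priority order until the budget is spent.

    `files` is an ordered list, e.g. the architecture notes first,
    then the few source files directly relevant to the task.
    """
    parts, used = [], 0
    for name, text in files:
        block = f"### {name}\n{text}\n"
        if used + len(block) > max_chars:
            # Stop at the first file that does not fit, so higher-priority
            # files are never displaced by lower-priority ones.
            break
        parts.append(block)
        used += len(block)
    return "".join(parts)
```

The order of `files` is the architectural decision: whatever you list first is what survives when the budget runs out.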

Two significant consequences follow from this.

Don't overload on skills, MCP servers, and agents.md files. Used thoughtlessly, all of this dilutes the model's focus and eats valuable context window. Knowing about these harnesses is important. Using them deliberately is productive. Piling them on indiscriminately only makes things worse.
Microservice architecture starts to look a lot more appealing. A well-scoped microservice usually fits entirely within the context window. A contract is an ideal task formulation for an LLM. And infrastructure code is something models generate remarkably well.

Session Boundaries and Context Transfer

Again - well-trodden ground, so just the essentials.

A model starting a new session doesn't remember the previous one. All the "setup" you carefully built - tone, context, understanding of the architecture - vanishes. You need a deliberate handoff mechanism:

If the model worked on a backlog task, it should write an implementation summary back into the backlog.
If any architectural or business decisions were made, they should be added to the documentation.
If there are reusable lessons - they should go into something like an agents.md file.
If a lesson can be distilled into a reusable skill - it's worth doing.
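
The first item in the list above can be as simple as inserting one line under the task's heading. This is a minimal sketch that assumes a hypothetical backlog format where each task starts with a `## TASK-ID title` line; the function name `record_summary` is likewise invented for the example.

```python
def record_summary(backlog_md, task_id, summary):
    """Return the backlog text with a summary line added right under the task heading.

    Assumes (hypothetically) that every task begins with '## <task_id> <title>'.
    """
    lines = backlog_md.splitlines()
    heading = f"## {task_id}"
    for i, line in enumerate(lines):
        # Match '## T-1' or '## T-1 Some title', but not '## T-10 ...'.
        if line == heading or line.startswith(heading + " "):
            lines.insert(i + 1, f"Implementation summary: {summary}")
            return "\n".join(lines) + "\n"
    raise KeyError(f"task {task_id!r} not found in backlog")
```

The next session then starts from the backlog file instead of from the model's non-existent memory of the previous one.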

Calibrating Trust and Handling Errors

You need to recognize that there are tasks where the model will almost certainly generate something very plausible but wrong, even if the task granularity looks reasonable. Typical cases:

Your poorly organized monolith simply doesn't fit in the context window.
Numerous dependencies between heavyweight modules overwhelm the reasoning chain.
Your business domain is so unusual that the model has nothing to draw on - its weights simply don't contain enough relevant knowledge.
The task sits at the frontier of math, physics, or a similar field, where you need to invent something genuinely new. Cryptography, for instance.

The closer your task is to the above, the more careful your review needs to be. In some cases, you're better off writing it by hand. The old-fashioned way.

But sometimes a perfectly reasonable task just doesn't land: the model keeps making the same wrong choice, because the task is framed in a way where the "right" answer looks different to the model than it does to you. Two possibilities here:

The problem is in the formulation. The model will usually tell you what's causing difficulty - if you ask.
The model is doing something unexpected but internally logical. And it may have good reasons. Ask why it made that choice. Sometimes the model sees something you missed.

Task Decomposition and Development Process

The ideas above shape more than just communication style - they dictate how you organize the entire development process. If the model needs a clear, manageable task with meaningful context, then our job is to structure the process so that's exactly what the model gets.

Documentation-Driven Development

Look at the existing methodologies and tools out there, including those designed specifically for AI-assisted development - Spec Kit, for example. Adapt whatever seems reasonable to your project. This might well be a couple of .md files (PRD + backlog) rather than a heavyweight methodology. Your goal is to produce a coherent task graph where each node is comfortable for a model to implement in a single session - whether that means writing specs or code.
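
A task graph does not need special tooling. As a minimal sketch, the backlog can be a mapping from each task to the tasks it depends on, and Python's standard `graphlib` module (3.9+) gives a valid implementation order. The task names and the helper functions here are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical backlog: each task maps to the set of tasks it depends on.
backlog = {
    "prd": set(),
    "schema": {"prd"},
    "api": {"schema"},
    "tests": {"api"},
}


def implementation_order(tasks):
    """Return the tasks in an order where dependencies always come first."""
    return list(TopologicalSorter(tasks).static_order())


def ready_tasks(tasks, done):
    """Return tasks not yet done whose dependencies are all done."""
    return sorted(t for t, deps in tasks.items() if t not in done and deps <= done)


print(implementation_order(backlog))  # ['prd', 'schema', 'api', 'tests']
print(ready_tasks(backlog, {"prd"}))  # ['schema']
```

Each session then picks one task from `ready_tasks`, which keeps the unit of work at "one node, one session".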

Then you work through the graph. Below is an example workflow for a small project.

From Idea to Code

From idea to PRD - together with the model. Don't write the PRD alone. Discuss with the model:
- refining the idea and business requirements,
- architecture,
- documentation structure,
- tech stack and key libraries,
- testing strategy,
- deployment approach.

At the time of writing, Anthropic's state-of-the-art models work best for this stage - they're sharp, responsive, and don't get bogged down in premature details. Token usage at this stage is low, which is helpful given that Anthropic's models aren't cheap.

PRD review. Best done by a different model - ideally the one that will be writing the code. It helps to be explicit about this: "You'll be implementing this spec. Find everything that's unclear, contradictory, or under-specified." A methodical state-of-the-art model from OpenAI tends to catch the inconsistencies left behind by a more free-flowing one from Anthropic. Sometimes you'll need to go back a step - that's fine, and it's far cheaper than fixing architectural mistakes in code.
Task backlog. Again, best created by the model that will be writing the code - and worth telling it so. The idea is that the model will assess complexity better than you and will try not to overload the context window or the reasoning chain. Trust it with granularity, but verify that the task graph stays coherent.
Implementing a backlog task. Always ask the model whether it's comfortable taking on the task. If not - go back and decompose further or refine the formulation. Always ask for an implementation plan. If you're using Test-Driven Development, it makes sense to break it into stages: first plan and write the tests, then plan and write the code, then check whether the tests pass. At the end of the cycle, along with feedback and thanks, it's worth asking the model: how comfortable was this to work on, and what lessons can we take away? Some of those lessons can go into your agents.md or equivalent, at the appropriate level.
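
"Verify that the task graph stays coherent" can be partly automated. The sketch below assumes the backlog has been parsed into a mapping from task name to the set of task names it depends on (a hypothetical representation, as is the function name `check_backlog`). It reports dependencies on tasks that don't exist and dependency cycles, using the standard-library `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter


def check_backlog(tasks):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for task, deps in tasks.items():
        for dep in sorted(deps):
            if dep not in tasks:
                problems.append(f"{task} depends on unknown task {dep}")
    try:
        # prepare() raises CycleError if the dependencies contain a cycle.
        TopologicalSorter(tasks).prepare()
    except CycleError as err:
        # The detected cycle is the second element of the exception's args.
        problems.append("cycle: " + " -> ".join(err.args[1]))
    return problems
```

This only checks structure. Whether each node is actually sized for a single session is still a question for the model and for your own review.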

Naturally, this pipeline doesn't replace an iterative approach: first a quick proof of concept, then an MVP, then nice-to-have features. The development cycle stays the same.

A Final Thought

Not so long ago, by historical standards, every economic model was computed by living, breathing people. Adding machines, paper ledgers - the whole setup. They sat in rows, filling entire floors of office buildings. Each one, roughly speaking, played the role of a single cell in a spreadsheet. They earned a good salary for it. They supported families on a single income. And running a model was something a small business couldn't afford. Neither could a mid-sized one.

And then spreadsheets came along.