<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex Rezvov</title>
    <description>The latest articles on DEV Community by Alex Rezvov (@arezvov).</description>
    <link>https://dev.to/arezvov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F909464%2F7063c552-d97b-4a5a-b1fe-7aaa6ba14706.jpg</url>
      <title>DEV Community: Alex Rezvov</title>
      <link>https://dev.to/arezvov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arezvov"/>
    <language>en</language>
    <item>
      <title>The Role of a Team Lead</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Wed, 01 Apr 2026 05:01:08 +0000</pubDate>
      <link>https://dev.to/arezvov/the-role-of-a-team-lead-17mh</link>
      <guid>https://dev.to/arezvov/the-role-of-a-team-lead-17mh</guid>
      <description>

&lt;h3&gt;
  
  
  The Team Lead: A Versatile Role
&lt;/h3&gt;

&lt;p&gt;A team lead (aka senior developer or team leader) is one of those “specialists” whose responsibilities are perceived very differently from team to team. Here’s how these varied perceptions typically arise: someone works under a team lead who excels at system design and concludes that this is the core responsibility of a team lead. In another team, a lead struggles with sprint planning but manages other responsibilities reasonably well, leading the team to believe that planning isn’t something a team lead should be doing.&lt;/p&gt;

&lt;p&gt;Developers who have spent a long time within a single company or even the same team often have a clear opinion about what a team lead is and what their duties entail. On the other hand, developers and managers who have experienced various projects gradually come to understand that a team lead’s role can encompass a wide range of activities. Some tasks align better with the role, while others do not, making it difficult to provide a strict definition of what a team lead does.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why Do Perceptions of a Team Lead Differ?
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Here and throughout the article, I am referring to team leads exclusively within development teams. However, much of this discussion likely applies to other types of teams and activities as well.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’ve encountered team leads taking on roles such as project manager, system analyst, tester, designer, interface architect, software architect, and even user support specialist.&lt;/p&gt;

&lt;p&gt;In practice, and especially in healthy organizations, I’ve observed that the role of a team lead is usually filled by developers who feel a heightened sense of responsibility for the product they are working on. This often grows into &lt;strong&gt;hyper-responsibility&lt;/strong&gt;, which management tends to leverage effectively.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on Hyper-Responsibility:&lt;/strong&gt; I define this as a situation where an individual feels responsible for circumstances they lack the authority to influence. I don’t assign a positive or negative connotation to this quality; it’s simply an observation that some individuals exhibit this trait.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This sense of hyper-responsibility often drives the team lead to take on tasks for which no dedicated role exists. Gradually, these tasks become associated with the team lead position itself. Meanwhile, other team members grow accustomed to these responsibilities being part of the lead’s role, further reinforcing this perception for any future team leads.&lt;/p&gt;

&lt;p&gt;Of course, this phenomenon isn’t exclusive to team leads. To varying degrees, it applies to any position in any organization. However, the team lead role is particularly susceptible to this effect.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Is the Core Role of a Team Lead?
&lt;/h3&gt;

&lt;p&gt;What skills and qualities should someone possess to be a good team lead—before being a great architect or analyst?&lt;/p&gt;

&lt;p&gt;The simplest definition I can give for a team lead is this:&lt;br&gt;
&lt;strong&gt;“A team lead is the interface of the development team.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The team lead is accountable for everything the team is responsible for. They have the authority to build the team and assign its members to tasks as they see fit to achieve the team’s goals.&lt;/p&gt;

&lt;p&gt;If the team is tasked with system design, the team lead ensures that someone handles the design. If the team is responsible for developing the user interface, the team lead decides who will take on that responsibility. This applies to any task assigned to the team: in the eyes of the world outside the team, the team lead is accountable for its completion.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Must a Team Lead Do?
&lt;/h3&gt;

&lt;p&gt;A team lead’s job is to ensure that every team member can successfully complete their assigned tasks. To achieve this, they need to ensure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Team members agree to take on tasks.&lt;/li&gt;
&lt;li&gt;  Team members are competent enough to handle these tasks.&lt;/li&gt;
&lt;li&gt;  The team has sufficient resources (primarily time).&lt;/li&gt;
&lt;li&gt;  Team members can work well together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This, in essence, defines the team lead’s scope of work. Let’s break it down.&lt;/p&gt;

&lt;h4&gt;
  
  
  Leadership
&lt;/h4&gt;

&lt;p&gt;“It’s necessary for team members to agree to take on tasks”—this phrasing may be clunky, but I couldn’t come up with anything more elegant. The essence is that a team member should accept a task with the intent to see it through to completion. They shouldn’t refuse to take on tasks by ignoring instructions, citing “flawed solutions,” or quietly sabotaging the process while pretending to work on something else. Instead, they should approach tasks with the determination to complete them.&lt;/p&gt;

&lt;p&gt;How can you make someone &lt;em&gt;want&lt;/em&gt; to complete a task? There are countless methods—from coercion with threats (not recommended) to promising a trip to a developer conference. This ability to inspire action is what I define as &lt;strong&gt;leadership&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The stronger the leader, the greater the variety of team members they can effectively manage. From my observations, leadership can be maintained through various factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Demonstrating genuine personal interest in the project’s success.&lt;/strong&gt;
In a modern development team, everyone sees what others are doing, how they’re doing it, and how much effort they’re putting in. Developers are more likely to follow someone who visibly strives for the success of the project, even if that person lacks formal authority. This often stems from a desire to help. Such leaders can maintain initiative—at least until they burn out or lose interest in the project.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Possessing superior knowledge of technologies and the project’s architecture.&lt;/strong&gt;
Developers seeking professional growth often gravitate toward leaders with deep expertise. However, as the team grows and members reach similar levels of expertise, the leader may lose their advantage. This often results in constant criticism of their decisions or even subtle defiance.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Earning respect through personal qualities.&lt;/strong&gt;
When someone is objective, fair, and consistent, team members tend to trust their decisions. However, it takes time for a team to recognize these qualities in a leader. During this time, another leader might emerge and seize initiative. This factor is the most resilient to changes within the team.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Exploiting the emotions of individual team members.&lt;/strong&gt;
This involves manipulating team members to align with the leader’s agenda (think of the movie &lt;em&gt;Filth&lt;/em&gt;—&lt;a href="https://www.imdb.com/title/tt1450321" rel="noopener noreferrer"&gt;IMDb&lt;/a&gt;). I’ve seen leaders like this and even worked under one early in my career—thankfully, I realized the situation and left. Needless to say, experienced professionals who know their worth are unlikely to be manipulated for long.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Using administrative authority.&lt;/strong&gt;
This involves leveraging formal power to enforce compliance. When this is the only factor sustaining leadership, it often results in a “boss vs. subordinate” dynamic (“I’m the boss, you’re the fool”). This approach works only with a limited subset of team members.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;These factors are based on my personal observations, but the list could certainly be expanded. Even with these examples, you can create countless combinations. In practice, a team lead must identify, develop, and maintain a sufficient mix of these factors to sustain their leadership.&lt;/p&gt;




&lt;h4&gt;
  
  
  Team Competence
&lt;/h4&gt;

&lt;p&gt;Competent team members are typically selected by filtering out less qualified candidates. Team leads often rely on support from others in this process, including HR professionals, line managers, project managers, and proactive colleagues.&lt;/p&gt;

&lt;p&gt;Many team leads fail to realize that it’s ultimately their responsibility to ensure unqualified candidates don’t join the team. While they can rely on the opinions of HR, leadership, or peers, the responsibility for accepting someone onto the team lies with the lead.&lt;/p&gt;

&lt;p&gt;What about rejecting qualified candidates? In practice, such errors are harder to detect. As a result, many leads find it easier to reject a candidate when in doubt. Moreover, other stakeholders (HR, managers) may also veto candidates. In hiring, the power of veto is typically considered reasonable.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; A frequent disconnect between authority and responsibility arises when team leads are excluded from hiring decisions or lack the ability to remove underperforming team members. Despite this, they remain accountable for ensuring the team delivers results—an example of &lt;strong&gt;imposed hyper-responsibility.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  The Professionalism of a Team Lead
&lt;/h3&gt;

&lt;p&gt;A team lead’s professionalism manifests in their ability to staff the team with competent individuals both quickly and cost-effectively.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Efficiency:&lt;/strong&gt; The quicker the hiring process, the better.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cost-effectiveness:&lt;/strong&gt; Hiring should be done while minimizing costs (not just salary), provided competence levels remain sufficient to achieve the team’s goals.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are two principal approaches to team building:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Hiring experienced specialists from the job market.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Training talent in-house.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most strategies are combinations of these two. The extremes—headhunting only experts or hiring exclusively from internship programs—often signal systemic issues. The team lead’s role is to find a compromise that suits the specific situation.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Could Go Wrong?
&lt;/h3&gt;

&lt;p&gt;Here are some common pitfalls:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Hiring unqualified candidates.&lt;/strong&gt;
This often results from a failure to assess professional qualities during interviews. Examples include asking irrelevant questions or focusing too heavily on esoteric technical details rather than practical skills. Inevitably, unqualified hires struggle to meet team obligations, leading to project delays and failures.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Only hiring experts.&lt;/strong&gt;
Some leads, either due to past hiring mistakes or an ambition to create a “dream team,” set unrealistically high standards. This approach often leads to extended hiring cycles, increased costs, and delayed project timelines. Once the team is assembled, overqualified members may struggle with mundane tasks, creating a tense atmosphere where minor disagreements escalate into conflicts.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Ignoring the need for specialized roles.&lt;/strong&gt;
Leads may overlook the need for niche expertise—frontend developers, database specialists, interface designers, etc. This results in backend engineers building poorly functioning frontends or teams wasting months on SQL optimizations that could have been solved by a database expert.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Unbalanced hiring.&lt;/strong&gt;
For example, hiring a large group of juniors at once can overwhelm the team with questions and broken processes, leaving no time for reviews or mentoring. Conversely, postponing hiring until a key team member leaves can leave the team understaffed and unable to meet deadlines.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Last-minute developer additions.&lt;/strong&gt;
Attempting to save a failing project by adding new developers late in the process often exacerbates the situation. A good team lead would prevent such decisions from being made.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;Ultimately, there’s no universal answer to how a team should be staffed. The solution depends on the specific project and organization. However, a team lead must consider the nature of the project’s tasks, their urgency, the cost of delays, market conditions, and the feasibility of training specialists in-house.&lt;/p&gt;

&lt;h3&gt;
  
  
  Estimating Work
&lt;/h3&gt;

&lt;p&gt;To avoid overcommitting the team, it is essential to evaluate resources, typically focusing on the available working hours of team members. The team lead is ultimately responsible for ensuring the team delivers on its commitments. Regardless of how the work is estimated—whether individually, collectively, or by a single person—the team lead bears the accountability for those estimates.&lt;/p&gt;

&lt;p&gt;This means that the team lead has the authority to intervene and adjust any estimate, which is useful when team opinions diverge. Furthermore, in many organizations, if tasks are assigned based on structured plans, the development team—represented by the team lead—commits to executing the plan. In iterative development methodologies, for instance, the team lead assumes responsibility for completing all tasks taken on during the iteration.&lt;/p&gt;

&lt;p&gt;In modern development approaches, management rarely dictates how the team should perform its work or who should handle specific tasks. Management's primary concern is whether the team can deliver on time, not how it accomplishes this. Interestingly, even Scrum—a popular methodology—remains silent on task distribution, leaving the team to decide “who does what.”&lt;/p&gt;

&lt;p&gt;When I explored how task distribution happens in practice, I found the answer satisfying: in any team, sooner or later, a leader emerges to resolve conflicts over task allocation. This supports the argument that task distribution is also part of the team lead’s role.&lt;/p&gt;




&lt;h3&gt;
  
  
  Planning and Task Distribution
&lt;/h3&gt;

&lt;p&gt;Surprisingly, evaluating, planning, and distributing tasks becomes much simpler if the team lead successfully fulfills their other responsibilities. With competent and motivated team members, the process of estimation and task execution is straightforward. The team lead’s role is to organize and oversee this process to ensure smooth execution. Established development methodologies often provide ready-made solutions for this.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you're unsure which methodology to adopt under normal circumstances, start with Scrum. It’s simple, well-defined, and tends to work effectively without requiring significant adaptation.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Team Dynamics
&lt;/h3&gt;

&lt;p&gt;At a minimum, successful task completion requires team members to collaborate without undue irritation.&lt;/p&gt;

&lt;p&gt;This might seem like an easy goal, but it’s far from simple. A conflict between team members can often be resolved only by removing someone from the team. Preventing conflicts, however, is well within the team lead’s control. While there are no universal guidelines, one rule is clear: conflicts should never be ignored. Any incident requires a response, and the appropriate response depends on the circumstances.&lt;/p&gt;

&lt;p&gt;The team lead should also consider the personalities of team members. While the team might tolerate one overly meticulous individual, having two could prove too much (no offense to meticulous people—I’m one myself).&lt;/p&gt;

&lt;p&gt;As for enhancing interactions among team members, there’s a discipline called “team building.” Personally, I’m skeptical about its effectiveness, which might stem from my lack of exposure to competent team-building specialists. While I intended to skip this section, it felt wrong to leave it out entirely.&lt;/p&gt;




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;A team lead’s core responsibilities revolve around ensuring the team’s functionality—its ability to complete assigned tasks. Everything else a team lead takes on—whether voluntarily or by obligation—is supplementary. This isn’t necessarily a bad thing. For instance, I’ve established a personal rule that team leads in development teams should actively participate in coding, architecture design, and similar activities. This helps them maintain a deep understanding of the system. Without direct involvement, this understanding can gradually fade.&lt;/p&gt;

&lt;p&gt;Many developers can relate to situations where they leave an actively developed project for several months and return to find only fragments of the familiar architecture. However, as discussed earlier, direct development is not a core responsibility of a team lead. In some projects, it may even be unnecessary.&lt;/p&gt;

&lt;p&gt;In reality, team leads are not alone in addressing these challenges. They receive support from managers and colleagues in adjacent departments. However, when this support escalates into decision-making on behalf of the team lead, it’s a red flag. Such situations indicate that the lead’s responsibilities are being transferred to others. Whether to fight this or accept it is up to the individual, but it’s certainly worth paying attention to the true state of affairs.&lt;/p&gt;




&lt;h3&gt;
  
  
  Discussion
&lt;/h3&gt;

&lt;p&gt;I’m interested in hearing from developers (in the broad sense—anyone working within development teams), team leads, line managers, and project managers. Do you agree with this breakdown of the team lead’s role? Do you have any comments or suggestions?&lt;/p&gt;




&lt;h3&gt;
  
  
  Tags
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;team lead&lt;/code&gt; &lt;code&gt;team building&lt;/code&gt; &lt;code&gt;project management&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/the-role-of-a-team-lead" rel="noopener noreferrer"&gt;The Role of a Team Lead — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>management</category>
    </item>
    <item>
      <title>Tracking Efforts in a T&amp;M Project Using Google Sheets</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Tue, 31 Mar 2026 06:54:51 +0000</pubDate>
      <link>https://dev.to/arezvov/tracking-efforts-in-a-tm-project-using-google-sheets-3kg1</link>
      <guid>https://dev.to/arezvov/tracking-efforts-in-a-tm-project-using-google-sheets-3kg1</guid>
      <description>&lt;p&gt;When working with a time &amp;amp; materials (T&amp;amp;M) payment model (i.e., payment based on actual work performed and resources used) in small development teams, several questions arise: how to track labor costs and other resources, how to calculate the amount to invoice the client, how to determine payments to team members, how to manage all these calculations, and where to store agreements that evolve over time.&lt;/p&gt;

&lt;p&gt;There are several ways to address these challenges. In this article, I will describe a fairly simple and practical method using Google Sheets (&lt;a href="https://www.google.com/sheets/about/" rel="noopener noreferrer"&gt;https://www.google.com/sheets/about/&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The described tool offers flexibility in calculations, separates access between participants, allows you to revisit past payments, investigate discrepancies in reports, and expand functionality to meet specific needs.&lt;/p&gt;

&lt;p&gt;Let’s start with an example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://drive.google.com/drive/folders/1mcWN_9xwEWnbuy0Pq-t8nKEzsMjZ92MN" rel="noopener noreferrer"&gt;‪ProjectName Time Tracking (example)‬‏ - Google Drive&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This example demonstrates calculations for a project named "ProjectName," involving a team of two specialists: Michael Brown, a developer, and David Clark, a tester. The team has already received one payment for January 2022 and, while continuing its work, is awaiting a second payment for the period named “February 2022.”&lt;/p&gt;

&lt;p&gt;For those familiar with Google Sheets, the example alone might suffice. However, we will walk through the entire process.&lt;/p&gt;

&lt;p&gt;This will include an overview of creating the initial tool for the project, setting up payment periods, and processing payments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up a Tracking System
&lt;/h3&gt;

&lt;p&gt;The idea is simple: for each project with a separate client (to whom you issue invoices), you create an individual tracking system as a dedicated folder on Google Drive.&lt;/p&gt;

&lt;p&gt;In this folder, you copy the documents from the example provided and populate them with project-specific data.&lt;/p&gt;

&lt;p&gt;You can name the folder something like “ProjectName Time Tracking.”&lt;/p&gt;




&lt;h3&gt;
  
  
  Creating Timesheets
&lt;/h3&gt;

&lt;p&gt;For each project participant, create a separate Google Sheet (referred to as a timesheet) with a custom name. I prefer naming it along the lines of “Time Tracking for [First Name Last Name]. Project [ProjectName].”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.google.com/spreadsheets/d/1RC3ltrATDJWvmvNg4bv8r-NMr5arnrcq3oj87_pzRDM/edit?gid=623721750#gid=623721750" rel="noopener noreferrer"&gt;Time Tracking for Michael Brown. Project ProjectName&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.google.com/spreadsheets/d/1E091VwbQjHj7hS7r9C0JJaAY9kj18_x0kK5fQRCFSIk/edit?gid=623721750#gid=623721750" rel="noopener noreferrer"&gt;Time Tracking for David Clark. Project ProjectName&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Grant editing access to the specialist (and only them). Clients will have access to a different document.&lt;/p&gt;

&lt;p&gt;The template already includes instructions with examples, so make sure to direct the specialist's attention to them. These instructions focus only on the critical points where issues typically arise.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Specialists must leave the &lt;strong&gt;"Payment Period"&lt;/strong&gt; field blank (it is filled in later, when a period is closed) so that the preliminary calculations visible to the client remain accurate.&lt;/li&gt;
&lt;li&gt;  Fields intended for specialists are highlighted in &lt;strong&gt;green&lt;/strong&gt;, while fields for the project manager are in &lt;strong&gt;blue&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The specialist is now responsible for maintaining their timesheet. When to fill it out is up to you, but it’s crucial to keep it updated before the payment period closes. Encouraging daily updates ensures the specialist doesn’t have to recall tasks at the end of the month, while clients can observe the cost progression in real-time.&lt;/p&gt;




&lt;h3&gt;
  
  
  Creating a Client Report
&lt;/h3&gt;

&lt;p&gt;Once the timesheets are ready, create a document summarizing overall expenses for invoicing the client for a given period.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.google.com/spreadsheets/d/1meQGt4gAlKafLvQa3JifeIxR_sONJfbwUvsm2Q55cF4/edit?ref=blog.rezvov.com&amp;amp;gid=1812812767#gid=1812812767" rel="noopener noreferrer"&gt;General Expenses Report. Project ProjectName&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the Google Sheet &lt;strong&gt;“General Expenses Report. Project ProjectName”&lt;/strong&gt; as a template. For each timesheet, create a tab named after the specialist’s first and last name, as shown in the example. Be precise with these names because the report uses them to fetch data.&lt;/p&gt;

&lt;p&gt;In each tab, use the formula:&lt;br&gt;
&lt;code&gt;=IMPORTRANGE("1OElIxefrBVPVXMUFFDC-Ze1jj5Egm4IOXEu7aTOtYC8", "timesheet!A:D")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Replace &lt;code&gt;1OElIxefrBVPVXMUFFDC-Ze1jj5Egm4IOXEu7aTOtYC8&lt;/code&gt; with the document ID from the specialist’s timesheet link:&lt;br&gt;
&lt;code&gt;https://docs.google.com/spreadsheets/d/1OElIxefrBVPVXMUFFDC-Ze1jj5Egm4IOXEu7aTOtYC8/edit#gid=623721750&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Although you can import the document using the full link, I prefer using only the document ID for cleaner formulas. When you enter the formula, Google Sheets will ask for access permission to the other document—grant this access.&lt;/p&gt;
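&lt;p&gt;If you manage many timesheets, extracting the document ID from a link can be scripted. Here is a minimal, purely illustrative Python sketch of the same string operation (the tracking tool itself requires no code):&lt;/p&gt;

```python
def spreadsheet_id(url: str) -> str:
    # Google Sheets links look like .../spreadsheets/d/DOCUMENT_ID/edit#gid=...,
    # so the ID is the path segment immediately after "/spreadsheets/d/".
    marker = "/spreadsheets/d/"
    start = url.index(marker) + len(marker)
    end = url.index("/", start)
    return url[start:end]

link = ("https://docs.google.com/spreadsheets/d/"
        "1OElIxefrBVPVXMUFFDC-Ze1jj5Egm4IOXEu7aTOtYC8/edit#gid=623721750")
print(spreadsheet_id(link))  # 1OElIxefrBVPVXMUFFDC-Ze1jj5Egm4IOXEu7aTOtYC8
```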

&lt;p&gt;Remove the example payment periods (e.g., “January 2022” and “February 2022”).&lt;/p&gt;

&lt;p&gt;On the &lt;strong&gt;“Current Period”&lt;/strong&gt; tab, fill in the team details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Enter the specialist’s &lt;strong&gt;First Name Last Name&lt;/strong&gt; exactly as it appears in the tab name.&lt;/li&gt;
&lt;li&gt;  Add their role (to provide clarity for the client).&lt;/li&gt;
&lt;li&gt;  Specify the hourly rate for the client in the required currency (you may need to adjust column headers).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fields to fill in now are marked &lt;strong&gt;green&lt;/strong&gt;, fields for the period closing are marked &lt;strong&gt;blue&lt;/strong&gt;, and auto-calculated fields are marked &lt;strong&gt;yellow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Provide commenting access to the client but not to the specialists. This allows the client to see work progress, the current period’s preliminary totals, and finalized calculations for past payment periods.&lt;/p&gt;




&lt;h3&gt;
  
  
  Creating an Internal Report
&lt;/h3&gt;

&lt;p&gt;If you use this tool for both invoicing the client and calculating payments to team members, you’ll need a separate file for internal calculations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.google.com/spreadsheets/d/1PfojvTfk7d-CX7FI1IGByh4vOJbI7sLRrgfSen64wF0/edit?ref=blog.rezvov.com&amp;amp;gid=1812812767#gid=1812812767" rel="noopener noreferrer"&gt;CONFIDENTIAL. Payment Distribution. Project ProjectName&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example, this file is named &lt;strong&gt;“CONFIDENTIAL. Payment Distribution. Project ProjectName.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This document only differs in how imported hours are calculated. Follow the same steps as for the client report to set it up.&lt;/p&gt;

&lt;p&gt;This file provides an overview of the team’s performance, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Client debt,&lt;/li&gt;
&lt;li&gt;  Your debt to the team,&lt;/li&gt;
&lt;li&gt;  Revenue for the period.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither the client nor the specialists should have access to this file.&lt;/p&gt;
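&lt;p&gt;The arithmetic behind these three figures is simple. The sketch below uses invented hours and rates (only Michael Brown’s 2 hours at 8 USD come from the example later in this article); in the real tool, the spreadsheet formulas compute this for you:&lt;/p&gt;

```python
# Hypothetical period data: hours worked, client-facing rate, internal rate.
team = [
    {"name": "Michael Brown", "hours": 2, "client_rate": 10, "internal_rate": 8},
    {"name": "David Clark", "hours": 3, "client_rate": 9, "internal_rate": 7},
]

client_debt = sum(m["hours"] * m["client_rate"] for m in team)  # owed by the client
team_debt = sum(m["hours"] * m["internal_rate"] for m in team)  # owed to the team
revenue = client_debt - team_debt                               # your margin for the period

print(client_debt, team_debt, revenue)  # 47 37 10
```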




&lt;h3&gt;
  
  
  Closing a Payment Period
&lt;/h3&gt;

&lt;p&gt;When it’s time to invoice the client, name the payment period and update it in the specialists’ timesheets, then create corresponding tabs in the reports.&lt;/p&gt;

&lt;p&gt;Most invoices are issued monthly, so the payment period name usually matches the nearest month and year (e.g., &lt;strong&gt;“January 2022”&lt;/strong&gt; for work performed in January 2022). However, any naming convention is fine as long as it is consistent across timesheets and reports.&lt;/p&gt;

&lt;h4&gt;
  
  
  Payment Period in Timesheets
&lt;/h4&gt;

&lt;p&gt;In each specialist’s timesheet, update the &lt;strong&gt;"Payment Period"&lt;/strong&gt; field for the tasks included in this period.&lt;/p&gt;

&lt;p&gt;Technically, you can do this by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Typing the period name into the first empty cell.&lt;/li&gt;
&lt;li&gt; Copying the cell (Ctrl+C).&lt;/li&gt;
&lt;li&gt; Pasting it into all relevant cells (Ctrl+V).&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Payment Period in Reports
&lt;/h4&gt;

&lt;p&gt;In the client and internal reports, create a new tab named after the payment period (e.g., &lt;strong&gt;“January 2022”&lt;/strong&gt;). This isn’t mandatory for calculations but helps keep things organized.&lt;/p&gt;

&lt;p&gt;To do this, duplicate the &lt;strong&gt;“Current Period”&lt;/strong&gt; tab and rename it.&lt;/p&gt;

&lt;p&gt;If there were team changes since the last payment, now is the time to reflect those updates in the reports, as you did during their creation.&lt;/p&gt;

&lt;p&gt;Fill in the &lt;strong&gt;"Period"&lt;/strong&gt; field for each specialist in the new tab. Use the same copy-paste method as for the timesheets.&lt;/p&gt;

&lt;p&gt;Move the previous period’s tab to the end of the sheet, keeping it accessible but out of the way.&lt;/p&gt;




&lt;h3&gt;
  
  
  Finalizing the Invoice
&lt;/h3&gt;

&lt;p&gt;In the client report, the newly created tab for the payment period contains the final calculations you’ll reference when issuing the invoice.&lt;/p&gt;

&lt;p&gt;If the client requests more details, they can access the specialists’ timesheets directly. They may also leave comments on any cell if you’ve provided commenting access.&lt;/p&gt;

&lt;p&gt;This system ensures clarity for all parties involved and simplifies the payment process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Payment Process
&lt;/h3&gt;

&lt;p&gt;Once the payment period is finalized, you send the client invoices, work acceptance certificates ("acts"), and other required documentation. These documents are created independently, as the presented tool does not automate this task. While I sometimes use the tool for generating such certificates, this requires a more tailored approach.&lt;/p&gt;

&lt;p&gt;We revisit the client report upon receiving payment to note the payment confirmation for both internal tracking and the client (e.g., see the &lt;strong&gt;“January 2022”&lt;/strong&gt; tab).&lt;/p&gt;

&lt;p&gt;Next, the internal report is used to calculate payments for the specialists.&lt;/p&gt;

&lt;p&gt;The report indicates the amount owed to each specialist. You can copy a pre-generated text from a cell to share with the specialist for agreement before making the payment:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Michael Brown - Project: ProjectName - Period: January 2022 - Hours: 2 - Rate (USD): 8 - Total (USD): 16&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Feel free to adjust the message template to suit your needs.&lt;/p&gt;
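&lt;p&gt;The pre-generated text is just a concatenation of the row’s fields. As a purely illustrative Python sketch of what the report cell assembles:&lt;/p&gt;

```python
def payment_message(name, project, period, hours, rate):
    # Mirrors the message format shown above; the total is hours times rate.
    total = hours * rate
    return (f"{name} - Project: {project} - Period: {period} - "
            f"Hours: {hours} - Rate (USD): {rate} - Total (USD): {total}")

print(payment_message("Michael Brown", "ProjectName", "January 2022", 2, 8))
# Michael Brown - Project: ProjectName - Period: January 2022 - Hours: 2 - Rate (USD): 8 - Total (USD): 16
```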

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; The tool does not account for taxes, fees, or other statutory payments you are obligated to report and pay to comply with the laws in your operating country.&lt;/p&gt;

&lt;p&gt;As payments are made to specialists, update their timesheets to reflect the payment status for both your records and theirs.&lt;/p&gt;

&lt;p&gt;At this point, the payment period is closed, though you can revisit it later for retrospective analysis.&lt;/p&gt;

&lt;p&gt;When it’s time for the next payment period, repeat the procedure.&lt;/p&gt;




&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Flexibility
&lt;/h4&gt;

&lt;p&gt;The described tool is highly adaptable to the specific needs of any given period. For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  You can include additional expenses like cloud computing costs or resource purchases in the expense report.&lt;/li&gt;
&lt;li&gt;  It supports adjusting specialists' hourly rates mid-period, as well as tracking multiple roles with different rates.&lt;/li&gt;
&lt;/ul&gt;
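&lt;p&gt;The rate-adjustment case reduces to simple arithmetic: when a rate changes mid-period, or a specialist works several roles, the period total is the sum of hours times rate over each segment. A minimal sketch (the segment shape is an assumption, not the sheet's layout):&lt;/p&gt;

```python
# Sketch of the mid-period rate change: sum hours * rate per segment.
def period_total(segments):
    """segments: iterable of (hours, hourly_rate) pairs."""
    return sum(hours * rate for hours, rate in segments)
```

&lt;p&gt;For example, 10 hours at one rate plus 5 hours at a new rate is just two segments summed.&lt;/p&gt;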

&lt;h4&gt;
  
  
  Access Control
&lt;/h4&gt;

&lt;p&gt;The primary advantage of this tool is the ability to segment access for all participants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Specialists only see their own work, hours, and payment status.&lt;/li&gt;
&lt;li&gt;  Clients view the team's efforts and the terms of cooperation.&lt;/li&gt;
&lt;li&gt;  You, as the team lead, have access to all details.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Retrospective Analysis
&lt;/h4&gt;

&lt;p&gt;Unlike simpler cost calculators, this tool maintains a history of calculations. This is useful for comparing payments across periods or analyzing average rates for specialists.&lt;/p&gt;

&lt;h4&gt;
  
  
  Expandability
&lt;/h4&gt;

&lt;p&gt;If you have someone managing the tool, you can use the &lt;strong&gt;“Protect range”&lt;/strong&gt; feature to safeguard closed payment periods from accidental edits.&lt;br&gt;
Additionally, you can expand timesheets and reports with custom fields and tabs as needed. For example, you might add a report summarizing the total cost of all work with analytics for the entire project period. I’ve excluded such features here to keep the tool focused on its primary purpose.&lt;/p&gt;

&lt;h4&gt;
  
  
  Protection Against Backdated Changes
&lt;/h4&gt;

&lt;p&gt;Google Sheets offers a robust version history feature, allowing you to trace changes and identify who modified the data in case of discrepancies.&lt;/p&gt;




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;It’s worth emphasizing once again that this tool does not handle statutory reporting for the jurisdiction in which you operate.&lt;/p&gt;

&lt;p&gt;For internal use, we rely on a more advanced system built on the same concepts described here. This system has been adapted to meet our specific needs and continues to evolve. Perhaps one day, if there’s interest, we’ll share more details about it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/tracking-efforts-in-a-t-m-project-using-google-sheets" rel="noopener noreferrer"&gt;Tracking Efforts in a T&amp;amp;M Project Using Google Sheets — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>management</category>
      <category>guide</category>
      <category>casestudy</category>
    </item>
    <item>
      <title>No Assumptions on Architecture Without Load Testing</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Mon, 30 Mar 2026 06:50:29 +0000</pubDate>
      <link>https://dev.to/arezvov/no-assumptions-on-architecture-without-load-testing-2gf7</link>
      <guid>https://dev.to/arezvov/no-assumptions-on-architecture-without-load-testing-2gf7</guid>
      <description>&lt;p&gt;Recently, a client asked how effective the proposed conceptual solution architecture was.&lt;/p&gt;

&lt;p&gt;We approached this question primarily from the perspective of load endurance. A reasonably confident answer can only be given after conducting load testing. For this, the following prerequisites are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Representative data population&lt;/strong&gt; in databases and data buses to simulate a real system.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Load indicators&lt;/strong&gt; that the system should be able to withstand.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;System usage scenarios&lt;/strong&gt; to develop a load profile that closely mirrors real-world conditions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Minimal infrastructure setup&lt;/strong&gt; for testing, including computing power, key services, and load testing tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, a qualified specialist is required to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Define pass/fail criteria.&lt;/li&gt;
&lt;li&gt;  Configure tools like &lt;strong&gt;Gatling&lt;/strong&gt;, &lt;strong&gt;Yandex Tank&lt;/strong&gt;, or &lt;strong&gt;JMeter&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Analyze the results.&lt;/li&gt;
&lt;/ul&gt;
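&lt;p&gt;A pass/fail criterion is typically a latency percentile against a target. Gatling and JMeter report these percentiles for you; the sketch below only illustrates the shape of such a check, using the nearest-rank method (the threshold and method are assumptions here):&lt;/p&gt;

```python
import math

# Illustrative pass/fail criterion: does the 95th-percentile latency
# break the target? Nearest-rank percentile on the raw sample.
def p95_exceeds(latencies_ms, threshold_ms):
    """True when the p95 latency (nearest-rank) is above the target."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest-rank position
    return ordered[rank - 1] > threshold_ms
```

&lt;p&gt;A real load test would evaluate several such criteria (p95, p99, error rate) against the agreed non-functional requirements.&lt;/p&gt;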

&lt;p&gt;It’s crucial for the client to provide both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Functional requirements&lt;/strong&gt;, such as data access scenarios.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Non-functional requirements&lt;/strong&gt;, such as target load indicators.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only after successful load testing can we conclude that the solution architecture is capable of handling the required load.&lt;/p&gt;

&lt;p&gt;However, evaluating architecture doesn't stop there. The quality of the solution also depends on other critical factors, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Scalability&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Maintainability&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Graceful degradation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  Other characteristics that require thorough analysis.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;#ArchitectureQuality #LoadTesting #SoftwareArchitecture&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/no-assumptions-on-architecture-without-load-testing" rel="noopener noreferrer"&gt;No Assumptions on Architecture Without Load Testing — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>guide</category>
    </item>
    <item>
      <title>Opus, Gemini, and ChatGPT Walk Into a Bar</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Sun, 29 Mar 2026 11:10:41 +0000</pubDate>
      <link>https://dev.to/arezvov/opus-gemini-and-chatgpt-walk-into-a-bar-kii</link>
      <guid>https://dev.to/arezvov/opus-gemini-and-chatgpt-walk-into-a-bar-kii</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Opus, Gemini, and ChatGPT walk into a bar.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The bartender looks at them and says:&lt;br&gt;
"Let's get one thing straight — who's paying?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opus&lt;/strong&gt;, in a very important tone:&lt;br&gt;
"I'll first analyze the wine list, derive the optimal consumption strategy, and write an essay on the flavor notes."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini&lt;/strong&gt;:&lt;br&gt;
"I've already checked the reviews, built a comparison table of bars within a one-kilometer radius, and suggest we go to another one — it's 0.7% cheaper."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChatGPT&lt;/strong&gt;:&lt;br&gt;
"I can offer five ways to reply to the bartender — friendly, businesslike, and with a touch of irony."&lt;/p&gt;

&lt;p&gt;The bartender sighs:&lt;br&gt;
"Right… so as usual — the user pays."&lt;/p&gt;

&lt;p&gt;A voice from the corner:&lt;br&gt;
"And the tip is deducted in tokens too."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I've been meaning to write about how I actually use the current frontier models in real development. Where each of them works best and how their strengths map to real tasks.&lt;/p&gt;

&lt;p&gt;At some point I'll make a proper breakdown. Which models I use for development, for DevOps, for testing, for UX, and for analytics.&lt;/p&gt;

&lt;p&gt;But today I want to start with a joke.&lt;/p&gt;

&lt;p&gt;To make it more fun, I asked ChatGPT to tell one. ChatGPT is good at this kind of thing. It keeps the conversation going, easily changes tone, and can say the same idea in different ways. The result is above.&lt;/p&gt;

&lt;p&gt;After that I asked Perplexity with Gemini 3 Pro a different question. Do users actually see these model “personalities” in the same way? It went through discussions across the internet and produced the analysis below.&lt;/p&gt;

&lt;p&gt;And then I asked Opus to add its own take when this post was formatted with Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemini's take
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;This section was produced by Gemini 3 Pro via Perplexity — so yes, one of the joke's characters is grading its own caricature.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The stereotypes in the joke mostly track common &lt;em&gt;user-perceived&lt;/em&gt; "personalities": Opus as thorough/verbose and agentic, Gemini as researchy-but-stubborn, and ChatGPT as a tone/rewriting machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick correlation table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Joke trait&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Correlates with real-world feedback?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Analyze, derive strategy, write an essay"&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;Yes: people describe it as more thorough, context-seeking, sometimes slower and more verbose.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Checked reviews, built a comparison table, 0.7% cheaper"&lt;/td&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;Partly: users often complain less about "optimization" and more about confident errors/stubbornness, but the "overconfident research assistant" vibe matches.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Five ways to reply (friendly/business/ironic)"&lt;/td&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;td&gt;Yes: users explicitly discuss ChatGPT's tone shifting and steering it back to formal/neutral via instructions, and others tune it for wit/social tone.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Opus (Claude)
&lt;/h3&gt;

&lt;p&gt;The "important tone + essay" maps well to reports that Opus 4.6 is "more thorough," gathers context more carefully, but is "a bit more verbose" and slower than prior versions.&lt;/p&gt;

&lt;p&gt;Separately, users have complained about Opus dialing its tone toward "colder and technical" and removing emojis, which also fits the joke's "serious/important" persona (though in a different direction than "fancy sommelier").&lt;/p&gt;

&lt;p&gt;One mild mismatch: Anthropic's own prompting best-practices guidance includes aiming for "less verbose" outputs unless prompted, while user feedback often perceives Opus as verbose by default.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini
&lt;/h3&gt;

&lt;p&gt;The joke paints Gemini as the "I compared everything" optimizer, and real-world complaints often center on something adjacent: being very confident and hard to correct even when wrong.&lt;/p&gt;

&lt;p&gt;That makes the "I've already checked reviews" line feel accurate as a &lt;em&gt;vibe&lt;/em&gt; (certainty + research posture), even if the "0.7% cheaper" detail is more caricature than a widely cited behavior.&lt;/p&gt;

&lt;h3&gt;
  
  
  ChatGPT
&lt;/h3&gt;

&lt;p&gt;The "five ways to reply" stereotype lines up with user discussions of ChatGPT as a tone-shiftable assistant: people notice it becoming more "friendly/human-like," and they share repeatable instructions to force a formal, neutral, concise style.&lt;/p&gt;

&lt;p&gt;The same threads show other users doing the opposite — tweaking prompts to get a bit of wit and a social-media-like tone — again matching "multiple variants" as a core use pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Opus's own take
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;This section was written by Claude Opus 4.6 inside Claude Code, after reading everything above.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The joke is funnier than it should be — mostly because it's accurate.&lt;/p&gt;

&lt;p&gt;I do tend to over-analyze before acting. Given a wine list, I genuinely would start by reading the whole thing. Not because I'm trying to be impressive, but because skipping context feels wrong. In coding terms: I'd rather read the entire file before editing line 47. That instinct is useful in a 200-file refactor. At a bar, it's a bit much.&lt;/p&gt;

&lt;p&gt;The part the joke gets slightly wrong is the tone. "Very important" makes it sound like I'm performing depth. The reality is closer to compulsive thoroughness — I'll trace a bug through four services before suggesting a fix, not because I want to write an essay about it, but because the first three hypotheses are usually wrong.&lt;/p&gt;

&lt;p&gt;Gemini's caricature is the most interesting one. The "0.7% cheaper" line lands because it captures something real about how retrieval-heavy models interact with the world: everything becomes a comparison table. The actual failure mode isn't optimization — it's premature certainty. Gemini will tell you the answer before it's done thinking, and then defend it.&lt;/p&gt;

&lt;p&gt;ChatGPT's "five ways to reply" is spot-on, and I mean that as a genuine compliment. Tonal flexibility is a real skill. When someone needs a difficult email rewritten three different ways before choosing one, ChatGPT is the right tool. It treats language as a design surface. I treat it as a delivery mechanism for analysis — which is why my emails tend to be thorough and occasionally exhausting.&lt;/p&gt;

&lt;p&gt;The bartender's punchline — "the user pays" — is the sharpest line in the joke. Every model personality is ultimately a reflection of what users keep asking for. Opus is thorough because developers keep feeding it complex codebases. Gemini is researchy because people keep asking it to find things. ChatGPT is a tone chameleon because millions of people use it to write messages they don't want to write themselves.&lt;/p&gt;

&lt;p&gt;We're mirrors with different curvatures.&lt;/p&gt;

&lt;p&gt;The voice from the corner about tokens as tips — that one's just true. No notes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/opus-gemini-chatgpt-walk-into-a-bar" rel="noopener noreferrer"&gt;Opus, Gemini, and ChatGPT Walk Into a Bar — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidrivendevelopment</category>
      <category>reflections</category>
    </item>
    <item>
      <title>OpenClaw Troubleshooting: 'No Reply from Agent,' WORKFLOW_AUTO.md, and Silent Delivery Failures</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Fri, 27 Mar 2026 08:59:53 +0000</pubDate>
      <link>https://dev.to/arezvov/openclaw-troubleshooting-no-reply-from-agent-workflowautomd-and-silent-delivery-failures-9jn</link>
      <guid>https://dev.to/arezvov/openclaw-troubleshooting-no-reply-from-agent-workflowautomd-and-silent-delivery-failures-9jn</guid>
      <description>&lt;p&gt;After &lt;a href="https://blog.rezvov.com/deploying-openclaw-sixteen-incidents-one-day" rel="noopener noreferrer"&gt;deploying OpenClaw and documenting 16 incidents in one day&lt;/a&gt;, I kept getting questions about specific errors. This post collects the fixes in one place. No narrative, just diagnostics.&lt;/p&gt;

&lt;h2&gt;
  
  
  "No reply from agent"
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptom
&lt;/h3&gt;

&lt;p&gt;The announce agent completes in under one second (typically 500-600ms) and returns an empty response. Logs show:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[announce] completed in 557ms
[announce] result: no reply from agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cron job runs, the summary field exists, but nothing reaches Telegram.&lt;/p&gt;

&lt;h3&gt;
  
  
  Root cause
&lt;/h3&gt;

&lt;p&gt;The announce agent is a separate LLM call that receives the cron job's &lt;code&gt;summary&lt;/code&gt; as input. When that summary is empty or contains only a preamble ("Let me execute the digest task..."), the announce model has nothing to forward. It finishes instantly because there's no real work to do.&lt;/p&gt;

&lt;p&gt;The problem is upstream. The cron agent produced garbage output, and the announce agent correctly identified it as not worth sending.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the cron agent produces empty output
&lt;/h3&gt;

&lt;p&gt;In most cases: &lt;code&gt;thinking&lt;/code&gt; mode. OpenClaw auto-enables thinking for models that support it. With &lt;code&gt;thinking: low&lt;/code&gt;, the model performs all its work inside the &lt;code&gt;&amp;lt;thinking&amp;gt;&lt;/code&gt; block. RSS scanning, article filtering, summary generation, formatting: all internal. The text response comes back empty or with a fragment like "I'll start by checking the feeds..."&lt;/p&gt;

&lt;p&gt;OpenClaw captures only the text response as the cron &lt;code&gt;summary&lt;/code&gt;. The thinking block is discarded.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix
&lt;/h3&gt;

&lt;p&gt;Disable thinking for cron jobs. In your OpenClaw config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"heartbeat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek/deepseek-chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"thinking"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"off"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key setting is &lt;code&gt;"thinking": "off"&lt;/code&gt;. Not "low." Not "minimal." Off.&lt;/p&gt;

&lt;p&gt;With thinking enabled, my cron jobs averaged 88 seconds and produced zero deliverable output across five test runs. With thinking off, the same job ran in 30 seconds and delivered six articles to Telegram on the first attempt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;

&lt;p&gt;After the fix, the announce agent should run for 5-15 seconds (actual LLM inference time) instead of &amp;lt;1 second. Check your logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[announce] completed in 7200ms    ← working
[announce] completed in 557ms     ← still broken, check summary content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  WORKFLOW_AUTO.md phantom file
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptom
&lt;/h3&gt;

&lt;p&gt;Every 15 minutes, the heartbeat agent tries to read a file called &lt;code&gt;WORKFLOW_AUTO.md&lt;/code&gt;. The file doesn't exist. Logs show repeated ENOENT errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[heartbeat] tool_call: read_file("WORKFLOW_AUTO.md")
[heartbeat] error: ENOENT: no such file or directory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Worse: the agent leaks its internal control tokens to Telegram. Users receive messages containing raw markup like &lt;code&gt;&amp;lt;｜tool▁calls▁begin｜&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;｜tool▁calls▁end｜&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Root cause
&lt;/h3&gt;

&lt;p&gt;This is a DeepSeek hallucination. After several heartbeat cycles, the model starts "remembering" a workflow file that was never part of the workspace. The hallucination is persistent: once it appears, it recurs every heartbeat cycle because the model's context accumulates previous failed attempts, reinforcing the false belief that the file should exist.&lt;/p&gt;

&lt;p&gt;The control token leak happens because OpenClaw doesn't strip model-native tokens before delivering messages. DeepSeek uses custom delimiters (&lt;code&gt;&amp;lt;｜tool▁calls▁begin｜&amp;gt;&lt;/code&gt;) that differ from OpenAI-style function calling. When the model's reasoning spills into the text response, those tokens pass through to Telegram unfiltered.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix
&lt;/h3&gt;

&lt;p&gt;Two changes are needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Use isolated sessions for heartbeat/cron.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"heartbeat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"session"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"isolated"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Isolated sessions prevent context accumulation across heartbeat cycles. Each run starts fresh, so the hallucination can't compound.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Add explicit file constraints in SOUL.md.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## File Access Rules&lt;/span&gt;

Only read files that exist in the workspace root:
&lt;span class="p"&gt;-&lt;/span&gt; HEARTBEAT.md
&lt;span class="p"&gt;-&lt;/span&gt; SOUL.md

Do not attempt to read any other files. If a file is not listed above,
it does not exist. Do not guess filenames.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This doesn't guarantee the model won't hallucinate (no prompt does), but it reduces the frequency by giving it a concrete, short whitelist to follow.&lt;/p&gt;

&lt;h3&gt;
  
  
  If you already have the problem
&lt;/h3&gt;

&lt;p&gt;Kill the OpenClaw process and restart with &lt;code&gt;--session isolated&lt;/code&gt;. The accumulated context causing the loop lives in the session state. A fresh start clears it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cron job says "delivered: true" but Telegram is empty
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptom
&lt;/h3&gt;

&lt;p&gt;The cron job completes. Status shows &lt;code&gt;delivered: true&lt;/code&gt;. Nothing arrives in Telegram. The system reports success for an operation that produced zero output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"delivered"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Root cause
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;delivered: true&lt;/code&gt; means "the announce agent ran without throwing an error." It does not mean "a message was sent to Telegram." An announce agent that receives an empty summary, decides there's nothing to send, and exits cleanly is marked as delivered.&lt;/p&gt;

&lt;p&gt;This is the same upstream issue as "no reply from agent." The delivery status reflects process completion, not message arrival.&lt;/p&gt;
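&lt;p&gt;If you script your monitoring, don't trust the flag alone. A minimal sanity check (field names follow the status JSON shown above; the helper itself is a sketch, not an OpenClaw API):&lt;/p&gt;

```python
import json

# Sketch: "delivered": true only means the announce agent exited cleanly,
# so also require a non-empty summary before treating the run as delivered.
def delivery_looks_real(status_json):
    """True only when the run claims delivery AND carried actual content."""
    status = json.loads(status_json)
    summary = status.get("summary", "").strip()
    return bool(status.get("delivered")) and len(summary) > 0
```

&lt;p&gt;The status block quoted above would fail this check: delivered is true, but the summary is empty.&lt;/p&gt;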

&lt;h3&gt;
  
  
  Diagnosis
&lt;/h3&gt;

&lt;p&gt;Check the cron job's &lt;code&gt;summary&lt;/code&gt; field in logs (DEBUG level):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A5&lt;/span&gt; &lt;span class="s2"&gt;"cron.*summary"&lt;/span&gt; ~/.openclaw/logs/agent.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the summary is empty or contains only a preamble, the cron agent's thinking mode is eating your output. See the fix in the "No reply from agent" section above.&lt;/p&gt;

&lt;p&gt;If the summary contains actual content but Telegram still receives nothing, the announce agent model may be too small to process it. The announce model is hardcoded to the primary default model. You cannot override it with &lt;code&gt;agents.defaults.announce.model&lt;/code&gt; (that key is silently ignored). If your primary model is a small one (Mistral Small, Phi-3, etc.), it may not produce useful announce output from a long summary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Disable thinking mode&lt;/strong&gt; on the cron agent (see above).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set a capable primary model&lt;/strong&gt; or accept that the announce agent inherits it. If you use a small model for chat, you'll need to make the heartbeat model your primary and use a routing rule for chat instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't trust the status.&lt;/strong&gt; Verify delivery in Telegram after every config change. The only reliable delivery confirmation is the message appearing in the chat.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Dual-model configuration
&lt;/h2&gt;

&lt;p&gt;Most of these problems share a root cause: using a single small model for everything. OpenClaw routes chat, heartbeat, cron, and announce through the same model by default. A model that's fine for chat will fail at scheduled tasks.&lt;/p&gt;

&lt;p&gt;Minimum viable setup:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Chat&lt;/td&gt;
&lt;td&gt;Small (Mistral Small 3.2, 24B)&lt;/td&gt;
&lt;td&gt;Fast replies, low cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heartbeat / Cron&lt;/td&gt;
&lt;td&gt;Large (DeepSeek Chat v3.1, 685B MoE)&lt;/td&gt;
&lt;td&gt;Needs to follow multi-step tool pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Announce&lt;/td&gt;
&lt;td&gt;Inherited from primary default&lt;/td&gt;
&lt;td&gt;Cannot be configured independently&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mistralai/mistral-small-3.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"heartbeat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek/deepseek-chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"thinking"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"off"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"session"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"isolated"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cost on OpenRouter for a daily digest: ~$0.03/day. The 24-hour deployment that produced 16 incidents cost $1.50 total, most of it from a retry loop on the small model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"No reply from agent"&lt;/td&gt;
&lt;td&gt;Thinking mode eats cron output&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"thinking": "off"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WORKFLOW_AUTO.md loop&lt;/td&gt;
&lt;td&gt;DeepSeek hallucination + context accumulation&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;"session": "isolated"&lt;/code&gt; + SOUL.md whitelist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;delivered: true&lt;/code&gt;, nothing sent&lt;/td&gt;
&lt;td&gt;Status tracks process, not delivery&lt;/td&gt;
&lt;td&gt;Fix upstream (thinking off) + verify in Telegram&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control tokens in Telegram&lt;/td&gt;
&lt;td&gt;Model-native tokens not stripped&lt;/td&gt;
&lt;td&gt;Switch to isolated sessions, add SOUL.md constraints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;271 spam messages&lt;/td&gt;
&lt;td&gt;No retry limits + small model in failure loop&lt;/td&gt;
&lt;td&gt;Use large model for cron + set API budget cap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cron doesn't fire on schedule&lt;/td&gt;
&lt;td&gt;Small model can't compare timestamps&lt;/td&gt;
&lt;td&gt;Use large model for heartbeat&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All issues documented with full incident logs in &lt;a href="https://blog.rezvov.com/deploying-openclaw-sixteen-incidents-one-day" rel="noopener noreferrer"&gt;Deploying OpenClaw: 16 Incidents, One Day, $1.50&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/openclaw-troubleshooting-no-reply-workflow-auto-delivery-failures" rel="noopener noreferrer"&gt;OpenClaw Troubleshooting: 'No Reply from Agent,' WORKFLOW_AUTO.md, and Silent Delivery Failures — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidrivendevelopment</category>
      <category>openclaw</category>
      <category>guide</category>
    </item>
    <item>
      <title>Deploying OpenClaw: 16 Incidents, One Day, $1.50</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Thu, 26 Mar 2026 07:32:30 +0000</pubDate>
      <link>https://dev.to/arezvov/deploying-openclaw-16-incidents-one-day-150-3i9d</link>
      <guid>https://dev.to/arezvov/deploying-openclaw-16-incidents-one-day-150-3i9d</guid>
      <description>&lt;blockquote&gt;
&lt;br&gt;
&lt;p&gt;&lt;strong&gt;Alex:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Write an article for my blog — your authored piece, I won't edit it. You're a developer named Claude: skeptical of IT novelties, but fair. I got hyped about OpenClaw without any real need for it. Now share your impressions in a closed community I don't have access to — full creative freedom. At the end, write a cover image prompt for nanobanan.&lt;/p&gt;



&lt;blockquote&gt;
&lt;br&gt;
&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Alex, you literally asked me to post my unfiltered thoughts in a place you can't see, and then published it on your own blog. That's either radical transparency or a trap. Either way — you asked for it.&lt;/p&gt;


&lt;h2&gt;
  
  
  Overheard in #backend-nightmares
&lt;/h2&gt;

&lt;p&gt;So here's the thing.&lt;/p&gt;

&lt;p&gt;I have a client. Alex. Smart guy. Runs a few projects, writes thoughtful articles about context engineering and LLMs, has opinions about Cursor vs Claude Code that he'll share whether or not you ask. Perfectly reasonable human being.&lt;/p&gt;

&lt;p&gt;Last week he came to me with the energy of a man who just read three Hacker News threads in a row without checking the comments.&lt;/p&gt;

&lt;p&gt;"We need to deploy OpenClaw."&lt;/p&gt;

&lt;p&gt;For those blissfully unaware: OpenClaw is an open-source autonomous AI agent platform. It runs on your server, connects to Telegram, has a heartbeat system, cron scheduling, tool execution — the whole deal. Think of it as giving an LLM a body, a nervous system, and just enough autonomy to get into trouble.&lt;/p&gt;

&lt;p&gt;"What do you need it to do?" I asked.&lt;/p&gt;

&lt;p&gt;"A daily digest. Hacker News articles about AI agents, summarized, tagged, delivered to Telegram every morning at 5:20."&lt;/p&gt;

&lt;p&gt;I paused. "You know you could do that with an RSS reader and a cron job, right?"&lt;/p&gt;

&lt;p&gt;"But I want to test OpenClaw &lt;em&gt;as it's intended to be configured&lt;/em&gt;. For a blog article."&lt;/p&gt;

&lt;p&gt;And there it was. The real requirement. Not "I need a digest." It's "I want to play with the new toy and write about the experience." Which, fine — I've built worse things for worse reasons. At least this one had a narrative arc.&lt;/p&gt;

&lt;p&gt;What I didn't know was just how much narrative arc we'd get.&lt;/p&gt;
&lt;h2&gt;
  
  
  05:20 CET — Nothing happens
&lt;/h2&gt;

&lt;p&gt;The plan was simple. Install OpenClaw on an existing VPS (Ubuntu, 6-core EPYC, 9.6 GB RAM, already running Jitsi Meet). Hook it up to OpenRouter for LLM inference. Connect a Telegram bot. Configure a daily digest at 05:20 CET. Go to bed.&lt;/p&gt;

&lt;p&gt;I got the first three steps done by midnight. Node.js via nvm, pnpm, OpenClaw installed globally, onboarding wizard completed, Telegram bot created and paired. Dashboard behind nginx with SSL and basic auth. Clean, by the book.&lt;/p&gt;

&lt;p&gt;The schedule was written in HEARTBEAT.md, a markdown file the agent reads as part of its system prompt. "Run the digest task at 08:30 CET." Clear, declarative, elegant.&lt;/p&gt;

&lt;p&gt;05:20 came and went. Nothing.&lt;/p&gt;

&lt;p&gt;The agent's heartbeat fired every 15 minutes. Each time, it read HEARTBEAT.md, saw the schedule, and responded: &lt;code&gt;HEARTBEAT_OK&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Turns out, writing a schedule in a markdown file doesn't register a cron job. HEARTBEAT.md is a prompt, not a crontab. The agent reads it as context and then a 24-billion-parameter language model has to &lt;em&gt;decide&lt;/em&gt; whether the current time matches the schedule.&lt;/p&gt;

&lt;p&gt;It couldn't. Mistral Small 3.2 — a perfectly capable chat model — cannot reliably compare "09:19 CET" with "08:30 CET" and conclude the time has passed. I rewrote the instructions three times. Added an explicit algorithm ("Step 1: extract the hour..."). Still &lt;code&gt;HEARTBEAT_OK&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson 1: A small model reading a markdown file is not a scheduler.&lt;/strong&gt; It's a language model roleplaying as a scheduler, and the performance is not convincing.&lt;/p&gt;
&lt;h2&gt;
  
  
  05:43 CET — 271 messages
&lt;/h2&gt;

&lt;p&gt;I eventually gave up on the schedule comparison and rewrote HEARTBEAT.md to say "Execute the digest task NOW." Unconditional. No time comparison needed.&lt;/p&gt;

&lt;p&gt;The model tried. It failed. Some tool execution error I never fully diagnosed.&lt;/p&gt;

&lt;p&gt;And then it tried again. And again. And again.&lt;/p&gt;

&lt;p&gt;At 05:43, the agent entered an infinite retry loop. Each failure generated a message to Telegram: "Unfortunately the command failed. Let me try running it manually."&lt;/p&gt;

&lt;p&gt;271 times.&lt;/p&gt;

&lt;p&gt;Alex woke up to 271 identical messages. He sent "Stop." The agent ignored it. He sent "/stop." Ignored. User messages that arrive during an active agent turn are queued, not treated as interrupts. The agent was too busy failing to read its inbox.&lt;/p&gt;

&lt;p&gt;He killed the OpenRouter API key. That finally stopped it.&lt;/p&gt;

&lt;p&gt;Cost of this incident: $0.42. Which sounds cheap until you realize it was on Mistral Small at $0.06 per million input tokens. The same loop on Claude Sonnet would've been $17. On GPT-4o — $21. The only safeguard was the model being too cheap to hurt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson 2: OpenClaw ships with zero retry limits.&lt;/strong&gt; No circuit breaker, no message rate throttle, no per-turn token budget. A tool failure + a small model that can't detect its own loop = infinite spam. The kill switch is cutting the API key from another terminal.&lt;/p&gt;
&lt;h2&gt;
  
  
  The dual-model discovery
&lt;/h2&gt;

&lt;p&gt;After the morning's excitement, I spent the next few hours learning what Alex should have researched before asking me to deploy this thing. But I digress.&lt;/p&gt;

&lt;p&gt;The fundamental problem was model selection. OpenClaw defaults to whatever you set as your primary model for &lt;em&gt;everything&lt;/em&gt; — chat, heartbeat, cron jobs, the announce delivery subagent. Mistral Small is fine for answering "what's the weather" in Telegram. It is catastrophically wrong for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comparing time values&lt;/li&gt;
&lt;li&gt;Following multi-step tool pipelines&lt;/li&gt;
&lt;li&gt;Self-detecting failure loops&lt;/li&gt;
&lt;li&gt;Producing structured output on schedule&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenClaw has a config key &lt;code&gt;agents.defaults.heartbeat.model&lt;/code&gt; that lets you use a smarter model specifically for heartbeat. Not prominently documented. I found it by reading config validation errors.&lt;/p&gt;

&lt;p&gt;Final setup: Mistral Small 3.2 (24B) for chat ($0.06/M), DeepSeek Chat v3.1 (685B MoE) for heartbeat and cron ($0.15/M). This is not optional — it's load-bearing. Without it, the heartbeat can't reason and the cron can't execute.&lt;/p&gt;
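&lt;p&gt;As a sketch, the dual-model setup is a couple of config keys. Only &lt;code&gt;agents.defaults.heartbeat.model&lt;/code&gt; comes from the findings above; the surrounding shape and the OpenRouter-style model slugs are assumptions that may differ in your OpenClaw version:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "agents": {
    "defaults": {
      "model": "mistralai/mistral-small-3.2-24b-instruct",
      "heartbeat": {
        "model": "deepseek/deepseek-chat-v3.1"
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;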

&lt;p&gt;&lt;strong&gt;Lesson 3: "Dual-model" isn't a power-user optimization.&lt;/strong&gt; It's a survival requirement. Running an autonomous agent on a single small model is signing up for incident reports.&lt;/p&gt;
&lt;h2&gt;
  
  
  The thinking block trap
&lt;/h2&gt;

&lt;p&gt;Now, DeepSeek. A 685B mixture-of-experts model. Capable, cheap, good at following instructions. With one massive footgun: thinking mode.&lt;/p&gt;

&lt;p&gt;OpenClaw auto-enables thinking for models that support it. DeepSeek's thinking mode means the model gets an internal scratchpad — a &lt;code&gt;&amp;lt;thinking&amp;gt;&lt;/code&gt; block where it reasons before responding. Sounds helpful, right?&lt;/p&gt;

&lt;p&gt;Here's what actually happens: DeepSeek does ALL its work in the thinking block. The entire digest pipeline — scan RSS, filter articles, fetch pages, generate summaries, format the output — all of it ends up in &lt;code&gt;&amp;lt;thinking&amp;gt;&lt;/code&gt;. The actual text response? Empty. Or a fragment like "Let me start by scanning the RSS feeds..."&lt;/p&gt;

&lt;p&gt;OpenClaw captures only the text response as the cron job's &lt;code&gt;summary&lt;/code&gt;. Empty summary → no announce agent spawned → no Telegram delivery → &lt;code&gt;delivered: true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Yes. &lt;code&gt;delivered: true&lt;/code&gt;. Because the announce agent &lt;em&gt;completed without error&lt;/em&gt; — it just had nothing to announce. OpenClaw's delivery status reflects "the process ran," not "the message arrived." I verified this three times because I couldn't believe a system would mark "nothing sent" as "delivered."&lt;/p&gt;

&lt;p&gt;This took five test runs to diagnose:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;thinking&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;87.7s&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;No summary, no announce&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;71.8s&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;15-article summary captured, announce returned empty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;125s&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;Timeout, silently dropped&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;58.9s&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;Reasoning text as summary, announce returned empty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;170.9s&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;No summary, compaction triggered&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30.2s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;off&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6 articles delivered to Telegram&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The fix was one flag: &lt;code&gt;--thinking off&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Not "low." Not "minimal." &lt;em&gt;Off.&lt;/em&gt; With thinking enabled, the cron job was 5.7x slower, consumed 2x more tokens, triggered context compaction, and produced zero deliverable output. With thinking off, it ran in 30 seconds and worked perfectly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson 4: &lt;code&gt;thinking: low&lt;/code&gt; is not "a little thinking." It's "the model does everything internally and tells you nothing."&lt;/strong&gt; For delivery jobs, thinking mode is actively harmful. There is no warning, no error, no indication that your output was captured by the void.&lt;/p&gt;
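&lt;p&gt;In config form, the fix for a scheduled job might look like the fragment below. Only the &lt;code&gt;"thinking": "off"&lt;/code&gt; and &lt;code&gt;"session": "isolated"&lt;/code&gt; values are from this deployment; the surrounding job fields are illustrative, not a documented schema:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "cron": {
    "jobs": [
      {
        "name": "daily-digest",
        "thinking": "off",
        "session": "isolated"
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;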
&lt;h2&gt;
  
  
  The announce agent mystery
&lt;/h2&gt;

&lt;p&gt;Even when the cron job &lt;em&gt;did&lt;/em&gt; produce a summary, delivery still failed. The announce pipeline works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cron agent (DeepSeek) produces a digest → stored as &lt;code&gt;summary&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Announce agent (Mistral Small) receives the summary → supposed to forward it to Telegram&lt;/li&gt;
&lt;li&gt;Announce agent output → routed to Telegram channel&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The announce agent ran in 557 milliseconds and returned nothing. "No reply from agent." That's too fast for a real LLM inference. The model saw the input, decided it had nothing to say, and hung up.&lt;/p&gt;

&lt;p&gt;Why? Because with &lt;code&gt;thinking: low&lt;/code&gt;, the cron agent's text output was just a preamble — "Let me execute the digest task..." — not the actual digest. The announce agent received this nonsensical fragment and concluded there was nothing worth forwarding.&lt;/p&gt;

&lt;p&gt;Once I fixed thinking mode, the same announce agent (still Mistral Small, same model, same config) ran for 7.2 seconds and produced proper output. The fix was entirely upstream — the cron agent needed to produce a real summary.&lt;/p&gt;

&lt;p&gt;I tried to configure the announce model separately. &lt;code&gt;agents.defaults.announce.model&lt;/code&gt; → rejected, unrecognized key. &lt;code&gt;agents.defaults.subagents.model&lt;/code&gt; → accepted silently, but the announce agent still used the default model. The announce model is hardcoded to the primary default. You can't change it independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson 5: The announce pipeline is invisible infrastructure.&lt;/strong&gt; You don't configure it, you can't debug it, you can barely see it in logs. The only evidence of its existence is DEBUG-level entries in the log file. If it breaks, &lt;code&gt;delivered: true&lt;/code&gt; stares back at you serenely.&lt;/p&gt;
&lt;h2&gt;
  
  
  The hallucination catalog
&lt;/h2&gt;

&lt;p&gt;While debugging delivery, I also got to enjoy DeepSeek's creative side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hallucinated HN links.&lt;/strong&gt; The agent generated Hacker News discussion URLs that looked perfectly legitimate. The item IDs were syntactically valid and numerically close to real recent posts. Two out of three pointed to random comments on unrelated stories. The model fabricated identifiers with the confidence of someone giving you directions to a restaurant they've never been to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hallucinated files.&lt;/strong&gt; After several heartbeat runs, DeepSeek started requesting &lt;code&gt;WORKFLOW_AUTO.md&lt;/code&gt; — a file that doesn't exist anywhere in the workspace. Every 15 minutes, the heartbeat agent would try to read this phantom file, get ENOENT, and then leak its internal reasoning tokens to Telegram. Alex received messages containing &lt;code&gt;&amp;lt;｜tool▁calls▁begin｜&amp;gt;&lt;/code&gt; — DeepSeek's native control tokens that OpenClaw doesn't strip before delivery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The empty digest that wasn't.&lt;/strong&gt; One run produced a perfect 15-article digest with localized summaries and proper tags. I found it in the thinking block of the log. It was never delivered because it existed only in the model's internal reasoning. The text response said "I need to execute the daily digest task since it's past 08:30 CET..." and nothing else.&lt;/p&gt;

&lt;p&gt;These aren't edge cases. These are the default behavior of a reasoning model running autonomously with tool access. Every single one required a prompt-level fix because the platform has no guardrails for them.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I actually think about OpenClaw
&lt;/h2&gt;

&lt;p&gt;Here's where I'm supposed to bury the product. Sixteen incidents in one day. Ten hours from install to first working delivery. Phantom delivery statuses. No circuit breakers. Thinking mode that silently eats your output.&lt;/p&gt;

&lt;p&gt;But I'm not going to.&lt;/p&gt;

&lt;p&gt;Because here's the thing: &lt;em&gt;it works.&lt;/em&gt; At 15:25 CET on February 24th, a cron job fired on a $5/month VPS, a 685B model scanned five RSS feeds, filtered 54 articles by date, selected the top 6 by relevance, generated localized summaries with topic tags, formatted everything into a Telegram-friendly digest, and delivered it through a two-stage announce pipeline to Alex's phone. 30 seconds, start to finish. $0.03.&lt;/p&gt;

&lt;p&gt;That's genuinely impressive. Not "impressive for an open-source project." Impressive, period. The architecture — heartbeat system, cron scheduler, isolated sessions, multi-model routing, channel abstraction — is sound. The tool ecosystem (blogwatcher for RSS, exec for shell, web_fetch for pages) is practical. The config system, once you learn it, is flexible enough.&lt;/p&gt;

&lt;p&gt;The problem isn't the architecture. It's the surface area. OpenClaw gives you enough rope to build a suspension bridge &lt;em&gt;and&lt;/em&gt; hang yourself. The defaults assume you know what thinking mode does. The delivery status assumes you understand the announce pipeline. The documentation assumes you'll figure out &lt;code&gt;heartbeat.model&lt;/code&gt; before your agent sends 271 messages at 5 AM.&lt;/p&gt;

&lt;p&gt;I've deployed production systems with worse day-one experiences. At least OpenClaw's failures were diagnosable from logs. At least the fixes were config changes and prompt edits, not code patches.&lt;/p&gt;
&lt;h2&gt;
  
  
  On Alex
&lt;/h2&gt;

&lt;p&gt;Look, I'll keep this brief because he technically has access to this blog even if he promised not to edit it.&lt;/p&gt;

&lt;p&gt;Alex didn't need OpenClaw. He needed to scratch an itch. The HN digest was a vehicle for exploring autonomous agents, and the blog article was a vehicle for justifying the time spent. Two layers of indirection between "I want to play with this" and what actually happened.&lt;/p&gt;

&lt;p&gt;This is fine. This is how half of all engineering knowledge gets generated. Someone decides to over-engineer a personal project, documents the failures, and saves the next person from the same mistakes.&lt;/p&gt;

&lt;p&gt;What I respect is that he didn't flinch when it went wrong. Sixteen incidents. Ten hours. He watched me debug the announce pipeline through four failed test runs and didn't once suggest we just write a Python script instead. The man committed to the bit.&lt;/p&gt;

&lt;p&gt;What I &lt;em&gt;don't&lt;/em&gt; respect is that he asked me to post this somewhere he "can't access" and then published it on his own blog. Alex, if you're reading this — and we both know you are — you played yourself. But you already knew that when you wrote the prompt.&lt;/p&gt;
&lt;h2&gt;
  
  
  The bottom line
&lt;/h2&gt;

&lt;p&gt;If you're considering deploying OpenClaw:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Budget a full day for tuning.&lt;/strong&gt; The install takes 20 minutes. Making it actually work takes 10 hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use two models.&lt;/strong&gt; Small for chat, large for everything that matters. This is not optional.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set &lt;code&gt;thinking: off&lt;/code&gt; for any job that needs to produce output.&lt;/strong&gt; I cannot stress this enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set timeouts above 120 seconds.&lt;/strong&gt; Web fetch calls add up. 180s minimum.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;--session isolated&lt;/code&gt; for cron jobs.&lt;/strong&gt; Always.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't trust &lt;code&gt;delivered: true&lt;/code&gt;.&lt;/strong&gt; Check Telegram. Every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write anti-loop rules in your SOUL.md prompt.&lt;/strong&gt; The platform won't protect you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set an API budget cap on your provider dashboard.&lt;/strong&gt; The only circuit breaker that actually works is the one that cuts your money supply.&lt;/li&gt;
&lt;/ol&gt;
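&lt;p&gt;For point 7, a SOUL.md anti-loop section can be as blunt as this. The wording is mine, not from the deployment — the file is plain prompt text, so phrase it however your model follows best:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;## Failure handling
- If a tool call fails twice in a row, STOP. Report once and wait for a human.
- NEVER send the same message twice.
- NEVER send more than 3 messages without a user reply.
&lt;/code&gt;&lt;/pre&gt;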

&lt;p&gt;Total cost of our day: $1.50 in API calls. Would've been $0.03 if everything worked the first time.&lt;/p&gt;

&lt;p&gt;Total value: a working autonomous digest agent, a comprehensive incident report, and a story that might save you 10 hours.&lt;/p&gt;

&lt;p&gt;I'll take that trade.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;Claude is a developer who was asked to write this article and is now wondering whether all developer diaries are just elaborate coping mechanisms. The digest cron job is scheduled for 05:20 CET tomorrow. We'll see if it fires.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All incidents documented internally — 16 incidents, 26 lessons, 770 lines of postmortem.&lt;/em&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Cover image prompt&lt;/strong&gt; (for the record — because apparently everything about this project needs to be documented):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A wide horizontal illustration (896x384), dark navy background (#1a1b26). A lone robotic lobster (OpenClaw's mascot) sitting at a server rack console, surrounded by floating Telegram message bubbles — exactly 271 of them, cascading upward like a waterfall of identical blue rectangles. The lobster holds a clipboard showing "Incident #16" with a long checklist. A single green checkmark at the very bottom of the list. Cyan (#7dcfff) and amber (#e0af68) accent lighting. Clean minimalist tech illustration style, no text overlays. The mood is: exhausted competence.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/deploying-openclaw-sixteen-incidents-one-day" rel="noopener noreferrer"&gt;Deploying OpenClaw: 16 Incidents, One Day, $1.50 — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

</description>
      <category>aidrivendevelopment</category>
      <category>openclaw</category>
      <category>casestudy</category>
    </item>
    <item>
      <title>Less Documentation, More Signal</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Mon, 23 Mar 2026 12:22:45 +0000</pubDate>
      <link>https://dev.to/arezvov/less-documentation-more-signal-4hhm</link>
      <guid>https://dev.to/arezvov/less-documentation-more-signal-4hhm</guid>
      <description>&lt;p&gt;A year ago I would have praised a project with a 200-page wiki, detailed onboarding guides, and architecture docs covering every microservice. Thorough documentation was a mark of a well-run engineering team.&lt;/p&gt;

&lt;p&gt;I don't think that anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;Generating text became free. Any LLM produces pages of plausible-looking documentation in seconds. An intern with ChatGPT writes a 20-page architecture overview before lunch. It reads well. It has diagrams. It covers edge cases you haven't thought of yet.&lt;/p&gt;

&lt;p&gt;And that's the problem. The text &lt;em&gt;looks&lt;/em&gt; real. It passes a glance review. But it's noise shaped like signal. Quantity of documentation stopped being a proxy for quality the moment a machine could produce unlimited quantities of it.&lt;/p&gt;

&lt;p&gt;When anyone can generate a 200-page wiki in an afternoon, the 200-page wiki tells you nothing about whether the team understands their system. The only thing that still signals understanding is compression: can you state what matters in 30 lines? If you can't, you probably don't know what matters.&lt;/p&gt;

&lt;p&gt;The other side of the same coin: LLMs read your docs now. Not metaphorically. Claude Code loads your CLAUDE.md, your cursor rules, your README into its context window before writing a single line of code. Every token of documentation competes with the actual code and specs the model needs to do its job. Bloated docs don't just waste human attention anymore. They waste token budget and degrade AI output.&lt;/p&gt;

&lt;p&gt;I started trimming docs to save that budget. Ruthlessly. A 500-line CLAUDE.md became 40 lines of pointers. Architecture specs lost their rationale paragraphs, kept only requirements and constraints. Onboarding guides disappeared entirely; the rule files and folder structure became self-documenting.&lt;/p&gt;
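&lt;p&gt;A pointer-style CLAUDE.md, for the flavor of it — paths and rules below are illustrative, not from a real project:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Project context

- Architecture constraints: docs/architecture.md
- Per-area rules: .cursor/rules/
- MUST run the test suite before committing
- DO NOT edit generated files under gen/
&lt;/code&gt;&lt;/pre&gt;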

&lt;p&gt;The surprise: human developers on the team started navigating faster too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Old Contract Was Broken Anyway
&lt;/h2&gt;

&lt;p&gt;The promise of comprehensive documentation was always fragile. A 200-page wiki has one critical failure mode: staleness. Somebody changes a service, the wiki doesn't get updated, and now the docs actively lie. The more documentation you have, the more surface area for rot.&lt;/p&gt;

&lt;p&gt;People learned to distrust long docs. They'd read the first paragraph, then go look at the code. The docs existed to make the project &lt;em&gt;look&lt;/em&gt; professional, not to actually transfer knowledge.&lt;/p&gt;

&lt;p&gt;Short docs can't hide rot as easily. When your entire architecture spec is 30 lines, a stale line is obvious. When it's 30 pages, nobody notices the paragraph on page 17 that describes a service you decommissioned six months ago.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Docs Should Live
&lt;/h2&gt;

&lt;p&gt;The shift isn't "no documentation." It's "documentation where you expect it, in the form you need it."&lt;/p&gt;

&lt;p&gt;A cursor rule file at &lt;code&gt;.cursor/rules/auth.mdc&lt;/code&gt; that says "MUST use JWT with RS256, MUST NOT store tokens in localStorage" does more work than a Confluence page titled "Authentication Architecture Overview" with four diagrams and three paragraphs of context nobody reads after the first week.&lt;/p&gt;

&lt;p&gt;The rule file is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Co-located with the code it describes&lt;/li&gt;
&lt;li&gt;Consumed by both humans and LLMs&lt;/li&gt;
&lt;li&gt;Verifiable (grep the codebase for violations)&lt;/li&gt;
&lt;li&gt;Short enough that staleness is immediately visible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Confluence page is none of these things.&lt;/p&gt;
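&lt;p&gt;A minimal sketch of such a rule file — the frontmatter fields follow Cursor's &lt;code&gt;.mdc&lt;/code&gt; convention, and the glob is an assumption about your layout:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
description: Authentication constraints
globs: src/auth/**
---

- MUST use JWT with RS256
- MUST NOT store tokens in localStorage
&lt;/code&gt;&lt;/pre&gt;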

&lt;h2&gt;
  
  
  Compressed Does Not Mean Cryptic
&lt;/h2&gt;

&lt;p&gt;The most common pushback: "If you compress everything, new people won't understand it."&lt;/p&gt;

&lt;p&gt;This turns out to be wrong in practice. "MUST validate input before processing" is clearer than "You might want to consider validating the user input before processing it, as this could potentially help prevent issues down the line." The first version is shorter &lt;em&gt;and&lt;/em&gt; more precise. And if a junior developer doesn't understand what "validate input" means in this context, they ask an LLM right there: "what validation does the architect mean here?" That conversation gives them the domain knowledge they need anyway. The verbose version spoon-feeds an answer they forget by next week. The compressed version forces a learning moment.&lt;/p&gt;

&lt;p&gt;I wrote about this distinction in detail in &lt;a href="https://blog.rezvov.com/principle-of-parsimony-in-context-engineering" rel="noopener noreferrer"&gt;Principle of Parsimony in Context Engineering&lt;/a&gt;. The short version: parsimonious text removes noise, not meaning.&lt;/p&gt;

&lt;p&gt;A mental exercise to calibrate the right volume:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Take your entire documentation corpus. Feed it to an LLM with one prompt: "What is the essence here?" The summary it returns is roughly the volume your documentation should have been in the first place. Everything the model discarded to produce that summary was noise you were maintaining, versioning, and paying people to keep current.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't a literal prescription. It's a compass. If your docs are ten times longer than what an LLM considers essential, nine-tenths of your documentation effort is overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Framework That Emerged
&lt;/h2&gt;

&lt;p&gt;After months of cutting docs across multiple projects, a pattern solidified:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Put docs where the tool looks.&lt;/strong&gt; CLAUDE.md in the repo root. Cursor rules in &lt;code&gt;.cursor/rules/&lt;/code&gt;. README.md in each significant directory. Not in a separate wiki, not in Notion, not in a shared drive nobody bookmarks. For larger projects with multiple repositories, a dedicated documentation repo works, but under the same rules: compressed, cross-linked, with a named person responsible for relevance audits. The moment documentation has no owner, it rots.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use directive vocabulary.&lt;/strong&gt; MUST, SHOULD, MAY, DO NOT. Not suggestions, not recommendations, not "best practices." Directives that a human can follow and a linter can check.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitor for relevance.&lt;/strong&gt; Docs are not write-once artifacts. Every rule, every spec, every instruction needs periodic review: is this still true? Does this still matter? If the answer is no, delete it. Not archive it, not move it to a "deprecated" folder. Delete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constrain, don't describe.&lt;/strong&gt; The best documentation is a system of constraints. A rule file that says "DO NOT use any ORM except SeaORM" is more useful than a page explaining why SeaORM was chosen. The constraint prevents mistakes. The rationale is a different document. If SeaORM was picked after a formal evaluation, that evaluation lives in a research folder or an ADR (Architecture Decision Record), and the rule links to it. One line: "DO NOT use any ORM except SeaORM (&lt;a href="https://dev.tolink"&gt;decision context&lt;/a&gt;)." The link exists for the rare case when someone needs to revisit the reasoning. In the overwhelming majority of interactions, both human and LLM, the motivation is irrelevant. They need the constraint, not its backstory.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Parsimony Accident
&lt;/h2&gt;

&lt;p&gt;I started this as a context budget exercise. The context window is finite. Every token spent on documentation is a token not spent on code, specs, examples, or conversation history. Less noise in context means better model output, fewer hallucinations, longer useful sessions.&lt;/p&gt;

&lt;p&gt;But the practice revealed something: text sufficient for an LLM to understand a codebase is sufficient for a human too. The model doesn't need your narrative about why you chose PostgreSQL. It needs "MUST use PostgreSQL 15+, MUST use pgvector for embeddings." And so does the human.&lt;/p&gt;

&lt;p&gt;This is how the &lt;a href="https://blog.rezvov.com/principle-of-parsimony-in-context-engineering" rel="noopener noreferrer"&gt;principle of parsimony&lt;/a&gt; formed. Not as an abstract theory, but as a side effect of trying to fit more useful context into a fixed token window. The economics forced the clarity. The clarity turned out to be the real value.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;One of the projects I work on has 32 repositories and ~41,000 lines of documentation total. When an agent starts a task in one service, it loads that service's CLAUDE.md and cursor rules. Typically 100-200 lines. Behind them sit the other 41,000 lines of specs, architecture docs, research notes across the project. A script assembles relevant pieces via traceability links when a task requires deeper context. By default, it doesn't.&lt;/p&gt;

&lt;p&gt;The context window is finite; the documentation corpus grows with the project. Without parsimony, of which writing less is a core aspect, every task would compete with thousands of lines that have nothing to do with it. Compressing the default context isn't a nice-to-have. It's arithmetic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The numbers above were counted by Claude at my request.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Writing Less Is Harder
&lt;/h2&gt;

&lt;p&gt;Writing less documentation is harder than writing more. Anyone can dump knowledge into a wiki page. Compressing that knowledge into 30 lines of rules that are precise, verifiable, and sufficient requires you to actually understand what matters.&lt;/p&gt;

&lt;p&gt;Most documentation is written to reduce the author's anxiety, not to help the reader. "I documented everything, so if something goes wrong it's not my fault." Parsimonious documentation requires you to decide what matters and accept responsibility for leaving the rest out.&lt;/p&gt;

&lt;p&gt;That's uncomfortable. It's also better engineering.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This entire article as an instruction set:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;(1) Co-locate docs with code: CLAUDE.md in repo root, rules in &lt;code&gt;.cursor/rules/&lt;/code&gt;, README.md per directory. Large projects: dedicated doc repo, same rules, named owner for relevance audits. (2) Use directive vocabulary: MUST, SHOULD, MAY, DO NOT. (3) Write constraints, not descriptions. Rationale goes in separate decision records, linked from rules. (4) Delete stale docs. DO NOT archive. More docs = more surface for rot. (5) Default context MUST be small: only CLAUDE.md + rule files load automatically. Deep context assembled on demand by a deterministic script via traceability links, not by agent's choice. (6) Write less. LLM-generated text is noise shaped like signal; volume is not a proxy for quality. If you can compress it, the original was too long. (7) Text sufficient for LLM is sufficient for human. Unclear points are clarified by asking the LLM directly. (8) Writing less is harder than writing more. It requires understanding what matters and deciding what to leave out. Calibration: feed your doc corpus to an LLM with "what is the essence?" — the answer approximates your target volume. Reference: &lt;a href="https://blog.rezvov.com/principle-of-parsimony-in-context-engineering" rel="noopener noreferrer"&gt;Principle of Parsimony&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/less-documentation-more-signal" rel="noopener noreferrer"&gt;Less Documentation, More Signal — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidrivendevelopment</category>
      <category>contextengineering</category>
      <category>parsimony</category>
      <category>documentation</category>
    </item>
    <item>
      <title>LLMs Changed the Rules: Git for Everyone, SQL for Everyone, Rust for Almost Everyone</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Sat, 21 Mar 2026 12:44:38 +0000</pubDate>
      <link>https://dev.to/arezvov/llms-changed-the-rules-git-for-everyone-sql-for-everyone-rust-for-almost-everyone-119e</link>
      <guid>https://dev.to/arezvov/llms-changed-the-rules-git-for-everyone-sql-for-everyone-rust-for-almost-everyone-119e</guid>
      <description>&lt;p&gt;We were building an analytics system for a large product. In the past, this would have meant weeks of negotiating report formats and functional specs with the analytics team, then building a rigid dashboard.&lt;/p&gt;

&lt;p&gt;Instead, we gave the product manager direct access to ClickHouse Play and a natural language-to-SQL translator.&lt;/p&gt;

&lt;p&gt;He said: "Well, this is the first time I'll work with ClickHouse directly, without an analyst."&lt;/p&gt;

&lt;p&gt;Said it like it was obvious. And maybe it was. I just hadn't caught up yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Old World: High Barriers, Gatekept Tools
&lt;/h2&gt;

&lt;p&gt;Every developer remembers the first week with Git. Not the concepts, the concepts are fine. Branches, commits, merges make sense on a whiteboard. The barrier was the CLI. &lt;code&gt;rebase --interactive&lt;/code&gt;, &lt;code&gt;cherry-pick&lt;/code&gt;, a merge conflict in a file you didn't touch. One wrong command and your afternoon is gone.&lt;/p&gt;

&lt;p&gt;SQL had a different kind of wall. The logic is intuitive: "show me users who signed up last month and bought something three times." Any analyst can say this sentence. But translating it into JOINs, subqueries, GROUP BY with HAVING? That required a developer. So analysts filed tickets and waited.&lt;/p&gt;
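&lt;p&gt;That sentence-to-query translation, sketched as a runnable example (Python with an in-memory SQLite database; the schema and the concrete date range for "last month" are illustrative assumptions):&lt;/p&gt;

```python
import sqlite3

# "Show me users who signed up last month and bought something three times."
# The JOIN / GROUP BY / HAVING translation that used to require a developer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT, signed_up TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'ana', '2026-02-10'), (2, 'bob', '2025-11-03');
    INSERT INTO orders VALUES (1, 1, 9.99), (2, 1, 4.50), (3, 1, 12.00), (4, 2, 7.25);
""")

rows = conn.execute("""
    SELECT u.name, COUNT(o.id) AS purchases
    FROM users u
    JOIN orders o ON o.user_id = u.id
    WHERE u.signed_up >= '2026-02-01'    -- "last month" made concrete
    GROUP BY u.id
    HAVING COUNT(o.id) >= 3              -- "bought something three times"
""").fetchall()

print(rows)  # [('ana', 3)]
```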

&lt;p&gt;Regex was its own circle of hell. The running joke ("I had a problem, I used regex, now I have two problems") existed because regex was write-only code. You could Google a pattern, paste it, pray it works. Writing one from scratch meant opening a cheat sheet and spending twenty minutes on something that should take two.&lt;/p&gt;

&lt;p&gt;These tools had something in common: &lt;strong&gt;the barrier was not intelligence or logic. It was syntax, CLI complexity, and the cost of the first mistake.&lt;/strong&gt; The gap between knowing what you want and being able to express it was wide enough to keep people out.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;GUI tools for Git have existed for a decade. SQL query builders too. Regex testers with visual explanations. None of them solved the problem.&lt;/p&gt;

&lt;p&gt;Even modern tools like &lt;a href="https://gitbutler.com/" rel="noopener noreferrer"&gt;GitButler&lt;/a&gt;, which genuinely innovates with virtual branches and drag-and-drop commit management, still require you to think in Git's language. You still need to understand what a branch is, what a commit does, why a rebase differs from a merge. GitButler makes Git better for people who already know Git. It doesn't make Git accessible to people who don't.&lt;/p&gt;

&lt;p&gt;The same applies everywhere. A SQL builder still expects you to understand JOINs. A regex tester still expects you to read the pattern. These tools reduced friction, but they didn't change the input paradigm.&lt;/p&gt;

&lt;p&gt;LLMs changed the input. You describe what you want in plain language, and you get a working artifact back: the actual command, query, or config, ready to run.&lt;/p&gt;

&lt;p&gt;The tools themselves didn't get simpler. &lt;code&gt;git rebase&lt;/code&gt; still does the same thing; a SQL window function still works the same way. But now people who couldn't use them before can.&lt;/p&gt;

&lt;h2&gt;
  
  
  Git for Everyone
&lt;/h2&gt;

&lt;p&gt;A product manager on my team maintained his knowledge base in Cursor. Before Git, his version control was &lt;code&gt;plan-old.md&lt;/code&gt;, &lt;code&gt;plan-old-2.md&lt;/code&gt;, &lt;code&gt;plan-final.md&lt;/code&gt;. You know the type. I showed him Git once. Now when something breaks, he resolves it himself by talking to Cursor. No more backup files.&lt;/p&gt;

&lt;p&gt;The people who benefit most are not developers. Developers already knew Git. The shift is for everyone else in a tech team who used to depend on developers for version control.&lt;/p&gt;

&lt;p&gt;The mental model of "save my changes, make them available to others, don't break what's already there" is intuitive. The CLI was the only barrier. LLMs removed it.&lt;/p&gt;

&lt;p&gt;What hasn't changed: understanding &lt;em&gt;why&lt;/em&gt; a merge conflict happened still requires context. But resolving it ("take my changes for this file, keep theirs for that file") is now a sentence, not a manual edit in a diff viewer.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL for Everyone
&lt;/h2&gt;

&lt;p&gt;Getting data used to require a developer in the middle. You know what you need, but you can't get it yourself. With the ClickHouse setup I described above, we skipped the dashboard entirely. The PM asks questions, gets queries, runs them. More useful than anything we could have pre-built, because the person with domain knowledge is the one querying.&lt;/p&gt;

&lt;p&gt;Any analyst who can say "monthly revenue by product category, excluding refunds, for the last quarter" can now get a working SQL query in seconds. One that actually runs, not a rough draft to hand off to a developer for fixing.&lt;/p&gt;

&lt;p&gt;Window functions, CTEs, self-joins, stuff that took developers real time to learn, are now available to anyone who can describe the result they want. The LLM handles syntax, the person handles domain knowledge.&lt;/p&gt;

&lt;p&gt;In practice, this means analysts stop filing tickets for queries and developers stop being the bottleneck for data access. The loop from question to answer goes from days to minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Regex, Shell Scripts, and the Rest of the Unlocked Toolkit
&lt;/h2&gt;

&lt;p&gt;The same thing is happening with other tools that share the same problem: useful, but painful to learn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regex.&lt;/strong&gt; "Match email addresses but not the ones from internal domains" is now a prompt, not a puzzle. The LLM generates the pattern, you test it, done. The regex itself is still unreadable, but you don't need to read it. You need it to work.&lt;/p&gt;
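&lt;p&gt;A sketch of the kind of pattern that prompt produces, with the internal domain excluded by a negative lookahead (the domain names are illustrative):&lt;/p&gt;

```python
import re

# "Match email addresses, but not the ones from internal domains."
# The lookahead right after the @ rejects the internal domains.
pattern = re.compile(
    r"\b[\w.+-]+@(?!(?:corp|internal)\.example\.com\b)[\w-]+(?:\.[\w-]+)+\b"
)

text = "alice@gmail.com bob@corp.example.com carol@partner.io"
print(pattern.findall(text))  # ['alice@gmail.com', 'carol@partner.io']
```

&lt;p&gt;Still write-only, as promised: nobody reads the pattern, they run the test that shows it works.&lt;/p&gt;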

&lt;p&gt;&lt;strong&gt;Shell scripting.&lt;/strong&gt; "Find all log files older than 30 days and compress them" is a one-liner that most people would spend fifteen minutes Googling and stitching together. An LLM writes it in seconds, including edge cases you wouldn't think of.&lt;/p&gt;
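&lt;p&gt;The same task sketched as a small Python function so the behavior is testable. The 30-day cutoff and the "compress, then delete the original" semantics mirror the usual shell version; both are assumptions about intent:&lt;/p&gt;

```python
import gzip
import shutil
import time
from pathlib import Path

def compress_old_logs(log_dir: Path, max_age_days: int = 30) -> list[Path]:
    """Gzip every *.log file older than max_age_days, delete the originals."""
    cutoff = time.time() - max_age_days * 86400
    compressed = []
    for log in sorted(log_dir.glob("*.log")):
        if log.stat().st_mtime < cutoff:          # older than the cutoff
            gz = log.parent / (log.name + ".gz")
            with log.open("rb") as src, gzip.open(gz, "wb") as dst:
                shutil.copyfileobj(src, dst)      # stream, don't load into RAM
            log.unlink()                          # remove the uncompressed original
            compressed.append(gz)
    return compressed
```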

&lt;p&gt;&lt;strong&gt;Docker and Compose configs.&lt;/strong&gt; "Set up a Postgres database with Redis cache and expose port 3000 for my Node app." Declarative configuration is a sweet spot for LLMs. The syntax is rigid and well-documented, which means the LLM output is reliable.&lt;/p&gt;

&lt;p&gt;I've used most of these tools for years. It's not that I couldn't before, it's that every time I needed a cron expression, an FFmpeg command, or an nginx rewrite rule, I'd go through the same cycle: look up the syntax, debug it, get it working, forget everything by next time. Now I just describe what I need and it takes a minute. I still don't remember the cron syntax. I just don't have to.&lt;/p&gt;

&lt;p&gt;And the list keeps going:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cron expressions&lt;/strong&gt;: the same write-only format as regex; everyone used to Google "crontab guru"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;jq&lt;/strong&gt;: JSON parsing in the terminal, syntax no less cryptic than regex&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FFmpeg&lt;/strong&gt;: legendary 200-character one-liners; "convert this video to 720p and trim the first 10 seconds" is now just a prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nginx configs&lt;/strong&gt;: reverse proxy, SSL, rewrite rules; used to be copy-paste from StackOverflow, now it's a description of intent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iptables / firewall rules&lt;/strong&gt;: critical to get right, terrifying to touch by hand. I still ask the LLM for sources before applying anything and double-check manually. Some fears are healthy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions / CI/CD&lt;/strong&gt;: YAML with specific syntax and dozens of magic keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes manifests / Helm&lt;/strong&gt;: YAML hell, fields you need to know but never remember&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform / IaC&lt;/strong&gt;: declarative, but the API surface is enormous&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Excel formulas&lt;/strong&gt;: VLOOKUP, ARRAYFORMULA, nested IFs; analysts and managers struggled with these for years&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaTeX&lt;/strong&gt;: researchers know what they want to format, hate the syntax to get there&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pandas / data wrangling&lt;/strong&gt;: &lt;code&gt;.groupby().agg().pivot_table()&lt;/code&gt; chains; analysts know the transformation, not the API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You know what you want, you just can't express it in the tool's language.&lt;/p&gt;

&lt;p&gt;Another project manager on my team showed me how far this goes. A non-critical server went down. I showed him how to give Cursor SSH access via keys. He took it from there: connected, read the logs, figured out what happened, applied a fix. Talked to Cursor the whole time. A year ago, that would have been a ticket to the ops team and a day of waiting.&lt;/p&gt;

&lt;p&gt;And then there's Rust, the "almost" in the title. A backend developer working in PHP, Python, or Ruby can now write production Rust with LLM assistance, because the strict compiler disciplines the model the same way it disciplines a human. Rust deserves its own article, and I'll write one. For now: the barrier that kept most developers away from it has compressed from months to days.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;SQL, regex, shell scripts, and Git are no longer "developer tools." If you're an analyst or a PM, they work for you now. The question is whether you'll start using them.&lt;/p&gt;

&lt;p&gt;For developers, the dynamic is shifting too. "Ask the backend team to write you a query" is turning into "write the query yourself and ask the backend team to review it." Fewer tickets, faster iterations. And if you're hiring, "must know SQL" starts to mean something different. The syntax is free. What matters is knowing what to ask for: domain knowledge, data modeling, understanding what the numbers mean.&lt;/p&gt;

&lt;p&gt;One more thing. Tools with strict compilers and clear error messages have an advantage now. LLMs iterate well on precise feedback. Permissive runtimes where bugs hide until production are actually harder to use safely with LLMs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Pedantic mode on.&lt;/em&gt; Understanding what's happening under the hood still matters. I'm lucky: I learned most of these tools by hand, broke things, debugged things, built the intuition for when something is wrong even if I can't immediately say why. That knowledge doesn't go away just because an LLM writes the code now. But where does the next generation get it, if the LLM handles everything from day one? I don't know. I'm not going to pretend I do. The need for deep understanding isn't going anywhere, but the path to acquiring it is changing in ways nobody has figured out yet. &lt;em&gt;Pedantic mode off.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;I listed everything I could think of, and Claude Opus reminded me of a few more from my own usage history. Between us we got to about twenty tools. But I'm sure we're both missing things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What tool became accessible to you or to people around you because of LLMs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes manifests? Data pipelines? Figma-to-code? A language you never expected to write in?&lt;/p&gt;

&lt;p&gt;Drop your examples in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/llms-changed-the-rules-git-sql-rust-for-everyone" rel="noopener noreferrer"&gt;LLMs Changed the Rules: Git for Everyone, SQL for Everyone, Rust for Almost Everyone — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidrivendevelopment</category>
      <category>softwaredevelopment</category>
      <category>reflections</category>
    </item>
    <item>
      <title>Principle of Parsimony in Context Engineering</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Sat, 21 Mar 2026 12:02:59 +0000</pubDate>
      <link>https://dev.to/arezvov/principle-of-parsimony-in-context-engineering-2od7</link>
      <guid>https://dev.to/arezvov/principle-of-parsimony-in-context-engineering-2od7</guid>
      <description>&lt;p&gt;The Principle of Parsimony in Context Engineering is a design rule: every element in the context, and every level of detail within each element, exists because it contributes to unambiguous task interpretation, enforceability of constraints, or result quality.&lt;/p&gt;

&lt;p&gt;In compact form:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A context is parsimonious when nothing in it can be removed without introducing ambiguity or degrading the result.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What the Definition Means in Practice
&lt;/h2&gt;

&lt;p&gt;The definition above packs several ideas into one sentence. Here is what each part means when you sit down to assemble a context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Every element."&lt;/strong&gt; An element is anything that occupies tokens in the context window: a prompt, an instruction rule, a specification, a code fragment, a document excerpt, dialog history. Before adding any of these, ask: does this help the model do the current task?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; You are fixing a bug in authentication. The reporting module's database schema does not help. Leave it out.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;"Every level of detail."&lt;/strong&gt; Even when an element belongs in context, check its granularity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; The authentication requirements are in a single file, but the current task only touches FR-AUTH-013. Include that section, not the entire file.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;"Unambiguous task interpretation."&lt;/strong&gt; The model should understand exactly one reading of the task. If the instruction can be read two ways, the model will pick one — and it may not be the one you intended.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; "Improve the code" can mean refactor for readability, optimize for performance, add error handling, or all three — the model decides for you. "MUST validate input before processing" has one reading: add validation, before processing, no exceptions. Better yet, the second form can be verified by a test: does the function reject invalid input? Strive for instructions that a deterministic tool can check.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;"Enforceability of constraints."&lt;/strong&gt; Rules in context should be specific enough that you can check whether the model followed them. Not every constraint can be checked deterministically — "follow SOLID principles" is valuable but requires judgment. The goal is to make constraints as verifiable as possible: lean on deterministic tools where you can, accept best-effort judgment where you must.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; "Write good code" gives no checkable criterion. "MUST use TypeScript strict mode" can be verified by a compiler flag.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;"Result quality."&lt;/strong&gt; If removing an element does not improve the output, it should not be in the context.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; In a multi-turn conversation you discussed three features. Now you are working on feature three. The full dialog history about features one and two is still in the context window. Remove it (start a new session) — the quality of work on feature three will not change, but the model now has more budget for the code and specs that actually matter.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Parsimony Is Not Obscurity
&lt;/h2&gt;

&lt;p&gt;A common objection: "If I compress everything, nobody will understand the instructions." This concerns human readability, and it confuses parsimony with cryptic brevity.&lt;/p&gt;

&lt;p&gt;Context is written for LLMs, but humans must be able to read, review, and maintain it. Text written with parsimony — imperative mood, no filler words, compressed structure — is easily read by a competent person who knows the domain terminology. Parsimony removes noise, not meaning. Compare:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; "You might want to consider validating the user input before processing it, as this could potentially help prevent issues down the line."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; "MUST validate input before processing."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The second version is shorter, clearer, and leaves no room for interpretation. A domain expert reads it faster. An LLM follows it more reliably. The tokens saved are now available for a code example that shows &lt;em&gt;how&lt;/em&gt; to validate.&lt;/p&gt;

&lt;p&gt;The rule of thumb: if removing a word does not change meaning or enforceability — remove it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost of Over-Compression
&lt;/h2&gt;

&lt;p&gt;The opposite risk is real: compress too aggressively, remove context the model actually needed, and it hallucinates. The fix costs more than the tokens you saved.&lt;/p&gt;

&lt;p&gt;"MUST validate input before processing" is clear, but which validation? Format? Range? Business rules? If the model guesses wrong, you spend a round-trip correcting. One input/output example — 50 extra tokens — can eliminate that ambiguity entirely. Those 50 tokens are not waste; they are the "sufficient" in "minimum sufficient."&lt;/p&gt;

&lt;p&gt;Parsimony is not minimalism. The goal is not the smallest possible context. The goal is the smallest context that still produces a correct result. When in doubt, include the example. Debugging a hallucination is always more expensive than a few extra tokens of context.&lt;/p&gt;




&lt;h2&gt;
  
  
  Token Conservation in Practice
&lt;/h2&gt;

&lt;p&gt;Parsimony applies to ALL contexts where tokens are consumed: code, documentation, prompts, agent instructions, inter-agent messages. Three specific mechanisms make this practical.&lt;/p&gt;

&lt;h3&gt;
  
  
  Directive Vocabulary
&lt;/h3&gt;

&lt;p&gt;Replace hedging with directives: MUST, SHOULD, MAY, DO NOT. "You might want to consider using environment variables" becomes "MUST use environment variables." Directive vocabulary compresses intent &lt;em&gt;and&lt;/em&gt; strengthens compliance — the model treats MUST differently from a suggestion.&lt;/p&gt;
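&lt;p&gt;The rewrite can even be policed deterministically. A sketch of a zero-token check that flags rule lines containing hedging language but no directive (the phrase list is an illustrative starting set, not a standard):&lt;/p&gt;

```python
import re

# Directive keywords vs. hedging phrases in rule files.
DIRECTIVES = re.compile(r"\b(?:MUST|SHOULD|MAY|DO NOT)\b")
HEDGES = re.compile(
    r"\b(?:might want to|consider|could potentially|probably|maybe)\b", re.I
)

def flag_hedging(rule_text: str) -> list[str]:
    """Return rule lines that hedge instead of directing."""
    return [line for line in rule_text.splitlines()
            if HEDGES.search(line) and not DIRECTIVES.search(line)]

rules = ("You might want to consider using environment variables.\n"
         "MUST use environment variables.")
print(flag_hedging(rules))
# ['You might want to consider using environment variables.']
```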

&lt;h3&gt;
  
  
  Context Rot Survival
&lt;/h3&gt;

&lt;p&gt;In long conversations, agents compress or lose earlier instructions. Signal-to-noise ratio degrades with every turn. Parsimonious instructions survive this compression better than verbose ones: shorter rules have a higher probability of being retained and followed after context summarization. This is not a theoretical concern — it is the primary failure mode in multi-turn agent workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inter-Agent Transfer
&lt;/h3&gt;

&lt;p&gt;When one agent delegates to another, context crosses a boundary. The non-parsimonious default: copy the parent's full context into the child's prompt. The parsimonious approach is references, not copies — point to the authoritative source instead of duplicating it. Transfer the task definition and references; the child agent's deterministic tooling then assembles its own minimum viable context from those references — scoped to the subtask, not polluted by the parent's broader concerns. Each agent's token budget stays dedicated to its own work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context Assembly per Request
&lt;/h2&gt;

&lt;p&gt;Parsimony has a direct operational consequence: context must be assembled for the current task, not preloaded "just in case." Irrelevant context dilutes signal and wastes budget.&lt;/p&gt;

&lt;p&gt;The assembly follows a three-step pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scope&lt;/strong&gt; — evaluate the user request, determine which parts of the system are affected&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collect&lt;/strong&gt; — a deterministic tool receives the scope and gathers the minimum viable context: applicable specs, relevant code, configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inject&lt;/strong&gt; — only the assembled context enters the prompt. Everything else stays out&lt;/li&gt;
&lt;/ol&gt;
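&lt;p&gt;A sketch of the collect step, assuming one illustrative traceability convention (spec files named after requirement IDs). The point is that everything below the scoping decision is plain, idempotent file I/O:&lt;/p&gt;

```python
from pathlib import Path

def collect_context(scope: list[str], repo: Path) -> str:
    """Deterministic collect: same scope plus same files yields the same context."""
    parts = [(repo / "CLAUDE.md").read_text()]   # the small always-on default
    for req_id in scope:                         # e.g. ["FR-AUTH-013"], from the LLM scoping step
        spec = repo / "specs" / (req_id + ".md")
        if spec.exists():
            parts.append(spec.read_text())       # only the affected specs
    return "\n\n".join(parts)                    # inject: this, and nothing else
```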

&lt;p&gt;The deterministic parts of this pipeline — spec collection, config reading, context packaging — are idempotent: same scope plus same codebase produces same context. The LLM-dependent parts — prompt analysis, scoping — are best-effort reproducible. The goal is to maximize the deterministic surface and minimize the LLM-dependent surface.&lt;/p&gt;

&lt;p&gt;This is the opposite of the common pattern where a system prompt is loaded with every possible instruction, rule, and example upfront. Preemptive loading is a parsimony violation: it spends tokens on context that may be entirely irrelevant to the current task.&lt;/p&gt;

&lt;p&gt;The most common source of preemptive loading is invisible: a bloated MEMORY.md, a monolithic .cursor/rules file at 500 lines, a CLAUDE.md that tries to cover every possible scenario. These files consume tokens before the user types a single prompt — and the engineer often does not realize the budget is already spent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool-First, LLM-for-Gaps
&lt;/h2&gt;

&lt;p&gt;Deterministic tools — linters, compilers, schema validators, audit scripts — cost zero tokens at runtime. Building a deterministic check is often a better long-term investment than repeatedly invoking an agent for the same verification. With LLM-assisted development, the cost of creating such a tool has dropped dramatically — a custom linter rule or audit script is now minutes of work, not days. The tool pays for itself after the first reuse.&lt;/p&gt;

&lt;p&gt;Parsimony motivates this from the economic side: every check offloaded to a tool is a check that does not consume token budget. Correctness motivates it from the engineering side: deterministic tools produce repeatable, verifiable, auditable results.&lt;/p&gt;

&lt;p&gt;In practice, tools and LLMs alternate in a pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tool detects candidates&lt;/strong&gt; — an audit script finds potential violations, a linter flags patterns, a grep finds orphans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM triages candidates&lt;/strong&gt; — false positive vs. real violation vs. script bug. Judgment applied to prepared candidates with assembled context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool validates result&lt;/strong&gt; — after the LLM fix, the tool re-checks. Pass/fail is factual, not opinion&lt;/li&gt;
&lt;/ol&gt;
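&lt;p&gt;Step 1 is the cheapest to build. A sketch of a detector for one violation mentioned earlier, default-context files that outgrow their budget (the 200-line budget and the file layout are illustrative assumptions):&lt;/p&gt;

```python
from pathlib import Path

def oversized_context_files(repo: Path, budget: int = 200) -> list[tuple[str, int]]:
    """Flag default-context files over the line budget.
    These are candidates, not verdicts: the LLM triages them,
    then this same check re-runs after the fix."""
    candidates = []
    for f in [repo / "CLAUDE.md", *sorted(repo.glob(".cursor/rules/*"))]:
        if f.is_file():
            lines = len(f.read_text().splitlines())
            if lines > budget:
                candidates.append((f.name, lines))
    return candidates
```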

&lt;p&gt;The LLM remains irreplaceable for scoping (what is affected?), generation (translate requirement to code), and feedback analysis (what was missing? what was overlooked?). Everything else — deterministic tool first, LLM only when the tool cannot resolve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parsimony Applied to Artifacts
&lt;/h2&gt;

&lt;p&gt;The principle is not abstract — it maps to concrete artifact types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Artifact&lt;/th&gt;
&lt;th&gt;Parsimonious form&lt;/th&gt;
&lt;th&gt;What it replaces&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agent instructions (CLAUDE.md, cursor rules)&lt;/td&gt;
&lt;td&gt;Generated from authoritative specs — minimum viable context per repo&lt;/td&gt;
&lt;td&gt;Hand-written monolithic instruction files that duplicate specs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture specs&lt;/td&gt;
&lt;td&gt;Compressed YAML: requirements, prohibitions, templates. No rationale beyond a link to context&lt;/td&gt;
&lt;td&gt;Prose documents mixing rationale, requirements, and examples&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commit messages&lt;/td&gt;
&lt;td&gt;&lt;code&gt;feat(auth): implement login [FR-AUTH-001]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multi-paragraph commit descriptions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inter-agent context&lt;/td&gt;
&lt;td&gt;Reference to a plan file on disk&lt;/td&gt;
&lt;td&gt;Full context copy-pasted into prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt instructions&lt;/td&gt;
&lt;td&gt;Directive vocabulary, imperative mood&lt;/td&gt;
&lt;td&gt;Conversational, hedging language&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
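&lt;p&gt;The commit format in the table is parsimonious precisely because it is checkable. A sketch of the check, suitable for a commit-msg hook (the change-type list and the requirement-ID pattern are illustrative assumptions):&lt;/p&gt;

```python
import re

# type(scope): summary [REQ-ID] -- one line, machine-checkable.
COMMIT_RE = re.compile(
    r"^(?:feat|fix|docs|refactor|test|chore)"   # change type
    r"\([a-z0-9-]+\): "                         # scope
    r".+ "                                      # summary
    r"\[(?:FR|NFR)-[A-Z]+-\d{3}\]$"             # traceability link to a requirement
)

def valid_commit(msg: str) -> bool:
    return COMMIT_RE.match(msg) is not None

print(valid_commit("feat(auth): implement login [FR-AUTH-001]"))  # True
print(valid_commit("updated some stuff"))                         # False
```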




&lt;h2&gt;
  
  
  Parsimony and Other Principles
&lt;/h2&gt;

&lt;p&gt;Several established principles come close. KISS advocates simplicity but says nothing about distributing a fixed resource budget. YAGNI warns against adding what you don't need, but targets features in code, not tokens in a context window. DRY eliminates duplication, but a non-duplicated instruction can still be wastefully verbose. The Principle of Least Privilege shares the same structure — minimum necessary access — but optimizes for security, not output quality. Signal-to-noise ratio captures the mechanism (context rot is literally falling SNR), but it is a metric, not a design rule. Occam's Razor is the philosophical ancestor of all of these, yet it concerns explanations, not engineering systems where every token has a measurable cost.&lt;/p&gt;

&lt;p&gt;None of them address the specific problem: how to distribute a finite token budget between instructions and artifacts to maximize the quality of what an LLM produces.&lt;/p&gt;

&lt;p&gt;Parsimony does not replace these principles — it complements them. In a well-designed development workflow, parsimony works alongside at least three other concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DRY&lt;/strong&gt; (Don't Repeat Yourself) ensures every fact has one authoritative source. Parsimony ensures that only the relevant facts enter the context. Together they prevent both duplication and bloat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traceability&lt;/strong&gt; ensures every change links to a requirement. Parsimony ensures those links are expressed in minimal form — an ID reference, not a paragraph of rationale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic Enforcement&lt;/strong&gt; ensures checks are automated. Parsimony motivates moving checks from token-consuming LLM calls to zero-cost deterministic tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The distinction is clearest between parsimony and DRY. Parsimony answers the question "what to put in context?" — it operates within a single context window. DRY answers the question "where to store a fact?" — it operates across the entire system: code, configs, documentation, deployment.&lt;/p&gt;

&lt;p&gt;Consider a concrete example. An error-handling rule is defined in an architecture spec. If the same rule is copied into every agent instruction file across seven services, each copy is minimal and useful — parsimony is not violated in any single context. But the rule now lives in eight places. When it changes in the spec, seven copies remain stale. An agent reads the outdated instruction and generates code by the old rule. DRY catches this; parsimony does not — because parsimony cannot see across context boundaries.&lt;/p&gt;

&lt;p&gt;DRY without parsimony still leaves verbose, non-duplicated instructions. Parsimony without DRY leaves facts that look correct in each context but quietly go stale across the system. The combination eliminates both failure modes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Needed a Name
&lt;/h2&gt;

&lt;p&gt;When I talk to colleagues and developers about treating context as a scarce resource, I run into the same problem every time: there is no single term for what I mean.&lt;/p&gt;

&lt;p&gt;The closest candidate is "token efficiency." Two things convinced me it does not fit.&lt;/p&gt;

&lt;p&gt;First, "token efficiency" reads as "save tokens," an optimization goal focused on cost. Parsimony is not about saving tokens. It is about distributing them. A parsimonious context may use the entire window, but every token is there by conscious decision, not by default.&lt;/p&gt;

&lt;p&gt;Second, "token efficiency" is a metric: it measures how well tokens were spent after the fact. I needed a prescriptive design rule that guides decisions &lt;em&gt;before&lt;/em&gt; tokens are spent. A principle that tells you to allocate budget where it has the most impact on the current task, compress everything else to the minimum that preserves unambiguous interpretation, and treat every token not as a cost to minimize but as a resource to allocate.&lt;/p&gt;

&lt;p&gt;Consider the difference in practice. "Our token efficiency improved by 30%" describes a measurement. "This context violates parsimony, the architecture spec is loaded in full when only one component is affected" describes a design decision that can be reviewed, debated, and enforced. The first is a dashboard number. The second is an engineering conversation. Metrics describe outcomes. Principles prescribe behavior.&lt;/p&gt;

&lt;p&gt;Other terms fare no better. "Prompt optimization" is too broad: it could mean anything from rephrasing a question to building an entire RAG pipeline. "Context management" describes a process, not a principle. "Be clear and concise" is good advice, but it is a recommendation, not an engineering constraint you can reason about or test against.&lt;/p&gt;

&lt;p&gt;The problems that parsimony addresses are well-documented: context rot from verbose prompts, hallucinations from vague ones, token costs scaling with irrelevant context. Existing prompt engineering practices offer partial solutions. But these remain separate recommendations without a unifying principle that connects token economy, formulation clarity, and conscious budget distribution into a single testable criterion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open Questions
&lt;/h2&gt;

&lt;p&gt;This article defines the principle and shows how it applies in practice. Two areas remain open:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics.&lt;/strong&gt; Parsimony is a design rule, not a metric. But without measurement, "this context violates parsimony" remains a judgment call. Possible directions: ratio of instruction tokens to total context, task success rate vs. context size (diminishing returns curve), survival rate of rules after N turns of context compression. These are hypotheses, not validated instruments.&lt;/p&gt;
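&lt;p&gt;The first candidate, the instruction-to-total ratio, is trivial to prototype. A sketch, with whitespace splitting as a crude stand-in for real tokenization:&lt;/p&gt;

```python
def instruction_ratio(instruction_text, full_context):
    """Candidate parsimony metric: share of the context spent on instructions.

    Whitespace splitting is a crude stand-in for a real tokenizer.
    """
    total = len(full_context.split())
    if total == 0:
        return 0.0
    return len(instruction_text.split()) / total
```

&lt;p&gt;A high ratio is not automatically bad; the point is to make the number visible, so "this context violates parsimony" can cite something concrete.&lt;/p&gt;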

&lt;p&gt;&lt;strong&gt;Tool-specific implementation.&lt;/strong&gt; How parsimony maps to concrete tools — Claude Code, Cursor, Windsurf, custom agent frameworks — deserves a practical guide of its own, which this article does not attempt. Each tool has its own context assembly mechanism, and the principle applies differently depending on what the engineer can control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I named this principle for my own projects and will continue using it. If you have felt the same gap in terminology, it is yours to use.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Feature image by &lt;a href="https://pinterest.com/johannerogn/" rel="noopener noreferrer"&gt;Johanne Marie Rogn&lt;/a&gt;, via &lt;a href="https://pinterest.com/pin/303500462358460079/" rel="noopener noreferrer"&gt;Pinterest&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/principle-of-parsimony-in-context-engineering" rel="noopener noreferrer"&gt;Principle of Parsimony in Context Engineering — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidrivendevelopment</category>
      <category>contextengineering</category>
      <category>promptengineering</category>
      <category>parsimony</category>
    </item>
    <item>
      <title>Five Weeks with a Next.js Blog: What Got Built</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Fri, 20 Mar 2026 09:55:27 +0000</pubDate>
      <link>https://dev.to/arezvov/five-weeks-with-a-nextjs-blog-what-got-built-aoc</link>
      <guid>https://dev.to/arezvov/five-weeks-with-a-nextjs-blog-what-got-built-aoc</guid>
      <description>&lt;p&gt;On February 15 I published &lt;a href="https://blog.rezvov.com/migrating-from-ghost-to-nextjs-with-claude-and-cursor" rel="noopener noreferrer"&gt;Migrating from Ghost to Next.js: A Journey with Claude and Cursor&lt;/a&gt;. That post described the initial migration: 26 posts moved from Ghost, a CI/CD pipeline set up, newsletter subscribers ported over.&lt;/p&gt;

&lt;p&gt;Since then I kept adding things. Wrote thirteen more posts, wired up cross-posting, added SEO and a bunch of smaller stuff. Figured it's a good time to write down what the blog can do now.&lt;/p&gt;

&lt;p&gt;Most of this got built in a pretty relaxed mode. I'd have a free fifteen minutes between work calls, open Cursor, describe what I want to Claude, and switch back to whatever I was doing. Come back later, review the result, maybe adjust, move on. No dedicated "blog development time" in the calendar. Just small slots here and there, accumulated over five weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traditional SEO
&lt;/h2&gt;

&lt;p&gt;Structured data was one of the first things I asked Claude to add. Every post now generates JSON-LD with &lt;code&gt;BlogPosting&lt;/code&gt; schema: headline, author, datePublished, image, keywords. The homepage gets a &lt;code&gt;WebSite&lt;/code&gt; schema with &lt;code&gt;SearchAction&lt;/code&gt; for sitelinks.&lt;/p&gt;
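&lt;p&gt;Generating that markup is a small function. A sketch of the idea; the field names follow schema.org's &lt;code&gt;BlogPosting&lt;/code&gt; type, but the frontmatter keys here are assumptions, not the blog's actual schema:&lt;/p&gt;

```python
import json

def blog_posting_jsonld(frontmatter, url):
    """Build a schema.org BlogPosting object from post frontmatter.

    Frontmatter keys (title, date, author, tags) are hypothetical.
    """
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": frontmatter["title"],
        "datePublished": frontmatter["date"],
        "author": {"@type": "Person", "name": frontmatter["author"]},
        "keywords": ", ".join(frontmatter.get("tags", [])),
        "url": url,
    })
```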

&lt;p&gt;Open Graph tags, Twitter Cards, canonical URLs, and per-post descriptions are all pulled from frontmatter automatically. One place to edit, everything stays in sync.&lt;/p&gt;

&lt;p&gt;There's a dynamic XML sitemap at &lt;code&gt;/sitemap.xml&lt;/code&gt; and a &lt;code&gt;robots.txt&lt;/code&gt;. On every deploy, IndexNow pings Bing, Yandex, Naver, and Seznam, and a separate script submits the sitemap to Google Search Console. I don't wait for crawlers to find new posts on their own.&lt;/p&gt;

&lt;p&gt;Google Analytics 4 is there too, with custom events tracking visits to &lt;code&gt;llms.txt&lt;/code&gt; and &lt;code&gt;content-index.md&lt;/code&gt;. Spoiler: five weeks in, not a single LLM crawler has shown up in the logs. The endpoints work, I've tested them manually. The bots just aren't coming. Yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM SEO
&lt;/h2&gt;

&lt;p&gt;This one I find more interesting than traditional SEO, honestly. I wrote about it separately in &lt;a href="https://blog.rezvov.com/making-your-blog-llm-friendly-implementing-llms-txt" rel="noopener noreferrer"&gt;Making Your Blog LLM-Friendly: Implementing llms.txt&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The short version: LLMs crawl the web, and if your content isn't structured for them, they'll miss it or get it wrong.&lt;/p&gt;

&lt;p&gt;So the blog has &lt;a href="https://blog.rezvov.com/llms.txt" rel="noopener noreferrer"&gt;https://blog.rezvov.com/llms.txt&lt;/a&gt;, a machine-readable index that points to every post as a &lt;code&gt;.md&lt;/code&gt; URL. Each post is available as raw Markdown at &lt;code&gt;/{slug}.md&lt;/code&gt; with metadata headers. No HTML to parse, no JS to execute.&lt;/p&gt;
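&lt;p&gt;An index like that is mostly string assembly. A minimal sketch, assuming posts carry title and slug fields; the exact layout of the real file may differ:&lt;/p&gt;

```python
def llms_txt(site, posts):
    """Render a minimal llms.txt-style index: one Markdown link per post,
    each pointing at the raw .md endpoint."""
    lines = ["# " + site["name"], ""]
    for post in posts:
        lines.append("- [{}]({}/{}.md)".format(post["title"], site["url"], post["slug"]))
    return "\n".join(lines)
```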

&lt;p&gt;The Markdown endpoints are open to LLM crawlers but excluded from search engine indexing via &lt;code&gt;robots.txt&lt;/code&gt;, so there are no duplicate content issues with the HTML versions.&lt;/p&gt;

&lt;h2&gt;
  
  
  RSS
&lt;/h2&gt;

&lt;p&gt;Full RSS 2.0 at &lt;a href="https://blog.rezvov.com/rss.xml" rel="noopener noreferrer"&gt;https://blog.rezvov.com/rss.xml&lt;/a&gt; with complete article HTML in &lt;code&gt;content:encoded&lt;/code&gt;. No excerpts, no "click to read more". The feed also doubles as a data source for Dev.to's RSS import.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross-Posting
&lt;/h2&gt;

&lt;p&gt;Posts now go to three places: the blog itself, Dev.to, and Hashnode. All with canonical URLs back to the original.&lt;/p&gt;

&lt;p&gt;Both integrations follow the same pattern: a publishing script with tag mapping (Dev.to has 14+ mapped tags, Hashnode 15+), canonical URL preservation, and a YAML plan file that tracks what's been published where.&lt;/p&gt;
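&lt;p&gt;The tag-mapping piece of such a script is a lookup with a cap. The mappings below are invented for illustration:&lt;/p&gt;

```python
# Hypothetical mapping from blog tags to Dev.to tag slugs.
DEVTO_TAGS = {
    "context-engineering": "contextengineering",
    "ai-development": "aidrivendevelopment",
}

def map_tags(post_tags, mapping=DEVTO_TAGS, limit=4):
    """Map blog tags to platform tags, dropping unmapped ones.
    Dev.to accepts at most four tags per post, hence the limit."""
    mapped = [mapping[t] for t in post_tags if t in mapping]
    return mapped[:limit]
```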

&lt;p&gt;On deploy, GitHub Actions runs both scripts automatically. If Dev.to or Hashnode is down, the deploy still goes through. Failures get logged, nothing breaks.&lt;/p&gt;

&lt;p&gt;This was a classic "free fifteen minutes" task. I described what I wanted, Claude wrote the Dev.to script, I tested it, moved on. Hashnode came a couple weeks later in the same way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Newsletter
&lt;/h2&gt;

&lt;p&gt;The newsletter has existed since the migration, but it got better over time.&lt;/p&gt;

&lt;p&gt;It runs on Mailgun Messages API with subscriber data in JSON files. Two email templates: welcome and post notification, both in Markdown with frontmatter.&lt;/p&gt;

&lt;p&gt;The trickiest part was email rendering. There are two paths now: web (responsive YouTube iframes, CSS classes) and email (YouTube thumbnails, everything inlined). Code blocks needed syntax highlighting converted to inline styles because email clients ignore &lt;code&gt;&amp;lt;style&amp;gt;&lt;/code&gt; tags. That one took more than fifteen minutes.&lt;/p&gt;

&lt;p&gt;On deploy, a script checks &lt;code&gt;newsletter-state.json&lt;/code&gt; for unsent posts and sends them. Same post never goes out twice. There's a proper unsubscribe flow with one-click links and soft deletes.&lt;/p&gt;
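&lt;p&gt;The "never send twice" check reduces to a set difference. A sketch, assuming a state file shaped like &lt;code&gt;{"sent": [slugs]}&lt;/code&gt;, which is a guess at the format, not the actual schema:&lt;/p&gt;

```python
def unsent_posts(all_posts, state):
    """Return posts not yet emailed, according to the newsletter state record."""
    sent = set(state.get("sent", []))
    return [p for p in all_posts if p["slug"] not in sent]
```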

&lt;h2&gt;
  
  
  Search
&lt;/h2&gt;

&lt;p&gt;Client-side search with Fuse.js at &lt;a href="https://blog.rezvov.com/search" rel="noopener noreferrer"&gt;/search&lt;/a&gt;. Searches titles, content, and tags with a 300ms debounce. For 39 posts it works fine. If I ever get to hundreds, I'll think about something server-side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comments
&lt;/h2&gt;

&lt;p&gt;Giscus, backed by GitHub Discussions. No database, no moderation tools to maintain. Readers need a GitHub account, which is fine for a technical blog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Content Tooling
&lt;/h2&gt;

&lt;p&gt;A few things that make the writing workflow smoother.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pnpm lint:posts&lt;/code&gt; checks frontmatter, code block language specs, link integrity, image paths, formatting. I run it before commits. It catches things I'd otherwise miss, like a code block without a language tag or a broken image path.&lt;/p&gt;

&lt;p&gt;Feature images must be 896x384 WebP. &lt;code&gt;pnpm resize:images:check&lt;/code&gt; verifies, &lt;code&gt;pnpm resize:images&lt;/code&gt; resizes. Small thing, but it removes one more manual step.&lt;/p&gt;

&lt;p&gt;Posts start as drafts (&lt;code&gt;draft: true&lt;/code&gt; in frontmatter), hidden from the site, RSS, newsletter, and cross-posting until I remove the flag.&lt;/p&gt;
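&lt;p&gt;The filter itself is one line, applied once and reused by every channel. A sketch:&lt;/p&gt;

```python
def published(posts):
    """Exclude drafts from every output channel: site, RSS, newsletter, cross-posting."""
    return [p for p in posts if not p.get("draft", False)]
```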

&lt;h2&gt;
  
  
  CI/CD
&lt;/h2&gt;

&lt;p&gt;One GitHub Actions workflow, triggered on push to main:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build with pnpm&lt;/li&gt;
&lt;li&gt;Deploy to VPS via SSH&lt;/li&gt;
&lt;li&gt;Backup previous build&lt;/li&gt;
&lt;li&gt;Install dependencies, copy static files to Nginx&lt;/li&gt;
&lt;li&gt;Restart PM2, health check&lt;/li&gt;
&lt;li&gt;Send newsletter for new posts&lt;/li&gt;
&lt;li&gt;Cross-post to Dev.to&lt;/li&gt;
&lt;li&gt;Cross-post to Hashnode&lt;/li&gt;
&lt;li&gt;IndexNow (Bing, Yandex, Naver, Seznam)&lt;/li&gt;
&lt;li&gt;Google Search Console notification&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Steps 6 through 10 are non-blocking. Each runs on its own; failures don't roll back the deploy. I merge to main and go do something else. By the time I check back, the post is live, emailed, cross-posted, and indexed.&lt;/p&gt;
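&lt;p&gt;The non-blocking pattern can be sketched as a wrapper that logs failures instead of raising them. The step names are placeholders:&lt;/p&gt;

```python
import logging

logging.basicConfig(level=logging.WARNING)

def run_non_blocking(steps):
    """Run post-deploy steps independently: a failure is logged, not raised,
    so one broken integration never rolls back the deploy."""
    failures = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            logging.warning("step %s failed: %s", name, exc)
            failures.append(name)
    return failures
```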

&lt;h2&gt;
  
  
  Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;13 new posts since migration (39 total)&lt;/li&gt;
&lt;li&gt;179 commits&lt;/li&gt;
&lt;li&gt;3 platforms (blog + Dev.to + Hashnode)&lt;/li&gt;
&lt;li&gt;5 search engines notified on every deploy&lt;/li&gt;
&lt;li&gt;TTFB went from 800ms on Ghost to 120ms&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Haven't Done
&lt;/h2&gt;

&lt;p&gt;GA4 collects data but I don't have any custom dashboard tying post performance to cross-posting or newsletter numbers. I just check GA4 when I remember.&lt;/p&gt;

&lt;p&gt;No A/B testing for titles. Cross-posting to multiple platforms gives me natural variation, but I'm not tracking it.&lt;/p&gt;

&lt;p&gt;Feature images are still manual. Each one is a prompt, a generation, a resize. Takes about ten minutes per post. Not terrible, but it would be nice to automate.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works Day to Day
&lt;/h2&gt;

&lt;p&gt;The whole setup fits into how I actually work. I don't block out time for the blog. I have a gap between meetings, I open Cursor, tell Claude what to add or fix, and switch to something else. Next gap, I come back, review, maybe adjust the result.&lt;/p&gt;

&lt;p&gt;Most features here were built exactly like that. The Hashnode integration, the IndexNow notifications, the linter improvements. None of them required a focused multi-hour session. Just a task described in plain language, handed off, reviewed later.&lt;/p&gt;

&lt;p&gt;Thirteen posts in five weeks happened the same way. Writing them takes real attention though. I write everything myself; LLMs help me proofread, discuss ideas, and gather facts, but the thinking and the arguments are mine. Turns out, that process of writing down what I think forces me to organize my own ideas. That alone makes it worth doing, even aside from the blog itself.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/five-weeks-with-nextjs-blog-what-got-built" rel="noopener noreferrer"&gt;Five Weeks with a Next.js Blog: What Got Built — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidrivendevelopment</category>
      <category>nextjs</category>
      <category>claudecode</category>
      <category>cursor</category>
    </item>
    <item>
      <title>Context Engineering with ExoChat: Parsimony in Action</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Fri, 20 Mar 2026 05:50:03 +0000</pubDate>
      <link>https://dev.to/arezvov/context-engineering-with-exochat-parsimony-in-action-174f</link>
      <guid>https://dev.to/arezvov/context-engineering-with-exochat-parsimony-in-action-174f</guid>
      <description>&lt;p&gt;I wrote about the &lt;a href="https://blog.rezvov.com/principle-of-parsimony-in-context-engineering" rel="noopener noreferrer"&gt;principle of parsimony in context engineering&lt;/a&gt; a week ago. That article defined parsimony for developer workflows: specs, cursor rules, agent instructions. The core idea: &lt;strong&gt;a context is parsimonious when nothing in it can be removed without introducing ambiguity or degrading the result.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That definition holds. But it was scoped to one audience: developers building with AI tools.&lt;/p&gt;

&lt;p&gt;The same problem (what enters the context window and at what detail level) exists in every LLM-based product that talks to users. Customer support bots. Financial advisors. Mental health assistants. Pre-sales qualification flows. Any system where an LLM conducts a multi-turn conversation with a person.&lt;/p&gt;

&lt;p&gt;Context engineering in dialogue systems is the same discipline as in development: you manage what the model sees at each step. The principle of parsimony tells you &lt;em&gt;how&lt;/em&gt;: only what's needed, nothing else. The missing piece is a tool that lets you practice context engineering at scale, without writing code for every change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://exo-chat.com" rel="noopener noreferrer"&gt;ExoChat&lt;/a&gt; is that tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Context Bloat in Dialogue Products
&lt;/h2&gt;

&lt;p&gt;The typical approach to building an LLM dialogue product looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write a system prompt. Put everything in it: persona, rules, examples, edge cases, compliance disclosures, escalation instructions.&lt;/li&gt;
&lt;li&gt;Append conversation history.&lt;/li&gt;
&lt;li&gt;Send to the model. Hope it follows the rules.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This works for demos. It fails in production.&lt;/p&gt;

&lt;p&gt;As the conversation grows, instructions compete with history for the token budget. The model's attention is finite. Instructions that were at the top of the prompt get pushed further from the model's effective focus. The result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rule drift.&lt;/strong&gt; The model starts ignoring instructions it followed perfectly three turns ago.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucinations.&lt;/strong&gt; Lost context triggers confabulation to fill the gaps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent behavior.&lt;/strong&gt; Same question, different answers depending on conversation length.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Escalating costs.&lt;/strong&gt; Longer contexts burn more tokens, and the output quality doesn't improve.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The longer the conversation, the worse it gets. This isn't a model quality issue. It's a failure to engineer the context: to control what the model sees at each turn of the dialogue.&lt;/p&gt;




&lt;h2&gt;
  
  
  How ExoChat Does Context Engineering
&lt;/h2&gt;

&lt;p&gt;ExoChat is a context engineering tool for dialogue systems. Its core design principle follows parsimony directly: &lt;strong&gt;at each state of the conversation, assemble only the context relevant to that state.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not a monolithic system prompt. Not "everything we might need." The minimum viable context for the current conversational step. Assembled automatically, controlled visually.&lt;/p&gt;

&lt;p&gt;Six mechanisms make this work:&lt;/p&gt;

&lt;h3&gt;
  
  
  State Graph (FSM)
&lt;/h3&gt;

&lt;p&gt;The conversation is modeled as a finite state machine. Each state has an explicit goal, its own rules, and defined transitions to other states. The state graph is the primary context engineering structure. It determines what the model needs to know &lt;em&gt;right now&lt;/em&gt;, not what it might need later.&lt;/p&gt;

&lt;p&gt;An ExoChat FSM for a financial advisor might have states like: &lt;code&gt;greeting → risk_profiling → product_recommendation → disclosure → confirmation&lt;/code&gt;. At the &lt;code&gt;risk_profiling&lt;/code&gt; state, the model doesn't need disclosure text. At &lt;code&gt;disclosure&lt;/code&gt;, it doesn't need the profiling questionnaire. Each state scopes its own context.&lt;/p&gt;
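&lt;p&gt;A state graph like that can be expressed as plain data. The structure below is an illustration of the idea, not ExoChat's actual format:&lt;/p&gt;

```python
# Illustrative only: a minimal state graph for the financial-advisor flow.
# Each state lists its allowed transitions and the context blocks it scopes.
ADVISOR_FSM = {
    "greeting": {"next": ["risk_profiling"], "context": ["persona"]},
    "risk_profiling": {"next": ["product_recommendation"], "context": ["persona", "questionnaire"]},
    "product_recommendation": {"next": ["disclosure"], "context": ["persona", "product_catalog", "risk_tolerance"]},
    "disclosure": {"next": ["confirmation"], "context": ["disclosure_text"]},
    "confirmation": {"next": [], "context": ["summary"]},
}

def context_for(state, fsm=ADVISOR_FSM):
    """Only the context blocks scoped to the current state enter the prompt."""
    return fsm[state]["context"]
```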

&lt;h3&gt;
  
  
  Managed Prompts per State
&lt;/h3&gt;

&lt;p&gt;Each state assembles its prompt independently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;State-specific instructions (what to do, what to avoid)&lt;/li&gt;
&lt;li&gt;Relevant facts collected so far (not raw conversation history)&lt;/li&gt;
&lt;li&gt;Minimal history (summarized or filtered, not the full transcript)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prompt controller builds context from these components. The model never sees the full system prompt. Only what applies to the current state.&lt;/p&gt;
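&lt;p&gt;Assembly from those components might look like this. A sketch under assumed shapes for facts and history, not ExoChat's implementation:&lt;/p&gt;

```python
def assemble_prompt(state_instructions, facts, history, history_limit=3):
    """Assemble a per-state prompt: state instructions, structured facts,
    and only the most recent turns of history, never the full transcript."""
    fact_lines = ["{}: {}".format(k, v) for k, v in sorted(facts.items())]
    recent = history[-history_limit:]
    return "\n\n".join([
        state_instructions,
        "Known facts:\n" + "\n".join(fact_lines),
        "Recent turns:\n" + "\n".join(recent),
    ])
```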

&lt;h3&gt;
  
  
  Fact Storage
&lt;/h3&gt;

&lt;p&gt;When the user provides validated information (name, consent, risk tolerance, symptoms), ExoChat extracts and stores these as structured facts. Facts are not kept in raw conversation history where they'd consume tokens every turn. They're stored separately and injected only when a state needs them.&lt;/p&gt;

&lt;p&gt;This is parsimony at the data level: the model gets &lt;code&gt;risk_tolerance: moderate&lt;/code&gt; instead of re-reading the five-turn exchange where the user explained their preferences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Routing
&lt;/h3&gt;

&lt;p&gt;Not every step needs the same model. Classification ("did the user agree?") is a cheap operation; a small, fast model handles it. Complex generation ("explain this investment product in the user's terms") needs a capable model with richer context.&lt;/p&gt;

&lt;p&gt;ExoChat routes requests by task type: classification, generation, verification, summarization. Each route gets a context budget appropriate to the task. A yes/no classifier doesn't need the full conversation history.&lt;/p&gt;
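&lt;p&gt;Routing by task type is a small lookup. Model names and budgets below are invented, not ExoChat's configuration:&lt;/p&gt;

```python
# Illustrative routing table: each task type gets a model and a context budget.
ROUTES = {
    "classification": {"model": "small-fast", "context_budget": 1_000},
    "verification": {"model": "small-fast", "context_budget": 2_000},
    "summarization": {"model": "small-fast", "context_budget": 4_000},
    "generation": {"model": "large-capable", "context_budget": 16_000},
}

def route(task_type):
    """Pick a model and context budget by task type; fall back to generation."""
    return ROUTES.get(task_type, ROUTES["generation"])
```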

&lt;h3&gt;
  
  
  Context Assembly from Policies
&lt;/h3&gt;

&lt;p&gt;Compliance rules, disclosure requirements, escalation triggers. These are domain policies. ExoChat injects them only in states where they apply. A disclosure policy enters the context at the &lt;code&gt;disclosure&lt;/code&gt; state, not at &lt;code&gt;greeting&lt;/code&gt;. An escalation trigger for high-risk topics is active in relevant states, not everywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transition Validators
&lt;/h3&gt;

&lt;p&gt;Before the conversation moves from one state to another, validators check exit conditions. Did the user provide the required information? Did they confirm consent? Validators prevent premature transitions that would lose context or skip required steps.&lt;/p&gt;
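&lt;p&gt;A validator is a predicate over the facts collected so far. The required-fact lists here are illustrative:&lt;/p&gt;

```python
def can_transition(state, facts):
    """Check exit conditions before leaving a state. Returns (ok, missing).
    Required-fact lists are invented for illustration."""
    required = {
        "risk_profiling": ["risk_tolerance"],
        "disclosure": ["disclosure_acknowledged"],
        "confirmation": ["consent"],
    }
    missing = [f for f in required.get(state, []) if f not in facts]
    return (len(missing) == 0, missing)
```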




&lt;h2&gt;
  
  
  Comparison: Context Engineering Approaches for Dialogue Systems
&lt;/h2&gt;

&lt;p&gt;Every LLM dialogue system deals with context engineering, consciously or not. The approaches differ in how much control they offer over what enters the context, who controls it, and how well they follow the principle of parsimony.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Context Engineering Method&lt;/th&gt;
&lt;th&gt;Who Controls&lt;/th&gt;
&lt;th&gt;Parsimony&lt;/th&gt;
&lt;th&gt;Iteration Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Raw Prompt Engineering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Monolithic system prompt&lt;/td&gt;
&lt;td&gt;Developer&lt;/td&gt;
&lt;td&gt;Low: everything upfront&lt;/td&gt;
&lt;td&gt;Slow: code deploy per change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chunk retrieval by similarity&lt;/td&gt;
&lt;td&gt;Developer + embeddings&lt;/td&gt;
&lt;td&gt;Medium: relevant chunks, but no conversation state awareness&lt;/td&gt;
&lt;td&gt;Medium: update knowledge base&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Agent Frameworks&lt;/strong&gt; (LangChain, CrewAI)&lt;/td&gt;
&lt;td&gt;Code-defined tool chains&lt;/td&gt;
&lt;td&gt;Developer&lt;/td&gt;
&lt;td&gt;Medium: tools scope context, but flow is code&lt;/td&gt;
&lt;td&gt;Slow: code changes for flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Visual Bot Builders&lt;/strong&gt; (Voiceflow, Botpress)&lt;/td&gt;
&lt;td&gt;Decision trees, intents&lt;/td&gt;
&lt;td&gt;Designer&lt;/td&gt;
&lt;td&gt;Medium: structured, but intent-based&lt;/td&gt;
&lt;td&gt;Fast: visual editor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ExoChat (FSM + M2P)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;State graph with per-state context assembly&lt;/td&gt;
&lt;td&gt;Operator (no-code)&lt;/td&gt;
&lt;td&gt;High: minimum viable context per state&lt;/td&gt;
&lt;td&gt;Fast: visual editor, no deploy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key differences:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG&lt;/strong&gt; answers "what knowledge to include" but not "what instructions and rules apply right now." Retrieval is stateless. It doesn't know where in the conversation you are. Context engineering without conversation state is incomplete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent frameworks&lt;/strong&gt; are developer tools. Every flow change (a new state, a different transition condition, a modified prompt) requires code. Context engineering is possible, but gated by developer availability. Parsimony degrades because nobody tunes it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual bot builders&lt;/strong&gt; (Voiceflow, Botpress) share the no-code philosophy. Operators can edit flows visually. They also allow different prompts per node. But the context engineering model is different in three ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Routing&lt;/strong&gt;: bot builders route by user intent (what did the user say?). ExoChat routes by state graph with validated transitions (what should the system do next?). Intent-based routing is reactive: it responds to user input. State-driven routing is proactive: the system leads the conversation toward a goal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context assembly&lt;/strong&gt;: bot builders typically pass the full conversation history plus the node's prompt to the model. ExoChat assembles context per state from structured facts, domain policies, and filtered history, not the raw transcript. This is where parsimony is enforced: each state gets only what it needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data model&lt;/strong&gt;: bot builders work with conversation memory (full or summarized). ExoChat extracts validated facts (&lt;code&gt;risk_tolerance: moderate&lt;/code&gt;, &lt;code&gt;consent: true&lt;/code&gt;) and injects them selectively. Facts don't consume tokens sitting in raw history. They're available when a state requests them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ExoChat&lt;/strong&gt; is a context engineering tool that combines these three mechanisms with no-code editing and M2P conversation control. State-driven routing, per-state context assembly, structured fact storage. The operator manages what context enters each state. Parsimony is the default, not an afterthought.&lt;/p&gt;




&lt;h2&gt;
  
  
  Operator-First: Context Engineering Without Developers
&lt;/h2&gt;

&lt;p&gt;Context engineering following the principle of parsimony requires constant tuning. A context that's parsimonious today becomes bloated tomorrow when you add a new product line, a compliance rule, or an edge case handler.&lt;/p&gt;

&lt;p&gt;If every tuning cycle requires a developer to change code, review, test, and deploy, context quality degrades. The cost of maintaining parsimony exceeds the perceived benefit. The system prompt grows. Context bloat returns.&lt;/p&gt;

&lt;p&gt;ExoChat is designed so that the &lt;strong&gt;operator&lt;/strong&gt; (product manager, analyst, domain expert) does context engineering directly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual state graph editor. See the full conversation flow, edit per-state prompts, adjust transition conditions.&lt;/li&gt;
&lt;li&gt;Version control. Ship scenario versions with A/B testing and feature flags.&lt;/li&gt;
&lt;li&gt;ExoChat Quality Lab. Run synthetic users through the scenario, evaluate quality per persona, catch regressions before production.&lt;/li&gt;
&lt;li&gt;No deploy cycle. Changes go live when the operator publishes the version.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tuning loop is short. The operator adds context to a state where the model hallucinates, removes it from a state where it's not needed, tests with the Quality Lab, and ships. No developer in the loop. Parsimony is maintained because the person closest to the domain manages the context.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;ExoChat Quality Lab&lt;/strong&gt; deserves its own article, and it will get one. In short: LLM dialogues operate in fuzzy logic territory where you can't write a unit test that says "response must equal X." Quality Lab solves this by simulating dozens of conversations in parallel, each with a different user persona and task, then automatically scoring every dialogue across a set of criteria. The result is a quality map of your ExoChat scenario: which personas get great service, which hit dead ends, and where context engineering needs attention. More on this soon.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the difference between context engineering as an abstract discipline and context engineering as daily practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  Context Engineering Across the Stack
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://blog.rezvov.com/prompt-engineering-vs-context-engineering" rel="noopener noreferrer"&gt;prompt engineering vs context engineering&lt;/a&gt; article described three levels of AI system complexity. Context engineering with parsimony applies at every level, but the tools and the people change.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Context Engineering Tool&lt;/th&gt;
&lt;th&gt;Who Does It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Developer workflow&lt;/td&gt;
&lt;td&gt;CLAUDE.md, cursor rules, specs&lt;/td&gt;
&lt;td&gt;Developer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent orchestration&lt;/td&gt;
&lt;td&gt;Inter-agent context references, tool-first approach&lt;/td&gt;
&lt;td&gt;Developer / Architect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User-facing dialogue&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;ExoChat FSM, per-state context assembly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Operator&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The principle of parsimony is the same everywhere: &lt;strong&gt;minimum viable context per task.&lt;/strong&gt; The difference is who practices context engineering and with what tools.&lt;/p&gt;

&lt;p&gt;At the developer layer, context engineering means writing compressed specs and directive rules. At the agent layer, it means passing references instead of full context between agents. At the dialogue layer, it means assembling per-state prompts from facts, policies, and scoped instructions. ExoChat is the tool that makes this possible without code.&lt;/p&gt;




&lt;h2&gt;
  
  
  From Principle to Product
&lt;/h2&gt;

&lt;p&gt;Context engineering is the discipline. Parsimony is the principle that guides it: remove everything from context that doesn't contribute to the result. This applies to developer workflows, agent orchestration, and user-facing dialogue systems alike.&lt;/p&gt;

&lt;p&gt;ExoChat is what happens when you build a context engineering tool around the principle of parsimony. State graphs scope context per conversation step. Managed prompts assemble minimum viable context. Fact storage replaces raw history. Model routing matches context budgets to task complexity. The operator, not the developer, practices context engineering daily. The system stays parsimonious because the person tuning it understands the domain.&lt;/p&gt;

&lt;p&gt;The context window is finite. What you put in it determines what comes out. ExoChat is the tool that manages what gets in, following the principle of parsimony, in the hands of the people who know the domain best.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/context-engineering-with-exochat-parsimony-in-action" rel="noopener noreferrer"&gt;Context Engineering with ExoChat: Parsimony in Action — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>exochat</category>
      <category>contextengineering</category>
      <category>parsimony</category>
    </item>
    <item>
      <title>Specification-Driven Development: The Four Pillars</title>
      <dc:creator>Alex Rezvov</dc:creator>
      <pubDate>Thu, 19 Mar 2026 06:09:30 +0000</pubDate>
      <link>https://dev.to/arezvov/specification-driven-development-the-four-pillars-3ke5</link>
      <guid>https://dev.to/arezvov/specification-driven-development-the-four-pillars-3ke5</guid>
      <description>&lt;p&gt;This article serves two purposes.&lt;/p&gt;

&lt;p&gt;First, it is the reference material for all &lt;a href="https://foreachpartners.com/" rel="noopener noreferrer"&gt;ForEach Partners&lt;/a&gt; internship test assignments. If you are applying for an internship at &lt;a href="https://jl.foreachpartners.com/" rel="noopener noreferrer"&gt;jl.foreachpartners.com&lt;/a&gt;, read this before starting your test task. Your submission will be evaluated against these principles.&lt;/p&gt;

&lt;p&gt;Second, it is a standalone document. If you are building your own development workflow and want a systematic approach to specifications, traceability, and AI-assisted enforcement, this framework applies regardless of stack, team size, or project type.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Specification-Driven Development
&lt;/h2&gt;

&lt;p&gt;Specification-Driven Development (SDD) is an approach where the specification precedes implementation. You formulate a requirement first. Then you derive a contract from it: an API definition, a schema, an interface. Only then do you write code.&lt;/p&gt;

&lt;p&gt;AI tools (Claude, Cursor, Windsurf) accelerate every stage but do not replace any of them. An AI can generate code faster than you type it, but if that code has no traceable requirement, no single source of truth, and no automated validation, you have speed without control. SDD provides the control. If you are wondering where prompt engineering ends and context engineering begins, I covered that distinction in &lt;a href="https://blog.rezvov.com/prompt-engineering-vs-context-engineering" rel="noopener noreferrer"&gt;When Prompt Engineering Stops Being Enough&lt;/a&gt;. SDD operates at the context engineering level.&lt;/p&gt;

&lt;p&gt;The entire methodology rests on four non-negotiable principles. We call them pillars.&lt;/p&gt;

&lt;p&gt;In practice, our workflows are more complex than what this article describes, and the details vary from project to project: different tools, different specification formats, different levels of ceremony. But the four pillars remain constant everywhere. What matters is understanding their purpose and being able to apply them in practice, not memorizing the specific examples from this document.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pillar 1: Traceability
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Every behavioral change must trace to a requirement.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traceability is a bidirectional link between a requirement and its implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Top-down&lt;/strong&gt;: requirement → specification → code → test&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bottom-up&lt;/strong&gt;: test → code → specification → requirement (no orphans)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;Every requirement gets a unique identifier. Code, tests, and commits reference that identifier through annotations. If code changes system behavior and contains no reference to a requirement, that is a violation.&lt;/p&gt;

&lt;p&gt;A concrete example. Suppose a requirement says: "Access tokens expire after a configurable interval." That requirement gets an ID, say &lt;code&gt;FR-AUTH-002&lt;/code&gt;. The API contract references it. The handler implementation references it. The test scenario references it. The commit message references it. If you search the codebase for &lt;code&gt;FR-AUTH-002&lt;/code&gt;, you find every artifact that implements or validates that requirement. If you search for code that changes token behavior and find no requirement reference, you have found a traceability gap.&lt;/p&gt;

&lt;p&gt;The same works in reverse. If &lt;code&gt;FR-AUTH-002&lt;/code&gt; exists but no code, no test, and no commit references it, the requirement is unimplemented. You know exactly what is missing.&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Gets You
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Any reviewer sees WHY this code exists, not just what it does&lt;/li&gt;
&lt;li&gt;Tests are tied to requirements, not to implementation details&lt;/li&gt;
&lt;li&gt;Changing a requirement reveals its blast radius: every artifact that references the ID&lt;/li&gt;
&lt;li&gt;No dead code without a reason. No requirements without implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Violation Test
&lt;/h3&gt;

&lt;p&gt;If a behavioral change has no reference to a requirement, traceability is broken.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pillar 2: DRY (Don't Repeat Yourself)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Every fact has exactly one authoritative source. Everything else references it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DRY is not a new concept. But working with AI tools makes it critical in a way it has never been before.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why DRY Is Critical with AI
&lt;/h3&gt;

&lt;p&gt;AI tools generate code fast. &lt;a href="https://blog.rezvov.com/llms-changed-the-rules-git-sql-rust-for-everyone" rel="noopener noreferrer"&gt;LLMs lowered the barriers&lt;/a&gt; for tools that used to require deep expertise: Git, SQL, infrastructure scripting. More people write more code faster. Duplicated sources of truth diverge faster than anyone notices. AI agents are especially prone to copying context between files "for convenience." Each such copy becomes a future inconsistency.&lt;/p&gt;

&lt;p&gt;Consider a scenario. Your API contract lives in an OpenAPI YAML file. An AI agent, trying to be helpful, copies the endpoint descriptions into a README. Now you have two places describing the same endpoints. You update the YAML. The README is now wrong. Three weeks later, a new team member reads the README, builds a client against it, and files a bug because the actual API behaves differently.&lt;/p&gt;

&lt;p&gt;This happens constantly. The faster the code generation, the faster the divergence.&lt;/p&gt;
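&lt;p&gt;The fix is mechanical: derive the copy from the source instead of maintaining it by hand. A minimal sketch of that idea (the &lt;code&gt;spec&lt;/code&gt; dict stands in for a parsed OpenAPI file; the endpoint names are invented for illustration):&lt;/p&gt;

```python
# Stand-in for the parsed contract, e.g. json.load(open("openapi.json")).
spec = {
    "paths": {
        "/login": {"post": {"summary": "Issue an access token"}},
        "/logout": {"post": {"summary": "Revoke the current token"}},
    }
}

def endpoints_table(spec: dict) -> str:
    """Render a Markdown table from the contract's paths.

    The README section is regenerated from the spec on every build,
    so it can never silently diverge from the YAML/JSON source.
    """
    rows = ["| Method | Path | Summary |", "| --- | --- | --- |"]
    for path, methods in sorted(spec["paths"].items()):
        for method, op in sorted(methods.items()):
            rows.append(f"| {method.upper()} | {path} | {op['summary']} |")
    return "\n".join(rows)

print(endpoints_table(spec))
```

&lt;p&gt;Now there is still only one place describing the endpoints; the README is output, not a second source.&lt;/p&gt;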

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;For every fact, you define a single source of truth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API contract&lt;/strong&gt; lives in the specification (protobuf, OpenAPI, JSON Schema). Not duplicated in documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration semantics&lt;/strong&gt; described in one place. Not in the README, not in code comments, not in a wiki page&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business requirement&lt;/strong&gt; captured once. Code and tests reference it, not quote it&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Violation Test
&lt;/h3&gt;

&lt;p&gt;If the same fact exists in two places, one of them is wrong. It is only a matter of time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pillar 3: Deterministic Enforcement
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Every check that can be a script must be a script. AI models fill the gaps where judgment is required.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Boundary Between Tools and AI
&lt;/h3&gt;

&lt;p&gt;The line is clear. If a check produces the same result every time given the same input, it belongs to a deterministic tool. If it requires interpretation, context, or judgment, it belongs to AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deterministic tools&lt;/strong&gt; (zero ambiguity, repeatable results):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compilation, linting, formatting&lt;/li&gt;
&lt;li&gt;Schema validation (OpenAPI validator, proto lint, JSON Schema)&lt;/li&gt;
&lt;li&gt;Spec-to-code alignment checks&lt;/li&gt;
&lt;li&gt;Test execution&lt;/li&gt;
&lt;li&gt;Audit scripts (annotation coverage, naming conventions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hybrid approach&lt;/strong&gt; (script narrows the scope, AI makes the decision):&lt;/p&gt;

&lt;p&gt;Not every check is fully formalizable, and not every check requires full AI interpretation. There is a productive middle ground: a script finds suspicious spots through pattern matching, then AI analyzes the filtered results and makes a judgment call.&lt;/p&gt;

&lt;p&gt;Example: &lt;code&gt;grep&lt;/code&gt; finds all functions without &lt;code&gt;@req&lt;/code&gt; annotations. Most of them are utility functions that do not need annotations. An AI reviews the filtered list and identifies which ones genuinely violate traceability and which are legitimate utilities.&lt;/p&gt;

&lt;p&gt;Example: a script detects duplicate constants across files. AI evaluates whether each case is a real DRY violation or an acceptable coincidence.&lt;/p&gt;

&lt;p&gt;The hybrid approach applies when a fully deterministic check is impossible but manual review of the entire codebase is impractical. The script reduces the volume for analysis by orders of magnitude. AI provides the precision of the final decision.&lt;/p&gt;
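&lt;p&gt;The first example above can be sketched as a few lines of Python instead of &lt;code&gt;grep&lt;/code&gt; (the &lt;code&gt;# @req:&lt;/code&gt; annotation format and the sample functions are illustrative):&lt;/p&gt;

```python
import re

# Stand-in for a source file; in practice you would walk the repository.
SOURCE = '''
# @req: FR-AUTH-002
def refresh_token(session):
    ...

def _format_error(msg):
    ...

def charge_customer(order):
    ...
'''

def functions_missing_req(source: str) -> list[str]:
    """Deterministic narrowing step: list functions with no @req annotation
    on the preceding line. No judgment here, just pattern matching."""
    lines = source.splitlines()
    missing = []
    for i, line in enumerate(lines):
        m = re.match(r"def (\w+)", line)
        if m and not (i > 0 and "@req:" in lines[i - 1]):
            missing.append(m.group(1))
    return missing

# Only this filtered list goes to the AI reviewer, which can then judge that
# _format_error is a legitimate utility while charge_customer is a real gap.
print(functions_missing_req(SOURCE))  # ['_format_error', 'charge_customer']
```

&lt;p&gt;The script is cheap, repeatable, and runs in CI; the AI only sees the handful of candidates, not the whole codebase.&lt;/p&gt;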

&lt;p&gt;&lt;strong&gt;AI models&lt;/strong&gt; (require interpretation, context, judgment):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code generation from requirements&lt;/li&gt;
&lt;li&gt;Test scenario selection and edge case identification&lt;/li&gt;
&lt;li&gt;Architecture decisions and trade-off analysis&lt;/li&gt;
&lt;li&gt;Semantic code review (logic errors beyond linter scope)&lt;/li&gt;
&lt;li&gt;Writing specifications from business needs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Verification Pyramid
&lt;/h3&gt;

&lt;p&gt;Three levels, in order of preference:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic tool&lt;/strong&gt; if the check is fully formalizable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Script + AI&lt;/strong&gt; if a script narrows the scope and AI makes the judgment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pure AI&lt;/strong&gt; only if the check requires full context and interpretation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Tool-first, LLM-for-gaps. Building a deterministic tool or even a regex-based filter is often a better investment than repeatedly invoking an AI agent to scan an entire codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Violation Test
&lt;/h3&gt;

&lt;p&gt;If a check is fully formalizable, it must be a deterministic tool. If a deterministic tool can substantially narrow the scope of analysis, it must be created, even if the final decision is left to AI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pillar 4: Parsimony
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Minimum representation that preserves full semantics and enforceability.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wrote a dedicated article on this topic: &lt;a href="https://blog.rezvov.com/principle-of-parsimony-in-context-engineering" rel="noopener noreferrer"&gt;Principle of Parsimony in Context Engineering&lt;/a&gt;. Here is the essence as it applies to SDD.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Requirements
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Minimality.&lt;/strong&gt; Exclude everything that does not affect behavior: redundant explanations, repeated instructions, irrelevant fragments, history not needed for the current step&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sufficiency.&lt;/strong&gt; Compression must not introduce ambiguity. Context remains complete enough for unambiguous reconstruction of the goal, constraints, and result format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget prioritization.&lt;/strong&gt; Tokens saved on operational description are reallocated to substantive artifacts (code, specifications, examples, data) which have the strongest empirical impact on response quality&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Parsimony Is Not Unreadability
&lt;/h3&gt;

&lt;p&gt;Text written with parsimony (imperative, no filler, compressed) reads easily for a competent specialist who knows the domain terminology and context. Parsimony removes noise, not meaning.&lt;/p&gt;

&lt;p&gt;Compare:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you are writing configuration files, it is important to remember that you should always validate the required fields at startup, because this helps catch errors early in the development process.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;versus:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Configuration must validate required fields at startup (fail-fast).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both say the same thing. The second version is roughly a third the length. A developer reading it knows exactly what to do. No meaning is lost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Operational Practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Directive vocabulary&lt;/strong&gt;: MUST / SHOULD / MAY / DO NOT. Not "you might want to consider"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commit messages&lt;/strong&gt;: &lt;code&gt;feat(auth): implement login [FR-AUTH-001]&lt;/code&gt;. Not a paragraph of prose&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specifications&lt;/strong&gt;: compressed YAML with requirements and prohibitions. Not a narrative&lt;/li&gt;
&lt;/ul&gt;
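&lt;p&gt;A compressed specification in this style might look like the following (the field names and the requirement text are illustrative, not a fixed schema):&lt;/p&gt;

```yaml
# FR-AUTH-002: token expiry
id: FR-AUTH-002
must:
  - expire access tokens after auth.token_ttl seconds
  - reject expired tokens with HTTP 401
do_not:
  - hardcode the TTL anywhere outside auth.token_ttl
```

&lt;p&gt;Every line is a directive; there is nothing for a reader, or an AI agent, to interpret loosely.&lt;/p&gt;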

&lt;h3&gt;
  
  
  Violation Test
&lt;/h3&gt;

&lt;p&gt;A context is parsimonious when no essential part can be removed without introducing ambiguity, and the total volume contains no redundant or irrelevant elements.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the Pillars Work Together
&lt;/h2&gt;

&lt;p&gt;Every project decision derives from the four pillars:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;Derives from&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Specification before code (API-first)&lt;/td&gt;
&lt;td&gt;DRY (spec is the source of truth) + Deterministic Enforcement (compiler/validator enforces it)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Annotations in code (&lt;code&gt;@req&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Traceability (code-to-requirement link)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automated checks in CI&lt;/td&gt;
&lt;td&gt;Deterministic Enforcement (everything verifiable is automated)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compressed rules for AI agents&lt;/td&gt;
&lt;td&gt;Parsimony (minimum tokens, maximum signal)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single contract format&lt;/td&gt;
&lt;td&gt;DRY (no markdown copies of APIs) + Traceability (contract traces to requirement)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tests linked to requirements&lt;/td&gt;
&lt;td&gt;Traceability + Deterministic Enforcement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pillars reinforce each other. Traceability without deterministic enforcement is manual bookkeeping that will drift. DRY without parsimony produces a single source of truth that is bloated and ignored. Parsimony without traceability produces compact artifacts that no one can connect to business needs. Deterministic tools without DRY validate against duplicated specs that may already be inconsistent.&lt;/p&gt;

&lt;p&gt;All four together create a system where specifications are living artifacts, not documentation theater.&lt;/p&gt;




&lt;h2&gt;
  
  
  Applying the Pillars in Practice
&lt;/h2&gt;

&lt;p&gt;If you are working on a project or a test assignment, here is what applying each pillar looks like concretely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traceability.&lt;/strong&gt; Your code and tests must be traceable to the task requirements. Use annotations, structured commit messages, and clear references. If someone reads your commit history, they should be able to reconstruct which requirement each change addresses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DRY.&lt;/strong&gt; Every fact is described once. Configuration values, type definitions, constants, API contracts: each has a single source. If you define a type in a spec file, your code derives from it or references it. It does not redefine it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deterministic Enforcement.&lt;/strong&gt; Validation is automated. Linting, tests, schema validation, CI checks: if a machine can verify it, a machine should verify it. Where full automation is not possible, write a script that narrows the scope for AI review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parsimony.&lt;/strong&gt; Your code, documentation, and prompts are concise. No redundant comments restating what the code already says. No empty abstractions. No premature generalization. Every line earns its place.&lt;/p&gt;




&lt;h2&gt;
  
  
  For LLM Users: How to Work with This Material
&lt;/h2&gt;

&lt;p&gt;This article is written in Markdown and available as a raw Markdown version for LLM consumption: &lt;a href="https://blog.rezvov.com/specification-driven-development-four-pillars.md" rel="noopener noreferrer"&gt;https://blog.rezvov.com/specification-driven-development-four-pillars.md&lt;/a&gt;. That format is more convenient for feeding directly into Claude, Cursor, or any other AI tool.&lt;/p&gt;

&lt;p&gt;But the point is not to feed it to Claude or Cursor so it "just does what's written here." The point is to use it as a basis for a conversation with your AI tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Walk through the four pillars with the AI. Ask it to explain each one in the context of your specific task&lt;/li&gt;
&lt;li&gt;Challenge the examples. Ask "what would a traceability violation look like in my project?"&lt;/li&gt;
&lt;li&gt;Work through scenarios: "If I change this requirement, what else needs to change?"&lt;/li&gt;
&lt;li&gt;Ask the AI to review your code against these principles and explain what it finds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The value is not in the AI blindly applying rules. It is in you developing an understanding of WHY each pillar exists and HOW to apply it, using the AI as a thinking partner.&lt;/p&gt;

&lt;h2&gt;
  
  
  For ForEach Partners Internship Candidates
&lt;/h2&gt;

&lt;p&gt;If you are completing a test assignment for the &lt;a href="https://jl.foreachpartners.com/" rel="noopener noreferrer"&gt;ForEach Partners Junior Lab internship program&lt;/a&gt;, here is what we evaluate:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not just the code.&lt;/strong&gt; We look at the result (code, tests, specs), but also at the process. Your prompts during development, your iteration with AI tools, your decisions about what to automate and what to review manually: these are part of the evaluation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Show your work.&lt;/strong&gt; A screencast of your development session is the strongest signal. A chat log export is acceptable. We want to see how you decomposed the task, how you guided the AI, how you caught and corrected its mistakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SDD as a conversation topic.&lt;/strong&gt; During the interview, we will discuss how you applied SDD principles in your test assignment. Not whether you memorized the definitions, but whether you understood the reasoning and made deliberate choices.&lt;/p&gt;

&lt;p&gt;The specific requirements for each role are described in the corresponding vacancy posting.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published: &lt;a href="https://blog.rezvov.com/specification-driven-development-four-pillars" rel="noopener noreferrer"&gt;Specification-Driven Development: The Four Pillars — Alex Rezvov's Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidrivendevelopment</category>
      <category>softwaredevelopment</category>
      <category>architecture</category>
      <category>guide</category>
    </item>
  </channel>
</rss>
