<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jakob Steinmann</title>
    <description>The latest articles on DEV Community by Jakob Steinmann (@mjcs).</description>
    <link>https://dev.to/mjcs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3615796%2F64a73cf5-b080-408b-a046-1a77c03fe7a0.jpg</url>
      <title>DEV Community: Jakob Steinmann</title>
      <link>https://dev.to/mjcs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mjcs"/>
    <language>en</language>
    <item>
      <title>Harness Engineering — The Quality Pillar of Agentic Engineering</title>
      <dc:creator>Jakob Steinmann</dc:creator>
      <pubDate>Sat, 06 Jun 2026 13:09:58 +0000</pubDate>
      <link>https://dev.to/mjcs/harness-engineering-the-quality-pillar-of-agentic-engineering-31e0</link>
      <guid>https://dev.to/mjcs/harness-engineering-the-quality-pillar-of-agentic-engineering-31e0</guid>
      <description>&lt;h3&gt;
  
  
  How to use deterministic tooling to hard-enforce code quality
&lt;/h3&gt;

&lt;p&gt;In February 2026, OpenAI published an &lt;a href="https://openai.com/de-DE/index/harness-engineering/" rel="noopener noreferrer"&gt;article&lt;/a&gt; about building a complete product without writing a single line of code by hand. An agent did all the work. They used the term &lt;strong&gt;harness engineering&lt;/strong&gt; to describe the discipline around it.&lt;/p&gt;

&lt;p&gt;TL;DR: &lt;strong&gt;humans steer, agents execute&lt;/strong&gt;. The team no longer wrote code. Their job was to build the environment around the agent so it could work reliably.&lt;/p&gt;

&lt;p&gt;The article groups this work into roughly four sections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Context&lt;/em&gt; — the repository as the single source of truth&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Observability&lt;/em&gt; — DevTools, logs, metrics&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Feedback loops&lt;/em&gt; — basically a &lt;a href="https://ghuntley.com/loop/" rel="noopener noreferrer"&gt;Ralph loop&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Guardrails&lt;/em&gt; — the deterministic &lt;strong&gt;quality pillar&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Let's focus on the quality pillar
&lt;/h3&gt;

&lt;p&gt;Software developers let agents write their code. But &lt;strong&gt;we still own it and we are still accountable for it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Given this constraint, we need to trust the output. Before coding agents, and often still today, this trust came from review. We read the code, added comments, made improvements. &lt;strong&gt;We were crafting code to establish trust&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;With coding agents, code creation becomes code generation. And generated code will inevitably break at some point unless it is heavily audited &lt;strong&gt;during&lt;/strong&gt; the generation phase. I strongly lean toward pre-commit hooks for this, because they bring several benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The gate runs locally, in your team's workflow&lt;/li&gt;
&lt;li&gt;The tooling is old and battle-proven, and the agent knows it well&lt;/li&gt;
&lt;li&gt;Nothing broken reaches the branch&lt;/li&gt;
&lt;li&gt;It lives in the repo, usable by everyone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of us already use pre-commit hooks. Usually one or two. A commit message linter for Conventional Commits, maybe a formatter. That is where it normally ends.&lt;/p&gt;
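
&lt;p&gt;For reference, the Conventional Commits linter mentioned above can be as small as the following commit-msg hook. This is a minimal sketch assuming a POSIX shell. The &lt;code&gt;.githooks/&lt;/code&gt; location and the &lt;code&gt;check_commit_msg&lt;/code&gt; function are hypothetical names. The list of allowed types is an assumption borrowed from the common commitlint conventional preset, so adjust it to your team's rules.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
#!/bin/sh
# Minimal commit-msg hook sketch (assumed location: .githooks/commit-msg).
# Git calls the hook with the path of the commit message file as $1.

# Allowed types: an assumption based on the commitlint conventional preset.
CC_PATTERN='^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)(\([a-z0-9._-]+\))?!?: .+'

check_commit_msg() {
  # Only the header (first line) has to follow the convention.
  header=$(head -n 1 "$1")
  if printf '%s\n' "$header" | grep -Eq "$CC_PATTERN"; then
    return 0
  fi
  echo "✗ Commit message does not follow Conventional Commits."
  echo "  Expected: type(optional-scope): description"
  echo "  Example:  feat(parser): support nested arrays"
  echo "  Got:      $header"
  return 1
}

# Run the check only when Git invokes this file as the hook.
if [ "${0##*/}" = "commit-msg" ]; then
  check_commit_msg "$1" || exit 1
fi
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Git passes the path of the message file as the first argument to a commit-msg hook, which is why the script reads only &lt;code&gt;$1&lt;/code&gt;.&lt;/p&gt;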

&lt;p&gt;But here is what is new: you can pile up the tools and build a huge stack. In a human team, every check you add costs time and attention and reduces your ROI. Now you can apply 50+ hooks, and your agent (with auto-compaction) will work through them around the clock, completely autonomously, until they all pass. Your code quality improves automatically, and your ROI increases!&lt;/p&gt;
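
&lt;p&gt;Once the stack grows to dozens of checks, a runner that stops at the first failure forces the agent into one round trip per failing check. The following is a minimal sketch of the alternative. It assumes a hypothetical layout where &lt;code&gt;.githooks/pre-commit&lt;/code&gt; runs every executable file in &lt;code&gt;.githooks/pre-commit.d/&lt;/code&gt; and reports all failures at the end. The &lt;code&gt;run_checks&lt;/code&gt; name is also hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
#!/bin/sh
# Sketch of a pre-commit runner (assumed layout: this file is
# .githooks/pre-commit, with one executable per check in
# .githooks/pre-commit.d/). It runs EVERY check instead of stopping at the
# first failure, so the agent gets the full list of problems in one pass.

run_checks() {
  dir=$1
  failed=""
  for check in "$dir"/*; do
    # Skip anything that is not an executable regular file.
    [ -f "$check" ] || continue
    [ -x "$check" ] || continue
    name=${check##*/}
    echo "Running $name"
    if ! "$check"; then
      failed="$failed $name"
    fi
  done
  if [ -n "$failed" ]; then
    echo ""
    echo "✗ Failed checks:$failed"
    echo "  Fix every failed check above, then commit again."
    return 1
  fi
  echo "✓ All checks passed."
}

# Run only when Git invokes this file as the hook.
if [ "${0##*/}" = "pre-commit" ]; then
  run_checks "$(dirname "$0")/pre-commit.d" || exit 1
fi
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;git config core.hooksPath .githooks&lt;/code&gt; once per clone (Git 2.9 or later) points Git at the committed hooks directory, so everyone, agent included, uses the same gate. Checks run in glob order, so numeric prefixes such as &lt;code&gt;10-format&lt;/code&gt; control the sequence.&lt;/p&gt;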

&lt;h3&gt;
  
  
  Know that you don't know the tooling
&lt;/h3&gt;

&lt;p&gt;Some tools everyone knows. We are used to measuring coverage. We have seen Sonar reports on cyclomatic complexity and ignored them because of the refactoring effort they would take. But when I started digging into this, I was really surprised by how broad the tooling is.&lt;/p&gt;

&lt;p&gt;I built a TypeScript scaffolding template for greenfield projects, and while working on it I stumbled across jscpd, a copy/paste detector. I had never heard of it before. (You might argue that as an IT architect I should have, and &lt;strong&gt;you are absolutely right&lt;/strong&gt;.) Now it is a central part of my pre-commit hook pipeline.&lt;/p&gt;

&lt;p&gt;Are you the "show me the code" type? Point your agent at this &lt;a href="https://github.com/steinmann321/myprojectclone-typescript" rel="noopener noreferrer"&gt;TypeScript template&lt;/a&gt; and let it explain the setup to you, or at the &lt;a href="https://github.com/MO2k4/myprojectclone-dotnet" rel="noopener noreferrer"&gt;.NET version&lt;/a&gt; a colleague of mine built.&lt;/p&gt;

&lt;p&gt;For everyone else, here is a concrete example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feeding &lt;a href="https://banay.me/dont-waste-your-backpressure/" rel="noopener noreferrer"&gt;pressure&lt;/a&gt; back into the agent
&lt;/h3&gt;

&lt;p&gt;The second important part is how to build the rule.&lt;/p&gt;

&lt;p&gt;When a check fails, the agent &lt;strong&gt;really&lt;/strong&gt; wants to make it green. It might fix the code, or it might weaken the rule: raise a threshold, extend an ignore list, add a &lt;code&gt;# noqa&lt;/code&gt; comment. The bad news: models are trained on data created by humans, so weakening the rule looks just as valid as fixing the issue. And one thing hasn't changed either: &lt;strong&gt;fixing the issue is almost always harder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So raw tool output is not enough. I suggest this as a best practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;wrap the tool in a custom script&lt;/li&gt;
&lt;li&gt;on error emit an instructive message&lt;/li&gt;
&lt;li&gt;the message explains how to fix the issue, both quantitatively (what target to reach) and qualitatively (what a proper fix looks like)&lt;/li&gt;
&lt;/ul&gt;
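
&lt;p&gt;The three steps above fit into one small shell function. This is a sketch, not a published tool. The &lt;code&gt;run_gate&lt;/code&gt; name and its argument order are hypothetical. The message is whatever fix-focused text you write for each check.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
#!/bin/sh
# Sketch of a reusable gate wrapper: run a tool and, if it fails, print an
# instructive, fix-focused message after the tool's own output.
# Usage: run_gate NAME MESSAGE COMMAND [ARGS...]

run_gate() {
  name=$1
  message=$2
  shift 2
  # The tool's stdout and stderr pass through unchanged, so the agent still
  # sees exactly what failed.
  if "$@"; then
    echo "✓ $name passed."
    return 0
  fi
  echo ""
  echo "✗ $name failed."
  echo ""
  # printf keeps multi-line messages intact.
  printf '%s\n' "$message"
  return 1
}

# Hypothetical usage, wrapping the type-coverage command from the hook below:
# run_gate "Type coverage" "$TYPE_MSG" bunx type-coverage --strict --detail --at-least 100
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Keeping one message per tool in a single place makes it easy to iterate on the wording, and that wording is where most of the effect comes from.&lt;/p&gt;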

&lt;p&gt;If the tool needs it, allow a very small bypass window and instruct the agent to document its reasoning and tag it as a bypass. This gives a reviewer the chance to see why the decision was made.&lt;/p&gt;
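
&lt;p&gt;The bypass tag only helps if something enforces it. The following is one way to do that, sketched under several assumptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;suppressions are type-coverage ignore comments&lt;/li&gt;
&lt;li&gt;the justification is a comment containing &lt;code&gt;BYPASS-JUSTIFICATION&lt;/code&gt; on the line directly above, which is the convention used in the hook message below&lt;/li&gt;
&lt;li&gt;a hypothetical &lt;code&gt;MAX_BYPASSES&lt;/code&gt; variable caps the total so the window stays small&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;check_bypasses&lt;/code&gt; name is hypothetical as well.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
#!/bin/sh
# Sketch: every suppression must be tagged and justified, and the total
# number of suppressions must stay within a small budget.
# Assumptions: suppressions are type-coverage ignore comments, the
# justification is a comment containing BYPASS-JUSTIFICATION on the line
# directly above, and MAX_BYPASSES is a hypothetical budget you choose.

MAX_BYPASSES=${MAX_BYPASSES:-3}

# Usage: check_bypasses FILE...
check_bypasses() {
  awk -v max="$MAX_BYPASSES" '
    BEGIN { count = 0; bad = 0 }
    # Reset the remembered line at the start of each file.
    FNR == 1 { prev = "" }
    /type-coverage:ignore-(next-line|file)/ {
      count++
      if (prev !~ /BYPASS-JUSTIFICATION/) {
        printf "%s:%d: suppression without BYPASS-JUSTIFICATION above it\n", FILENAME, FNR
        bad = 1
      }
    }
    { prev = $0 }
    END {
      if (count > max) {
        printf "%d suppressions found, budget is %d. Type the code properly instead.\n", count, max
        bad = 1
      }
      exit bad
    }
  ' "$@"
}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In a hook, you could pass it the staged files, for example the &lt;code&gt;.ts&lt;/code&gt; files listed by &lt;code&gt;git diff --cached --name-only --diff-filter=ACM&lt;/code&gt;.&lt;/p&gt;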

&lt;p&gt;Here is one of my hooks, which enforces 100 percent type coverage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; bunx type-coverage &lt;span class="nt"&gt;--strict&lt;/span&gt; &lt;span class="nt"&gt;--detail&lt;/span&gt; &lt;span class="nt"&gt;--at-least&lt;/span&gt; 100 2&amp;gt;&amp;amp;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✗ Type coverage is below 100%."&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  THE ONLY CORRECT FIX IS TO PROPERLY TYPE THE REPORTED EXPRESSION."&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  This means: define an interface or type for the data, annotate the"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  variable, narrow with a type guard — whatever it takes. Even if it"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  is a lot of work, that work is always the right answer."&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  Suppression (type-coverage:ignore-next-line / ignore-file) and blind"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  casts (as SomeType) are introducing technical debt and are only"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  permitted when it is 100% certain that no proper typing is possible"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  (e.g. a 3rd-party API with no published schema and no way to infer"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  one from the code). If you reach for suppression before exhausting"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  proper typing options, you are doing it wrong."&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  When suppression truly cannot be avoided, add a BYPASS-JUSTIFICATION"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  comment on the line above explaining exactly why proper typing is"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  impossible — not just inconvenient."&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wording is not cosmetic. An earlier version of this message gave the bypass the same weight as the fix, &lt;strong&gt;both in token count and in wording&lt;/strong&gt;. On a large codebase, the model took the cheap path and bypassed everywhere. It even put real effort into the justifications. I changed the message to the one you see here, and this version flipped the result from bypassing to fixing.&lt;/p&gt;

&lt;p&gt;This is the craft now. The tool gives you the deterministic gate: pass or fail. The message tells the model how to react to a failure properly instead of arbitrarily. You need both, and &lt;strong&gt;the message is where the work is&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>codequality</category>
    </item>
  </channel>
</rss>
