<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Leonid Bugaev</title>
    <description>The latest articles on DEV Community by Leonid Bugaev (@leonidbugaev).</description>
    <link>https://dev.to/leonidbugaev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3894592%2F46d1e7e7-f074-4ce0-9117-3b4df83e6863.png</url>
      <title>DEV Community: Leonid Bugaev</title>
      <link>https://dev.to/leonidbugaev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leonidbugaev"/>
    <language>en</language>
    <item>
      <title>What is the new engineering bottleneck?</title>
      <dc:creator>Leonid Bugaev</dc:creator>
      <pubDate>Thu, 07 May 2026 07:05:53 +0000</pubDate>
      <link>https://dev.to/leonidbugaev/what-is-the-new-engineering-bottleneck-3p4g</link>
      <guid>https://dev.to/leonidbugaev/what-is-the-new-engineering-bottleneck-3p4g</guid>
      <description>&lt;p&gt;Something I keep thinking about:&lt;/p&gt;

&lt;p&gt;Maybe AI is not exposing a new problem in engineering. Maybe it is exposing an old one that we were already bad at.&lt;/p&gt;

&lt;p&gt;We talk about AI like the bottleneck is still writing code. But honestly, writing code has not been the hard part for a long time. The hard part is all the surrounding context.&lt;/p&gt;

&lt;p&gt;Why are we making this change? Who asked for it? Which customer depends on the current behavior? Was this weird edge case intentional? Is this a product decision or just an implementation accident? Did security already review something similar before? Is the documentation describing the current behavior or the behavior we wish we had?&lt;/p&gt;

&lt;p&gt;And the uncomfortable part is that most of this context is not in one place.&lt;/p&gt;

&lt;p&gt;It is scattered across GitHub, Slack, docs, tests, and the head of the engineer who left six months ago.&lt;/p&gt;

&lt;p&gt;This was already a problem. AI just makes it harder to ignore, because now we can create more code, more tests, more docs, and more confident explanations from less context.&lt;/p&gt;

&lt;p&gt;But if the context is incomplete, then all of that output is built on sand. This is the part I find interesting.&lt;/p&gt;

&lt;p&gt;Not “will AI replace engineers?” I don’t think that is the most useful question.&lt;/p&gt;

&lt;p&gt;The more interesting question is: What happens when engineering teams can generate implementation faster than they can preserve intent?&lt;/p&gt;

&lt;p&gt;Because that is where things get messy.&lt;/p&gt;

&lt;p&gt;You can have a clean PR. You can have passing tests. You can have updated docs. You can even have a very convincing AI-generated explanation.&lt;/p&gt;

&lt;p&gt;And still nobody can answer the basic question: “Is this actually the right change?”&lt;/p&gt;

&lt;p&gt;That question is much more expensive than people admit. I have felt this many times in open source and infrastructure work.&lt;/p&gt;

&lt;p&gt;You look at a small change and think, “This should be simple.” Then you start pulling the thread. There is a backwards compatibility issue. There is some behavior that looks wrong but someone depends on it. There is a test that protects the implementation but not the real promise. There is a doc page that says one thing and production behavior says another. There is a customer workaround that became part of the product without anyone naming it. Suddenly the small change is not small.&lt;/p&gt;

&lt;p&gt;And this is why I think “AI will make everyone ship faster” is only half true. AI can make the creation part faster. But creation is not the same as shipping. Shipping means the organization understands the change well enough to stand behind it.&lt;/p&gt;

&lt;p&gt;That is a different problem. I don’t have the perfect answer yet.&lt;/p&gt;

&lt;p&gt;But I think “AI coding” is the wrong frame. The real problem is not coding. The real problem is engineering memory.&lt;/p&gt;

&lt;p&gt;And most teams’ engineering memory is held together with Slack search, old PR comments, and someone saying: “I think I remember why we did that.”&lt;/p&gt;

&lt;p&gt;That does not scale.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Trust Is the Bottleneck</title>
      <dc:creator>Leonid Bugaev</dc:creator>
      <pubDate>Thu, 07 May 2026 07:04:16 +0000</pubDate>
      <link>https://dev.to/leonidbugaev/trust-is-the-bottleneck-7p2</link>
      <guid>https://dev.to/leonidbugaev/trust-is-the-bottleneck-7p2</guid>
      <description>&lt;p&gt;Everyone is asking the same question now: if AI can help us create much more code, why aren’t engineering teams suddenly moving much faster?&lt;/p&gt;

&lt;p&gt;I think the question is right, but the answer usually stops too early.&lt;/p&gt;

&lt;p&gt;AI does make some things dramatically faster. MVPs are faster. Prototypes are faster. The time to validate an idea is reduced a lot. You can explore directions that previously weren’t worth the effort. This is real, and I don’t want to pretend otherwise. But creating the first version of something isn’t the same as maintaining a product, and creating more pull requests isn’t the same as creating more trusted change.&lt;/p&gt;

&lt;p&gt;This is where the economics breaks. If your team can create ten times more pull requests, your product doesn’t automatically move ten times faster. Your company doesn’t become ten times faster. The economy doesn’t double or triple. Because the expensive part of mature engineering was never only the typing of code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The expensive part is trust.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Can I trust this change? Does it match the intent? Does it break a hidden customer flow? Does it affect backwards compatibility? Are the docs updated? Are the tests proving the right thing? Did we think about security, performance, malformed input, error states, release notes, migration, support?&lt;/p&gt;

&lt;p&gt;A pull request doesn’t answer all of this.&lt;br&gt;
A pull request is just something asking to be trusted.&lt;/p&gt;

&lt;p&gt;So I don’t think the interesting question is “can AI create more code?” It can. The interesting question is: what needs to exist around the code so we can safely absorb more change?&lt;/p&gt;

&lt;p&gt;If we can scale trust, we can unlock the real scaling of AI. But not by sending maintainers ten times more PRs. That only moves the bottleneck. What I want is a pull request that comes with enough context that I can actually believe it: why this change exists, what it affects, which tests prove it, which docs changed, what can break, and what still needs a human decision.&lt;/p&gt;

&lt;p&gt;AI made implementation cheaper, but trust is still expensive. Join me on a journey to figure out what to trust in the new, post-AI world.&lt;/p&gt;

&lt;p&gt;If that is the kind of engineering problem you care about, subscribe.&lt;/p&gt;

&lt;h2&gt;If your trust model is green CI, you are in trouble&lt;/h2&gt;

&lt;p&gt;AI isn’t going back in the box. Even if you personally didn’t join the hype train, people around you probably did. Engineers use it to write code. PMs use it to write specs. Someone asks it to validate the plan, then write the code, then write the tests, then check everything against the same plan again.&lt;/p&gt;

&lt;p&gt;I do it too. I ask AI to help me write the spec. Then I ask it to validate the plan. Then I ask it to write the code. Then validate its own code. Then write the tests. Then check everything against the original plan. It’s tempting because it works surprisingly well. In a lot of cases, it feels almost magical.&lt;/p&gt;

&lt;p&gt;But this is exactly why it becomes dangerous.&lt;/p&gt;

&lt;p&gt;For a long time, our basic engineering trust model was something like this: write the code, write the tests, pass CI/CD, review the pull request, ship. It was never perfect, but it was much better than nothing. Green CI never meant the product was correct. It meant the code passed the checks we had.&lt;/p&gt;

&lt;p&gt;The problem is that those checks don’t prove intent. They don’t prove the requirement was correct. They don’t prove the tests were testing the right thing. They don’t prove the documentation was complete. They don’t prove the change matched the real product behavior we needed.&lt;/p&gt;

&lt;p&gt;They prove that the current artifacts agreed with the current checks.&lt;/p&gt;

&lt;p&gt;With AI, the whole chain can be generated. The spec can be wrong, the code can follow the wrong spec, the tests can validate the wrong code, the docs can describe the wrong behavior, and CI can still be green. Everything agrees with everything, but the intent is wrong.&lt;/p&gt;

&lt;p&gt;That’s not trust. That’s a consistent mistake.&lt;/p&gt;

&lt;p&gt;This is why high coverage isn’t enough either. In my previous article about jsonparser, the painful part wasn’t that I had no tests. I had near-100% coverage in the area that mattered. The problem was that malformed input behavior was never properly described. So the tests proved what existed, not what should have existed.&lt;/p&gt;

&lt;p&gt;You cannot test what you never described.&lt;/p&gt;

&lt;p&gt;Security makes this even less optional. For years, many teams survived with some quiet version of security by obscurity. Not officially, of course. Everyone says security matters. But in practice, a lot of software depended on nobody looking too closely, or on attackers moving slowly enough that maintainers had time to react.&lt;/p&gt;

&lt;p&gt;That assumption is breaking. VulnCheck reported that in the first half of 2025, 32.1% of known exploited vulnerabilities had exploitation evidence on or before the day the CVE was issued. This doesn’t mean every vulnerability becomes an exploit in hours, but it does mean the old time cushion isn’t something you can build your product around anymore.&lt;/p&gt;

&lt;p&gt;So things that felt optional before become normal engineering requirements: malformed input, authorization boundaries, resource limits, timeout behavior, error states, data exposure, public API behavior. These aren’t enterprise extras. They’re product requirements.&lt;/p&gt;

&lt;p&gt;This is the uncomfortable part: the trust problem is now everyone’s problem. Even if your company hasn’t “adopted AI,” your people probably have. Even if your CI is green, it may be green against the wrong intent. Even if your coverage is high, it may cover the behavior you remembered to describe, not the behavior the product actually needs.&lt;/p&gt;

&lt;p&gt;So we need a different source of truth. Not instead of CI/CD, not instead of tests, not instead of code review. Above them. Something that says what the system is supposed to do, which obligations apply, what evidence proves them, and what becomes suspicious when something changes.&lt;/p&gt;

&lt;p&gt;Otherwise AI won’t only help us move faster.&lt;br&gt;
It will help us move faster with a false feeling of safety.&lt;/p&gt;

&lt;h2&gt;The outside structure is not the product&lt;/h2&gt;

&lt;p&gt;I know this problem from open source. For at least the last 12 years, I have worked a lot in open source. I had my own popular open source projects, and today at Tyk we build an open source API Gateway.&lt;/p&gt;

&lt;p&gt;Open source is hard. Not because people are bad. Usually it’s the opposite. Someone from the outside sends you a pull request. Maybe it’s a bug fix. Maybe a new feature. Maybe it’s useful. Maybe it’s technically correct. Maybe they spent their evening on it.&lt;/p&gt;

&lt;p&gt;But as a maintainer, you still need to get inside the context. You need to understand what’s happening and why this person is doing it. You can be fast and accept too much, or stay picky and make people unhappy. Neither option really solves the trust problem.&lt;/p&gt;

&lt;p&gt;The real issue isn’t that contributors are bad. The issue is that they see the outside structure. They see the code. Maybe they see the tests. Maybe they see the docs. But they don’t see the intent in the same way the owner of the project sees it. They don’t know all the small product promises made over the years. They don’t know which ugly thing is accidental and which ugly thing is load-bearing. They don’t know which customer flow depends on some behavior that looks strange from the outside.&lt;/p&gt;

&lt;p&gt;They are not inside this bubble.&lt;/p&gt;

&lt;p&gt;And this isn’t only open source. The same thing happens inside a company. Someone from support knows the product very well. They see customer pain every day. They may even be technical enough to raise a pull request. Someone from solutions architecture can do the same. Another team can contribute to your service. AI makes all of this easier.&lt;/p&gt;

&lt;p&gt;But internal doesn’t automatically mean trusted.&lt;/p&gt;

&lt;p&gt;A support engineer may understand the product from the customer side, but not the architecture. Another team may understand code, but not the local history. AI may generate something that looks clean, but it has no real ownership unless someone gives it context and checks it.&lt;/p&gt;

&lt;p&gt;These contributions can become shallow. Not useless. Shallow. They touch the visible layer of the system, but they aren’t backed by the deep intent of the people who own this part of the product.&lt;/p&gt;

&lt;p&gt;We tried relaxing quality gates a few times. More people contributing sounds obviously good, especially when every company has more backlog than humans. But we had cases where a simple line, a simple fix, broke everything. We had other cases where the fix was so big in scope that it was too dangerous to accept.&lt;/p&gt;

&lt;p&gt;The conclusion wasn’t “only engineers can write code.”&lt;br&gt;
The conclusion was: if you want to scale engineering, it’s always about trust.&lt;/p&gt;

&lt;p&gt;This is also why “move fast” changes meaning when you have customers. When you’re still searching for an MVP, you can break things and call it learning. But when customers put your product inside their infrastructure, the product is no longer fully yours. They pay you for stability, security, and predictable behavior. In a way, you give away part of the ownership.&lt;/p&gt;

&lt;p&gt;At Tyk, this is very real. We build software used by banks, governments, and large enterprises. Quality assurance isn’t some internal slogan. It’s part of the relationship with customers. Every software has bugs; I don’t want to pretend otherwise. But the price of a bug isn’t the same everywhere. Sometimes it’s legal. Sometimes it’s regulatory. Sometimes it’s very big money. Forget even the money for a second: what if the bank goes down? It can become a national-level issue.&lt;/p&gt;

&lt;p&gt;Speed isn’t how quickly you can make a change.&lt;br&gt;
Speed is how quickly you can safely absorb change.&lt;/p&gt;

&lt;p&gt;Lehman’s software evolution work has a phrase that fits here: “The safe rate of change per release is constrained by the process dynamics.” In the same passage, he says that as the number, size, and architectural distance of changes increase, complexity and fault rate grow more than linearly.&lt;/p&gt;

&lt;p&gt;This sounds academic, but it matches product reality. You can move only as fast as your safety norms allow. Your current team, your architecture, your customer base, your process, your quality gates, your review culture — all of this defines your real speed.&lt;/p&gt;

&lt;p&gt;If AI gives you more change than your trust system can absorb, you aren’t scaling engineering. You’re scaling incoming work.&lt;/p&gt;

&lt;h2&gt;Temporary specs become archaeology&lt;/h2&gt;

&lt;p&gt;One of the deeper problems is how we treat specifications in consumer engineering.&lt;/p&gt;

&lt;p&gt;Most of what we call a specification is a temporary artifact. You start with all the best practices. Maybe an RFC. Then it becomes a detailed Jira ticket. Maybe later there is an ADR. There are comments in GitHub. A Slack thread. A Confluence page. A few decisions made during review because reality was different from the original assumption.&lt;/p&gt;

&lt;p&gt;At the moment, this feels normal. This is how software gets built. But after some time, all these artifacts pile up. If you want to understand how a component works, you need to dig through history. You need to understand why it ended up in this final state. Why this was done and not that. You may be lucky and find the exact explanation. In most cases it’s lost in someone’s head.&lt;/p&gt;

&lt;p&gt;This is archaeology, not development.&lt;/p&gt;

&lt;p&gt;The bigger problem is that these artifacts are independent. The RFC isn’t connected to all the code. The Jira ticket isn’t connected to all the tests. The docs are scattered across ten pages. The final implementation isn’t connected back to the original assumptions. It’s not a graph.&lt;/p&gt;

&lt;p&gt;So we trust a person — engineer, architect, lead, PM — to hold the high-level picture in their head. We trust them to find all dependencies. We trust them to notice backwards compatibility issues. We trust them to know which docs need updating. We trust them to remember which customer flow can break.&lt;/p&gt;

&lt;p&gt;And of course people forget. Not because they’re careless. Because this is too much context for one person to carry.&lt;/p&gt;

&lt;p&gt;You fix a bug and break some other flow. You build a feature and forget a dependency with another service. You update two documentation pages and miss the other eight. The feature exists, but it’s unusable for one group of users. The implementation works, but not in the real production shape.&lt;/p&gt;

&lt;p&gt;The spec was supposed to create clarity.&lt;br&gt;
But because it was temporary, it becomes one more historical artifact.&lt;/p&gt;

&lt;p&gt;This is also why spec-first isn’t enough. Spec-driven development is better than no spec. Planning before coding is obviously better than jumping into implementation. But if the spec is still treated as a temporary artifact, after a few iterations you end up in the same position, with intent chaos.&lt;/p&gt;

&lt;p&gt;During development, the spec always changes. You start with assumptions. Researched assumptions, but still assumptions. Then implementation begins and reality appears. The architecture doesn’t work. A limitation appears. A reviewer notices a security issue. QA finds a case. A customer dependency changes the direction. And in many teams, the spec isn’t updated.&lt;/p&gt;

&lt;p&gt;The real knowledge moves into GitHub comments, Slack messages, review threads, and people’s heads. The Jira ticket becomes stale. The implementation says one thing. The ticket says another.&lt;/p&gt;

&lt;p&gt;Imagine someone from QA comes back from vacation and needs to test the feature. They see the ticket. They see the implementation. They have no idea what is happening. Why this? Why not that? Is this intended?&lt;/p&gt;

&lt;p&gt;It’s so common. I bet a lot of you feel the same.&lt;/p&gt;

&lt;h2&gt;How can I know what I don’t know?&lt;/h2&gt;

&lt;p&gt;A lot of bugs are not in the code first. They are in the missing specification.&lt;/p&gt;

&lt;p&gt;Have you actually described what should happen if the input is malformed? Have you described that this functionality must not allow SQL injection? What happens if the third-party service times out? What is the error state? Have you described authorization boundaries? Resource limits? Performance boundaries? What happens when something goes from ten requests per second in a test environment to thousands in production?&lt;/p&gt;

&lt;p&gt;There are also more subtle cases. Concurrency. Non-deterministic behavior. Map iteration. Merge order. I’m looking at you, Go.&lt;/p&gt;

&lt;p&gt;Have you described that the behavior should be deterministic? Did you write a test for it? Did the test prove the requirement, or did it just execute the code?&lt;/p&gt;
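
&lt;p&gt;Here is a minimal Go sketch of what I mean. The function and header format are invented for illustration, but the failure mode is real: Go randomizes map iteration order, so this can pass a test run and still break an undescribed determinism requirement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
    "fmt"
    "strings"
)

// buildHeader concatenates enabled flags in map iteration order.
// Nothing describes that the output must be stable, and no test
// asserts it, so green CI never sees the problem.
func buildHeader(flags map[string]bool) string {
    parts := []string{}
    for name, on := range flags {
        if on {
            parts = append(parts, name)
        }
    }
    // Missing: sort.Strings(parts), the undescribed obligation.
    return strings.Join(parts, ";")
}

func main() {
    flags := map[string]bool{"gzip": true, "brotli": true, "etag": true}
    // Run this a few times: the output changes between iterations.
    for i := 0; i &amp;lt; 3; i++ {
        fmt.Println(buildHeader(flags))
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;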

&lt;p&gt;This is where checklists, obligations, processes, discipline, and all the boring stuff come in. I know people hate boring process. I hate fake process too. Documents nobody reads. Boxes people tick after the fact. Quality theatre.&lt;/p&gt;

&lt;p&gt;But the useful version is different. An obligation is not a test case. It’s a category of behavior you are required to describe: malformed input, boundary behavior, error handling, access denied, determinism, idempotency, atomicity, nil safety, overflow safety, encoding safety.&lt;/p&gt;

&lt;p&gt;The obligation doesn’t tell you the answer.&lt;br&gt;
It forces you to ask the question.&lt;/p&gt;

&lt;p&gt;That is why I like it. It turns “maybe someone remembers” into a deterministic process. The checklist itself is human judgment. But checking whether the spec covered the checklist can be mechanical.&lt;/p&gt;
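
&lt;p&gt;To make “mechanical” concrete, here is a toy sketch, not how Proof or any real tool works. The category names and types are invented; the point is that once obligations are a fixed list, finding the gaps is a set difference, not a memory exercise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import "fmt"

// An obligation is a category of behavior a spec must describe.
var obligations = []string{
    "malformed_input", "error_handling", "access_denied",
    "determinism", "resource_limits", "timeout_behavior",
}

// Spec declares which obligation categories it covers.
type Spec struct {
    ID      string
    Covered map[string]bool
}

// missing is the mechanical part: no judgment, just a set difference.
func missing(s Spec) []string {
    var gaps []string
    for _, o := range obligations {
        if !s.Covered[o] {
            gaps = append(gaps, o)
        }
    }
    return gaps
}

func main() {
    spec := Spec{
        ID:      "SYS-REQ-001",
        Covered: map[string]bool{"malformed_input": true, "error_handling": true},
    }
    fmt.Printf("%s is missing: %v\n", spec.ID, missing(spec))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;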

&lt;p&gt;This is also where AI can help without pretending to own the product. If the code uses goroutines, the system can ask where cancellation, lifecycle, and error propagation are described. If code depends on map iteration or merge logic, it can ask whether determinism or commutativity matters. If code reads time directly, it can ask whether time is part of the behavior and how this is tested. If code changes a public API, it can ask where compatibility and documentation obligations are.&lt;/p&gt;

&lt;p&gt;This isn’t AI judging architecture taste. It’s tooling surfacing missed questions.&lt;/p&gt;

&lt;p&gt;That is the “how I know what I don’t know” loop. Spec obligations force code and test evidence. Code shape can reveal missing spec questions.&lt;/p&gt;
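
&lt;p&gt;As a rough sketch of that loop, using nothing beyond Go’s standard parser: scan the code shape and emit the questions. The triggers and wording are my illustration, not any tool’s actual rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
)

const src = `package demo

import "time"

func work(items map[string]int) {
    go func() { _ = time.Now() }()
    for k := range items {
        _ = k
    }
}`

func main() {
    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, "demo.go", src, 0)
    if err != nil {
        panic(err)
    }
    ast.Inspect(f, func(n ast.Node) bool {
        switch x := n.(type) {
        case *ast.GoStmt:
            fmt.Println("goroutine: where are cancellation and error propagation described?")
        case *ast.RangeStmt:
            fmt.Println("range loop: if this ranges over a map, does iteration order matter?")
        case *ast.SelectorExpr:
            if id, ok := x.X.(*ast.Ident); ok &amp;amp;&amp;amp; id.Name == "time" &amp;amp;&amp;amp; x.Sel.Name == "Now" {
                fmt.Println("time.Now: is time part of the behavior, and how is it tested?")
            }
        }
        return true
    })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;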

&lt;h2&gt;What regulated industries got right&lt;/h2&gt;

&lt;p&gt;Consumer engineering and regulated engineering live in different worlds. Different tools. Different conferences. Different language. Some of it is archaic. Some of it is bureaucracy. I don’t want every SaaS team to become an avionics certification team.&lt;/p&gt;

&lt;p&gt;But we shouldn’t ignore what they learned.&lt;/p&gt;

&lt;p&gt;I expected to find paperwork. Annoyingly, I found a lot of things our world forgot to learn.&lt;/p&gt;

&lt;p&gt;In aviation, automotive, medical devices, space systems, the spec isn’t treated as a temporary note. It’s a source of truth that lives together with the software. Requirements have IDs. They have layers. They are linked to documentation, tests, implementation, verification evidence. You can see blast radius. You can see what a change affects. During review, if implementation differs from the spec, the spec must be updated.&lt;/p&gt;

&lt;p&gt;The useful idea is not the paperwork.&lt;br&gt;
The useful idea is that intent is durable, traceable, and connected to evidence.&lt;/p&gt;

&lt;p&gt;NASA’s FRET is one example of this direction. It lets users enter hierarchical system requirements in structured natural language, gives those requirements unambiguous semantics, and can show them as natural language, formal logic, diagrams, and interactive simulation.&lt;/p&gt;

&lt;p&gt;That doesn’t mean every product team needs FRET or formal methods everywhere. It means the requirement is not just a document. It’s something you can analyze, link, verify, and keep alive.&lt;/p&gt;

&lt;p&gt;This is where requirement management becomes interesting again for consumer engineering. Not the old heavy version copied blindly from regulated industries. Not paperwork for paperwork. But the useful part: a source of truth, cross-links, invalidation, traceability, and evidence.&lt;/p&gt;

&lt;p&gt;Combined with everything consumer engineering learned over the years: CI/CD, pull requests, fast feedback, developer experience, automated tests, observability, docs, release automation.&lt;/p&gt;

&lt;p&gt;We should not throw away modern engineering.&lt;br&gt;
We should add the missing trust layer.&lt;/p&gt;

&lt;h2&gt;From pull request to evidence pack&lt;/h2&gt;

&lt;p&gt;Today a pull request usually gives me code, maybe tests, maybe a description. But it doesn’t give me the whole chain.&lt;/p&gt;

&lt;p&gt;It doesn’t tell me the original intent. It doesn’t tell me which obligations apply. It doesn’t show the blast radius. It doesn’t show which docs changed or should have changed. It doesn’t show which specs this conflicts with. It doesn’t show what changed during implementation compared to the plan.&lt;/p&gt;

&lt;p&gt;So the reviewer has to reconstruct all of that.&lt;/p&gt;

&lt;p&gt;Again, archaeology.&lt;/p&gt;

&lt;p&gt;What I want instead is an evidence pack. Not enterprise theatre. Not documents for the sake of documents. A practical package that makes the change reviewable.&lt;/p&gt;

&lt;p&gt;Here is the intent. Here are the requirements. Here are the obligations. Here are the tests that witness them. Here are the docs. Here is the blast radius. Here is how it aligns with existing specs and where we checked for conflicts. Here is what changed during implementation. Here is what still needs human judgment.&lt;/p&gt;

&lt;p&gt;Then the pull request isn’t only code. It’s the full chain of development.&lt;/p&gt;
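
&lt;p&gt;As a sketch, this is roughly the shape I have in mind. The field names are invented for illustration, not a real format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import "fmt"

// EvidencePack is the chain a reviewer should receive with a change.
type EvidencePack struct {
    Intent        string   // why this change exists
    Requirements  []string // requirement IDs it implements
    Obligations   []string // obligation categories it must address
    WitnessTests  []string // tests that prove the requirements
    DocsChanged   []string // documentation updated with the change
    BlastRadius   []string // components and flows it can affect
    PlanDeviation string   // what changed during implementation vs. the plan
    OpenJudgments []string // what still needs a human decision
}

func main() {
    pack := EvidencePack{
        Intent:        "Reject malformed range headers instead of panicking",
        Requirements:  []string{"SYS-REQ-101"},
        Obligations:   []string{"malformed_input", "error_handling"},
        WitnessTests:  []string{"TestRangeHeaderMalformed"},
        OpenJudgments: []string{"Is 400 or 416 the right response?"},
    }
    fmt.Printf("%+v\n", pack)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;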

&lt;p&gt;This matters for open source. It matters for support engineers contributing fixes. It matters for other internal teams. It matters for AI agents. You don’t trust the contributor blindly. You don’t trust AI blindly. You trust the evidence chain, and then you still apply human judgment where judgment is needed.&lt;/p&gt;

&lt;p&gt;This will feel slower at first. Writing obligations is slower than writing a vague ticket. Linking tests to requirements is slower than writing random tests. Updating docs through the graph is slower than pushing a change and hoping someone remembers. But not all friction is bad.&lt;/p&gt;

&lt;p&gt;The question is whether the friction creates trust.&lt;/p&gt;

&lt;p&gt;Bureaucracy gives you friction without trust.&lt;br&gt;
Evidence gives you friction that lets more people move safely.&lt;/p&gt;

&lt;p&gt;If this trust exists, then AI can actually help us scale. Not by dumping more pull requests into the same review bottleneck, but by making more changes reviewable, traceable, and safe to absorb in parallel.&lt;/p&gt;

&lt;p&gt;Without trust, maintainers become managers of incoming things. Instead of thinking about architecture, future, and vision, they review an endless stream of pull requests, fixes, and generated artifacts.&lt;/p&gt;

&lt;p&gt;That is not the scaling I want.&lt;/p&gt;

&lt;h2&gt;Why I am building Proof&lt;/h2&gt;

&lt;p&gt;This is why I am building &lt;a href="https://reqproof.com" rel="noopener noreferrer"&gt;Proof&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I don’t want another tool whose main purpose is to create more code. We already have many of those. The problem isn’t that we can’t produce enough artifacts. The problem is that the artifacts don’t preserve intent.&lt;/p&gt;

&lt;p&gt;I want specs to stop being temporary. I want requirements to live with the software. I want obligations to force the boring questions before they become production bugs. I want code, tests, docs, and requirements to invalidate each other when they drift. I want a reviewer to see the evidence chain instead of rebuilding it from memory.&lt;/p&gt;

&lt;p&gt;AI will make engineering faster. That part is already happening. But faster without trust is not enough.&lt;/p&gt;

&lt;p&gt;For me, the real question is this: how do I get to the position where what arrives is not just a pull request from someone on the outside, but a well-thought-out evidence pack that makes me believe I can merge it as soon as possible?&lt;/p&gt;

&lt;p&gt;That is the scaling I care about.&lt;/p&gt;

&lt;p&gt;Not just more code.&lt;/p&gt;

&lt;p&gt;More trusted change.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Had Near 100% Test Coverage. It Didn't Matter.</title>
      <dc:creator>Leonid Bugaev</dc:creator>
      <pubDate>Wed, 29 Apr 2026 18:17:34 +0000</pubDate>
      <link>https://dev.to/leonidbugaev/i-had-near-100-test-coverage-it-didnt-matter-46oi</link>
      <guid>https://dev.to/leonidbugaev/i-had-near-100-test-coverage-it-didnt-matter-46oi</guid>
      <description>&lt;p&gt;&lt;em&gt;You cannot test for what you never described.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I woke up and saw a wall of emails in my personal account. Then logged into my corporate Slack, and it was filled with Zendesk messages from customers. Everyone was looking for me.&lt;/p&gt;

&lt;p&gt;The library I wrote, &lt;a href="https://github.com/buger/jsonparser" rel="noopener noreferrer"&gt;jsonparser&lt;/a&gt;, which is used by a lot of projects, got its very own public CVE. So everyone started freaking out, looking at their scanners.&lt;/p&gt;

&lt;p&gt;"That's what the fame is," was my first thought.&lt;/p&gt;

&lt;p&gt;Then I remembered the notifications I kept ignoring from the Google OSS-Fuzz project, which I signed up for multiple years ago.&lt;/p&gt;

&lt;p&gt;This lib was written in the pre-AI-agents era (so weird to say that now!). Every piece was handcrafted manually, using best practices, with full test coverage. I checked the function which had the issue, and it literally had near 100% test coverage. But it did not matter, because the issue was in the handling of malformed input data. One of the edge cases had been missed. In other words, the issue was in the specification of what this function should do and how it should behave in edge cases.&lt;/p&gt;

&lt;p&gt;But it opened one more can of worms. I wrote this library like 6 years ago. I don't remember anything. And my only source of truth is the code and the tests, which are rather cryptic and feel more like archaeology.&lt;/p&gt;

&lt;p&gt;The issue is fixed now. But how do I prevent such issues happening in the future? And if 100% code coverage is not the answer, what is? And what is my source of truth?&lt;/p&gt;

&lt;p&gt;So I started digging. And it went way deeper than I expected, and changed the way I look at software engineering forever.&lt;/p&gt;

&lt;h2&gt;Down the rabbit hole&lt;/h2&gt;

&lt;p&gt;I started thinking about what the gold standard of software quality is. My first answer was NASA. How does NASA solve these kinds of issues?&lt;/p&gt;

&lt;p&gt;AI now produces so much code that I feel like I am losing ownership of it. Not only of the code. Of the intent.&lt;/p&gt;

&lt;p&gt;I wanted to understand how people work when tests passing is still not enough and the price of being wrong is huge.&lt;/p&gt;

&lt;p&gt;The surprising thing is that a lot of NASA's work is public. Their software engineering requirements are public. FRET is public. Kind2 is public. A lot of the case studies are public. There are papers about aircraft, Mars rovers, superconducting magnets, and formal requirements that found bugs before code existed.&lt;/p&gt;

&lt;p&gt;I started reading all of this not as an academic exercise, but because I had a very dumb practical problem: my tests were green, my coverage looked fine, and still one missed edge case was enough to create a public CVE.&lt;/p&gt;

&lt;p&gt;Then I went deeper into automotive and aerospace. It opened a whole new world of software engineering for me. For some reason, our world of consumer software engineering and regulated software engineering in those industries almost do not intersect. Different tools, different conferences, different language. Sometimes it feels like they live in a parallel universe.&lt;/p&gt;

&lt;p&gt;Some of it looks archaic. Some methodologies are weird.&lt;/p&gt;

&lt;p&gt;Our engineering progressed a lot too. We got very good at moving fast and catching damage quickly. CI/CD, linters, tests, canaries, observability, rollbacks. I don't want to pretend every SaaS product should behave like avionics certification.&lt;/p&gt;

&lt;p&gt;But we optimized for speed. They optimized for evidence.&lt;/p&gt;

&lt;p&gt;Those industries spend much more time asking what evidence is needed before anyone is allowed to trust a change. Some of it is painful. Some of it is bureaucracy. But the idea underneath is not stupid: if you claim the system should behave in some way, you need a durable chain from that statement to tests, code, and evidence.&lt;/p&gt;

&lt;p&gt;There is real proof there, but it is not the fantasy version I had in my head, where every line of every product is mathematically proven end-to-end. They prove specifications. They use model checking. They simulate models, like with Simulink, against many input/output cases. They measure structural coverage. They use formal proof where the criticality justifies it.&lt;/p&gt;

&lt;p&gt;And they still use testing, code review, static analysis, and all the normal engineering work around it. The difference is that proof and evidence are attached to the parts where being wrong is not acceptable.&lt;/p&gt;

&lt;p&gt;That actually made the idea useful for normal engineering.&lt;/p&gt;

&lt;p&gt;This is a huge topic, which I will cover in future articles. But the first concrete thing I found was MC/DC. It is one of the ways safety-critical industries look at coverage, and it made standard line coverage look very weak to me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Line coverage says a line was touched at runtime. It does not say that the decision was tested.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Why 90% line coverage can still mean 60% real coverage&lt;/h2&gt;

&lt;p&gt;I still use line coverage. I still look at it.&lt;/p&gt;

&lt;p&gt;But line coverage is bullshit. You should not trust it. Not on its own.&lt;/p&gt;

&lt;p&gt;In Go, when you run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-cover&lt;/span&gt; ./...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;you mostly get statement coverage. The tool tells you whether a statement executed during the test run. That's useful. But it doesn't tell you whether the decision was tested.&lt;/p&gt;

&lt;p&gt;Take a tiny parser-style example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;isDigit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="sc"&gt;'0'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="sc"&gt;'9'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now test it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestIsDigit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;isDigit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'5'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"5 should be a digit"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isDigit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'x'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"x should not be a digit"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks fine. The line ran. The function returned true once. The function returned false once. Your coverage report can look perfect.&lt;/p&gt;

&lt;p&gt;But what did you actually prove?&lt;/p&gt;

&lt;p&gt;You tested &lt;code&gt;'5'&lt;/code&gt;. You tested &lt;code&gt;'x'&lt;/code&gt;. You didn't prove the lower boundary. You didn't prove that &lt;code&gt;'/'&lt;/code&gt; fails because it's before &lt;code&gt;'0'&lt;/code&gt;. You didn't prove that &lt;code&gt;':'&lt;/code&gt; fails because it's after &lt;code&gt;'9'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The line is covered. The boundary is not.&lt;/p&gt;

&lt;p&gt;MC/DC stands for Modified Condition/Decision Coverage. It asks the question line coverage does not ask: did each condition independently affect the outcome?&lt;/p&gt;

&lt;p&gt;When your code says &lt;code&gt;if a &amp;amp;&amp;amp; b&lt;/code&gt;, line coverage tells you the &lt;code&gt;if&lt;/code&gt; was hit. MC/DC asks whether &lt;code&gt;a&lt;/code&gt; alone can change the result, and whether &lt;code&gt;b&lt;/code&gt; alone can change the result.&lt;/p&gt;

&lt;p&gt;For this line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="sc"&gt;'0'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="sc"&gt;'9'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;there are two conditions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;c &amp;gt;= '0'
c &amp;lt;= '9'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A simplified MC/DC table looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;&lt;code&gt;c&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;c &amp;gt;= '0'&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;c &amp;lt;= '9'&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;What it proves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;lower edge&lt;/td&gt;
&lt;td&gt;&lt;code&gt;'0'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;lower edge accepted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;below lower edge&lt;/td&gt;
&lt;td&gt;&lt;code&gt;'/'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;false&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;false&lt;/td&gt;
&lt;td&gt;lower bound independently blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;above upper edge&lt;/td&gt;
&lt;td&gt;&lt;code&gt;':'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;false&lt;/td&gt;
&lt;td&gt;false&lt;/td&gt;
&lt;td&gt;upper bound independently blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nominal digit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;'5'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;normal digit works&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The table is just a way to say: these are the cases that matter. This is the part ordinary coverage does not force you to say.&lt;/p&gt;
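
&lt;p&gt;Written as a test, the table becomes something like this. This is my reading of the minimal witness set for &lt;code&gt;isDigit&lt;/code&gt;, not output from any tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;func TestIsDigitMCDC(t *testing.T) {
    cases := []struct {
        c    byte
        want bool
        why  string
    }{
        {'0', true, "lower edge accepted"},
        {'/', false, "lower bound independently blocks: '/' is just before '0'"},
        {':', false, "upper bound independently blocks: ':' is just after '9'"},
        {'5', true, "normal digit works"},
    }
    for _, tc := range cases {
        if got := isDigit(tc.c); got != tc.want {
            t.Errorf("isDigit(%q) = %v, want %v: %s", tc.c, got, tc.want, tc.why)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;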

&lt;p&gt;This used to be mostly a safety-critical tooling conversation. DO-178C requires MC/DC for the highest-criticality aviation software. The tooling was expensive, slow, and hard for normal teams to justify.&lt;/p&gt;

&lt;p&gt;That changed. GCC 14 has &lt;code&gt;-fcondition-coverage&lt;/code&gt;. Clang 18 has &lt;code&gt;-fcoverage-mcdc&lt;/code&gt;. Rust is moving in the same direction with richer branch and condition coverage work, even if I would not call Rust MC/DC stable yet. Go does not have native MC/DC support, so I ended up adding code-level Go MC/DC measurement to &lt;a href="https://reqproof.com" rel="noopener noreferrer"&gt;Proof&lt;/a&gt;, and we have been extending the same direction to JavaScript and TypeScript as well.&lt;/p&gt;

&lt;p&gt;What aerospace and automotive had because they were slow and diligent is now becoming available to normal engineering teams because AI changed the economics. You don't need a certification lab to ask a harder question about your tests. You also don't need to apply all of this to the whole company on day one. Start with the part where wrong behavior actually hurts.&lt;/p&gt;

&lt;h2&gt;The jsonparser numbers weren't subtle&lt;/h2&gt;

&lt;p&gt;After the CVE fix, I wanted to understand why my previous approach didn't make this kind of missing behavior obvious enough.&lt;/p&gt;

&lt;p&gt;So I applied the MC/DC and requirements approach to jsonparser in a later public PR: &lt;a href="https://github.com/buger/jsonparser/pull/281" rel="noopener noreferrer"&gt;buger/jsonparser#281&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Again: this PR didn't fix the original CVE. It was the follow-up work after the CVE fix. But it was not just a paperwork exercise. The hardening pass found and fixed more real issues and removed dead code that my previous process had not made obvious.&lt;/p&gt;

&lt;p&gt;That was the uncomfortable part for me. I started by asking: what did my tests actually prove?&lt;/p&gt;

&lt;p&gt;On the main branch before that work, ordinary Go statement coverage was already decent:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before / main&lt;/th&gt;
&lt;th&gt;After / PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard Go statement coverage&lt;/td&gt;
&lt;td&gt;85.3%&lt;/td&gt;
&lt;td&gt;99.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MC/DC decisions&lt;/td&gt;
&lt;td&gt;138/209 = 66.0%&lt;/td&gt;
&lt;td&gt;203/203 = 100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MC/DC conditions&lt;/td&gt;
&lt;td&gt;175/253 = 69.2%&lt;/td&gt;
&lt;td&gt;244/244 = 100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MC/DC gaps&lt;/td&gt;
&lt;td&gt;71 incomplete decisions, 78 missing condition proofs&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;85.3% coverage isn't bad. Most teams would see that and move on. But decision coverage told a different story: only 66% of decisions were fully covered, and only 69.2% of conditions were proven independently.&lt;/p&gt;

&lt;p&gt;And the more interesting part: some functions already looked perfect by ordinary coverage.&lt;/p&gt;

&lt;p&gt;Examples from the before state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parseInt                     100% statement coverage
Unescape                     100% statement coverage
decodeSingleUnicodeEscape    100% statement coverage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But MC/DC still found missing independent-condition evidence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bytes.go:21   parseInt missing proof for c &amp;lt; '0'
escape.go:148 Unescape missing proof for len(in) &amp;gt; 0
escape.go:47  decodeSingleUnicodeEscape missing proof for h1 == badHex
escape.go:47  decodeSingleUnicodeEscape missing proof for h2 == badHex
escape.go:47  decodeSingleUnicodeEscape missing proof for h3 == badHex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;100% line coverage can still leave a condition unproven.&lt;/p&gt;

&lt;p&gt;The code ran. The decision wasn't tested.&lt;/p&gt;

&lt;h2&gt;The bug was in what I forgot to describe&lt;/h2&gt;

&lt;p&gt;Coverage does not paint the whole picture. Even MC/DC. The bug can still be in the spec.&lt;/p&gt;

&lt;p&gt;That is what happened with jsonparser. It was a classical case: you are building something, moving forward, and not looking back. You don't know what you don't know. I did not think about what would happen if this edge case appeared. I think most of us do not think about it this way.&lt;/p&gt;

&lt;p&gt;I did not have any specs driving development or anything that forced me to think about the edge cases before writing the code. So of course I did not test for them. You cannot test for what you never described.&lt;/p&gt;

&lt;p&gt;Testing assumes the specification is correct. That is the NASA/formal-methods lesson that changed how I think about this. The hard part is not testing the implementation. The hard part is questioning the specification itself.&lt;/p&gt;

&lt;p&gt;This is where I found two different questions that I had been mashing together.&lt;/p&gt;

&lt;p&gt;The first question starts from my specification: if this is what I claim the system should do, which logical cases need to be witnessed?&lt;/p&gt;

&lt;p&gt;Not the code. The intent.&lt;/p&gt;

&lt;p&gt;NASA built an open-source tool called FRET (Formal Requirements Elicitation Tool) that lets you write requirements in structured English and translates them into formal logic.&lt;/p&gt;

&lt;p&gt;FRET includes an algorithm called FLIP (FuLl Independence Pair). FLIP takes a formalized requirement and generates the minimum set of test cases proving each boolean variable independently affects the outcome. Not every possible combination. Just the ones that matter.&lt;/p&gt;

&lt;p&gt;I still have to write the requirement. I still have to decide what malformed input, boundaries, errors, and edge cases mean. FLIP does not do that for me.&lt;/p&gt;

&lt;p&gt;But once the requirement is formalized, FLIP tells me exactly which test cases that requirement needs.&lt;/p&gt;
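
&lt;p&gt;To show the idea, not FRET’s actual algorithm: for each boolean input of a formalized requirement, find a pair of assignments that differ only in that input and flip the outcome. A brute-force sketch in Go, with an invented two-variable requirement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import "fmt"

// requirement is the formalized spec under test: here a AND b,
// standing in for something like "start in range AND end in range".
func requirement(v []bool) bool { return v[0] &amp;amp;&amp;amp; v[1] }

var names = []string{"a", "b"}

// independencePairs brute-forces, for each variable, a pair of
// assignments differing only in that variable whose outcomes differ.
// This is the core idea behind FLIP, not its implementation.
func independencePairs() {
    n := len(names)
    for i := 0; i &amp;lt; n; i++ {
        for mask := 0; mask &amp;lt; 1&amp;lt;&amp;lt;n; mask++ {
            v := make([]bool, n)
            for j := 0; j &amp;lt; n; j++ {
                v[j] = mask&amp;amp;(1&amp;lt;&amp;lt;j) != 0
            }
            w := append([]bool(nil), v...)
            w[i] = !w[i]
            if requirement(v) != requirement(w) {
                fmt.Printf("%s independently flips the outcome: %v vs %v\n", names[i], v, w)
                break // one witness pair per variable is enough
            }
        }
    }
}

func main() { independencePairs() }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;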

&lt;p&gt;I built a tool called &lt;a href="https://reqproof.com" rel="noopener noreferrer"&gt;Proof&lt;/a&gt; that implements this approach.&lt;/p&gt;

&lt;p&gt;That is the part I care about: how many tests are enough for this requirement?&lt;/p&gt;

&lt;p&gt;Not "how many tests did I happen to write?" Enough for what I described.&lt;/p&gt;

&lt;p&gt;The second question starts from my actual code: did my tests exercise every boolean condition in the implementation so each one independently affects the outcome?&lt;/p&gt;

&lt;p&gt;This side does not care what I meant. It looks at what I wrote.&lt;/p&gt;

&lt;p&gt;And sometimes it shows that my code has many more logical cases than my spec. So maybe my spec is not accurate enough.&lt;/p&gt;

&lt;p&gt;Or my spec says this edge case matters, but my tests don't witness it.&lt;/p&gt;

&lt;p&gt;Or my tests cover implementation details, but the behavior is under-described.&lt;/p&gt;

&lt;p&gt;I learned this the hard way on jsonparser. The spec side and the code side kept disagreeing in useful ways, and that is where code drift and spec drift become visible.&lt;/p&gt;

&lt;p&gt;The gap goes in both directions. Sometimes the code is wrong. Sometimes the tests are weak. Sometimes the spec is too vague.&lt;/p&gt;

&lt;p&gt;Sometimes all of it combined badly.&lt;/p&gt;

&lt;h2&gt;Checklists, not memory&lt;/h2&gt;

&lt;p&gt;What can be more deterministic than a checklist? In aerospace and automotive, everything has its own checklist. The price of a mistake is too high to rely on someone's memory. I think checklists are the driving force behind quality engineering in those industries.&lt;/p&gt;

&lt;p&gt;When you do not have specifications, it is very hard to create a checklist. When you are building a feature, you can have test cases, but that is a moving target. The items are constantly changing. You need something that will be the same all the time.&lt;/p&gt;

&lt;p&gt;You cannot rely on humans here. Even on me, to be frank. I can miss these items too. You need deterministic checklists.&lt;/p&gt;

&lt;p&gt;In practice, the questions are very simple:&lt;/p&gt;

&lt;p&gt;What will happen if this is malformed data? What will happen if this is slow and the request times out? What will happen if the database is down? What will happen if you have a very large object? What will happen if the function returns different values with the same inputs?&lt;/p&gt;

&lt;p&gt;These are the cases where security issues and data bugs tend to live. For jsonparser, these are the exact cases I had not thought about.&lt;/p&gt;
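
&lt;p&gt;Take one of those questions and turn it into a witness. A sketch with invented names and timings; the point is that the timeout behavior is first described (return &lt;code&gt;ctx.Err()&lt;/code&gt;, never hang) and then proven:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

// fetch stands in for a call to a slow third-party service.
// Described behavior: on timeout, return ctx.Err(), never hang.
func fetch(ctx context.Context) (string, error) {
    select {
    case &amp;lt;-time.After(500 * time.Millisecond): // the service is slow today
        return "ok", nil
    case &amp;lt;-ctx.Done():
        return "", ctx.Err()
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
    defer cancel()
    _, err := fetch(ctx)
    fmt.Println(errors.Is(err, context.DeadlineExceeded)) // prints true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;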

&lt;p&gt;Without obligations, edge cases depend on memory. Maybe I remember to test malformed data. Maybe the AI remembers. Maybe a reviewer notices. Maybe no one does.&lt;/p&gt;

&lt;p&gt;At the moment, it is just a matter of whether someone remembers to test it or not.&lt;/p&gt;

&lt;p&gt;This is where the CVE fix actually changed how I work. The fix itself was mechanical. But the obligations I wrote afterward forced me to think about the cases I had skipped. Every one of those became an explicit question I had to answer. Not "did someone remember to test this?" but "here is the list, and each item needs a witness."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Obligations turn edge cases from "someone remembered to test this" into a deterministic process.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I first started writing obligations for jsonparser, it was actually quite easy with modern AI tooling. I reviewed all of the specs. The flow is simple: the check does not pass until the checklist is green, until obligations are defined for all of those cases, and until test cases exist for all of those cases as well.&lt;/p&gt;

&lt;p&gt;This is what the double link looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// In the code — annotated with the requirement it implements:&lt;/span&gt;
&lt;span class="c"&gt;// SYS-REQ-863&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;lookupCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// In the test — annotated with both the requirement AND the specific MC/DC row:&lt;/span&gt;
&lt;span class="c"&gt;// Verifies: SYS-REQ-863&lt;/span&gt;
&lt;span class="c"&gt;// MCDC SYS-REQ-863: cache_lookup_requested=T, component_inputs_unchanged=F,&lt;/span&gt;
&lt;span class="c"&gt;//                    cached_component_result_reused=F =&amp;gt; TRUE&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestMCDC_SYS_REQ_863_Row1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;evalVerifyScenario&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"SYS-REQ-863"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"cache_lookup_requested"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;         &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"component_inputs_unchanged"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"cached_component_result_reused"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each test is not just "test the function." Each test is: "prove that this specific variable independently affects the outcome of this specific requirement."&lt;/p&gt;

&lt;p&gt;If I change the spec, I can see exactly which MC/DC rows are affected and which tests need to be reviewed. If I change a test, I can see which spec requirement it was proving and check whether the spec still says the same thing. If I add a new variable to the requirement, FLIP will generate new witness rows, and the missing tests become immediately visible.&lt;/p&gt;

&lt;p&gt;This is the double link. Change the spec, review the tests. Change the tests, review the spec. If you have not touched the spec, why would you touch the test?&lt;/p&gt;

&lt;p&gt;This is where the "how many tests are enough?" question changed for me. Before, the answer was always vibes. Write enough tests. Cover important paths. Don't overdo it. Be pragmatic.&lt;/p&gt;

&lt;p&gt;All true, and also not very helpful.&lt;/p&gt;

&lt;p&gt;Now I think about it differently. Enough tests means enough evidence that every condition I described, or every condition my code actually contains, can independently affect the behavior I care about.&lt;/p&gt;

&lt;p&gt;It is not about how many tests I have. It is about whether I really, really trust my system and whether it actually does what I described.&lt;/p&gt;

&lt;h2&gt;The true challenge is legacy&lt;/h2&gt;

&lt;p&gt;You can always start a new project and have a really nice experience with all of this. But the true challenge lies in the big legacy projects. They make up like 90% of all software. They bring the majority of the money. And they are the ones where wrong behavior actually hurts.&lt;/p&gt;

&lt;p&gt;I work with very complex software. At Tyk, we build API gateway software used by banks, governments, and other serious enterprise customers. I am a very sceptical person. I always want some proof. At the same time, I understand that software is always about compromises.&lt;/p&gt;

&lt;p&gt;But the game is changing. What was not possible in the past is now possible for small teams in terms of quality and processes. The wind is changing with AI.&lt;/p&gt;

&lt;p&gt;The true power happens when you can apply some of those approaches to legacy large enterprise codebases. If it works there, it will work everywhere.&lt;/p&gt;

&lt;p&gt;I know how challenging it is. You cannot do it in one go. You cannot just make a switch and start using a new process.&lt;/p&gt;

&lt;p&gt;This is not only about the technical part. It is also about the people part. Even at the size of Tyk, with like a hundred people, it is not about the implementation. It is about the processes and the people. The technical part is the easiest one.&lt;/p&gt;

&lt;p&gt;In order to convince people that you can actually make it, you need to be able to do it in parts. Start small, then scale.&lt;/p&gt;

&lt;p&gt;Can you take small parts, turn them into a repeatable process, and then start scaling? That is how it works in the majority of cases.&lt;/p&gt;

&lt;p&gt;So I picked the policy engine. Authorization and gateway policy decisions are obviously critical. If the policy engine behaves incorrectly, you are not talking about a cosmetic bug.&lt;/p&gt;

&lt;p&gt;I applied the same kind of thinking to the Tyk policy package in a public PR: &lt;a href="https://github.com/TykTechnologies/tyk/pull/7932" rel="noopener noreferrer"&gt;TykTechnologies/tyk#7932&lt;/a&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before / main&lt;/th&gt;
&lt;th&gt;After / PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard Go statement coverage&lt;/td&gt;
&lt;td&gt;81.0%&lt;/td&gt;
&lt;td&gt;99.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MC/DC decisions&lt;/td&gt;
&lt;td&gt;74/115 = 64.3%&lt;/td&gt;
&lt;td&gt;111/111 = 100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MC/DC conditions&lt;/td&gt;
&lt;td&gt;95/142 = 66.9%&lt;/td&gt;
&lt;td&gt;137/137 = 100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MC/DC gaps&lt;/td&gt;
&lt;td&gt;41 incomplete decisions, 47 missing condition proofs&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;81% ordinary coverage. 64.3% decision coverage. The normal coverage number says most statements ran. The MC/DC number says a lot of policy decisions still do not have independent evidence.&lt;/p&gt;

&lt;p&gt;For a policy engine, the second number is the one I care about.&lt;/p&gt;
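
&lt;p&gt;The gap between those two numbers is easy to reproduce. A hedged sketch, hypothetical rather than the actual Tyk code: two tests that execute every statement and both branches, so ordinary coverage reports 100%, while MC/DC reports zero of three conditions proven.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;package quota

import "testing"

func applyQuota(enabled bool, remaining int, unlimited bool) bool {
    if enabled &amp;&amp; (remaining &gt; 0 || unlimited) {
        return true
    }
    return false
}

// Statement and branch coverage: 100%. MC/DC: 0 of 3 conditions.
// The two vectors differ in more than one condition at once, so no
// condition is ever shown to flip the outcome by itself. Delete
// "|| unlimited", or even "enabled &amp;&amp;", and both tests still pass.
func TestApplyQuota(t *testing.T) {
    if !applyQuota(true, 5, false) {
        t.Error("expected allow")
    }
    if applyQuota(false, 0, false) {
        t.Error("expected deny")
    }
}
&lt;/code&gt;&lt;/pre&gt;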

&lt;h2&gt;
  
  
  Code coverage is not about a metric
&lt;/h2&gt;

&lt;p&gt;It is about trust.&lt;/p&gt;

&lt;p&gt;What do we trust? In classical software engineering, we say: here is the code and here are the tests, the tests are the source of truth. If you want to know how the system works, read the tests.&lt;/p&gt;

&lt;p&gt;I do not believe that anymore. Not with AI writing code. Not with AI writing tests. Not with AI validating its own assumptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The source of truth cannot just be tests anymore. AI can write those too.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A passing test can prove that the code agrees with the test. It cannot prove that both agree with my intent.&lt;/p&gt;

&lt;p&gt;So I moved the source of truth up. For me, it has to be the specification: the static description of what I expect the system to do.&lt;/p&gt;

&lt;p&gt;Then code implements it. Tests witness it. Coverage measures evidence around it. Traceability keeps the chain from silently rotting.&lt;/p&gt;

&lt;p&gt;I started this whole journey because of one CVE in a library I wrote six years ago. I ended up in a completely different place.&lt;/p&gt;

&lt;p&gt;I thought the problem was in the code. It was in what I forgot to describe.&lt;/p&gt;

&lt;p&gt;I thought coverage was the answer. It was the wrong question.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://blog.reqproof.com/p/ai-writes-your-code-nobody-verifies" rel="noopener noreferrer"&gt;first article&lt;/a&gt; was about losing intent. This one is about binding intent back to code.&lt;/p&gt;

&lt;p&gt;Originally published on Substack: &lt;a href="https://blog.reqproof.com/p/i-had-near-100-test-coverage-it-didnt" rel="noopener noreferrer"&gt;https://blog.reqproof.com/p/i-had-near-100-test-coverage-it-didnt&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AI Made Implementation Faster. Verification Is Still the Bottleneck</title>
      <dc:creator>Leonid Bugaev</dc:creator>
      <pubDate>Thu, 23 Apr 2026 15:35:12 +0000</pubDate>
      <link>https://dev.to/leonidbugaev/ai-made-implementation-faster-verification-is-still-the-bottleneck-2o89</link>
      <guid>https://dev.to/leonidbugaev/ai-made-implementation-faster-verification-is-still-the-bottleneck-2o89</guid>
      <description>&lt;p&gt;AI made implementation dramatically faster.&lt;/p&gt;

&lt;p&gt;Trust did not.&lt;/p&gt;

&lt;p&gt;I live in two different worlds now.&lt;/p&gt;

&lt;p&gt;In one, I build my own projects with AI and ship more software than ever. I have written more software in the last two years than across the rest of my career, and I have barely written any code manually in the last year.&lt;/p&gt;

&lt;p&gt;In the other, I lead engineering for software used by banks, governments, and other regulated environments, where mistakes are expensive and confidence matters more than speed.&lt;/p&gt;

&lt;p&gt;In both worlds, I keep hitting the same wall:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation got dramatically faster. Trust did not.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the part I think the industry still keeps smoothing over.&lt;/p&gt;

&lt;h2&gt;
  
  
  Faster code generation is not faster engineering
&lt;/h2&gt;

&lt;p&gt;The current AI coding conversation often assumes that if code generation speeds up, engineering speeds up too.&lt;/p&gt;

&lt;p&gt;That is not what I see.&lt;/p&gt;

&lt;p&gt;On my own projects, I can build much faster than before. AI helps me move quickly, clean things up, write tests, refactor, and push ideas further in less time.&lt;/p&gt;

&lt;p&gt;But it also asks me to trust more.&lt;/p&gt;

&lt;p&gt;I am not just delegating typing.&lt;/p&gt;

&lt;p&gt;I am delegating thinking, validation, and judgment too.&lt;/p&gt;

&lt;p&gt;And I am still not sure where the safe line is.&lt;/p&gt;

&lt;p&gt;In enterprise software, the picture is different but the problem is the same.&lt;/p&gt;

&lt;p&gt;AI absolutely helped us in some areas. It reduced noise. It reduced interruption-based work. It helped other teams answer questions about system behavior without constantly pulling senior engineers into ad hoc investigations.&lt;/p&gt;

&lt;p&gt;That mattered.&lt;/p&gt;

&lt;p&gt;People were less interrupted. Context switching got better. Engineers were happier.&lt;/p&gt;

&lt;p&gt;But it did not suddenly make us ship features 2x faster.&lt;/p&gt;

&lt;p&gt;Not even close.&lt;/p&gt;

&lt;p&gt;Because implementation was never the whole job.&lt;/p&gt;

&lt;p&gt;Verification is the bigger slice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The verification gap
&lt;/h2&gt;

&lt;p&gt;The phrase I keep coming back to is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;verification gap&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By that I mean the distance between what I intend the software to do and what I can actually prove about its behavior.&lt;/p&gt;

&lt;p&gt;Between intended behavior and demonstrated behavior.&lt;/p&gt;

&lt;p&gt;That gap always existed.&lt;/p&gt;

&lt;p&gt;AI did not invent it.&lt;/p&gt;

&lt;p&gt;It amplified it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI makes this problem worse
&lt;/h2&gt;

&lt;p&gt;When humans wrote the code, the same brain often held the intent, the implementation, and the validation loop together.&lt;/p&gt;

&lt;p&gt;Not perfectly. People still shipped bugs. Specs were incomplete. Tests missed things.&lt;/p&gt;

&lt;p&gt;But there was at least one place where the system could be understood as a whole: the person writing it.&lt;/p&gt;

&lt;p&gt;That is no longer the default.&lt;/p&gt;

&lt;p&gt;Now the human writes the prompt.&lt;/p&gt;

&lt;p&gt;The model writes the code.&lt;/p&gt;

&lt;p&gt;The model writes the tests.&lt;/p&gt;

&lt;p&gt;The human skims the diff.&lt;/p&gt;

&lt;p&gt;The model writes the cleanup.&lt;/p&gt;

&lt;p&gt;The CI passes.&lt;/p&gt;

&lt;p&gt;The feature ships.&lt;/p&gt;

&lt;p&gt;And if the original intent was slightly wrong, incomplete, or misunderstood, that mistake does not stay in one place anymore.&lt;/p&gt;

&lt;p&gt;It gets propagated through the whole stack.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The plan is based on the wrong assumption.&lt;/li&gt;
&lt;li&gt;The implementation is based on the wrong assumption.&lt;/li&gt;
&lt;li&gt;The tests are based on the wrong assumption.&lt;/li&gt;
&lt;li&gt;The documentation often reflects the same wrong assumption.&lt;/li&gt;
&lt;li&gt;The "manual validation" is often the same model being asked to sanity-check itself.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, what exactly are we proving?&lt;/p&gt;

&lt;p&gt;Often just that the system is internally consistent with the assumption it invented for itself.&lt;/p&gt;

&lt;p&gt;Not that it matches our intent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug-free is not the same as intent-correct
&lt;/h2&gt;

&lt;p&gt;This is why I think a lot of AI productivity discourse still misses the real problem.&lt;/p&gt;

&lt;p&gt;People say: just write better tests.&lt;/p&gt;

&lt;p&gt;I do write tests.&lt;/p&gt;

&lt;p&gt;AI writes tests for me too.&lt;/p&gt;

&lt;p&gt;That is not the point.&lt;/p&gt;

&lt;p&gt;Tests verify behavior for cases somebody thought of.&lt;/p&gt;

&lt;p&gt;That somebody used to be a human.&lt;/p&gt;

&lt;p&gt;Now it is often a human plus a model.&lt;/p&gt;

&lt;p&gt;That is still not the same thing as verifying intent.&lt;/p&gt;

&lt;p&gt;You can have 100% line coverage and still miss the thing that matters.&lt;/p&gt;

&lt;p&gt;You can have a green CI run and still not know whether the software behaves the way you intended.&lt;/p&gt;

&lt;p&gt;A green pipeline can still be a polished misunderstanding.&lt;/p&gt;

&lt;p&gt;Bug-free is not the same as intent-correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  Software is not flat. It is layers.
&lt;/h2&gt;

&lt;p&gt;This gets worse as software gets bigger.&lt;/p&gt;

&lt;p&gt;Software is not flat.&lt;/p&gt;

&lt;p&gt;It is layers.&lt;/p&gt;

&lt;p&gt;It is wide, deep, and full of interacting components, hidden assumptions, old decisions nobody remembers, backwards compatibility constraints, and behavior that only makes sense if you know four other subsystems.&lt;/p&gt;

&lt;p&gt;Any project that lives long enough eventually reaches a point where one brain is no longer enough.&lt;/p&gt;

&lt;p&gt;That was true before AI.&lt;/p&gt;

&lt;p&gt;It is still true now.&lt;/p&gt;

&lt;p&gt;AI does not remove that limit.&lt;/p&gt;

&lt;p&gt;In some cases it makes you hit it faster, because you can generate change faster than you can understand its consequences.&lt;/p&gt;

&lt;p&gt;A lot of our engineering process exists because of this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD&lt;/li&gt;
&lt;li&gt;QA&lt;/li&gt;
&lt;li&gt;RFCs&lt;/li&gt;
&lt;li&gt;architecture reviews&lt;/li&gt;
&lt;li&gt;team boundaries&lt;/li&gt;
&lt;li&gt;approval workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not random rituals.&lt;/p&gt;

&lt;p&gt;They are patches over the same underlying problem: software complexity grows beyond what one brain can safely manage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where does intent actually live?
&lt;/h2&gt;

&lt;p&gt;I think mainstream software engineering is still missing something fundamental.&lt;/p&gt;

&lt;p&gt;We do not maintain a real source of truth for intent.&lt;/p&gt;

&lt;p&gt;If I ask where the intended behavior of a system lives right now, the honest answer in most teams is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;all of it combined badly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some of it is in source code.&lt;/p&gt;

&lt;p&gt;Some of it is in tests.&lt;/p&gt;

&lt;p&gt;Some of it is in RFCs.&lt;/p&gt;

&lt;p&gt;Some of it is in Jira tickets.&lt;/p&gt;

&lt;p&gt;Some of it is in Confluence.&lt;/p&gt;

&lt;p&gt;Some of it is in the heads of senior engineers.&lt;/p&gt;

&lt;p&gt;None of those is the place where I can go and see, clearly, how the system is supposed to behave right now.&lt;/p&gt;

&lt;p&gt;That is not a source of truth.&lt;/p&gt;

&lt;p&gt;That is archaeology.&lt;/p&gt;

&lt;p&gt;And that feels like a major difference between mainstream software and more regulated domains like aerospace or automotive, where intended behavior is at least treated as a first-class artifact.&lt;/p&gt;

&lt;p&gt;In mainstream software, especially in large, complex systems, we mostly reconstruct intent after the fact from scattered artifacts.&lt;/p&gt;

&lt;p&gt;And then we act surprised when regressions keep happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  So what is the actual bottleneck now?
&lt;/h2&gt;

&lt;p&gt;If a feature can be implemented in hours instead of weeks, why have so many teams not seen the full payoff?&lt;/p&gt;

&lt;p&gt;Because implementation was never the only bottleneck.&lt;/p&gt;

&lt;p&gt;The harder part is deciding what should be built, making that intent explicit enough, and then verifying that the resulting system still matches it after the code, tests, and surrounding context have all changed.&lt;/p&gt;

&lt;p&gt;That is where the time goes.&lt;/p&gt;

&lt;p&gt;That is why I think AI did not remove the hard part of engineering.&lt;/p&gt;

&lt;p&gt;It moved it from writing to verification.&lt;/p&gt;

&lt;p&gt;If you want the next essays on this topic, subscribe on Substack: &lt;a href="https://blog.reqproof.com/p/ai-writes-your-code-nobody-verifies" rel="noopener noreferrer"&gt;https://blog.reqproof.com/p/ai-writes-your-code-nobody-verifies&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
