<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ross</title>
    <description>The latest articles on DEV Community by Ross (@rbuckley_).</description>
    <link>https://dev.to/rbuckley_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3972517%2F35eec4be-14cf-4416-9f4b-5ca626efc709.png</url>
      <title>DEV Community: Ross</title>
      <link>https://dev.to/rbuckley_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rbuckley_"/>
    <language>en</language>
    <item>
      <title>I Got GitLab and Airbyte Running Locally, and Realised READMEs Aren’t Enough</title>
      <dc:creator>Ross</dc:creator>
      <pubDate>Sun, 14 Jun 2026 20:52:53 +0000</pubDate>
      <link>https://dev.to/rbuckley_/i-got-gitlab-and-airbyte-running-locally-and-realised-readmes-arent-enough-ip7</link>
      <guid>https://dev.to/rbuckley_/i-got-gitlab-and-airbyte-running-locally-and-realised-readmes-arent-enough-ip7</guid>
      <description>&lt;p&gt;I Got GitLab and Airbyte Running Locally in Under 30 Minutes, then Built BootProof to Prove It&lt;/p&gt;

&lt;p&gt;Every developer knows the weird optimism of cloning a new repo.&lt;/p&gt;

&lt;p&gt;You find something useful on GitHub. The project looks active. The README looks clear enough. There is a neat little “Getting Started” section, and for a few seconds you believe this is going to be one of those rare, beautiful moments where the commands just work.&lt;/p&gt;

&lt;p&gt;git clone ...&lt;br&gt;
npm install&lt;br&gt;
npm run dev&lt;/p&gt;

&lt;p&gt;Then the real project reveals itself.&lt;/p&gt;

&lt;p&gt;Your Node version is wrong. Or your pnpm version is wrong. Or Docker is running, but not in the way the project expects. Postgres exists, but the role does not. Redis is missing. A migration fails halfway through. The app starts, but the browser shows nothing. A container is technically “up”, but the service is not healthy. A command exits cleanly, but there is no proof that the application actually booted.&lt;/p&gt;

&lt;p&gt;This is the point where the README stops being a guide and becomes more like an archaeological clue.&lt;/p&gt;

&lt;p&gt;I hit this problem again and again while trying to run larger open-source projects locally. GitLab. Airbyte. Real projects with real complexity. These are not badly made repos. They are serious pieces of software. But serious software accumulates assumptions: local services, exact versions, database state, orchestration tools, environment variables, ports, health checks, hidden setup steps, and maintainer knowledge that is obvious only after you already know it.&lt;/p&gt;

&lt;p&gt;What bothered me was not just that things failed. Developers can handle failure. What bothered me was how unclear the truth was.&lt;/p&gt;

&lt;p&gt;Did the app fail because my machine was wrong? Because the docs were out of date? Because I used the wrong command? Because a service was missing? Because a dependency was skipped? Because a process started but the app never actually became usable?&lt;/p&gt;

&lt;p&gt;And the worst version of that problem is when a tool, or an AI agent, confidently tells you it worked.&lt;/p&gt;

&lt;p&gt;Because “a command ran” is not the same as “the app booted”.&lt;/p&gt;

&lt;p&gt;That gap is what made me build BootProof.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/bootproof/bootproof" rel="noopener noreferrer"&gt;https://github.com/bootproof/bootproof&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;BootProof is a CLI for one painfully simple question:&lt;/p&gt;

&lt;p&gt;Can this repo be proven to boot?&lt;/p&gt;

&lt;p&gt;Not guessed. Not assumed. Not “probably running”. Proven.&lt;/p&gt;

&lt;p&gt;The idea came out of the GitLab and Airbyte tests because they exposed the real problem so clearly. Getting a big repo running locally is not just about finding the right command. It is about building a chain of evidence. What does the repo declare? What does the environment actually have? What did we run? What failed? What responded? What can we safely say happened?&lt;/p&gt;

&lt;p&gt;With Airbyte, for example, the answer was not a simple package install and dev command. It involved a more specific local orchestration path with abctl, Kind, Helm, Temporal, containers, and actual service health. With GitLab, the local setup exposed a different kind of complexity: system dependencies, database expectations, build steps, service assumptions, and all the little pieces of hidden knowledge that usually live in someone’s head.&lt;/p&gt;

&lt;p&gt;Those experiences made one thing obvious to me: a fake green check is worse than a failure.&lt;/p&gt;

&lt;p&gt;A failure tells you there is still work to do. A fake success wastes your time, corrupts your confidence, and sends you in the wrong direction. That is even more dangerous now that AI agents are starting to clone repos, install dependencies, run apps, make changes, and tell us things are done.&lt;/p&gt;

&lt;p&gt;Done according to what?&lt;/p&gt;

&lt;p&gt;Did the app actually respond? Did it pass a health check? Did the tool skip the install? Did it invent a missing .env value? Did it silently assume a port? Did it mistake a long-running process for a working application?&lt;/p&gt;

&lt;p&gt;BootProof is my attempt to put a hard boundary around that moment.&lt;/p&gt;

&lt;p&gt;The rule is simple:&lt;/p&gt;

&lt;p&gt;No proof, no green check.&lt;/p&gt;

&lt;p&gt;BootProof inspects a repository, builds a run plan from the evidence it can see, runs only what it can justify, checks for real HTTP health, and writes an attestation of what actually happened.&lt;/p&gt;

&lt;p&gt;A good result should not just be:&lt;/p&gt;

&lt;p&gt;command completed&lt;/p&gt;

&lt;p&gt;It should be something closer to:&lt;/p&gt;

&lt;p&gt;BOOTED&lt;br&gt;
Observed HTTP 200&lt;br&gt;
Evidence written to .bootproof/attestation.json&lt;/p&gt;

&lt;p&gt;And when BootProof cannot prove the repo booted, it should say so clearly:&lt;/p&gt;

&lt;p&gt;NOT VERIFIED - package_manager_version_mismatch&lt;/p&gt;

&lt;p&gt;or:&lt;/p&gt;

&lt;p&gt;NOT VERIFIED - remote_code_execution_blocked&lt;/p&gt;

&lt;p&gt;or:&lt;/p&gt;

&lt;p&gt;NOT VERIFIED - orchestration_not_supported&lt;/p&gt;

&lt;p&gt;That might sound less exciting than a tool that claims to run everything, but I think it is much more useful. BootProof is designed to be useful even when it refuses. Especially when it refuses. A clear refusal with evidence is a better developer experience than another vague terminal failure or a confident lie.&lt;/p&gt;

&lt;p&gt;That is the part I think a lot of developers will recognise.&lt;/p&gt;

&lt;p&gt;The pain is not just that repos fail to run. The pain is that the failure has no shape. You are left staring at logs trying to work out whether you are one command away from success or three hours deep in the wrong setup path. You do not know whether to fix your machine, change the command, read more docs, open an issue, or give up.&lt;/p&gt;

&lt;p&gt;BootProof tries to make that failure legible.&lt;/p&gt;
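&lt;p&gt;To make the difference between "a command ran" and "the app booted" concrete, here is a minimal Python sketch of that rule. It is not BootProof's implementation. The function names and the reason codes command_failed, no_http_response and http_503 are hypothetical, chosen only for illustration.&lt;/p&gt;

```python
import urllib.error
import urllib.request


def observe_http_status(url, timeout=2.0):
    """Return the HTTP status the app answered with, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: no evidence of a running app.
        return None


def verdict(command_ok, http_status):
    """Turn observations into a verdict. A clean command alone is never proof."""
    if not command_ok:
        return "NOT VERIFIED - command_failed"  # hypothetical reason code
    if http_status is None:
        return "NOT VERIFIED - no_http_response"  # hypothetical reason code
    if http_status not in range(200, 300):
        return f"NOT VERIFIED - http_{http_status}"  # hypothetical reason code
    return "BOOTED"


print(verdict(command_ok=True, http_status=None))  # NOT VERIFIED - no_http_response
print(verdict(command_ok=True, http_status=200))   # BOOTED
```

&lt;p&gt;In this sketch a green result is only reachable through an observed 2xx response, so a process that starts and then serves nothing cannot be reported as booted.&lt;/p&gt;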

&lt;p&gt;It is not magic. It is not finished. It is not claiming to run every repo on GitHub. It is not a replacement for good documentation, Docker, CI, or maintainers who know their projects inside out. I see it more as a truth layer for repo onboarding.&lt;/p&gt;

&lt;p&gt;The workflow I want is:&lt;/p&gt;

&lt;p&gt;inspect&lt;br&gt;
plan&lt;br&gt;
run&lt;br&gt;
observe&lt;br&gt;
attest&lt;/p&gt;

&lt;p&gt;If it boots, prove it. If it does not, explain why.&lt;/p&gt;
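&lt;p&gt;One way to picture that workflow is as an evidence record that only accepts stages in order and only turns green when the last observation backs it up. The Python sketch below is an assumption-heavy illustration: the Attestation class, its fields and the detail dictionaries are hypothetical and do not describe the real schema of .bootproof/attestation.json.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass, field

# Stages that must each leave evidence before anything can be attested.
STAGES = ("inspect", "plan", "run", "observe")


@dataclass
class Attestation:
    """Hypothetical evidence record; not BootProof's real attestation schema."""

    repo: str
    evidence: list = field(default_factory=list)
    verdict: str = "NOT VERIFIED"

    def record(self, stage, detail):
        # A stage may only be recorded once every earlier stage has been.
        if len(self.evidence) == len(STAGES):
            raise ValueError("all stages already recorded")
        expected = STAGES[len(self.evidence)]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        self.evidence.append({"stage": stage, "detail": detail})

    def attest(self):
        """Finalize the record. BOOTED requires evidence from every stage
        and a 2xx status seen during the observe stage."""
        complete = len(self.evidence) == len(STAGES)
        status = self.evidence[-1]["detail"].get("http_status") if complete else None
        if complete and status in range(200, 300):
            self.verdict = "BOOTED"
        return json.dumps(asdict(self), indent=2)


att = Attestation(repo=".")
att.record("inspect", {"found": ["package.json"]})
att.record("plan", {"commands": ["npm install", "npm run dev"]})
att.record("run", {"started": True})
att.record("observe", {"http_status": 200})
print(att.attest())
```

&lt;p&gt;The default verdict is NOT VERIFIED, so skipping a stage or stopping early can never produce a green result by accident.&lt;/p&gt;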

&lt;p&gt;You can run it against a local repo like this:&lt;/p&gt;

&lt;p&gt;bootproof up .&lt;/p&gt;

&lt;p&gt;For CI-style output:&lt;/p&gt;

&lt;p&gt;bootproof up . --ci --json&lt;/p&gt;

&lt;p&gt;For local execution with explicit consent:&lt;/p&gt;

&lt;p&gt;bootproof up . --provider local --unsafe-local --install&lt;/p&gt;

&lt;p&gt;The output is intended to be useful to humans, but also structured enough for machines. That matters because I think AI coding agents need something like this. If an agent says “the repo runs”, I do not want vibes. I want a receipt.&lt;/p&gt;

&lt;p&gt;I want to know what it inferred, what it ran, what it refused, what responded, and where the evidence is.&lt;/p&gt;

&lt;p&gt;That is the bigger idea behind BootProof. It is not just a local run helper. It is a way of making repo bootstrapping auditable. Not in a heavy enterprise way. In a practical developer way. The kind of thing you wish you had when you are 40 minutes into a broken setup and can no longer tell whether you are making progress or just generating new errors.&lt;/p&gt;

&lt;p&gt;I am posting this because I want BootProof tested on real repos, not just clean examples.&lt;/p&gt;

&lt;p&gt;Try it on the repo that annoyed you last month. Try it on the project you starred but never managed to run. Try it on the monorepo where the README is nearly right but not quite. Try it on the app that starts a process but never opens in the browser. Try it on the thing you gave up on because you could not tell what was missing.&lt;/p&gt;

&lt;p&gt;The repo is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bootproof/bootproof" rel="noopener noreferrer"&gt;https://github.com/bootproof/bootproof&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clone it:&lt;/p&gt;

&lt;p&gt;git clone &lt;a href="https://github.com/bootproof/bootproof.git" rel="noopener noreferrer"&gt;https://github.com/bootproof/bootproof.git&lt;/a&gt;&lt;br&gt;
cd bootproof&lt;br&gt;
npm install&lt;br&gt;
npm run build&lt;br&gt;
npm test&lt;/p&gt;

&lt;p&gt;Then point it at something painful:&lt;/p&gt;

&lt;p&gt;bootproof up /path/to/that/repo&lt;/p&gt;

&lt;p&gt;If it boots, I want to know.&lt;/p&gt;

&lt;p&gt;If it refuses correctly, I want to know.&lt;/p&gt;

&lt;p&gt;If it gets confused, I really want to know, because that is where the next detector should be built.&lt;/p&gt;

&lt;p&gt;Drop a repo in the comments that has been painful to run locally, and I will try to run it through BootProof.&lt;/p&gt;

&lt;p&gt;Because repo onboarding should not depend on hope, terminal archaeology, or fake green checks.&lt;/p&gt;

&lt;p&gt;It should have proof.&lt;/p&gt;

&lt;p&gt;Proof, not vibes.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>productivity</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>AI Agents Don’t Need More Prompts. They Need Execution Boundaries.</title>
      <dc:creator>Ross</dc:creator>
      <pubDate>Sun, 07 Jun 2026 12:51:34 +0000</pubDate>
      <link>https://dev.to/rbuckley_/ai-agents-dont-need-more-prompts-they-need-execution-boundaries-5acg</link>
      <guid>https://dev.to/rbuckley_/ai-agents-dont-need-more-prompts-they-need-execution-boundaries-5acg</guid>
      <description>&lt;p&gt;AI agents are moving from chat into action.&lt;/p&gt;

&lt;p&gt;They can call tools, send emails, update records, delete data, trigger workflows, deploy code, issue refunds, change IAM permissions, and interact with MCP servers.&lt;/p&gt;

&lt;p&gt;That shift is powerful.&lt;/p&gt;

&lt;p&gt;It is also where things start to get dangerous.&lt;/p&gt;

&lt;p&gt;Most AI safety conversations still focus on the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can we make the model follow instructions?&lt;/li&gt;
&lt;li&gt;Can we stop prompt injection?&lt;/li&gt;
&lt;li&gt;Can we make the agent reason better?&lt;/li&gt;
&lt;li&gt;Can we stop it hallucinating?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those questions matter.&lt;/p&gt;

&lt;p&gt;But they miss the moment that matters most:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What happens when the agent is about to actually do something?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because at that point, the prompt is no longer the control surface.&lt;/p&gt;

&lt;p&gt;The execution boundary is.&lt;/p&gt;

&lt;h2&gt;The problem: the agent can be wrong, but the side effect still happens&lt;/h2&gt;

&lt;p&gt;Imagine an agent connected to a refund tool.&lt;/p&gt;

&lt;p&gt;The user asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Refund order ord-123 for £25.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent correctly calls:&lt;/p&gt;

&lt;p&gt;python issue_refund(order_id="ord-123", amount_cents=2500) &lt;/p&gt;

&lt;p&gt;Fine.&lt;/p&gt;

&lt;p&gt;But now imagine the agent is prompt-injected, confused, compromised, or just wrong.&lt;/p&gt;

&lt;p&gt;It calls:&lt;/p&gt;

&lt;p&gt;python issue_refund(order_id="ord-456", amount_cents=250000) &lt;/p&gt;

&lt;p&gt;Or it repeats the same refund twice.&lt;/p&gt;

&lt;p&gt;Or it uses a proof meant for one customer against another customer.&lt;/p&gt;

&lt;p&gt;Or it calls a more dangerous tool than the one the user actually authorised.&lt;/p&gt;

&lt;p&gt;At that point, another system has to decide:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is this exact action allowed to happen?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not “does the model seem trustworthy?”&lt;/p&gt;

&lt;p&gt;Not “did the prompt say to be careful?”&lt;/p&gt;

&lt;p&gt;Not “does this look roughly similar to the original request?”&lt;/p&gt;

&lt;p&gt;The question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is there valid proof for this exact action, with these exact parameters, for this exact service, right now?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If not, the side effect should not execute.&lt;/p&gt;

&lt;h2&gt;The idea: no valid proof, no execution&lt;/h2&gt;

&lt;p&gt;I’ve been working on an open-source project called Actenon Kernel.&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI agents can propose actions. Protected systems decide whether those actions are allowed to execute.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Actenon is not a prompt filter.&lt;/p&gt;

&lt;p&gt;It is not an output moderator.&lt;/p&gt;

&lt;p&gt;It does not try to make the model truthful.&lt;/p&gt;

&lt;p&gt;It sits at the execution boundary and refuses consequential actions unless the caller presents a cryptographic proof bound to the exact action being attempted.&lt;/p&gt;

&lt;p&gt;That proof can bind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the action name&lt;/li&gt;
&lt;li&gt;the capability&lt;/li&gt;
&lt;li&gt;the exact parameters&lt;/li&gt;
&lt;li&gt;the target resource&lt;/li&gt;
&lt;li&gt;the intended audience/service&lt;/li&gt;
&lt;li&gt;expiry time&lt;/li&gt;
&lt;li&gt;replay protection&lt;/li&gt;
&lt;li&gt;policy or approval evidence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the proof is missing, expired, replayed, audience-mismatched, malformed, or bound to different parameters, the action is refused before the side effect runs.&lt;/p&gt;
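&lt;p&gt;As a rough illustration of what binding a proof to an exact action can mean, here is a minimal Python sketch using an HMAC over the action name, parameters, audience, expiry and a single-use nonce. This is not Actenon's implementation and makes no claim about its actual cryptography. The shared demo key and the function names are assumptions; INTENT_MISMATCH and DUPLICATE_REPLAY echo the demo output in this post, and the other reason codes are hypothetical.&lt;/p&gt;

```python
import hashlib
import hmac
import json
import secrets
import time

SECRET = b"demo-only-key"  # assumption: a shared demo key, not real key management
REQUIRED_FIELDS = {"audience", "expires_at", "nonce", "mac"}
_seen_nonces = set()  # in-memory replay cache, for the sketch only


def _digest(action, params, audience, expires_at, nonce):
    # Canonical JSON so the same action always signs to the same bytes.
    payload = json.dumps(
        {"action": action, "params": params, "audience": audience,
         "expires_at": expires_at, "nonce": nonce},
        sort_keys=True, separators=(",", ":"),
    ).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def mint_proof(action, params, audience, ttl_seconds=60):
    expires_at = int(time.time()) + ttl_seconds
    nonce = secrets.token_hex(16)
    return {"audience": audience, "expires_at": expires_at, "nonce": nonce,
            "mac": _digest(action, params, audience, expires_at, nonce)}


def check_proof(proof, action, params, audience):
    """Return "ALLOW" or a refusal reason for this exact call."""
    if proof is None:
        return "PROOF_REQUIRED"  # hypothetical reason code
    if not isinstance(proof, dict) or not REQUIRED_FIELDS.issubset(proof):
        return "MALFORMED"  # hypothetical reason code
    if proof["audience"] != audience:
        return "AUDIENCE_MISMATCH"  # hypothetical reason code
    if time.time() > proof["expires_at"]:
        return "EXPIRED"  # hypothetical reason code
    expected = _digest(action, params, proof["audience"],
                       proof["expires_at"], proof["nonce"])
    if not hmac.compare_digest(expected, proof["mac"]):
        return "INTENT_MISMATCH"
    if proof["nonce"] in _seen_nonces:
        return "DUPLICATE_REPLAY"
    _seen_nonces.add(proof["nonce"])
    return "ALLOW"


params = {"order_id": "ord-123", "amount_cents": 2500}
proof = mint_proof("refund.issue", params, "service:refunds")
wrong = {"order_id": "ord-456", "amount_cents": 250000}
print(check_proof(proof, "refund.issue", wrong, "service:refunds"))   # INTENT_MISMATCH
print(check_proof(proof, "refund.issue", params, "service:refunds"))  # ALLOW
print(check_proof(proof, "refund.issue", params, "service:refunds"))  # DUPLICATE_REPLAY
```

&lt;p&gt;Because the parameters are part of the signed payload, changing the order or the amount invalidates the proof without any rule about which amounts are reasonable.&lt;/p&gt;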

&lt;h2&gt;A tiny example&lt;/h2&gt;

&lt;p&gt;The mental model looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from actenon import ActenonGate

gate = ActenonGate.local_dev(audience="service:refunds")

action = gate.build_action(
    "refund.issue",
    "payment.refund",
    {"order_id": "ord-123", "amount_cents": 2500},
    target_type="order",
    target_id="ord-123",
    tenant_id="demo",
    requester_id="support-agent",
)

# Local demo only.
# In production, this proof would be minted by your auth layer,
# policy engine, approval workflow, or control plane.
proof = gate.mint_proof(action)

outcome = gate.protect(
    action,
    proof,
    lambda: issue_refund("ord-123", 2500),
    audience="service:refunds",
)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The important part is the lambda.&lt;/p&gt;

&lt;p&gt;If the proof does not validate, that function never runs.&lt;/p&gt;

&lt;p&gt;The model can ask.&lt;/p&gt;

&lt;p&gt;The boundary decides.&lt;/p&gt;
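&lt;p&gt;Why the lambda matters can be shown with a few lines of plain Python. This is a simplified stand-in, not the gate.protect implementation: the protect function, its return shape and the list-based ledger are all hypothetical.&lt;/p&gt;

```python
def protect(proof_is_valid, side_effect):
    """Call side_effect only when the proof check passed."""
    if not proof_is_valid:
        # Refused: the callable is never invoked, so nothing happens downstream.
        return {"executed": False, "result": None}
    return {"executed": True, "result": side_effect()}


ledger = []


def issue_refund(order_id, amount_cents):
    # Stand-in for a real payment call; it only appends to a list.
    ledger.append((order_id, amount_cents))
    return "refunded"


refused = protect(False, lambda: issue_refund("ord-456", 250000))
allowed = protect(True, lambda: issue_refund("ord-123", 2500))
print(ledger)  # [('ord-123', 2500)]
```

&lt;p&gt;Passing a callable rather than a result is the design choice that counts here. Writing protect(ok, issue_refund(...)) would run the refund while the arguments are being evaluated, before any check had a chance to refuse it.&lt;/p&gt;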

&lt;h2&gt;Why this matters for MCP and agent tools&lt;/h2&gt;

&lt;p&gt;MCP makes it easier for agents to reach tools.&lt;/p&gt;

&lt;p&gt;That is useful.&lt;/p&gt;

&lt;p&gt;But it also means a model-visible tool can become a bridge into real systems: filesystems, databases, CRMs, terminals, deployment pipelines, payment systems, and internal admin workflows.&lt;/p&gt;

&lt;p&gt;So the question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How does the tool decide whether a specific call should execute?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Actenon’s answer is that the MCP tool should not rely on the model behaving correctly. It should require proof at the point of execution.&lt;/p&gt;

&lt;p&gt;A prompt-injected agent might call the tool.&lt;/p&gt;

&lt;p&gt;The tool still refuses unless the proof matches the exact action.&lt;/p&gt;

&lt;h2&gt;Why this is different from IAM&lt;/h2&gt;

&lt;p&gt;IAM answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Who or what has access?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Actenon answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is this exact agentic action authorised right now?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those are different controls.&lt;/p&gt;

&lt;p&gt;An agent may have access to a refund API.&lt;/p&gt;

&lt;p&gt;That does not mean every refund amount, every customer, every retry, and every target should be allowed.&lt;/p&gt;

&lt;p&gt;IAM is necessary.&lt;/p&gt;

&lt;p&gt;But for autonomous or semi-autonomous agents, access to an API is often too coarse a grant to decide whether one specific call should run at execution time.&lt;/p&gt;

&lt;h2&gt;Local demo&lt;/h2&gt;

&lt;p&gt;The repo includes a tiny interactive demo:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;python examples/interactive_execution_demo.py&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It shows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;✅ approved refund: ord-123 £25.00              -&amp;gt; executed
🛑 hallucinated refund: ord-456 £2,500.00       -&amp;gt; refused / INTENT_MISMATCH
🛑 replay approved refund                       -&amp;gt; refused / DUPLICATE_REPLAY
🛑 refund with no proof                         -&amp;gt; refused / PCCB_REQUIRED&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Only the approved action reaches the side-effect function.&lt;/p&gt;

&lt;p&gt;Everything else is refused before the side effect can run.&lt;/p&gt;

&lt;h2&gt;What I’m looking for&lt;/h2&gt;

&lt;p&gt;I’d love feedback from people building with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP&lt;/li&gt;
&lt;li&gt;LangChain / LangGraph&lt;/li&gt;
&lt;li&gt;Claude tools&lt;/li&gt;
&lt;li&gt;OpenAI tool calling&lt;/li&gt;
&lt;li&gt;coding agents&lt;/li&gt;
&lt;li&gt;internal workflow agents&lt;/li&gt;
&lt;li&gt;agentic CI/CD&lt;/li&gt;
&lt;li&gt;AI admin tools&lt;/li&gt;
&lt;li&gt;finance, healthcare, IAM, or regulated workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question I’m trying to sharpen is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Where should the proof boundary sit in real-world agent architectures?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Repo here, if useful:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Actenon/actenon-kernel" rel="noopener noreferrer"&gt;https://github.com/Actenon/actenon-kernel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The goal is not to make every agent safe.&lt;/p&gt;

&lt;p&gt;The goal is to make consequential action surfaces deterministic.&lt;/p&gt;

&lt;p&gt;No valid proof, no execution.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>security</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
