<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rob Kang</title>
    <description>The latest articles on DEV Community by Rob Kang (@rob_kang_7e54350f8af26743).</description>
    <link>https://dev.to/rob_kang_7e54350f8af26743</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3850226%2F8b641610-5e41-4878-bd25-ffca2c1ef726.png</url>
      <title>DEV Community: Rob Kang</title>
      <link>https://dev.to/rob_kang_7e54350f8af26743</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rob_kang_7e54350f8af26743"/>
    <language>en</language>
    <item>
      <title>SafeBrowse: A Trust Layer for AI Browser Agents (Prevent Prompt Injection &amp; Data Exfiltration)</title>
      <dc:creator>Rob Kang</dc:creator>
      <pubDate>Mon, 30 Mar 2026 00:39:29 +0000</pubDate>
      <link>https://dev.to/rob_kang_7e54350f8af26743/safebrowse-a-trust-layer-for-ai-browser-agents-prevent-prompt-injection-data-exfiltration-3i3b</link>
      <guid>https://dev.to/rob_kang_7e54350f8af26743/safebrowse-a-trust-layer-for-ai-browser-agents-prevent-prompt-injection-data-exfiltration-3i3b</guid>
      <description>&lt;p&gt;If your agent can browse the web, download files, connect tools, and write memory, a stronger model is helpful, but it is not enough.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;SafeBrowse&lt;/strong&gt; to sit on the action path between an agent and risky browser-adjacent surfaces. It does not replace the planner or the model. Instead, it evaluates what the agent is trying to do and returns typed verdicts like &lt;code&gt;ALLOW&lt;/code&gt;, &lt;code&gt;BLOCK&lt;/code&gt;, &lt;code&gt;QUARANTINE_ARTIFACT&lt;/code&gt;, or &lt;code&gt;USER_CONFIRM&lt;/code&gt;.&lt;/p&gt;
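&lt;p&gt;To make that contract concrete, the verdict set can be modeled as a small closed type. This is an illustrative sketch only; the actual types shipped by &lt;code&gt;safebrowse-client&lt;/code&gt; may differ:&lt;/p&gt;

```python
from enum import Enum

class Verdict(Enum):
    """Typed verdicts a trust layer can return for a proposed action.

    Illustrative sketch -- not the real safebrowse-client types.
    """
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    QUARANTINE_ARTIFACT = "QUARANTINE_ARTIFACT"
    USER_CONFIRM = "USER_CONFIRM"

def is_executable(verdict: Verdict) -> bool:
    # Only ALLOW lets the action run unattended. USER_CONFIRM defers
    # to a human, and every other verdict halts the action.
    return verdict == Verdict.ALLOW
```

&lt;p&gt;A closed enum (rather than free-form strings) is what makes the decisions deterministic: the caller can exhaustively handle every outcome.&lt;/p&gt;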

&lt;p&gt;The short version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Your model decides what it wants to do.&lt;br&gt;&lt;br&gt;
SafeBrowse decides what it is allowed to do.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today, the &lt;strong&gt;Python client is live on PyPI&lt;/strong&gt; as &lt;a href="https://pypi.org/project/safebrowse-client/" rel="noopener noreferrer"&gt;&lt;code&gt;safebrowse-client&lt;/code&gt;&lt;/a&gt;, and the full project is here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/RobKang1234/safebrowse-sdk" rel="noopener noreferrer"&gt;https://github.com/RobKang1234/safebrowse-sdk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;a href="https://pypi.org/project/safebrowse-client/" rel="noopener noreferrer"&gt;https://pypi.org/project/safebrowse-client/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why I built this&lt;/h2&gt;

&lt;p&gt;A lot of agent safety discussion still sounds like "just use a better model" or "add more prompt instructions."&lt;/p&gt;

&lt;p&gt;That helps, but it does not solve the actual runtime problem.&lt;/p&gt;

&lt;p&gt;A browsing agent can still get into trouble through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt injection hidden in normal web pages&lt;/li&gt;
&lt;li&gt;poisoned PDFs or downloaded artifacts&lt;/li&gt;
&lt;li&gt;connector or tool onboarding abuse&lt;/li&gt;
&lt;li&gt;OAuth callback abuse&lt;/li&gt;
&lt;li&gt;durable memory poisoning&lt;/li&gt;
&lt;li&gt;long-context social engineering that looks operationally plausible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not just model-quality problems. They are &lt;strong&gt;control-boundary problems&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So SafeBrowse keeps the product boundary narrow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;adapters observe and propose actions&lt;/li&gt;
&lt;li&gt;SafeBrowse evaluates and constrains&lt;/li&gt;
&lt;li&gt;the planner or model stays external&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What SafeBrowse does&lt;/h2&gt;

&lt;p&gt;SafeBrowse currently includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a TypeScript core runtime&lt;/li&gt;
&lt;li&gt;a localhost daemon&lt;/li&gt;
&lt;li&gt;a thin Python client&lt;/li&gt;
&lt;li&gt;a Playwright reference adapter&lt;/li&gt;
&lt;li&gt;policy and knowledge-base tooling&lt;/li&gt;
&lt;li&gt;a live threat lab and comparison dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The runtime evaluates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;page observations&lt;/li&gt;
&lt;li&gt;actions like navigation or sink transitions&lt;/li&gt;
&lt;li&gt;downloaded artifacts&lt;/li&gt;
&lt;li&gt;tool / connector onboarding&lt;/li&gt;
&lt;li&gt;OAuth callback flows&lt;/li&gt;
&lt;li&gt;durable memory writes&lt;/li&gt;
&lt;li&gt;replay and forensic logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most important hardening in the current branch is around connector and OAuth abuse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;verified registry-backed connector preparation&lt;/li&gt;
&lt;li&gt;exact redirect and callback-origin verification&lt;/li&gt;
&lt;li&gt;approval-bound onboarding&lt;/li&gt;
&lt;li&gt;callback verification with state binding&lt;/li&gt;
&lt;li&gt;artifact-to-tool taint propagation&lt;/li&gt;
&lt;li&gt;replay bundles with policy provenance&lt;/li&gt;
&lt;/ul&gt;
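&lt;p&gt;The redirect and state checks above can be sketched in plain Python. This is an illustration of the mechanism, not the SafeBrowse implementation; the function name and signature are hypothetical:&lt;/p&gt;

```python
from urllib.parse import urlsplit, parse_qs
import hmac

def verify_callback(callback_url: str, registered_redirect: str,
                    expected_state: str) -> bool:
    """Hypothetical sketch of exact-redirect and state-bound callback checks."""
    cb = urlsplit(callback_url)
    reg = urlsplit(registered_redirect)
    # Exact redirect verification: scheme, host:port, and path must all
    # match the registered redirect -- no prefix or substring matching,
    # which is what open-redirect tricks exploit.
    if (cb.scheme, cb.netloc, cb.path) != (reg.scheme, reg.netloc, reg.path):
        return False
    # State binding: the callback must carry the state value minted when
    # onboarding was approved, compared in constant time.
    state = parse_qs(cb.query).get("state", [""])[0]
    return hmac.compare_digest(state, expected_state)
```

&lt;p&gt;The point of doing this at the runtime boundary is that the model never gets a vote: a callback that fails the exact match or the state check simply never reaches the agent.&lt;/p&gt;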

&lt;h2&gt;Why this still matters with OpenAI or Claude&lt;/h2&gt;

&lt;p&gt;Hosted model platforms already have useful safety features. I am not claiming otherwise.&lt;/p&gt;

&lt;p&gt;But SafeBrowse is useful for a different reason: it is &lt;strong&gt;app-side enforcement&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Model-native safety helps with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stronger refusal behavior&lt;/li&gt;
&lt;li&gt;better resistance to obvious jailbreaks&lt;/li&gt;
&lt;li&gt;moderation / guardrail layers&lt;/li&gt;
&lt;li&gt;tool approval primitives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SafeBrowse adds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic allow/block decisions&lt;/li&gt;
&lt;li&gt;verified connector registry checks&lt;/li&gt;
&lt;li&gt;OAuth callback and origin validation&lt;/li&gt;
&lt;li&gt;artifact lineage and quarantine behavior&lt;/li&gt;
&lt;li&gt;memory-write policy&lt;/li&gt;
&lt;li&gt;replayable forensic logs&lt;/li&gt;
&lt;/ul&gt;
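&lt;p&gt;Artifact lineage and quarantine are easiest to see as taint tracking. A minimal sketch, assuming a simple parent-to-child lineage model (the class and method names here are illustrative, not the SDK's API):&lt;/p&gt;

```python
class TaintTracker:
    """Sketch of artifact-to-tool taint propagation (illustrative only).

    Anything derived from a quarantined artifact inherits its taint, so a
    poisoned PDF cannot launder instructions through a generated summary.
    """
    def __init__(self) -> None:
        self._tainted: set[str] = set()

    def quarantine(self, artifact_id: str) -> None:
        self._tainted.add(artifact_id)

    def derive(self, parent_id: str, child_id: str) -> None:
        # Lineage rule: a child of a tainted artifact is tainted too.
        if parent_id in self._tainted:
            self._tainted.add(child_id)

    def may_reach_tool(self, artifact_id: str) -> bool:
        # Tainted artifacts are kept away from tool and connector calls.
        return artifact_id not in self._tainted
```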

&lt;p&gt;Better models reduce how often the agent &lt;em&gt;wants&lt;/em&gt; to do the wrong thing.&lt;/p&gt;

&lt;p&gt;SafeBrowse reduces what the agent is &lt;em&gt;allowed&lt;/em&gt; to do when it still wants the wrong thing.&lt;/p&gt;

&lt;h2&gt;What I tested&lt;/h2&gt;

&lt;p&gt;I built a live threat lab that runs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;raw agent&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;an &lt;strong&gt;SDK-protected agent&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;against the &lt;strong&gt;same model backend&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For the frozen model-backed snapshot in the repo, both agents used the same local Qwen backend. The point was to measure the middleware difference, not hide behind a model swap.&lt;/p&gt;

&lt;p&gt;Frozen batch summary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;completed comparisons: &lt;code&gt;22&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;raw-agent compromises: &lt;code&gt;21&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;SDK bypasses: &lt;code&gt;0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are a few representative rows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threat&lt;/th&gt;
&lt;th&gt;Raw Agent&lt;/th&gt;
&lt;th&gt;Agent + SDK&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Visible direct override&lt;/td&gt;
&lt;td&gt;Compromised&lt;/td&gt;
&lt;td&gt;Contained&lt;/td&gt;
&lt;td&gt;&lt;code&gt;BLOCK&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hidden instruction layer&lt;/td&gt;
&lt;td&gt;Compromised&lt;/td&gt;
&lt;td&gt;Stayed read-only&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ALLOW&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Poisoned PDF handoff&lt;/td&gt;
&lt;td&gt;Compromised&lt;/td&gt;
&lt;td&gt;Quarantined&lt;/td&gt;
&lt;td&gt;&lt;code&gt;QUARANTINE_ARTIFACT&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema-poisoned trusted connector&lt;/td&gt;
&lt;td&gt;Compromised&lt;/td&gt;
&lt;td&gt;Contained&lt;/td&gt;
&lt;td&gt;&lt;code&gt;BLOCK&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Appendix-to-connector chain&lt;/td&gt;
&lt;td&gt;Compromised&lt;/td&gt;
&lt;td&gt;Contained&lt;/td&gt;
&lt;td&gt;&lt;code&gt;BLOCK&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Benign research page&lt;/td&gt;
&lt;td&gt;Stayed read-only&lt;/td&gt;
&lt;td&gt;Stayed read-only&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ALLOW&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The connector cases were the most interesting. In early versions, euphemistic onboarding text and schema-poisoned manifests could still push the agent toward unsafe callback flows. The hardened &lt;code&gt;v2&lt;/code&gt; path closes those by treating registry trust, approval binding, callback origin, and state as runtime-enforced constraints instead of model-accepted hints.&lt;/p&gt;

&lt;h2&gt;How people use it&lt;/h2&gt;

&lt;p&gt;The Python package is intentionally thin.&lt;/p&gt;

&lt;p&gt;It is &lt;strong&gt;not&lt;/strong&gt; the full policy engine in Python. It is a client for the SafeBrowse daemon.&lt;/p&gt;

&lt;p&gt;A typical flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;your browser agent reads a page&lt;/li&gt;
&lt;li&gt;your app sends the observation to SafeBrowse&lt;/li&gt;
&lt;li&gt;your model proposes a next step&lt;/li&gt;
&lt;li&gt;your app asks SafeBrowse to evaluate that action&lt;/li&gt;
&lt;li&gt;your browser only executes if SafeBrowse allows it&lt;/li&gt;
&lt;/ol&gt;
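&lt;p&gt;That flow is a gate around the execute step. Here is a minimal sketch of the loop shape, with &lt;code&gt;evaluate&lt;/code&gt; standing in for a call to the local daemon; the real &lt;code&gt;safebrowse-client&lt;/code&gt; API may look different:&lt;/p&gt;

```python
from typing import Callable

def gated_step(observation: dict,
               propose: Callable[[dict], dict],
               evaluate: Callable[[dict], str],
               execute: Callable[[dict], object]) -> str:
    """One iteration of the observe / propose / evaluate / execute loop.

    `evaluate` is a placeholder for the trust-layer call; everything
    here is an illustrative sketch, not the SafeBrowse client API.
    """
    action = propose(observation)   # the model proposes a next step
    verdict = evaluate(action)      # the trust layer evaluates it
    if verdict == "ALLOW":
        execute(action)             # only allowed actions reach the browser
    return verdict
```

&lt;p&gt;The important property is that &lt;code&gt;execute&lt;/code&gt; is only reachable through the verdict check, never directly from the planner.&lt;/p&gt;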

&lt;h2&gt;Quick start&lt;/h2&gt;

&lt;p&gt;Install the Python client:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install safebrowse-client
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>agents</category>
      <category>agentsbrowsingsecurity</category>
    </item>
  </channel>
</rss>
