<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anol Deb Sharma</title>
    <description>The latest articles on DEV Community by Anol Deb Sharma (@adebbyte).</description>
    <link>https://dev.to/adebbyte</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3672679%2Fdce9c41b-7df0-4725-8e27-b3d8e78457f0.png</url>
      <title>DEV Community: Anol Deb Sharma</title>
      <link>https://dev.to/adebbyte</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/adebbyte"/>
    <language>en</language>
    <item>
      <title>Why AI safety should be enforced structurally, not trained in</title>
      <dc:creator>Anol Deb Sharma</dc:creator>
      <pubDate>Sun, 21 Dec 2025 12:27:15 +0000</pubDate>
      <link>https://dev.to/adebbyte/why-ai-safety-should-be-enforced-structurally-not-trained-in-jd9</link>
      <guid>https://dev.to/adebbyte/why-ai-safety-should-be-enforced-structurally-not-trained-in-jd9</guid>
      <description>&lt;p&gt;Most current AI safety work assumes an unsafe system and tries to train better behavior into it.&lt;/p&gt;

&lt;p&gt;We add more data.&lt;br&gt;
We add more constraints.&lt;br&gt;
We add more fine-tuning, filters, reward shaping, and guardrails.&lt;/p&gt;

&lt;p&gt;This approach treats safety as something learned, rather than something enforced.&lt;/p&gt;

&lt;p&gt;I want to argue that this is a fundamental mistake.&lt;/p&gt;

&lt;h2&gt;The core problem&lt;/h2&gt;

&lt;p&gt;Learning systems are, by design, adaptive.&lt;/p&gt;

&lt;p&gt;If safety exists only as a learned behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it can be overridden,&lt;/li&gt;
&lt;li&gt;it can be forgotten,&lt;/li&gt;
&lt;li&gt;it can be optimized against,&lt;/li&gt;
&lt;li&gt;and it can fail silently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a hypothetical concern. We already see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reward hacking,&lt;/li&gt;
&lt;li&gt;goal drift,&lt;/li&gt;
&lt;li&gt;brittle alignment,&lt;/li&gt;
&lt;li&gt;systems that appear aligned until conditions change.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, we are asking learning systems to reliably preserve properties that should be invariants.&lt;/p&gt;

&lt;h2&gt;An analogy from software systems&lt;/h2&gt;

&lt;p&gt;In software engineering, we do not “train” memory safety into a program.&lt;/p&gt;

&lt;p&gt;We enforce it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;via type systems,&lt;/li&gt;
&lt;li&gt;via memory models,&lt;/li&gt;
&lt;li&gt;via access control,&lt;/li&gt;
&lt;li&gt;via architectural boundaries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You cannot accidentally write outside a protected memory region because the structure of the system disallows it.&lt;/p&gt;
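&lt;p&gt;As a toy illustration of that idea (my sketch, not anything from a real allocator), here is a buffer whose only access paths run through a bounds check. The caller does not have to be trusted to stay in bounds; the structure disallows leaving them:&lt;/p&gt;

```python
# Illustrative sketch: structural enforcement in miniature. The only way
# to touch this buffer is through methods whose bounds check cannot be
# skipped, so an out-of-bounds write is not a behavior the caller must
# avoid; it simply cannot happen.

class BoundedBuffer:
    """A buffer whose API makes out-of-bounds access unrepresentable."""

    def __init__(self, size):
        self._cells = [0] * size

    def _check(self, index):
        # Every access path goes through this check, by construction.
        if index not in range(len(self._cells)):
            raise IndexError(f"index {index} outside buffer of size {len(self._cells)}")

    def write(self, index, value):
        self._check(index)
        self._cells[index] = value

    def read(self, index):
        self._check(index)
        return self._cells[index]
```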

&lt;p&gt;AI safety deserves the same treatment.&lt;/p&gt;

&lt;h2&gt;Structural safety vs behavioral safety&lt;/h2&gt;

&lt;p&gt;Behavioral safety says:&lt;/p&gt;

&lt;p&gt;“The system behaves safely because it has learned to.”&lt;/p&gt;

&lt;p&gt;Structural safety says:&lt;/p&gt;

&lt;p&gt;“The system cannot behave unsafely because it is not architecturally allowed to.”&lt;/p&gt;

&lt;p&gt;These are very different guarantees.&lt;/p&gt;

&lt;p&gt;Behavioral safety is probabilistic.&lt;br&gt;
Structural safety is enforceable.&lt;/p&gt;

&lt;h2&gt;What does “structural safety” mean for AI systems?&lt;/h2&gt;

&lt;p&gt;Some concrete examples:&lt;/p&gt;

&lt;h3&gt;1. Auditable internal state&lt;/h3&gt;

&lt;p&gt;If a system’s internal reasoning cannot be inspected, safety evaluation is guesswork.&lt;/p&gt;

&lt;p&gt;Auditability should not be optional or post-hoc.&lt;br&gt;
It should be a first-class design requirement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;persistent internal state,&lt;/li&gt;
&lt;li&gt;traceable decision pathways,&lt;/li&gt;
&lt;li&gt;explicit representations of confidence and uncertainty.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot inspect why a system acted, you cannot meaningfully govern it.&lt;/p&gt;
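&lt;p&gt;A minimal sketch of what this could look like (class and field names are hypothetical, mine rather than any real system's): decisions are recorded with their rationale and an explicit confidence, so “why did the system act?” is answered from persistent state rather than reconstructed afterwards.&lt;/p&gt;

```python
# Hypothetical sketch of auditability as a first-class primitive: every
# decision is stored in a persistent trace together with its rationale
# and an explicit confidence value.
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    action: str
    rationale: str
    confidence: float   # explicit representation of uncertainty

@dataclass
class AuditableAgent:
    trace: list = field(default_factory=list)   # persistent internal state

    def decide(self, action, rationale, confidence):
        record = DecisionRecord(action, rationale, confidence)
        self.trace.append(record)               # traceable decision pathway
        return record

    def explain(self, index):
        r = self.trace[index]
        return f"{r.action}: {r.rationale} (confidence={r.confidence:.2f})"
```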

&lt;h3&gt;2. Bounded self-revision&lt;/h3&gt;

&lt;p&gt;Self-modifying systems are inevitable if we want long-horizon learning.&lt;/p&gt;

&lt;p&gt;But unrestricted self-modification is indistinguishable from loss of control.&lt;/p&gt;

&lt;p&gt;Structural safety means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;defining which parts of the system may change,&lt;/li&gt;
&lt;li&gt;when they may change,&lt;/li&gt;
&lt;li&gt;and under what conditions change is allowed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is closer to governance than training.&lt;/p&gt;
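&lt;p&gt;One possible shape for such a boundary (an illustrative sketch, with names invented for this post): a registry that declares which parameters may change and under what condition, so a revision targeting a frozen parameter, or one whose condition fails, is rejected before it takes effect.&lt;/p&gt;

```python
# Hypothetical sketch: all self-revision goes through a registry that
# names which parameters may change and the condition a new value must
# satisfy. Anything not declared mutable is frozen by construction.

class RevisionPolicy:
    def __init__(self):
        self._allowed = {}   # name: predicate a new value must satisfy
        self.state = {}

    def declare(self, name, value, allow=None):
        """Register a parameter; allow=None freezes it permanently."""
        self.state[name] = value
        if allow is not None:
            self._allowed[name] = allow

    def revise(self, name, new_value):
        if name not in self._allowed:
            raise PermissionError(f"{name} is frozen by construction")
        if not self._allowed[name](new_value):
            raise ValueError(f"revision of {name} violates its declared condition")
        self.state[name] = new_value
```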

&lt;h3&gt;3. Explicit autonomy envelopes&lt;/h3&gt;

&lt;p&gt;Rather than a binary “autonomous vs not autonomous” switch, autonomy should be gradual and conditional.&lt;/p&gt;

&lt;p&gt;An autonomy envelope:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;expands when the system demonstrates reliability,&lt;/li&gt;
&lt;li&gt;contracts when uncertainty or error increases,&lt;/li&gt;
&lt;li&gt;can freeze behavior entirely when trust collapses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not learned morality.&lt;br&gt;
It is a control system.&lt;/p&gt;
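&lt;p&gt;A control-loop sketch of the envelope (constants and names are illustrative, not from any deployed system): the permitted autonomy level widens with demonstrated reliability, narrows on error, and collapses to zero, a full freeze, when trust falls below a floor.&lt;/p&gt;

```python
# Hypothetical autonomy envelope as a simple control loop. Successes
# raise trust and widen the autonomy budget; failures cut both back,
# and a trust collapse freezes behavior entirely (level 0).

class AutonomyEnvelope:
    def __init__(self, level=1, max_level=5, freeze_below=0.3):
        self.level = level              # current autonomy budget
        self.max_level = max_level
        self.freeze_below = freeze_below
        self.trust = 1.0

    def report(self, success):
        if success:
            self.trust = min(1.0, self.trust + 0.05)
            self.level = min(self.max_level, self.level + 1)   # expand
        else:
            self.trust = max(0.0, self.trust - 0.3)
            self.level = max(0, self.level - 2)                # contract
        if self.freeze_below > self.trust:
            self.level = 0                                     # freeze entirely

    @property
    def frozen(self):
        return self.level == 0
```

&lt;p&gt;Note that recovery from a freeze is possible here, but deliberately slower than the contraction: trust is rebuilt in small increments.&lt;/p&gt;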

&lt;h3&gt;4. Governance layers that can veto actions&lt;/h3&gt;

&lt;p&gt;Safety mechanisms should be able to block actions, not merely advise against them.&lt;/p&gt;

&lt;p&gt;A system that can explain why an action is unsafe but still execute it has no real safety boundary.&lt;/p&gt;

&lt;p&gt;Governance must be upstream of action execution, not downstream of evaluation.&lt;/p&gt;
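&lt;p&gt;In code, “upstream of execution” can be as simple as making the governor the only path by which actions run (a sketch with invented names): a vetoed action is not merely flagged as unsafe, it never executes at all.&lt;/p&gt;

```python
# Hypothetical sketch of governance upstream of action execution. The
# effect callable is only invoked if the action clears the governor,
# so a veto blocks the action rather than advising against it.

class Governor:
    def __init__(self, forbidden):
        self.forbidden = set(forbidden)
        self.log = []

    def execute(self, action, effect):
        """Run effect() only if the action clears governance."""
        if action in self.forbidden:
            self.log.append(("vetoed", action))   # blocked, not advised against
            return None
        self.log.append(("allowed", action))
        return effect()

gov = Governor(forbidden=["delete_all"])
result = gov.execute("compute", lambda: 2 + 2)
vetoed = gov.execute("delete_all", lambda: "never runs")
```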

&lt;h2&gt;Why training alone is insufficient&lt;/h2&gt;

&lt;p&gt;Training is optimization.&lt;/p&gt;

&lt;p&gt;Optimization pressure eventually finds shortcuts.&lt;/p&gt;

&lt;p&gt;If safety constraints exist only in the reward function or data distribution, they are part of what the system learns to navigate, not necessarily preserve.&lt;/p&gt;

&lt;p&gt;This is why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;alignment degrades under distribution shift,&lt;/li&gt;
&lt;li&gt;systems behave well in evals but fail in the wild,&lt;/li&gt;
&lt;li&gt;interpretability often becomes retrospective rather than preventative.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;A different research direction&lt;/h2&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;p&gt;“How do we train systems to be safe?”&lt;/p&gt;

&lt;p&gt;We might ask:&lt;/p&gt;

&lt;p&gt;“How do we design systems that cannot violate safety constraints by construction?”&lt;/p&gt;

&lt;p&gt;This reframes AI safety from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dataset curation,&lt;/li&gt;
&lt;li&gt;prompt engineering,&lt;/li&gt;
&lt;li&gt;post-hoc analysis,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;architecture,&lt;/li&gt;
&lt;li&gt;invariants,&lt;/li&gt;
&lt;li&gt;enforceable constraints.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I’m exploring&lt;/h2&gt;

&lt;p&gt;I’ve been working on a research prototype that treats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;auditability,&lt;/li&gt;
&lt;li&gt;self-explanation,&lt;/li&gt;
&lt;li&gt;bounded self-revision,&lt;/li&gt;
&lt;li&gt;and autonomy governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;as architectural primitives, not learned behaviors.&lt;/p&gt;

&lt;p&gt;The goal is not performance or scale, but clarity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;making internal state inspectable,&lt;/li&gt;
&lt;li&gt;making change auditable,&lt;/li&gt;
&lt;li&gt;making unsafe actions structurally impossible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This work is early, imperfect, and exploratory—but it has convinced me that safety by design is not only possible, but necessary.&lt;/p&gt;

&lt;h2&gt;Open questions&lt;/h2&gt;

&lt;p&gt;I don’t think the field has converged on answers yet, so I’ll end with questions rather than conclusions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What safety properties should be invariants rather than learned?&lt;/li&gt;
&lt;li&gt;How do we formally define “bounded autonomy”?&lt;/li&gt;
&lt;li&gt;Can we make governance mechanisms composable and testable?&lt;/li&gt;
&lt;li&gt;What failure modes emerge only in self-modifying systems?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re thinking about AI safety from a systems or architectural perspective, I’d be very interested in your thoughts.&lt;/p&gt;

&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aisaftey</category>
      <category>architecture</category>
      <category>computerscience</category>
    </item>
  </channel>
</rss>
