<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Printo Tom</title>
    <description>The latest articles on DEV Community by Printo Tom (@printo_tom).</description>
    <link>https://dev.to/printo_tom</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3472218%2F8e04e452-db8b-46bb-bc94-b389abc2ae09.png</url>
      <title>DEV Community: Printo Tom</title>
      <link>https://dev.to/printo_tom</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/printo_tom"/>
    <language>en</language>
    <item>
      <title>Rate Limiting in C# — Don't Let Your API Get Hammered</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Wed, 27 May 2026 10:01:24 +0000</pubDate>
      <link>https://dev.to/printo_tom/rate-limiting-in-c-dont-let-your-api-get-hammered-4hjj</link>
      <guid>https://dev.to/printo_tom/rate-limiting-in-c-dont-let-your-api-get-hammered-4hjj</guid>
      <description>&lt;p&gt;If you run a public API without rate limiting, it's only a matter of time before a runaway client, a misconfigured retry loop, or a well-intentioned load test brings your service to its knees. .NET 7 shipped a first-class rate-limiting API — no third-party middleware required. This post walks through every knob you can turn.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prerequisite:&lt;/strong&gt; the built-in rate limiter lives in &lt;code&gt;System.Threading.RateLimiting&lt;/code&gt; and the ASP.NET Core middleware in &lt;code&gt;Microsoft.AspNetCore.RateLimiting&lt;/code&gt;. Both ship in the box from .NET 7 onwards.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why rate limiting matters
&lt;/h2&gt;

&lt;p&gt;Rate limiting protects three things simultaneously: your infrastructure from overload, your downstream dependencies from fan-out abuse, and your legitimate users from a noisy neighbour hogging capacity. It also plugs a class of denial-of-service vectors that auth alone can't stop.&lt;/p&gt;




&lt;h2&gt;
  
  
  The four built-in algorithms
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Fixed window
&lt;/h3&gt;

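&lt;p&gt;Permits N requests per fixed time window (e.g. 100 requests per minute). The window is timed by the limiter itself rather than aligned to wall-clock minutes. Simple and low on memory, but a client can send up to 2× the limit in a short span that straddles a window boundary: 100 requests at the end of one window and 100 more at the start of the next.&lt;br&gt;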
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;System.Threading.RateLimiting&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;FixedWindowRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;FixedWindowRateLimiterOptions&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;PermitLimit&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Window&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromMinutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;QueueProcessingOrder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;QueueProcessingOrder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OldestFirst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;QueueLimit&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;   &lt;span class="c1"&gt;// reject immediately when full&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
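
&lt;p&gt;To see what acquiring a permit looks like in practice, here is a minimal console sketch. The limit of 2 and the loop of 3 calls are illustrative values, not recommendations, and the sketch assumes all three calls finish inside the same one-minute window.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading.RateLimiting;

// Illustrative values: 2 permits per minute, no queueing.
using var limiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit = 2,
        Window      = TimeSpan.FromMinutes(1),
        QueueLimit  = 0
    });

for (var i = 1; i &amp;lt;= 3; i++)
{
    // AttemptAcquire never waits: the lease it returns is either acquired or not.
    using RateLimitLease lease = limiter.AttemptAcquire();
    var verdict = lease.IsAcquired ? "allowed" : "rejected";
    Console.WriteLine($"request {i}: {verdict}");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The first two calls should be allowed and the third rejected. Disposing a fixed-window lease does not hand the permit back; permits only return when the window resets.&lt;/p&gt;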



&lt;h3&gt;
  
  
  2. Sliding window
&lt;/h3&gt;

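&lt;p&gt;Divides the window into segments and tracks usage per segment; permits consumed in a segment become available again only once that segment slides out of the window. Smoother than fixed window: finer segments shrink the boundary burst, at the cost of slightly more memory.&lt;br&gt;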
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SlidingWindowRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;SlidingWindowRateLimiterOptions&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;PermitLimit&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Window&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromMinutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;SegmentsPerWindow&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;// 10-second granularity&lt;/span&gt;
        &lt;span class="n"&gt;QueueProcessingOrder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;QueueProcessingOrder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OldestFirst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;QueueLimit&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Token bucket
&lt;/h3&gt;

&lt;p&gt;A bucket fills with tokens at a steady rate up to a maximum. Each request consumes one token. Allows short bursts up to the bucket capacity while enforcing a long-run average. Ideal for APIs where short spikes are acceptable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;TokenBucketRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;TokenBucketRateLimiterOptions&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;TokenLimit&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// max burst&lt;/span&gt;
        &lt;span class="n"&gt;ReplenishmentPeriod&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromSeconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;TokensPerPeriod&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// ~1/s average&lt;/span&gt;
        &lt;span class="n"&gt;AutoReplenishment&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;QueueProcessingOrder&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;QueueProcessingOrder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OldestFirst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;QueueLimit&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Concurrency limiter
&lt;/h3&gt;

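&lt;p&gt;Limits simultaneous in-flight requests rather than request rate. A permit is held for as long as the request runs and is released when it completes. Useful for protecting expensive operations like report generation or ML inference, where the load comes from how many requests run at once rather than how many arrive per second.&lt;br&gt;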
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ConcurrencyLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;ConcurrencyLimiterOptions&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;PermitLimit&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;QueueProcessingOrder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;QueueProcessingOrder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OldestFirst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;QueueLimit&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
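
&lt;p&gt;Unlike the window-based limiters, a concurrency permit goes back to the pool as soon as its lease is disposed. A minimal console sketch of that behaviour, using an illustrative limit of 1:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading.RateLimiting;

// Illustrative values: one request in flight at a time, no queue.
using var limiter = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions
    {
        PermitLimit          = 1,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 0
    });

RateLimitLease first  = limiter.AttemptAcquire();
RateLimitLease second = limiter.AttemptAcquire();
Console.WriteLine(first.IsAcquired);   // True: the only permit is taken
Console.WriteLine(second.IsAcquired);  // False: nothing left, and no queue

first.Dispose();                       // work finished, permit returned
second.Dispose();                      // disposing a failed lease is harmless

using RateLimitLease third = limiter.AttemptAcquire();
Console.WriteLine(third.IsAcquired);   // True: the permit is free again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In ASP.NET Core you rarely dispose leases yourself: the rate-limiting middleware holds the lease for the lifetime of the request and releases it when the request ends.&lt;/p&gt;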






&lt;h2&gt;
  
  
  Wiring it up in ASP.NET Core
&lt;/h2&gt;

&lt;p&gt;Register policies in &lt;code&gt;Program.cs&lt;/code&gt;, then apply them with the &lt;code&gt;[EnableRateLimiting]&lt;/code&gt; attribute or inline via &lt;code&gt;RequireRateLimiting()&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WebApplication&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddFixedWindowLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;policyName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"fixed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PermitLimit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Window&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromMinutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueueLimit&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddTokenBucketLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;policyName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"burst"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TokenLimit&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReplenishmentPeriod&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromSeconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TokensPerPeriod&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AutoReplenishment&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;UseRateLimiter&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;   &lt;span class="c1"&gt;// after UseRouting (if called explicitly) and before the endpoints are mapped&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply to a minimal API endpoint or controller action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Minimal API&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;MapGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/products"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GetProducts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RequireRateLimiting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fixed"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Controller&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;EnableRateLimiting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"burst"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;HttpGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"search"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;IActionResult&lt;/span&gt; &lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
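
&lt;p&gt;Put together, a complete &lt;code&gt;Program.cs&lt;/code&gt; might look like the sketch below. It assumes a web project with implicit usings enabled; the &lt;code&gt;/products&lt;/code&gt; and &lt;code&gt;/health&lt;/code&gt; routes and their handlers are hypothetical. It also shows &lt;code&gt;DisableRateLimiting()&lt;/code&gt;, which opts a single endpoint out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =&amp;gt;
{
    // Same "fixed" policy as above: 100 requests per minute, no queue.
    options.AddFixedWindowLimiter(policyName: "fixed", opt =&amp;gt;
    {
        opt.PermitLimit = 100;
        opt.Window      = TimeSpan.FromMinutes(1);
        opt.QueueLimit  = 0;
    });
});

var app = builder.Build();

app.UseRateLimiter();

// Hypothetical endpoints: one limited, one explicitly exempt.
app.MapGet("/products", () =&amp;gt; Results.Ok(new[] { "widget" }))
   .RequireRateLimiting("fixed");

app.MapGet("/health", () =&amp;gt; Results.Ok())
   .DisableRateLimiting();

app.Run();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Endpoints that name no policy are not limited unless you also configure a global limiter, so the explicit opt-out on &lt;code&gt;/health&lt;/code&gt; mainly documents intent.&lt;/p&gt;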






&lt;h2&gt;
  
  
  Per-user and per-endpoint policies
&lt;/h2&gt;

&lt;p&gt;A single global policy rarely fits real-world needs. Use &lt;code&gt;AddPolicy&lt;/code&gt; with a partition key derived from the request context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"per-user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;httpContext&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;RateLimitPartition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetTokenBucketLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;httpContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Identity&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;
                      &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="n"&gt;httpContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RemoteIpAddress&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                      &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="s"&gt;"anonymous"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;TokenBucketRateLimiterOptions&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;TokenLimit&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ReplenishmentPeriod&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromMinutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;TokensPerPeriod&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;AutoReplenishment&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; prefer authenticated user ID over IP address as the partition key — NAT and proxies can share a single IP across hundreds of users, leading to false positives at scale.&lt;/p&gt;
&lt;/blockquote&gt;
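
&lt;p&gt;Because the partition callback returns the limiter as well as the key, one policy can apply different algorithms to different callers. The sketch below is one possible shape, assuming a web project with implicit usings; the policy name &lt;code&gt;tiered&lt;/code&gt;, the &lt;code&gt;/search&lt;/code&gt; route, and both sets of limits are hypothetical. No authentication is registered here, so as written every request takes the per-IP branch; in a real app the authentication middleware has to run before &lt;code&gt;UseRateLimiter()&lt;/code&gt; for &lt;code&gt;httpContext.User&lt;/code&gt; to be populated.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =&amp;gt;
{
    options.AddPolicy&amp;lt;string&amp;gt;("tiered", httpContext =&amp;gt;
    {
        var identity = httpContext.User.Identity;
        var name = identity?.IsAuthenticated == true ? identity.Name : null;

        if (!string.IsNullOrEmpty(name))
        {
            // Signed-in callers: one generous token bucket per user name.
            return RateLimitPartition.GetTokenBucketLimiter(
                "user:" + name,
                _ =&amp;gt; new TokenBucketRateLimiterOptions
                {
                    TokenLimit          = 200,
                    ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                    TokensPerPeriod     = 200,
                    AutoReplenishment   = true
                });
        }

        // Anonymous callers: a stricter fixed window per remote IP.
        var ip = httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        return RateLimitPartition.GetFixedWindowLimiter(
            "ip:" + ip,
            _ =&amp;gt; new FixedWindowRateLimiterOptions
            {
                PermitLimit = 20,
                Window      = TimeSpan.FromMinutes(1)
            });
    });
});

var app = builder.Build();
app.UseRateLimiter();
app.MapGet("/search", () =&amp;gt; Results.Ok()).RequireRateLimiting("tiered");
app.Run();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;user:&lt;/code&gt; and &lt;code&gt;ip:&lt;/code&gt; prefixes keep the two key spaces apart, so a user name that happens to look like an IP address cannot land in someone else's partition.&lt;/p&gt;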




&lt;h2&gt;
  
  
  Custom rejection responses
&lt;/h2&gt;

&lt;p&gt;By default, the middleware rejects requests with &lt;code&gt;503 Service Unavailable&lt;/code&gt;. The status code defined for rate limiting (RFC 6585) is &lt;code&gt;429 Too Many Requests&lt;/code&gt;, ideally accompanied by a &lt;code&gt;Retry-After&lt;/code&gt; header so clients know when to try again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OnRejected&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HttpContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusCode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;StatusCodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status429TooManyRequests&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lease&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TryGetMetadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;MetadataName&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RetryAfter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;retryAfter&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HttpContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"Retry-After"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;retryAfter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TotalSeconds&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;System&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Globalization&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CultureInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InvariantCulture&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HttpContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"Rate limit exceeded. Please slow down."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
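
&lt;p&gt;If the status code is all you want to change, setting &lt;code&gt;RejectionStatusCode&lt;/code&gt; is a shorter route than an &lt;code&gt;OnRejected&lt;/code&gt; callback. A minimal sketch, assuming a web project with implicit usings; the &lt;code&gt;fixed&lt;/code&gt; policy and the &lt;code&gt;/products&lt;/code&gt; route are only there so the example has something to reject:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =&amp;gt;
{
    // Only the status code changes; no Retry-After header or body is written.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Illustrative policy.
    options.AddFixedWindowLimiter("fixed", opt =&amp;gt;
    {
        opt.PermitLimit = 100;
        opt.Window      = TimeSpan.FromMinutes(1);
    });
});

var app = builder.Build();
app.UseRateLimiter();
app.MapGet("/products", () =&amp;gt; Results.Ok()).RequireRateLimiting("fixed");
app.Run();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reach for &lt;code&gt;OnRejected&lt;/code&gt; when clients also need a &lt;code&gt;Retry-After&lt;/code&gt; hint or a response body.&lt;/p&gt;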






&lt;h2&gt;
  
  
  Distributed scenarios &amp;amp; Redis
&lt;/h2&gt;

&lt;p&gt;The built-in limiters are in-process only: each pod maintains its own counters. In a horizontally scaled deployment, use a Redis-backed limiter such as the &lt;code&gt;RedisRateLimiting&lt;/code&gt; community library, which builds on the same &lt;code&gt;RateLimiter&lt;/code&gt; abstraction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package RedisRateLimiting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AddStackExchangeRedisCache only registers IDistributedCache, so register the&lt;/span&gt;
&lt;span class="c1"&gt;// IConnectionMultiplexer (from StackExchange.Redis) that the factory below resolves.&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddSingleton&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IConnectionMultiplexer&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;ConnectionMultiplexer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Configuration&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Redis:Connection"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="p"&gt;!));&lt;/span&gt;

&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"distributed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;httpContext&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;RedisRateLimitPartition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetSlidingWindowRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;httpContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Identity&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="s"&gt;"anon"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;RedisSlidingWindowRateLimiterOptions&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ConnectionMultiplexerFactory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
                &lt;span class="n"&gt;httpContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RequestServices&lt;/span&gt;
                    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetRequiredService&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IConnectionMultiplexer&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt;
            &lt;span class="n"&gt;PermitLimit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Window&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromMinutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Client-side resilience with Polly
&lt;/h2&gt;

&lt;p&gt;If your code &lt;em&gt;consumes&lt;/em&gt; a rate-limited API, use Polly's &lt;code&gt;RateLimiter&lt;/code&gt; strategy combined with &lt;code&gt;Retry&lt;/code&gt; to handle 429s gracefully:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package Microsoft.Extensions.Http.Resilience
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddHttpClient&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IProductsClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ProductsClient&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddResilienceHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"products-pipeline"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SlidingWindowRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;SlidingWindowRateLimiterOptions&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;PermitLimit&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;Window&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromSeconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;SegmentsPerWindow&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
                &lt;span class="p"&gt;}));&lt;/span&gt;

            &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;HttpRetryStrategyOptions&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;MaxRetryAttempts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;Delay&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromSeconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;BackoffType&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DelayBackoffType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exponential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;ShouldHandle&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ValueTask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Outcome&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;StatusCode&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt;
                        &lt;span class="n"&gt;HttpStatusCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TooManyRequests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Choosing the right algorithm
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Watch out for&lt;/th&gt;
&lt;th&gt;Memory cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fixed window&lt;/td&gt;
&lt;td&gt;Simple quotas, billing tiers&lt;/td&gt;
&lt;td&gt;Boundary burst (2× spike)&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sliding window&lt;/td&gt;
&lt;td&gt;Smooth public APIs&lt;/td&gt;
&lt;td&gt;Segment count × partitions&lt;/td&gt;
&lt;td&gt;Low–medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token bucket&lt;/td&gt;
&lt;td&gt;Burst-tolerant consumer APIs&lt;/td&gt;
&lt;td&gt;Tuning burst vs average&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency&lt;/td&gt;
&lt;td&gt;Expensive ops (ML, reports)&lt;/td&gt;
&lt;td&gt;Doesn't bound throughput&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
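
&lt;p&gt;To make the fixed-window "boundary burst" in the table concrete, here is a minimal simulation sketch. It deliberately does not use the built-in limiter: the &lt;code&gt;TryAcquire&lt;/code&gt; helper, the 100-per-minute quota, and the timestamps are illustrative assumptions, not the framework's implementation.&lt;/p&gt;

```csharp
using System;

// Hypothetical hand-rolled fixed-window counter, for illustration only.
// It is NOT the System.Threading.RateLimiting implementation.
const int PermitLimit = 100;      // assumed quota per window
const double WindowSeconds = 60;  // assumed window length

long currentWindow = -1;          // index of the window being counted
int usedInWindow = 0;             // permits consumed in that window

// Returns true when a request arriving at time t (in seconds) is admitted.
bool TryAcquire(double t)
{
    long window = (long)Math.Floor(t / WindowSeconds);
    if (window != currentWindow)
    {
        currentWindow = window;   // a new window starts with a fresh counter
        usedInWindow = 0;
    }
    if (usedInWindow >= PermitLimit) return false;
    usedInWindow++;
    return true;
}

// 150 requests just before the window boundary, 150 just after it.
int admitted = 0;
foreach (double t in new[] { 59.9, 60.1 })
{
    for (int i = 0; i != 150; i++)
    {
        if (TryAcquire(t)) admitted++;
    }
}

// Both windows admit their full quota, so 200 requests pass within 0.2 seconds.
Console.WriteLine(admitted);
```

&lt;p&gt;The counter resets the moment the window index changes, which is why a client that times its traffic around the boundary briefly gets twice the nominal rate. Sliding window and token bucket smooth this out at the cost of a little more state.&lt;/p&gt;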

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Distributed gotcha:&lt;/strong&gt; in-process limiters keep their counters per pod, so a cluster of 4 replicas effectively multiplies your limit by 4. For multi-replica deployments where correctness matters, back the partitioned limiter with a shared store such as Redis.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;.NET 7+ gives you production-grade rate limiting with zero external dependencies for single-node scenarios. The four algorithms cover the full spectrum from simple quotas to burst-tolerant consumer clients. Add Redis for distributed enforcement, Polly for client-side resilience, and always return &lt;code&gt;429&lt;/code&gt; with a &lt;code&gt;Retry-After&lt;/code&gt; header — your API consumers will thank you.&lt;/p&gt;

&lt;p&gt;Questions or patterns I missed? Drop them in the comments.&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>aspnetcore</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Designing the Future of Payments — Why XML Still Matters in the Age of APIs</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Sat, 23 May 2026 06:22:48 +0000</pubDate>
      <link>https://dev.to/printo_tom/designing-the-future-of-payments-why-xml-still-matters-in-the-age-of-apis-4nic</link>
      <guid>https://dev.to/printo_tom/designing-the-future-of-payments-why-xml-still-matters-in-the-age-of-apis-4nic</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;In the fast‑moving world of fintech, APIs have become the poster child for innovation. They’re sleek, lightweight, and developer‑friendly. Yet beneath the surface of every instant transfer, compliance check, and cross‑border transaction lies a structured XML message — quietly ensuring that money moves safely, legally, and consistently.  &lt;/p&gt;

&lt;p&gt;XML isn’t fading away; it’s evolving. It remains the &lt;strong&gt;heartbeat of global payments&lt;/strong&gt;, and projects like &lt;strong&gt;XMLPayments&lt;/strong&gt; prove that legacy technologies can coexist with modern architectures to create something truly future‑ready.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🌐 The Evolution of Payment Standards
&lt;/h2&gt;

&lt;p&gt;The financial industry has undergone a dramatic shift — from &lt;strong&gt;SOAP/XML&lt;/strong&gt; to &lt;strong&gt;REST/JSON&lt;/strong&gt;, from monolithic systems to microservices, and from manual reconciliation to real‑time orchestration. But XML continues to dominate regulated ecosystems for one simple reason: &lt;strong&gt;trust&lt;/strong&gt;.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema validation&lt;/strong&gt; guarantees data integrity.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditability&lt;/strong&gt; ensures every transaction can be traced.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interoperability&lt;/strong&gt; allows banks, insurers, and clearing houses to communicate seamlessly.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;APIs may simplify integration, but XML ensures &lt;strong&gt;compliance and consistency&lt;/strong&gt; — the two pillars of financial reliability.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🧩 Bridging Legacy and Modern Systems
&lt;/h2&gt;

&lt;p&gt;The challenge isn’t choosing between XML and APIs; it’s connecting them. XMLPayments acts as a &lt;strong&gt;bridge&lt;/strong&gt; between legacy payment rails and modern API ecosystems.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Legacy systems still rely on XML for SWIFT MX, SEPA, and other ISO 20022 messages.
&lt;/li&gt;
&lt;li&gt;Modern fintech platforms demand RESTful APIs and JSON payloads.
&lt;/li&gt;
&lt;li&gt;XMLPayments connects both worlds through &lt;strong&gt;schema‑driven orchestration&lt;/strong&gt; and &lt;strong&gt;real‑time transformation&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hybrid approach allows enterprises to modernize without breaking compliance — a critical advantage in regulated environments.  &lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Innovation Layer: Schema‑Driven Orchestration
&lt;/h2&gt;

&lt;p&gt;At the core of XMLPayments lies an orchestration engine that validates, transforms, and routes XML messages dynamically.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validation:&lt;/strong&gt; Ensures every transaction meets schema and regulatory standards.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transformation:&lt;/strong&gt; Converts XML to JSON for API consumption.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing:&lt;/strong&gt; Directs payments to the correct clearing or compliance endpoint.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a seamless flow between legacy and modern systems — where &lt;strong&gt;trust meets agility&lt;/strong&gt;.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 Copilot’s Contribution
&lt;/h2&gt;

&lt;p&gt;Modernization is rarely linear. GitHub Copilot became the catalyst that accelerated XMLPayments’ evolution:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suggested &lt;strong&gt;schema validators&lt;/strong&gt; and &lt;strong&gt;conversion functions&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Generated &lt;strong&gt;unit tests&lt;/strong&gt; for XML‑to‑JSON transformations.
&lt;/li&gt;
&lt;li&gt;Helped document orchestration flows with inline comments.
&lt;/li&gt;
&lt;li&gt;Proposed &lt;strong&gt;error‑handling patterns&lt;/strong&gt; for async operations.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Copilot transformed repetitive coding into creative problem‑solving, enabling faster iteration and cleaner architecture.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Vision: XML as the Foundation for Hybrid Financial Ecosystems
&lt;/h2&gt;

&lt;p&gt;The future of payments isn’t about replacing XML; it’s about &lt;strong&gt;reimagining it&lt;/strong&gt;.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XML provides the &lt;strong&gt;structure&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;APIs provide the &lt;strong&gt;accessibility&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;AI provides the &lt;strong&gt;intelligence&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, they form a &lt;strong&gt;hybrid ecosystem&lt;/strong&gt; where legacy reliability meets modern innovation. XMLPayments embodies this vision — a framework that evolves with technology while preserving trust.  &lt;/p&gt;

&lt;p&gt;Imagine a world where:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XML schemas validate transactions in milliseconds.
&lt;/li&gt;
&lt;li&gt;APIs expose those transactions securely to partners.
&lt;/li&gt;
&lt;li&gt;AI agents monitor compliance and detect anomalies in real time.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not a distant dream — it’s the direction XMLPayments is already heading.  &lt;/p&gt;

</description>
      <category>xml</category>
      <category>fintech</category>
      <category>api</category>
      <category>github</category>
    </item>
    <item>
      <title>From Legacy to Live — Reviving XMLPayments with GitHub Copilot</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Sat, 23 May 2026 06:19:41 +0000</pubDate>
      <link>https://dev.to/printo_tom/-from-legacy-to-live-reviving-xmlpayments-with-github-copilot-427c</link>
      <guid>https://dev.to/printo_tom/-from-legacy-to-live-reviving-xmlpayments-with-github-copilot-427c</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Every developer has that one project that started with excitement but stalled before completion. For me, it was &lt;strong&gt;XMLPayments&lt;/strong&gt; — a prototype designed to orchestrate XML-based financial flows. The GitHub Finish‑Up‑A‑Thon Challenge gave me the push I needed to finally polish it up, and GitHub Copilot became my silent co‑developer.  &lt;/p&gt;

&lt;p&gt;This is the story of how XMLPayments went from &lt;strong&gt;legacy fragments&lt;/strong&gt; to a &lt;strong&gt;live orchestration engine&lt;/strong&gt;.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🕰️ Before: The Stalled Prototype
&lt;/h2&gt;

&lt;p&gt;The original XMLPayments repo was functional but fragile:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fragmented XML flows with no orchestration.
&lt;/li&gt;
&lt;li&gt;Manual reconciliation that took days.
&lt;/li&gt;
&lt;li&gt;Brittle scripts prone to breaking under load.
&lt;/li&gt;
&lt;li&gt;Documentation incomplete, onboarding unclear.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It was a proof of concept, but not production‑ready.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 After: A Polished Framework
&lt;/h2&gt;

&lt;p&gt;Reviving the project meant transforming it into something usable:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated orchestration&lt;/strong&gt; of XML flows.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real‑time compliance dashboards&lt;/strong&gt; for auditors.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD pipelines&lt;/strong&gt; for deployment and testing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer‑friendly onboarding&lt;/strong&gt; with examples and diagrams.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, XMLPayments isn’t just a repo — it’s a framework ready to deploy.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 Copilot in Action
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot played a crucial role in the revival:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generated &lt;strong&gt;async handlers&lt;/strong&gt; for XML ingestion.
&lt;/li&gt;
&lt;li&gt;Suggested &lt;strong&gt;error handling patterns&lt;/strong&gt; for resilience.
&lt;/li&gt;
&lt;li&gt;Autocompleted &lt;strong&gt;schema validation functions&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Helped write &lt;strong&gt;unit tests&lt;/strong&gt; that covered edge cases.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Copilot didn’t just save time — it unlocked momentum.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ Architecture Snapshot
&lt;/h2&gt;

&lt;p&gt;The revived XMLPayments repo now follows a &lt;strong&gt;microservice design&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Event‑driven ingestion&lt;/strong&gt; of XML files.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation layer&lt;/strong&gt; enforcing schema compliance.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence layer&lt;/strong&gt; for audit trails.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring dashboard&lt;/strong&gt; for real‑time visibility.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture ensures scalability, compliance, and developer usability.  &lt;/p&gt;




&lt;h2&gt;
  
  
  📈 Impact
&lt;/h2&gt;

&lt;p&gt;The transformation was tangible:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reconciliation time reduced from &lt;strong&gt;days to seconds&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Developers can onboard in minutes instead of hours.
&lt;/li&gt;
&lt;li&gt;Compliance reporting is automated and auditable.
&lt;/li&gt;
&lt;li&gt;The repo is now &lt;strong&gt;production‑ready&lt;/strong&gt; and open for contributions.
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>xmlpayments</category>
      <category>github</category>
      <category>githubcopilot</category>
      <category>finishupathon</category>
    </item>
    <item>
      <title>XMLPayments — The Hidden Backbone of Modern Financial Orchestration</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Sat, 23 May 2026 06:16:35 +0000</pubDate>
      <link>https://dev.to/printo_tom/xmlpayments-the-hidden-backbone-of-modern-financial-orchestration-387b</link>
      <guid>https://dev.to/printo_tom/xmlpayments-the-hidden-backbone-of-modern-financial-orchestration-387b</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;When people talk about fintech innovation, they usually highlight APIs, JSON, and mobile-first experiences. Yet beneath the surface, trillions of dollars still move through &lt;strong&gt;XML-based payment instructions&lt;/strong&gt; every single day. XML is the quiet backbone of financial orchestration — ensuring compliance, traceability, and interoperability across borders.  &lt;/p&gt;

&lt;p&gt;This article dives deep into why XML remains indispensable, how I built &lt;strong&gt;XMLPayments&lt;/strong&gt; to modernize it, and how GitHub Copilot helped me finish what I started.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🌍 The Legacy That Never Died
&lt;/h2&gt;

&lt;p&gt;XML isn’t just a relic of the early internet. In financial services, it’s the &lt;strong&gt;lingua franca&lt;/strong&gt; of trust. Standards like &lt;strong&gt;ISO 20022&lt;/strong&gt; and &lt;strong&gt;SEPA pain.001/pain.008&lt;/strong&gt; rely on XML schemas to ensure every payment instruction is valid, auditable, and compliant.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Banks use XML for SWIFT MX (ISO 20022) messages.
&lt;/li&gt;
&lt;li&gt;Insurance firms rely on XML for reconciliation.
&lt;/li&gt;
&lt;li&gt;Enterprises depend on XML for cross-border compliance.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without XML, global payments would collapse under inconsistency.  &lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Schema‑Driven Reliability
&lt;/h2&gt;

&lt;p&gt;At the heart of XMLPayments is &lt;strong&gt;schema enforcement&lt;/strong&gt;. Every transaction is validated against strict rules before it moves downstream.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validation:&lt;/strong&gt; Ensures no malformed data enters the pipeline.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transformation:&lt;/strong&gt; Converts XML into normalized internal formats.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing:&lt;/strong&gt; Directs payments to the correct clearing house or compliance system.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This guarantees that every transaction is &lt;strong&gt;trustworthy and traceable&lt;/strong&gt;.  &lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ Async Architecture for Scale
&lt;/h2&gt;

&lt;p&gt;Financial systems don’t just need reliability — they need speed. XMLPayments leverages &lt;strong&gt;.NET async programming&lt;/strong&gt; (&lt;code&gt;Task.WhenAll()&lt;/code&gt;) to process thousands of transactions in parallel.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Execution:&lt;/strong&gt; Multiple payment flows handled simultaneously.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Latency:&lt;/strong&gt; Faster reconciliation and reporting.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience:&lt;/strong&gt; Failures isolated without halting the entire pipeline.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture transforms XML from “slow and legacy” into &lt;strong&gt;real-time orchestration&lt;/strong&gt;.  &lt;/p&gt;
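
&lt;p&gt;As a rough sketch of the pattern described above (not the actual XMLPayments code), the snippet below fans out a batch with &lt;code&gt;Task.WhenAll&lt;/code&gt; and isolates each failure. The payment IDs, &lt;code&gt;ProcessAsync&lt;/code&gt;, and &lt;code&gt;ProcessIsolatedAsync&lt;/code&gt; are hypothetical names introduced for illustration.&lt;/p&gt;

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical payment IDs; "bad" stands in for a message that fails validation.
string[] paymentIds = { "pay-1", "bad", "pay-3" };
string[] outcomes = new string[paymentIds.Length];

// Placeholder for the real work (parse, validate, route); assumed for this sketch.
async Task ProcessAsync(string id)
{
    await Task.Delay(10);
    if (id == "bad") throw new InvalidOperationException("schema validation failed");
}

// Wraps one payment so its failure is recorded instead of faulting the whole batch.
async Task ProcessIsolatedAsync(int index)
{
    try
    {
        await ProcessAsync(paymentIds[index]);
        outcomes[index] = "ok";
    }
    catch (Exception ex)
    {
        outcomes[index] = "failed: " + ex.Message;
    }
}

// Start every payment, then await them together. Each task writes only its own slot.
Task[] tasks = Enumerable.Range(0, paymentIds.Length)
    .Select(i => ProcessIsolatedAsync(i))
    .ToArray();
await Task.WhenAll(tasks);

for (int i = 0; i != paymentIds.Length; i++)
{
    Console.WriteLine(paymentIds[i] + " -> " + outcomes[i]);
}
```

&lt;p&gt;Without the per-payment try/catch, awaiting &lt;code&gt;Task.WhenAll&lt;/code&gt; would throw as soon as the batch completes with any fault, and the caller would have to inspect each task to find out which payments succeeded.&lt;/p&gt;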




&lt;h2&gt;
  
  
  🤖 Copilot’s Role in Modernization
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot became my silent co‑developer:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suggested &lt;strong&gt;refactors&lt;/strong&gt; for legacy XML parsers.
&lt;/li&gt;
&lt;li&gt;Generated &lt;strong&gt;schema‑aware unit tests&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Accelerated &lt;strong&gt;documentation&lt;/strong&gt; with inline comments.
&lt;/li&gt;
&lt;li&gt;Helped design &lt;strong&gt;error handling patterns&lt;/strong&gt; for async flows.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Copilot didn’t just save time — it unlocked creativity by removing repetitive coding barriers.  &lt;/p&gt;




&lt;h2&gt;
  
  
  📊 Outcome
&lt;/h2&gt;

&lt;p&gt;The result is a &lt;strong&gt;resilient orchestration layer&lt;/strong&gt; that:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bridges legacy XML systems with modern APIs.
&lt;/li&gt;
&lt;li&gt;Reduces reconciliation time from days to seconds.
&lt;/li&gt;
&lt;li&gt;Provides compliance dashboards for auditors.
&lt;/li&gt;
&lt;li&gt;Enables enterprises to modernize without breaking trust.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;XMLPayments proves that XML isn’t outdated — it’s the &lt;strong&gt;hidden backbone&lt;/strong&gt; of financial orchestration.  &lt;/p&gt;

</description>
      <category>xml</category>
      <category>fintech</category>
      <category>dotnet</category>
      <category>github</category>
    </item>
    <item>
      <title>Reviving My Gemma Agentic Framework: From Prototype to Polished Repo</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Sat, 23 May 2026 06:06:46 +0000</pubDate>
      <link>https://dev.to/printo_tom/reviving-my-gemma-agentic-framework-from-prototype-to-polished-repo-p2g</link>
      <guid>https://dev.to/printo_tom/reviving-my-gemma-agentic-framework-from-prototype-to-polished-repo-p2g</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;During my exploration of agentic AI systems, I started building a framework around &lt;strong&gt;Gemma models&lt;/strong&gt; to demonstrate how lightweight LLMs can orchestrate tasks in enterprise workflows. The idea was strong, but the repo stalled before reaching a usable state. The GitHub Finish-Up-A-Thon Challenge gave me the perfect push to finish what I started.&lt;/p&gt;

&lt;h3&gt;
  
  
  Before Snapshot
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo link (before):&lt;/strong&gt; &lt;a href="https://github.com/printotomp/Gemma-agentic-framework.git" rel="noopener noreferrer"&gt;https://github.com/printotomp/Gemma-agentic-framework.git&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State of the project:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Initial scaffolding for agent orchestration
&lt;/li&gt;
&lt;li&gt;Basic task routing, but no persistence or error handling
&lt;/li&gt;
&lt;li&gt;Documentation incomplete, no examples for developers
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Why it stalled:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Competing priorities and lack of time to polish usability
&lt;/li&gt;
&lt;li&gt;Architecture decisions left unresolved
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  How GitHub Copilot Helped
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Suggested async patterns (&lt;code&gt;Task.WhenAll&lt;/code&gt;) for parallel agent execution
&lt;/li&gt;
&lt;li&gt;Generated boilerplate for missing modules (logging, error handling)
&lt;/li&gt;
&lt;li&gt;Helped write unit tests faster
&lt;/li&gt;
&lt;li&gt;Improved documentation with inline comments and example snippets
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zmsrppohs7hw49nvwtm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zmsrppohs7hw49nvwtm.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  After Snapshot
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo link (after):&lt;/strong&gt; &lt;a href="https://github.com/printotomp/Gemma-agentic-framework.git" rel="noopener noreferrer"&gt;https://github.com/printotomp/Gemma-agentic-framework.git&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What’s new:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Completed orchestration layer with persistence
&lt;/li&gt;
&lt;li&gt;Added developer-friendly examples (e.g., “build-your-first-agent”)
&lt;/li&gt;
&lt;li&gt;Wrote comprehensive tests for reliability
&lt;/li&gt;
&lt;li&gt;Improved README and onboarding guide
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Usability improvements:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Clearer architecture diagrams
&lt;/li&gt;
&lt;li&gt;One-click setup with GitHub Actions CI/CD
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Creative additions:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Acronym-based design principle (WET: &lt;em&gt;Write Everything Twice&lt;/em&gt;) for independence in microservice design
&lt;/li&gt;
&lt;li&gt;Demo workflow showing Gemma agents coordinating tasks
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Completion Arc
&lt;/h3&gt;

&lt;p&gt;This challenge wasn’t just about finishing code — it was about rediscovering momentum. The “before and after” journey shows how Copilot can transform abandoned ideas into finished frameworks ready for the community.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Thanks to GitHub and Copilot, I finally shipped something I had left behind. The Finish-Up-A-Thon reminded me that completion is just as important as innovation.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🏆 Judging Criteria Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Underlying technology:&lt;/strong&gt; Gemma models, .NET async programming, microservice architecture
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usability &amp;amp; UX:&lt;/strong&gt; Clear onboarding, examples, CI/CD pipeline
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Originality &amp;amp; Creativity:&lt;/strong&gt; Agentic orchestration with WET principle
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Completion Arc:&lt;/strong&gt; Before vs. after repo transformation
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>githubchallenge</category>
      <category>githubcopilot</category>
      <category>finishupathon</category>
      <category>gemma</category>
    </item>
    <item>
      <title>First Look at Google AI Studio + Gemini at I/O 2026</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Wed, 20 May 2026 11:07:47 +0000</pubDate>
      <link>https://dev.to/printo_tom/first-look-at-google-ai-studio-gemini-at-io-2026-d5i</link>
      <guid>https://dev.to/printo_tom/first-look-at-google-ai-studio-gemini-at-io-2026-d5i</guid>
      <description>&lt;h1&gt;
  
  
  🚀 First Look at Google AI Studio + Gemini at I/O 2026
&lt;/h1&gt;

&lt;p&gt;Google I/O 2026 wasn’t just about flashy demos — it was about &lt;strong&gt;making AI practical for developers&lt;/strong&gt;. For me, the standout announcement was the evolution of &lt;strong&gt;Gemini&lt;/strong&gt; into a full ecosystem, anchored by &lt;strong&gt;Google AI Studio&lt;/strong&gt;.  &lt;/p&gt;

&lt;h2&gt;
  
  
  🌟 Why It Matters
&lt;/h2&gt;

&lt;p&gt;Until now, experimenting with large models often meant juggling APIs, SDKs, and cloud credits. With AI Studio, Google is positioning Gemini as the &lt;strong&gt;fastest way to start building&lt;/strong&gt; — lowering the barrier for developers who want to prototype, test, and deploy AI-powered apps.  &lt;/p&gt;

&lt;p&gt;This shift feels less like “another model release” and more like &lt;strong&gt;a platform moment&lt;/strong&gt;. Gemini isn’t just a model anymore; it’s the connective tissue across Google’s ecosystem — from Docs and Gmail to Firebase and Cloud.  &lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ Hands-On Impressions
&lt;/h2&gt;

&lt;p&gt;I spent time exploring AI Studio during I/O, and here’s what stood out:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instant Playground&lt;/strong&gt;: You can spin up a Gemini-powered app in minutes, no complex setup.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tight Integration&lt;/strong&gt;: Firebase and Cloud hooks are built-in, meaning you can go from prototype to production without duct tape.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency&lt;/strong&gt;: Google emphasized responsible AI, with clear usage dashboards and guardrails.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It reminded me of the early days of Firebase — simple, approachable, and developer-first.  &lt;/p&gt;

&lt;h2&gt;
  
  
  🔍 My Take
&lt;/h2&gt;

&lt;p&gt;The most underrated aspect of this release is &lt;strong&gt;accessibility for small teams&lt;/strong&gt;. While enterprises will benefit from Gemini’s scale, indie developers now have a way to experiment without heavy infrastructure. This democratization could spark the next wave of AI-driven startups.  &lt;/p&gt;

&lt;h2&gt;
  
  
  📚 Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ai.google.dev" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://io.google/2026" rel="noopener noreferrer"&gt;I/O 2026 Keynote Replay&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;DEV Challenge Launch Post&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;💡 &lt;strong&gt;Community Question&lt;/strong&gt;: What’s the first app or workflow you’d build with Gemini in AI Studio?  &lt;/p&gt;


</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
      <category>googleio2026</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Hermes Agent: Why Open Agentic Systems Matter</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Tue, 19 May 2026 05:20:52 +0000</pubDate>
      <link>https://dev.to/printo_tom/hermes-agent-why-open-agentic-systems-matter-46ma</link>
      <guid>https://dev.to/printo_tom/hermes-agent-why-open-agentic-systems-matter-46ma</guid>
      <description>&lt;h2&gt;
  
  
  🚀 What Is Hermes Agent?
&lt;/h2&gt;

&lt;p&gt;If you’ve been following the agentic space, Hermes Agent probably doesn’t need much introduction. For everyone else: it’s an open-source agent framework from Nous Research that you can run on your own infrastructure — from a $5 VPS to a GPU cluster.  &lt;/p&gt;

&lt;p&gt;The magic? Hermes isn’t just another orchestration layer. It’s self-improving. It learns from experience, nudges itself to persist knowledge, and builds a deeper model of &lt;em&gt;you&lt;/em&gt; across sessions. That’s a big leap forward compared to most agent frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ Why Hermes Agent Stands Out
&lt;/h2&gt;

&lt;p&gt;Here’s what makes Hermes different in plain terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It learns as it goes&lt;/strong&gt; → not just executing tasks, but refining skills.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It remembers you&lt;/strong&gt; → building continuity across conversations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It’s model-agnostic&lt;/strong&gt; → plug in OpenAI, Hugging Face, NVIDIA NIM, or even your own endpoint.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It’s infrastructure-flexible&lt;/strong&gt; → run it locally, in the cloud, or even chat with it via Telegram while it works remotely.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination makes Hermes feel less like a “tool runner” and more like a &lt;em&gt;partner that grows with you&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔍 Hermes vs. Other Agentic Frameworks
&lt;/h2&gt;

&lt;p&gt;A quick comparison to put things in perspective:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Hermes Agent&lt;/th&gt;
&lt;th&gt;LangChain / CrewAI&lt;/th&gt;
&lt;th&gt;AutoGPT / BabyAGI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-Improvement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Built-in learning loop&lt;/td&gt;
&lt;td&gt;❌ Static orchestration&lt;/td&gt;
&lt;td&gt;❌ Memory hacks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ VPS, GPU, serverless&lt;/td&gt;
&lt;td&gt;⚠️ Hybrid setups&lt;/td&gt;
&lt;td&gt;⚠️ Local-first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Agnostic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ 200+ models supported&lt;/td&gt;
&lt;td&gt;⚠️ Often OpenAI-heavy&lt;/td&gt;
&lt;td&gt;⚠️ OpenAI-centric&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Persistence&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Deep user modeling&lt;/td&gt;
&lt;td&gt;⚠️ External memory add-ons&lt;/td&gt;
&lt;td&gt;⚠️ Shallow persistence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Hermes Agent’s differentiator is clear: it’s designed to evolve.&lt;/p&gt;

&lt;h2&gt;
  
  
  🌍 Why Open Agentic Systems Matter
&lt;/h2&gt;

&lt;p&gt;Closed ecosystems lock you into specific APIs, models, or infrastructure. Hermes flips that script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Freedom to experiment&lt;/strong&gt; → no lock-in.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community-driven&lt;/strong&gt; → improvements are open and transparent.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Future-proof&lt;/strong&gt; → as models evolve, Hermes adapts without forcing you to rebuild.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short: openness ensures resilience and innovation. And that matters when agents are becoming the backbone of productivity, research, and creativity.&lt;/p&gt;

&lt;h2&gt;
  
  
  ✨ My Takeaway
&lt;/h2&gt;

&lt;p&gt;Hermes Agent isn’t just another framework. It’s a statement:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open instead of proprietary
&lt;/li&gt;
&lt;li&gt;Adaptive instead of static
&lt;/li&gt;
&lt;li&gt;Self-improving instead of brittle
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developers, that means freedom to build without constraints. For the community, it means a shared foundation to push agentic AI forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  📌 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you’re curious about agentic systems, Hermes Agent is worth exploring. Whether you’re building a research pipeline, a productivity assistant, or experimenting with creative agents, Hermes offers a playground where the agent itself grows alongside your ideas.&lt;/p&gt;

&lt;p&gt;The future of AI development won’t just be about smarter models — it will be about agents that &lt;em&gt;learn, persist, and adapt&lt;/em&gt;. Hermes Agent is one of the first open steps in that direction.&lt;/p&gt;

&lt;p&gt;💬 What’s your take? Do you see open agentic systems like Hermes shaping the future, or will proprietary ecosystems dominate? Let’s spark a discussion.&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
    <item>
      <title>When AI Meets Reality: Why “Hello World” Isn’t Enough for LLM Systems</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Tue, 19 May 2026 05:11:16 +0000</pubDate>
      <link>https://dev.to/printo_tom/when-ai-meets-reality-why-hello-world-isnt-enough-for-llm-systems-27ea</link>
      <guid>https://dev.to/printo_tom/when-ai-meets-reality-why-hello-world-isnt-enough-for-llm-systems-27ea</guid>
      <description>&lt;p&gt;Most AI tutorials stop at “Hello World.” You wire up a model, send a prompt, get a response, and feel like you’ve built something. But the moment you try to ship that into production, the ground shifts beneath your feet.&lt;/p&gt;

&lt;p&gt;I learned this the hard way. After years of building fraud detection and pricing platforms, I’ve seen what happens when AI systems collide with real‑world state changes, concurrency, and regulatory scrutiny. Spoiler: it’s not pretty.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mirage of Staging
&lt;/h2&gt;

&lt;p&gt;Staging environments are polite liars. They don’t tell you how load will spike, how data will mutate mid‑transaction, or how context drift will break your assumptions. In production, milliseconds matter. A competitor reprices, a stock threshold flips, and suddenly your “correct” model output is wrong for the world it lands in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Treat context as a snapshot contract. Immutable, versioned, and validated before any downstream commit. If the snapshot is stale, abort. Re‑orchestrate. Don’t trust staging to teach you this — production will.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure Modes Define Architecture
&lt;/h2&gt;

&lt;p&gt;Fraud vs. pricing taught me the most important architectural lesson: not all signals are equal.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fraud:&lt;/strong&gt; high‑frequency; a false negative (missed fraud) costs far more than a false positive → fail‑closed defaults.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing:&lt;/strong&gt; lower frequency; a briefly wrong price costs less than blocking the action → fail‑open defaults.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Copy‑pasting validation strategies across domains is malpractice. Map your failure modes first. Let the asymmetry drive your fallback design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompts Are Contracts Too
&lt;/h2&gt;

&lt;p&gt;We version APIs. We version schemas. We rarely version prompts. That’s how a “minor tweak” silently broke a fraud classifier pipeline for six hours. The fix was simple: git‑tracked prompts, version IDs in every call, and audit logs that tie outputs back to prompt versions.&lt;/p&gt;

&lt;p&gt;Audit trails aren’t just for compliance. They’re the only way to answer the inevitable question: &lt;em&gt;did the model drift, did the prompt drift, or did the world drift?&lt;/em&gt;&lt;/p&gt;
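
&lt;p&gt;As a rough illustration of tying outputs back to prompt versions, here is a minimal Python sketch. The function names, the field names, the prompt text, and the &lt;code&gt;example-model&lt;/code&gt; identifier are all assumptions made for this example, not part of any real pipeline. It stores a content hash next to the version ID, so an edit to the prompt shows up in the log even if someone forgets to bump the version.&lt;/p&gt;

```python
import hashlib
import json
from datetime import datetime, timezone


def prompt_fingerprint(prompt_text):
    """Short content hash of the prompt text (first 12 hex chars of SHA-256)."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]


def audit_record(prompt_version, prompt_text, model, output):
    """Build one log entry that ties an output back to the prompt that produced it."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_version": prompt_version,
        "prompt_sha256_prefix": prompt_fingerprint(prompt_text),
        "model": model,
        "output": output,
    }


# Hypothetical prompt text and identifiers, for illustration only.
prompt_v1 = "Classify the transaction as FRAUD or OK. Reply with JSON."
prompt_v2 = "Classify the transaction as FRAUD or OK. Reply with a JSON object."

record = audit_record("fraud-classifier-v1", prompt_v1, "example-model", {"label": "OK"})
print(json.dumps(record, indent=2))
```

&lt;p&gt;With records like this, the drift question becomes a query over the log: a changed hash suggests the prompt moved, while an unchanged hash and model with shifting outputs suggests the world did.&lt;/p&gt;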

&lt;h2&gt;
  
  
  The Trust Layer Is Load‑Bearing
&lt;/h2&gt;

&lt;p&gt;Most teams skip it. Schema enforcement, confidence routing, semantic drift detection — all postponed until the first incident. By then, retrofitting costs months. Build it upfront. It’s not a safety net; it’s part of the foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build Boring AI
&lt;/h2&gt;

&lt;p&gt;The model is not the system. The system earns the right to touch production state through contracts, validation, bounded context, and auditability. Every shortcut you take here will come back as a page at 2am.&lt;/p&gt;

&lt;p&gt;If you want to sleep at night, build boring AI systems. Your future self will thank you.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>mlops</category>
    </item>
    <item>
      <title>Introducing the AI Workflow Starter Kit: Build, Fork, and Extend AI Workflows Faster</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Fri, 15 May 2026 05:30:00 +0000</pubDate>
      <link>https://dev.to/printo_tom/introducing-the-ai-workflow-starter-kit-build-fork-and-extend-ai-workflows-faster-2bb0</link>
      <guid>https://dev.to/printo_tom/introducing-the-ai-workflow-starter-kit-build-fork-and-extend-ai-workflows-faster-2bb0</guid>
      <description>&lt;p&gt;Intro:&lt;br&gt;&lt;br&gt;
Developers often struggle to connect LLMs to real‑world workflows without reinventing the wheel. That’s why I built the AI Workflow Starter Kit — a modular, fork‑friendly repo that makes it easy to launch AI‑powered bots, assistants, and automation pipelines.&lt;/p&gt;

&lt;p&gt;🔑 What’s Inside&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connectors → Slack, Teams, Google Drive, Notion, Email&lt;/li&gt;
&lt;li&gt;Demo workflows → FAQ Bot, Contract Analyzer, Data Summarizer&lt;/li&gt;
&lt;li&gt;Core utilities → LLM orchestration, embeddings, async task handling&lt;/li&gt;
&lt;li&gt;Config files → JSON/YAML for quick customization&lt;/li&gt;
&lt;li&gt;Deployment scripts → Docker + CI/CD ready&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🌟 Why Fork This Repo&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Immediate utility → Start with working demos.&lt;/li&gt;
&lt;li&gt;Easy customization → Config‑driven design, modular structure.&lt;/li&gt;
&lt;li&gt;Community growth → CONTRIBUTING.md, roadmap, seeded issues.&lt;/li&gt;
&lt;li&gt;Professional polish → Badges, changelog, MIT license.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚀 Getting Started&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clone or fork the repo.&lt;/li&gt;
&lt;li&gt;Edit configs to match your workflow.&lt;/li&gt;
&lt;li&gt;Run deploy.sh to launch locally or in the cloud.&lt;/li&gt;
&lt;li&gt;Extend with new connectors or workflows, and share back!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 Explore the repo here: &lt;a href="https://github.com/printotomp/ai-workflow-starter-kit.git" rel="noopener noreferrer"&gt;https://github.com/printotomp/ai-workflow-starter-kit.git&lt;/a&gt;&lt;br&gt;
I’d love to see how you fork and extend it. Contributions welcome — let’s make AI workflows accessible and collaborative!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>github</category>
      <category>workflowautomation</category>
    </item>
    <item>
      <title>Launching Claude for Legal: A Toolkit for Modern Legal Workflows</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Thu, 14 May 2026 13:24:36 +0000</pubDate>
      <link>https://dev.to/printo_tom/launching-claude-for-legal-a-toolkit-for-modern-legal-workflows-4551</link>
      <guid>https://dev.to/printo_tom/launching-claude-for-legal-a-toolkit-for-modern-legal-workflows-4551</guid>
      <description>&lt;p&gt;&lt;strong&gt;Intro:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Legal teams today juggle everything from vendor agreements and privacy impact assessments to litigation prep and law school training. I wanted to create a repo that brings all of these workflows together in one place — practical, extensible, and open for the community. That’s how &lt;em&gt;Claude for Legal&lt;/em&gt; was born.&lt;/p&gt;




&lt;h3&gt;
  
  
  🔑 What’s Inside
&lt;/h3&gt;

&lt;p&gt;This repo is a &lt;strong&gt;comprehensive collection of agents, skills, and connectors&lt;/strong&gt; designed for legal professionals, students, and researchers.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚖️ &lt;strong&gt;Practice‑area plugins&lt;/strong&gt;: In‑house commercial, corporate, employment, privacy, product, regulatory, AI governance, IP, litigation, clinics, and law school.
&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;Named agents&lt;/strong&gt;: Vendor Agreement Reviewer, DSAR Responder, Claim Chart Builder, Termination Reviewer, NDA Triager, and many more.
&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;MCP connectors&lt;/strong&gt;: Integrations with Slack, Google Drive, DocuSign, iManage, Everlaw, CourtListener, and other legal‑specific systems.
&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Managed agent cookbooks&lt;/strong&gt;: Renewal watcher, docket watcher, regulatory feed monitor, diligence grid, launch radar — ready for scheduled deployment.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ⚡ Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accelerates legal analysis&lt;/strong&gt; while keeping attorney review at the center.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured workflows&lt;/strong&gt; with guardrails for compliance, privilege, and risk management.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning tools&lt;/strong&gt; for students and clinics — IRAC graders, case briefers, bar prep coaches, and Socratic drills.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verified citations&lt;/strong&gt; through research connectors like CourtListener and Trellis.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  🚀 Getting Started
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Install as a Claude Cowork or Claude Code plugin.
&lt;/li&gt;
&lt;li&gt;Run the &lt;strong&gt;cold‑start interview&lt;/strong&gt; to tailor each plugin to your practice.
&lt;/li&gt;
&lt;li&gt;Connect a research tool for authoritative citations.
&lt;/li&gt;
&lt;li&gt;Explore the scheduled agents for automated monitoring and reporting.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  🌟 Why It Matters
&lt;/h3&gt;

&lt;p&gt;The law is evolving fast — privacy, AI governance, regulatory feeds, and litigation workflows all demand agility. This repo helps legal teams and students &lt;strong&gt;move faster without cutting corners&lt;/strong&gt;, combining automation with professional responsibility.&lt;/p&gt;




&lt;p&gt;👉 &lt;strong&gt;Explore the repo here:&lt;/strong&gt; &lt;a href="https://github.com/printotomp/claude-legal-assistant-.git" rel="noopener noreferrer"&gt;https://github.com/printotomp/claude-legal-assistant-.git&lt;/a&gt;&lt;br&gt;&lt;br&gt;
I’d love for you to go through it, try the plugins, and share feedback. Contributions are welcome — let’s build the future of legal AI together!&lt;/p&gt;

</description>
      <category>legaltech</category>
      <category>ai</category>
      <category>opensource</category>
      <category>github</category>
    </item>
    <item>
      <title>The AI system that worked in staging destroyed us in production. Here's what we missed.</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Thu, 14 May 2026 05:30:00 +0000</pubDate>
      <link>https://dev.to/printo_tom/the-ai-system-that-worked-in-staging-destroyed-us-in-production-heres-what-we-missed-28p9</link>
      <guid>https://dev.to/printo_tom/the-ai-system-that-worked-in-staging-destroyed-us-in-production-heres-what-we-missed-28p9</guid>
      <description>&lt;p&gt;I've been a software and enterprise architect for over twelve years. I've shipped pricing platforms, fraud detection systems, and order management infrastructure at scale — most recently at one of the UK's largest retailers. I say that not to flex, but to explain why I'm writing this post with a specific kind of frustration.&lt;/p&gt;

&lt;p&gt;Because almost every article I read about AI in enterprise sounds like it was written by someone who has never been paged at 2am because an LLM-backed pricing rule marked 40,000 product lines as zero.&lt;/p&gt;

&lt;p&gt;So here's what actually happens when you put AI into systems where the decisions have consequences.&lt;/p&gt;

&lt;h2&gt;
  
  
  The staging trap
&lt;/h2&gt;

&lt;p&gt;Staging environments lie. They lie about load, they lie about data shape, and — critically for AI systems — they lie about context drift.&lt;/p&gt;

&lt;p&gt;Context drift is when the world changes between the moment you assembled the input to your model and the moment the model's output takes effect. In a pricing engine, that gap can be milliseconds. In those milliseconds: a competitor might have repriced, a promotional rule might have fired, a stock threshold might have been crossed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What this looks like in practice: your orchestrator assembles context — product cost, margin floor, competitor price, stock level — and sends it to the model. The model reasons and returns a recommended price. Validation passes. But by the time you write to the pricing store, the stock level has changed and the margin floor has been updated by a concurrent batch job. The model's recommendation was correct for a world that no longer exists.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The fix isn't faster models. It's a &lt;strong&gt;snapshot contract&lt;/strong&gt;: a bounded, versioned, immutable view of state captured at orchestration time and passed all the way through to the action layer. Every downstream system confirms against the snapshot version before committing. If the snapshot is stale, you abort and re-orchestrate.&lt;/p&gt;

&lt;p&gt;This pattern is borrowed directly from event sourcing, where an append is rejected if the stream's version no longer matches the version the writer read. Most AI architects I've met have never heard of it.&lt;/p&gt;
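
&lt;p&gt;A minimal Python sketch of the snapshot version check, under stated assumptions: &lt;code&gt;PricingStore&lt;/code&gt;, &lt;code&gt;Snapshot&lt;/code&gt; and &lt;code&gt;StaleSnapshotError&lt;/code&gt; are hypothetical names, the store is in-memory, and a real system would need to make the check and the write one atomic operation (for example a conditional update in the database).&lt;/p&gt;

```python
from dataclasses import dataclass


class StaleSnapshotError(Exception):
    """Raised when live state has moved past the snapshot a decision was based on."""


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of pricing inputs captured at orchestration time."""
    version: int
    cost: float
    margin_floor: float
    stock_level: int


class PricingStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self, cost, margin_floor, stock_level):
        self.version = 1
        self.cost = cost
        self.margin_floor = margin_floor
        self.stock_level = stock_level
        self.price = None

    def take_snapshot(self):
        return Snapshot(self.version, self.cost, self.margin_floor, self.stock_level)

    def update_inputs(self, **changes):
        # Any change to the inputs bumps the version, invalidating older snapshots.
        for name, value in changes.items():
            setattr(self, name, value)
        self.version += 1

    def commit_price(self, snapshot, price):
        # Confirm against the snapshot version before committing.
        if snapshot.version != self.version:
            raise StaleSnapshotError(
                f"snapshot v{snapshot.version} is stale; store is at v{self.version}"
            )
        self.price = price


store = PricingStore(cost=4.00, margin_floor=0.20, stock_level=120)
snap = store.take_snapshot()
recommended = 5.25  # stand-in for the model's recommendation

store.update_inputs(stock_level=3)  # a concurrent batch job lands first

try:
    store.commit_price(snap, recommended)
    outcome = "committed"
except StaleSnapshotError:
    outcome = "re-orchestrate"

print(outcome)
```

&lt;p&gt;Here the concurrent stock update lands between snapshot and commit, so the write is refused and the orchestrator is expected to take a fresh snapshot and run the model again.&lt;/p&gt;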

&lt;h2&gt;
  
  
  Fraud signals don't behave like pricing signals — and that matters architecturally
&lt;/h2&gt;

&lt;p&gt;One of the most useful things I've done is build &lt;em&gt;both&lt;/em&gt; a fraud detection system and a pricing platform, because the contrast forces architectural clarity.&lt;/p&gt;

&lt;p&gt;Fraud signals are high-frequency, low-latency, and the cost of a false negative is asymmetric — you can recover from a false positive (apologise to a good customer) but you can't unwind a fraudulent transaction. This pushes the architecture toward &lt;strong&gt;fail-closed defaults&lt;/strong&gt;: when confidence is low, decline and escalate.&lt;/p&gt;

&lt;p&gt;Pricing signals are lower frequency, higher context, and the cost structure is different — a bad price for 10 minutes on a low-velocity SKU costs less than a declined checkout. This pushes toward &lt;strong&gt;fail-open defaults&lt;/strong&gt; with aggressive post-hoc monitoring.&lt;/p&gt;

&lt;p&gt;The point is that "AI system" is not a single architecture. The trust posture of your validation layer, your fallback strategy, your human-in-the-loop gates — all of these should be derived from the asymmetry of your failure modes, not from a generic best-practice blog post (including this one).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Before you design the system, map your failure modes. A false positive in fraud is not the same as a false positive in pricing. Your architecture should know the difference.&lt;/p&gt;
&lt;/blockquote&gt;
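
&lt;p&gt;One way to make that difference explicit is to encode the failure posture as data rather than scattering it through the code. The sketch below is Python, and the thresholds and action names are invented for illustration; the right values can only come from your own failure-mode mapping.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustPosture:
    """How a domain behaves when the model is not confident enough."""
    min_confidence: float
    low_confidence_action: str


# Hypothetical thresholds and action names, chosen only for illustration.
POSTURES = {
    # Fraud: a fraudulent transaction cannot be unwound, so fail closed.
    "fraud": TrustPosture(min_confidence=0.90, low_confidence_action="decline_and_escalate"),
    # Pricing: a briefly wrong price is cheaper than blocking, so fail open and monitor.
    "pricing": TrustPosture(min_confidence=0.60, low_confidence_action="apply_and_monitor"),
}


def route(domain, confidence):
    """Return the action for a model output, given the domain's failure posture."""
    posture = POSTURES[domain]
    if confidence >= posture.min_confidence:
        return "use_model_decision"
    return posture.low_confidence_action


# The same confidence score leads to different actions in each domain.
print(route("fraud", 0.75))
print(route("pricing", 0.75))
```

&lt;p&gt;Keeping the posture in one table also gives reviewers a single place to challenge the asymmetry assumptions.&lt;/p&gt;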

&lt;h2&gt;
  
  
  The prompt is a contract. Treat it like one.
&lt;/h2&gt;

&lt;p&gt;Your codebase versions your APIs. It versions your database schemas. It does not version your prompts — and that is a production incident waiting to happen.&lt;/p&gt;

&lt;p&gt;We learned this the hard way. A well-intentioned tweak to the system prompt of a fraud classification model changed the output structure enough to break the downstream parser. Silently. For six hours. Because the validation layer was checking for the presence of a field, not its semantic content.&lt;/p&gt;

&lt;p&gt;Prompt versioning isn't complicated. It's a git-tracked file, a version identifier injected into every API call, and a log entry that records which version produced which output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fraud-classifier-v2.4.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-20250514"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input_snapshot_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"snap_01JV..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"validation_result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pass"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action_taken"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"flag_for_review"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every LLM-influenced decision that touches production state should produce a record like this. Not for debugging — for auditability. In retail, in finance, in any regulated domain, the question "why did the system do that?" will be asked by someone whose salary is higher than yours. You want a clean answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The layer nobody builds until they need it
&lt;/h2&gt;

&lt;p&gt;Teams build the orchestration layer. They build the reasoning layer (the model call). They often skip the trust and validation layer, tell themselves they'll add it later, and then spend six months retrofitting it after their first production incident.&lt;/p&gt;

&lt;p&gt;The trust layer is not a safety net. It's load-bearing infrastructure. It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema enforcement&lt;/strong&gt; — structured output validation before anything downstream sees the result. Not "does the JSON parse" but "does this output satisfy the business constraints it was supposed to satisfy."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence routing&lt;/strong&gt; — when the model signals uncertainty, the output should not go to production. Route to a fallback rule, a human queue, or a conservative default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic drift detection&lt;/strong&gt; — over time, the distribution of what your model produces drifts. Not because the model changed, but because the world feeding it changed. Monitor output distributions the same way you'd monitor latency percentiles.&lt;/li&gt;
&lt;/ul&gt;
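
&lt;p&gt;To make two of these concrete, here is a small Python sketch of schema enforcement with a business constraint, plus a crude drift signal. Everything in it is an assumption for illustration: the &lt;code&gt;recommended_price&lt;/code&gt; field, the margin definition of (price - cost) / price, and the use of total variation distance over output labels as the drift measure.&lt;/p&gt;

```python
import json
import math
from collections import Counter


def validate_price_output(raw, cost, margin_floor):
    """Return a list holding the first violation found; an empty list means pass."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    price = data.get("recommended_price")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        return ["recommended_price is missing or not a number"]
    if not math.isfinite(price) or not price > 0:
        return ["recommended_price must be positive"]

    # Business constraint: the margin must not fall below the floor.
    margin = (price - cost) / price
    if margin_floor > margin:
        return ["recommended_price breaches the margin floor"]
    return []


def label_shift(baseline_labels, recent_labels):
    """Total variation distance between two label distributions.

    0 means identical, 1 means disjoint. Assumes both lists are non-empty.
    """
    base = Counter(baseline_labels)
    recent = Counter(recent_labels)
    return 0.5 * sum(
        abs(base[label] / len(baseline_labels) - recent[label] / len(recent_labels))
        for label in set(base) | set(recent)
    )


# Parses fine as JSON, but the margin is too thin, so it is rejected.
print(validate_price_output('{"recommended_price": 4.5}', cost=4.0, margin_floor=0.2))

# Share of "flag" outputs moved from 10% to 30% between the two windows.
print(label_shift(["ok"] * 90 + ["flag"] * 10, ["ok"] * 70 + ["flag"] * 30))
```

&lt;p&gt;The drift number only means something against an alert threshold, and that threshold should be tuned on your own traffic, the same way you would tune a latency alert.&lt;/p&gt;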

&lt;h2&gt;
  
  
  What I'd tell myself three years ago
&lt;/h2&gt;

&lt;p&gt;The model is not the system. The model is one component inside a system that has to earn the right to touch production state. It earns that right through versioned contracts, explicit validation, bounded context, and audit trails.&lt;/p&gt;

&lt;p&gt;Every shortcut you take on those four things will come back as a production incident. I know because I've taken most of them.&lt;/p&gt;

&lt;p&gt;Build boring AI systems. Your on-call rotation will thank you.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you've been through something similar — or disagree with any of this — I'd genuinely like to hear it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>lessonslearned</category>
    </item>
    <item>
      <title>Choosing the Right Gemma 4 Model: A Practical Guide</title>
      <dc:creator>Printo Tom</dc:creator>
      <pubDate>Mon, 11 May 2026 06:00:00 +0000</pubDate>
      <link>https://dev.to/printo_tom/choosing-the-right-gemma-4-model-a-practical-guide-325p</link>
      <guid>https://dev.to/printo_tom/choosing-the-right-gemma-4-model-a-practical-guide-325p</guid>
      <description>&lt;p&gt;Gemma 4 isn’t just one model: it’s three distinct flavors. Picking the right one can make or break your project. With Google’s latest open model family, developers now have access to native multimodal capabilities, advanced reasoning, and a massive 128K context window. But the real power lies in choosing the right variant for your use case.&lt;/p&gt;

&lt;p&gt;🧩 The Three Flavors of Gemma 4&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Small (2B / 4B):&lt;br&gt;
Built for ultra‑mobile, edge, and browser deployment. Perfect for IoT projects, mobile apps, or even running on a Raspberry Pi. If you want AI that lives close to the user, this is your pick.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dense (31B):&lt;br&gt;
A powerhouse that bridges server‑grade performance with local execution. Ideal for enterprise prototypes, chatbots, or applications that need strong reasoning without relying on cloud‑only solutions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mixture‑of‑Experts (26B MoE):&lt;br&gt;
Highly efficient and designed for advanced reasoning at scale. Best suited for research, high‑throughput tasks, or scenarios where efficiency matters as much as raw capability.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;⚙️ Practical Scenarios&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Smart Home IoT Assistant → Small Model &lt;br&gt;
Runs locally, respects privacy, and handles multimodal inputs like voice + sensor data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enterprise Knowledge Bot → Dense Model&lt;br&gt;
Balances performance with practicality, enabling long‑context reasoning for business workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Research Reasoning Engine → MoE Model &lt;br&gt;
Efficiently processes complex queries, making it ideal for labs or academic projects.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 Key Insight&lt;/p&gt;

&lt;p&gt;Choosing a model isn’t about “bigger is better.” It’s about &lt;strong&gt;fit for purpose&lt;/strong&gt;. A Raspberry Pi project thrives on the Small model, while a multimodal research tool demands the MoE. Intentional selection shows you understand both the technology and the problem you’re solving.&lt;/p&gt;

&lt;p&gt;📣 Final Thoughts&lt;/p&gt;

&lt;p&gt;Gemma 4 opens the door to local AI that’s powerful, flexible, and accessible. The real challenge — and opportunity — is matching the right model to the right context. Experiment, build, and share your journey with the community. That’s how we’ll unlock the full potential of open AI.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
