DEV Community: Printo Tom

Rate Limiting in C# — Don't Let Your API Get Hammered

Printo Tom — Wed, 27 May 2026 10:01:24 +0000

If you run a public API without rate limiting, it's only a matter of time before a runaway client, a misconfigured retry loop, or a well-intentioned load test brings your service to its knees. .NET 7 shipped a first-class rate-limiting API, so no third-party middleware is required. This post walks through the four built-in algorithms, the ASP.NET Core wiring, and the options you are most likely to tune.

Prerequisite: the built-in rate limiter lives in System.Threading.RateLimiting and the ASP.NET Core middleware in Microsoft.AspNetCore.RateLimiting. Both ship in the box from .NET 7 onwards.

Why rate limiting matters

Rate limiting protects three things simultaneously: your infrastructure from overload, your downstream dependencies from fan-out abuse, and your legitimate users from a noisy neighbour hogging capacity. It also plugs a class of denial-of-service vectors that auth alone can't stop.

The four built-in algorithms

1. Fixed window

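Permits N requests per fixed time window (e.g. 100 requests per minute); the counter resets when each window ends. Simple and low-memory, but a client can send N requests at the end of one window and N more at the start of the next, so up to 2× the limit can arrive in a short burst around a window boundary.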

using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit          = 100,
        Window               = TimeSpan.FromMinutes(1),
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 0   // reject immediately when full
    });
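The 2× figure is easiest to see with a simulated clock. The following is a simplified model of a fixed-window counter, not FixedWindowRateLimiter's actual implementation. The Allow helper and the timestamps are illustrative. A client that sends 100 requests just before a window ends and 100 just after the next one starts gets all 200 accepted:

```csharp
using System;

// Simplified fixed-window counter: a model of the algorithm, not FixedWindowRateLimiter's source.
const int PermitLimit = 100;
const double WindowSeconds = 60;

long currentWindow = -1; // index of the window the counter currently belongs to
int used = 0;            // permits consumed in that window

// Returns true if a request arriving at nowSeconds (simulated clock) is allowed.
bool Allow(double nowSeconds)
{
    long window = (long)Math.Floor(nowSeconds / WindowSeconds);
    if (window != currentWindow)
    {
        // A new window has started: the counter resets completely.
        currentWindow = window;
        used = 0;
    }
    if (used >= PermitLimit) return false;
    used++;
    return true;
}

int accepted = 0;
for (int i = 0; i < 100; i++) if (Allow(59.5)) accepted++; // last second of window 0
for (int i = 0; i < 100; i++) if (Allow(60.5)) accepted++; // first second of window 1

// 200 requests accepted within about one second: twice the nominal per-minute limit.
Console.WriteLine($"Accepted {accepted} requests between t=59.5 s and t=60.5 s");
```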

2. Sliding window

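Divides the window into segments and tracks usage per segment. Permits consumed in a segment are returned only when that segment slides out of the window. This is smoother than a fixed window: a client can no longer squeeze 2× the limit into a few seconds around a boundary. The cost is one counter per segment. Enforcement is only as precise as the segment size, so more segments give tighter limits.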

var limiter = new SlidingWindowRateLimiter(
    new SlidingWindowRateLimiterOptions
    {
        PermitLimit          = 100,
        Window               = TimeSpan.FromMinutes(1),
        SegmentsPerWindow    = 6,     // 10-second granularity
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 0
    });
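For comparison, here is a simplified segment-based model of the sliding window configured above. Like the previous model, it illustrates the algorithm and is not the library's source. Allow and the timestamps are made up. The same burst that doubled up in the fixed-window case is now capped:

```csharp
using System;

// Simplified sliding-window model: 100 permits per 60 s, tracked in 6 segments of 10 s.
// Illustrative only; SlidingWindowRateLimiter's internals differ in detail.
const int PermitLimit = 100;
const int Segments = 6;
const double SegmentSeconds = 10;

var counts = new int[Segments]; // ring buffer: permits used in each of the last 6 segments
long currentSegment = 0;         // index of the segment "now" falls in
int inWindow = 0;                // permits used across the whole window

// Returns true if a request arriving at nowSeconds (simulated clock) is allowed.
bool Allow(double nowSeconds)
{
    long segment = (long)Math.Floor(nowSeconds / SegmentSeconds);
    // Slide forward one segment at a time; the slot being reused holds the segment
    // that just fell out of the window, so its permits are returned.
    while (currentSegment < segment)
    {
        currentSegment++;
        int slot = (int)(currentSegment % Segments);
        inWindow -= counts[slot];
        counts[slot] = 0;
    }
    if (inWindow >= PermitLimit) return false;
    counts[(int)(currentSegment % Segments)]++;
    inWindow++;
    return true;
}

int beforeBoundary = 0, afterBoundary = 0, muchLater = 0;
for (int i = 0; i < 100; i++) if (Allow(59.5)) beforeBoundary++;  // segment 5
for (int i = 0; i < 100; i++) if (Allow(60.5)) afterBoundary++;   // segment 6: window still holds segment 5
for (int i = 0; i < 100; i++) if (Allow(110.5)) muchLater++;      // segment 11: segment 5 has slid out

Console.WriteLine($"{beforeBoundary} / {afterBoundary} / {muchLater}"); // 100 / 0 / 100
```

Permits come back a whole segment at a time, so precision depends on segment size. The permits used at t=59.5 s return at t=110 s, roughly 50 s later instead of a full 60 s.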

3. Token bucket

A bucket fills with tokens at a steady rate up to a maximum. Each request consumes one token. Allows short bursts up to the bucket capacity while enforcing a long-run average. Ideal for APIs where short spikes are acceptable.

var limiter = new TokenBucketRateLimiter(
    new TokenBucketRateLimiterOptions
    {
        TokenLimit               = 50,   // max burst
        ReplenishmentPeriod      = TimeSpan.FromSeconds(10),
        TokensPerPeriod          = 10,   // ~1/s average
        AutoReplenishment        = true,
        QueueProcessingOrder     = QueueProcessingOrder.OldestFirst,
        QueueLimit               = 0
    });
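A quick simulation shows the "burst, then average" behaviour those options produce. This is a simplified model that credits whole replenishment periods on a simulated clock; the real limiter is timer-driven. TryTake and the request pattern are hypothetical:

```csharp
using System;

// Simplified token bucket mirroring the options above: capacity 50, +10 tokens every 10 s.
// Illustrative model, not TokenBucketRateLimiter itself.
const int TokenLimit = 50;
const int TokensPerPeriod = 10;
const double PeriodSeconds = 10;

int tokens = TokenLimit;  // the bucket starts full
long periodsApplied = 0;  // replenishment periods already credited

// Returns true (and spends a token) if a request at nowSeconds is allowed.
bool TryTake(double nowSeconds)
{
    long periods = (long)Math.Floor(nowSeconds / PeriodSeconds);
    if (periods > periodsApplied)
    {
        // Credit every elapsed period, but never exceed the bucket capacity.
        long credit = (periods - periodsApplied) * TokensPerPeriod;
        tokens = (int)Math.Min(TokenLimit, tokens + credit);
        periodsApplied = periods;
    }
    if (tokens == 0) return false;
    tokens--;
    return true;
}

// A burst of 80 requests at t = 0: only the 50 tokens already in the bucket are available.
int burst = 0;
for (int i = 0; i < 80; i++) if (TryTake(0)) burst++;

// Then one request per second for 100 s, which matches the average refill rate of 1 token/s.
int sustained = 0;
for (int t = 1; t <= 100; t++) if (TryTake(t)) sustained++;

Console.WriteLine($"Burst: {burst}, sustained over 100 s: {sustained}"); // Burst: 50, sustained over 100 s: 91
```

After the burst drains the bucket, requests at t=1 to 9 s are rejected. From the first refill onward, one request per second matches the refill rate, so every request succeeds. A faster client would be held to about 10 requests per 10 seconds.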

4. Concurrency limiter

Limits simultaneous in-flight requests rather than request rate. Useful for protecting expensive operations like report generation or ML inference where time-in-system matters more than throughput.

var limiter = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions
    {
        PermitLimit          = 20,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 5
    });
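A short console sketch using the real ConcurrencyLimiter shows what sets it apart from the rate-based limiters. A permit is held for as long as its lease is alive and is returned when the lease is disposed, regardless of elapsed time. The PermitLimit of 2 is deliberately tiny so the effect is visible:

```csharp
using System;
using System.Threading.RateLimiting;

// Demo values: at most 2 operations in flight, no queueing.
using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    QueueLimit = 0
});

RateLimitLease first = limiter.AttemptAcquire();
RateLimitLease second = limiter.AttemptAcquire();
RateLimitLease third = limiter.AttemptAcquire();  // both permits are held, so this is rejected
bool thirdRejected = !third.IsAcquired;

// Disposing a lease is what "finishes" the operation and returns its permit.
first.Dispose();
RateLimitLease fourth = limiter.AttemptAcquire(); // a permit is free again
bool fourthAcquired = fourth.IsAcquired;

second.Dispose();
third.Dispose();
fourth.Dispose();

Console.WriteLine($"third rejected: {thirdRejected}, fourth acquired: {fourthAcquired}");
```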

Wiring it up in ASP.NET Core

Register policies in Program.cs, then apply them with the [EnableRateLimiting] attribute or inline via RequireRateLimiting().

using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter(policyName: "fixed", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window      = TimeSpan.FromMinutes(1);
        opt.QueueLimit  = 0;
    });

    options.AddTokenBucketLimiter(policyName: "burst", opt =>
    {
        opt.TokenLimit          = 50;
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod     = 10;
        opt.AutoReplenishment   = true;
    });
});

var app = builder.Build();
app.UseRateLimiter();   // if you call UseRouting() explicitly, UseRateLimiter() must come after it

Apply to a minimal API endpoint or controller action:

// Minimal API
app.MapGet("/products", GetProducts)
   .RequireRateLimiting("fixed");

// Controller (needs using Microsoft.AspNetCore.RateLimiting;)
[EnableRateLimiting("burst")]
[HttpGet("search")]
public IActionResult Search(string query)
{
    // ... search logic
    return Ok();
}

Per-user and per-endpoint policies

A single global policy rarely fits real-world needs. Use AddPolicy with a partition key derived from the request context:

options.AddPolicy("per-user", httpContext =>
    RateLimitPartition.GetTokenBucketLimiter(
        partitionKey: httpContext.User.Identity?.Name
                      ?? httpContext.Connection.RemoteIpAddress?.ToString()
                      ?? "anonymous",
        factory: _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit          = 200,
            ReplenishmentPeriod = TimeSpan.FromMinutes(1),
            TokensPerPeriod     = 200,
            AutoReplenishment   = true
        }));

Tip: prefer authenticated user ID over IP address as the partition key — NAT and proxies can share a single IP across hundreds of users, leading to false positives at scale.
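The same partitioning can be exercised outside ASP.NET Core with PartitionedRateLimiter, the partitioning primitive in System.Threading.RateLimiting. In this sketch the "request" is a hypothetical (User, Ip) tuple rather than an HttpContext, and the PartitionKey helper is illustrative. A fixed window of 2 requests per minute stands in for the token bucket so the outcome is easy to predict. Prefixing keys with user: or ip: also keeps a user name from colliding with an IP string:

```csharp
#nullable enable
using System;
using System.Threading.RateLimiting;

// Hypothetical key helper: the prefixes keep a user named "10.0.0.1" from sharing a bucket
// with the IP address 10.0.0.1. Requests with neither share one "anonymous" bucket.
static string PartitionKey(string? user, string? ip) =>
    user is not null ? $"user:{user}"
    : ip is not null ? $"ip:{ip}"
    : "anonymous";

// Each distinct key gets its own limiter instance, created on first use.
using PartitionedRateLimiter<(string? User, string? Ip)> limiter =
    PartitionedRateLimiter.Create<(string? User, string? Ip), string>(request =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: PartitionKey(request.User, request.Ip),
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 2,
                Window = TimeSpan.FromMinutes(1),
                QueueLimit = 0
            }));

bool TryRequest((string? User, string? Ip) request)
{
    using RateLimitLease lease = limiter.AttemptAcquire(request);
    return lease.IsAcquired;
}

(string? User, string? Ip) alice = ("alice", "10.0.0.1");
(string? User, string? Ip) bob = ("bob", "10.0.0.1"); // same NAT'd IP as alice

bool a1 = TryRequest(alice), a2 = TryRequest(alice), a3 = TryRequest(alice);
bool b1 = TryRequest(bob);

Console.WriteLine($"alice: {a1}, {a2}, {a3}; bob: {b1}"); // alice: True, True, False; bob: True
```

Bob shares Alice's IP address but still gets his own budget. This is the NAT scenario the tip describes.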

Custom rejection responses

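By default, the middleware rejects requests with 503 Service Unavailable. The status code defined for rate limiting is 429 Too Many Requests (RFC 6585), ideally sent with a Retry-After header. Setting options.RejectionStatusCode changes only the status; OnRejected gives you full control over the response: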

options.OnRejected = async (context, token) =>
{
    context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

    if (context.Lease.TryGetMetadata(
            MetadataName.RetryAfter, out var retryAfter))
    {
        // Round up so a sub-second delay never becomes "Retry-After: 0"
        context.HttpContext.Response.Headers.Append(
            "Retry-After",
            ((int)Math.Ceiling(retryAfter.TotalSeconds)).ToString(
                System.Globalization.CultureInfo.InvariantCulture));
    }

    await context.HttpContext.Response.WriteAsync(
        "Rate limit exceeded. Please slow down.", token);
};
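To see what OnRejected receives, here is a console sketch with a real FixedWindowRateLimiter. The values (1 permit per 30 seconds) are chosen only for the demo. The second AttemptAcquire is rejected, and its lease is queried for RetryAfter metadata the same way the handler above does. Not every limiter can supply that value; a concurrency limiter, for instance, cannot know when a permit will free up. The handler therefore has to cope with it being absent:

```csharp
using System;
using System.Threading.RateLimiting;

// Demo values only: one permit per 30-second window, no queue.
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 1,
    Window = TimeSpan.FromSeconds(30),
    QueueLimit = 0
});

using RateLimitLease accepted = limiter.AttemptAcquire();
using RateLimitLease rejected = limiter.AttemptAcquire(); // the window's only permit is already used

// Same metadata lookup the OnRejected handler performs on context.Lease.
bool hasRetryAfter = rejected.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter);

Console.WriteLine($"first acquired: {accepted.IsAcquired}, second acquired: {rejected.IsAcquired}");
Console.WriteLine(hasRetryAfter
    ? $"Retry-After: {(int)Math.Ceiling(retryAfter.TotalSeconds)}"
    : "no RetryAfter metadata on this lease");
```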

Distributed scenarios & Redis

The built-in limiters are in-process only — each pod maintains its own counters. In a horizontally scaled deployment, use a Redis-backed limiter via the RedisRateLimiting community library, which wraps the same RateLimiter abstraction:

dotnet add package RedisRateLimiting

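using StackExchange.Redis;

// The policy below resolves an IConnectionMultiplexer from DI, so register one as a
// singleton; AddStackExchangeRedisCache registers IDistributedCache, not a multiplexer.
builder.Services.AddSingleton<IConnectionMultiplexer>(_ =>
    ConnectionMultiplexer.Connect(builder.Configuration["Redis:Connection"]!));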

options.AddPolicy("distributed", httpContext =>
    RedisRateLimitPartition.GetSlidingWindowRateLimiter(
        partitionKey: httpContext.User.Identity?.Name ?? "anon",
        factory: _ => new RedisSlidingWindowRateLimiterOptions
        {
            ConnectionMultiplexerFactory =
                httpContext.RequestServices
                    .GetRequiredService<IConnectionMultiplexer>,
            PermitLimit = 500,
            Window      = TimeSpan.FromMinutes(1)
        }));

Client-side resilience with Polly

If your code consumes a rate-limited API, use Polly's RateLimiter strategy combined with Retry to handle 429s gracefully:

dotnet add package Microsoft.Extensions.Http.Resilience

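using System.Net;
using System.Threading.RateLimiting;
using Microsoft.Extensions.Http.Resilience;
using Polly;

services.AddHttpClient<IProductsClient, ProductsClient>()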
        .AddResilienceHandler("products-pipeline", builder =>
        {
            builder.AddRateLimiter(new SlidingWindowRateLimiter(
                new SlidingWindowRateLimiterOptions
                {
                    PermitLimit       = 50,
                    Window            = TimeSpan.FromSeconds(10),
                    SegmentsPerWindow = 5
                }));

            builder.AddRetry(new HttpRetryStrategyOptions
            {
                MaxRetryAttempts = 3,
                Delay            = TimeSpan.FromSeconds(2),
                BackoffType      = DelayBackoffType.Exponential,
                ShouldHandle     = args => ValueTask.FromResult(
                    args.Outcome.Result?.StatusCode ==
                        HttpStatusCode.TooManyRequests)
            });
        });
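You may handle 429s yourself instead of, or in addition to, a resilience pipeline. In that case, note that the server's Retry-After header can carry either a delay in seconds or an absolute HTTP date. Here is a minimal sketch of reading it with the BCL's HttpResponseMessage. GetRetryDelay and the 2-second fallback are hypothetical:

```csharp
#nullable enable
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: Retry-After may carry a delay ("Retry-After: 30") or an HTTP date.
// Fall back to a caller-chosen delay when the header is missing.
static TimeSpan GetRetryDelay(HttpResponseMessage response, TimeSpan fallback, DateTimeOffset now)
{
    RetryConditionHeaderValue? retryAfter = response.Headers.RetryAfter;
    if (retryAfter?.Delta is TimeSpan delta)
        return delta;
    if (retryAfter?.Date is DateTimeOffset date)
        return date > now ? date - now : TimeSpan.Zero; // a date in the past means "retry now"
    return fallback;
}

DateTimeOffset now = DateTimeOffset.UtcNow;
TimeSpan fallback = TimeSpan.FromSeconds(2);

using var withDelta = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
withDelta.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.FromSeconds(30));

using var withDate = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
withDate.Headers.RetryAfter = new RetryConditionHeaderValue(now.AddSeconds(5));

using var withoutHeader = new HttpResponseMessage(HttpStatusCode.TooManyRequests);

Console.WriteLine(GetRetryDelay(withDelta, fallback, now));     // 00:00:30
Console.WriteLine(GetRetryDelay(withDate, fallback, now));
Console.WriteLine(GetRetryDelay(withoutHeader, fallback, now)); // 00:00:02
```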

Choosing the right algorithm

Algorithm	Best for	Watch out for	Memory cost
Fixed window	Simple quotas, billing tiers	Boundary burst (2× spike)	Very low
Sliding window	Smooth public APIs	Segment count × partitions	Low–medium
Token bucket	Burst-tolerant consumer APIs	Tuning burst vs average	Low
Concurrency	Expensive ops (ML, reports)	Doesn't bound throughput	Very low

Distributed gotcha: in-process limiters keep separate counters in every pod. A cluster of 4 replicas behind an evenly balancing load balancer can therefore admit roughly 4× your configured limit. Use a Redis-backed partitioned limiter for multi-replica deployments where the limit must hold cluster-wide.
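The arithmetic behind that gotcha can be checked with a toy model. In this model, 1,000 requests are spread round-robin over 4 replicas within one window. A limit of 100 is enforced either per pod or through one shared counter, which is the role Redis plays. The numbers are illustrative and assume perfectly even load balancing:

```csharp
using System;

// Toy model of the "4 replicas" gotcha. All values are illustrative.
const int Limit = 100;      // configured limit per window
const int Replicas = 4;
const int Requests = 1000;  // all arriving within one window

var perPodUsed = new int[Replicas];
int sharedUsed = 0;
int acceptedPerPod = 0, acceptedShared = 0;

for (int i = 0; i < Requests; i++)
{
    int pod = i % Replicas; // round-robin load balancing

    // In-process limiter: each pod only sees its own counter.
    if (perPodUsed[pod] < Limit) { perPodUsed[pod]++; acceptedPerPod++; }

    // Distributed limiter: every pod checks and increments the same counter.
    if (sharedUsed < Limit) { sharedUsed++; acceptedShared++; }
}

Console.WriteLine($"Per-pod counters: {acceptedPerPod} accepted"); // Per-pod counters: 400 accepted
Console.WriteLine($"Shared counter:   {acceptedShared} accepted"); // Shared counter:   100 accepted
```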

Wrapping up

.NET 7+ gives you production-grade rate limiting with zero external dependencies for single-node scenarios. The four algorithms cover the spectrum from simple quotas to burst-tolerant consumer APIs and caps on expensive concurrent work. Add Redis for distributed enforcement and Polly for client-side resilience. Above all, return 429 with a Retry-After header; your API consumers will thank you.

Questions or patterns I missed? Drop them in the comments.

Designing the Future of Payments — Why XML Still Matters in the Age of APIs

Printo Tom — Sat, 23 May 2026 06:22:48 +0000

Introduction

In the fast‑moving world of fintech, APIs have become the poster child for innovation. They’re sleek, lightweight, and developer‑friendly. Yet beneath many instant transfers, compliance checks, and cross‑border transactions lies a structured XML message, quietly ensuring that money moves safely, legally, and consistently.

XML isn’t fading away; it’s evolving. It remains the heartbeat of global payments, and projects like XMLPayments prove that legacy technologies can coexist with modern architectures to create something truly future‑ready.

🌐 The Evolution of Payment Standards

The financial industry has undergone a dramatic shift — from SOAP/XML to REST/JSON, from monolithic systems to microservices, and from manual reconciliation to real‑time orchestration. But XML continues to dominate regulated ecosystems for one simple reason: trust.

Schema validation guarantees structural integrity: malformed or incomplete messages are rejected.
Auditability ensures every transaction can be traced.
Interoperability allows banks, insurers, and clearing houses to communicate seamlessly.

APIs may simplify integration, but XML ensures compliance and consistency — the two pillars of financial reliability.

🧩 Bridging Legacy and Modern Systems

The challenge isn’t choosing between XML and APIs; it’s connecting them. XMLPayments acts as a bridge between legacy payment rails and modern API ecosystems.

Legacy payment systems exchange XML for SEPA and ISO 20022 messaging, including SWIFT's MX messages.
Modern fintech platforms demand RESTful APIs and JSON payloads.
XMLPayments connects both worlds through schema‑driven orchestration and real‑time transformation.

This hybrid approach allows enterprises to modernize without breaking compliance — a critical advantage in regulated environments.

⚙️ Innovation Layer: Schema‑Driven Orchestration

At the core of XMLPayments lies an orchestration engine that validates, transforms, and routes XML messages dynamically.

Validation: Ensures every transaction meets schema and regulatory standards.
Transformation: Converts XML to JSON for API consumption.
Routing: Directs payments to the correct clearing or compliance endpoint.

The result is a seamless flow between legacy and modern systems — where trust meets agility.
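The following rough sketch illustrates the transformation and routing steps using LINQ to XML and System.Text.Json. The message shape, the route names and the Route helper are hypothetical and much simpler than a real ISO 20022 message. This is not XMLPayments' actual code:

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Xml.Linq;

// Hypothetical, heavily simplified message; real ISO 20022 messages such as pain.001
// are namespaced and much richer, and would be schema-validated before this step.
const string Xml =
    @"<Payment><Id>PMT-001</Id><Amount Currency=""EUR"">250.00</Amount><Scheme>SEPA</Scheme></Payment>";

// Transformation: project the XML into a flat object and serialise it as JSON for API consumers.
XElement payment = XElement.Parse(Xml);
XElement amountElement = payment.Element("Amount")!;
var dto = new
{
    id = (string)payment.Element("Id")!,
    amount = (decimal)amountElement,               // XElement's explicit conversion parses invariantly
    currency = (string)amountElement.Attribute("Currency")!,
    scheme = (string)payment.Element("Scheme")!
};
string json = JsonSerializer.Serialize(dto);

// Routing: map the payment scheme to a downstream endpoint (names are placeholders).
string Route(string scheme) => scheme switch
{
    "SEPA" => "sepa-clearing",
    "SWIFT" => "swift-gateway",
    _ => "manual-review" // unknown schemes go to a human queue rather than being guessed
};

Console.WriteLine(json);
Console.WriteLine($"route: {Route(dto.scheme)}");
```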

🤖 Copilot’s Contribution

Modernization is rarely linear. GitHub Copilot became the catalyst that accelerated XMLPayments’ evolution:

Suggested schema validators and conversion functions.
Generated unit tests for XML‑to‑JSON transformations.
Helped document orchestration flows with inline comments.
Proposed error‑handling patterns for async operations.

Copilot transformed repetitive coding into creative problem‑solving, enabling faster iteration and cleaner architecture.

🚀 Vision: XML as the Foundation for Hybrid Financial Ecosystems

The future of payments isn’t about replacing XML; it’s about reimagining it.

XML provides the structure.
APIs provide the accessibility.
AI provides the intelligence.

Together, they form a hybrid ecosystem where legacy reliability meets modern innovation. XMLPayments embodies this vision — a framework that evolves with technology while preserving trust.

Imagine a world where:

XML schemas validate transactions in milliseconds.
APIs expose those transactions securely to partners.
AI agents monitor compliance and detect anomalies in real time.

That’s not a distant dream — it’s the direction XMLPayments is already heading.

From Legacy to Live — Reviving XMLPayments with GitHub Copilot

Printo Tom — Sat, 23 May 2026 06:19:41 +0000

Introduction

Every developer has that one project that started with excitement but stalled before completion. For me, it was XMLPayments — a prototype designed to orchestrate XML-based financial flows. The GitHub Finish‑Up‑A‑Thon Challenge gave me the push I needed to finally polish it up, and GitHub Copilot became my silent co‑developer.

This is the story of how XMLPayments went from legacy fragments to a live orchestration engine.

🕰️ Before: The Stalled Prototype

The original XMLPayments repo was functional but fragile:

Fragmented XML flows with no orchestration.
Manual reconciliation that took days.
Brittle scripts prone to breaking under load.
Documentation incomplete, onboarding unclear.

It was a proof of concept, but not production‑ready.

🚀 After: A Polished Framework

Reviving the project meant transforming it into something usable:

Automated orchestration of XML flows.
Real‑time compliance dashboards for auditors.
CI/CD pipelines for deployment and testing.
Developer‑friendly onboarding with examples and diagrams.

Now, XMLPayments isn’t just a repo — it’s a framework ready to deploy.

🤖 Copilot in Action

GitHub Copilot played a crucial role in the revival:

Generated async handlers for XML ingestion.
Suggested error handling patterns for resilience.
Autocompleted schema validation functions.
Helped write unit tests that covered edge cases.

Copilot didn’t just save time — it unlocked momentum.

🏗️ Architecture Snapshot

The revived XMLPayments repo now follows a microservice design:

Event‑driven ingestion of XML files.
Validation layer enforcing schema compliance.
Persistence layer for audit trails.
Monitoring dashboard for real‑time visibility.

This architecture ensures scalability, compliance, and developer usability.
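The event-driven ingestion described above can be sketched with System.Threading.Channels. A bounded channel decouples the component that receives XML files from the workers that validate and record them, and it applies back-pressure when workers fall behind. File names, the two-worker setup and the extension check standing in for schema validation are all hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded channel: when workers fall behind, producers wait instead of piling up work in memory.
Channel<string> channel = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity: 100)
{
    FullMode = BoundedChannelFullMode.Wait
});

var auditTrail = new List<string>();
var auditLock = new object();

// Each worker drains the channel until the writer is marked complete.
async Task WorkerAsync(int workerId)
{
    await foreach (string file in channel.Reader.ReadAllAsync())
    {
        bool valid = file.EndsWith(".xml", StringComparison.OrdinalIgnoreCase); // placeholder for schema validation
        lock (auditLock)
        {
            auditTrail.Add($"worker {workerId}: {file} -> {(valid ? "accepted" : "rejected")}");
        }
    }
}

// Start two workers, publish three "file arrived" events, then signal that no more are coming.
Task[] workers = { WorkerAsync(1), WorkerAsync(2) };
foreach (string file in new[] { "batch-001.xml", "batch-002.xml", "notes.txt" })
{
    await channel.Writer.WriteAsync(file);
}
channel.Writer.Complete();
await Task.WhenAll(workers);

foreach (string entry in auditTrail) Console.WriteLine(entry);
```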

📈 Impact

The transformation was tangible:

Reconciliation time reduced from days to seconds.
Developers can onboard in minutes instead of hours.
Compliance reporting is automated and auditable.
The repo is now production‑ready and open for contributions.

XMLPayments — The Hidden Backbone of Modern Financial Orchestration

Printo Tom — Sat, 23 May 2026 06:16:35 +0000

Introduction

When people talk about fintech innovation, they usually highlight APIs, JSON, and mobile-first experiences. Yet beneath the surface, trillions of dollars still move through XML-based payment instructions every single day. XML is the quiet backbone of financial orchestration — ensuring compliance, traceability, and interoperability across borders.

This article dives deep into why XML remains indispensable, how I built XMLPayments to modernize it, and how GitHub Copilot helped me finish what I started.

🌍 The Legacy That Never Died

XML isn’t just a relic of the early internet. In financial services, it’s the lingua franca of trust. Standards like ISO 20022, including the SEPA pain.001 (credit transfer initiation) and pain.008 (direct debit initiation) messages, rely on XML schemas to ensure every payment instruction is valid, auditable, and compliant.

Banks exchange ISO 20022 (MX) XML messages over SWIFT.
Insurance firms rely on XML for reconciliation.
Enterprises depend on XML for cross-border compliance.

Without XML, global payments would collapse under inconsistency.

⚙️ Schema‑Driven Reliability

At the heart of XMLPayments is schema enforcement. Every transaction is validated against strict rules before it moves downstream.

Validation: Ensures no malformed data enters the pipeline.
Transformation: Converts XML into normalized internal formats.
Routing: Directs payments to the correct clearing house or compliance system.

This ensures that every transaction is structurally valid and traceable before it moves downstream.
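Schema enforcement of this kind is available in the .NET base class library. The following minimal sketch uses a tiny hypothetical schema in place of a real one such as pain.001. Every violation is collected rather than stopping at the first, which suits an audit trail:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical, minimal schema standing in for a real message schema such as pain.001.
const string Xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""Payment"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""Id"" type=""xs:string""/>
        <xs:element name=""Amount"" type=""xs:decimal""/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

var schemas = new XmlSchemaSet();
schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

// Returns every schema violation instead of stopping at the first one.
List<string> ValidatePayment(string xml)
{
    var errors = new List<string>();
    XDocument doc = XDocument.Parse(xml); // throws XmlException if the XML is not well-formed
    doc.Validate(schemas, (_, e) => errors.Add($"{e.Severity}: {e.Message}"));
    return errors;
}

List<string> valid = ValidatePayment("<Payment><Id>PMT-001</Id><Amount>250.00</Amount></Payment>");
List<string> invalid = ValidatePayment("<Payment><Id>PMT-002</Id><Amount>not-a-number</Amount></Payment>");

Console.WriteLine($"valid: {valid.Count} error(s)");
foreach (string error in invalid) Console.WriteLine($"invalid: {error}");
```

Malformed XML, as opposed to well-formed XML that breaks the schema, makes XDocument.Parse throw, so a real pipeline would catch XmlException as well.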

⚡ Async Architecture for Scale

Financial systems don’t just need reliability; they need speed. XMLPayments leverages .NET async programming (Task.WhenAll()) to process thousands of transactions concurrently.

Parallel Execution: Multiple payment flows handled simultaneously.
Reduced Latency: Faster reconciliation and reporting.
Resilience: Failures isolated without halting the entire pipeline.

This architecture transforms XML from “slow and legacy” into real-time orchestration.
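One caveat applies to Task.WhenAll. The task it returns is faulted if any input task fails, and awaiting it rethrows only the first exception, which works against failure isolation. The following minimal sketch shows one way to get the isolation described above: each unit of work is wrapped so that errors become results. ProcessPaymentAsync and the payment IDs are hypothetical; this is not XMLPayments' code:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical processor: simulated I/O, and a failure for one malformed payment.
static async Task<string> ProcessPaymentAsync(string id)
{
    await Task.Delay(10); // stands in for a database write or clearing-house call
    if (id == "PMT-BAD") throw new FormatException($"{id}: invalid amount");
    return $"{id}: settled";
}

// Converts an exception into a result so one bad payment cannot fault the whole batch.
static async Task<(string Id, bool Ok, string Detail)> RunIsolatedAsync(string id)
{
    try
    {
        return (id, true, await ProcessPaymentAsync(id));
    }
    catch (Exception ex)
    {
        return (id, false, ex.Message);
    }
}

string[] batch = { "PMT-001", "PMT-BAD", "PMT-003" };

// All payments start at once; WhenAll completes when every wrapper has finished,
// and the results array preserves the input order.
(string Id, bool Ok, string Detail)[] results =
    await Task.WhenAll(batch.Select(id => RunIsolatedAsync(id)));

foreach (var r in results)
    Console.WriteLine($"{(r.Ok ? "OK  " : "FAIL")} {r.Detail}");
```

For batches in the thousands you would likely also cap concurrency, for example with Parallel.ForEachAsync or a SemaphoreSlim.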

🤖 Copilot’s Role in Modernization

GitHub Copilot became my silent co‑developer:

Suggested refactors for legacy XML parsers.
Generated schema‑aware unit tests.
Accelerated documentation with inline comments.
Helped design error handling patterns for async flows.

Copilot didn’t just save time — it unlocked creativity by removing repetitive coding barriers.

📊 Outcome

The result is a resilient orchestration layer that:

Bridges legacy XML systems with modern APIs.
Reduces reconciliation time from days to seconds.
Provides compliance dashboards for auditors.
Enables enterprises to modernize without breaking trust.

XMLPayments proves that XML isn’t outdated — it’s the hidden backbone of financial orchestration.

Reviving My Gemma Agentic Framework: From Prototype to Polished Repo

Printo Tom — Sat, 23 May 2026 06:06:46 +0000

Introduction

During my exploration of agentic AI systems, I started building a framework around Gemma models to demonstrate how lightweight LLMs can orchestrate tasks in enterprise workflows. The idea was strong, but the repo stalled before reaching a usable state. The GitHub Finish-Up-A-Thon Challenge gave me the perfect push to finish what I started.

Before Snapshot

Repo link (before): https://github.com/printotomp/Gemma-agentic-framework.git
State of the project:
- Initial scaffolding for agent orchestration
- Basic task routing, but no persistence or error handling
- Documentation incomplete, no examples for developers
Why it stalled:
- Competing priorities and lack of time to polish usability
- Architecture decisions left unresolved

How GitHub Copilot Helped

Suggested async patterns (Task.WhenAll) for parallel agent execution
Generated boilerplate for missing modules (logging, error handling)
Helped write unit tests faster
Improved documentation with inline comments and example snippets

After Snapshot

Repo link (after): https://github.com/printotomp/Gemma-agentic-framework.git
What’s new:
- Completed orchestration layer with persistence
- Added developer-friendly examples (e.g., “build-your-first-agent”)
- Wrote comprehensive tests for reliability
- Improved README and onboarding guide
Usability improvements:
- Clearer architecture diagrams
- One-click setup with GitHub Actions CI/CD
Creative additions:
- Acronym-based design principle (WET: Write Everything Twice) for independence in microservice design
- Demo workflow showing Gemma agents coordinating tasks

Completion Arc

This challenge wasn’t just about finishing code — it was about rediscovering momentum. The “before and after” journey shows how Copilot can transform abandoned ideas into finished frameworks ready for the community.

Conclusion

Thanks to GitHub and Copilot, I finally shipped something I had left behind. The Finish-Up-A-Thon reminded me that completion is just as important as innovation.

🏆 Judging Criteria Checklist

Underlying technology: Gemma models, .NET async programming, microservice architecture
Usability & UX: Clear onboarding, examples, CI/CD pipeline
Originality & Creativity: Agentic orchestration with WET principle
Completion Arc: Before vs. after repo transformation

First Look at Google AI Studio + Gemini at I/O 2026

Printo Tom — Wed, 20 May 2026 11:07:47 +0000


Google I/O 2026 wasn’t just about flashy demos — it was about making AI practical for developers. For me, the standout announcement was the evolution of Gemini into a full ecosystem, anchored by Google AI Studio.

🌟 Why It Matters

Until now, experimenting with large models often meant juggling APIs, SDKs, and cloud credits. With AI Studio, Google is positioning Gemini as the fastest way to start building — lowering the barrier for developers who want to prototype, test, and deploy AI-powered apps.

This shift feels less like “another model release” and more like a platform moment. Gemini isn’t just a model anymore; it’s the connective tissue across Google’s ecosystem — from Docs and Gmail to Firebase and Cloud.

🛠️ Hands-On Impressions

I spent time exploring AI Studio during I/O, and here’s what stood out:

Instant Playground: You can spin up a Gemini-powered app in minutes, no complex setup.
Tight Integration: Firebase and Cloud hooks are built-in, meaning you can go from prototype to production without duct tape.
Transparency: Google emphasized responsible AI, with clear usage dashboards and guardrails.

It reminded me of the early days of Firebase — simple, approachable, and developer-first.

🔍 My Take

The most underrated aspect of this release is accessibility for small teams. While enterprises will benefit from Gemini’s scale, indie developers now have a way to experiment without heavy infrastructure. This democratization could spark the next wave of AI-driven startups.


💡 Community Question: What’s the first app or workflow you’d build with Gemini in AI Studio?

👉 Suggested tags: #GoogleIO2026 #Gemini #AIStudio #GoogleAI #Cloud #Firebase #AI #DeveloperTools

Hermes Agent: Why Open Agentic Systems Matter

Printo Tom — Tue, 19 May 2026 05:20:52 +0000

🚀 What Is Hermes Agent?

If you’ve been following the agentic space, Hermes Agent probably doesn’t need much introduction. For everyone else: it’s an open-source agent framework from Nous Research that you can run on your own infrastructure — from a $5 VPS to a GPU cluster.

The magic? Hermes isn’t just another orchestration layer. It’s self-improving. It learns from experience, nudges itself to persist knowledge, and builds a deeper model of you across sessions. That’s a big leap forward compared to most agent frameworks.

🛠️ Why Hermes Agent Stands Out

Here’s what makes Hermes different in plain terms:

It learns as it goes → not just executing tasks, but refining skills.
It remembers you → building continuity across conversations.
It’s model-agnostic → plug in OpenAI, Hugging Face, NVIDIA NIM, or even your own endpoint.
It’s infrastructure-flexible → run it locally, in the cloud, or even chat with it via Telegram while it works remotely.

That combination makes Hermes feel less like a “tool runner” and more like a partner that grows with you.

🔍 Hermes vs. Other Agentic Frameworks

A quick comparison to put things in perspective:

Feature	Hermes Agent	LangChain / CrewAI	AutoGPT / BabyAGI
Self-Improvement	✅ Built-in learning loop	❌ Static orchestration	❌ Memory hacks
Infrastructure Flexibility	✅ VPS, GPU, serverless	⚠️ Hybrid setups	⚠️ Local-first
Model Agnostic	✅ 200+ models supported	⚠️ Often OpenAI-heavy	⚠️ OpenAI-centric
Persistence	✅ Deep user modeling	⚠️ External memory add-ons	⚠️ Shallow persistence

Hermes Agent’s differentiator is clear: it’s designed to evolve.

🌍 Why Open Agentic Systems Matter

Closed ecosystems lock you into specific APIs, models, or infrastructure. Hermes flips that script:

Freedom to experiment → no lock-in.
Community-driven → improvements are open and transparent.
Future-proof → as models evolve, Hermes adapts without forcing you to rebuild.

In short: openness ensures resilience and innovation. And that matters when agents are becoming the backbone of productivity, research, and creativity.

✨ My Takeaway

Hermes Agent isn’t just another framework. It’s a statement:

Open instead of proprietary
Adaptive instead of static
Self-improving instead of brittle

For developers, that means freedom to build without constraints. For the community, it means a shared foundation to push agentic AI forward.

📌 Final Thoughts

If you’re curious about agentic systems, Hermes Agent is worth exploring. Whether you’re building a research pipeline, a productivity assistant, or experimenting with creative agents, Hermes offers a playground where the agent itself grows alongside your ideas.

The future of AI development won’t just be about smarter models — it will be about agents that learn, persist, and adapt. Hermes Agent is one of the first open steps in that direction.

💬 What’s your take? Do you see open agentic systems like Hermes shaping the future, or will proprietary ecosystems dominate? Let’s spark a discussion.

When AI Meets Reality: Why “Hello World” Isn’t Enough for LLM Systems

Printo Tom — Tue, 19 May 2026 05:11:16 +0000

Most AI tutorials stop at “Hello World.” You wire up a model, send a prompt, get a response, and feel like you’ve built something. But the moment you try to ship that into production, the ground shifts beneath your feet.

I learned this the hard way. After years of building fraud detection and pricing platforms, I’ve seen what happens when AI systems collide with real‑world state changes, concurrency, and regulatory scrutiny. Spoiler: it’s not pretty.

The Mirage of Staging

Staging environments are polite liars. They don’t tell you how load will spike, how data will mutate mid‑transaction, or how context drift will break your assumptions. In production, milliseconds matter. A competitor reprices, a stock threshold flips, and suddenly your “correct” model output is wrong for the world it lands in.

Lesson: Treat context as a snapshot contract. Immutable, versioned, and validated before any downstream commit. If the snapshot is stale, abort. Re‑orchestrate. Don’t trust staging to teach you this — production will.

Failure Modes Define Architecture

Fraud vs. pricing taught me the most important architectural lesson: not all signals are equal.

Fraud: high‑frequency, asymmetric cost of false negatives → fail‑closed defaults.
Pricing: lower frequency, asymmetric cost of false positives → fail‑open defaults.

Copy‑pasting validation strategies across domains is malpractice. Map your failure modes first. Let the asymmetry drive your fallback design.
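To make the asymmetry concrete, here is a hypothetical sketch in which the same "model unavailable or unsure" signal resolves in opposite directions for fraud and pricing. The 0.8 threshold and the outcome names are illustrative, not recommendations:

```csharp
using System;

// Illustrative threshold below which the model's answer is treated as "unsure".
const double MinConfidence = 0.8;

// Fraud: a missed fraud (false negative) cannot be unwound, so uncertainty fails CLOSED.
string FraudDecision(double? fraudScore, double? confidence)
{
    if (fraudScore is null || confidence is null || confidence < MinConfidence)
        return "decline-and-escalate";   // model down or unsure: block and send to a human
    return fraudScore >= 0.5 ? "decline" : "approve";
}

// Pricing: a briefly wrong price costs less than a blocked checkout, so uncertainty fails OPEN.
string PricingDecision(decimal? proposedPrice, double? confidence)
{
    if (proposedPrice is null)
        return "keep-current-price";     // model down: keep selling at the last good price
    if (confidence is null || confidence < MinConfidence)
        return "apply-and-monitor";      // unsure: let it through, flag for post-hoc review
    return "apply";
}

Console.WriteLine(FraudDecision(fraudScore: null, confidence: null));        // decline-and-escalate
Console.WriteLine(PricingDecision(proposedPrice: 12.99m, confidence: 0.6));  // apply-and-monitor
```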

Prompts Are Contracts Too

We version APIs. We version schemas. We rarely version prompts. That’s how a “minor tweak” silently broke a fraud classifier pipeline for six hours. The fix was simple: git‑tracked prompts, version IDs in every call, and audit logs that tie outputs back to prompt versions.

Audit trails aren’t just for compliance. They’re the only way to answer the inevitable question: did the model drift, did the prompt drift, or did the world drift?
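The following sketch shows that fix under stated assumptions. Templates are addressed by (name, version), a short SHA-256 of the exact template text is recorded, and each call writes an audit entry that ties the output back to both. The registry, names and the placeholder model call are hypothetical; in practice the templates would be files in git:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical prompt registry keyed by (name, version).
var prompts = new Dictionary<(string Name, int Version), string>
{
    [("fraud-classifier", 1)] = "Classify the transaction as FRAUD or OK.",
    [("fraud-classifier", 2)] = "Classify the transaction as FRAUD or OK. Answer with one word."
};

var auditLog = new List<(DateTimeOffset At, string Prompt, int Version, string Hash, string Output)>();

// Short content hash: shows which exact template text produced an output, even if a file is edited in place.
static string ShortHash(string text) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)))[..12];

// Every call names an explicit prompt version and writes an audit entry tying output to prompt.
string CallModel(string name, int version, string input)
{
    string template = prompts[(name, version)];
    string output = input.Contains("gift card") ? "FRAUD" : "OK"; // placeholder for the real LLM call
    auditLog.Add((DateTimeOffset.UtcNow, name, version, ShortHash(template), output));
    return output;
}

CallModel("fraud-classifier", 2, "order paid with 12 gift cards");

foreach (var entry in auditLog)
    Console.WriteLine($"{entry.At:O} {entry.Prompt} v{entry.Version} sha256:{entry.Hash} -> {entry.Output}");
```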

The Trust Layer Is Load‑Bearing

Most teams skip it. Schema enforcement, confidence routing, semantic drift detection — all postponed until the first incident. By then, retrofitting costs months. Build it upfront. It’s not a safety net; it’s part of the foundation.

Build Boring AI

The model is not the system. The system earns the right to touch production state through contracts, validation, bounded context, and auditability. Every shortcut you take here will come back as a pager at 2am.

If you want to sleep at night, build boring AI systems. Your future self will thank you.

Introducing the AI Workflow Starter Kit: Build, Fork, and Extend AI Workflows Faster

Printo Tom — Fri, 15 May 2026 05:30:00 +0000

Intro:

Developers often struggle to connect LLMs to real‑world workflows without reinventing the wheel. That’s why I built the AI Workflow Starter Kit — a modular, fork‑friendly repo that makes it easy to launch AI‑powered bots, assistants, and automation pipelines.

🔑 What’s Inside
Connectors → Slack, Teams, Google Drive, Notion, Email

Demo workflows → FAQ Bot, Contract Analyzer, Data Summarizer

Core utilities → LLM orchestration, embeddings, async task handling

Config files → JSON/YAML for quick customization

Deployment scripts → Docker + CI/CD ready

🌟 Why Fork This Repo
Immediate utility → Start with working demos.

Easy customization → Config‑driven design, modular structure.

Community growth → CONTRIBUTING.md, roadmap, seeded issues.

Professional polish → Badges, changelog, MIT license.

🚀 Getting Started
Clone or fork the repo.

Edit configs to match your workflow.

Run deploy.sh to launch locally or in the cloud.

Extend with new connectors or workflows — and share back!

👉 Explore the repo here: https://github.com/printotomp/ai-workflow-starter-kit.git
I’d love to see how you fork and extend it. Contributions welcome — let’s make AI workflows accessible and collaborative!

Launching Claude for Legal: A Toolkit for Modern Legal Workflows

Printo Tom — Thu, 14 May 2026 13:24:36 +0000

Intro:

Legal teams today juggle everything from vendor agreements and privacy impact assessments to litigation prep and law school training. I wanted to create a repo that brings all of these workflows together in one place — practical, extensible, and open for the community. That’s how Claude for Legal was born.

🔑 What’s Inside

This repo is a comprehensive collection of agents, skills, and connectors designed for legal professionals, students, and researchers.

⚖️ Practice‑area plugins: In‑house commercial, corporate, employment, privacy, product, regulatory, AI governance, IP, litigation, clinics, and law school.
🤖 Named agents: Vendor Agreement Reviewer, DSAR Responder, Claim Chart Builder, Termination Reviewer, NDA Triager, and many more.
🔌 MCP connectors: Integrations with Slack, Google Drive, DocuSign, iManage, Everlaw, CourtListener, and other legal‑specific systems.
📚 Managed agent cookbooks: Renewal watcher, docket watcher, regulatory feed monitor, diligence grid, launch radar — ready for scheduled deployment.

⚡ Benefits

Accelerates legal analysis while keeping attorney review at the center.
Structured workflows with guardrails for compliance, privilege, and risk management.
Learning tools for students and clinics — IRAC graders, case briefers, bar prep coaches, and Socratic drills.
Verified citations through research connectors like CourtListener and Trellis.

🚀 Getting Started

Install as a Claude Cowork or Claude Code plugin.
Run the cold‑start interview to tailor each plugin to your practice.
Connect a research tool for authoritative citations.
Explore the scheduled agents for automated monitoring and reporting.

🌟 Why It Matters

The law is evolving fast — privacy, AI governance, regulatory feeds, and litigation workflows all demand agility. This repo helps legal teams and students move faster without cutting corners, combining automation with professional responsibility.

👉 Explore the repo here: https://github.com/printotomp/claude-legal-assistant-.git

I’d love for you to go through it, try the plugins, and share feedback. Contributions are welcome — let’s build the future of legal AI together!

The AI system that worked in staging destroyed us in production. Here's what we missed.

Printo Tom — Thu, 14 May 2026 05:30:00 +0000

I've been a software and enterprise architect for over twelve years. I've shipped pricing platforms, fraud detection systems, and order management infrastructure at scale — most recently at one of the UK's largest retailers. I say that not to flex, but to explain why I'm writing this post with a specific kind of frustration.

Because almost every article I read about AI in enterprise sounds like it was written by someone who has never been paged at 2am because an LLM-backed pricing rule marked 40,000 product lines as zero.

So here's what actually happens when you put AI into systems where the decisions have consequences.

The staging trap

Staging environments lie. They lie about load, they lie about data shape, and — critically for AI systems — they lie about context drift.

Context drift is when the world changes between the moment you assembled the input to your model and the moment the model's output takes effect. In a pricing engine, that gap can be milliseconds. In those milliseconds: a competitor might have repriced, a promotional rule might have fired, a stock threshold might have been crossed.

What this looks like in practice: your orchestrator assembles context — product cost, margin floor, competitor price, stock level — and sends it to the model. The model reasons and returns a recommended price. Validation passes. But by the time you write to the pricing store, the stock level has changed and the margin floor has been updated by a concurrent batch job. The model's recommendation was correct for a world that no longer exists.

The fix isn't faster models. It's a snapshot contract: a bounded, versioned, immutable view of state captured at orchestration time and passed all the way through to the action layer. Every downstream system confirms against the snapshot version before committing. If the snapshot is stale, you abort and re-orchestrate.

This pattern is borrowed directly from event sourcing. Most AI architects I've met have never heard of it.
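Here is a minimal Python sketch of the snapshot contract described above. It assumes a single in-process store whose version number goes up on every write. `PricingStore`, `Snapshot`, `StaleSnapshotError` and `orchestrate` are hypothetical names used only for illustration, and `fake_model` stands in for the real model call. The check in `commit_price` is ordinary optimistic concurrency: commit only if the version the model reasoned over is still current, otherwise abort and re-orchestrate.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Dict, Mapping, Optional


@dataclass(frozen=True)
class Snapshot:
    """Bounded, versioned, read-only view of state captured at orchestration time."""
    snapshot_id: str
    version: int
    state: Mapping[str, float]


class StaleSnapshotError(Exception):
    """Raised when the store has changed since the snapshot was taken."""


class PricingStore:
    """Hypothetical pricing store; every write bumps a monotonically increasing version."""

    def __init__(self, state: Dict[str, float]) -> None:
        self._state = dict(state)
        self._version = 1
        self._prices: Dict[str, float] = {}

    def take_snapshot(self, snapshot_id: str) -> Snapshot:
        # Copy first, then wrap read-only, so later writes cannot leak into the snapshot.
        return Snapshot(snapshot_id, self._version, MappingProxyType(dict(self._state)))

    def update(self, key: str, value: float) -> None:
        # Any concurrent change (stock level, margin floor, ...) invalidates older snapshots.
        self._state[key] = value
        self._version += 1

    def commit_price(self, sku: str, price: float, snapshot: Snapshot) -> None:
        # Confirm against the snapshot version before touching production state.
        if snapshot.version != self._version:
            raise StaleSnapshotError(
                f"{snapshot.snapshot_id} is v{snapshot.version}, store is v{self._version}"
            )
        self._prices[sku] = price

    def price_of(self, sku: str) -> Optional[float]:
        return self._prices.get(sku)


def orchestrate(
    store: PricingStore,
    sku: str,
    recommend: Callable[[Mapping[str, float]], float],
    max_attempts: int = 3,
) -> float:
    """Snapshot, reason, commit; on a stale snapshot, abort and start again."""
    for attempt in range(1, max_attempts + 1):
        snapshot = store.take_snapshot(f"snap_{attempt}")
        price = recommend(snapshot.state)  # the model only ever sees the snapshot
        try:
            store.commit_price(sku, price, snapshot)
            return price
        except StaleSnapshotError:
            continue  # the world moved on; re-orchestrate against fresh state
    raise StaleSnapshotError(f"gave up on {sku} after {max_attempts} attempts")


# Usage: a concurrent batch job raises the margin floor while the first call is in flight.
store = PricingStore({"cost": 8.0, "margin_floor": 10.0, "stock": 40.0})
model_calls = []


def fake_model(state: Mapping[str, float]) -> float:
    model_calls.append(dict(state))
    if len(model_calls) == 1:
        store.update("margin_floor", 11.0)  # simulated concurrent write
    return max(state["margin_floor"], state["cost"] * 1.3)


final_price = orchestrate(store, "SKU-123", fake_model)
```

The retry loop is bounded on purpose. If the underlying state keeps moving, failing loudly is safer than re-orchestrating forever.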

Fraud signals don't behave like pricing signals — and that matters architecturally

One of the most useful things I've done is build both a fraud detection system and a pricing platform, because the contrast forces architectural clarity.

Fraud signals are high-frequency, low-latency, and the cost of a false negative is asymmetric — you can recover from a false positive (apologise to a good customer) but you can't unwind a fraudulent transaction. This pushes the architecture toward fail-closed defaults: when confidence is low, decline and escalate.

Pricing signals are lower frequency, higher context, and the cost structure is different — a bad price for 10 minutes on a low-velocity SKU costs less than a declined checkout. This pushes toward fail-open defaults with aggressive post-hoc monitoring.

The point is that "AI system" is not a single architecture. The trust posture of your validation layer, your fallback strategy, your human-in-the-loop gates — all of these should be derived from the asymmetry of your failure modes, not from a generic best-practice blog post (including this one).

Before you design the system, map your failure modes. A false positive in fraud is not the same as a false positive in pricing. Your architecture should know the difference.
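The following small Python sketch makes "derive the posture from the asymmetry" concrete. The cost figures are illustrative placeholders, not measurements. `FailureModeProfile`, `decide` and the 0.8 confidence threshold are hypothetical. The logic is the one the section argues for: when confidence is high, act. When it is low, fall back to whichever default makes the more expensive mistake less likely.

```python
from dataclasses import dataclass
from enum import Enum


class Posture(Enum):
    FAIL_CLOSED = "fail_closed"  # when unsure, block the action and escalate
    FAIL_OPEN = "fail_open"      # when unsure, allow the action but monitor it


class Action(Enum):
    ALLOW = "allow"
    BLOCK_AND_ESCALATE = "block_and_escalate"
    ALLOW_AND_MONITOR = "allow_and_monitor"


@dataclass(frozen=True)
class FailureModeProfile:
    """Relative cost of the two ways a decision can go wrong in one domain."""
    domain: str
    cost_if_wrongly_allowed: float  # fraud: a fraudulent transaction you cannot unwind
    cost_if_wrongly_blocked: float  # fraud: apologising to a good customer

    @property
    def posture(self) -> Posture:
        # The default is derived from the asymmetry, not from a house-wide rule.
        if self.cost_if_wrongly_allowed > self.cost_if_wrongly_blocked:
            return Posture.FAIL_CLOSED
        return Posture.FAIL_OPEN


def decide(profile: FailureModeProfile, confidence: float, threshold: float = 0.8) -> Action:
    if confidence >= threshold:
        return Action.ALLOW
    if profile.posture is Posture.FAIL_CLOSED:
        return Action.BLOCK_AND_ESCALATE
    return Action.ALLOW_AND_MONITOR


# Illustrative numbers only: in fraud, a missed fraud dwarfs a false decline;
# in pricing, a briefly wrong price on a low-velocity SKU is assumed to be the cheaper mistake.
fraud = FailureModeProfile("fraud", cost_if_wrongly_allowed=500.0, cost_if_wrongly_blocked=20.0)
pricing = FailureModeProfile("pricing", cost_if_wrongly_allowed=5.0, cost_if_wrongly_blocked=50.0)
```

In a real system, the `ALLOW_AND_MONITOR` path is where the aggressive post-hoc monitoring would hook in. Here it is only a routing label.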

The prompt is a contract. Treat it like one.

Your codebase versions your APIs. It versions your database schemas. It does not version your prompts — and that is a production incident waiting to happen.

We learned this the hard way. A well-intentioned tweak to the system prompt of a fraud classification model changed the output structure enough to break the downstream parser. Silently. For six hours. Because the validation layer was checking for the presence of a field, not its semantic content.

Prompt versioning isn't complicated. It's a git-tracked file, a version identifier injected into every API call, and a log entry that records which version produced which output.

{
  "prompt_version": "fraud-classifier-v2.4.1",
  "model": "claude-sonnet-4-20250514",
  "input_snapshot_id": "snap_01JV...",
  "output": { ... },
  "validation_result": "pass",
  "action_taken": "flag_for_review"
}

Every LLM-influenced decision that touches production state should produce a record like this. Not for debugging — for auditability. In retail, in finance, in any regulated domain, the question "why did the system do that?" will be asked by someone whose salary is higher than yours. You want a clean answer.
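This minimal Python sketch covers the record-keeping half of prompt versioning. It assumes prompts live in version-named files in git, modelled here as a dict. `PROMPTS`, `load_prompt`, `DecisionRecord` and `log_decision` are hypothetical names, the record fields mirror the JSON record above, and the model call itself is left out. The key design choice is that an unknown version raises instead of silently falling back to "latest".

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict

# Hypothetical stand-in for git-tracked prompt files, keyed by version id.
PROMPTS: Dict[str, str] = {
    "fraud-classifier-v2.4.1": "Classify the transaction. Reply with JSON only.",
}


def load_prompt(version_id: str) -> str:
    # Fail loudly on an unknown version rather than falling back to "latest".
    if version_id not in PROMPTS:
        raise KeyError(f"unknown prompt version: {version_id}")
    return PROMPTS[version_id]


@dataclass(frozen=True)
class DecisionRecord:
    """One audit row per LLM-influenced decision that touches production state."""
    prompt_version: str
    model: str
    input_snapshot_id: str
    output: Dict[str, Any]
    validation_result: str
    action_taken: str


def log_decision(record: DecisionRecord) -> str:
    # One JSON object per line, suitable for an append-only audit log.
    return json.dumps(asdict(record), sort_keys=True)


# Usage: the version id travels with the call and lands in the record.
version = "fraud-classifier-v2.4.1"
system_prompt = load_prompt(version)  # would be sent as the system prompt (call omitted)
line = log_decision(
    DecisionRecord(
        prompt_version=version,
        model="claude-sonnet-4-20250514",
        input_snapshot_id="snap_demo_001",  # hypothetical snapshot id
        output={"label": "suspicious", "score": 0.91},  # hypothetical model output
        validation_result="pass",
        action_taken="flag_for_review",
    )
)
```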

The layer nobody builds until they need it

Teams build the orchestration layer. They build the reasoning layer (the model call). They often skip the trust and validation layer, tell themselves they'll add it later, and then spend six months retrofitting it after their first production incident.

The trust layer is not a safety net. It's load-bearing infrastructure. It includes:

Schema enforcement — structured output validation before anything downstream sees the result. Not "does the JSON parse" but "does this output satisfy the business constraints it was supposed to satisfy."
Confidence routing — when the model signals uncertainty, the output should not go to production. Route to a fallback rule, a human queue, or a conservative default.
Semantic drift detection — over time, the distribution of what your model produces drifts. Not because the model changed, but because the world feeding it changed. Monitor output distributions the same way you'd monitor latency percentiles.
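The three trust-layer components can be sketched in a few dozen lines of Python. The sketch assumes a pricing recommendation with `sku`, `price` and `confidence` fields. `PriceRecommendation`, `enforce_schema`, `route`, `DriftMonitor`, the 0.8 threshold and the mean-shift test are all illustrative choices, not a prescribed design. A plain mean shift is deliberately crude; production drift monitoring would usually compare whole output distributions, but the check has the same shape.

```python
import math
from collections import deque
from dataclasses import dataclass
from statistics import mean
from typing import Any, Deque, Dict


class ValidationError(Exception):
    """Raised when model output fails structural or business validation."""


@dataclass(frozen=True)
class PriceRecommendation:
    sku: str
    price: float
    confidence: float


def enforce_schema(raw: Dict[str, Any], margin_floor: float) -> PriceRecommendation:
    """Check that the output parses AND satisfies the business constraint."""
    try:
        rec = PriceRecommendation(str(raw["sku"]), float(raw["price"]), float(raw["confidence"]))
    except (KeyError, TypeError, ValueError) as exc:
        raise ValidationError(f"malformed output: {exc!r}") from exc
    if not math.isfinite(rec.price) or rec.price < margin_floor:
        # A zero (or NaN) price is well-formed data but a business failure.
        raise ValidationError(f"price {rec.price} violates margin floor {margin_floor}")
    if not 0.0 <= rec.confidence <= 1.0:
        raise ValidationError(f"confidence {rec.confidence} outside [0, 1]")
    return rec


def route(rec: PriceRecommendation, threshold: float = 0.8) -> str:
    # Uncertain output never goes straight to production.
    return "production" if rec.confidence >= threshold else "human_review"


class DriftMonitor:
    """Flags when the rolling mean of outputs moves away from a baseline mean."""

    def __init__(self, baseline_mean: float, tolerance: float, window: int = 100) -> None:
        self.baseline_mean = baseline_mean
        self.tolerance = tolerance
        self.recent: Deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        self.recent.append(value)
        return abs(mean(self.recent) - self.baseline_mean) > self.tolerance


# Usage
ok = enforce_schema({"sku": "SKU-1", "price": 12.5, "confidence": 0.93}, margin_floor=10.0)
unsure = enforce_schema({"sku": "SKU-2", "price": 12.5, "confidence": 0.6}, margin_floor=10.0)
monitor = DriftMonitor(baseline_mean=12.0, tolerance=1.0, window=3)
drift_flags = [monitor.observe(p) for p in (12.2, 11.9, 15.0)]
```

A zero price, much like the one in the opening anecdote, passes a "does the field exist" check and is stopped only by the business-constraint check.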

What I'd tell myself three years ago

The model is not the system. The model is one component inside a system that has to earn the right to touch production state. It earns that right through versioned contracts, explicit validation, bounded context, and audit trails.

Every shortcut you take on those four things will come back as a production incident. I know because I've taken most of them.

Build boring AI systems. Your on-call rotation will thank you.

If you've been through something similar — or disagree with any of this — I'd genuinely like to hear it in the comments.

Choosing the Right Gemma 4 Model: A Practical Guide

Printo Tom — Mon, 11 May 2026 06:00:00 +0000

Gemma 4 isn’t just one model: it’s three distinct flavors, and picking the right one can make or break your project. With Google’s latest open model family, developers now have access to native multimodal capabilities, advanced reasoning, and a massive 128K context window. But the real power lies in choosing the right variant for your use case.

🧩 The Three Flavors of Gemma 4

Small (2B / 4B):
Built for ultra‑mobile, edge, and browser deployment. Perfect for IoT projects, mobile apps, or even running on a Raspberry Pi. If you want AI that lives close to the user, this is your pick.
Dense (31B):
A powerhouse that bridges server‑grade performance with local execution. Ideal for enterprise prototypes, chatbots, or applications that need strong reasoning without relying on cloud‑only solutions.
Mixture‑of‑Experts (26B MoE):
Highly efficient, because only a subset of its experts runs for each token, and designed for advanced reasoning at scale. Best suited for research, high‑throughput tasks, or scenarios where efficiency matters as much as raw capability.

⚙️ Practical Scenarios

Smart Home IoT Assistant → Small Model
Runs locally, respects privacy, and handles multimodal inputs like voice + sensor data.
Enterprise Knowledge Bot → Dense Model
Balances performance with practicality, enabling long‑context reasoning for business workflows.
Research Reasoning Engine → MoE Model
Efficiently processes complex queries, making it ideal for labs or academic projects.
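The guide above boils down to a two-question decision, sketched here in Python. `Variant`, `Requirements` and `pick_variant` are hypothetical. The enum values are the article’s own labels rather than official model identifiers. Checking the deployment target first and the workload second is one reasonable reading of the guide, not an official recommendation.

```python
from dataclasses import dataclass
from enum import Enum


class Variant(Enum):
    SMALL = "Small (2B / 4B)"
    DENSE = "Dense (31B)"
    MOE = "Mixture-of-Experts (26B MoE)"


@dataclass(frozen=True)
class Requirements:
    on_device: bool  # mobile, browser, IoT or Raspberry Pi class hardware
    high_throughput_reasoning: bool = False  # research-style load where efficiency matters


def pick_variant(req: Requirements) -> Variant:
    # Deployment target first: if it must live close to the user, only Small fits.
    if req.on_device:
        return Variant.SMALL
    # Many complex queries where efficiency matters as much as raw capability.
    if req.high_throughput_reasoning:
        return Variant.MOE
    # Default for enterprise prototypes and chatbots that need strong reasoning locally.
    return Variant.DENSE


# The three scenarios from the guide
smart_home = pick_variant(Requirements(on_device=True))
knowledge_bot = pick_variant(Requirements(on_device=False))
research_engine = pick_variant(Requirements(on_device=False, high_throughput_reasoning=True))
```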

💡 Key Insight

Choosing a model isn’t about “bigger is better.” It’s about fit for purpose. A Raspberry Pi project thrives on the Small model, while a research reasoning engine is better served by the MoE. Intentional selection shows you understand both the technology and the problem you’re solving.

📣 Final Thoughts

Gemma 4 opens the door to local AI that’s powerful, flexible, and accessible. The real challenge, and the real opportunity, is matching the right model to the right context. Experiment, build, and share your journey with the community. That’s how we’ll unlock the full potential of open models.
