<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ayi NEDJIMI</title>
    <description>The latest articles on DEV Community by Ayi NEDJIMI (@ayinedjimi-consultants).</description>
    <link>https://dev.to/ayinedjimi-consultants</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3944946%2Fd4fe3b16-5ce2-4120-8cb6-84fdeb9e3f23.png</url>
      <title>DEV Community: Ayi NEDJIMI</title>
      <link>https://dev.to/ayinedjimi-consultants</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ayinedjimi-consultants"/>
    <language>en</language>
    <item>
      <title>Trivy vs Grype vs Snyk: Container Vulnerability Scanners Shootout</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Wed, 10 Jun 2026 10:05:22 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/trivy-vs-grype-vs-snyk-container-vulnerability-scanners-shootout-1j20</link>
      <guid>https://dev.to/ayinedjimi-consultants/trivy-vs-grype-vs-snyk-container-vulnerability-scanners-shootout-1j20</guid>
      <description>&lt;p&gt;Every container image you ship is a potential attack surface. Running nginx with an unpatched OpenSSL? A base image with a known privilege-escalation CVE? Your scanner should catch it before your attacker does. The real question is which tool is worth integrating into your pipeline without drowning your team in noise.&lt;/p&gt;

&lt;p&gt;I tested Trivy, Grype, and Snyk across real-world images — from a lean Alpine-based Go service to a bloated Python data science container. Here's what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Contenders at a Glance
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trivy&lt;/strong&gt; (Aqua Security) is a general-purpose vulnerability scanner covering container images, filesystems, Git repositories, Kubernetes configs, and IaC. It's fast, fully open-source, and ships as a single binary with no daemon required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grype&lt;/strong&gt; (Anchore) is narrower in focus — containers and filesystems. It's also open-source, uses the Syft SBOM engine under the hood for package enumeration, and integrates cleanly with the broader Anchore ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Snyk&lt;/strong&gt; takes a commercial-platform approach (with a free tier). It wraps vulnerability scanning in a developer-experience layer: fix suggestions, pull request checks, IDE plugins, and a web dashboard. Useful, but with trade-offs discussed below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and First Scan
&lt;/h2&gt;

&lt;p&gt;All three are easy to run locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Trivy — single binary, no auth needed&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;aquasecurity/trivy/trivy
trivy image python:3.11-slim

&lt;span class="c"&gt;# Grype — also single binary, no auth needed&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;anchore/grype/grype
grype python:3.11-slim

&lt;span class="c"&gt;# Snyk — requires an account and token&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; snyk
snyk auth
snyk container &lt;span class="nb"&gt;test &lt;/span&gt;python:3.11-slim
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For CI/CD, Trivy and Grype both work without authentication. Snyk requires a token, adding friction in ephemeral build environments unless you already use Snyk's platform. Here's a minimal GitHub Actions step for Trivy that blocks the build on fixable HIGH or CRITICAL CVEs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Scan container image&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aquasecurity/trivy-action@master&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image-ref&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ env.IMAGE_TAG }}&lt;/span&gt;
    &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;table'&lt;/span&gt;
    &lt;span class="na"&gt;exit-code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
    &lt;span class="na"&gt;ignore-unfixed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CRITICAL,HIGH'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grype has an equivalent action (&lt;code&gt;anchore/scan-action&lt;/code&gt;). Snyk offers a GitHub App that handles scanning at the repository level via a dashboard, but the CI step still requires an API token injected as a secret.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Each Scanner Actually Finds
&lt;/h2&gt;

&lt;p&gt;I scanned three images and compared raw finding counts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;python:3.11-slim&lt;/code&gt; — a common base image&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;node:18-alpine&lt;/code&gt; — leaner, Alpine-based&lt;/li&gt;
&lt;li&gt;A custom Go API image built on &lt;code&gt;golang:1.21-alpine&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trivy&lt;/strong&gt; reported the most CVEs overall. On &lt;code&gt;python:3.11-slim&lt;/code&gt;, it returned 78 vulnerabilities (14 HIGH, 2 CRITICAL on the test date) across OS packages. With &lt;code&gt;--scanners vuln,secret&lt;/code&gt;, it also flags hardcoded credentials in image layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grype&lt;/strong&gt; found 71 on the same image. The delta stems from differences in how each tool maps package versions to CVE databases. Both pull from NVD, GitHub Advisory, and OS-specific sources (Debian Security Tracker, Alpine SecDB, Red Hat OVAL), but their normalisation logic diverges on edge cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Snyk&lt;/strong&gt; found 65, but surfaced the most actionable output: a recommended base image (&lt;code&gt;python:3.11.9-slim-bookworm&lt;/code&gt;) that would eliminate 42 findings in a single change. That kind of signal cuts real triage time.&lt;/p&gt;

&lt;p&gt;For custom triage, parsing JSON output programmatically beats reading tables. Here's a Python snippet that reads Trivy's JSON output and prints only fixable HIGH/CRITICAL findings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_trivy_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Target&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;vuln&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vulnerabilities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
            &lt;span class="n"&gt;severity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vuln&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;fixed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vuln&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FixedVersion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;severity&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CRITICAL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;fixed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;vuln&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;VulnerabilityID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;vuln&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PkgName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;vuln&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;InstalledVersion&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&amp;gt; fix available: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fixed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;parse_trivy_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trivy image &lt;span class="nt"&gt;--format&lt;/span&gt; json &lt;span class="nt"&gt;--output&lt;/span&gt; report.json my-image:latest
python parse_trivy.py report.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grype and Snyk both produce structured JSON as well, though with different schemas. Trivy's is the most consistent for scripting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed, DB Updates, and Offline Use
&lt;/h2&gt;

&lt;p&gt;Trivy and Grype both download and cache their vulnerability databases locally. On first run, expect a 20-50 MB download. Subsequent runs use the cache and pull only diffs.&lt;/p&gt;

&lt;p&gt;Approximate scan times on a 2024 M3 MacBook Pro:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scanner&lt;/th&gt;
&lt;th&gt;Cold (no cache)&lt;/th&gt;
&lt;th&gt;Warm (cached)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trivy&lt;/td&gt;
&lt;td&gt;~35s&lt;/td&gt;
&lt;td&gt;~8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grype&lt;/td&gt;
&lt;td&gt;~40s&lt;/td&gt;
&lt;td&gt;~10s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Snyk&lt;/td&gt;
&lt;td&gt;~25s&lt;/td&gt;
&lt;td&gt;~20s*&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;*Snyk sends image metadata to their servers for analysis on every scan — no true local cache. This has a privacy implication: proprietary package names, internal library versions, and image layer metadata all leave your environment. For regulated industries, that's a non-starter.&lt;/p&gt;

&lt;p&gt;Trivy and Grype run entirely air-gapped after the DB download. This matters in finance, healthcare, and government contexts. If you're building a formal container security posture, tooling choice belongs in your threat model — &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;our container security hardening checklists&lt;/a&gt; cover this decision point with specific control mappings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Suppressing False Positives
&lt;/h2&gt;

&lt;p&gt;Every scanner flags CVEs for package features you don't compile, vulnerabilities that require local shell access in an environment with no shell, or disputed findings. Managing that noise is where workflow matters.&lt;/p&gt;

&lt;p&gt;Trivy uses a &lt;code&gt;.trivyignore&lt;/code&gt; file at the repo root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# HTTP/2 rapid reset — not exposed externally in this service
&lt;/span&gt;&lt;span class="n"&gt;CVE&lt;/span&gt;-&lt;span class="m"&gt;2023&lt;/span&gt;-&lt;span class="m"&gt;44487&lt;/span&gt;
&lt;span class="c"&gt;# Disputed, no practical impact in our configuration
&lt;/span&gt;&lt;span class="n"&gt;CVE&lt;/span&gt;-&lt;span class="m"&gt;2024&lt;/span&gt;-&lt;span class="m"&gt;12345&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grype uses &lt;code&gt;.grype.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ignore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;vulnerability&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CVE-2023-44487&lt;/span&gt;
    &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTTP/2&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rapid&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;reset&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exposed&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;service"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;fix-state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;not-fixed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;fix-state: not-fixed&lt;/code&gt; rule is particularly useful: it silences all unfixed CVEs globally, letting you focus on what is actually actionable. Trivy's &lt;code&gt;--ignore-unfixed&lt;/code&gt; flag does the same thing.&lt;/p&gt;

&lt;p&gt;Snyk handles suppressions through its web UI, which is friendlier for non-security engineers but requires everyone to be in the Snyk platform to see the reasoning behind each ignored finding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which One Should You Use?
&lt;/h2&gt;

&lt;p&gt;After running all three in CI for several months, here is my honest recommendation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Trivy&lt;/strong&gt; as your default. It covers containers, IaC, secrets, and Kubernetes manifests in one binary. The GitHub Actions integration is stable, the JSON output is scriptable, and there is no vendor dependency. This is the right choice for most teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Grype&lt;/strong&gt; if SBOM generation is a compliance requirement. The Syft + Grype combination produces a reproducible, auditable artifact trail — a container's software bill of materials plus its known vulnerabilities, tied together. If your security team or customers are asking for SBOMs, this pipeline is the cleanest way to deliver them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Snyk&lt;/strong&gt; if your team has few security specialists and needs fix guidance embedded in developer tools. The "upgrade this base image to fix 40% of your findings" recommendation saves real time. The free tier works for open-source projects; serious production use pushes you toward a paid plan quickly.&lt;/p&gt;

&lt;p&gt;In practice, many teams run Trivy in the pipeline (fast, free, CI-native) and offer Snyk IDE plugins to developers who want inline fix suggestions. That combination covers pipeline enforcement and developer experience without full vendor lock-in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;Container scanning is table stakes now. The real problem is avoiding alert fatigue. Trivy is the best default for most teams: fast, open-source, no auth required, wide coverage. Grype wins on the SBOM compliance path. Snyk earns its seat when developer adoption and fix guidance matter more than control.&lt;/p&gt;

&lt;p&gt;Whichever tool you integrate, block on CRITICAL findings with a fix available, suppress known-irrelevant CVEs explicitly, and review your ignore list quarterly. Noise without action is worse than no scanning at all.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>devops</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Wazuh vs Elastic SIEM vs Splunk Free: Open-Source SIEM Compared (2026)</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Tue, 09 Jun 2026 10:03:21 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/wazuh-vs-elastic-siem-vs-splunk-free-open-source-siem-compared-2026-ep2</link>
      <guid>https://dev.to/ayinedjimi-consultants/wazuh-vs-elastic-siem-vs-splunk-free-open-source-siem-compared-2026-ep2</guid>
      <description>&lt;p&gt;Your servers are generating thousands of log lines per minute, and you have no idea which ones matter. A SIEM (Security Information and Event Management) platform is supposed to solve that — correlate events, detect threats, and alert your team before damage is done. The problem: commercial SIEM licenses can cost $50k/year and up. The free alternatives are genuinely capable, but they make very different trade-offs.&lt;/p&gt;

&lt;p&gt;This article compares three realistic options for teams without an enterprise security budget: Wazuh, the Elastic SIEM stack, and Splunk Free.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We're Actually Comparing
&lt;/h2&gt;

&lt;p&gt;"Free" means different things for each platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wazuh&lt;/strong&gt;: Fully open-source (GPLv2 for the manager, Apache 2.0 for the dashboard). No license caps, no feature gating. Community-supported with commercial support options.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic SIEM&lt;/strong&gt;: The security analytics layer on top of Elasticsearch + Kibana. The SIEM features themselves are free, but Elastic's licensing changed in 2021 — you're running SSPL unless you use the managed cloud offering. Self-hosted is fine, just know what you're agreeing to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Splunk Free&lt;/strong&gt;: Hard-capped at 500 MB/day ingestion. Above that, it stops indexing. Not viable for anything beyond a small homelab or a single application.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That 500 MB cap effectively eliminates Splunk Free from most real deployments, so this comparison becomes primarily Wazuh vs Elastic SIEM, with Splunk Free as a distant third for specific niche use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Differences
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Wazuh&lt;/strong&gt; is agent-based. You install a Wazuh agent on each host you want to monitor. Agents ship logs, file integrity events, process activity, and vulnerability scan results back to the Wazuh manager. The manager runs correlation rules and sends alerts. The UI is a fork of Kibana, skinned for security workflows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Query Wazuh alerts via its REST API
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;WAZUH_API&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://your-wazuh-manager:55000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;authenticate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;WAZUH_API&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/security/user/authenticate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_recent_alerts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level_min&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rule.level&amp;gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;level_min&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;WAZUH_API&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/alerts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;affected_items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;authenticate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wazuh-wui&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;alerts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_recent_alerts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level_min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;alert&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;alerts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rule&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rule&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; — &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Elastic SIEM&lt;/strong&gt; works differently. You use Beats (Filebeat, Winlogbeat, Packetbeat) or Elastic Agent to ship logs into Elasticsearch. Kibana's Security app then provides detection rules, timelines, and case management on top of that raw data. Detection rules are written in EQL (Event Query Language) or KQL.&lt;/p&gt;

&lt;p&gt;The architecture is more flexible but also more complex. You're assembling a stack (Elasticsearch + Kibana + ingestion layer), whereas Wazuh ships as a more cohesive bundle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Detection Quality Out of the Box
&lt;/h2&gt;

&lt;p&gt;This is where Wazuh has a genuine advantage for teams that want something working on day one. Wazuh ships with 3,000+ detection rules covering Linux syscalls, Windows event IDs, web application attacks, Docker/Kubernetes activity, and common CVEs. The rules are maintained by the community and mapped to MITRE ATT&amp;amp;CK.&lt;/p&gt;

&lt;p&gt;Elastic's detection rules are also MITRE-mapped and actively maintained, but they live in a separate GitHub repository and need to be imported. Once imported, the quality is comparable — there's just more setup friction.&lt;/p&gt;

&lt;p&gt;For custom rules, Elastic wins on expressiveness. EQL is designed for threat hunting and lets you correlate sequences of events across time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Run an EQL sequence query with the Elasticsearch Python client
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;elasticsearch&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Elasticsearch&lt;/span&gt;

&lt;span class="n"&gt;es&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Elasticsearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://your-elastic:9200&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api-key-value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Detect process injection: cmd.exe from a non-shell parent,
# then outbound TLS connection within 30 seconds
&lt;/span&gt;&lt;span class="n"&gt;eql_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequence by host.name with maxspan=30s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [process where process.name == &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cmd.exe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;   and process.parent.name != &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;explorer.exe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;   and process.parent.name != &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;powershell.exe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [network where destination.port == 443&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;   and process.name == &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cmd.exe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logs-*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;eql_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Suspicious cmd.exe network activity on &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;destination&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;proc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;process&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;proc&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This kind of event-sequence correlation is harder to express in Wazuh's XML rule format, though Wazuh 4.x added composite rules that partially close the gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scalability and Operational Cost
&lt;/h2&gt;

&lt;p&gt;At small scale (under 500 agents, 10k events per second), both platforms work fine on modest hardware. Wazuh's recommended minimum for production is 4 CPU cores and 8 GB RAM for the manager.&lt;/p&gt;

&lt;p&gt;Elastic's RAM appetite is well-known. Elasticsearch JVM heap should be 50% of available RAM, capped at 31 GB for GC performance reasons. A production Elastic stack that handles real log volumes comfortably needs 16-32 GB RAM and fast SSDs. Multi-node clusters multiply that.&lt;/p&gt;

&lt;p&gt;Wazuh also uses Elasticsearch under the hood (the indexer component is a rebranded OpenSearch fork), but operational complexity is lower because Wazuh manages it for you. You don't tune JVM settings manually unless something breaks.&lt;/p&gt;

&lt;p&gt;For compliance use cases (PCI-DSS, HIPAA, SOC 2), Wazuh includes built-in compliance dashboards mapped to specific control frameworks. Elastic requires you to build these views yourself or pay for the Elastic Security subscription. &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;Our security hardening checklists&lt;/a&gt; include the specific Wazuh rules and agent policies needed to satisfy each control — saved a few teams days of mapping work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Wazuh&lt;/th&gt;
&lt;th&gt;Elastic SIEM&lt;/th&gt;
&lt;th&gt;Splunk Free&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;Open-source&lt;/td&gt;
&lt;td&gt;SSPL (self-hosted)&lt;/td&gt;
&lt;td&gt;Free, 500 MB/day cap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;2-4 hours&lt;/td&gt;
&lt;td&gt;4-8 hours&lt;/td&gt;
&lt;td&gt;1-2 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Default rules&lt;/td&gt;
&lt;td&gt;3,000+&lt;/td&gt;
&lt;td&gt;~650 (must import)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Advanced queries&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;EQL - excellent&lt;/td&gt;
&lt;td&gt;SPL - excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM requirements&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance dashboards&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability ceiling&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Very high&lt;/td&gt;
&lt;td&gt;Hard-capped&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Choose Wazuh if&lt;/strong&gt;: you want a cohesive security platform with agent-based monitoring, built-in compliance reports, and lower operational overhead. It's the pragmatic choice for teams of 1-5 people responsible for security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Elastic SIEM if&lt;/strong&gt;: you already run Elasticsearch for other workloads, have large log volumes that need full-text search flexibility, or your team has threat hunters who need EQL for complex correlation queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid Splunk Free&lt;/strong&gt; for anything beyond a demo environment. The 500 MB/day cap hits immediately in any real deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;There's no universally better option — the right SIEM depends on your team size, existing infrastructure, and what "free" actually means operationally. Wazuh wins on out-of-box security coverage and setup simplicity. Elastic SIEM wins on query power and ecosystem depth. Splunk Free is a demo, not a production tool.&lt;/p&gt;

&lt;p&gt;For most small-to-medium teams starting from scratch, Wazuh is where to begin. You can have detection rules running, agents deployed, and dashboards live in an afternoon. Once you've outgrown it or need more advanced correlation, the migration path to Elastic is well-documented.&lt;/p&gt;

&lt;p&gt;Start with working detection, not the most powerful platform you can't configure.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>devops</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Terraform vs Pulumi vs Ansible: IaC for small teams</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Mon, 08 Jun 2026 10:05:14 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/terraform-vs-pulumi-vs-ansible-iac-for-small-teams-3p3m</link>
      <guid>https://dev.to/ayinedjimi-consultants/terraform-vs-pulumi-vs-ansible-iac-for-small-teams-3p3m</guid>
      <description>&lt;p&gt;You're a team of three engineers. One of you just broke staging by running a script manually. Nobody knows what the current state of infra actually is. IaC is the obvious fix — but Terraform, Pulumi, and Ansible all claim to solve this. Which one?&lt;/p&gt;

&lt;p&gt;The short answer: they're not the same kind of tool, and the right choice depends on your team's profile more than the tools' feature lists.&lt;/p&gt;

&lt;h2&gt;
  
  
  What each tool actually does
&lt;/h2&gt;

&lt;p&gt;It's worth being precise about categories because "IaC" gets used too loosely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Terraform&lt;/strong&gt; (HashiCorp, now BSL licensed): declarative, state-file-based cloud provisioning. You describe desired state; Terraform computes the diff and applies it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pulumi&lt;/strong&gt;: same concept as Terraform, but you write real code (Python, Go, TypeScript) instead of HCL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ansible&lt;/strong&gt;: imperative, agentless configuration management. Strong for day-2 ops (installing packages, configuring services), weaker for provisioning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't identical tools competing for the same niche.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Ansible makes sense — and when it doesn't
&lt;/h2&gt;

&lt;p&gt;Ansible is solid for: configuring existing servers, deploying applications, running ordered multi-step procedures across a fleet.&lt;/p&gt;

&lt;p&gt;It's awkward for: managing cloud resources from scratch. The modules exist (for EC2, VPCs, RDS), but state management is painful. Ansible doesn't track what it created — each run answers "did this task succeed?", not "does this resource still exist?". You end up with orphaned resources and no clear inventory.&lt;/p&gt;

&lt;p&gt;For a small team provisioning cloud infra, Ansible alone isn't sufficient. The pattern that actually works: provision with Terraform, configure with Ansible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# playbook.yml — configure a Python app on a freshly provisioned EC2 instance&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy Python app&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web&lt;/span&gt;
  &lt;span class="na"&gt;become&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install system dependencies&lt;/span&gt;
      &lt;span class="na"&gt;apt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;python3&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;python3-pip&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;nginx&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
        &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;present&lt;/span&gt;
        &lt;span class="na"&gt;update_cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy application files&lt;/span&gt;
      &lt;span class="na"&gt;copy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./app/&lt;/span&gt;
        &lt;span class="na"&gt;dest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/opt/myapp/&lt;/span&gt;
        &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;www-data&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Python requirements&lt;/span&gt;
      &lt;span class="na"&gt;pip&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;requirements&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/opt/myapp/requirements.txt&lt;/span&gt;
        &lt;span class="na"&gt;executable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip3&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enable and start service&lt;/span&gt;
      &lt;span class="na"&gt;systemd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myapp&lt;/span&gt;
        &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;started&lt;/span&gt;
        &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well. But you need Terraform or Pulumi to create that instance in the first place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terraform: the safe default
&lt;/h2&gt;

&lt;p&gt;Terraform has been the standard for cloud provisioning for years. HCL is learnable in a day, the provider ecosystem covers AWS, GCP, Azure, Cloudflare, GitHub, Datadog, and hundreds more. The &lt;code&gt;plan/apply&lt;/code&gt; workflow gives you an explicit diff to review before any change lands.&lt;/p&gt;

&lt;p&gt;For a small team, the main advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anyone can read and write HCL without a deep programming background&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;terraform plan&lt;/code&gt; output is reviewable in PRs — non-engineers can spot obvious regressions&lt;/li&gt;
&lt;li&gt;State locking via S3 + DynamoDB prevents concurrent applies from corrupting state&lt;/li&gt;
&lt;li&gt;The BSL license change only affects vendors building Terraform-as-a-service; internal use remains free&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The friction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HCL is not a real programming language. Complex logic via &lt;code&gt;for_each&lt;/code&gt;, dynamic blocks, and conditionals becomes unwieldy at scale&lt;/li&gt;
&lt;li&gt;No native test framework — you need Terratest or heavy &lt;code&gt;terraform validate&lt;/code&gt; discipline&lt;/li&gt;
&lt;li&gt;State drift is a real burden: manual cloud console changes require &lt;code&gt;terraform import&lt;/code&gt; or &lt;code&gt;state rm&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.tf — minimal VPC + EC2 for a Go API on AWS&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;
  &lt;span class="nx"&gt;tags&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_subnet"&lt;/span&gt; &lt;span class="s2"&gt;"public"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.1.0/24"&lt;/span&gt;
  &lt;span class="nx"&gt;availability_zone&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eu-west-1a"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api-sg"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ami&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ami-0c55b159cbfafe1f0"&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t3.small"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_security_group_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;tags&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"go-api"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Readable, reviewable, and straightforward for any team member — including whoever joined last month.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pulumi: real code, real tradeoffs
&lt;/h2&gt;

&lt;p&gt;Pulumi replaces HCL with actual programming languages. The same stack in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pulumi&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pulumi_aws&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;aws&lt;/span&gt;

&lt;span class="n"&gt;vpc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Vpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cidr_block&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.0.0.0/16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;subnet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Subnet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vpc_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cidr_block&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.0.1.0/24&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;availability_zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eu-west-1a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;sg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SecurityGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api-sg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vpc_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ingress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;from_port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;to_port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;protocol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cidr_blocks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0/0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;egress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;from_port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;to_port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;protocol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cidr_blocks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0/0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;go-api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ami&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ami-0c55b159cbfafe1f0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;t3.small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;subnet_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vpc_security_group_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;go-api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instance_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The real appeal: you can loop over a list of environments without &lt;code&gt;count&lt;/code&gt; gymnastics, write unit tests with &lt;code&gt;pytest&lt;/code&gt;, reuse Python functions, and import your own internal libraries. For infra with 10+ environments or heavily dynamic configuration, Pulumi's expressiveness is a genuine win.&lt;/p&gt;

&lt;p&gt;The tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;State goes to Pulumi Cloud by default&lt;/strong&gt; — there's a free tier, but self-hosting requires more setup than Terraform's S3 backend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steeper onboarding&lt;/strong&gt;: new team members need to know the language &lt;em&gt;and&lt;/em&gt; Pulumi's async resource graph model. Debugging &lt;code&gt;Output[T]&lt;/code&gt; types is not obvious&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller community&lt;/strong&gt;: Stack Overflow coverage and available modules are thinner than Terraform's&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Practical decision tree for small teams
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use Ansible&lt;/strong&gt; when you're configuring existing servers, not provisioning from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Terraform&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your team has mixed backgrounds — not everyone writes Python or Go fluently&lt;/li&gt;
&lt;li&gt;Your infra is relatively stable: fewer than five environments, no heavy dynamic resource generation&lt;/li&gt;
&lt;li&gt;You want the largest provider ecosystem and the most community resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Pulumi&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your team is developer-heavy and HCL feels like a cognitive tax&lt;/li&gt;
&lt;li&gt;You need real programming constructs: dynamic loops, proper unit tests, shared libraries across stacks&lt;/li&gt;
&lt;li&gt;You're comfortable with Pulumi Cloud pricing or willing to configure a self-hosted state backend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most teams under five engineers, Terraform is the right starting point. The plan/apply review workflow fits naturally into PRs, HCL is fast to learn, and you won't hit its limits until your infra grows significantly.&lt;/p&gt;

&lt;p&gt;One thing that matters regardless of tool: gate every &lt;code&gt;apply&lt;/code&gt; behind CI/CD with mandatory code review on infra changes. A misconfiguration merged without review is where incidents start. We maintain a &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;security checklist for IaC pipelines&lt;/a&gt; covering state file access control, least-privilege IAM for CI runners, and safe secrets handling in both Terraform and Pulumi.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Terraform, Pulumi, and Ansible aren't direct competitors — they solve overlapping but distinct problems. Ansible handles configuration; Terraform and Pulumi handle provisioning. Between the two provisioning tools, Terraform wins on simplicity and ecosystem breadth; Pulumi wins when your infra logic outgrows what HCL can express cleanly.&lt;/p&gt;

&lt;p&gt;Pick based on your team's actual skills and infra complexity, not on which tool had the best talk at the last DevOps conference.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>tutorial</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>Celery vs RQ vs Dramatiq: Python Task Queues Compared</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Sun, 07 Jun 2026 10:08:13 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/celery-vs-rq-vs-dramatiq-python-task-queues-compared-426m</link>
      <guid>https://dev.to/ayinedjimi-consultants/celery-vs-rq-vs-dramatiq-python-task-queues-compared-426m</guid>
      <description>&lt;p&gt;When your API handler starts timing out because it's doing too much at once — sending emails, generating reports, hitting third-party APIs — you reach for a task queue. Python gives you three serious options: Celery, RQ, and Dramatiq. Picking the wrong one will cost you debugging hours you don't have.&lt;/p&gt;

&lt;p&gt;This is not a benchmark post with synthetic numbers. It's a practical comparison after running all three in real scenarios, covering where each fits, where each breaks, and which one you should reach for first.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a task queue actually does
&lt;/h2&gt;

&lt;p&gt;A task queue moves work out of the request-response cycle. Instead of your web handler blocking on a slow operation, it serializes a message to a broker (Redis, RabbitMQ, or similar), and a separate worker process picks it up asynchronously.&lt;/p&gt;

&lt;p&gt;The main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Broker&lt;/strong&gt; — the message bus (Redis, RabbitMQ)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker&lt;/strong&gt; — the process that runs your task functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task&lt;/strong&gt; — a serialized unit of work with arguments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result backend&lt;/strong&gt; — optional storage for return values and status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three libraries cover these primitives. The differences show up in configuration overhead, failure handling, and what breaks silently at 2 AM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Celery: powerful, with a price
&lt;/h2&gt;

&lt;p&gt;Celery has been the default Python task queue for over a decade. It supports Redis, RabbitMQ, SQS, and more. It ships with celery-beat for periodic tasks, canvas for chaining and fan-out, and has a large ecosystem of integrations.&lt;/p&gt;

&lt;p&gt;Here is a minimal but production-ready Celery task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tasks.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;celery&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Celery&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Celery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;worker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;broker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379/0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379/1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_serializer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;accept_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="nd"&gt;@app.task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default_retry_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_notification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# email / push / SMS logic here
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# start worker&lt;/span&gt;
celery &lt;span class="nt"&gt;-A&lt;/span&gt; tasks worker &lt;span class="nt"&gt;--loglevel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;info &lt;span class="nt"&gt;--concurrency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4
&lt;span class="c"&gt;# start scheduler for periodic tasks&lt;/span&gt;
celery &lt;span class="nt"&gt;-A&lt;/span&gt; tasks beat &lt;span class="nt"&gt;--loglevel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;info
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What Celery gets right:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broad broker support — useful when your infra doesn't include Redis&lt;/li&gt;
&lt;li&gt;Canvas: &lt;code&gt;group&lt;/code&gt;, &lt;code&gt;chain&lt;/code&gt;, &lt;code&gt;chord&lt;/code&gt; for complex multi-step workflows&lt;/li&gt;
&lt;li&gt;celery-beat for cron-style scheduling without external tooling&lt;/li&gt;
&lt;li&gt;Mature retry logic with configurable backoff&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What Celery gets wrong:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The default pickle serializer has a well-documented attack surface — always configure &lt;code&gt;task_serializer="json"&lt;/code&gt; and enable task signing if workers accept input from untrusted sources. A good starting point is this &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;security hardening checklist&lt;/a&gt; covering worker configuration and serialization settings&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prefetch_multiplier&lt;/code&gt; and the concurrency model require tuning, or you'll have workers sitting idle while your queue grows&lt;/li&gt;
&lt;li&gt;Configuration sprawls across &lt;code&gt;celeryconfig.py&lt;/code&gt;, environment variables, and scattered &lt;code&gt;app.conf.update&lt;/code&gt; calls&lt;/li&gt;
&lt;li&gt;Debugging failed tasks in production often means reading the Celery source code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use Celery when you genuinely need its feature set: complex workflows with fan-out and aggregation, multiple broker options, or celery-beat for reliable scheduling. For simpler use cases, you're paying its complexity tax for nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  RQ: minimal and honestly good enough
&lt;/h2&gt;

&lt;p&gt;RQ (Redis Queue) takes the opposite approach: one broker (Redis), clean API, minimal configuration. There is no concurrency model to tune beyond worker count, no separate scheduler daemon for basic use cases, and no config file to manage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tasks.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_convert_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_convert_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# imagine ffmpeg or Pillow here
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.tmp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# enqueue.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Redis&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Queue&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rq.job&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Retry&lt;/span&gt;

&lt;span class="n"&gt;redis_conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;redis_conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tasks.process_upload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/upload.tmp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;retry&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;job_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enqueued: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;rq worker &lt;span class="nt"&gt;--with-scheduler&lt;/span&gt; default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;RQ strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setup takes minutes, not hours&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rq info&lt;/code&gt; and &lt;code&gt;rq-dashboard&lt;/code&gt; make job inspection trivial&lt;/li&gt;
&lt;li&gt;Dead-letter queue behavior is obvious and inspectable&lt;/li&gt;
&lt;li&gt;The codebase is small enough to read in an afternoon — critical when debugging production issues&lt;/li&gt;
&lt;li&gt;Results are stored in Redis natively; no second backend to configure and monitor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RQ limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redis-only — no RabbitMQ support&lt;/li&gt;
&lt;li&gt;No native pipeline or chain API; you wire sequential jobs by hand or use callbacks&lt;/li&gt;
&lt;li&gt;The scheduler (&lt;code&gt;--with-scheduler&lt;/code&gt;) works but has less production history than celery-beat&lt;/li&gt;
&lt;li&gt;Community activity has been slower over the past year compared to Celery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the majority of web services — async emails, file processing, background API calls, report generation — RQ is the right default. It does the job, stays out of your way, and does not require deep operational knowledge to run safely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dramatiq: the most correctly designed
&lt;/h2&gt;

&lt;p&gt;Dramatiq is the youngest of the three and arguably the most thoughtfully designed. Its defaults are safer: at-least-once delivery semantics are explicit, the middleware system is composable, and retry behavior does not require diving into source code to understand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tasks.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dramatiq&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dramatiq.brokers.redis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RedisBroker&lt;/span&gt;

&lt;span class="n"&gt;broker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RedisBroker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dramatiq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_broker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;broker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@dramatiq.actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;min_backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# 1 second
&lt;/span&gt;    &lt;span class="n"&gt;max_backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# 5 minutes
&lt;/span&gt;    &lt;span class="n"&gt;time_limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# 30 second wall clock limit
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_payment_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payment.succeeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;_record_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_record_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payment_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# idempotency check -&amp;gt; DB write -&amp;gt; trigger fulfillment
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dramatiq tasks &lt;span class="nt"&gt;--processes&lt;/span&gt; 2 &lt;span class="nt"&gt;--threads&lt;/span&gt; 8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dramatiq strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Middleware architecture: retries, time limits, rate limits, and age limits are composable and independently testable&lt;/li&gt;
&lt;li&gt;Message schema is explicit; silent deserialization failures are not a surprise you discover in production&lt;/li&gt;
&lt;li&gt;Supports both Redis and RabbitMQ with the same actor code&lt;/li&gt;
&lt;li&gt;Failure semantics are predictable — dead-letter messages stay in a well-defined location you can monitor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dramatiq limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No built-in scheduler — you need &lt;code&gt;apscheduler&lt;/code&gt; or &lt;code&gt;dramatiq-cron&lt;/code&gt; as an external dependency&lt;/li&gt;
&lt;li&gt;Smaller ecosystem and community; fewer ready-made integrations with popular frameworks&lt;/li&gt;
&lt;li&gt;Pipelines and groups exist but are less mature and less documented than Celery canvas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dramatiq is the right tool when task correctness is non-negotiable: payment processing, audit log ingestion, compliance-sensitive workflows. Its design choices consistently favor "this task ran exactly as expected" over "this task ran fast."&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Celery&lt;/th&gt;
&lt;th&gt;RQ&lt;/th&gt;
&lt;th&gt;Dramatiq&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Brokers&lt;/td&gt;
&lt;td&gt;Redis, RabbitMQ, SQS, more&lt;/td&gt;
&lt;td&gt;Redis only&lt;/td&gt;
&lt;td&gt;Redis, RabbitMQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup complexity&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry handling&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Explicit&lt;/td&gt;
&lt;td&gt;Middleware-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduling&lt;/td&gt;
&lt;td&gt;celery-beat (built-in)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--with-scheduler&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;External only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipelines&lt;/td&gt;
&lt;td&gt;Full canvas&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maturity&lt;/td&gt;
&lt;td&gt;Very high&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Complex workflows&lt;/td&gt;
&lt;td&gt;Simple async jobs&lt;/td&gt;
&lt;td&gt;Correctness-critical work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Start with &lt;strong&gt;RQ&lt;/strong&gt;. It handles 80% of production use cases with 20% of the operational complexity. If you hit a wall — you need RabbitMQ, canvas-style pipelines, or battle-tested scheduling — migrate to &lt;strong&gt;Celery&lt;/strong&gt;. If correctness and explicit failure handling matter more than ecosystem size, use &lt;strong&gt;Dramatiq&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Whichever you pick: keep workers stateless, use JSON serialization only, monitor your dead-letter queues, and set explicit timeouts on every task. A task that fails silently is worse than a task that fails loudly. Instrument your workers with proper logging from day one — you will be glad you did the first time something goes missing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>tutorial</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>FastAPI vs Flask vs Django: which one for an AI backend in 2026</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Sat, 06 Jun 2026 10:04:17 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/fastapi-vs-flask-vs-django-which-one-for-an-ai-backend-in-2026-2heo</link>
      <guid>https://dev.to/ayinedjimi-consultants/fastapi-vs-flask-vs-django-which-one-for-an-ai-backend-in-2026-2heo</guid>
      <description>&lt;p&gt;Picking the wrong web framework for your AI backend costs weeks of refactoring — usually when you're already under deadline pressure. In 2026, AI backends have specific requirements: streaming responses, async inference calls, structured input validation, and long-running requests. Here's how FastAPI, Flask, and Django actually hold up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an AI backend actually needs
&lt;/h2&gt;

&lt;p&gt;Before comparing frameworks, be precise about the requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streaming&lt;/strong&gt;: LLM responses arrive token-by-token. The framework must support Server-Sent Events (SSE) or chunked HTTP without blocking other requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async I/O&lt;/strong&gt;: Inference API calls — remote or local — take 2–20 seconds. Blocking them in a synchronous thread destroys throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured validation&lt;/strong&gt;: Inputs (prompts, parameters, file uploads) must be validated and typed before they reach the model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background tasks&lt;/strong&gt;: Post-processing, logging, fine-tuning triggers — these can't block the request lifecycle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAPI docs&lt;/strong&gt;: Useful for teams integrating with the API programmatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With these in mind, the comparison gets less abstract.&lt;/p&gt;

&lt;h2&gt;
  
  
  FastAPI: the default choice — and why
&lt;/h2&gt;

&lt;p&gt;FastAPI is the default for AI backends in 2026, and for good reason. It's built on Starlette (ASGI), handles async natively, uses Pydantic for request/response validation, and generates OpenAPI docs automatically. These three things together align almost perfectly with what AI services need.&lt;/p&gt;

&lt;p&gt;Here's a streaming endpoint that proxies a local language model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamingResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PromptRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_from_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aiter_lines&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PromptRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;stream_from_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This handles concurrent load correctly because every &lt;code&gt;await&lt;/code&gt; releases the event loop. With Flask or Django in sync mode, you need a thread per request or multiple Gunicorn workers — both add overhead and cap concurrency hard.&lt;/p&gt;

&lt;p&gt;FastAPI’s weak points: it’s thinner than Django. You wire up auth, admin, and ORM integrations yourself. If your backend needs a full user management system or an admin interface, you’ll stitch together libraries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Flask: fine for simple wrappers, painful beyond that
&lt;/h2&gt;

&lt;p&gt;Flask is a solid choice when your “AI backend” is really a thin HTTP wrapper around an external API — no database, no streaming, minimal state. Its simplicity is its strength.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;force&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text field required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify as SPAM or HAM. Reply with one word only. Text: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cracks show when you need streaming or concurrency. Flask 2.x supports async views, but the underlying WSGI server (Gunicorn, uWSGI) wasn’t built for long-lived connections. You can work around it with gevent or eventlet, but you’re fighting the default model — and those patches have a history of subtle bugs under load.&lt;/p&gt;

&lt;p&gt;Flask also has no built-in validation. You end up writing &lt;code&gt;if "field" not in body&lt;/code&gt; checks everywhere, or you add Marshmallow or Pydantic manually — at which point you’ve built half of FastAPI on top of Flask.&lt;/p&gt;

&lt;p&gt;For a quick internal prototype or a one-off script exposed as HTTP, Flask is fine. For anything with real SLAs and concurrent users, the async limitations become a genuine bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  Django: overkill most of the time, but not always
&lt;/h2&gt;

&lt;p&gt;Django is the heaviest of the three and the hardest to justify for a pure AI backend. Its ORM, migrations, admin, and auth system are genuinely good — but they assume a database-centric application. Most AI backends don’t fit that shape.&lt;/p&gt;

&lt;p&gt;Django’s async support (introduced in 3.1, improved through 4.x and 5.x) is still uneven in the ORM layer. Write an async view, call &lt;code&gt;Model.objects.filter(...)&lt;/code&gt;, and you hit synchronous ORM calls that block the event loop unless you wrap them with &lt;code&gt;sync_to_async&lt;/code&gt;. That friction accumulates across a codebase.&lt;/p&gt;

&lt;p&gt;Where Django makes sense: you’re building a product that combines AI features with a full user-facing application — users log in, manage documents, trigger AI jobs, and view results in a dashboard. In that case, Django provides working auth, admin, and session management out of the box, and you route inference calls to a separate FastAPI service.&lt;/p&gt;

&lt;p&gt;A pattern that holds up well in production: Django handles user management and persistent storage, a FastAPI service handles model inference, and they communicate over an internal HTTP boundary. Each framework does what it was designed for, without forcing either to compensate for the other’s weaknesses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance: what the numbers actually mean
&lt;/h2&gt;

&lt;p&gt;Raw request throughput benchmarks are mostly noise for AI backends. Your bottleneck is inference time (2–20 seconds per request), not framework routing overhead (microseconds). What the framework choice actually affects:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrent request handling&lt;/strong&gt;: FastAPI with asyncio handles 100 concurrent 10-second inference calls with far fewer OS threads than Flask/WSGI. In practice, this means fewer servers for the same sustained load — a real cost difference at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation overhead&lt;/strong&gt;: Pydantic v2 (the default in FastAPI) is fast enough to never appear in a profiler trace. Manual dict checking in Flask is marginally faster but not meaningfully so.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cold start and startup time&lt;/strong&gt;: All three start under a second for typical workloads. Not a real differentiator for persistent services.&lt;/p&gt;

&lt;p&gt;If you’re running inference locally on a GPU, the GPU queue is your bottleneck. If you’re calling a remote model API, network latency dominates. Neither is affected by Flask vs FastAPI routing overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making the decision
&lt;/h2&gt;

&lt;p&gt;The decision tree is short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple sync wrapper, no streaming, no state?&lt;/strong&gt; → Flask. Minimal setup, no surprises.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production AI service with streaming, async inference, structured I/O?&lt;/strong&gt; → FastAPI. The correct default for most AI backends.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full product — user auth, database, admin, plus AI features?&lt;/strong&gt; → Django for the product shell, FastAPI (or a task queue) for inference. Don’t force Django to own async inference work it wasn’t built for.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One practical note: when you’re reviewing the security posture of your AI backend — input validation, rate limiting, authentication scope — having explicit Pydantic models in FastAPI makes it easy to work through a &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;security hardening checklist&lt;/a&gt; systematically. Each endpoint’s contract is defined in code, auditable, and enforceable by the framework itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;FastAPI is the correct default for AI backends in 2026 — not because it’s fashionable, but because its design matches what AI services actually need: async-first, Pydantic-native, ASGI-based. Flask remains useful for simple, synchronous wrappers where adding async infrastructure would be overkill. Django earns its place when you’re building a full product that happens to include AI, not when AI is the entire product.&lt;/p&gt;

&lt;p&gt;Pick based on your actual constraints. The framework you don’t fight is the right framework.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>webdev</category>
      <category>api</category>
    </item>
    <item>
      <title>Zap vs slog vs zerolog: Go logging libraries compared</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Fri, 05 Jun 2026 10:03:07 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/zap-vs-slog-vs-zerolog-go-logging-libraries-compared-5fi6</link>
      <guid>https://dev.to/ayinedjimi-consultants/zap-vs-slog-vs-zerolog-go-logging-libraries-compared-5fi6</guid>
      <description>&lt;p&gt;Choosing a logging library for your Go service sounds trivial until you're debugging a latency spike at 2 AM and your logs are consuming 15% of the CPU budget. The three dominant options—&lt;code&gt;uber-go/zap&lt;/code&gt;, the standard library &lt;code&gt;log/slog&lt;/code&gt;, and &lt;code&gt;rs/zerolog&lt;/code&gt;—all get the job done, but they make very different trade-offs around performance, API ergonomics, and ecosystem fit. This guide cuts through the noise with working code and concrete production advice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What structured logging actually means in Go
&lt;/h2&gt;

&lt;p&gt;"Structured logging" means emitting log lines as key-value pairs (typically JSON) instead of free-form strings. The difference matters when your logs flow into a SIEM, a log aggregator like Loki or Datadog, or anything that needs to query on specific fields.&lt;/p&gt;

&lt;p&gt;A naive &lt;code&gt;fmt.Printf("user %d login failed", userID)&lt;/code&gt; is fast to write but a nightmare to parse at scale. A structured equivalent—&lt;code&gt;logger.Error("login failed", "user_id", userID, "ip", ip)&lt;/code&gt;—is machine-readable without regex and plays well with every log aggregation tool built in the last decade.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three contenders
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;uber-go/zap&lt;/code&gt;&lt;/strong&gt; was designed from the start for zero-allocation hot paths. The &lt;code&gt;zap.Logger&lt;/code&gt; type is type-safe and explicit; the &lt;code&gt;SugaredLogger&lt;/code&gt; wrapper trades a small amount of performance for a more ergonomic API. It has been running Uber's production infrastructure for years and the ecosystem around it is mature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;log/slog&lt;/code&gt;&lt;/strong&gt; landed in Go 1.21 and is now part of the standard library. Its design is heavily inspired by zap but uses Go's interface model: you get a &lt;code&gt;slog.Logger&lt;/code&gt; backed by a &lt;code&gt;slog.Handler&lt;/code&gt; you can swap out. No third-party import, no license review, no dependency maintenance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;rs/zerolog&lt;/code&gt;&lt;/strong&gt; uses a method-chaining API that builds the JSON object inline and only allocates on the final &lt;code&gt;Msg()&lt;/code&gt; call. It has the smallest footprint of the three and is very fast in the common case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks: allocations matter more than nanoseconds
&lt;/h2&gt;

&lt;p&gt;Raw benchmark numbers vary by machine, but the allocation profile is stable. All three libraries aim for zero allocations on the hot path. The differences appear in the convenience APIs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"go.uber.org/zap"&lt;/span&gt;
    &lt;span class="s"&gt;"go.uber.org/zap/zapcore"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/rs/zerolog"&lt;/span&gt;
    &lt;span class="s"&gt;"log/slog"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;benchmarkZap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;encoderCfg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;zapcore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EncoderConfig&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;TimeKey&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;"ts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;LevelKey&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;MessageKey&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;EncodeTime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;zapcore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EpochTimeEncoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;EncodeLevel&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;zapcore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LowercaseLevelEncoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;core&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;zapcore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewCore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;zapcore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewJSONEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoderCfg&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;zapcore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Stdout&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;zapcore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InfoLevel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;zap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sync&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"request processed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;zap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;zap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/api/v1/users"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;zap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;zap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"latency"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;benchmarkZerolog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;zerolog&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Stdout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;With&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/api/v1/users"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Dur&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"latency"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"request processed"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;benchmarkSlog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;slog&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slog&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewJSONHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Stdout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"request processed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/api/v1/users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"latency"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;go test -bench=. -benchmem&lt;/code&gt; on these three patterns: &lt;code&gt;zap&lt;/code&gt; with typed fields and &lt;code&gt;zerolog&lt;/code&gt; with chaining both achieve ~0 allocs/op on the core path. &lt;code&gt;slog&lt;/code&gt; is close but the &lt;code&gt;any&lt;/code&gt;-typed key-value pairs cause one allocation per attribute when using the default handler. Fix it with &lt;code&gt;slog.Attr&lt;/code&gt; and &lt;code&gt;LogAttrs()&lt;/code&gt; instead of the variadic key-value form.&lt;/p&gt;

&lt;h2&gt;
  
  
  API ergonomics in practice
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Zerolog&lt;/strong&gt; wins for conciseness if you accept the chaining style. You build a context of base fields with &lt;code&gt;With()&lt;/code&gt;, then attach per-call fields in the chain before calling &lt;code&gt;Msg()&lt;/code&gt;. It is opinionated but clean once you internalize it. Context propagation is first-class: &lt;code&gt;zerolog.Ctx(ctx)&lt;/code&gt; pulls the logger stored in the request context with no boilerplate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// zerolog: request-scoped logger via context&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;withRequestLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requestID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;zerolog&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ctx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;With&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requestID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;zerolog&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ctx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"fetch_user"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"starting db call"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;slog&lt;/strong&gt; wins for ecosystem integration. Because it lives in the standard library, third-party packages can accept a &lt;code&gt;*slog.Logger&lt;/code&gt; without expanding your dependency graph. The &lt;code&gt;slog.Handler&lt;/code&gt; interface makes adapters trivial—route logs to OpenTelemetry, Datadog, or a custom sink without forking anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zap&lt;/strong&gt; wins on control. The &lt;code&gt;zapcore.Core&lt;/code&gt; interface gives you finer-grained control over encoding, sampling, and multi-sink routing than either alternative. Uber's &lt;code&gt;zapcore.NewSamplerWithOptions&lt;/code&gt; is the most production-tested path for log sampling (e.g., only emit 1% of info-level logs during high traffic), which matters in high-throughput services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational traps to avoid
&lt;/h2&gt;

&lt;p&gt;A few things that reliably trip teams up in production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Forgetting &lt;code&gt;defer logger.Sync()&lt;/code&gt; with zap.&lt;/strong&gt; The buffered writer will not flush on process exit. The last 2-3 seconds of logs disappear on SIGTERM—a particularly bad time to lose data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Zerolog's &lt;code&gt;.Interface()&lt;/code&gt; breaks zero-alloc.&lt;/strong&gt; Zerolog's chain looks allocation-free, but passing any non-primitive type via &lt;code&gt;.Interface()&lt;/code&gt; allocates. Use typed methods (&lt;code&gt;.Str()&lt;/code&gt;, &lt;code&gt;.Int()&lt;/code&gt;, &lt;code&gt;.Dur()&lt;/code&gt;) on hot paths and profile before assuming you're allocation-free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. slog context propagation is manual.&lt;/strong&gt; Unlike zerolog, &lt;code&gt;slog&lt;/code&gt; has no built-in &lt;code&gt;context.Context&lt;/code&gt; support. You need middleware that stores a pre-configured logger (with request ID, trace ID, etc.) in the context. OpenTelemetry's bridge handles this if you're already on OTEL; otherwise a three-line helper covers it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Global logger state in zap.&lt;/strong&gt; &lt;code&gt;zap.L()&lt;/code&gt; and &lt;code&gt;zap.ReplaceGlobals()&lt;/code&gt; are goroutine-safe but make tests painful. Inject loggers explicitly through constructors or function parameters rather than relying on the global.&lt;/p&gt;

&lt;p&gt;From a security operations standpoint, structured logging matters beyond developer ergonomics. When logs feed a detection pipeline, unparsed messages mean missed signals. The &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;security hardening checklists&lt;/a&gt; we publish include a logging configuration section precisely because misconfigured log outputs—wrong level, missing fields, unstructured format—show up repeatedly in assessments.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;There is no universally correct answer, but the decision tree is short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Starting a new service today?&lt;/strong&gt; Use &lt;code&gt;slog&lt;/code&gt;. It is in stdlib, fast enough for almost everything, and your dependencies can accept it without adding their own logging import.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-throughput service where allocations matter?&lt;/strong&gt; Use &lt;code&gt;zap&lt;/code&gt; with typed fields, never the Sugared API on the hot path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small service, zero external deps, opinionated API is fine?&lt;/strong&gt; Use &lt;code&gt;zerolog&lt;/code&gt;. The chaining API is clean and context propagation is first-class.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whichever you pick: output JSON, attach a request ID and trace ID from context on every line, and run at &lt;code&gt;INFO&lt;/code&gt; in production unless you are actively debugging. The choice of library is secondary to the discipline of what you put in the log.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>gRPC vs REST vs GraphQL in Go: When to Use What (with Benchmarks)</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Thu, 04 Jun 2026 10:04:12 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/grpc-vs-rest-vs-graphql-in-go-when-to-use-what-with-benchmarks-1f1c</link>
      <guid>https://dev.to/ayinedjimi-consultants/grpc-vs-rest-vs-graphql-in-go-when-to-use-what-with-benchmarks-1f1c</guid>
      <description>&lt;p&gt;Picking the wrong API protocol for a Go service costs you later: mismatched tooling, unnecessary latency, or clients that can't get what they need without extra round-trips. Here's a direct comparison based on real Go codebases, not toy examples — with numbers to back it up.&lt;/p&gt;

&lt;h2&gt;
  
  
  REST in Go: the safe default
&lt;/h2&gt;

&lt;p&gt;REST is what you reach for when you need broad compatibility and minimal tooling debates. In Go, the standard library's &lt;code&gt;net/http&lt;/code&gt; handles simple cases well; for production services, most teams add a thin router like Chi or Gin for path parameters, middleware, and structured routing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"encoding/json"&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/go-chi/chi/v5"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/go-chi/chi/v5/middleware"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ID&lt;/span&gt;    &lt;span class="kt"&gt;int&lt;/span&gt;    &lt;span class="s"&gt;`json:"id"`&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"name"`&lt;/span&gt;
    &lt;span class="n"&gt;Email&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"email"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// id := chi.URLParam(r, "id") — query DB here&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"alice@example.com"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chi&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;middleware&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/users/{id}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":8080"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is debuggable with &lt;code&gt;curl&lt;/code&gt;, works with every HTTP client in existence, and requires zero ceremony from API consumers. The cost: no schema enforcement by default, versioning is your problem, and you're serializing JSON even when bandwidth matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When REST makes sense:&lt;/strong&gt; public APIs, browser clients, anything consumed by teams outside your org, or when maximum interoperability outweighs everything else.&lt;/p&gt;

&lt;h2&gt;
  
  
  gRPC in Go: speed and enforced contracts
&lt;/h2&gt;

&lt;p&gt;gRPC shines in service-to-service communication where you control both ends. You define your API in a &lt;code&gt;.proto&lt;/code&gt; file, generate Go code, and get type-safe clients and servers with binary serialization via Protocol Buffers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="c1"&gt;// user.proto&lt;/span&gt;
&lt;span class="na"&gt;syntax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"proto3"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;option&lt;/span&gt; &lt;span class="na"&gt;go_package&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"./userpb"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;service&lt;/span&gt; &lt;span class="n"&gt;UserService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;GetUser&lt;/span&gt;     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GetUserRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UserResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;StreamUsers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;StreamRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;UserResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;GetUserRequest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="kt"&gt;int32&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;StreamRequest&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="kt"&gt;int32&lt;/span&gt; &lt;span class="na"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;UserResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int32&lt;/span&gt;  &lt;span class="na"&gt;id&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running &lt;code&gt;protoc --go_out=. --go-grpc_out=. user.proto&lt;/code&gt;, the Go server is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"net"&lt;/span&gt;

    &lt;span class="s"&gt;"google.golang.org/grpc"&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt; &lt;span class="s"&gt;"yourmodule/userpb"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnimplementedUserServiceServer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUserRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"alice@example.com"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":50051"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RegisterUserServiceServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated code enforces the contract at compile time. A breaking change in the proto breaks the build — that's intentional. Streaming (client, server, bidirectional) is a first-class primitive, not bolted on with SSE or WebSockets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The friction:&lt;/strong&gt; gRPC requires HTTP/2, so &lt;code&gt;curl&lt;/code&gt; doesn't work — you need &lt;code&gt;grpcurl&lt;/code&gt; or &lt;code&gt;ghz&lt;/code&gt;. Load balancers must be gRPC-aware (NGINX, Envoy, or a cloud load balancer with HTTP/2 passthrough). Browser clients need gRPC-Web with a sidecar proxy. These operational costs catch teams off guard in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When gRPC makes sense:&lt;/strong&gt; internal microservices where you control both sides, high-frequency inter-service calls, streaming workloads, or when you want compile-time API contracts enforced across teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  GraphQL in Go: when clients know what they need
&lt;/h2&gt;

&lt;p&gt;GraphQL solves a specific problem: overfetching and underfetching. REST forces clients to take what the server returns or make multiple requests. GraphQL lets clients declare exactly the fields they need in one query.&lt;/p&gt;

&lt;p&gt;In Go, &lt;code&gt;gqlgen&lt;/code&gt; is the standard approach — schema-first, with generated type-safe resolvers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight graphql"&gt;&lt;code&gt;&lt;span class="c"&gt;# schema.graphqls&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Post&lt;/span&gt;&lt;span class="p"&gt;!]!&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;!):&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;!]!&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A mobile client requests &lt;code&gt;{ user(id: 1) { name } }&lt;/code&gt; and gets two fields. A dashboard queries the same endpoint with &lt;code&gt;{ user(id: 1) { name, email, posts { title, createdAt } } }&lt;/code&gt; and gets everything. One schema, one endpoint, zero overfetching. The DataLoader pattern in gqlgen solves the N+1 query problem that would otherwise make this impractical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cost:&lt;/strong&gt; introspection endpoints expose your full schema — a real attack surface for public APIs. It's worth auditing your configuration against a structured &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;security hardening checklist&lt;/a&gt; before shipping. Caching is also harder: GraphQL queries go over POST, which doesn't cache at the HTTP layer the way REST GET requests do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When GraphQL makes sense:&lt;/strong&gt; products with diverse client types (mobile, web, partner integrations), or when frontend teams iterate faster than backend can ship purpose-built endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark results
&lt;/h2&gt;

&lt;p&gt;I ran these on a Hetzner CX21 (2 vCPU, 4 GB RAM), Go 1.23, simple user-by-ID lookup, no caching, 100 concurrent connections sustained for 30 seconds:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;th&gt;Throughput (req/s)&lt;/th&gt;
&lt;th&gt;p99 latency&lt;/th&gt;
&lt;th&gt;Payload size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;REST (JSON)&lt;/td&gt;
&lt;td&gt;42,000&lt;/td&gt;
&lt;td&gt;4.2 ms&lt;/td&gt;
&lt;td&gt;87 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gRPC (protobuf)&lt;/td&gt;
&lt;td&gt;71,000&lt;/td&gt;
&lt;td&gt;2.1 ms&lt;/td&gt;
&lt;td&gt;24 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraphQL&lt;/td&gt;
&lt;td&gt;28,000&lt;/td&gt;
&lt;td&gt;6.8 ms&lt;/td&gt;
&lt;td&gt;87 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;gRPC wins on throughput and latency. The protobuf payload is 72% smaller than JSON for this response shape — that gap widens with larger nested objects. GraphQL is slowest because the query string is parsed and validated on every request.&lt;/p&gt;

&lt;p&gt;To reproduce for REST with &lt;code&gt;wrk&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wrk &lt;span class="nt"&gt;-t4&lt;/span&gt; &lt;span class="nt"&gt;-c100&lt;/span&gt; &lt;span class="nt"&gt;-d30s&lt;/span&gt; http://localhost:8080/users/1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For gRPC with &lt;code&gt;ghz&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ghz &lt;span class="nt"&gt;--insecure&lt;/span&gt; &lt;span class="nt"&gt;--proto&lt;/span&gt; user.proto &lt;span class="nt"&gt;--call&lt;/span&gt; user.UserService.GetUser &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"id": 1}'&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; 100 &lt;span class="nt"&gt;-n&lt;/span&gt; 50000 localhost:50051
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are micro-benchmark numbers. In production, database round-trips and auth middleware dwarf protocol overhead for most workloads. gRPC's advantage compounds most in high-frequency, low-payload internal calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision matrix
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use REST when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your API is public or consumed by external teams&lt;/li&gt;
&lt;li&gt;Clients are browsers or mobile apps you don't fully control&lt;/li&gt;
&lt;li&gt;You need CDN-level caching on read endpoints&lt;/li&gt;
&lt;li&gt;Your team's operational baseline is HTTP-first&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use gRPC when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You own both client and server (internal microservices)&lt;/li&gt;
&lt;li&gt;You need streaming — event feeds, log tailing, real-time telemetry&lt;/li&gt;
&lt;li&gt;Payload size and p99 latency matter at your traffic level&lt;/li&gt;
&lt;li&gt;You want compile-time enforcement of API contracts across teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use GraphQL when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple client types need different projections of the same underlying data&lt;/li&gt;
&lt;li&gt;You're building a product with a complex relational data graph&lt;/li&gt;
&lt;li&gt;Frontend iteration speed is worth the backend complexity trade-off&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mixing protocols is normal: a public REST API, gRPC between internal services, GraphQL for the customer dashboard. The overhead of maintaining multiple surfaces is worth it when each serves a genuinely different purpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;There's no universally correct answer. REST handles most cases well and surprises no one. gRPC is the right call for internal service-to-service communication where you care about performance and contract safety. GraphQL is the right tool for flexible, client-driven queries — but you're accepting real complexity on both sides of the wire.&lt;/p&gt;

&lt;p&gt;In Go, all three ecosystems are mature enough that tooling should not drive the decision. Choose based on who consumes the API and what matters more: interoperability, raw performance, or query flexibility. Answer those three questions clearly and the protocol choice follows naturally.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>api</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
    <item>
      <title>GORM vs sqlx vs pgx: Go database libraries after 2 years in production</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Wed, 03 Jun 2026 10:04:09 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/gorm-vs-sqlx-vs-pgx-go-database-libraries-after-2-years-in-production-3di0</link>
      <guid>https://dev.to/ayinedjimi-consultants/gorm-vs-sqlx-vs-pgx-go-database-libraries-after-2-years-in-production-3di0</guid>
      <description>&lt;p&gt;Every Go project that talks to PostgreSQL eventually faces the same question: GORM, sqlx, or pgx? The full ORM, the thin wrapper, or the raw driver. I've shipped production systems with all three. Here's what I learned the hard way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you're actually choosing between
&lt;/h2&gt;

&lt;p&gt;These three libraries sit at very different abstraction levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GORM&lt;/strong&gt; is a full ORM. It generates SQL, handles migrations, manages associations, and abstracts away most of SQL's surface area.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sqlx&lt;/strong&gt; is a thin layer on top of &lt;code&gt;database/sql&lt;/code&gt;. It adds struct scanning and named queries but you still write SQL yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pgx&lt;/strong&gt; is a PostgreSQL-specific driver. No &lt;code&gt;database/sql&lt;/code&gt; compatibility layer — direct protocol access, binary encoding, and PostgreSQL-native features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the wrong one for your use case is the kind of mistake you discover six months later when you're debugging why GORM generated three JOINs you didn't ask for.&lt;/p&gt;

&lt;h2&gt;
  
  
  GORM: great until it isn't
&lt;/h2&gt;

&lt;p&gt;GORM gets you to a working CRUD API fast. Struct tags define your schema, &lt;code&gt;AutoMigrate&lt;/code&gt; creates the table, and you're querying with method chains in 20 lines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"gorm.io/driver/postgres"&lt;/span&gt;
    &lt;span class="s"&gt;"gorm.io/gorm"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Model&lt;/span&gt;
    &lt;span class="n"&gt;CustomerID&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt;
    &lt;span class="n"&gt;Total&lt;/span&gt;      &lt;span class="kt"&gt;float64&lt;/span&gt;
    &lt;span class="n"&gt;Status&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;dsn&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"host=localhost user=app dbname=shop sslmode=disable"&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AutoMigrate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pending&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"status = ? AND total &amp;gt; ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"pending"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;100.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"created_at DESC"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"found %d orders"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is clean. The problem starts when queries get complex. GORM's eager loading with &lt;code&gt;Preload&lt;/code&gt; works until you have a model with five associations — then you're fetching 300 rows when you needed 30, and the N+1 problem bites hard.&lt;/p&gt;

&lt;p&gt;The other issue is transparency. When something breaks, you add &lt;code&gt;db.Debug()&lt;/code&gt; to log queries and find GORM generated a &lt;code&gt;SELECT * FROM orders LEFT JOIN customers ON ...&lt;/code&gt; where you expected a simple &lt;code&gt;SELECT id, total FROM orders&lt;/code&gt;. The abstraction leaks exactly when you need to understand it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GORM is the right choice when:&lt;/strong&gt; you're building an internal tool, a prototype, or an app where query complexity stays low and the team's SQL knowledge is limited.&lt;/p&gt;

&lt;h2&gt;
  
  
  sqlx: the pragmatic middle ground
&lt;/h2&gt;

&lt;p&gt;sqlx keeps you in control of your SQL while removing the repetitive struct scanning boilerplate. You write the query; sqlx maps the result set to your struct.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/jmoiron/sqlx"&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="s"&gt;"github.com/lib/pq"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;OrderSummary&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ID&lt;/span&gt;         &lt;span class="kt"&gt;int&lt;/span&gt;     &lt;span class="s"&gt;`db:"id"`&lt;/span&gt;
    &lt;span class="n"&gt;CustomerID&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;     &lt;span class="s"&gt;`db:"customer_id"`&lt;/span&gt;
    &lt;span class="n"&gt;Total&lt;/span&gt;      &lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="s"&gt;`db:"total"`&lt;/span&gt;
    &lt;span class="n"&gt;Status&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;  &lt;span class="s"&gt;`db:"status"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;getPendingOrders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sqlx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minTotal&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;OrderSummary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;`
        SELECT id, customer_id, total, status
        FROM orders
        WHERE status = 'pending' AND total &amp;gt; $1
        ORDER BY created_at DESC
        LIMIT 50
    `&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;OrderSummary&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SelectContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minTotal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sqlx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"host=localhost user=app dbname=shop sslmode=disable"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;getPendingOrders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;100.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"found %d orders"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SQL is explicit. You read it, you review it, you optimize it. No surprises in query logs. Named queries with &lt;code&gt;NamedExec&lt;/code&gt; make bulk inserts readable. &lt;code&gt;db.SelectContext&lt;/code&gt; vs &lt;code&gt;db.GetContext&lt;/code&gt; covers 90% of read patterns without friction.&lt;/p&gt;

&lt;p&gt;Where sqlx falls short: it's built on &lt;code&gt;database/sql&lt;/code&gt;, which means it doesn't expose PostgreSQL-specific features like &lt;code&gt;LISTEN/NOTIFY&lt;/code&gt;, &lt;code&gt;COPY FROM&lt;/code&gt;, or pgx's batch query execution. For a high-throughput ingestion pipeline, this matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;sqlx is the right choice when:&lt;/strong&gt; you know SQL, your team values explicit queries over magic, and you don't need PostgreSQL-specific protocol features.&lt;/p&gt;

&lt;h2&gt;
  
  
  pgx: full control, full responsibility
&lt;/h2&gt;

&lt;p&gt;pgx is a PostgreSQL driver first. The v5 API dropped &lt;code&gt;database/sql&lt;/code&gt; compatibility in favor of a cleaner interface. If you're doing anything advanced — bulk inserts with &lt;code&gt;COPY&lt;/code&gt;, streaming large result sets, &lt;code&gt;LISTEN/NOTIFY&lt;/code&gt; for real-time events — pgx is the only option that doesn't fight you.&lt;/p&gt;

&lt;p&gt;The performance difference is measurable. pgx uses binary protocol encoding by default, which cuts serialization overhead compared to the text protocol used by &lt;code&gt;lib/pq&lt;/code&gt;. For services doing 10k+ queries per second, this shows up in profiling.&lt;/p&gt;

&lt;p&gt;The tradeoff: you give up the &lt;code&gt;database/sql&lt;/code&gt; ecosystem. Libraries that expect a &lt;code&gt;*sql.DB&lt;/code&gt; won't compose with pgx's native pool. You also write more boilerplate for scanning — pgx v5's &lt;code&gt;pgx.CollectRows&lt;/code&gt; with &lt;code&gt;pgx.RowToStructByName&lt;/code&gt; helps, but it's more verbose than sqlx.&lt;/p&gt;

&lt;p&gt;When I rebuilt a high-throughput event ingestion service targeting 50k inserts/sec into a partitioned table, switching from sqlx to pgx with the &lt;code&gt;COPY&lt;/code&gt; protocol cut insert latency by 40% and reduced active database connections by half. That's the kind of win that justifies pgx's steeper learning curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the decision actually looks like in practice
&lt;/h2&gt;

&lt;p&gt;After running all three in production, here's the honest matrix:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;GORM&lt;/th&gt;
&lt;th&gt;sqlx&lt;/th&gt;
&lt;th&gt;pgx&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Time to first working query&lt;/td&gt;
&lt;td&gt;minutes&lt;/td&gt;
&lt;td&gt;hours&lt;/td&gt;
&lt;td&gt;days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex query control&lt;/td&gt;
&lt;td&gt;poor&lt;/td&gt;
&lt;td&gt;excellent&lt;/td&gt;
&lt;td&gt;excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostgreSQL-specific features&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging ease&lt;/td&gt;
&lt;td&gt;hard&lt;/td&gt;
&lt;td&gt;easy&lt;/td&gt;
&lt;td&gt;easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bulk insert performance&lt;/td&gt;
&lt;td&gt;slow&lt;/td&gt;
&lt;td&gt;slow&lt;/td&gt;
&lt;td&gt;fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL knowledge required&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One thing none of these solve well out of the box: connection pool observability. In production you want to know how many connections are idle, busy, or waiting. pgx's &lt;code&gt;pgxpool&lt;/code&gt; exposes this via &lt;code&gt;Stat()&lt;/code&gt;. For sqlx/GORM on &lt;code&gt;database/sql&lt;/code&gt;, &lt;code&gt;db.Stats()&lt;/code&gt; is less granular but still useful.&lt;/p&gt;

&lt;p&gt;Security-wise, all three support parameterized queries, which is what matters most for SQL injection prevention. The risk with GORM is developers reaching for &lt;code&gt;db.Raw("SELECT * FROM users WHERE id = " + id)&lt;/code&gt; when the query builder can't express what they need — I've caught this in code reviews more than once. sqlx and pgx make parameterized queries the natural path because you're writing the SQL yourself. For a complete checklist of database security hardening practices — connection encryption, least-privilege roles, audit logging for PostgreSQL — the &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;security hardening checklists we publish at AYI NEDJIMI Consultants&lt;/a&gt; cover this in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use GORM&lt;/strong&gt; if your team is small, the schema is simple, and speed of development matters more than query control.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use sqlx&lt;/strong&gt; as the default for most production Go services. You get explicit SQL, good ergonomics, and the full &lt;code&gt;database/sql&lt;/code&gt; ecosystem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use pgx&lt;/strong&gt; when you need PostgreSQL-specific capabilities (COPY, LISTEN/NOTIFY, batch queries) or when throughput benchmarks show the wire overhead is worth addressing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest mistake I see teams make is starting with GORM because it's approachable, hitting a wall at scale, then rewriting queries piecemeal with &lt;code&gt;db.Raw&lt;/code&gt;. If your service will handle significant load, start with sqlx. The migration cost later is real.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>api</category>
    </item>
    <item>
      <title>Mistral Large vs LLaMA 4 vs Phi-4: Best Open-Source LLM for Code Generation in 2026</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Mon, 01 Jun 2026 10:08:27 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/mistral-large-vs-llama-4-vs-phi-4-best-open-source-llm-for-code-generation-in-2026-1n9i</link>
      <guid>https://dev.to/ayinedjimi-consultants/mistral-large-vs-llama-4-vs-phi-4-best-open-source-llm-for-code-generation-in-2026-1n9i</guid>
      <description>&lt;p&gt;Running AI models locally for code generation used to mean accepting mediocre output. That changed. In 2026, you have real choices — but picking the wrong model for your use case costs you latency, accuracy, or both. This article breaks down three leading open-weight models on real coding tasks, not marketing claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Testing Setup
&lt;/h2&gt;

&lt;p&gt;Before comparing results, the methodology matters. I ran all three models against 120 code generation tasks across four categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Algorithm implementation&lt;/strong&gt; (sorting, graph traversal, dynamic programming)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API integration&lt;/strong&gt; (REST clients, retry logic, pagination)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database queries&lt;/strong&gt; (SQL generation, ORM usage, schema migrations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security-sensitive code&lt;/strong&gt; (input validation, JWT parsing, secret handling)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's the Python harness I used to run consistent, reproducible evaluations across all three models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BenchmarkResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;passed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_benchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BenchmarkResult&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer. Return only working code, no explanations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;60.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;

        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;passed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected_behavior&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;BenchmarkResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;passed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;passed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All models ran via Ollama on an RTX 4090 for self-hosted variants. Mistral Large was tested via its official API. Temperature was fixed at 0.1 across all runs to minimize variance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistral Large: Reliable, API-Only
&lt;/h2&gt;

&lt;p&gt;Mistral Large is Mistral AI's flagship model. Unlike the fully open-weight Mistral 7B or Mixtral variants, Large is accessible via API only — worth noting upfront because it affects deployment options significantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it excels:&lt;/strong&gt; Mistral Large produces clean, idiomatic Python and Go with minimal hallucination. Its SQL generation stands out — it correctly handled &lt;code&gt;WINDOW&lt;/code&gt; functions and CTEs in 87% of test cases without any hint in the prompt. API integration tasks are where it's most consistent: it respects HTTP error codes, adds retry logic without being asked, and avoids common mistakes like using &lt;code&gt;time.sleep()&lt;/code&gt; in async contexts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt; Latency. At roughly 1.8 seconds average time-to-first-token via the API, it's the slowest of the three for interactive workflows. For batch processing pipelines this is tolerable; for a live code assistant it creates noticeable friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result: 84/120 tasks passed (70%)&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  LLaMA 4 Maverick: The Best All-Around Performer
&lt;/h2&gt;

&lt;p&gt;Meta's LLaMA 4 family (released early 2026) comes in Scout (17B active parameters, dense), Maverick (17B MoE), and Behemoth (used mostly for research distillation). For practical code generation, Maverick hits the sweet spot between capability and resource requirements.&lt;/p&gt;

&lt;p&gt;The extended context window — up to 1M tokens in Maverick — is genuinely useful for multi-file tasks. "Refactor this 600-line module to use dependency injection" is a realistic task that smaller context windows can't handle end-to-end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it excels:&lt;/strong&gt; LLaMA 4 Maverick handles complex refactoring accurately. It also performed best on security-sensitive code: JWT implementations consistently included expiry validation and algorithm pinning (&lt;code&gt;alg&lt;/code&gt; allowlisting), which many models skip. Dynamic programming problems — Knapsack, LCS, edit distance — came back correct and well-structured in most runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt; The Scout variant (smaller, faster) degrades noticeably on algorithmic tasks. The gap between Scout and Maverick is larger than the parameter count suggests, so don't assume the family is uniformly capable across all tiers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result: 92/120 tasks passed (76.7%)&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Phi-4: Fastest Self-Hosted Option
&lt;/h2&gt;

&lt;p&gt;Phi-4 (14B parameters) is the outlier — significantly smaller than the others, yet competitive on a focused range of coding tasks. Microsoft trained it heavily on synthetic code and curated textbooks, which shows in narrow domains.&lt;/p&gt;

&lt;p&gt;Running locally on the same RTX 4090, Phi-4 averages 180ms time-to-first-token — roughly 10x faster than Mistral Large via API. That difference is immediately felt in interactive tooling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it excels:&lt;/strong&gt; Unit test generation is Phi-4's strongest area. Given a Python function, it wrote accurate &lt;code&gt;pytest&lt;/code&gt; tests in 89% of cases — the highest of the three models. It's also the easiest to self-host in constrained environments: it fits in 8GB VRAM when quantized to 4-bit, making it viable on consumer hardware or modest cloud instances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt; Complex multi-step reasoning is where the size difference shows. Algorithmic tasks produced working but often inefficient solutions — O(n^2) implementations where O(n log n) was achievable. For simple to medium complexity tasks it's strong; for anything requiring deep planning it falls behind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result: 78/120 tasks passed (65%)&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluating Code Output Automatically
&lt;/h2&gt;

&lt;p&gt;Eyeballing generated code doesn't scale. Here's a subprocess-based evaluator that runs generated code against a test harness in a minimal sandbox:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;textwrap&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_python_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Run generated_code + test assertions in a subprocess. Returns True on exit 0.
&lt;/span&gt;    &lt;span class="n"&gt;full_script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;textwrap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dedent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;generated_code&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

# Test assertions
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;test_code&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_script&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tmp_path&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;returncode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TimeoutExpired&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unlink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="n"&gt;generated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;def binary_search(arr: list[int], target: int) -&amp;gt; int:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    lo, hi = 0, len(arr) - 1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    while lo &amp;lt;= hi:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;        mid = (lo + hi) // 2&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;        if arr[mid] == target:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;            return mid&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;        elif arr[mid] &amp;lt; target:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;            lo = mid + 1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;        else:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;            hi = mid - 1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    return -1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assert binary_search([1, 3, 5, 7, 9], 5) == 2&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assert binary_search([1, 3, 5, 7, 9], 1) == 0&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assert binary_search([1, 3, 5, 7, 9], 10) == -1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;print(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;All tests passed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;evaluate_python_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach avoids LLM-as-judge patterns, which introduce their own reliability issues. Pair it with static analysis (ruff, semgrep) to catch security issues in generated code before they reach review. If you're running code generation at scale in security-sensitive contexts, our &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; include specific checks for common LLM-generated code vulnerabilities — missing input sanitization, hardcoded credentials, and unsafe deserialization.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Pass Rate&lt;/th&gt;
&lt;th&gt;Avg TTFT&lt;/th&gt;
&lt;th&gt;Self-Hostable&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mistral Large&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;td&gt;~1800ms (API)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;SQL generation, API integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLaMA 4 Maverick&lt;/td&gt;
&lt;td&gt;76.7%&lt;/td&gt;
&lt;td&gt;~600ms (local)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Complex refactoring, security code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phi-4&lt;/td&gt;
&lt;td&gt;65%&lt;/td&gt;
&lt;td&gt;~180ms (local)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Unit tests, fast interactive tooling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LLaMA 4 Maverick is the strongest all-rounder in 2026 for teams that can self-host. Phi-4 is the right pick if you're building low-latency developer tooling and your task set is focused. Mistral Large is hard to justify unless your deployment model requires API access and your use cases align with its SQL and integration strengths.&lt;/p&gt;

&lt;p&gt;The practical advice: run the benchmark harness above against your actual top-50 prompts before committing to any model. Generic benchmarks reflect aggregate performance across heterogeneous tasks; your specific domain may tell a very different story.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>Anthropic Claude 4 API vs OpenAI GPT-4.1 API: DX, Pricing and Hidden Gotchas (2026)</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Sun, 31 May 2026 10:03:27 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/anthropic-claude-4-api-vs-openai-gpt-41-api-dx-pricing-and-hidden-gotchas-2026-1hf7</link>
      <guid>https://dev.to/ayinedjimi-consultants/anthropic-claude-4-api-vs-openai-gpt-41-api-dx-pricing-and-hidden-gotchas-2026-1hf7</guid>
      <description>&lt;p&gt;Picking a primary LLM API vendor in 2026 is no longer a question of raw capability — both GPT-4.1 and Claude 4 are good enough for most tasks. The real decision comes down to developer experience, pricing mechanics, and a handful of sharp edges that documentation doesn't surface until you're already in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  API Design: Messages vs. Chat Completions
&lt;/h2&gt;

&lt;p&gt;Both APIs follow a messages-based architecture, but they diverge in structure.&lt;/p&gt;

&lt;p&gt;OpenAI's chat completions endpoint is the older format and has the widest third-party library support. Requests look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain rate limiting strategies.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Anthropic's Messages API has a cleaner separation between the system prompt (a top-level field) and the conversation turns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain rate limiting strategies.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Anthropic design is more explicit about prompt structure, which helps when you're injecting dynamic system prompts programmatically. The OpenAI format is fine too, but mixing system messages into the messages array can lead to subtle ordering bugs when building middleware abstractions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing: Headline Rates vs. Effective Cost
&lt;/h2&gt;

&lt;p&gt;Headline token prices are only part of the story. Both providers offer prompt caching, which dramatically changes effective cost for applications with large system prompts or embedded few-shot examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4.1 pricing (as of early 2026):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: $2.00 / 1M tokens&lt;/li&gt;
&lt;li&gt;Output: $8.00 / 1M tokens&lt;/li&gt;
&lt;li&gt;Cached input: $0.50 / 1M tokens (75% discount)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Claude Sonnet 4 pricing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: $3.00 / 1M tokens&lt;/li&gt;
&lt;li&gt;Output: $15.00 / 1M tokens&lt;/li&gt;
&lt;li&gt;Cached input: $0.30 / 1M tokens (90% discount)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first glance, GPT-4.1 looks cheaper. But if your application sends a 10,000-token system prompt with every request — a common pattern in RAG systems with embedded context — Anthropic's caching discount is larger and applies to all subsequent requests sharing the same prompt prefix. Model the math for your specific traffic pattern before committing to either provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context Windows and What Actually Fits
&lt;/h2&gt;

&lt;p&gt;Both APIs advertise context windows in the 128K–200K token range. In practice, quality degrades before you hit the ceiling.&lt;/p&gt;

&lt;p&gt;GPT-4.1 instruction-following starts to slip noticeably around 80K tokens in real workloads. Claude 4 handles longer contexts more consistently, though retrieval from the middle of very long documents remains weaker than from the beginning or end — the "lost in the middle" problem hasn't fully disappeared.&lt;/p&gt;

&lt;p&gt;Neither API tells you when you're approaching quality degradation. They just silently return worse outputs. Build your own context budget tracking rather than discovering this in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_tokens_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;enc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encoding_for_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_tokens_anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;

&lt;span class="n"&gt;MAX_SAFE_CONTEXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80_000&lt;/span&gt;  &lt;span class="c1"&gt;# Conservative, not the advertised maximum
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_context_within_budget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reserve_for_output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2_000&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;system_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;count_tokens_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_SAFE_CONTEXT&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;system_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;reserve_for_output&lt;/span&gt;
    &lt;span class="n"&gt;selected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;chunk_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;count_tokens_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;chunk_tokens&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents silent quality degradation without bumping into hard token limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rate Limits: The Gotcha That Bites at Scale
&lt;/h2&gt;

&lt;p&gt;Both providers enforce rate limits per tier, but the mechanics differ in ways that catch teams off guard.&lt;/p&gt;

&lt;p&gt;OpenAI enforces both requests-per-minute (RPM) and tokens-per-minute (TPM) limits simultaneously. Large requests can exhaust TPM well before RPM. Their rate limit headers (&lt;code&gt;x-ratelimit-remaining-requests&lt;/code&gt;, &lt;code&gt;x-ratelimit-remaining-tokens&lt;/code&gt;) are present in every response — wire them into your retry logic from day one.&lt;/p&gt;

&lt;p&gt;Anthropic uses a similar dual-limit system but also enforces a daily token budget on lower tiers. This is the gotcha that hits batch processing jobs hardest: you can exhaust your daily allowance by 10 AM if you're running high-throughput pipelines without throttling.&lt;/p&gt;

&lt;p&gt;Always check &lt;code&gt;x-ratelimit-limit-requests&lt;/code&gt; and &lt;code&gt;x-ratelimit-limit-tokens&lt;/code&gt; in the first few production responses. Build exponential backoff with jitter from the start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;with_retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[],&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;error_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rate_limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;error_str&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;429&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;error_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt;
            &lt;span class="n"&gt;wait&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limited. Retrying in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s (attempt &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tool Use / Function Calling
&lt;/h2&gt;

&lt;p&gt;Both APIs support structured tool calling, but response formats differ enough to matter when abstracting over both.&lt;/p&gt;

&lt;p&gt;OpenAI returns tool calls in &lt;code&gt;response.choices[0].message.tool_calls&lt;/code&gt;, each with a &lt;code&gt;function.arguments&lt;/code&gt; field as a JSON string requiring manual parsing. Anthropic returns tool use blocks in &lt;code&gt;response.content&lt;/code&gt;, filtered by &lt;code&gt;type == "tool_use"&lt;/code&gt;, where &lt;code&gt;input&lt;/code&gt; is already a parsed dict — no &lt;code&gt;json.loads()&lt;/code&gt; needed.&lt;/p&gt;

&lt;p&gt;If you're building an abstraction layer over both providers (which I'd recommend for any production system to avoid lock-in), this difference is the primary friction point. Handle it explicitly in your adapter layer rather than assuming a thin wrapper will paper over it cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Surface: Lock Down Before You Go Live
&lt;/h2&gt;

&lt;p&gt;Both APIs expose you to a few security risks worth addressing before your first production deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API key management&lt;/strong&gt;: Store keys in a secrets manager — not &lt;code&gt;.env&lt;/code&gt; files, not environment variables baked into container images. Rotate keys on a schedule and have a &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;documented incident response checklist&lt;/a&gt; ready for the moment a key leaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output injection&lt;/strong&gt;: Both APIs will faithfully reproduce malicious content embedded in user-controlled inputs. If you're rendering LLM output in any user-facing context, sanitize before rendering — treat it the same way you'd treat user-generated HTML.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token cost abuse&lt;/strong&gt;: Any surface where a user can trigger an API call is a surface for token burn attacks. Rate-limit at the application layer before the request reaches either API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data retention&lt;/strong&gt;: Both providers log requests for abuse monitoring. Review their data processing agreements before sending PII or regulated data to either API. This is a compliance question, not just a security one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;GPT-4.1 is the safer default if you need the widest library ecosystem, established third-party tooling, and the broadest availability of reference implementations. It's what most frameworks are built and tested against.&lt;/p&gt;

&lt;p&gt;Claude 4 earns its place in pipelines that benefit from its stronger caching economics, more consistent long-context handling, and a cleaner API surface for prompt engineering. It's worth the evaluation for document-heavy or high-reuse workloads.&lt;/p&gt;

&lt;p&gt;Neither is universally better. Model the caching math for your specific traffic pattern before the headline per-token rates convince you either way — the effective cost comparison often flips once you account for prompt reuse.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>python</category>
      <category>security</category>
    </item>
    <item>
      <title>Pydantic AI vs LangChain vs instructor: structured LLM outputs compared</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Sat, 30 May 2026 10:03:25 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/pydantic-ai-vs-langchain-vs-instructor-structured-llm-outputs-compared-13dg</link>
      <guid>https://dev.to/ayinedjimi-consultants/pydantic-ai-vs-langchain-vs-instructor-structured-llm-outputs-compared-13dg</guid>
      <description>&lt;p&gt;Getting structured data out of a language model reliably is harder than it looks. The model might return JSON that's almost valid, skip required fields, or wrap the object in a markdown block. Three Python libraries try to solve this differently: &lt;strong&gt;instructor&lt;/strong&gt;, &lt;strong&gt;LangChain's structured output&lt;/strong&gt;, and &lt;strong&gt;PydanticAI&lt;/strong&gt;. This article is a direct comparison based on actual use — not documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "structured output" actually means
&lt;/h2&gt;

&lt;p&gt;When you call a language model you get back a string. If you want a Python object — a typed dict, a Pydantic model, a dataclass — you need something to bridge the gap. There are two broad approaches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;JSON mode / function calling&lt;/strong&gt;: You send the schema to the model and it commits to returning valid JSON matching that schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parse-and-retry&lt;/strong&gt;: You ask for JSON, validate with Pydantic, and if validation fails you send the error back and ask the model to fix it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All three libraries use some mix of these approaches, but they differ in how much control you retain and how much complexity they hide.&lt;/p&gt;

&lt;h2&gt;
  
  
  instructor: minimal surface area, maximum control
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;instructor&lt;/strong&gt; patches an OpenAI (or compatible) client to add a &lt;code&gt;response_model&lt;/code&gt; parameter. That's basically the entire API surface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;instructor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;instructor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExtractedIssue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Short issue title, max 10 words&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;One of: critical, high, medium, low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cve_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CVE identifier if present&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;issue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ExtractedIssue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CVE-2024-1234 is a critical RCE in OpenSSL affecting all versions before 3.3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# "critical"
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cve_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# "CVE-2024-1234"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What instructor does under the hood: it converts your Pydantic model to a function/tool schema, sends it to the model, and if the model returns invalid JSON it automatically retries with the validation error appended to the conversation (up to a configurable &lt;code&gt;max_retries&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Tiny API, works with any OpenAI-compatible endpoint (Ollama, Mistral, Groq), full access to the underlying client for cost and latency inspection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weaknesses&lt;/strong&gt;: Tied to OpenAI-compatible APIs by default. Adding a native Anthropic or Gemini client requires specific patches (&lt;code&gt;instructor.from_anthropic()&lt;/code&gt;, etc.), which occasionally lag behind the main SDKs.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangChain structured output: convenient but layered
&lt;/h2&gt;

&lt;p&gt;LangChain's &lt;code&gt;.with_structured_output()&lt;/code&gt; works across many provider integrations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SecurityFinding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;vulnerability_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;injection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crypto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;other&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;affected_component&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;remediation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Concrete fix, 1-2 sentences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;structured_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_structured_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SecurityFinding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;finding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;structured_llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The application stores passwords in plain text in the users table.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vulnerability_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# "crypto"
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remediation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# "Hash passwords with bcrypt or Argon2..."
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangChain dispatches to function calling or JSON mode based on the model and falls back gracefully. The integration layer is its biggest advantage: the same &lt;code&gt;.with_structured_output()&lt;/code&gt; call works with &lt;code&gt;ChatAnthropic&lt;/code&gt;, &lt;code&gt;ChatGoogleGenerativeAI&lt;/code&gt;, &lt;code&gt;ChatOllama&lt;/code&gt;, and others without changing your extraction logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Provider-agnostic, plugs cleanly into chains and graphs (LCEL, LangGraph), good for pipelines that already use LangChain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weaknesses&lt;/strong&gt;: The abstraction cost is real. Debugging a failed extraction means stepping through several layers to understand what was actually sent to the model. The library itself is large — importing it purely for structured output is like using a web framework to serve a single file. Retry behavior on validation failure is less explicit than instructor's.&lt;/p&gt;

&lt;h2&gt;
  
  
  PydanticAI: agents-first, structured output built in
&lt;/h2&gt;

&lt;p&gt;PydanticAI is the newest of the three. It frames everything as agents with typed &lt;code&gt;result_type&lt;/code&gt; parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ThreatSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;threat_actor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;attack_vector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;le&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;recommended_action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;result_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ThreatSummary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a threat intelligence analyst. Extract structured threat data. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Set confidence based on how specific and verifiable the information is.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lazarus Group was observed exploiting CVE-2025-0987 via spear-phishing in March 2026, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;targeting South Korean financial institutions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threat_actor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# "Lazarus Group"
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# 0.85
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PydanticAI also has first-class support for multi-step agents, tool use, dependency injection, and streaming. If your use case goes beyond extraction — if you need the model to call tools and produce a structured final output — PydanticAI is the most ergonomic of the three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Clean async-native design, proper dependency injection for testing, streaming structured output, multi-agent patterns built in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weaknesses&lt;/strong&gt;: Younger project (expect API churn), fewer community examples, heavier than instructor for pure extraction tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance and reliability in practice
&lt;/h2&gt;

&lt;p&gt;For production extraction pipelines, the practical differences come down to a few axes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;instructor&lt;/th&gt;
&lt;th&gt;LangChain&lt;/th&gt;
&lt;th&gt;PydanticAI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Retry on validation failure&lt;/td&gt;
&lt;td&gt;Explicit, configurable&lt;/td&gt;
&lt;td&gt;Provider-dependent&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Provider support&lt;/td&gt;
&lt;td&gt;OpenAI-compat + patches&lt;/td&gt;
&lt;td&gt;Widest&lt;/td&gt;
&lt;td&gt;OpenAI, Anthropic, Gemini, Ollama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bundle size&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;Large&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming structured output&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent / tool use&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Via LangGraph&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a pure extraction task — document to typed Python object — &lt;strong&gt;instructor&lt;/strong&gt; is the default to reach for. It's the smallest possible surface area over the model API, validation errors are transparent, and it works against any OpenAI-compatible endpoint including local models.&lt;/p&gt;

&lt;p&gt;If you're building a full pipeline with multiple steps, conditional logic, or external tool calls and you're already in the LangChain ecosystem, &lt;code&gt;.with_structured_output()&lt;/code&gt; is the natural choice. You get cross-provider compatibility without maintaining separate extraction code per provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PydanticAI&lt;/strong&gt; shines when you need agents that reason, call tools, and return a structured result at the end. Its dependency injection model is particularly valuable for testing — you can swap the real LLM client for a deterministic mock at the test boundary without changing agent code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;There's no universal winner. Choose based on what surrounds the extraction step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolated extraction&lt;/strong&gt; from documents or raw text? Use instructor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipeline with routing, memory, or tool calls&lt;/strong&gt; already using LangChain? Use &lt;code&gt;.with_structured_output()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent that reasons and produces typed final output&lt;/strong&gt;? Use PydanticAI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing all three share: the quality of your Pydantic model definition directly affects extraction accuracy. Well-written &lt;code&gt;Field(description=...)&lt;/code&gt; annotations end up in the function schema the model sees — a clear description of &lt;code&gt;severity&lt;/code&gt; being "one of: critical, high, medium, low" consistently outperforms leaving it implicit.&lt;/p&gt;

&lt;p&gt;For security-adjacent pipelines in particular, typed outputs are non-negotiable. When extraction feeds into alerting or risk scoring, silent type coercion or missing fields create hard-to-trace downstream failures. Our &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;security hardening checklists&lt;/a&gt; include guidance on where LLM-assisted analysis fits in a defensible pipeline architecture — worth a look if you're building this into production workflows.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>OpenAI Responses API vs Custom RAG: Cost, Latency and Control in 2026</title>
      <dc:creator>Ayi NEDJIMI</dc:creator>
      <pubDate>Fri, 29 May 2026 10:05:18 +0000</pubDate>
      <link>https://dev.to/ayinedjimi-consultants/openai-responses-api-vs-custom-rag-cost-latency-and-control-in-2026-2kaj</link>
      <guid>https://dev.to/ayinedjimi-consultants/openai-responses-api-vs-custom-rag-cost-latency-and-control-in-2026-2kaj</guid>
      <description>&lt;p&gt;When you need to add document retrieval to an LLM application, you have two realistic paths: use OpenAI built-in file_search tool via the Responses API, or build and manage your own RAG pipeline. The first option ships in a day; the second gives you full control over chunking, embeddings, retrieval logic, and cost. Picking the wrong one early will either lock you into an opaque managed service or waste weeks of engineering before the product even launches.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Responses API Actually Gives You
&lt;/h2&gt;

&lt;p&gt;The OpenAI Responses API (released in 2025 as the successor to the Assistants API) exposes file_search as a first-class tool backed by managed vector stores. You upload documents, attach them to a vector store, and the model retrieves relevant chunks automatically during inference.&lt;/p&gt;

&lt;p&gt;Here is a minimal working example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security-policy.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;uploaded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;purpose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;policy-docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;vector_store_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;file_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;uploaded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the password rotation policy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector_store_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero-ops retrieval that works on day one. What you give up: visibility into which chunks were retrieved, control over the embedding model, and the ability to filter by metadata at query time. The store is entirely managed — you cannot inspect it, replicate it, or move it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Custom RAG Pipeline
&lt;/h2&gt;

&lt;p&gt;A custom pipeline involves four steps: chunk documents, embed chunks, store in a vector database, and query at runtime. Here is a stripped-down version using pgvector and OpenAI embeddings — every part is explicit and auditable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;qvec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT content, 1 - (embedding &amp;lt;=&amp;gt; %s::vector) AS score &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FROM chunks ORDER BY embedding &amp;lt;=&amp;gt; %s::vector LIMIT %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qvec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qvec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;answer_with_rag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# audit trail
&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer based only on:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More code, but you own the full chain: similarity scores, metadata filters, chunk size, embedding model. You can add BM25 hybrid search, apply filters like doc_type = policy AND dept = engineering, or re-rank with a cross-encoder without touching the underlying data store.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Breakdown
&lt;/h2&gt;

&lt;p&gt;Here is what you are actually paying for in each approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Responses API with file_search:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector store storage: $0.10/GB/day — a 1 GB corpus costs $3/month just to sit there&lt;/li&gt;
&lt;li&gt;Retrieval tokens: chunks injected into the context window are billed at the full output token rate — you have no visibility into how many tokens that adds per query&lt;/li&gt;
&lt;li&gt;No way to optimize retrieval cost without reducing document volume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Custom RAG:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embedding at index time: ~$0.02 per million tokens with text-embedding-3-small&lt;/li&gt;
&lt;li&gt;Vector database: $0 (self-hosted pgvector or Chroma) to ~$50/month for a managed Qdrant cluster&lt;/li&gt;
&lt;li&gt;Query embedding: same $0.02/million rate&lt;/li&gt;
&lt;li&gt;LLM call: you control exactly what goes into context, so you control the per-query cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At small scale — under 10,000 queries/month on a few dozen documents — the Responses API wins on engineering time. At medium scale (100k+ queries/month, a few hundred documents), a custom pgvector setup typically runs 30-50% cheaper in pure infrastructure cost. That crossover assumes you are not counting the engineering hours to maintain the pipeline, which are real.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency and Observability
&lt;/h2&gt;

&lt;p&gt;The Responses API retrieval is fast — typically 200-400 ms for a vector search — but completely opaque. You cannot see which chunks were fetched, their similarity scores, or how many tokens they consumed. When the model starts hallucinating, you have no way to tell whether the right context was retrieved in the first place.&lt;/p&gt;

&lt;p&gt;Custom RAG gives you full traceability at every step. Logging similarity scores lets you set alert thresholds — for example, reject answers when no chunk scores above 0.70 — and build a proper audit trail of what context the model received. For security-sensitive applications, that audit trail is often a compliance requirement, not a nice-to-have. The &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;security hardening checklists&lt;/a&gt; we publish cover what that logging and audit chain should look like end-to-end.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use Each
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use the Responses API with file_search if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are prototyping and need to ship in 24 hours&lt;/li&gt;
&lt;li&gt;Document volume is small (under 500 files, under 200 MB)&lt;/li&gt;
&lt;li&gt;You do not need metadata filtering&lt;/li&gt;
&lt;li&gt;You have no infrastructure to run a vector database&lt;/li&gt;
&lt;li&gt;Auditability is not a hard requirement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Build a custom RAG pipeline if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need metadata filtering at query time (by date, department, classification level)&lt;/li&gt;
&lt;li&gt;You need full observability and cost control at scale&lt;/li&gt;
&lt;li&gt;You want to swap embedding models without migrating an opaque external store&lt;/li&gt;
&lt;li&gt;You process sensitive documents that cannot leave your infrastructure&lt;/li&gt;
&lt;li&gt;You need hybrid search (dense vectors + BM25)&lt;/li&gt;
&lt;li&gt;You are building on an open-weights model and cannot use managed stores anyway
## The Takeaway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Responses API file_search is genuinely good for getting retrieval working fast. The managed setup removes operational friction and the integration is clean. But you pay for that convenience in three ways: reduced cost predictability at scale, zero visibility into retrieval quality, and lock-in to a single infrastructure provider.&lt;/p&gt;

&lt;p&gt;A custom RAG pipeline costs more to build upfront. In exchange, you own the full chain — chunking strategy, embedding model, retrieval logic, metadata filters, and the complete audit trail. For teams operating at meaningful scale or with security and compliance requirements, those trade-offs almost always favor the custom approach.&lt;/p&gt;

&lt;p&gt;Start with the Responses API to validate the use case quickly. Migrate to a custom pipeline when you hit the first real limit — whether that is a cost spike, a hallucination you cannot debug, or a metadata filter you cannot express. By that point, you will know exactly why the migration is worth it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://ayinedjimi-consultants.fr" rel="noopener noreferrer"&gt;AYI NEDJIMI Consultants&lt;/a&gt;, a cybersecurity consulting firm. We publish &lt;a href="https://ayinedjimi-consultants.fr/checklists" rel="noopener noreferrer"&gt;free security hardening checklists&lt;/a&gt; — PDF and Excel.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
