<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maksim Danilchenko</title>
    <description>The latest articles on DEV Community by Maksim Danilchenko (@dmaxdev).</description>
    <link>https://dev.to/dmaxdev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3851903%2F271b9f0d-273c-44e2-a2c7-0d4ec886b1c5.jpeg</url>
      <title>DEV Community: Maksim Danilchenko</title>
      <link>https://dev.to/dmaxdev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dmaxdev"/>
    <language>en</language>
    <item>
      <title>MarkItDown vs Docling vs Marker: PDF to Markdown for LLMs</title>
      <dc:creator>Maksim Danilchenko</dc:creator>
      <pubDate>Sun, 03 May 2026 08:36:33 +0000</pubDate>
      <link>https://dev.to/dmaxdev/markitdown-vs-docling-vs-marker-pdf-to-markdown-for-llms-571o</link>
      <guid>https://dev.to/dmaxdev/markitdown-vs-docling-vs-marker-pdf-to-markdown-for-llms-571o</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you're feeding PDFs into a RAG pipeline or an LLM context window in 2026, three open-source tools own the space: &lt;strong&gt;MarkItDown&lt;/strong&gt; (Microsoft, fast and shallow), &lt;strong&gt;Docling&lt;/strong&gt; (IBM, slow and structurally rich), and &lt;strong&gt;Marker&lt;/strong&gt; (Vik Paruchuri / Datalab, GPU-hungry and accuracy-first). None is universally best. Pick MarkItDown when your inputs are clean digital PDFs you control. Docling earns its keep when tables, formulas, or multi-column academic layouts dominate. Marker is the right call when you have GPU budget and need the highest fidelity you can get without paying a vendor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bother comparing these three
&lt;/h2&gt;

&lt;p&gt;Every team building on top of a language model hits the same wall eventually: most of the source material lives in PDFs. Contracts, research papers, datasheets, regulatory filings, internal SOPs all ship as PDF and don't paste cleanly into a context window. Even with the long-context tricks I covered in &lt;a href="https://www.danilchenko.dev/posts/recursive-language-models/" rel="noopener noreferrer"&gt;Recursive Language Models&lt;/a&gt;, you still need clean text on the way in — garbage tokenization is garbage retrieval. Markdown is the lowest-common-denominator format that an LLM actually reads well: headings, tables, lists, and code, without HTML's tag noise or PDF's positional spaghetti.&lt;/p&gt;

&lt;p&gt;I've spent the last three weeks rebuilding a RAG ingestion pipeline that pulls roughly 4,000 PDFs from a regulatory archive: a mix of scanned 1990s circulars, recent EU directive PDFs with embedded tables, and academic papers with two-column layouts and inline math. The pipeline previously used &lt;code&gt;pdfplumber&lt;/code&gt; plus a hand-rolled table heuristic, and it was a mess. So I sat down and tested the three tools that keep coming up in 2026 RAG threads on Reddit and HN. Here's what I found, what surprised me, and which one I shipped.&lt;/p&gt;

&lt;p&gt;This is a comparison post, not a tutorial, but each tool gets a runnable snippet so you can reproduce the smoke test on your own corpus before committing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The contenders, briefly
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/microsoft/markitdown" rel="noopener noreferrer"&gt;&lt;strong&gt;MarkItDown&lt;/strong&gt;&lt;/a&gt; is Microsoft's official converter, MIT-licensed, currently at v0.1.5 (released February 20, 2026). It supports a long tail of formats (PDF, DOCX, PPTX, XLSX, HTML, images, audio, even YouTube URLs and EPUBs) and dumps everything to Markdown. The architecture is a thin wrapper around format-specific Python libraries (&lt;code&gt;pdfminer.six&lt;/code&gt; for PDFs, &lt;code&gt;python-pptx&lt;/code&gt;, &lt;code&gt;mammoth&lt;/code&gt;, etc.). No models. No GPU. &lt;code&gt;pip install&lt;/code&gt; and you're done in about ten seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/docling-project/docling" rel="noopener noreferrer"&gt;&lt;strong&gt;Docling&lt;/strong&gt;&lt;/a&gt; is IBM Research's MIT-licensed converter, currently at v2.92.0 (released April 29, 2026, four days before this post). It uses a layout-detection model and an optional Visual Language Model called GraniteDocling (258M params) to preserve document structure. It runs on CPU by default but supports MLX acceleration on Apple Silicon and CUDA on NVIDIA. Output is a structured &lt;code&gt;DoclingDocument&lt;/code&gt; you can export to Markdown, JSON, or HTML.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/VikParuchuri/marker" rel="noopener noreferrer"&gt;&lt;strong&gt;Marker&lt;/strong&gt;&lt;/a&gt; is Datalab's GPL-3.0 converter (model weights under a custom Open RAIL-M license, free for personal and startup use under $2M revenue). Currently at v1.10.2 (released January 31, 2026). It bundles three of Datalab's own models (Surya for OCR + layout, Texify for formulas, and a layout/order model) into a tightly-tuned PDF pipeline. Peak VRAM is 5GB per worker. Datalab claims 122 pages/second on an H100, which translates to roughly 0.18s/page.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I tested
&lt;/h2&gt;

&lt;p&gt;Three input documents, picked to stress different parts of each tool:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A 14-page EU regulation PDF&lt;/strong&gt; (digital, multi-column, dense tables) — the realistic ingestion case.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A 1996 scanned circular&lt;/strong&gt; (300 DPI, blurry, OCR territory) — the worst case.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A 22-page arXiv paper&lt;/strong&gt; (LaTeX-rendered, two-column, inline math, figures with captions) — the academic case.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Hardware: a Hetzner CPX31 (4 vCPU, 8GB RAM, no GPU) for the CPU runs, and a local M2 Pro MacBook with 32GB unified memory for the MLX/Apple-Silicon runs. No H100, so I can't reproduce Marker's GPU benchmark numbers; those stay flagged as reported by Datalab.&lt;/p&gt;

&lt;p&gt;I scored each output on three axes: &lt;strong&gt;wall-clock speed&lt;/strong&gt;, &lt;strong&gt;table fidelity&lt;/strong&gt; (does the markdown table match the visual table cell-for-cell?), and &lt;strong&gt;structural sanity&lt;/strong&gt; (do headings come through as &lt;code&gt;##&lt;/code&gt;, do lists stay as lists, do figure captions survive?).&lt;/p&gt;

&lt;h2&gt;
  
  
  MarkItDown: the fast, shallow workhorse
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;markitdown&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MarkItDown&lt;/span&gt;

&lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MarkItDown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enable_plugins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eu-regulation.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole API. There's no model to download, no GPU to provision, no config knobs that matter. On the 14-page EU regulation, MarkItDown finished in 0.6 seconds on the Hetzner box. On the 22-page arXiv paper, 1.1 seconds. On the scanned 1996 circular, it produced almost no usable output. &lt;code&gt;pdfminer.six&lt;/code&gt; can't OCR, and MarkItDown doesn't run OCR by default.&lt;/p&gt;

&lt;p&gt;The structural fidelity is where it falls apart. Tables in the EU regulation came out as run-on paragraphs of cell content with no pipe characters, no row breaks, nothing a downstream parser could recover. The arXiv paper's two-column layout interleaved left and right columns sentence by sentence, which is exactly what you don't want when chunking for retrieval. Headings sometimes survived as &lt;code&gt;## Heading&lt;/code&gt;, sometimes came through as bold text, sometimes vanished into the body.&lt;/p&gt;

&lt;p&gt;Where MarkItDown shines is the rest of its format support. Throw it a PowerPoint deck and it produces clean Markdown with a heading per slide. Hand it a Word doc and it preserves nested lists and tables. The PDF path is the weak link, not the tool itself. If your corpus is 80% PowerPoint and 20% PDF, MarkItDown is the right answer. If it's the other way around, you're going to spend more time post-processing than you save.&lt;/p&gt;

&lt;p&gt;One detail Microsoft buries in the README: MarkItDown can call Azure Document Intelligence as an OCR backend if you set the &lt;code&gt;docintel_endpoint&lt;/code&gt; argument. That promotes it from "useless on scans" to "competitive on scans," but you're now paying Azure per page (roughly $1.50 per 1,000 pages on the read tier as of last check, with volume discounts above 1M pages), which is a different conversation.&lt;/p&gt;
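
&lt;p&gt;For reference, the wiring is a single constructor argument. This is a minimal sketch; the endpoint URL is a placeholder for your own Azure resource, and authentication flows through your configured Azure credentials:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from markitdown import MarkItDown

# Scanned pages get OCR'd by Azure Document Intelligence instead of
# falling through pdfminer.six. The endpoint below is a placeholder.
md = MarkItDown(docintel_endpoint="https://&lt;your-resource&gt;.cognitiveservices.azure.com/")
result = md.convert("scanned-circular.pdf")
print(result.text_content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
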

&lt;h2&gt;
  
  
  Docling: slow, model-heavy, structurally accurate
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docling.document_converter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocumentConverter&lt;/span&gt;

&lt;span class="n"&gt;converter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentConverter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;converter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eu-regulation.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export_to_markdown&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same shape. Underneath, the first call downloads roughly 600MB of model weights from Hugging Face into your &lt;code&gt;~/.cache&lt;/code&gt;. Subsequent runs are faster but never as fast as MarkItDown. On the Hetzner CPX31, the EU regulation took 41 seconds. On the M2 Pro with MLX, it dropped to 9 seconds. The arXiv paper took 78 seconds CPU, 14 seconds MLX. The scanned 1996 circular finally produced legible Markdown at 52 seconds, because Docling's layout model can route scanned regions through its OCR path automatically.&lt;/p&gt;

&lt;p&gt;Tables are where Docling earns its keep. The EU regulation's three-row, six-column tariff schedule came out as a clean Markdown table with the right cells in the right rows. The arXiv paper's results table preserved its column headers and row labels exactly. I didn't have to write a single regex to clean up output. That alone justifies the 50× wall-clock penalty for my use case.&lt;/p&gt;

&lt;p&gt;Docling's &lt;code&gt;DoclingDocument&lt;/code&gt; intermediate representation is more useful than I expected. You can export to Markdown, but you can also walk the document tree programmatically and pull out figures with their captions, tables as structured cells, or extract just the abstracts of academic papers without parsing the Markdown twice. For an ingestion pipeline that needs to chunk by section heading, this is a real win.&lt;/p&gt;
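
&lt;p&gt;A minimal sketch of that tree walk, against the Docling 2.x API as I used it (worth re-checking the method names against the docs for your pinned version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from docling.document_converter import DocumentConverter

doc = DocumentConverter().convert("eu-regulation.pdf").document

# Tables arrive as structured cells; no Markdown re-parsing needed.
for table in doc.tables:
    df = table.export_to_dataframe()  # one pandas DataFrame per table
    print(df.shape)

# Figures keep a handle to their captions.
for picture in doc.pictures:
    print(picture.caption_text(doc))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
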

&lt;p&gt;The downside, beyond speed: install size. The base wheel pulls in PyTorch, Transformers, and several CV libraries. A clean &lt;code&gt;pip install docling&lt;/code&gt; in a fresh Docker image weighs in around 2.4GB. If you're packaging this for AWS Lambda, you're going to have a bad day. ECS Fargate or a real container runtime is the realistic deployment story.&lt;/p&gt;

&lt;h2&gt;
  
  
  Marker: GPU-hungry, accuracy-first
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;marker.converters.pdf&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PdfConverter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;marker.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_model_dict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;marker.output&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;text_from_rendered&lt;/span&gt;

&lt;span class="n"&gt;converter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PdfConverter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact_dict&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;create_model_dict&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;rendered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;converter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eu-regulation.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;text_from_rendered&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rendered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three lines instead of two, but the API is still small. The first call downloads Datalab's Surya, Texify, and layout models (about 1.1GB). On the Hetzner CPX31 (CPU only), Marker took 2 minutes 14 seconds on the EU regulation, 4 minutes 30 seconds on the arXiv paper. CPU is not Marker's preferred surface. On the M2 Pro with MPS, those dropped to 38 seconds and 71 seconds, which is still slower than Docling-MLX but produced visibly better math output on the arXiv paper.&lt;/p&gt;

&lt;p&gt;Where Marker pulls ahead: inline LaTeX. The arXiv paper's equations came through as &lt;code&gt;$\hat{y} = \mathbf{W}x + b$&lt;/code&gt;-style spans inside the Markdown, which is exactly what you want if you're handing the result to GPT or Claude. Both models consume raw LaTeX natively and reason about equations more accurately when the structure is preserved. Docling rendered most equations as image references with garbled OCR'd text. MarkItDown skipped them.&lt;/p&gt;

&lt;p&gt;Marker's structural recall on tables was a tie with Docling on simple grids and slightly worse on nested headers (a multi-row column header in the EU regulation came out flattened). On figures, Marker has the cleanest behavior of the three: it extracts each figure as a separate PNG, references it from the Markdown with a relative path, and pulls the caption from the surrounding text. For a RAG pipeline that wants to embed image regions separately, this is a big quality-of-life upgrade.&lt;/p&gt;

&lt;p&gt;Don't skip the license fine print. Marker's &lt;em&gt;code&lt;/em&gt; is GPL-3.0, which is fine for most server-side workloads. The &lt;em&gt;model weights&lt;/em&gt; are under Datalab's modified Open RAIL-M: free for personal use, research, and startups under $2M annual revenue/funding. Above that threshold, you need a commercial license from Datalab. If you're a Series-B-and-up company, factor in the procurement conversation before standardizing on Marker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-head: the numbers
&lt;/h2&gt;

&lt;p&gt;All wall-clock numbers below are from my own runs, not vendor benchmarks. The H100 column for Marker is reported by Datalab and not independently verified.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;MarkItDown&lt;/th&gt;
&lt;th&gt;Docling&lt;/th&gt;
&lt;th&gt;Marker&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;GPL-3.0 + Open RAIL-M (weights)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Install size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~80MB&lt;/td&gt;
&lt;td&gt;~2.4GB&lt;/td&gt;
&lt;td&gt;~1.5GB + 1.1GB models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stars (May 2026)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;120k&lt;/td&gt;
&lt;td&gt;59k&lt;/td&gt;
&lt;td&gt;34.6k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU required?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Optional (helps a lot)&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EU reg, CPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.6s&lt;/td&gt;
&lt;td&gt;41s&lt;/td&gt;
&lt;td&gt;2m 14s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;arXiv paper, MLX/MPS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.1s (CPU)&lt;/td&gt;
&lt;td&gt;14s&lt;/td&gt;
&lt;td&gt;71s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scanned 1996 PDF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Empty&lt;/td&gt;
&lt;td&gt;Legible&lt;/td&gt;
&lt;td&gt;Legible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tables (simple)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Broken&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tables (nested headers)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Broken&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;OK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inline math&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Skipped&lt;/td&gt;
&lt;td&gt;Image+OCR&lt;/td&gt;
&lt;td&gt;LaTeX preserved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Figures + captions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lost&lt;/td&gt;
&lt;td&gt;Caption only&lt;/td&gt;
&lt;td&gt;Image extracted + caption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reported H100 throughput&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;122 pages/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three takeaways from this matrix:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;MarkItDown is in a different speed class from the other two. If your PDFs are clean and your downstream consumer doesn't care about table structure, it buys you a 50–100× speedup. That gap is the difference between processing a 10K-document corpus in an afternoon and in a week.&lt;/li&gt;
&lt;li&gt;Docling and Marker are close on accuracy and far apart on dependencies. Docling is the easier deploy. Marker is the better GPU citizen.&lt;/li&gt;
&lt;li&gt;Nobody ships table-fidelity Markdown without a model. The 2024-era pure-Python parsers (&lt;code&gt;pdfplumber&lt;/code&gt;, &lt;code&gt;pdfminer&lt;/code&gt;) do not produce LLM-grade output on real-world documents, and MarkItDown is essentially a polished wrapper around those parsers.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When to pick which
&lt;/h2&gt;

&lt;p&gt;A short decision matrix, based on what I actually shipped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pick MarkItDown&lt;/strong&gt; if your PDFs are digital-native and structurally simple, your corpus skews toward Office formats, you need to deploy to a constrained environment (Lambda, edge), or you're prototyping and don't yet know if PDF quality will be a bottleneck. I keep MarkItDown around for the PowerPoint and Word path even when Docling handles the PDFs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick Docling&lt;/strong&gt; if tables, formulas, or multi-column layouts dominate your corpus, you don't have a GPU, you want a clean intermediate representation you can walk programmatically, or you're on Apple Silicon and want MLX acceleration. This is what I shipped for the EU regulatory archive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick Marker&lt;/strong&gt; if you have GPU budget, your corpus is heavy on academic papers with inline math, you need clean per-figure extraction for downstream image embedding, or you're below the $2M revenue threshold for the model-weights license. For a research-paper pipeline at any reasonable scale, Marker is the strongest answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building something general (a Notion-style "drop a PDF, get clean Markdown" feature, say), I'd run a tiered pipeline: MarkItDown first, fall back to Docling if MarkItDown's output looks structurally broken (zero tables detected, very low headings-to-body ratio), and fall back to Marker only for the documents that contain math. Most documents land in the fast path; the slow path only fires when it's worth the cost.&lt;/p&gt;
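
&lt;p&gt;Sketched as code, the router looks something like this. The heuristics and the 1% heading threshold are starting points I made up for illustration, not tuned values; calibrate on your own corpus:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

def looks_broken(md_text: str) -&gt; bool:
    # Fallback criteria from above: zero tables detected and a very
    # low headings-to-body ratio. The 1% threshold is a made-up default.
    has_table_row = bool(re.search(r"^\|.+\|$", md_text, re.M))
    headings = len(re.findall(r"^#{1,6} ", md_text, re.M))
    lines = max(md_text.count("\n") + 1, 1)
    return not has_table_row and headings / lines &lt; 0.01

def has_math(text: str) -&gt; bool:
    # "LaTeX-shaped tokens": inline $...$ spans or \begin{...} blocks.
    return bool(re.search(r"\$[^$\n]+\$|\\begin\{", text))

def convert(path: str) -&gt; str:
    from markitdown import MarkItDown
    fast = MarkItDown().convert(path).text_content
    if not looks_broken(fast):
        return fast  # fast path: most documents stop here
    if has_math(fast):  # math-heavy documents go to the Marker pool
        from marker.converters.pdf import PdfConverter
        from marker.models import create_model_dict
        from marker.output import text_from_rendered
        rendered = PdfConverter(artifact_dict=create_model_dict())(path)
        text, _, _ = text_from_rendered(rendered)
        return text
    from docling.document_converter import DocumentConverter
    return DocumentConverter().convert(path).document.export_to_markdown()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
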

&lt;h2&gt;
  
  
  What the hosted alternatives offer
&lt;/h2&gt;

&lt;p&gt;Two closed-source services keep coming up in the same threads, and they belong in any honest comparison even though this post focuses on open source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.mistral.ai/capabilities/document/" rel="noopener noreferrer"&gt;Mistral Document AI&lt;/a&gt;&lt;/strong&gt; is a hosted endpoint priced around $2 per 1,000 pages at last check (about half that with batch discounts). Reported quality on tables and math sits between Docling and Marker, with the operational benefit of zero local compute. I haven't run it on the same corpus as the open-source three, so treat that as second-hand impression rather than a measured ranking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://reducto.ai/" rel="noopener noreferrer"&gt;Reducto&lt;/a&gt;&lt;/strong&gt; is more expensive (roughly $15 per 1,000 pages on the base tier) and is reportedly the strongest option on truly nasty inputs (handwritten annotations, multi-column scientific PDFs with inline formulas). Same caveat: I haven't paid for it on this corpus, so the framing is based on third-party benchmarks and a couple of recent HN threads, not my own runs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you care about time-to-market more than unit economics, paying a vendor is a perfectly defensible choice. If your corpus is large enough that the per-page bill would dominate your budget, the open-source path wins on cost even after you account for engineering time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;The fastest path to evaluating all three on your own corpus:&lt;/p&gt;

&lt;p&gt;If your usual stack is &lt;code&gt;uv&lt;/code&gt; instead of plain &lt;code&gt;pip&lt;/code&gt; (worth it — see &lt;a href="https://www.danilchenko.dev/posts/uv-vs-pip-vs-poetry/" rel="noopener noreferrer"&gt;uv vs pip vs Poetry&lt;/a&gt; for the case), swap the install command for &lt;code&gt;uv pip install&lt;/code&gt;. The rest is identical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# fresh venv&lt;/span&gt;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate

&lt;span class="c"&gt;# install all three&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'markitdown[all]'&lt;/span&gt; docling marker-pdf

&lt;span class="c"&gt;# point them at the same PDF&lt;/span&gt;
python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"from markitdown import MarkItDown; print(MarkItDown().convert('test.pdf').text_content)"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; out_markitdown.md
python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"from docling.document_converter import DocumentConverter; print(DocumentConverter().convert('test.pdf').document.export_to_markdown())"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; out_docling.md
python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"from marker.converters.pdf import PdfConverter; from marker.models import create_model_dict; from marker.output import text_from_rendered; r = PdfConverter(artifact_dict=create_model_dict())('test.pdf'); t,_,_ = text_from_rendered(r); print(t)"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; out_marker.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Diff the three Markdown outputs against your eyeballs. Whichever one you stop arguing with first is your tool. If you end up arguing with all three, you probably need a hosted service or a custom layout model, and that's a different post.&lt;/p&gt;

&lt;p&gt;For deployment, my opinionated default in 2026: Docling in a slim Python container, with MarkItDown as the fast-path fallback for clean digital PDFs. Marker stays in a GPU pool for the academic-paper subset, called only when the document's first page contains LaTeX-shaped tokens. If you're exposing the converter as a tool for an LLM agent rather than a batch job, wrap it as an MCP server — see &lt;a href="https://www.danilchenko.dev/posts/fastmcp-mcp-server/" rel="noopener noreferrer"&gt;Build a real MCP server with FastMCP&lt;/a&gt; for the Python pattern I use for exactly this kind of glue.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Which is better, MarkItDown or Docling?
&lt;/h3&gt;

&lt;p&gt;For PDFs specifically, Docling produces materially better output on tables, formulas, and multi-column layouts. MarkItDown is roughly 50–100× faster on simple digital PDFs but loses structural information that downstream RAG retrieval depends on. For non-PDF formats (PPTX, DOCX, EPUB), MarkItDown is the better tool because Docling's PDF-first model architecture isn't applied there.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the fastest PDF-to-Markdown tool for LLMs?
&lt;/h3&gt;

&lt;p&gt;MarkItDown, by a wide margin: it's a thin wrapper around &lt;code&gt;pdfminer.six&lt;/code&gt; and runs in well under a second per page on CPU. The price is structural fidelity: it produces unusable output on tables, broken column ordering on multi-column PDFs, and nothing at all on scanned documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Docling work without a GPU?
&lt;/h3&gt;

&lt;p&gt;Yes. Docling runs on CPU by default and is the only one of the three I'd recommend for CPU-only environments where accurate output still has to hold up. CPU runs are slower (40–80 seconds per multi-page document in my tests), but the output quality is the same. Apple Silicon with MLX cuts wall-clock by 3–5× without needing a discrete GPU.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Marker free to use commercially?
&lt;/h3&gt;

&lt;p&gt;The code is GPL-3.0 and free to use, including commercially. The model weights are under Datalab's modified Open RAIL-M license: free for research, personal use, and any startup under $2M in annual revenue/funding. Above that threshold, you need a commercial license from Datalab.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I convert a PDF to Markdown for a RAG pipeline?
&lt;/h3&gt;

&lt;p&gt;Pick the converter that matches your accuracy and compute budget: MarkItDown for clean digital PDFs and constrained compute, Docling for tables and CPU-only deploys, Marker for math and GPU-equipped pipelines. Then chunk the resulting Markdown by heading (split on &lt;code&gt;^##&lt;/code&gt;), embed each chunk with a sentence-transformer or a hosted embedding API, and store in your vector DB of choice. The converter quality directly determines retrieval quality, so it's worth A/B-testing two or three options on a representative slice of your corpus before committing.&lt;/p&gt;
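
&lt;p&gt;The chunking step is short if the converter produced clean headings. A minimal version of the &lt;code&gt;^##&lt;/code&gt; split:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

def chunk_by_heading(markdown: str) -&gt; list[str]:
    # Split before each level-2 heading; every chunk keeps its heading
    # so the embedding carries the section context with it.
    parts = re.split(r"(?m)^(?=## )", markdown)
    return [p.strip() for p in parts if p.strip()]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
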

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/microsoft/markitdown" rel="noopener noreferrer"&gt;MarkItDown — github.com/microsoft/markitdown&lt;/a&gt; — official Microsoft repo, MIT license, v0.1.5 release notes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/docling-project/docling" rel="noopener noreferrer"&gt;Docling — github.com/docling-project/docling&lt;/a&gt; — official IBM Research repo, MIT license, v2.92.0 release notes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/VikParuchuri/marker" rel="noopener noreferrer"&gt;Marker — github.com/VikParuchuri/marker&lt;/a&gt; — official Datalab repo, GPL-3.0 + Open RAIL-M weights, v1.10.2 release notes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2408.09869" rel="noopener noreferrer"&gt;Docling whitepaper — arXiv:2408.09869&lt;/a&gt; — IBM's technical report on the Docling architecture&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.mistral.ai/capabilities/document/" rel="noopener noreferrer"&gt;Mistral Document AI&lt;/a&gt; — hosted alternative referenced for pricing context&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Three usable tools, three honest tradeoffs. MarkItDown wins on speed and Office-format coverage. Docling wins on table fidelity and CPU-friendliness. Marker wins on math and figure handling, if you can spare the GPU. Pick the tool whose weakness you can live with rather than the one with the flashiest benchmark. Your bottleneck is downstream retrieval quality, not converter throughput, and the converter you pick is the input to that quality.&lt;/p&gt;

&lt;p&gt;For my regulatory-archive job: Docling, MLX-accelerated on the M2 Pro for nightly batch ingestion, with MarkItDown as a fast-path optimization for the documents I already know are clean. The 4,000-PDF backfill ran over a weekend. The downstream retrieval got measurably better the day I switched off the old &lt;code&gt;pdfplumber&lt;/code&gt; script, which was the whole point of the rebuild.&lt;/p&gt;

</description>
      <category>markitdown</category>
      <category>docling</category>
      <category>marker</category>
      <category>rag</category>
    </item>
    <item>
      <title>Python t-strings (PEP 750): A Practical Tutorial With Real Examples</title>
      <dc:creator>Maksim Danilchenko</dc:creator>
      <pubDate>Mon, 27 Apr 2026 08:35:20 +0000</pubDate>
      <link>https://dev.to/dmaxdev/python-t-strings-pep-750-a-practical-tutorial-with-real-examples-12cf</link>
      <guid>https://dev.to/dmaxdev/python-t-strings-pep-750-a-practical-tutorial-with-real-examples-12cf</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Python 3.14 ships &lt;a href="https://peps.python.org/pep-0750/" rel="noopener noreferrer"&gt;t-strings (PEP 750)&lt;/a&gt;, a new string literal that looks like an f-string but returns a &lt;code&gt;Template&lt;/code&gt; object instead of a finished &lt;code&gt;str&lt;/code&gt;. You get the static parts and the interpolated values separately, so a library author can sanitize, escape, parameterize, or defer the rendering. I rewrote a small SQLite budget tracker I keep on my laptop using t-strings and the diff was about ten lines, but the SQL injection class of bug is now structurally impossible. Library authors will get the most use out of them; application code will mostly read t-strings rather than write them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why f-strings stop being enough
&lt;/h2&gt;

&lt;p&gt;I have been writing Python since 2.6 and f-strings, introduced in 3.6, were a clear win. They replaced &lt;code&gt;%&lt;/code&gt; formatting and &lt;code&gt;.format()&lt;/code&gt; for almost everything I do. The catch is that f-strings &lt;em&gt;evaluate immediately&lt;/em&gt;: the moment you write &lt;code&gt;f"... {x} ..."&lt;/code&gt;, Python calls &lt;code&gt;format()&lt;/code&gt; on each interpolated value and concatenates the result. There is no hook, no transform, no chance for a library to inspect what got plugged into the gaps.&lt;/p&gt;

&lt;p&gt;That sounds academic until you watch a junior engineer write &lt;code&gt;cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")&lt;/code&gt; for the third time. The "use parameterized queries" lecture is technically correct and operationally ignored, because the f-string syntax is too inviting. The &lt;a href="https://www.python.org/downloads/release/python-3144/" rel="noopener noreferrer"&gt;Python 3.14.4 release page&lt;/a&gt; calls this out indirectly: PEP 750 lists "domain-specific languages that need string-like syntax with safe interpolation" as the headline use case.&lt;/p&gt;

&lt;p&gt;T-strings close that hole. Instead of producing a &lt;code&gt;str&lt;/code&gt;, the literal &lt;code&gt;t"..."&lt;/code&gt; produces a &lt;code&gt;string.templatelib.Template&lt;/code&gt; instance. The library author decides what happens next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup: getting Python 3.14 on your machine
&lt;/h2&gt;

&lt;p&gt;You need Python 3.14 or newer. The current stable as of this post is 3.14.4 (April 7, 2026). On macOS I use &lt;code&gt;uv&lt;/code&gt; because it manages interpreter installs without touching the system Python (I &lt;a href="https://www.danilchenko.dev/posts/uv-vs-pip-vs-poetry/" rel="noopener noreferrer"&gt;compared uv against pip and Poetry here&lt;/a&gt; if you want the long version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;uv python &lt;span class="nb"&gt;install &lt;/span&gt;3.14
&lt;span class="nv"&gt;$ &lt;/span&gt;uv python pin 3.14
&lt;span class="nv"&gt;$ &lt;/span&gt;uv run python &lt;span class="nt"&gt;--version&lt;/span&gt;
Python 3.14.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you prefer pyenv or the official installer, both work. The point is that t-strings are syntax. There is no &lt;code&gt;from __future__ import&lt;/code&gt; to backport them. A &lt;code&gt;t"..."&lt;/code&gt; literal is a &lt;code&gt;SyntaxError&lt;/code&gt; on 3.13 and earlier.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Already on Python 3.14? See my walkthrough of the &lt;a href="https://www.danilchenko.dev/posts/python-314-free-threading/" rel="noopener noreferrer"&gt;free-threaded build&lt;/a&gt; for the GIL story that shipped alongside t-strings.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The shape of a Template object
&lt;/h2&gt;

&lt;p&gt;Open a 3.14 REPL and try this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pythonista&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;site&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;danilchenko.dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, {name}! Welcome to {site}!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="nc"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;templatelib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Hello, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;! Welcome to &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interpolations&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Interpolation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Pythonista&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="nc"&gt;Interpolation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;danilchenko.dev&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;site&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Pythonista&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;danilchenko.dev&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That output is the whole secret. Three observations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;strings&lt;/code&gt; is a tuple of the &lt;em&gt;literal&lt;/em&gt; fragments around your interpolations. There is always exactly one more string than there are interpolations (some may be empty).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;interpolations&lt;/code&gt; is a tuple of &lt;code&gt;Interpolation&lt;/code&gt; objects, each with four fields: &lt;code&gt;value&lt;/code&gt;, &lt;code&gt;expression&lt;/code&gt;, &lt;code&gt;conversion&lt;/code&gt;, and &lt;code&gt;format_spec&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The order is implicit: the template alternates &lt;code&gt;strings[0], interpolations[0], strings[1], interpolations[1], ...&lt;/code&gt;. To walk the alternation explicitly you iterate the template directly: &lt;code&gt;for item in template: ...&lt;/code&gt; (a minimal renderer right after this list shows the full walk).&lt;/li&gt;
&lt;/ol&gt;
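
&lt;p&gt;To make the alternation concrete, here is a minimal renderer that reproduces f-string output from a &lt;code&gt;Template&lt;/code&gt;. It's pointless in production (just use an f-string), but it's the skeleton that every processor in the rest of this post follows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from string.templatelib import Interpolation, Template

def render_like_f(template: Template) -&gt; str:
    # Walk the alternation, applying !r/!s/!a conversions and the
    # format spec the way an f-string would.
    out: list[str] = []
    for item in template:
        if isinstance(item, Interpolation):
            value = item.value
            if item.conversion == "r":
                value = repr(value)
            elif item.conversion == "s":
                value = str(value)
            elif item.conversion == "a":
                value = ascii(value)
            out.append(format(value, item.format_spec))
        else:
            out.append(item)
    return "".join(out)

# &gt;&gt;&gt; render_like_f(t"Hello, {name}! Welcome to {site}!")
# 'Hello, Pythonista! Welcome to danilchenko.dev!'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
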

&lt;p&gt;The &lt;code&gt;Interpolation&lt;/code&gt; class deserves a closer look because the &lt;code&gt;expression&lt;/code&gt; field is what makes structured logging click:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interpolations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Pythonista&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;
&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversion&lt;/span&gt;        &lt;span class="c1"&gt;# 'a', 'r', 's', or None
&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format_spec&lt;/span&gt;       &lt;span class="c1"&gt;# '' here, '04d' in t"{n:04d}", etc.
&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The library author can read &lt;code&gt;i.expression&lt;/code&gt; to learn the &lt;em&gt;source code&lt;/em&gt; of the placeholder, not just its evaluated value. That single attribute makes structured logs, SQL placeholder names, and i18n catalog keys trivial to build. None of that was reachable from f-strings.&lt;/p&gt;
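
&lt;p&gt;As a taste of the structured-logging case, here is a hedged sketch; the helper name and output shape are mine, not anything from the stdlib:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from string.templatelib import Interpolation, Template

def log_event(template: Template) -&gt; dict:
    # The rendered message plus a fields dict keyed by each
    # placeholder's source expression, courtesy of .expression.
    fields = {i.expression: i.value for i in template.interpolations}
    message = "".join(
        str(item.value) if isinstance(item, Interpolation) else item
        for item in template
    )
    return {"message": message, "fields": fields}

# &gt;&gt;&gt; user_id = 42
# &gt;&gt;&gt; log_event(t"user {user_id} logged in")
# {'message': 'user 42 logged in', 'fields': {'user_id': 42}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
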

&lt;h2&gt;
  
  
  A SQL helper that makes injection structurally impossible
&lt;/h2&gt;

&lt;p&gt;Here is the shortest practical example I keep around. The function turns any t-string into a (&lt;code&gt;query&lt;/code&gt;, &lt;code&gt;params&lt;/code&gt;) pair compatible with &lt;code&gt;sqlite3.execute()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# safe_sql.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;string.templatelib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parameterize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;safe_sql expected a t-string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parameterize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;safe_sql&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;parameterize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:memory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE TABLE users (name TEXT, age INT)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO users VALUES (?, ?)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Anna&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;evil&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="s"&gt;; DROP TABLE users;--&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parameterize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE name = {evil}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;
&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE name = ?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="s"&gt;; DROP TABLE users;--&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE name = {evil}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE age &amp;gt; {30}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Anna&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The injected payload lands in the parameter tuple. SQLite treats it as plain data because the SQL text itself never contains the value — it contains a &lt;code&gt;?&lt;/code&gt;. Compare against the f-string version that everyone has typed at 11 PM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Don't do this. Ever.
&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE name = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;evil&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# sqlite3.OperationalError: near "DROP": syntax error
# (and on a different DB it would have happily dropped the table)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The structural win is that &lt;code&gt;parameterize&lt;/code&gt; only accepts a &lt;code&gt;Template&lt;/code&gt;. If a junior writes &lt;code&gt;query(conn, f"...")&lt;/code&gt;, &lt;a href="https://www.danilchenko.dev/posts/ty-vs-mypy-vs-pyright/" rel="noopener noreferrer"&gt;your type checker of choice&lt;/a&gt; catches it at the type boundary, and at runtime the &lt;code&gt;isinstance&lt;/code&gt; check raises immediately. The unsafe path requires affirmative effort to reach.&lt;/p&gt;

&lt;p&gt;I tried this on a small budget tracker that lives in &lt;code&gt;~/code/buckets&lt;/code&gt;. The before-state was a smattering of &lt;code&gt;f"UPDATE accounts SET balance = {amount} WHERE id = '{acct}'"&lt;/code&gt; calls, written for an audience of one (me) but badly enough that I would not run it as a service. After porting to t-strings the diff was 8 lines of changed source plus a 14-line &lt;code&gt;safe_sql.py&lt;/code&gt; helper. Every place that used to take a string now takes a &lt;code&gt;Template&lt;/code&gt;. The class of bug went away because the wrong shape no longer typechecks.&lt;/p&gt;

&lt;h2&gt;
  
  
  HTML escaping with the same pattern
&lt;/h2&gt;

&lt;p&gt;The exact same skeleton produces an HTML helper. The PEP 750 reference and &lt;a href="https://realpython.com/python-t-strings/" rel="noopener noreferrer"&gt;Real Python's t-strings tutorial&lt;/a&gt; both show this; here is my version with the imports tightened:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# safe_html.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;escape&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;string.templatelib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;safe_html expected a t-string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;quote&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;safe_html&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;render&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;bad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;script&amp;gt;alert(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;xss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&amp;lt;/script&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;p&amp;gt;Hello, {bad}!&amp;lt;/p&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;p&amp;gt;Hello, &amp;amp;lt;script&amp;amp;gt;alert(&amp;amp;#x27;xss&amp;amp;#x27;)&amp;amp;lt;/script&amp;amp;gt;!&amp;lt;/p&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The static &lt;code&gt;&amp;lt;p&amp;gt;...&amp;lt;/p&amp;gt;&lt;/code&gt; passes through untouched because it is part of &lt;code&gt;template.strings&lt;/code&gt;. The interpolated &lt;code&gt;bad&lt;/code&gt; lands in &lt;code&gt;template.interpolations&lt;/code&gt;, gets escaped, and only then concatenated. A reader cannot accidentally introduce XSS by writing user input into the template — the escaper sees user input &lt;em&gt;as user input&lt;/em&gt;, not as a string fragment.&lt;/p&gt;

&lt;p&gt;A more capable HTML library could special-case attribute interpolation, dict-of-attrs syntax, and component-style nesting. The PEP itself gestures at this with the &lt;code&gt;t"&amp;lt;img {attributes} /&amp;gt;"&lt;/code&gt; example where &lt;code&gt;attributes&lt;/code&gt; is a dict.&lt;/p&gt;
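
&lt;p&gt;That direction is easy to prototype. Here is a minimal sketch of the idea (the &lt;code&gt;render_tag&lt;/code&gt; name and the dict handling are mine, not from the PEP):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# attr_sketch.py - hypothetical extension of the render() loop above
from html import escape
from string.templatelib import Template

def render_tag(template: Template) -&amp;gt; str:
    if not isinstance(template, Template):
        raise TypeError("render_tag expected a t-string")
    out: list[str] = []
    for item in template:
        if isinstance(item, str):
            out.append(item)
        elif isinstance(item.value, dict):
            # A dict-valued interpolation expands to key="value" pairs
            out.append(" ".join(
                f'{escape(str(k))}="{escape(str(v), quote=True)}"'
                for k, v in item.value.items()
            ))
        else:
            out.append(escape(str(item.value), quote=True))
    return "".join(out)

attrs = {"src": "cat.png", "alt": 'a "cat"'}
print(render_tag(t'&amp;lt;img {attrs} /&amp;gt;'))
# &amp;lt;img src="cat.png" alt="a &amp;amp;quot;cat&amp;amp;quot;" /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;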

&lt;h2&gt;
  
  
  Logging without paying for the format string
&lt;/h2&gt;

&lt;p&gt;Python's &lt;code&gt;logging&lt;/code&gt; module has a long-standing performance trick: pass a format string and the args separately, like &lt;code&gt;log.info("user %s logged in", user_id)&lt;/code&gt;, so that &lt;code&gt;%&lt;/code&gt;-formatting only runs if the log line actually fires. F-strings break this — the format runs at the call site whether or not &lt;code&gt;INFO&lt;/code&gt; is enabled.&lt;/p&gt;

&lt;p&gt;T-strings give you the trick back, plus structured context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# t_log.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;string.templatelib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LazyTemplate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A logging-safe wrapper that defers rendering.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LazyTemplate expected a t-string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__str__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format_spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interpolations&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;LazyTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Used like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t_log&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%(message)s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anna&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;42.7&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;t_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;login complete for {user} in {latency:.1f}ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;login&lt;/span&gt; &lt;span class="n"&gt;complete&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;anna&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="mf"&gt;42.7&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anna&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;42.7&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the level is raised to &lt;code&gt;WARNING&lt;/code&gt;, the &lt;code&gt;__str__&lt;/code&gt; call never runs and the JSON dict is never built. You get human-readable messages and machine-readable context from one literal, with no extra cost when the log line is suppressed.&lt;/p&gt;
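
&lt;p&gt;If you want to see the laziness with your own eyes, give a value a side-effecting &lt;code&gt;__str__&lt;/code&gt;. A quick check, assuming the &lt;code&gt;t_log&lt;/code&gt; module above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# laziness_check.py - suppressed levels never render the template
import logging
from t_log import info

logging.basicConfig(level=logging.WARNING)

class Expensive:
    def __str__(self):
        print("rendering!")  # fires only when a record is actually emitted
        return "expensive"

info(t"value = {Expensive()}")  # prints nothing: INFO is suppressed
logging.getLogger().setLevel(logging.INFO)
info(t"value = {Expensive()}")  # now the template renders and the line prints
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;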

&lt;h2&gt;
  
  
  f-strings vs t-strings — a side-by-side cheat sheet
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;f-string (&lt;code&gt;f"..."&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;t-string (&lt;code&gt;t"..."&lt;/code&gt;)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Return type&lt;/td&gt;
&lt;td&gt;&lt;code&gt;str&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string.templatelib.Template&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluated when?&lt;/td&gt;
&lt;td&gt;Immediately at the literal&lt;/td&gt;
&lt;td&gt;Expressions evaluate immediately; rendering waits for the consumer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Where to use&lt;/td&gt;
&lt;td&gt;Application code, print, simple formatting&lt;/td&gt;
&lt;td&gt;Library APIs that take user-controlled values&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can a library hook in?&lt;/td&gt;
&lt;td&gt;No — already concatenated&lt;/td&gt;
&lt;td&gt;Yes — via &lt;code&gt;template.strings&lt;/code&gt; and &lt;code&gt;template.interpolations&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knows the source expression?&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes — &lt;code&gt;interpolation.expression&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can replace any &lt;code&gt;str&lt;/code&gt;?&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No — needs a renderer first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backportable?&lt;/td&gt;
&lt;td&gt;No (3.6+)&lt;/td&gt;
&lt;td&gt;No (3.14+ syntax)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw variant?&lt;/td&gt;
&lt;td&gt;&lt;code&gt;rf"..."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;rt"..."&lt;/code&gt; or &lt;code&gt;tr"..."&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The "Can replace any &lt;code&gt;str&lt;/code&gt;?" row is the source of every gotcha. Because a &lt;code&gt;Template&lt;/code&gt; is a separate type, you cannot pass it to &lt;code&gt;print&lt;/code&gt; and expect formatted output, you cannot send it to a function that calls &lt;code&gt;len()&lt;/code&gt; on it, and &lt;code&gt;t"hi" + " there"&lt;/code&gt; raises &lt;code&gt;TypeError&lt;/code&gt;. The library author has to provide a renderer, which is by design and which surprises people on the first day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caveats and gotchas worth knowing
&lt;/h2&gt;

&lt;p&gt;A few things tripped me up the first week, in order of how much time each one cost me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You cannot mix &lt;code&gt;f&lt;/code&gt; and &lt;code&gt;t&lt;/code&gt; prefixes.&lt;/strong&gt; &lt;code&gt;ft"..."&lt;/code&gt; is a &lt;code&gt;SyntaxError&lt;/code&gt;. If you need both behaviors in one file, write two literals. The accepted prefix combinations are &lt;code&gt;t&lt;/code&gt;, &lt;code&gt;T&lt;/code&gt;, &lt;code&gt;rt&lt;/code&gt;, &lt;code&gt;Rt&lt;/code&gt;, &lt;code&gt;rT&lt;/code&gt;, &lt;code&gt;RT&lt;/code&gt;, &lt;code&gt;tr&lt;/code&gt;, &lt;code&gt;tR&lt;/code&gt;, &lt;code&gt;Tr&lt;/code&gt;, &lt;code&gt;TR&lt;/code&gt;. No others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;Template&lt;/code&gt; does not implement &lt;code&gt;__len__&lt;/code&gt; or &lt;code&gt;__contains__&lt;/code&gt;.&lt;/strong&gt; This is deliberate — the value can change once you render it, and a library author may render to something other than a string. If you want length, render first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;isinstance(x, Template)&lt;/code&gt; is the right check, not &lt;code&gt;isinstance(x, str)&lt;/code&gt;.&lt;/strong&gt; I wasted thirty minutes on a function that did &lt;code&gt;if not x:&lt;/code&gt; on a template. That calls &lt;code&gt;__bool__&lt;/code&gt;, and a template is always truthy, so the guard never fired. Type-check explicitly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Empty static segments are still in &lt;code&gt;template.strings&lt;/code&gt;.&lt;/strong&gt; A literal &lt;code&gt;t"{a}{b}"&lt;/code&gt; produces &lt;code&gt;strings = ("", "", "")&lt;/code&gt;. Direct iteration over the template silently drops the empties, so &lt;code&gt;for item in template:&lt;/code&gt; already does the right thing for renderers; the empties only show up if you read &lt;code&gt;template.strings&lt;/code&gt; directly.&lt;/p&gt;
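
&lt;p&gt;In REPL form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; a, b = 1, 2
&amp;gt;&amp;gt;&amp;gt; tmpl = t"{a}{b}"
&amp;gt;&amp;gt;&amp;gt; tmpl.strings
('', '', '')
&amp;gt;&amp;gt;&amp;gt; [item.value for item in tmpl]  # iteration skips the empty statics
[1, 2]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;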

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;expression&lt;/code&gt; field is the source text, not a variable lookup.&lt;/strong&gt; &lt;code&gt;t"{a + b}"&lt;/code&gt; gives an &lt;code&gt;Interpolation&lt;/code&gt; whose &lt;code&gt;expression&lt;/code&gt; is &lt;code&gt;"a + b"&lt;/code&gt; and whose &lt;code&gt;value&lt;/code&gt; is the evaluated result. Useful for debug logs; do not try to round-trip the expression back through &lt;code&gt;eval&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There is no f-string to t-string converter.&lt;/strong&gt; A linter could rewrite trivial cases, but in general the migration is a behavior change and has to be reviewed by hand. I ported the SQL spots first because the security argument made the priority obvious; the rest can wait until the helpers exist for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subprocess support is still a draft.&lt;/strong&gt; &lt;a href="https://peps.python.org/pep-0787/" rel="noopener noreferrer"&gt;PEP 787&lt;/a&gt; proposes letting &lt;code&gt;subprocess.run(t"...", shell=True)&lt;/code&gt; shell-quote interpolated values automatically. As of 3.14.4 it is &lt;em&gt;deferred&lt;/em&gt;: the authors chose to gather experience from experimental implementations first, with Python 3.15 as the target. For now, write your own &lt;code&gt;shlex.quote&lt;/code&gt; renderer if you need one.&lt;/p&gt;
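
&lt;p&gt;Here is what such a renderer might look like. A minimal sketch; the &lt;code&gt;sh&lt;/code&gt; helper name is mine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# sh_sketch.py - a hand-rolled stand-in until PEP 787 lands
import shlex
from string.templatelib import Template

def sh(template: Template) -&amp;gt; str:
    if not isinstance(template, Template):
        raise TypeError("sh expected a t-string")
    parts: list[str] = []
    for item in template:
        if isinstance(item, str):
            parts.append(item)
        else:
            # Every interpolated value gets shell-quoted
            parts.append(shlex.quote(str(item.value)))
    return "".join(parts)

# Usage: subprocess.run(sh(t"grep -r {pattern} {path}"), shell=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;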

&lt;h2&gt;
  
  
  When &lt;em&gt;not&lt;/em&gt; to use t-strings
&lt;/h2&gt;

&lt;p&gt;I keep seeing developers reach for t-strings everywhere because the security framing is compelling. Most code does not need them.&lt;/p&gt;

&lt;p&gt;Application code that builds a one-shot human-readable message (a print statement, an exception text, a debug log) should keep using f-strings. The reason f-strings are so popular is that they are the right tool for the boring 90% of string formatting. T-strings only pay for themselves when there is a &lt;em&gt;consumer&lt;/em&gt; of the literal that needs to inspect it. If the consumer is &lt;code&gt;print&lt;/code&gt;, an f-string is shorter, faster, and easier to read.&lt;/p&gt;

&lt;p&gt;The rule of thumb I am using: t-string the API, f-string the body. Library boundaries take templates; everything inside the function uses regular strings.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are t-strings in Python?
&lt;/h3&gt;

&lt;p&gt;T-strings are a new string literal in Python 3.14, introduced by &lt;a href="https://peps.python.org/pep-0750/" rel="noopener noreferrer"&gt;PEP 750&lt;/a&gt;. The syntax mirrors f-strings — &lt;code&gt;t"hello {name}"&lt;/code&gt; — but the literal evaluates to a &lt;code&gt;string.templatelib.Template&lt;/code&gt; instance instead of a &lt;code&gt;str&lt;/code&gt;. The Template exposes the static fragments and interpolated values separately, so library code can intercept and transform them before final rendering.&lt;/p&gt;
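
&lt;p&gt;A minimal demonstration on a 3.14 interpreter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from string.templatelib import Template
&amp;gt;&amp;gt;&amp;gt; name = "world"
&amp;gt;&amp;gt;&amp;gt; tmpl = t"hello {name}"
&amp;gt;&amp;gt;&amp;gt; isinstance(tmpl, Template)
True
&amp;gt;&amp;gt;&amp;gt; tmpl.strings, tmpl.interpolations[0].value
(('hello ', ''), 'world')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;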

&lt;h3&gt;
  
  
  How are t-strings different from f-strings?
&lt;/h3&gt;

&lt;p&gt;F-strings produce a &lt;code&gt;str&lt;/code&gt; immediately. T-strings produce a &lt;code&gt;Template&lt;/code&gt; object. F-strings are convenient for application code; t-strings are designed for library APIs that need to sanitize, escape, parameterize, or defer the interpolation. You can iterate a Template to walk the alternation of static strings and &lt;code&gt;Interpolation&lt;/code&gt; objects; you cannot do that with an f-string because the f-string is already collapsed into a flat string.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do t-strings prevent SQL injection?
&lt;/h3&gt;

&lt;p&gt;They do not prevent it on their own — they make a safe API expressible. Because the library function only ever sees the user input as &lt;code&gt;interpolation.value&lt;/code&gt;, never as part of the SQL fragment, you can replace each interpolation with a &lt;code&gt;?&lt;/code&gt; placeholder and pass the values through the database driver's parameter binding. The driver does the actual escaping. The structural change is that the unsafe path (raw f-string concatenation) is no longer the path of least resistance.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Python version supports t-strings?
&lt;/h3&gt;

&lt;p&gt;Python 3.14, released October 7, 2025, with the latest patch being 3.14.4 on April 7, 2026. T-strings are a syntactic feature, so there is no backport. A &lt;code&gt;t"..."&lt;/code&gt; literal will raise &lt;code&gt;SyntaxError&lt;/code&gt; on 3.13 and earlier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can you pass a t-string anywhere a string is expected?
&lt;/h3&gt;

&lt;p&gt;No. &lt;code&gt;Template&lt;/code&gt; is not a subclass of &lt;code&gt;str&lt;/code&gt;. Passing one to &lt;code&gt;print()&lt;/code&gt; will print the repr of the Template, not the rendered text. Concatenation with &lt;code&gt;+&lt;/code&gt; raises &lt;code&gt;TypeError&lt;/code&gt;. The library that consumes the t-string has to provide a renderer. This is by design. Silently coercing to &lt;code&gt;str&lt;/code&gt; would defeat the security guarantees t-strings are built for.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will t-strings replace f-strings?
&lt;/h3&gt;

&lt;p&gt;No. F-strings remain the right tool for application-level string formatting. T-strings target library and DSL authors. Most Python users will &lt;em&gt;write&lt;/em&gt; t-strings only when calling SQL, HTML, logging, i18n, or shell helpers, and will &lt;em&gt;consume&lt;/em&gt; them rarely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://peps.python.org/pep-0750/" rel="noopener noreferrer"&gt;PEP 750 — Template Strings&lt;/a&gt; — the accepted proposal that introduced t-strings, with the full motivation, rationale, and reference implementation.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.python.org/3/library/string.templatelib.html" rel="noopener noreferrer"&gt;string.templatelib — Python 3.14.4 documentation&lt;/a&gt; — official module reference for &lt;code&gt;Template&lt;/code&gt; and &lt;code&gt;Interpolation&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.python.org/downloads/release/python-3144/" rel="noopener noreferrer"&gt;Python 3.14.4 release notes&lt;/a&gt; — the patch release used for examples in this post.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.python.org/3/whatsnew/3.14.html" rel="noopener noreferrer"&gt;What's new in Python 3.14&lt;/a&gt; — full changelog including t-strings, free-threading, and the experimental JIT.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://realpython.com/python-t-strings/" rel="noopener noreferrer"&gt;Real Python — Python 3.14: Template Strings&lt;/a&gt; — secondary tutorial with additional examples used to cross-check the SQL and HTML helpers.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://peps.python.org/pep-0787/" rel="noopener noreferrer"&gt;PEP 787 — Safer subprocess usage using t-strings&lt;/a&gt; — deferred proposal for &lt;code&gt;subprocess&lt;/code&gt; and &lt;code&gt;shlex&lt;/code&gt; integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;T-strings are a small syntax change with most of the impact concentrated in library APIs. Your daily &lt;code&gt;print(f"hello {name}")&lt;/code&gt; keeps working as before. But over the next few years, expect &lt;code&gt;sqlite3&lt;/code&gt;, &lt;code&gt;psycopg&lt;/code&gt;, &lt;code&gt;httpx&lt;/code&gt;, &lt;code&gt;subprocess&lt;/code&gt;, and the structured logging libraries to grow t-string-aware constructors. The code samples in this tutorial are short on purpose: once you understand &lt;code&gt;template.strings&lt;/code&gt; and &lt;code&gt;template.interpolations&lt;/code&gt;, every other helper is a variation on the same loop. Try it on the next SQL or HTML hot spot in your codebase. The diff is small, and the class of bug it removes is large.&lt;/p&gt;

</description>
      <category>python</category>
      <category>python314</category>
      <category>pep750</category>
      <category>tutorials</category>
    </item>
    <item>
      <title>Hetzner vs DigitalOcean 2026: Real Numbers After the Price Hike</title>
      <dc:creator>Maksim Danilchenko</dc:creator>
      <pubDate>Sun, 19 Apr 2026 02:14:11 +0000</pubDate>
      <link>https://dev.to/dmaxdev/hetzner-vs-digitalocean-2026-real-numbers-after-the-price-hike-35g0</link>
      <guid>https://dev.to/dmaxdev/hetzner-vs-digitalocean-2026-real-numbers-after-the-price-hike-35g0</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Hetzner raised most cloud server prices by 30–37% on April 1, 2026 (steeper on some US tiers). Despite that, it's still 50–70% cheaper than DigitalOcean for equivalent CPU and RAM, and it includes 4–5× more bandwidth on the same tier. Recent migration write-ups land on roughly the same number: about $14K saved per year on a mid-sized stack. Switching is worth it if you're running your own MySQL/Postgres and Nginx; it isn't worth it if you depend on managed databases, App Platform, or Spaces. I run two production boxes on Hetzner from Cyprus and one droplet on DigitalOcean for a US-only side project, so the rest of this comes straight from current bills.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed on April 1, 2026
&lt;/h2&gt;

&lt;p&gt;Hetzner &lt;a href="https://www.hetzner.com/pressroom/statement-price-adjustment/" rel="noopener noreferrer"&gt;announced the price adjustment in late February&lt;/a&gt; and rolled it out a month later. The company cited rising hardware acquisition costs; &lt;a href="https://www.tomshardware.com/tech-industry/hetzner-to-raise-prices-by-up-to-37-percent-from-april-1" rel="noopener noreferrer"&gt;Tom's Hardware&lt;/a&gt; framed it against a 171% year-over-year jump in DRAM prices. The change applies to both new orders and existing products. There was no grandfathering.&lt;/p&gt;

&lt;p&gt;The increases aren't uniform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud servers in Germany and Finland&lt;/strong&gt;: +30% to +37% depending on tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud servers in the US&lt;/strong&gt;: broadly similar, with some tiers seeing larger jumps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated servers&lt;/strong&gt;: smaller bumps, mostly in setup fees&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Box and bandwidth pricing&lt;/strong&gt;: largely unchanged&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DigitalOcean hasn't raised pricing in 2026. The gap narrowed, but it didn't close.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current pricing: side by side
&lt;/h2&gt;

&lt;p&gt;This is a head-to-head on the tiers that come up the most in real billing tickets: small workhorse VMs, mid-sized API servers, and "I just want a Postgres host" boxes. All numbers are post-April-1 Hetzner pricing, converted at €1 = $1.07.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier (Hetzner SKU)&lt;/th&gt;
&lt;th&gt;RAM / vCPU / Disk&lt;/th&gt;
&lt;th&gt;Hetzner Cloud (FSN/HEL)&lt;/th&gt;
&lt;th&gt;DigitalOcean Basic&lt;/th&gt;
&lt;th&gt;Hetzner Bandwidth&lt;/th&gt;
&lt;th&gt;DO Bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Entry (CPX22)&lt;/td&gt;
&lt;td&gt;4 GB / 2 vCPU / 40 GB NVMe&lt;/td&gt;
&lt;td&gt;€7.99 / mo (~$8.55)&lt;/td&gt;
&lt;td&gt;$24 / mo&lt;/td&gt;
&lt;td&gt;20 TB&lt;/td&gt;
&lt;td&gt;4 TB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid (CPX32)&lt;/td&gt;
&lt;td&gt;8 GB / 4 vCPU / 80 GB NVMe&lt;/td&gt;
&lt;td&gt;€13.99 / mo (~$14.97)&lt;/td&gt;
&lt;td&gt;$48 / mo&lt;/td&gt;
&lt;td&gt;20 TB&lt;/td&gt;
&lt;td&gt;5 TB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workhorse (CPX42)&lt;/td&gt;
&lt;td&gt;16 GB / 8 vCPU / 160 GB NVMe&lt;/td&gt;
&lt;td&gt;€25.49 / mo (~$27.27)&lt;/td&gt;
&lt;td&gt;$96 / mo&lt;/td&gt;
&lt;td&gt;20 TB&lt;/td&gt;
&lt;td&gt;6 TB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Beefy (CPX52)&lt;/td&gt;
&lt;td&gt;32 GB / 16 vCPU / 240 GB NVMe&lt;/td&gt;
&lt;td&gt;€36.49 / mo (~$39.04)&lt;/td&gt;
&lt;td&gt;$188 / mo (DO General Purpose, 8 vCPU)&lt;/td&gt;
&lt;td&gt;20 TB&lt;/td&gt;
&lt;td&gt;6 TB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pre-April Hetzner CPX21 (the spiritual ancestor of the entry tier) cost €5.83/mo, so €7.99 represents a +37% jump. Even after that bump, the Hetzner column is roughly a third of DO Basic at the low end, and at the Workhorse tier you get 2× the vCPUs on top of the price gap. You also get 4–5× the included bandwidth across every tier.&lt;/p&gt;

&lt;p&gt;The bandwidth point is the one that flips ROI for video, image-heavy SaaS, and game servers. DigitalOcean charges roughly $10/TB over the included quota (priced as $0.01/GB); Hetzner charges €1/TB. On a workload pushing 10 TB/month over the included tier, that's $100/month versus about €10/month, roughly $1,000/year in bandwidth savings on top of the base price gap.&lt;/p&gt;
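
&lt;p&gt;The arithmetic, spelled out with the rates and conversion quoted above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# bandwidth_math.py - back-of-envelope check of the overage claim
EUR_USD = 1.07                 # conversion used throughout this post

do_usd_per_tb = 10.0           # DigitalOcean: $0.01/GB = $10/TB
hetzner_eur_per_tb = 1.0       # Hetzner: €1/TB

overage_tb = 10                # TB/month beyond the included quota
do_monthly = overage_tb * do_usd_per_tb                  # $100.00
hz_monthly = overage_tb * hetzner_eur_per_tb * EUR_USD   # ~$10.70
print(f"${(do_monthly - hz_monthly) * 12:,.0f}/year")    # $1,072/year
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;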

&lt;h2&gt;
  
  
  Performance: closer than you'd guess
&lt;/h2&gt;

&lt;p&gt;Hetzner runs newer silicon. The CPX line uses AMD EPYC 7002 and 7003 (Rome and Milan); the dedicated AX line is on EPYC 9004 (Genoa). DigitalOcean's Premium AMD droplets run EPYC Milan and Genoa too, but the Basic droplets (the ones most people are actually paying for) sit on older Skylake and Cascade Lake Xeons.&lt;/p&gt;

&lt;p&gt;From benchmarks I've run myself and cross-checked against VPSBenchmarks: Hetzner CPX is ~25–40% faster on single-core CPU and 2× faster on disk IOPS than a same-priced DigitalOcean Basic droplet. Network throughput within the same datacenter is comparable on both; cross-region latency from Hetzner Falkenstein to a US-East user runs about 110ms, versus ~25ms on a DO NYC droplet.&lt;/p&gt;

&lt;p&gt;The latency number is the one that decides things for anyone outside Europe. If your audience is US-only, the Hetzner US datacenters in Ashburn and Hillsboro are real options now, but they're smaller and the EU-tuned support muscle doesn't fully reach them yet. For a Cyprus or EU-focused product, Falkenstein is the obvious win.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real migration numbers from the last six months
&lt;/h2&gt;

&lt;p&gt;These are from public write-ups by people who actually moved production traffic, not promo posts.&lt;/p&gt;

&lt;p&gt;Isa Yeter &lt;a href="https://isayeter.com/posts/digitalocean-to-hetzner-migration/" rel="noopener noreferrer"&gt;documented a full migration&lt;/a&gt;: 30 MySQL databases (248 GB), 34 Nginx vhosts, GitLab EE, Neo4j, hundreds of thousands of mobile users, going from $1,432/month on DigitalOcean to $233/month on a Hetzner AX162-R dedicated server with 48 cores and 256 GB DDR5. That's the headline $14K/year number making the rounds.&lt;/p&gt;

&lt;p&gt;The Talk Python infrastructure swap reported a similar pattern: a decade on DigitalOcean, then about $1,500/year saved by moving the same workload to Hetzner Cloud. byteiota's writeup landed at 60% off. The shape of the savings is consistent: half to two-thirds off regardless of stack size, because the underlying euro-per-vCPU-per-month math is the same whether you're running one box or twenty.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where DigitalOcean still wins
&lt;/h2&gt;

&lt;p&gt;Migration breakeven depends on more than the raw bill. DigitalOcean's PaaS layer is the part you actually pay for, and Hetzner doesn't have an equivalent.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Managed Databases&lt;/strong&gt;: DO's managed Postgres, MySQL, Redis, MongoDB, and Kafka are turnkey with point-in-time recovery, read replicas, and automatic failover. Hetzner gives you a bare VM and an &lt;code&gt;apt install postgresql&lt;/code&gt; problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;App Platform&lt;/strong&gt;: Heroku-style git-push deploys with autoscaling, build pipelines, and edge routing. Hetzner has Cloud Console; you bring your own CI/CD.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spaces (S3-compatible object storage)&lt;/strong&gt;: Hetzner has Storage Boxes (FTP/SFTP/SMB), which aren't the same thing. If you need S3 semantics in Europe, you're looking at OVHcloud Object Storage, Scaleway, or a self-hosted MinIO on a Hetzner box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-click apps and droplet snapshots that work like AMIs&lt;/strong&gt;: DigitalOcean has invested in this for a decade. Hetzner snapshots work but feel less polished.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;24/7 chat support with US business hours coverage&lt;/strong&gt;: DO has it. Hetzner has email tickets and a community forum.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your team is two people and the half-day per month spent on database ops would otherwise go into shipping features, paying ~$60/month for DigitalOcean Managed Postgres on the smallest production tier is a defensible call. If you have a dedicated SRE or you genuinely enjoy &lt;code&gt;pg_basebackup&lt;/code&gt;, Hetzner wins on every other axis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration playbook: the zero-downtime version
&lt;/h2&gt;

&lt;p&gt;The pattern that keeps showing up in successful migrations is the same six-phase outline. This is the abridged version; if you are moving real traffic, &lt;a href="https://isayeter.com/posts/digitalocean-to-hetzner-migration/" rel="noopener noreferrer"&gt;Isa Yeter's full guide&lt;/a&gt; is the most thorough recent reference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Drop DNS TTL to 300s a week before the cutover&lt;/span&gt;
&lt;span class="c"&gt;# (Doing this on day-of is too late — caches lag.)&lt;/span&gt;
dig +short example.com
&lt;span class="c"&gt;# Verify TTL on registrar side, set to 300&lt;/span&gt;

&lt;span class="c"&gt;# 2. Provision the Hetzner box and bring it to parity&lt;/span&gt;
&lt;span class="c"&gt;# OS, packages, configs, secrets, deploy keys&lt;/span&gt;
&lt;span class="c"&gt;# Use a configuration tool you already trust — Ansible, Pulumi, or shell&lt;/span&gt;

&lt;span class="c"&gt;# 3. Set up MySQL/Postgres replication from DO → Hetzner&lt;/span&gt;
&lt;span class="c"&gt;# Old box = primary, new box = replica, async streaming&lt;/span&gt;
&lt;span class="c"&gt;# For MySQL: GTID-based replication&lt;/span&gt;
&lt;span class="c"&gt;# For Postgres: physical or logical replication&lt;/span&gt;

&lt;span class="c"&gt;# 4. Cut traffic by flipping DNS A records&lt;/span&gt;
&lt;span class="c"&gt;# Old box keeps running as a fallback for 24h&lt;/span&gt;

&lt;span class="c"&gt;# 5. Convert the old DO box to a reverse proxy&lt;/span&gt;
&lt;span class="c"&gt;# Anything still hitting the old IP gets forwarded to Hetzner&lt;/span&gt;
&lt;span class="c"&gt;# This handles cached resolvers without dropping a single request&lt;/span&gt;

&lt;span class="c"&gt;# 6. Tear down the DO box after 7 days of clean logs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things break in real migrations and never make it into the marketing case studies.&lt;/p&gt;

&lt;p&gt;First, the MySQL &lt;code&gt;mysql.user&lt;/code&gt; schema drifts even between minor versions, and a 5.7→8.0 jump will fail the replica promotion if you haven't run &lt;code&gt;mysql_upgrade --force&lt;/code&gt; and rebuilt the &lt;code&gt;sys&lt;/code&gt; schema. Test this on a staging copy.&lt;/p&gt;

&lt;p&gt;Second, application users that you granted &lt;code&gt;SUPER&lt;/code&gt; to during some emergency three years ago will quietly bypass &lt;code&gt;read_only = 1&lt;/code&gt; on the replica and write to the wrong server. Check &lt;code&gt;SHOW GRANTS&lt;/code&gt; for every account before you cut traffic, and revoke &lt;code&gt;SUPER&lt;/code&gt; from anything that isn't an admin. The Yeter writeup hit this on 24 accounts.&lt;/p&gt;
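
&lt;p&gt;A quick audit pass might look like the sketch below. It assumes &lt;code&gt;mysql-connector-python&lt;/code&gt; and a root connection to the replica; the hostname is a placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# grant_audit.py - flag accounts that can bypass read_only
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="replica.internal", user="root", password="***"
)
cur = conn.cursor()
cur.execute("SELECT user, host FROM mysql.user")
for user, host in cur.fetchall():
    cur.execute("SHOW GRANTS FOR %s@%s", (user, host))
    for (grant,) in cur.fetchall():
        if "SUPER" in grant:
            print(f"REVOKE SUPER ON *.* FROM '{user}'@'{host}';")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;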

&lt;p&gt;The third, if you run GitLab: webhooks store the absolute IP, not the hostname, so you have to do a bulk API rewrite after the cutover.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cyprus and the EU: latency, residency, and the boring win
&lt;/h2&gt;

&lt;p&gt;Hetzner is a German company with datacenters in Falkenstein, Nuremberg, and Helsinki. From Cyprus, latency to FSN runs about 60–80ms versus 110ms+ to DO Frankfurt. From any EU country, you're getting GDPR-clean data residency by default: no DPA acrobatics, no Standard Contractual Clauses for a US sub-processor, no awkward conversation with your enterprise customer's legal team.&lt;/p&gt;

&lt;p&gt;For startups based in Cyprus, Estonia, Portugal, or anywhere on the Blue Card / digital nomad track, this is a quietly useful side benefit. The EU AI Act and the data sovereignty pieces of the Digital Services Act both nudge companies toward keeping inference and customer data inside the EU. A Falkenstein box is the cheapest way to be compliant on day one without rearchitecting your stack later on.&lt;/p&gt;

&lt;p&gt;You might also like the &lt;a href="https://www.danilchenko.dev/posts/polars-vs-pandas/" rel="noopener noreferrer"&gt;Polars vs Pandas comparison&lt;/a&gt; if you're squeezing more out of a single Hetzner box on a data workload.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Hetzner cheaper than DigitalOcean?
&lt;/h3&gt;

&lt;p&gt;Yes. Even after the April 1, 2026 price increase of 30–37%, Hetzner cloud servers cost roughly 50–70% less than equivalent DigitalOcean droplets on the same RAM and vCPU. The 4 GB / 2 vCPU tier is €7.99/month on Hetzner versus $24/month on DigitalOcean. Hetzner also includes 20 TB of bandwidth versus 4 TB on DO, which widens the gap further for traffic-heavy sites.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Hetzner reliable?
&lt;/h3&gt;

&lt;p&gt;In production usage, yes. Hetzner runs three EU datacenters (Falkenstein, Nuremberg, Helsinki) and two US ones (Ashburn, Hillsboro), with a published uptime track record comparable to DigitalOcean. The differences are at the SLA paperwork layer (DigitalOcean publishes a 99.99% SLA, Hetzner's is less prominent) and at the support layer, where Hetzner is email-ticket-only versus DO's chat support.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you migrate from DigitalOcean to Hetzner with zero downtime?
&lt;/h3&gt;

&lt;p&gt;The proven pattern: drop DNS TTL to 300 seconds a week ahead of the cutover, provision and configure the Hetzner box to full parity, set up MySQL/Postgres replication with the old box as primary, flip DNS, and convert the old box to a reverse proxy for cached-resolver traffic for 24 hours. Tear down the old box only after 7 days of clean logs. Real migrations of 30+ databases have completed in 24 hours with zero downtime using this exact sequence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is Hetzner so cheap?
&lt;/h3&gt;

&lt;p&gt;Three reasons. They own and operate their own datacenters in lower-cost regions of Germany and Finland (cheap power, cheap real estate). They run a flat catalog with no managed-service margin layered on top. And they've historically chosen newer-but-cheaper AMD EPYC silicon over the brand-name Intel Xeon parts that hyperscalers default to. After the April 2026 price hike they're still cheaper, just less dramatically so.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Hetzner good for production workloads?
&lt;/h3&gt;

&lt;p&gt;For self-managed stacks: yes, and a lot of European startups have been on it for years. For workloads that lean heavily on managed services (managed databases, S3-compatible object storage with full API compatibility, autoscaling app platforms, edge networks), DigitalOcean, AWS, or GCP are still the right call. Hetzner is a "you do the ops" platform. That's both why it's cheap and why it isn't for everyone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does the April 2026 price hike change the migration math?
&lt;/h3&gt;

&lt;p&gt;It compresses the payback period but doesn't eliminate the savings. If you were saving $1,000/month at the old prices, you're saving $700–800/month at the new prices on the same workload. A typical migration that took 40 engineering hours to execute now pays back in 3–5 months instead of 2–3. Still worth it for any stack where the original DO bill is over $200/month.&lt;/p&gt;
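
&lt;p&gt;Spelled out, with the blended engineering rate being my assumption rather than a number from either write-up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# payback_math.py - rough model behind the 3-5 month figure
hours = 40               # migration effort from the answer above
rate_usd = 90            # assumed blended $/hour; not from the post
monthly_savings = 750    # midpoint of the $700-800 estimate

migration_cost = hours * rate_usd           # $3,600
print(migration_cost / monthly_savings)     # ~4.8 months to break even
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;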

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.hetzner.com/pressroom/statement-price-adjustment/" rel="noopener noreferrer"&gt;Hetzner — Statement on price adjustment as of April 1st 2026&lt;/a&gt; — official announcement&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tomshardware.com/tech-industry/hetzner-to-raise-prices-by-up-to-37-percent-from-april-1" rel="noopener noreferrer"&gt;Tom's Hardware — German data center giant hikes prices up to 37%&lt;/a&gt; — independent reporting on the price hike&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://isayeter.com/posts/digitalocean-to-hetzner-migration/" rel="noopener noreferrer"&gt;Isa Yeter — DigitalOcean to Hetzner migration: $1,432 to $233/month&lt;/a&gt; — full zero-downtime playbook with real numbers&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://byteiota.com/digitalocean-to-hetzner-14k-saved-60-cost-cut-2026/" rel="noopener noreferrer"&gt;byteiota — DigitalOcean to Hetzner: $14K Saved, 60% Cost Cut (2026)&lt;/a&gt; — second migration story corroborating the savings ratio&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.hetzner.com/cloud" rel="noopener noreferrer"&gt;Hetzner Cloud Pricing&lt;/a&gt; — current per-tier pricing referenced in the comparison table&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/pricing/droplets" rel="noopener noreferrer"&gt;DigitalOcean Pricing&lt;/a&gt; — current droplet pricing referenced in the comparison table&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;The April 2026 price hike was Hetzner closing a gap that was always going to close: they were too cheap for the global semiconductor cycle they were absorbing. Even at the new prices, the math on a self-managed stack still lands in the same place: half to two-thirds off your DigitalOcean bill, with better silicon and more bandwidth thrown in. The catch: you have to like running your own databases, and you have to be okay with email-only support. If those two things are acceptable, the migration is one of the cleanest infrastructure wins of 2026. If they aren't, pay the DigitalOcean tax and ship features instead.&lt;/p&gt;

</description>
      <category>hetzner</category>
      <category>digitalocean</category>
      <category>cloudhosting</category>
      <category>vps</category>
    </item>
    <item>
      <title>Python 3.14 Free-Threading: Real Benchmarks, Real Breakage, Real Code</title>
      <dc:creator>Maksim Danilchenko</dc:creator>
      <pubDate>Mon, 13 Apr 2026 02:15:25 +0000</pubDate>
      <link>https://dev.to/dmaxdev/python-314-free-threading-real-benchmarks-real-breakage-real-code-3m5</link>
      <guid>https://dev.to/dmaxdev/python-314-free-threading-real-benchmarks-real-breakage-real-code-3m5</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Python 3.14 makes free-threading officially supported. You get true thread-level parallelism for CPU-bound work, with up to 3.5x speedups on 4 cores. The single-threaded penalty dropped from ~40% in 3.13 to roughly 5-10%. But the library support isn't fully there yet: any C extension that hasn't opted in will silently re-enable the GIL. Here's how to install it, what actually works, and when it's worth the switch.&lt;/p&gt;

&lt;h2&gt;
  
  
  The GIL Is Finally Optional
&lt;/h2&gt;

&lt;p&gt;For over three decades, CPython's Global Interpreter Lock has been the answer to "why can't Python use all my cores?" The GIL ensures only one thread executes Python bytecode at a time. That keeps things simple but means CPU-bound code can't use multiple cores.&lt;/p&gt;

&lt;p&gt;Python 3.13 introduced an experimental free-threaded build. Python 3.14, released October 2025, promoted it to officially supported status via PEP 779. The implementation described in PEP 703 is now complete. Temporary workarounds in the interpreter have been replaced with permanent solutions, and the single-threaded performance hit has been slashed.&lt;/p&gt;

&lt;p&gt;Two things to know upfront:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Free-threading is supported but &lt;strong&gt;not the default build&lt;/strong&gt;. You still have to opt in.&lt;/li&gt;
&lt;li&gt;If you import a C extension that hasn't declared itself thread-safe, the interpreter quietly re-enables the GIL for the entire process. Your threads keep running, but they won't run in parallel.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How to Install the Free-Threaded Build
&lt;/h2&gt;

&lt;p&gt;The free-threaded interpreter ships as a separate binary: &lt;code&gt;python3.14t&lt;/code&gt; (note the &lt;code&gt;t&lt;/code&gt; suffix).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With uv (fastest method):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv python &lt;span class="nb"&gt;install &lt;/span&gt;3.14t
uv venv &lt;span class="nt"&gt;--python&lt;/span&gt; 3.14t
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
python &lt;span class="nt"&gt;--version&lt;/span&gt;  &lt;span class="c"&gt;# Python 3.14.x (free-threading build)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you just read our &lt;a href="https://danilchenko.dev/posts/uv-vs-pip-vs-poetry/" rel="noopener noreferrer"&gt;uv vs pip vs Poetry comparison&lt;/a&gt;, you already know uv handles Python version management. The &lt;code&gt;3.14t&lt;/code&gt; variant is a first-class citizen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With the official installers (macOS/Windows):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Download from &lt;a href="https://www.python.org/downloads/release/python-3143/" rel="noopener noreferrer"&gt;python.org/downloads&lt;/a&gt;. On macOS, the installer has an optional checkbox for the free-threaded build. On Windows, use &lt;code&gt;py install 3.14t&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building from source (Linux):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/python/cpython.git
&lt;span class="nb"&gt;cd &lt;/span&gt;cpython
git checkout v3.14.3
./configure &lt;span class="nt"&gt;--disable-gil&lt;/span&gt; &lt;span class="nt"&gt;--prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;/.local/python3.14t
make &lt;span class="nt"&gt;-j&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;nproc&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
make &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verify it works:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_is_gil_enabled&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# False = free-threading active
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that prints &lt;code&gt;True&lt;/code&gt;, a C extension re-enabled the GIL. More on that in the breakage section.&lt;/p&gt;
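
&lt;p&gt;Finding the culprit is a matter of checking the flag before and after each import. A minimal probe; &lt;code&gt;some_extension&lt;/code&gt; is a placeholder for whichever module you suspect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# gil_probe.py - run under python3.14t to find the import that flips the GIL
import sys

def gil_state(label: str) -&amp;gt; None:
    print(f"{label}: GIL enabled = {sys._is_gil_enabled()}")

gil_state("startup")       # False on a clean free-threaded start
import some_extension      # placeholder: the module you suspect
gil_state("after import")  # True here means this import re-enabled it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;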

&lt;h2&gt;
  
  
  Benchmarks: The Numbers That Matter
&lt;/h2&gt;

&lt;p&gt;I ran three CPU-bound benchmarks comparing &lt;code&gt;python3.14&lt;/code&gt; (GIL build) and &lt;code&gt;python3.14t&lt;/code&gt; (free-threaded) on a 4-core machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test 1: Prime counting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ThreadPoolExecutor&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_primes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bench_threads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_threads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500_000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;num_threads&lt;/span&gt;
    &lt;span class="n"&gt;ranges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_threads&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_threads&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;count_primes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;ranges&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Threads: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_threads&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Primes: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
          &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Time: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s, GIL: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_is_gil_enabled&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;bench_threads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test 2: SHA-256 hashing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ThreadPoolExecutor&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hash_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;benchmark&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hash_work&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4 threads, 400K hashes: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;GIL build (1 thread)&lt;/th&gt;
&lt;th&gt;GIL build (4 threads)&lt;/th&gt;
&lt;th&gt;Free-threaded (1 thread)&lt;/th&gt;
&lt;th&gt;Free-threaded (4 threads)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prime counting (500K)&lt;/td&gt;
&lt;td&gt;2.31s&lt;/td&gt;
&lt;td&gt;2.28s&lt;/td&gt;
&lt;td&gt;2.45s&lt;/td&gt;
&lt;td&gt;0.68s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHA-256 (400K hashes)&lt;/td&gt;
&lt;td&gt;4.12s&lt;/td&gt;
&lt;td&gt;4.09s&lt;/td&gt;
&lt;td&gt;4.34s&lt;/td&gt;
&lt;td&gt;1.18s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Matrix multiply (pure Python)&lt;/td&gt;
&lt;td&gt;1.87s&lt;/td&gt;
&lt;td&gt;1.85s&lt;/td&gt;
&lt;td&gt;1.98s&lt;/td&gt;
&lt;td&gt;0.57s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With the GIL, adding threads to CPU-bound Python code does nothing. Free-threaded, you get near-linear scaling up to your core count. The single-threaded overhead (about 6% in my tests) comes from the atomic operations CPython now uses instead of the GIL lock.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Breaks (and the Silent GIL Trap)
&lt;/h2&gt;

&lt;p&gt;When the free-threaded interpreter loads a C extension module that hasn't been marked as safe for concurrent use, it &lt;strong&gt;automatically re-enables the GIL for the entire process&lt;/strong&gt;. The only signal is a &lt;code&gt;RuntimeWarning&lt;/code&gt; at import time, which is easy to miss in production logs. There's no error: your code keeps running, but threads take turns instead of running in parallel.&lt;/p&gt;

&lt;p&gt;You can detect this at runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt;  &lt;span class="c1"&gt;# might re-enable the GIL
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_is_gil_enabled&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GIL was re-enabled by an extension module&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Free-threading is active&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a backwards-compatibility safeguard. CPython can't know whether an extension's internal state is thread-safe, so it assumes the worst. Extension authors need to explicitly opt in by setting the &lt;code&gt;Py_mod_gil&lt;/code&gt; slot to &lt;code&gt;Py_MOD_GIL_NOT_USED&lt;/code&gt; in their module definition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Library Compatibility Right Now
&lt;/h3&gt;

&lt;p&gt;I checked the &lt;a href="https://py-free-threading.github.io/tracking/" rel="noopener noreferrer"&gt;py-free-threading tracker&lt;/a&gt; and &lt;a href="https://ft-checker.com" rel="noopener noreferrer"&gt;ft-checker.com&lt;/a&gt; in April 2026. Major library status:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Free-threaded wheels?&lt;/th&gt;
&lt;th&gt;GIL re-enabled?&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NumPy 2.3+&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Improved in 2.3, still some edge cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pandas&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Some operations re-enable GIL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;scikit-learn 1.8+&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Free-threaded wheels on all platforms (ongoing optimization)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SciPy&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Core routines work, some submodules lag&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Matplotlib&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Plotting re-enables GIL (expected, not thread-safe)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyArrow&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Good support since 18.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pydantic&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Works with free-threaded builds since v2.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FastAPI / Uvicorn&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Mostly no&lt;/td&gt;
&lt;td&gt;ASGI event loop + threads works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;requests&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;I/O-bound, GIL irrelevant anyway&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQLAlchemy&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Connection pools need care&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Quansight Labs team and Meta's Python runtime group have been doing the heavy lifting on library compatibility. But if your stack includes niche C extensions — custom Cython modules or anything with hand-written CPython API calls — test before you deploy.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Free-Threading Actually Helps
&lt;/h2&gt;

&lt;p&gt;Free-threading shines when your bottleneck is CPU-bound Python code running across multiple cores. Good use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data processing pipelines where you transform chunks in parallel&lt;/li&gt;
&lt;li&gt;Pure-Python numerical computation (though you should probably use NumPy)&lt;/li&gt;
&lt;li&gt;Web servers handling CPU-heavy request processing alongside async I/O&lt;/li&gt;
&lt;li&gt;AI inference preprocessing: tokenization, feature extraction across batches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It doesn't help when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your code is I/O-bound (async/await is still the right tool)&lt;/li&gt;
&lt;li&gt;You're already using NumPy/pandas for the heavy lifting (those release the GIL internally)&lt;/li&gt;
&lt;li&gt;Your C extensions re-enable the GIL anyway&lt;/li&gt;
&lt;li&gt;You need isolation between workers (use &lt;code&gt;multiprocessing&lt;/code&gt; or the new &lt;code&gt;InterpreterPoolExecutor&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The New InterpreterPoolExecutor
&lt;/h3&gt;

&lt;p&gt;Python 3.14 also shipped &lt;code&gt;concurrent.futures.InterpreterPoolExecutor&lt;/code&gt; (PEP 734). Each worker gets its own interpreter with isolated state: no shared memory, no GIL contention. Think of it as a lighter-weight &lt;code&gt;multiprocessing&lt;/code&gt;: no separate processes to spawn, though arguments and results still cross interpreter boundaries by serialization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InterpreterPoolExecutor&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cpu_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;InterpreterPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cpu_work&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10_000_000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a better fit when you need true isolation. No worrying about thread safety at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Python 3.14 Features Worth Knowing
&lt;/h2&gt;

&lt;p&gt;Free-threading gets the headlines, but 3.14 packed in several other changes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Template strings (PEP 750)&lt;/strong&gt; let you write &lt;code&gt;t"Hello {name}"&lt;/code&gt; — like f-strings but for custom processing. Build SQL queries, HTML templates, and log messages with proper escaping.&lt;/p&gt;
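
&lt;p&gt;A minimal sketch of the mechanics, assuming the final PEP 750 surface (&lt;code&gt;string.templatelib&lt;/code&gt;, with templates iterating over static strings and &lt;code&gt;Interpolation&lt;/code&gt; objects); the &lt;code&gt;render_safe&lt;/code&gt; helper is mine, not stdlib:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from string.templatelib import Template, Interpolation
import html

def render_safe(tmpl: Template) -&amp;gt; str:
    # Static string parts pass through untouched; interpolated values
    # get escaped before joining. f-strings can't do this, because they
    # render to a plain str before you ever see the parts.
    parts = []
    for item in tmpl:
        if isinstance(item, Interpolation):
            parts.append(html.escape(str(item.value)))
        else:
            parts.append(item)
    return "".join(parts)

user_input = "&amp;lt;script&amp;gt;alert(1)&amp;lt;/script&amp;gt;"
print(render_safe(t"&amp;lt;b&amp;gt;{user_input}&amp;lt;/b&amp;gt;"))
# The literal &amp;lt;b&amp;gt; tags survive; the injected value comes out escaped.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;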

&lt;p&gt;&lt;strong&gt;Deferred annotation evaluation (PEP 649)&lt;/strong&gt; means annotations are no longer eagerly evaluated. Forward references just work. If you've ever fought &lt;code&gt;from __future__ import annotations&lt;/code&gt;, this fixes it properly.&lt;/p&gt;
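
&lt;p&gt;A quick sketch of what that buys you, using the new &lt;code&gt;annotationlib&lt;/code&gt; module that ships alongside PEP 649 in 3.14:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import annotationlib

class Node:
    # This annotation refers to Node before the class object exists.
    # Under PEP 649 it isn't evaluated at definition time, so no quotes
    # and no `from __future__ import annotations` are needed.
    def link(self, other: Node) -&amp;gt; Node:
        self.next = other
        return other

# Evaluation happens lazily, on first access:
print(annotationlib.get_annotations(Node.link))
# {'other': &amp;lt;class '__main__.Node'&amp;gt;, 'return': &amp;lt;class '__main__.Node'&amp;gt;}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;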

&lt;p&gt;There's also &lt;code&gt;compression.zstd&lt;/code&gt; in the stdlib &lt;strong&gt;(PEP 784)&lt;/strong&gt; — Zstd compresses faster than gzip at similar ratios. And official macOS/Windows binaries now include a &lt;strong&gt;copy-and-patch JIT compiler&lt;/strong&gt;. Early days, but it shows where CPython is headed.&lt;/p&gt;
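
&lt;p&gt;On the zstd side, the module follows the same shape as &lt;code&gt;gzip&lt;/code&gt; and &lt;code&gt;lzma&lt;/code&gt;. A smoke test, assuming the standard &lt;code&gt;compress&lt;/code&gt;/&lt;code&gt;decompress&lt;/code&gt;/&lt;code&gt;open&lt;/code&gt; trio; check the module docs for the exact keyword arguments:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from compression import zstd

payload = b"log line that repeats a lot\n" * 10_000

blob = zstd.compress(payload)
print(f"{len(payload):,} bytes in, {len(blob):,} bytes out")
assert zstd.decompress(blob) == payload

# File-style API, same shape as gzip.open / lzma.open:
with zstd.open("dump.zst", "wb") as f:
    f.write(payload)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;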

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Python 3.14 free-threading production-ready?
&lt;/h3&gt;

&lt;p&gt;For CPU-bound workloads where you control the dependency stack, yes. For complex applications with many C extensions, test thoroughly. The "officially supported" label means CPython commits to maintaining it, but third-party library coverage is still catching up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will free-threading become the default?
&lt;/h3&gt;

&lt;p&gt;PEP 703 laid out a three-phase plan. Phase 1 (experimental, 3.13) and Phase 2 (supported, 3.14) are done. Phase 3 would make free-threading the default build, but no specific version has been committed to. The timeline depends on how fast libraries adopt free-threaded builds.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much slower is single-threaded code?
&lt;/h3&gt;

&lt;p&gt;About 5-10% compared to the GIL build, down from ~40% in 3.13. The overhead comes from atomic reference counting and per-object locks that replace the GIL's coarse-grained protection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use free-threading with Django/Flask?
&lt;/h3&gt;

&lt;p&gt;Yes, with caveats. ASGI servers like Uvicorn can benefit from mixed async + thread workloads. But web frameworks rarely bottleneck on CPU-bound Python code. Most of the time is spent waiting on databases and external APIs. Profile before optimizing.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens if I mix free-threaded and GIL-requiring packages?
&lt;/h3&gt;

&lt;p&gt;The GIL gets re-enabled for the whole process. You won't get an error. Your code just runs single-threaded like regular Python. Check &lt;code&gt;sys._is_gil_enabled()&lt;/code&gt; after imports to verify.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;The GIL removal is real, and it works. I've been running CPU-bound batch jobs on &lt;code&gt;python3.14t&lt;/code&gt; for a few months now, and the multi-core speedups are exactly what Python has needed for decades. The 6% single-threaded overhead is a reasonable trade.&lt;/p&gt;

&lt;p&gt;But don't rip out your &lt;code&gt;multiprocessing&lt;/code&gt; code just yet. The ecosystem needs another 6-12 months before most developers can switch without hitting the silent GIL re-enable. Check your deps with &lt;code&gt;sys._is_gil_enabled()&lt;/code&gt;, verify against the compatibility tracker, and start with isolated workloads where you control the stack.&lt;/p&gt;

&lt;p&gt;Free-threading works. Libraries just need time to catch up.&lt;/p&gt;

</description>
      <category>python</category>
      <category>freethreading</category>
      <category>gil</category>
      <category>concurrency</category>
    </item>
    <item>
      <title>How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM</title>
      <dc:creator>Maksim Danilchenko</dc:creator>
      <pubDate>Sat, 11 Apr 2026 22:40:26 +0000</pubDate>
      <link>https://dev.to/dmaxdev/how-to-run-gemma-4-locally-with-ollama-llamacpp-and-vllm-3n44</link>
      <guid>https://dev.to/dmaxdev/how-to-run-gemma-4-locally-with-ollama-llamacpp-and-vllm-3n44</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Google Gemma 4 dropped on April 2 under Apache 2.0 and it's genuinely good: the 31B dense model hit #3 on the Arena AI leaderboard, beating models 20x its size. You can run it locally with Ollama in about two minutes, or go the llama.cpp / vLLM route if you want more control. But there are real bugs right now, especially on Apple Silicon and with tool calling. This guide covers all three options, what hardware you actually need, and the workarounds for the issues I've hit so far.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Gemma 4 Is Worth Running Locally
&lt;/h2&gt;

&lt;p&gt;I've been running local models since the Llama 2 days, and Gemma 4 is the first time an open model has made me reconsider whether I need API access to frontier models for everyday coding tasks.&lt;/p&gt;

&lt;p&gt;Look at the benchmarks. Gemma 4 31B scores 89.2% on AIME 2026 (math), 80.0% on LiveCodeBench v6 (coding), and 84.3% on GPQA Diamond (science). Gemma 3 scored 20.8%, 29.1%, and 42.4% on those same tests. The math score more than quadrupled, coding nearly tripled, and science doubled, all in a single generation.&lt;/p&gt;

&lt;p&gt;The family comes in four sizes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;Active Params&lt;/th&gt;
&lt;th&gt;Min VRAM (Q4)&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;E2B&lt;/td&gt;
&lt;td&gt;2.3B&lt;/td&gt;
&lt;td&gt;2.3B&lt;/td&gt;
&lt;td&gt;~1.5 GB&lt;/td&gt;
&lt;td&gt;Mobile, Raspberry Pi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E4B&lt;/td&gt;
&lt;td&gt;4.5B&lt;/td&gt;
&lt;td&gt;4.5B&lt;/td&gt;
&lt;td&gt;~3 GB&lt;/td&gt;
&lt;td&gt;Quick local tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;26B MoE&lt;/td&gt;
&lt;td&gt;26B&lt;/td&gt;
&lt;td&gt;3.8B&lt;/td&gt;
&lt;td&gt;~14 GB&lt;/td&gt;
&lt;td&gt;Best bang per VRAM GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31B Dense&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;td&gt;~18 GB&lt;/td&gt;
&lt;td&gt;Maximum quality&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 26B MoE model is the sleeper hit here. It only activates 3.8B parameters per token but delivers reasoning quality close to the full 31B, and it fits in 14 GB of VRAM at Q4 quantization. If you're on a 16 GB GPU or a MacBook Pro with 18 GB unified memory, go with that one.&lt;/p&gt;

&lt;p&gt;All four variants ship under Apache 2.0. No usage restrictions, no commercial limitations, no weird "you can't use this to compete with Google" clauses that plagued earlier open model releases. (If you're on a Mac and want to explore Apple's built-in local AI too, see my &lt;a href="https://danilchenko.dev/posts/2026-04-06-apfel-review-free-local-ai-mac/" rel="noopener noreferrer"&gt;Apfel review&lt;/a&gt; — different beast, but it's free and already on your machine.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: Ollama (Easiest)
&lt;/h2&gt;

&lt;p&gt;Ollama is the fastest way to get Gemma 4 running. Two commands and you're chatting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install Ollama
&lt;/h3&gt;

&lt;p&gt;On macOS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Windows, download the installer from ollama.com.&lt;/p&gt;

&lt;p&gt;You need Ollama v0.20.0 or later for Gemma 4 support. Check with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pull and Run a Model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The 26B MoE — best quality-to-VRAM ratio&lt;/span&gt;
ollama run gemma4:26b

&lt;span class="c"&gt;# The small but capable 4B&lt;/span&gt;
ollama run gemma4:4b

&lt;span class="c"&gt;# The full 31B dense (need 20+ GB VRAM)&lt;/span&gt;
ollama run gemma4:31b

&lt;span class="c"&gt;# Tiny model for edge devices&lt;/span&gt;
ollama run gemma4:2b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Ollama handles downloading the GGUF, quantization selection, and memory management automatically. By default it picks a quantization that fits your available memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pick Your Quantization
&lt;/h3&gt;

&lt;p&gt;If you want more control over the quality/memory tradeoff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Higher quality, more memory&lt;/span&gt;
ollama run gemma4:26b-q8_0

&lt;span class="c"&gt;# Lower memory, slightly less quality&lt;/span&gt;
ollama run gemma4:26b-q4_K_M

&lt;span class="c"&gt;# Middle ground&lt;/span&gt;
ollama run gemma4:26b-q5_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the 31B model, Q4_K_M is the sweet spot. It keeps quality high while fitting in ~18 GB. Going to Q8 pushes you to ~28 GB, which means you need a 32 GB GPU or Mac with 32+ GB unified memory.&lt;/p&gt;
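
&lt;p&gt;If you want to sanity-check these numbers against your own hardware before downloading 18 GB, the arithmetic is simple. A rough estimator (the 4.5 effective bits per weight for Q4_K_M is an approximation; real GGUF files vary with the quant mix):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Rule of thumb: weight file size ~= params * effective bits-per-weight / 8.
# KV cache and runtime buffers come on top, so leave a few GB of headroom.
def est_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -&amp;gt; float:
    return params_billion * bits_per_weight / 8

for name, params in [("E4B", 4.5), ("26B MoE", 26.0), ("31B dense", 31.0)]:
    print(f"{name}: ~{est_weights_gb(params):.1f} GB at Q4_K_M")
# E4B: ~2.5 GB, 26B MoE: ~14.6 GB, 31B dense: ~17.4 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;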

&lt;h3&gt;
  
  
  Use the API
&lt;/h3&gt;

&lt;p&gt;Ollama exposes an OpenAI-compatible API on port 11434:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gemma4:26b",
    "messages": [{"role": "user", "content": "Write a Python function to merge two sorted arrays"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works with any OpenAI SDK client. Just point the base URL to &lt;code&gt;http://localhost:11434/v1&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# required but ignored
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4:26b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain quicksort in 3 sentences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Known Ollama Issues (April 2026)
&lt;/h3&gt;

&lt;p&gt;I'm flagging these because they burned me:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Tool calling is broken in Ollama v0.20.0. The tool call parser crashes, and streaming drops tool calls entirely. If you need function calling, use vLLM instead for now.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you're on an M-series Mac, don't set &lt;code&gt;OLLAMA_FLASH_ATTENTION=1&lt;/code&gt;. The 31B model will hang once your prompt exceeds ~500 tokens. Ollama's defaults work fine without it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some general knowledge prompts cause the model to spit out an infinite stream of &lt;code&gt;&amp;lt;unused24&amp;gt;&lt;/code&gt; tokens. Tokenizer bug. If it happens, stop generation and rephrase your prompt; a fix is being tracked in llama.cpp issue #21321, and a client-side guard is sketched after this list.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
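
&lt;p&gt;Until that fix lands, a client-side guard is cheap insurance. Here's a sketch of the workaround I use; the sentinel detection is my own heuristic, and the only API assumed is the standard OpenAI SDK streaming interface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

SENTINEL = "&amp;lt;unused24&amp;gt;"

def guarded_completion(prompt: str, max_repeats: int = 3) -&amp;gt; str:
    stream = client.chat.completions.create(
        model="gemma4:26b",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    out, repeats = [], 0
    for chunk in stream:
        if not chunk.choices:  # some servers send usage-only chunks
            continue
        piece = chunk.choices[0].delta.content or ""
        repeats = repeats + 1 if SENTINEL in piece else 0
        if repeats &amp;gt;= max_repeats:
            stream.close()  # stop pulling tokens from the runaway generation
            break
        out.append(piece)
    return "".join(out)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;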

&lt;h2&gt;
  
  
  Option 2: llama.cpp (More Control)
&lt;/h2&gt;

&lt;p&gt;If you want raw performance, custom quantization, or you're deploying on hardware Ollama doesn't support well, llama.cpp gives you full control.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build llama.cpp
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ggerganov/llama.cpp
&lt;span class="nb"&gt;cd &lt;/span&gt;llama.cpp
cmake &lt;span class="nt"&gt;-B&lt;/span&gt; build &lt;span class="nt"&gt;-DGGML_CUDA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON  &lt;span class="c"&gt;# or -DGGML_METAL=ON for Mac&lt;/span&gt;
cmake &lt;span class="nt"&gt;--build&lt;/span&gt; build &lt;span class="nt"&gt;--config&lt;/span&gt; Release &lt;span class="nt"&gt;-j&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;nproc&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For CPU-only (no GPU acceleration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cmake &lt;span class="nt"&gt;-B&lt;/span&gt; build
cmake &lt;span class="nt"&gt;--build&lt;/span&gt; build &lt;span class="nt"&gt;--config&lt;/span&gt; Release &lt;span class="nt"&gt;-j&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;nproc&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Download a GGUF Model
&lt;/h3&gt;

&lt;p&gt;Grab a pre-quantized model from Hugging Face. Unsloth provides well-tested GGUFs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 31B Q4_K_M — ~18 GB, good quality&lt;/span&gt;
huggingface-cli download unsloth/gemma-4-31B-it-GGUF &lt;span class="se"&gt;\&lt;/span&gt;
  gemma-4-31B-it-Q4_K_M.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ./models

&lt;span class="c"&gt;# 26B MoE Q4_K_M — ~14 GB&lt;/span&gt;
huggingface-cli download unsloth/gemma-4-26B-MoE-it-GGUF &lt;span class="se"&gt;\&lt;/span&gt;
  gemma-4-26B-MoE-it-Q4_K_M.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ./models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run Inference
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/bin/llama-cli &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; ./models/gemma-4-31B-it-Q4_K_M.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Write a Rust function that implements a thread-safe LRU cache"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; 512 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ngl&lt;/span&gt; 99  &lt;span class="c"&gt;# offload all layers to GPU&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-ngl 99&lt;/code&gt; flag offloads all layers to your GPU. If you don't have enough VRAM, lower this number and llama.cpp will split layers between GPU and CPU. For the 31B Q4 model, I'd start with &lt;code&gt;-ngl 40&lt;/code&gt; on a 16 GB GPU and adjust from there.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run as a Server
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/bin/llama-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; ./models/gemma-4-31B-it-Q4_K_M.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ngl&lt;/span&gt; 99 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-c&lt;/span&gt; 8192
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you an OpenAI-compatible API at &lt;code&gt;http://localhost:8080/v1&lt;/code&gt;. Same client code as the Ollama example above, just change the port.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Tips for llama.cpp
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Gemma 4 advertises 256K context, but on consumer hardware you're realistically looking at ~20K tokens before memory pressure kills throughput. Qwen 3.5 27B manages ~190K on the same hardware, a 10x difference. Set &lt;code&gt;-c&lt;/code&gt; conservatively. (Compression techniques like &lt;a href="https://danilchenko.dev/posts/2026-03-27-google-turboquant-llm-compression-6x-zero-accuracy-loss/" rel="noopener noreferrer"&gt;Google's TurboQuant&lt;/a&gt; may help here eventually.)&lt;/li&gt;
&lt;li&gt;On Mac, use &lt;code&gt;-DGGML_METAL=ON&lt;/code&gt; during build. Metal acceleration gives 2-3x speedup over CPU on M-series chips.&lt;/li&gt;
&lt;li&gt;Increasing &lt;code&gt;-b&lt;/code&gt; (batch size) can improve throughput for server workloads. I use &lt;code&gt;-b 512&lt;/code&gt; for my setup.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 3: vLLM (Production Serving)
&lt;/h2&gt;

&lt;p&gt;vLLM is the right choice if you're serving Gemma 4 to multiple users or building it into a production pipeline. It handles paged attention and continuous batching automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install and Run
&lt;/h3&gt;

&lt;p&gt;The easiest path is Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ~/.cache/huggingface:/root/.cache/huggingface &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  vllm/vllm-openai:latest &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; google/gemma-4-31b-it &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 8192 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or install directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;vllm&amp;gt;&lt;span class="o"&gt;=&lt;/span&gt;0.20.0
vllm serve google/gemma-4-31b-it &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 8192 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts an OpenAI-compatible API on port 8000.&lt;/p&gt;
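
&lt;p&gt;If you want to see where your setup lands relative to the figures in the next section, a rough wall-clock check through the standard OpenAI SDK is enough (single request, so it understates what vLLM's batching can do under load):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="google/gemma-4-31b-it",
    messages=[{"role": "user", "content": "Summarize paged attention in 200 words."}],
    max_tokens=256,
)
elapsed = time.perf_counter() - start

tokens = resp.usage.completion_tokens  # vLLM reports token usage on completions
print(f"{tokens} tokens in {elapsed:.1f}s: {tokens / elapsed:.1f} tok/s")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;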

&lt;h3&gt;
  
  
  The vLLM Performance Bug
&lt;/h3&gt;

&lt;p&gt;Fair warning: there's a known performance issue with Gemma 4 on vLLM right now. The E4B model generates at only ~9 tokens/s on an RTX 4090. That's terrible for a 4B parameter model.&lt;/p&gt;

&lt;p&gt;The root cause is Gemma 4's hybrid attention architecture. It uses 50 sliding-window layers plus 10 global attention layers, each with different head dimensions. vLLM's FlashAttention implementation can't handle this dual-dimension layout, so it falls back to a much slower Triton attention kernel.&lt;/p&gt;

&lt;p&gt;The vLLM team is tracking this in issue #38887. Until it's fixed, you'll get better throughput from llama.cpp for single-user workloads. vLLM still wins when you're serving multiple concurrent users because of its batching, but the per-request latency is worse than it should be.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-GPU Setup
&lt;/h3&gt;

&lt;p&gt;For the 31B model on multiple GPUs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve google/gemma-4-31b-it &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 16384 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At BF16 the 31B weights alone are roughly 62 GB, so "multiple GPUs" here means something like two 48 GB cards, or four 24 GB cards with &lt;code&gt;--tensor-parallel-size 4&lt;/code&gt;. That avoids any quantization quality loss. On a pair of 16 GB GPUs, stick with quantized weights instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Model Should You Pick?
&lt;/h2&gt;

&lt;p&gt;After a week of running all four variants, here's my take:&lt;/p&gt;

&lt;p&gt;Most people should start with the 26B MoE. It activates only 3.8B parameters but delivers 82.3% on GPQA and 77.1% on LiveCodeBench, and it fits on a single 16 GB GPU at Q4. It handles coding assistance, general Q&amp;amp;A, and document analysis well.&lt;/p&gt;

&lt;p&gt;The 31B dense is worth the VRAM if you have it. The jump from 26B MoE to 31B dense is noticeable on hard math and complex multi-step reasoning. If you have 24 GB VRAM (RTX 3090/4090) or 32+ GB unified memory on a Mac, run this one.&lt;/p&gt;

&lt;p&gt;I reach for the E4B when I want speed. Quick code completions, simple questions where I want sub-second responses. At ~3 GB VRAM, it runs comfortably alongside everything else on my machine.&lt;/p&gt;

&lt;p&gt;The E2B? It runs on a Raspberry Pi, which is cool, but the quality gap to E4B is too large for anything beyond simple tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardware Cheat Sheet
&lt;/h2&gt;

&lt;p&gt;Here's what actually works based on my testing and community reports:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hardware&lt;/th&gt;
&lt;th&gt;Best Model&lt;/th&gt;
&lt;th&gt;Quantization&lt;/th&gt;
&lt;th&gt;Tokens/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090 (24 GB)&lt;/td&gt;
&lt;td&gt;31B Dense&lt;/td&gt;
&lt;td&gt;Q4_K_M&lt;/td&gt;
&lt;td&gt;~35 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3090 (24 GB)&lt;/td&gt;
&lt;td&gt;31B Dense&lt;/td&gt;
&lt;td&gt;Q4_K_M&lt;/td&gt;
&lt;td&gt;~25 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4070 Ti (16 GB)&lt;/td&gt;
&lt;td&gt;26B MoE&lt;/td&gt;
&lt;td&gt;Q4_K_M&lt;/td&gt;
&lt;td&gt;~30 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac M3 Pro (18 GB)&lt;/td&gt;
&lt;td&gt;26B MoE&lt;/td&gt;
&lt;td&gt;Q4_K_M&lt;/td&gt;
&lt;td&gt;~15 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac M2 Ultra (64 GB)&lt;/td&gt;
&lt;td&gt;31B Dense&lt;/td&gt;
&lt;td&gt;Q8_0&lt;/td&gt;
&lt;td&gt;~20 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3060 (12 GB)&lt;/td&gt;
&lt;td&gt;E4B&lt;/td&gt;
&lt;td&gt;Q8_0&lt;/td&gt;
&lt;td&gt;~45 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raspberry Pi 5 (8 GB)&lt;/td&gt;
&lt;td&gt;E2B&lt;/td&gt;
&lt;td&gt;Q4&lt;/td&gt;
&lt;td&gt;~3 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These numbers are from llama.cpp with full GPU offloading. Ollama performance is within 5-10% of these.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Gemma 4 to Your Editor
&lt;/h2&gt;

&lt;p&gt;Once you have a local Gemma 4 instance running (Ollama, llama.cpp server, or vLLM), you can use it as a coding assistant in most editors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VS Code with Continue:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Gemma 4 26B Local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemma4:26b"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Neovim with avante.nvim or codecompanion.nvim:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Point the OpenAI-compatible endpoint to your local server. Both plugins accept a custom base URL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Any tool that supports OpenAI API:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Base URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:11434/v1  (Ollama)&lt;/span&gt;
&lt;span class="na"&gt;Base URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:8080/v1  (llama.cpp)&lt;/span&gt;
&lt;span class="na"&gt;Base URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:8000/v1  (vLLM)&lt;/span&gt;
&lt;span class="na"&gt;API Key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not-needed"&lt;/span&gt; &lt;span class="s"&gt;(any string works)&lt;/span&gt;
&lt;span class="na"&gt;Model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemma4:26b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How much VRAM do I need to run Gemma 4?
&lt;/h3&gt;

&lt;p&gt;It depends on the model variant. The E2B runs in under 1.5 GB. The E4B needs about 3 GB at Q4. The 26B MoE needs ~14 GB at Q4. The 31B dense needs ~18 GB at Q4_K_M. On Macs, unified memory counts as VRAM, so a 16 GB MacBook can run the 26B MoE.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I run Gemma 4 on CPU only?
&lt;/h3&gt;

&lt;p&gt;Yes, but it's slow. llama.cpp supports CPU inference natively. Expect 2-5 tokens per second for the 26B model on a modern desktop CPU. The E4B at ~8-12 tokens per second on CPU is usable for simple tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Gemma 4 better than Llama 3 for coding?
&lt;/h3&gt;

&lt;p&gt;On LiveCodeBench v6, Gemma 4 31B scores 80.0% versus Llama 3.3 70B's score in the low 60s. Gemma 4 is smaller and faster while producing better code. The 26B MoE at 77.1% also beats Llama 3.3 70B while using a fraction of the memory. And with &lt;a href="https://danilchenko.dev/posts/2026-04-08-meta-muse-spark-alexandr-wang-first-model/" rel="noopener noreferrer"&gt;Meta pivoting toward closed models with Muse Spark&lt;/a&gt;, Gemma 4 might be the best open alternative for a while.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Gemma 4 support vision and audio?
&lt;/h3&gt;

&lt;p&gt;The E2B and E4B variants support multimodal input: images and audio. The larger 26B and 31B models are text-only. If you need local vision capabilities, the E4B is your best option in the Gemma 4 family.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is Gemma 4 tool calling broken in Ollama?
&lt;/h3&gt;

&lt;p&gt;Gemma 4's hybrid attention architecture (mixing sliding-window and global attention layers with different head dimensions) exposed bugs in Ollama's tool call parser and streaming implementation. The Ollama team is working on a fix. For now, use vLLM or raw llama.cpp if you need function calling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;I've tried every major open model release since Llama 2, and Gemma 4's 26B MoE is the first one where I stopped reaching for API keys during normal coding work. 14 GB of VRAM, no license restrictions, and benchmark scores that would've been frontier-tier eighteen months ago. The tooling has rough edges right now. Tool calling in Ollama is broken, vLLM has a performance regression, and Apple Silicon users need to dodge a Flash Attention bug. Those will get fixed. The model quality won't go backwards. Start with &lt;code&gt;ollama run gemma4:26b&lt;/code&gt; and see where it gets you.&lt;/p&gt;

</description>
      <category>gemma4</category>
      <category>ollama</category>
      <category>llamacpp</category>
      <category>vllm</category>
    </item>
  </channel>
</rss>
