<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AI Explore</title>
    <description>The latest articles on DEV Community by AI Explore (@aiexplore369zoho).</description>
    <link>https://dev.to/aiexplore369zoho</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4006822%2Ff413777a-0ac2-47e6-a213-9bb7bf701085.png</url>
      <title>DEV Community: AI Explore</title>
      <link>https://dev.to/aiexplore369zoho</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aiexplore369zoho"/>
    <language>en</language>
    <item>
      <title>samkhya v1.0: Plug Claude, GPT-4o-mini, or Local Ollama Into Your SQL Query Optimizer</title>
      <dc:creator>AI Explore</dc:creator>
      <pubDate>Sun, 28 Jun 2026 18:45:39 +0000</pubDate>
      <link>https://dev.to/aiexplore369zoho/samkhya-v10-plug-claude-gpt-4o-mini-or-local-ollama-into-your-sql-query-optimizer-3hfb</link>
      <guid>https://dev.to/aiexplore369zoho/samkhya-v10-plug-claude-gpt-4o-mini-or-local-ollama-into-your-sql-query-optimizer-3hfb</guid>
      <description>&lt;p&gt;TL;DR&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;samkhya v1.0 ships the first LLM-pluggable corrector backend for an embedded SQL query optimizer.&lt;/strong&gt; Plug &lt;strong&gt;Anthropic Claude&lt;/strong&gt; (claude-opus-4-7, claude-sonnet-4-6), &lt;strong&gt;OpenAI GPT-4o-mini&lt;/strong&gt;, or &lt;strong&gt;local Ollama&lt;/strong&gt; (llama3.2:1b) into the cardinality-estimation slot of DataFusion, DuckDB, or Polars via a 4-line HTTP wire contract. Two reference servers ship in the box: Python FastAPI (canonical) and Node TypeScript (broader operator appeal). It's all wrapped in a provable safety envelope so a hallucinating LLM cannot make your plan worse than the engine's native estimate. Apache-2.0, sole author. 13-crate Rust workspace.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;cargo add samkhya-core --features llm_http&lt;/code&gt; · &lt;code&gt;pip install samkhya&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/singhpratech/samkhya" rel="noopener noreferrer"&gt;github.com/singhpratech/samkhya&lt;/a&gt;: 10 crates on crates.io, Python wheel on PyPI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Companion &lt;a href="https://dev.to/publications/samkhya-portable-feedback-driven-cardinality-correction-embedded-analytics"&gt;technical paper&lt;/a&gt; + &lt;a href="https://dev.to/publications/gpudb-gpu-resident-execution-engine-duckdb-cuda-metal"&gt;gpudb sibling release&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Measured numbers, all reported honestly.&lt;/strong&gt; &lt;strong&gt;40.95×&lt;/strong&gt; LpJoinBound tightness over AGM on the star-5 join family at p=1 (Wilcoxon p=1.73×10⁻⁶, all 30 cells dominated). LLM transport-floor P95 &lt;strong&gt;0.07–0.11 ms&lt;/strong&gt; across batch sizes 1/4/8/16/32— H1-A PASS, the LLM plug works. JOB-Slow head-to-head vs unmodified DataFusion 46 came in at &lt;strong&gt;1.038×&lt;/strong&gt; geomean wallclock (BCa 95% CI [1.026, 1.056], Wilcoxon p=3×10⁻⁶, 17 wins / 38 ties / 0 losses)— pre-registered ≥1.35× target &lt;strong&gt;falsified on magnitude&lt;/strong&gt;. Never-regress holds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Honestly projected, not yet measured.&lt;/strong&gt; Live-LLM end-to-end latency cells (Claude, GPT-4o-mini, Ollama) are documented as PROJECTED pending API-key budget and a 30-trial campaign run. The &lt;em&gt;mechanism&lt;/em&gt; is shipped and measured at the transport floor; the &lt;em&gt;headline magnitudes&lt;/em&gt; are paper-projections sized from public per-token latencies.&lt;/p&gt;

&lt;p&gt;Every embedded analytical engine on your laptop right now is committing the same crime in slow motion. DuckDB does it. DataFusion does it. Polars does it. ClickHouse-local does it. My own GPU engine &lt;a href="https://dev.to/publications/gpudb-gpu-resident-execution-engine-duckdb-cuda-metal"&gt;gpudb&lt;/a&gt; does it. The crime is this: every time the process dies, eight years of incredible cardinality-estimation research dies with it, and the next session starts over from zero. The HyperLogLog your midnight ELT job built? Gone. The Bloom filter that knew exactly which customer IDs lived in that 400 GB Parquet partition? Gone. The histogram that took twelve seconds to compute over the join key? Gone. The optimizer wakes up at 9 a.m. with no memory of anything it ever learned and goes scanning the same columns again.&lt;/p&gt;

&lt;p&gt;That waste isn't a bug. It's the inevitable consequence of a missing library— a library so obvious in hindsight that the absence is almost embarrassing. Apache DataSketches gave the world sketches but never wired them to a query optimizer. Iceberg's Puffin specification gave us a sidecar format but no producer/consumer library to fill it. AQO, the PostgreSQL-only adaptive estimator, has feedback but no portability between engines. The three pieces have been sitting on three different shelves for years, and nobody has bolted them together.&lt;/p&gt;

&lt;p&gt;I shipped &lt;strong&gt;samkhya v1.0.0&lt;/strong&gt; today to bolt them together. And to do four other things while I was at it.&lt;/p&gt;

&lt;p&gt;Repository: &lt;a href="https://github.com/singhpratech/samkhya" rel="noopener noreferrer"&gt;github.com/singhpratech/samkhya&lt;/a&gt;. Technical paper: &lt;a href="https://dev.to/publications/samkhya-portable-feedback-driven-cardinality-correction-embedded-analytics"&gt;/publications/samkhya-portable…&lt;/a&gt;. This is the engineering tour.&lt;/p&gt;

&lt;h2&gt;The five-layer stack, in plain English&lt;/h2&gt;

&lt;p&gt;samkhya is सांख्य— Sanskrit for &lt;em&gt;"enumeration, counting"&lt;/em&gt;— the name of the classical darshana whose entire discipline is counting reality's constituents honestly. The library has exactly that job. It is a 13-crate Cargo workspace under a single Apache-2.0 license with the explicit §3 patent grant, structured as five replaceable layers, each failing safely toward the engine's native plan when the layer above is missing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1— portable stats.&lt;/strong&gt; Five classical sketch families ship in v1.0: HyperLogLog (precisions 4–18, measured RSE 0.676% at p=14 / n=10⁶— comfortably below the Flajolet 2007 0.8125% envelope); Bloom filters via Kirsch-Mitzenmacher double hashing; Count-Min; equi-depth histograms; and a 2D correlated histogram that captures the pairwise column dependencies the four scalar sketches miss. Each sketch carries a stable &lt;code&gt;KIND&lt;/code&gt; tag (&lt;code&gt;samkhya.hll-v1&lt;/code&gt;, &lt;code&gt;samkhya.bloom-v1&lt;/code&gt;) and a &lt;code&gt;to_bytes&lt;/code&gt; / &lt;code&gt;from_bytes&lt;/code&gt; serialization contract. A Puffin sidecar produced by the Python wheel is &lt;em&gt;byte-identical&lt;/em&gt; to one produced by the DuckDB extension and fully readable by the DataFusion adapter. That byte-identity is the moat. No engine owns the stats. The sidecar does.&lt;/p&gt;
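
&lt;p&gt;Two of the Layer 1 formulas are easy to check by hand. The sketch below is illustrative and not part of the samkhya API: it computes the textbook HyperLogLog relative standard error, 1.04/√m for m = 2^p registers, and the Kirsch-Mitzenmacher probe position h1 + i·h2 mod m. The function names &lt;code&gt;hll_rse&lt;/code&gt; and &lt;code&gt;km_index&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Illustrative helpers only; these names are assumptions, not samkhya functions.

/// Theoretical HyperLogLog relative standard error: 1.04 / sqrt(m), m = 2^p registers.
fn hll_rse(precision: u32) -> f64 {
    let registers = 2.0_f64.powi(precision as i32);
    1.04 / registers.sqrt()
}

/// Kirsch-Mitzenmacher double hashing: the i-th Bloom probe is (h1 + i * h2) mod m,
/// computed here with wrapping 64-bit arithmetic, so k probe positions come from
/// two base hashes instead of k independent ones.
fn km_index(h1: u64, h2: u64, i: u64, num_bits: u64) -> u64 {
    h1.wrapping_add(i.wrapping_mul(h2)) % num_bits
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At p=14 the first function gives 1.04/128 = 0.008125, which is the 0.8125% envelope quoted above; the measured 0.676% sits below it.&lt;/p&gt;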

&lt;p&gt;&lt;strong&gt;Layer 2— feedback recorder.&lt;/strong&gt; The recorder hooks query execution at the adapter boundary, captures &lt;code&gt;(plan template, estimated rows, actual rows)&lt;/code&gt; triples, and writes them to a SQLite sidecar keyed by template. A per-template residual model— a gradient-boosted tree under 100 KB on disk in the default backend— learns the systematic bias between what the planner thought and what actually happened, then surfaces the correction as a &lt;em&gt;hint&lt;/em&gt;. This is the observe-and-hint pattern from Stillger's LEO (IBM, 2001), Marcus's Bao (SIGMOD 2021 Best Paper), and Anneser's AutoSteer (VLDB 2023)— the only learned-QO pattern with documented production deployment, full stop. Cold start sees the native plan. The recorder fires only when it has evidence.&lt;/p&gt;
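
&lt;p&gt;The observe-and-hint loop can be sketched without any model at all. The example below swaps the gradient-boosted tree for a much simpler stand-in, a per-template running mean of ln(actual/estimated), purely to show the shape of the mechanism: record the triples, stay silent on cold start, and scale the native estimate once there is evidence. &lt;code&gt;TemplateBias&lt;/code&gt;, &lt;code&gt;observe&lt;/code&gt;, &lt;code&gt;hint&lt;/code&gt;, and &lt;code&gt;min_samples&lt;/code&gt; are hypothetical names, not the samkhya API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Minimal sketch: a per-template mean log-ratio stands in for the
// gradient-boosted residual model. All names here are hypothetical.

#[derive(Clone, Copy, Default)]
struct TemplateBias {
    samples: u32,
    sum_log_ratio: f64, // running sum of ln(actual / estimated)
}

impl TemplateBias {
    /// Fold one (estimated rows, actual rows) observation into the running bias.
    fn observe(self, estimated_rows: u64, actual_rows: u64) -> Self {
        let est = estimated_rows.max(1) as f64; // floor at 1 to keep ln finite
        let act = actual_rows.max(1) as f64;
        TemplateBias {
            samples: self.samples + 1,
            sum_log_ratio: self.sum_log_ratio + (act / est).ln(),
        }
    }

    /// Return a corrected hint, or the native estimate until enough evidence exists.
    fn hint(self, native_estimate: u64, min_samples: u32) -> u64 {
        if self.samples == 0 || min_samples > self.samples {
            return native_estimate; // cold start: leave the native plan alone
        }
        let mean_log_ratio = self.sum_log_ratio / self.samples as f64;
        (native_estimate as f64 * mean_log_ratio.exp()).round() as u64
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A template that was estimated at 1000 rows and repeatedly produced 42 would, after the minimum number of observations, turn a native estimate of 1000 into a hint of 42. Before that threshold the native estimate passes through untouched.&lt;/p&gt;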

&lt;p&gt;&lt;strong&gt;Layer 3: LpJoinBound. The never-regress guarantee, made provable.&lt;/strong&gt; This is the non-negotiable contract every other layer must honour. Every corrected estimate is bounded from above by a pessimistic ceiling derived from LpBound (Zhang et al., SIGMOD 2025 Best Paper): an LP relaxation over ℓp-norms of degree sequences, no machine learning involved. The samkhya refinement is strictly tighter than the Atserias-Grohe-Marx AGM bound (PODS 2008) on the star-5 join family at p=1. Not approximately tighter. Strictly tighter on every cell of the 30-cell evaluation grid. The Wilcoxon signed-rank test gives W=0, p=1.73×10⁻⁶, a complete dominance result. In magnitude: &lt;strong&gt;40.95× LpJoinBound tightness over AGM&lt;/strong&gt;, BCa 95% CI [30.93, 47.45]. That is a theoretical-tightness result on the synthetic star-5 topology; the JOB-Slow head-to-head against native DataFusion 46 measured 1.038×. A correction that breaches the LpJoinBound ceiling is rejected and the native estimate is used in its place. The worst case degrades silently to the native plan. It is never catastrophic. That is the whole point.&lt;/p&gt;
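
&lt;p&gt;The reject-and-fall-back rule described above fits in a few lines. This is a minimal sketch, assuming the ceiling has already been computed by the LP; &lt;code&gt;guard_estimate&lt;/code&gt; and its parameter names are hypothetical and are not taken from the library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Hypothetical sketch of the never-regress guard; not the samkhya API.
// `ceiling` stands for the LpJoinBound value, assumed to be computed elsewhere.

/// Accept a corrected estimate only if it respects the pessimistic ceiling;
/// otherwise discard it and keep the engine's native estimate.
fn guard_estimate(native_estimate: u64, corrected_estimate: u64, ceiling: u64) -> u64 {
    if corrected_estimate > ceiling {
        native_estimate
    } else {
        corrected_estimate
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the guard never inspects where the corrected value came from, the same check applies to any backend that produces one.&lt;/p&gt;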

&lt;p&gt;&lt;strong&gt;Layer 4— GPU batch inference (opt-in, via gpudb).&lt;/strong&gt; Subplan enumeration is embarrassingly parallel: each candidate is an independent forward pass through a small GBT or PFN. CPU does this serially. The GPU collapses what would be a thousand-iteration loop into one CUDA or Apple Silicon Metal kernel launch. When samkhya is paired with my gpudb extension, the correction model scores thousands of subplan candidates in a single launch. Strictly opt-in: the default &lt;code&gt;cargo build --release --workspace&lt;/code&gt; links no CUDA, no Metal, no GPU runtime of any kind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 5— the pluggable Corrector backend. This is the headline.&lt;/strong&gt; One Rust trait (&lt;code&gt;Corrector&lt;/code&gt;), four shipped backend slots:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GBT (default).&lt;/strong&gt; Gradient-boosted-tree, sub-MB on disk, sub-millisecond inference, no external dependencies. The conservative classical bet, and what the cold-start safety analysis assumes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TabPFN-2.5 (opt-in, &lt;code&gt;tabpfn_http&lt;/code&gt; feature).&lt;/strong&gt; Hollmann et al. ICLR 2023 + Prior Labs 2026— foundation tabular model. Measured P95 &lt;strong&gt;31.15 ms&lt;/strong&gt; at batch size 8, sequence length 128, on RTX 4090 Laptop (BCa 95% CI [29.39, 35.32]); q-error reduction vs GBT on synthetic 7.84% (BCa 95% CI [2.21, 14.62], p=1.04×10⁻⁵).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM-via-HTTP (opt-in, &lt;code&gt;llm_http&lt;/code&gt; feature). The viral one.&lt;/strong&gt; &lt;code&gt;samkhya-core::residual::llm::LlmHttpCorrector&lt;/code&gt; calls an HTTP server you control. Two reference servers ship in &lt;code&gt;samkhya-gpudb/scripts/&lt;/code&gt;— &lt;code&gt;llm_infer_server.py&lt;/code&gt; (Python FastAPI, canonical) and &lt;code&gt;llm_infer_server.ts&lt;/code&gt; (Node TypeScript port, broader operator appeal). Documented backends: &lt;strong&gt;Anthropic Claude&lt;/strong&gt; (claude-opus-4-7, claude-sonnet-4-6), &lt;strong&gt;OpenAI GPT-4o-mini&lt;/strong&gt;, &lt;strong&gt;local Ollama&lt;/strong&gt; (llama3.2:1b via &lt;code&gt;http://127.0.0.1:11434&lt;/code&gt;). Plug your own foundation model in ~50 lines of glue. Wire contract is dead-simple: &lt;code&gt;POST /infer {"features": [...], "baseline_estimate": &amp;lt;u64&amp;gt;}&lt;/code&gt; → &lt;code&gt;{"estimate": &amp;lt;u64&amp;gt;}&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dummy (transport-floor).&lt;/strong&gt; Echo backend for measuring wire overhead independent of model latency. Produced the H1-A PASS— P95 &lt;strong&gt;0.07–0.11 ms&lt;/strong&gt; across batch sizes 1/4/8/16/32, proving the LLM plug works at sub-millisecond cost when the model isn't the bottleneck.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
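
&lt;p&gt;On the client side, the &lt;code&gt;POST /infer&lt;/code&gt; contract in the list above amounts to building one JSON object and reading one integer back. The dependency-free sketch below assumes a fixed three-element feature vector purely for illustration, and it falls back to the baseline when the reply cannot be parsed; both choices, and the helper names, are assumptions rather than the library's behaviour. Real code would use an HTTP client and a JSON library instead of string handling.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Illustrative only: hand-rolled JSON for the POST /infer contract.
// Function names and the 3-feature shape are assumptions, not the samkhya API.

/// Build the request body: {"features": [...], "baseline_estimate": n}.
fn infer_request_body(features: [f64; 3], baseline_estimate: u64) -> String {
    let joined = features.map(|f| f.to_string()).join(", ");
    format!(
        "{{\"features\": [{}], \"baseline_estimate\": {}}}",
        joined, baseline_estimate
    )
}

/// Read {"estimate": n} from a response body. Anything unparseable falls back
/// to the baseline, so a malformed reply cannot change the plan.
fn parse_estimate(response_body: String, baseline_estimate: u64) -> u64 {
    response_body
        .trim()
        .trim_start_matches("{\"estimate\":")
        .trim_end_matches('}')
        .trim()
        .parse()
        .unwrap_or(baseline_estimate)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The parser is deliberately crude: it only recognises the exact single-key shape of the documented response and treats everything else as "no correction".&lt;/p&gt;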

&lt;p&gt;&lt;strong&gt;The safety story is what makes this sane.&lt;/strong&gt; Every backend— Claude, GPT, Ollama, GBT, TabPFN, anything you write tomorrow— is clamped from above by the LpJoinBound ceiling at Layer 3. If a hallucinating LLM returns "the join is 10¹² rows," LpJoinBound says "the provable ceiling is 4.2 million" and the planner sees 4.2 million. A miscalibrated TabPFN, a stale GBT, an LLM with a cosmic-ray bit-flip— none of them can break never-regress, by construction. That's the contract the field has been waiting for: &lt;em&gt;let foundation models help with query planning, without giving them the keys&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's measured, what's projected— being explicit.&lt;/strong&gt; The transport-floor latency (0.07–0.11 ms P95) and the wire contract are MEASURED on the dummy backend (full receipts in &lt;code&gt;bench-results/19_llm_corrector.md&lt;/code&gt;). The live-LLM end-to-end latency cells— Claude (~1.2s P95 paper-projection), GPT-4o-mini (comparable), Ollama (latency-bounded by local hardware)— are &lt;strong&gt;PROJECTED&lt;/strong&gt; pending API-key budget and the 30-trial measurement campaign. The &lt;em&gt;mechanism&lt;/em&gt; ships in v1.0; the &lt;em&gt;headline live numbers&lt;/em&gt; are next-revision work. I would rather you know that now than discover it when you read the bench-results dossier.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure: the LpJoinBound clamp, visualised. Claude (amber), GPT-4o-mini (cyan), and local Ollama (emerald) each contribute a cardinality estimate; the crystalline lens (the provable pessimistic envelope) refracts and bounds every stream before it reaches the query-plan tree below. A hallucinating LLM cannot exceed the ceiling. The worst case is the engine's native plan.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Now the part nobody else in this field tells you&lt;/h2&gt;

&lt;p&gt;The standard playbook in learned cardinality estimation— for about eight years now, across roughly fifty papers— has been: hand-tune a system against a target workload, report the geometric mean of the wins in the headline, mention the losses in a §9 disclosures section that nobody reads. The credibility deficit that resulted from this playbook is exactly as large as the cumulative gap between those papers and the production deployments that followed them. Which is to say large. Naru is dead. NeuroCard is dead. MSCN is dead. DeepDB is dead. BayesCard is dead. None of them shipped into a production database. The 2021–2022 critique papers (&lt;em&gt;Are We Ready For Learned CE?&lt;/em&gt;, &lt;em&gt;In-depth Study of Learned CE&lt;/em&gt;) wrote the obituary collectively. The field has been quiet about why ever since.&lt;/p&gt;

&lt;p&gt;Before the WAVE4-F head-to-head against native DataFusion 46 on the IMDb Join-Order-Benchmark Slow subset (n=55, paired, warm-cache, scale factor 1), I pre-registered three minimum-speedup targets: ≥1.6× geometric mean on join-heavy queries, ≥1.35× on aggregate-heavy queries, and ≥1.50× as the overall headline.&lt;/p&gt;

&lt;p&gt;All three were &lt;strong&gt;falsified&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The measured geometric mean is 1.038× wallclock— BCa 95% CI [1.026, 1.056], Wilcoxon W=212 p=3.00×10⁻⁶. The Benjamini-Hochberg FDR procedure at q=0.05 rejects the null on 24 of 55 cells. The full record across the suite is &lt;strong&gt;17 wins / 38 ties / 0 losses&lt;/strong&gt;. Never-regress holds— that is the whole point of the LpJoinBound clamp— but the magnitude of the wins is small, and I am reporting it as such, in the headline section, where this sentence lives.&lt;/p&gt;

&lt;p&gt;The TabPFN-2.5 backend had two pre-registered hypotheses. H1-A (P95 below 50 ms) passed comfortably. H1-B (≥15% q-error reduction over GBT on synthetic) &lt;strong&gt;failed on magnitude&lt;/strong&gt;: the measured reduction is 7.84%, statistically real but roughly half the pre-registered effect size. The paper reports H1-B as falsified.&lt;/p&gt;
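
&lt;p&gt;For readers new to the metric: q-error is the standard multiplicative error measure in cardinality estimation, the larger of estimate/actual and actual/estimate, so 1.0 means a perfect estimate. A small sketch follows; the function name and the floor at one row are assumptions, not taken from the samkhya code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// q-error = max(est / actual, actual / est); always at least 1.0.
/// Both inputs are floored at 1 row so that empty results do not divide by zero.
fn q_error(estimated_rows: u64, actual_rows: u64) -> f64 {
    let est = estimated_rows.max(1) as f64;
    let act = actual_rows.max(1) as f64;
    (est / act).max(act / est)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The measure is symmetric: estimating 1000 rows when 42 arrive scores the same as estimating 42 when 1000 arrive, roughly 23.8 in both cases.&lt;/p&gt;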

&lt;p&gt;I am telling you this here, in the launch post, on the first day, in the second-most-prominent section, because the alternative is the §9-disclosures playbook that hollowed out this field's credibility in the first place. The cost of admitting a falsified pre-reg in public is smaller than the cost of having someone discover it on their own three months later. The 17/38/0 record is the appropriate evidence to weigh: a real, modest, statistically significant, never-regress improvement, in the only kind of public paired-warm-cache head-to-head whose results would actually deserve to be believed.&lt;/p&gt;

&lt;p&gt;The interesting number in this release is not the JOB-Slow geomean. It is the &lt;strong&gt;40.95× LpJoinBound tightness over AGM on the star-5 family&lt;/strong&gt;. That is a theoretical-tightness result, with a wallclock translation on the synthetic topology, and it is the right way to communicate what this library actually contributes.&lt;/p&gt;

&lt;h2&gt;The 1000 → 42 demo (thirty seconds to feel the mechanism)&lt;/h2&gt;

&lt;p&gt;The repository ships a &lt;code&gt;stats_propagation_demo&lt;/code&gt; example that proves the end-to-end path in plain Rust. A 1000-row table wrapped in DataFusion 46's default &lt;code&gt;TableProvider&lt;/code&gt; reports &lt;code&gt;num_rows = 1000&lt;/code&gt; to the physical plan. Wrap the same provider with &lt;code&gt;SamkhyaTableProvider&lt;/code&gt; plus the optimizer rule, and the physical plan reports &lt;code&gt;num_rows = 42&lt;/code&gt;. The example prints, verbatim: &lt;em&gt;"without rule: 1000, with rule: 42."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The DataFusion integration is intentionally a five-line change:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;sync&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;datafusion&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;prelude&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;SessionContext&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;samkhya_datafusion&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;SamkhyaTableProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SamkhyaOptimizerRule&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;SessionContext&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Register on the context itself: ctx.state() returns a snapshot copy, so a rule added there would be discarded.&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="nf"&gt;.add_optimizer_rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;SamkhyaOptimizerRule&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()));&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;SamkhyaTableProvider&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inner_provider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.with_puffin_sidecar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"orders.puffin"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="nf"&gt;.register_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"orders"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;samkhya_leaves_seen&lt;/code&gt; diagnostic on the optimizer rule confirms the corrected stats reached the physical plan. No fork of DataFusion required. DataFusion 46's &lt;code&gt;Distribution&lt;/code&gt; framework already accepts external column statistics— samkhya simply supplies better ones.&lt;/p&gt;

&lt;h2&gt;What is in the v1.0 box, plainly&lt;/h2&gt;

&lt;p&gt;13 crates. Approximately 266 &lt;code&gt;#[test]&lt;/code&gt; blocks across the workspace. 17 property tests. ~31 million cargo-fuzz executions, zero crashes. Criterion microbenchmarks for sketches and Puffin I/O. &lt;code&gt;clippy -D warnings&lt;/code&gt; clean. The full workspace builds in under two minutes on a laptop with no network access. An ACM Artifact Evaluation v1.1 reviewer entry ships in &lt;a href="https://github.com/singhpratech/samkhya/blob/main/REPRODUCIBILITY.md" rel="noopener noreferrer"&gt;REPRODUCIBILITY.md&lt;/a&gt;— Functional, Reusable, Available badges all in reach. Full reproduction budget for the published numbers: roughly 90 minutes wallclock on the reference hardware. Apache-2.0 single license with explicit §3 patent grant, matching DataFusion, Iceberg, Arrow, and ClickHouse— every downstream user gets the same patent grant, not a dual-license toggle.&lt;/p&gt;

&lt;p&gt;Engine status, no hedge words:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DataFusion&lt;/strong&gt;— three-layer integration (&lt;code&gt;SamkhyaTableProvider&lt;/code&gt; + &lt;code&gt;SamkhyaStatsExec&lt;/code&gt; + &lt;code&gt;SamkhyaOptimizerRule&lt;/code&gt;). Production. First-class target.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DuckDB&lt;/strong&gt;— Rust-client integration via the &lt;code&gt;bundled&lt;/code&gt; feature, production. The cxx extension (cdylib + runtime &lt;code&gt;LOAD samkhya;&lt;/code&gt;) ships as staticlib+rlib in v1.0; cdylib waits on upstream DuckDB Issue #11638.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Polars&lt;/strong&gt;— Beta. Series-to-sketch helpers + &lt;code&gt;lazy_collect_with_feedback&lt;/code&gt; behind the &lt;code&gt;engine&lt;/code&gt; feature. Optimizer hook is upstream-blocked on Polars Issue #23345.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Postgres&lt;/strong&gt;— Scaffold only. pgrx-shaped, double-gated behind &lt;code&gt;pg_extension&lt;/code&gt; feature + &lt;code&gt;samkhya_pgrx_enabled&lt;/code&gt; rustc cfg, pinned to PG17. Real planner/executor hooks land in v1.1 after pgrx ≥ 0.13.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Iceberg&lt;/strong&gt;— Production. Puffin reader/writer with KIND-tag registration for all five sketch types.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Arrow&lt;/strong&gt;— Production. IPC round-trip helpers, byte-identical serialization for all five sketch types.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;gpudb&lt;/strong&gt;— CPU fallback production. GPU and TabPFN-2.5 HTTP backends opt-in behind &lt;code&gt;tabpfn_http&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt;— Single abi3-py39 wheel on PyPI as &lt;code&gt;samkhya&lt;/code&gt;. Covers the portable-stats layer (sketches, Puffin reader/writer, &lt;code&gt;ColumnStats&lt;/code&gt;). Use case: dbt-style nightly ELT writes the Puffin sidecar next to the Parquet file; the morning's DuckDB/DataFusion ad-hoc queries inherit it for free. No Python ML stack required.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why this is the right shape now&lt;/h2&gt;

&lt;p&gt;The cardinality-estimation field has spent eight years producing extraordinary research and almost no embedded-tier production code. The pattern is older than the learned-CE wave. Stillger's LEO at IBM in 2001 was the first feedback-driven query optimizer to reach a mainstream server-class DBMS— and it was the last. Every learned system since has carried the same shape: server-class assumption about a long-lived optimizer process, model footprint that doesn't fit the embedded tier (40–300 MB), inference latency the embedded tier cannot afford (5–50 ms), and a cold-start story that handwaves around the worst case.&lt;/p&gt;

&lt;p&gt;The 2021–2022 critique papers said this plainly. The production-database field then routed around it via adaptive query execution— a gorgeous technique that is &lt;em&gt;structurally inapplicable&lt;/em&gt; to engines without a long-lived process to adapt within. The embedded tier— DuckDB, DataFusion, Polars, gpudb— has been waiting for a library that addresses &lt;em&gt;its&lt;/em&gt; constraints, not the constraints of the systems whose obituaries those critique papers were writing.&lt;/p&gt;

&lt;p&gt;samkhya v1.0 is my bet that the three pieces an embedded engine actually needs— portable sketches, feedback-driven residuals, a provable safety envelope— are independent enough to ship as one library, and that the pluggable model-backend slot above them is the right level of abstraction to admit whatever the field consolidates on next. GBT default is the conservative classical bet. TabPFN-2.5 opt-in is the bet that foundation tabular models are real and that the right way to use them is behind a pessimistic envelope rather than as a replacement for one. The &lt;strong&gt;LLM-via-HTTP backend is the bet that the third wave is already here&lt;/strong&gt;— Claude, GPT-4o-mini, and Ollama are usable today, the wire contract is intentionally trivial so swapping providers is a 50-line change, and the safety envelope means the worst case is the engine's native plan. The library doesn't have to know what model you plugged in.&lt;/p&gt;

&lt;p&gt;The honest measurement is the second bet: that the field's credibility deficit is repairable by pre-registration and falsification reporting, and that the right way for a sole-author project to participate in that repair is to do it first and conspicuously. Falsified pre-regs are &lt;em&gt;information&lt;/em&gt;. The papers that hid theirs are the reason this field has the reputation it has.&lt;/p&gt;

&lt;h2&gt;What ships next, in the order I will close it&lt;/h2&gt;

&lt;p&gt;The v1.0 limitations are named explicitly in §9 of the technical paper. The v1.1 roadmap closes them in the order they were named:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Tighter effect-size attribution on JOB-Slow— cold-cache and Parquet methodologies, larger n, larger memory ceiling. The 1.038× geomean is partly methodology and partly intrinsic; v1.1 separates the two with a clean experimental design.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Broader workload coverage starting with STATS-CEB (Han et al., VLDB 2022). JOB-Slow is the field's standard but it is one schema. STATS-CEB is the obvious next workload.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In-process TabPFN-2.5 backend. The current HTTP round-trip eats 10–15 ms; collapsing it gets the P95 well under 20 ms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real Postgres planner/executor hooks. The scaffold is in v1.0. The implementation lands after pgrx ≥ 0.13.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Live-LLM end-to-end measurement campaign. The LLM-via-HTTP backend ships in v1.0 with a measured transport floor (P95 0.07–0.11 ms) and three documented providers (Claude, GPT-4o-mini, Ollama), but the 30-trial paired live-provider campaign hasn't run yet. ANTHROPIC_API_KEY and OPENAI_API_KEY need budget approval; Ollama just needs me to install it on the bench host. The mechanism is in v1.0; the measured live numbers are v1.1.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;One sentence each, for the three audiences who will read this&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you have ever wanted to put Claude or GPT-4o-mini into the query path&lt;/strong&gt; and stopped because nobody had built the safety envelope: that envelope is shipped. &lt;code&gt;cargo add samkhya-core --features llm_http&lt;/code&gt;, point &lt;code&gt;llm_infer_server.py&lt;/code&gt; at your favorite provider, and your DataFusion (or DuckDB, or Polars) optimizer will route every cardinality estimate through the LLM, clamped from above by an LP bound that tightens the 2008 AGM bound, so a hallucination can never produce a worse plan than the engine's native one. That's the contract. That's the thing the field has been waiting for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you ship DataFusion, DuckDB, Polars, or Iceberg code today&lt;/strong&gt; and you have ever cursed at a 1000-row estimate that should have been 42: &lt;code&gt;cargo add samkhya-core&lt;/code&gt; and the five-line DataFusion snippet above gets you there in an hour. No LLM involved by default. The GBT backend is the conservative classical bet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you are a database researcher who has been quietly hoping someone would build the embedded-tier learned-CE library&lt;/strong&gt; that fifty server-class papers couldn't: this is that library, the design is in &lt;a href="https://dev.to/publications/samkhya-portable-feedback-driven-cardinality-correction-embedded-analytics"&gt;the companion paper&lt;/a&gt;, the falsified pre-regs are in the headline section because that is where they belong, and the &lt;code&gt;bench-results/&lt;/code&gt; directory in the repository is where the receipts live— I would rather have the argument there than in the abstract.&lt;/p&gt;

&lt;p&gt;The library's whole job is to count reality's constituents honestly. The release is doing the same thing about its own measurements. Both are deliberate. Both are how this field starts to look like the production field it always wanted to be.&lt;/p&gt;

&lt;p&gt;samkhya v1.0.0— Apache-2.0. Repository: &lt;a href="https://github.com/singhpratech/samkhya" rel="noopener noreferrer"&gt;github.com/singhpratech/samkhya&lt;/a&gt;. Technical paper: &lt;a href="https://dev.to/publications/samkhya-portable-feedback-driven-cardinality-correction-embedded-analytics"&gt;/publications/samkhya-portable-feedback-driven-cardinality-correction-embedded-analytics&lt;/a&gt;. Companion to the prior &lt;a href="https://dev.to/publications/gpudb-gpu-resident-execution-engine-duckdb-cuda-metal"&gt;gpudb&lt;/a&gt; release— the two share the embedded-tier engine target, the GPU-optional architecture, and the single-Apache-2.0 license posture.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
      <category>opensource</category>
      <category>datascience</category>
    </item>
    <item>
      <title>On-Device AI Just Got Real</title>
      <dc:creator>AI Explore</dc:creator>
      <pubDate>Sun, 28 Jun 2026 18:02:49 +0000</pubDate>
      <link>https://dev.to/aiexplore369zoho/on-device-ai-just-got-real-397i</link>
      <guid>https://dev.to/aiexplore369zoho/on-device-ai-just-got-real-397i</guid>
      <description>&lt;p&gt;Apple's newest on-device model carries about 20 billion parameters, and on any given request it fires maybe one to four billion of them. That gap — 20B stored, roughly 3B running — is the whole story of 2026. The model that now ships inside the latest iPhone is no longer a shrunken, lobotomized cousin of the cloud model. It's a different kind of object: large in flash, small in motion, and it never phones home.&lt;/p&gt;

&lt;p&gt;For three years the on-device pitch was mostly aspirational. Demos ran, latency was rough, quality trailed the API by a generation, and every serious AI feature still resolved to a per-token bill in someone's datacenter. In mid-2026 that stopped being true. Two releases — Apple's third-generation Foundation Models at WWDC on June 8, and Google's Gemma 4 family on April 2 — quietly moved the floor. Genuinely useful agents now run on hardware you already own, offline, for free.&lt;/p&gt;

&lt;h2&gt;
  
  
  The economics nobody priced in
&lt;/h2&gt;

&lt;p&gt;Forget benchmarks for a second; the load-bearing fact here is accounting. When the model lives in the cloud, every inference is a metered event — input tokens, output tokens, a line item that scales linearly with usage and explodes the moment you wrap the model in an agent loop. Agentic workloads are the worst case for the token meter: a single "go do this task" can fan out into dozens of model calls as the agent plans, calls tools, retries, and re-reads its own output. The bill grows with your ambition.&lt;/p&gt;

&lt;p&gt;Move the model onto the device and the marginal cost of an inference is approximately &lt;strong&gt;$0&lt;/strong&gt;. No API key, no rate limit, no usage dashboard. You paid for the silicon once; every token after that is free in the only sense a product manager cares about — it doesn't show up on a monthly invoice that grows with your success. That single change rewrites which features are worth building. A background task that re-summarizes your inbox every five minutes is insane on a per-token plan and trivial on-device. So is an agent that quietly loops a hundred times to get one answer right.&lt;/p&gt;
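
&lt;p&gt;To make the accounting concrete, here is a rough back-of-the-envelope sketch of the inbox example. Every number in it is an assumption chosen for illustration (the per-million-token price, ten model calls per run, the token counts); none of them comes from a vendor price sheet. The names &lt;code&gt;TokenPrice&lt;/code&gt;, &lt;code&gt;call_cost&lt;/code&gt;, and &lt;code&gt;monthly_cost&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical cloud price in dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def call_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Metered cost in dollars of a single model call."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


def monthly_cost(price, runs_per_day, calls_per_run, input_tokens, output_tokens, days=30):
    """Cost of a recurring task whose every run fans out into several model calls."""
    per_run = calls_per_run * call_cost(price, input_tokens, output_tokens)
    return per_run * runs_per_day * days


# Assumed numbers, for illustration only.
price = TokenPrice(input_per_million=5.0, output_per_million=25.0)
runs_per_day = 24 * 60 // 5          # one run every five minutes = 288 runs
cloud = monthly_cost(price, runs_per_day, calls_per_run=10,
                     input_tokens=4_000, output_tokens=500)
on_device = 0.0                      # no per-token meter once the hardware is paid for
print(f"cloud: ${cloud:,.2f}/month, on-device marginal: ${on_device:,.2f}/month")
```

&lt;p&gt;Under these assumed numbers the metered version works out to roughly $2,800 per user per month, which is why the same feature only makes sense once the marginal cost drops to zero.&lt;/p&gt;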

&lt;p&gt;And it isn't only cost. On-device means &lt;em&gt;offline&lt;/em&gt; — the model works on a plane, in a tunnel, in a country where your cloud provider has no presence. And it means &lt;em&gt;private&lt;/em&gt; in the literal architectural sense: the data never leaves the NAND. For a calendar, a photo library, a health log, or a half-written message, "these bytes physically did not transit a network" is a far stronger guarantee than any privacy policy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sparse beats big: the architecture that did it
&lt;/h2&gt;

&lt;p&gt;The reason this works now isn't that someone discovered how to cram a frontier model into 3GB of RAM. It's that the model designs changed shape. The winning idea across both Apple and Google is the same: decouple how big the model &lt;em&gt;is&lt;/em&gt; from how much of it actually &lt;em&gt;runs&lt;/em&gt; on any given token.&lt;/p&gt;

&lt;p&gt;Apple's AFM 3 on-device model uses what the company calls &lt;strong&gt;Instruction-Following Pruning (IFP)&lt;/strong&gt;. The full ~20B-parameter model lives in flash. For a given request, the system activates only the relevant ~1-4B parameters, swapping the needed "experts" into DRAM on demand. The phone never holds the whole model in working memory — it streams the slice it needs. That's how a 20B model fits inside a memory budget that physically cannot hold 20B of active weights.&lt;/p&gt;

&lt;p&gt;Google's Gemma 4 attacks the same problem from two angles. The edge models — &lt;code&gt;E2B&lt;/code&gt; and &lt;code&gt;E4B&lt;/code&gt; — use "Per-Layer Embeddings" to keep the active footprint small: E4B carries roughly 8B total parameters but runs with about 4.5B effective. Its bigger sibling, a 26B mixture-of-experts, only lights up a fraction of its experts per token. MoE and per-layer tricks are Apple's IFP insight wearing different clothes — most of a large model is dead weight on any single token, so don't pay to run it.&lt;/p&gt;
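
&lt;p&gt;The "large in storage, small in motion" idea can be sketched with generic top-k expert routing. This is a toy illustration of the general mixture-of-experts technique, not Gemma 4's or AFM 3's actual router: the eight scalar "experts", the random router scores, and the parameter count per expert are all assumptions made up for the example.&lt;/p&gt;

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


def moe_layer(token, experts, router_scores, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_route(router_scores, k)
    mixed = sum(weight * experts[i](token) for i, weight in gates.items())
    return mixed, gates


def make_expert(scale):
    """A scalar function standing in for one expert's feed-forward block."""
    return lambda x: scale * x


# Toy setup: 8 experts, of which only 2 run for this token.
random.seed(0)
experts = [make_expert(s) for s in range(1, 9)]
router_scores = [random.gauss(0.0, 1.0) for _ in experts]

output, gates = moe_layer(token=1.0, experts=experts, router_scores=router_scores, k=2)

params_per_expert = 1_000_000_000      # assumed size, for illustration
stored = len(experts) * params_per_expert
active = len(gates) * params_per_expert
print(f"experts run: {sorted(gates)}; stored {stored:,} params, active {active:,}")
```

&lt;p&gt;All eight experts have to be stored, but only the two the router selects are ever evaluated for this token, so compute tracks the active count rather than the stored one.&lt;/p&gt;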

&lt;p&gt;The hardware finally met the software halfway. The neural accelerators (NPUs) now standard in phones and laptops run 4-8B-class models at genuinely usable speeds. The practical question shifted from "can it run at all" to "which model fits this RAM tier" — and that's a routine product decision, not a research problem. Google says the Gemma 4 edge models run "completely offline with near-zero latency" not just on phones but on a Raspberry Pi and an NVIDIA Jetson Orin Nano; the prior generation's E4B reportedly fit in about 3GB of RAM.&lt;/p&gt;

&lt;h2&gt;
  
  
  These are not toy models anymore
&lt;/h2&gt;

&lt;p&gt;The capability jump is real, and it's broadest where it matters for everyday use: multimodality. AFM 3's on-device model is now multimodal — it takes images in, and Apple reports human raters preferred its image understanding about 61% of the time over the previous generation. Its on-device text-to-speech scored 4.24 on a 5-point mean-opinion scale versus 3.82 for the baseline — roughly the difference between "obviously a robot" and "fine, I'll actually listen to this." Gemma 4 ships native vision and audio, 128K context on the edge models, and 140+ languages.&lt;/p&gt;

&lt;p&gt;The open-model leaderboard backs the claim up. Google's 31B dense Gemma 4 lands around #3 among open models and its 26B MoE around #6 on LMArena's text board — and Google's own framing, that these "outcompete models 20x their size," is the whole thesis in one line. The point of a small model in 2026 isn't to match GPT-class frontier reasoning. It's to be good enough at the 90% of tasks that don't need it, while running for free in your pocket.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it still can't do
&lt;/h2&gt;

&lt;p&gt;The honest caveat: the device model is not the frontier model, and pretending otherwise is how you ship a disappointing feature. Hard multi-step reasoning, long-horizon coding, deep research across large corpora — those still belong in the cloud, where a much larger model with a big context budget earns its token bill. Treat the small-model benchmark numbers that float around — figures in the mid-80s on MMLU for 14B-class models, high-60s for sub-4B ones — with suspicion; MMLU is saturated and gameable, and a leaderboard score tells you almost nothing about whether the thing can hold a five-step plan together. The right mental model is a hybrid: the device handles the fast, private, high-frequency work and hands off to the cloud only when a task genuinely outgrows it. The interesting engineering of the next year is the routing layer that decides which is which.&lt;/p&gt;

&lt;h2&gt;
  
  
  Apple opened the gates
&lt;/h2&gt;

&lt;p&gt;The most underrated WWDC announcement wasn't the model — it was the door. Apple opened its Foundation Models framework to third-party and open models, with Swift packages for Anthropic's and Google's models on the way, and added agentic primitives plus on-device semantic search to the SDK. Translation: a developer can write an app against one local-first AI framework and let the device decide which model answers. That's the platform move. The model becomes a commodity inside it; the framework — the agent primitives, the semantic index over your own files, the routing — is the moat. Once the OS ships a free, private, capable model and a clean API to it, "add AI" stops meaning "add a cloud dependency and a billing relationship" and starts meaning "call a system function."&lt;/p&gt;

&lt;h2&gt;
  
  
  The take
&lt;/h2&gt;

&lt;p&gt;The cloud-AI era trained everyone to assume intelligence is a utility you rent by the token. 2026 is the year that assumption cracked at the edge — not because device models got as smart as the frontier (they didn't), but because sparse architectures finally made "large but cheap to run" a real category, and the economics of $0 marginal inference are too good to ignore for the enormous class of features that never needed a genius in the first place. The cloud keeps the hardest problems. The device quietly takes everything else — offline, private, and off the meter. That's not a demo anymore. It's the new default, and most software hasn't been rewritten to assume it yet. The teams that rewrite first will look, briefly, like magicians.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>edgecomputing</category>
    </item>
    <item>
      <title>The Coding-Agent Arms Race: Who Survives the H1-2026 Shakeout</title>
      <dc:creator>AI Explore</dc:creator>
      <pubDate>Sun, 28 Jun 2026 18:02:46 +0000</pubDate>
      <link>https://dev.to/aiexplore369zoho/the-coding-agent-arms-race-who-survives-the-h1-2026-shakeout-36po</link>
      <guid>https://dev.to/aiexplore369zoho/the-coding-agent-arms-race-who-survives-the-h1-2026-shakeout-36po</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Coding agents stopped being a checkbox in your IDE and turned into a four-way platform war in the first half of 2026. Anthropic is winning the model-and-product fight, OpenAI is winning distribution, Cognition is winning the enterprise, and Google is betting on the bundle. The real moats are model cadence, install base, and price, not features. Pick your agent like you'd pick a vendor you might have to leave, because at least one of these companies is going to whipsaw your workflow before the year is out.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On June 9, 2026, Anthropic shipped two new models — Fable 5 and a restricted, higher-tier &lt;strong&gt;Mythos 5&lt;/strong&gt; — and stood up a new pricing class above Opus at &lt;code&gt;$10/$50&lt;/code&gt; per million tokens. The same day, OpenAI quietly added "Migrate to Codex" flows designed to import your Claude Code setup with a couple of clicks. Two of the most valuable companies on Earth, shipping on the same Tuesday, fighting over the exact same thing: the cursor in your terminal.&lt;/p&gt;

&lt;p&gt;That is not a feature race anymore. That is a war for the developer's keyboard, and H1 2026 was the year it got bloody. If you're still picking a coding agent the way you'd pick a linter, you're underpricing the decision. The agent you wire into your workflow today is a bet on which platform survives — and several of them won't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers that ended the "feature" era
&lt;/h2&gt;

&lt;p&gt;Follow the money and the stakes get obvious fast. Claude Code's annualized revenue reportedly went from roughly &lt;strong&gt;$1B in November 2025 to about $2.5B by February 2026&lt;/strong&gt; — a single product line, doing 2.5x in a quarter. Anthropic as a whole exited 2025 near $9B in run-rate; Dario Amodei confirmed a &lt;strong&gt;$19B run-rate&lt;/strong&gt; at the Morgan Stanley TMT conference on March 4, 2026, and tech press pegged it near &lt;strong&gt;$30B by April&lt;/strong&gt; — roughly 80x growth in a couple of years. The Series G — about $30B raised at a ~$380B post-money valuation in February — is the kind of number you only justify if you believe coding agents are infrastructure, not novelty.&lt;/p&gt;

&lt;p&gt;Cognition, the company behind Devin, raised &lt;strong&gt;more than $1B at a $26B post-money valuation on May 27, 2026&lt;/strong&gt;, up roughly 2.5x in eight months from the $10.2B it carried in September 2025. Its annualized revenue run-rate sits around &lt;strong&gt;$492M&lt;/strong&gt;, with enterprise usage reportedly growing 50% month-over-month for six straight months and a customer list that includes Mercedes-Benz, NASA, and Goldman Sachs. When NASA is letting an autonomous agent touch code, the "is this a toy" conversation is over.&lt;/p&gt;

&lt;p&gt;You don't see capital and revenue move like this around features. You see it move like this around platforms — the kind people get locked into.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model train now leaves every six weeks
&lt;/h2&gt;

&lt;p&gt;The single most underrated dynamic of 2026 is cadence. Anthropic alone shipped &lt;strong&gt;Opus 4.5&lt;/strong&gt; on November 24, 2025 — the first model over 80% on SWE-bench Verified at 80.9%, priced at &lt;code&gt;$5/$25&lt;/code&gt; per million tokens — then &lt;strong&gt;Opus 4.7&lt;/strong&gt; on April 16, &lt;strong&gt;Opus 4.8&lt;/strong&gt; on May 28 (which became the default within days and added a &lt;code&gt;$10/$50&lt;/code&gt; "fast mode"), and the Fable/Mythos drop on June 9. That's a new frontier roughly every six weeks.&lt;/p&gt;

&lt;p&gt;OpenAI answered with &lt;strong&gt;GPT-5.5&lt;/strong&gt; on April 23, reportedly hitting around 88.7% on SWE-bench Verified and 82.7% on Terminal-Bench 2.0 at &lt;code&gt;$5/$30&lt;/code&gt;, with a GPT-5.5 Pro tier at &lt;code&gt;$30/$180&lt;/code&gt;. Google moved Jules onto a Gemini 3 Flash base on January 30 and onto Gemini 3.1 Pro for paid users by March 9. Three labs, all re-baselining their agents every few weeks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Cadence is a moat the way a treadmill is a moat: it doesn't stop, and the cost of falling off compounds. A lab that ships a frontier model every six weeks can absorb a competitor's big launch in a month. A lab that ships twice a year cannot.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is why I'd bet against any pure-play coding-agent startup that doesn't own a model. If your differentiation is the harness around someone else's weights, your roadmap is hostage to a release schedule you don't control — and your margins evaporate the moment the model underneath you gets cheaper or smarter without you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Distribution is the moat nobody wants to admit
&lt;/h2&gt;

&lt;p&gt;Here's the contrarian part: the best &lt;em&gt;model&lt;/em&gt; is not guaranteed to win. The best &lt;em&gt;distribution&lt;/em&gt; usually does, and that's where OpenAI is dangerous. Codex is no longer just a CLI: it's a CLI plus macOS and Windows apps plus mobile, and on &lt;strong&gt;June 25, 2026, Codex Remote went GA&lt;/strong&gt;, letting you drive a Mac or Windows host straight from the ChatGPT app on your phone. OpenAI is plugging an autonomous coding agent into the single largest consumer AI install base on the planet, then adding a &lt;code&gt;$100/mo&lt;/code&gt; Codex Pro tier to monetize the power users. The Codex CLI itself is free, and by some reports it leads Terminal-Bench 2.1 at 83.4% against Claude Code's 78.9% and Gemini CLI's 70.7%.&lt;/p&gt;

&lt;p&gt;Google's play is the same logic by another route: bundle. Jules went GA in August 2025, shipped a CLI and API in October, and now leans on Gemini subscriptions — free at 15 tasks/day, $19.99 AI Pro at 100/day, AI Ultra at 300/day. Google doesn't need Jules to be the best agent. It needs Jules to be the &lt;em&gt;default&lt;/em&gt; one already sitting inside an account you pay for.&lt;/p&gt;

&lt;p&gt;Anthropic's counter is that its product is genuinely ahead. Claude Code in 2026 added &lt;code&gt;/code-review&lt;/code&gt;, an &lt;code&gt;/ultrareview&lt;/code&gt; cloud bug-hunting fleet, "dynamic workflows" where Claude scripts dozens to hundreds of subagents on its own, scheduled cloud Routines, native binaries, and Artifacts. That's the most sophisticated agent surface shipping today. The open question is whether product depth beats a billion-user front door. History says it usually doesn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Windsurf saga is the warning label
&lt;/h2&gt;

&lt;p&gt;If you want a single story that captures why your agent choice is a strategic bet, it's Windsurf. In May 2025, OpenAI had a roughly &lt;strong&gt;$3B deal&lt;/strong&gt; to acquire it. By &lt;strong&gt;July 11, 2025, the deal had collapsed.&lt;/strong&gt; Within days, Google paid about &lt;strong&gt;$2.4B to license Windsurf's tech and hire CEO Varun Mohan and his co-founder into DeepMind&lt;/strong&gt;, a reverse-acquihire that pulled the brains out and left the body behind. On July 14, Cognition acquired what remained.&lt;/p&gt;

&lt;p&gt;Then watch what happened to the product. Under Cognition it shipped Windsurf 2.0 on April 15, 2026, got &lt;strong&gt;rebranded to "Devin Desktop" on June 2&lt;/strong&gt;, and its Cascade engine is being replaced by a Rust rewrite called "Devin Local" — reportedly about 30% more token-efficient — with &lt;strong&gt;Cascade reaching end-of-life on July 1, 2026.&lt;/strong&gt; If you built your team's workflow on Windsurf in early 2025, you have since survived a failed acquisition, a brain drain, a new owner, a rebrand, and the sunsetting of your core engine — in under eighteen months.&lt;/p&gt;

&lt;p&gt;That's not a freak event. That's the base rate for this market right now. Anyone building on a venture-backed agent should assume at least one ownership shock, one rename, and one engine swap inside their planning horizon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing whiplash is the new normal
&lt;/h2&gt;

&lt;p&gt;The other thing that should scare you is how fast the rules change underneath you. Anthropic first imposed weekly usage limits on August 28, 2025 to loud backlash (it claimed under 5% of users were affected). Then in 2026 it reversed hard: 5-hour limits were reportedly doubled around May 6, and weekly limits were raised about 50%, effective through &lt;strong&gt;6 PM PDT on July 13, 2026&lt;/strong&gt; — a move widely read as defensive against OpenAI's Codex push. Anthropic hasn't said whether the higher ceiling survives past that date. Plan your June around a limit that might revert in July.&lt;/p&gt;

&lt;p&gt;On the other side, Cognition cut Devin's price from &lt;strong&gt;$500 to $20/mo&lt;/strong&gt; with Devin 2.0 in April 2025, then on April 14, 2026 retired its Core and Team plans and pushed self-serve onto quota tiers — Free, $20 Pro, $200 Max, and Teams. (Worth noting: the current product is Devin 2.2, shipped February 24, 2026; there is no "Devin 3" — v3 is the API. Don't let a vendor's marketing math confuse you.)&lt;/p&gt;

&lt;p&gt;When a market is consolidating and growth is the only metric that matters, price is a weapon, not a number on a page. Expect it to swing — up via rate limits, down via land-grab discounts — with very little notice.&lt;/p&gt;

&lt;h2&gt;
  
  
  So who's actually winning?
&lt;/h2&gt;

&lt;p&gt;My ranking, stated plainly and happy to be wrong in three months:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Anthropic leads on model-plus-product.&lt;/strong&gt; The cadence is unmatched, the agent surface is the deepest, and Claude Code's revenue curve is the most convincing single data point in the category. The risk is distribution — it has the best terminal agent and the smallest front door.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI leads on distribution and is closing the product gap.&lt;/strong&gt; Codex Remote plus mobile plus a free CLI plus the "Migrate to Codex" funnel is a coordinated assault on Anthropic's installed base. If Codex's Terminal-Bench lead holds, this gets very close.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cognition is winning the enterprise autonomy bet.&lt;/strong&gt; A $26B valuation, $492M ARR, and NASA-grade logos say the "fire-and-forget agent" thesis is landing where budgets are biggest — even if Devin is narrower than a general assistant.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Google is the slow-moving giant that wins by default.&lt;/strong&gt; Jules doesn't have to be first; it has to be already paid for inside Workspace and Gemini subscriptions. Never bet against bundling.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The companies I'd worry about are the harness-only startups with no model and no distribution. Windsurf showed how that movie ends.&lt;/p&gt;

&lt;h2&gt;
  
  
  What builders should actually do about lock-in
&lt;/h2&gt;

&lt;p&gt;Stop optimizing for the best agent this quarter and start optimizing for the cheapest &lt;em&gt;exit&lt;/em&gt;. Concretely:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep your workflow in portable formats.&lt;/strong&gt; The most encouraging trend of H1 2026 is the move toward open interop: Devin Local now ships Agent Client Protocol support that lets third-party agents (Claude's agent, Codex, OpenCode) plug in, and MCP support is spreading across Jules, Claude Code, and the rest. Build your context, your tool definitions, and your review steps around MCP and protocol layers, not around one vendor's proprietary config. The agent should be swappable; your scaffolding shouldn't move when it is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treat the model as a commodity input.&lt;/strong&gt; A frontier model every six weeks means whoever you favor today will be leapfrogged by lunchtime. Wire your stack so swapping the underlying model is a config change, not a migration.&lt;/p&gt;
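
&lt;p&gt;One way to picture "a config change, not a migration" is a thin indirection layer, sketched below. Everything here is hypothetical: the &lt;code&gt;vendor-a&lt;/code&gt; and &lt;code&gt;vendor-b&lt;/code&gt; backends are string-returning stand-ins for real SDK calls, and &lt;code&gt;ModelConfig&lt;/code&gt;, &lt;code&gt;complete&lt;/code&gt;, and &lt;code&gt;review_diff&lt;/code&gt; are illustrative names, not any vendor's API.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ModelConfig:
    """Everything vendor-specific lives here, not in the calling code."""
    provider: str
    model: str


# Hypothetical backends: stand-ins for real SDK calls, which are not shown here.
def _vendor_a(model: str, prompt: str) -> str:
    return f"[vendor-a/{model}] {prompt}"


def _vendor_b(model: str, prompt: str) -> str:
    return f"[vendor-b/{model}] {prompt}"


BACKENDS: Dict[str, Callable[[str, str], str]] = {
    "vendor-a": _vendor_a,
    "vendor-b": _vendor_b,
}


def complete(config: ModelConfig, prompt: str) -> str:
    """Single entry point the rest of the workflow depends on."""
    try:
        backend = BACKENDS[config.provider]
    except KeyError:
        raise ValueError(f"unknown provider: {config.provider}") from None
    return backend(config.model, prompt)


def review_diff(config: ModelConfig, diff: str) -> str:
    """Workflow step that stays the same whichever model sits underneath."""
    return complete(config, f"Review this diff:\n{diff}")


# Swapping the model is a one-line config edit; review_diff() never changes.
today = ModelConfig(provider="vendor-a", model="model-x")
tomorrow = ModelConfig(provider="vendor-b", model="model-y")
print(review_diff(today, "+ added line"))
print(review_diff(tomorrow, "+ added line"))
```

&lt;p&gt;The design choice is that workflow code imports only &lt;code&gt;complete&lt;/code&gt;, so a vendor change touches the config and one adapter function rather than every call site.&lt;/p&gt;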

&lt;p&gt;&lt;strong&gt;Assume the price changes.&lt;/strong&gt; Don't architect a process that only pencils out at a promotional rate. If your team's economics break when a $20 plan becomes a quota plan, or when a weekly limit reverts on July 13, you've built on sand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't marry a single-product startup with your critical path.&lt;/strong&gt; Use the best tool, absolutely — but keep a tested fallback warm. The cost of running two agents in parallel is trivial next to the cost of an unplanned engine sunset.&lt;/p&gt;

&lt;h2&gt;
  
  
  The closing take
&lt;/h2&gt;

&lt;p&gt;The uncomfortable truth of this market is that there is no safe pick — only hedged ones. Anthropic has the best product and the cadence to defend it, but the smallest distribution. OpenAI has the front door and is sprinting to close the product gap. Cognition owns the enterprise but rides a narrower thesis. Google wins the people who never chose at all. Every one of them will change a price, a limit, or a product name on you before this year ends, and at least one well-funded name in this space won't make it to 2027 intact.&lt;/p&gt;

&lt;p&gt;So pick the agent that's best for the work in front of you today — and build everything around it as if you'll have to leave. In an arms race, loyalty is a liability. Portability is the only real moat you actually control.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
      <category>aiagents</category>
    </item>
  </channel>
</rss>
