<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Josh Green</title>
    <description>The latest articles on DEV Community by Josh Green (@josh_green_dev).</description>
    <link>https://dev.to/josh_green_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3811550%2Fe3376e9e-5467-4ed6-a3f6-b4698a6721f0.png</url>
      <title>DEV Community: Josh Green</title>
      <link>https://dev.to/josh_green_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/josh_green_dev"/>
    <language>en</language>
    <item>
      <title>Every Medium Publication That Accepts 3D Content (2026 Map)</title>
      <dc:creator>Josh Green</dc:creator>
      <pubDate>Sat, 30 May 2026 07:39:47 +0000</pubDate>
      <link>https://dev.to/josh_green_dev/every-medium-publication-that-accepts-3d-content-2026-map-4ni1</link>
      <guid>https://dev.to/josh_green_dev/every-medium-publication-that-accepts-3d-content-2026-map-4ni1</guid>
      <description>&lt;p&gt;If you create 3D content — printing guides, WebGL tutorials, Three.js projects, CAD workflows — and want to publish on Medium, you have a discovery problem. There is no central 3D publication on the platform.&lt;/p&gt;

&lt;p&gt;I mapped every Medium publication that accepts 3D-related content. Here is the short version for developers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcggqgy5gpunlwff51jmk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcggqgy5gpunlwff51jmk.png" alt="Where to publish 3D content" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  For WebGL / Three.js / Browser 3D
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;JavaScript in Plain English&lt;/strong&gt; (180K followers) — best fit for Three.js, WebGL, browser rendering. Submit by following the pub and sending drafts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level Up Coding&lt;/strong&gt; (73K followers) — Three.js is an explicit topic. Email &lt;a href="mailto:submit@gitconnected.com"&gt;submit@gitconnected.com&lt;/a&gt; with your draft link.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ITNEXT&lt;/strong&gt; (55K followers) — deep technical dives on rendering pipelines and WebGL optimization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffje5cl3e59nbwkfvw5zu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffje5cl3e59nbwkfvw5zu.png" alt="Publications ranked" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  For 3D Printing / Maker Content
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Geek Culture&lt;/strong&gt; (33K followers) — most 3D-friendly general tech pub. Explicitly lists VR/AR/MR. Published 3D printing construction articles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Startup&lt;/strong&gt; (739K followers) — massive audience. Frame your content as discovery/comparison stories.&lt;/p&gt;

&lt;h2&gt;
  
  
  For Design / UX
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;UX Collective&lt;/strong&gt; (483K followers) — 3D fits when framed as spatial UX, AR product previews, 3D viewer design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bootcamp&lt;/strong&gt; (166K followers) — case studies and design tutorials with a 3D angle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmdqo1t5j5xa3hof3o6v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmdqo1t5j5xa3hof3o6v.png" alt="Decision flowchart" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Easy Entry Points
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ILLUMINATION&lt;/strong&gt; (250K+) — welcoming to new writers, broad tech.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dev Genius&lt;/strong&gt; (10K) — publishes daily, all skill levels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5apgdxxz92283beyqma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5apgdxxz92283beyqma.png" alt="Tag strategy" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tags That Actually Work
&lt;/h2&gt;

&lt;p&gt;The right tags make or break discovery:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Virtual Reality&lt;/strong&gt; (494K followers) — highest reach for any 3D content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Augmented Reality&lt;/strong&gt; (89K) — strong for spatial computing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3D Printing&lt;/strong&gt; (2.1K) — small but targeted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three.js&lt;/strong&gt; (682) — tiny but engaged developers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebGL&lt;/strong&gt; (1.2K) — technical web graphics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Formula: 1 broad tag + 1 niche tag + 2-3 topic tags.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gap
&lt;/h2&gt;

&lt;p&gt;There is no dominant 3D publication on Medium. The space is wide open. Until someone builds one, place your articles in existing tech and design pubs and match their editorial angle.&lt;/p&gt;

&lt;p&gt;For previewing 3D files for article screenshots, I use &lt;a href="https://geometryviewer.com/windows-3d-viewer-alternative" rel="noopener noreferrer"&gt;GeometryViewer&lt;/a&gt; — browser-based, handles STL/OBJ/GLB, no install.&lt;/p&gt;

&lt;p&gt;Full guide with submission links for all 15 publications: &lt;a href="https://joshgreen-dev.github.io/2026/05/30/where-to-publish-3d-content-on-medium.html" rel="noopener noreferrer"&gt;joshgreen-dev.github.io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webgl</category>
      <category>threedprinting</category>
      <category>medium</category>
      <category>writing</category>
    </item>
    <item>
      <title>Microsoft 3D Viewer Dies July 1 — The STL Gap Nobody Is Talking About</title>
      <dc:creator>Josh Green</dc:creator>
      <pubDate>Thu, 28 May 2026 23:40:06 +0000</pubDate>
      <link>https://dev.to/josh_green_dev/microsoft-3d-viewer-dies-july-1-the-stl-gap-nobody-is-talking-about-2580</link>
      <guid>https://dev.to/josh_green_dev/microsoft-3d-viewer-dies-july-1-the-stl-gap-nobody-is-talking-about-2580</guid>
      <description>&lt;p&gt;Here's something that should bother every developer who works with 3D files: Microsoft is permanently removing 3D Viewer from the Microsoft Store on July 1, 2026.&lt;/p&gt;

&lt;p&gt;Their official replacement suggestion? Babylon.js Sandbox. A browser-based viewer that &lt;strong&gt;doesn't support STL files&lt;/strong&gt; — the single most common format in 3D printing, CNC machining, and CAD export workflows.&lt;/p&gt;

&lt;p&gt;Let that sink in. The world's largest OS vendor is dropping 3D file support and pointing users to a tool that can't open the most popular 3D format.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Full Death Timeline
&lt;/h2&gt;

&lt;p&gt;This isn't a one-off. Microsoft has systematically killed every 3D tool they built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2017: "3D for Everyone!" — Creators Update launches
2023: Windows Mixed Reality deprecated
2024: HoloLens 2 production stopped
2024: Paint 3D removed from Store (November 4)
2024: FBX support permanently disabled in 3D Viewer (CVE-2024-20677)
2026: 3D Viewer deprecated (February)
2026: 3D Viewer removed from Store (July 1) ← YOU ARE HERE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nine years from "3D for Everyone" to "3D for Nobody."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Developers Should Care
&lt;/h2&gt;

&lt;p&gt;If you build anything that outputs 3D files — CAD tools, slicers, generative design, AI mesh generation, game asset pipelines — your Windows users just lost their default preview tool.&lt;/p&gt;

&lt;p&gt;"Just use Blender" is the wrong answer. Blender is a 200MB professional content creation suite. Your user doesn't want to install Blender to preview an STL. They want to double-click and see the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Browser-Based Alternative
&lt;/h2&gt;

&lt;p&gt;I've switched to &lt;a href="https://geometryviewer.com" rel="noopener noreferrer"&gt;GeometryViewer&lt;/a&gt; for day-to-day 3D file preview. Here's why it works for my workflow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Format coverage that actually matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;STL (binary + ASCII), OBJ, GLB/GLTF, 3MF, FBX, PLY, STEP, DAE&lt;/li&gt;
&lt;li&gt;Drag and drop — no account, no install&lt;/li&gt;
&lt;li&gt;Works offline as a PWA (install once, works without internet)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Developer-friendly features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embeddable via &lt;code&gt;&amp;lt;iframe&amp;gt;&lt;/code&gt; — one line to add a 3D viewer to any webpage&lt;/li&gt;
&lt;li&gt;Shareable URLs — send a model preview link, recipient sees it instantly&lt;/li&gt;
&lt;li&gt;Measurement tools, cross-sections, material simulation&lt;/li&gt;
&lt;li&gt;No server-side rendering: all WebGL rendering runs client-side&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The STL gap filled:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proper normal handling (no inverted faces on manifold models)&lt;/li&gt;
&lt;li&gt;Material preview that shows what a 3D print will actually look like&lt;/li&gt;
&lt;li&gt;Handles large meshes without choking (tested with 50M+ triangle scans)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Happens to Existing Installs?
&lt;/h2&gt;

&lt;p&gt;Microsoft clarified: existing 3D Viewer installations won't be auto-deleted. The app keeps working. But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No security patches after July 1&lt;/li&gt;
&lt;li&gt;Can't reinstall after a clean Windows install or new PC&lt;/li&gt;
&lt;li&gt;The FBX parser already has a known RCE vulnerability (CVE-2024-20677, CVSS 7.8) that Microsoft "fixed" by disabling FBX entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running unpatched software that already had an RCE is not a plan. It's a liability.&lt;/p&gt;

&lt;h2&gt;
  
  
  If You're Building 3D Tools
&lt;/h2&gt;

&lt;p&gt;Consider this the canary in the coal mine. Microsoft is not coming back to desktop 3D. Their 3D strategy is now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Azure Remote Rendering&lt;/strong&gt; (enterprise, cloud-based)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partnership with Meta&lt;/strong&gt; (Xbox branding on Quest hardware)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Copilot&lt;/strong&gt; (AI-generated 3D via text prompts — not file viewing)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Desktop 3D file viewing is officially an open-source problem now. If your app generates 3D output and your users are on Windows, you need to either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bundle your own viewer&lt;/li&gt;
&lt;li&gt;Point them to a browser-based solution like &lt;a href="https://geometryviewer.com" rel="noopener noreferrer"&gt;GeometryViewer&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Or accept that they'll have no way to preview your files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The era of "Windows handles 3D files natively" ended in February 2026. Plan accordingly.&lt;/p&gt;

</description>
      <category>windows</category>
      <category>threedprinting</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why DDR5 Bandwidth Kills Dual-LLM Inference on APUs (Benchmarks Inside)</title>
      <dc:creator>Josh Green</dc:creator>
      <pubDate>Thu, 28 May 2026 15:43:47 +0000</pubDate>
      <link>https://dev.to/josh_green_dev/why-ddr5-bandwidth-kills-dual-llm-inference-on-apus-benchmarks-inside-42p1</link>
      <guid>https://dev.to/josh_green_dev/why-ddr5-bandwidth-kills-dual-llm-inference-on-apus-benchmarks-inside-42p1</guid>
      <description>&lt;p&gt;Did you know that a 35-billion-parameter model can generate tokens at the same compute cost as a 4B model? That single fact made me abandon a multi-model agent architecture I'd spent a weekend building. But I had to run the benchmarks first to understand why.&lt;/p&gt;

&lt;p&gt;Here's the full breakdown, with commands, numbers, and the architectural reason it all falls apart on shared-memory hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Discovery That Changed Everything
&lt;/h2&gt;

&lt;p&gt;I'd been running &lt;code&gt;qwen3.6:35b&lt;/code&gt; for weeks on my Minisforum UM790Pro, the machine I use as a daily driver for everything from coding to running &lt;a href="https://geometryviewer.com" rel="noopener noreferrer"&gt;GeometryViewer&lt;/a&gt; for 3D model previews. At 17.8 tokens/second, the model is genuinely usable for interactive work. But I kept wondering: could I run a lightweight sidecar model alongside it for quick classification and tool-calling in an agent pipeline?&lt;/p&gt;

&lt;p&gt;Before I even started benchmarking, I dug into what &lt;code&gt;qwen3.6:35b&lt;/code&gt; actually is under the hood. It's a Mixture of Experts model: 256 total experts with only 8 activated per token. The architecture also incorporates SSM (State Space Model) components alongside traditional attention -- Mamba-style layers that handle certain sequence patterns more efficiently than pure transformers.&lt;/p&gt;

&lt;p&gt;The math hit me: 8 out of 256 experts means each token only touches roughly 4-5B parameters worth of compute. The model carries 36 billion parameters of &lt;em&gt;knowledge&lt;/em&gt;, but its per-token &lt;em&gt;cost&lt;/em&gt; is comparable to a small dense model. I was planning to run a separate 4B model for "fast tasks" next to a model that already operates at 4B-class speed.&lt;/p&gt;

&lt;p&gt;But I had to prove it with numbers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fchart-baselines.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fchart-baselines.png" alt="Baseline benchmark results — single model performance" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardware and Ollama Setup
&lt;/h2&gt;

&lt;p&gt;The UM790Pro specs that matter for this experiment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU:&lt;/strong&gt; AMD Ryzen 9 7940HS (Zen 4, 8C/16T)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iGPU:&lt;/strong&gt; AMD Radeon 780M (12 RDNA 3 compute units)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAM:&lt;/strong&gt; 96 GB DDR5-5600 (~80 GB/s bandwidth)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU memory pool:&lt;/strong&gt; 2 GB dedicated VRAM + 46 GB GTT = 48 GB GPU-accessible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fum790pro-official-product.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fum790pro-official-product.jpg" alt="The Minisforum UM790Pro — compact but surprisingly capable" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That 48 GB GPU pool sounds enormous until you realize it's carved from the same DDR5 that the CPU also uses. There is no separate GDDR6 bus. Everything -- CPU inference, GPU inference, KV caches, OS operations -- flows through one 80 GB/s pipe.&lt;/p&gt;

&lt;p&gt;Four models under test, managed through Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pull the models&lt;/span&gt;
ollama pull qwen3.6:35b
ollama pull gemma4-e2b-abliterated
ollama pull qwen3:4b-instruct
ollama pull qwen2.5:1.5b

&lt;span class="c"&gt;# Check what's loaded and where&lt;/span&gt;
ollama ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ollama ps&lt;/code&gt; shows you which models are in memory and whether they're on GPU or CPU. For forcing CPU-only inference (critical for these tests), you pass &lt;code&gt;num_gpu&lt;/code&gt; as a model parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Force a model onto CPU -- zero GPU layers&lt;/span&gt;
curl http://localhost:11434/api/generate &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "gemma4-e2b-abliterated",
  "prompt": "Explain quicksort in 3 sentences.",
  "options": { "num_gpu": 0 }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting &lt;code&gt;num_gpu: 0&lt;/code&gt; tells Ollama to offload zero layers to the GPU, keeping the entire model in system RAM for CPU-only inference. This is how I isolated CPU vs GPU performance and tested mixed configurations.&lt;/p&gt;
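
&lt;p&gt;The same request can be sketched in Python. This is a minimal example, assuming a local Ollama server on the default port; the helper names (&lt;code&gt;build_request&lt;/code&gt;, &lt;code&gt;tokens_per_second&lt;/code&gt;, &lt;code&gt;generate&lt;/code&gt;) are hypothetical and not part of any library. It relies on the &lt;code&gt;eval_count&lt;/code&gt; and &lt;code&gt;eval_duration&lt;/code&gt; (nanoseconds) fields that Ollama returns for a non-streaming generate call.&lt;/p&gt;

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_request(model, prompt, cpu_only=False):
    """Build the JSON body for a non-streaming generate call."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if cpu_only:
        # num_gpu = 0 offloads zero layers to the GPU
        body["options"] = {"num_gpu": 0}
    return body


def tokens_per_second(response):
    """Generation throughput from Ollama's timing fields (duration is in ns)."""
    return response["eval_count"] / response["eval_duration"] * 1e9


def generate(model, prompt, cpu_only=False):
    """Send the request to a running Ollama server and return the parsed reply."""
    data = json.dumps(build_request(model, prompt, cpu_only)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

&lt;p&gt;With a server running, &lt;code&gt;tokens_per_second(generate("gemma4-e2b-abliterated", "Explain quicksort in 3 sentences.", cpu_only=True))&lt;/code&gt; would give one CPU-only data point.&lt;/p&gt;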

&lt;p&gt;To verify VRAM allocation, &lt;code&gt;ollama ps&lt;/code&gt; gives you the breakdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;NAME                          SIZE     PROCESSOR    UNTIL
qwen3.6:35b                   32.2 GB  100% GPU     4 minutes from now
gemma4-e2b-abliterated:latest  4.1 GB  100% GPU     4 minutes from now
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On a discrete NVIDIA card you'd cross-reference with &lt;code&gt;nvidia-smi&lt;/code&gt;, but on an AMD APU the GTT allocation shows up in &lt;code&gt;ollama ps&lt;/code&gt; or in &lt;code&gt;/sys/kernel/debug/dri/0/amdgpu_gem_info&lt;/code&gt; (reading debugfs usually requires root).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmark Results
&lt;/h2&gt;

&lt;p&gt;Every dual-model test fired identical prompts at both models simultaneously; the solo baselines ran one model at a time. I measured generation throughput (tokens/second) in each case.&lt;/p&gt;
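
&lt;p&gt;One way to fire a prompt at two models at the same moment is a two-worker thread pool. This is only a sketch, not the harness used for the numbers below: &lt;code&gt;run_pair&lt;/code&gt; and the &lt;code&gt;send&lt;/code&gt; callable are hypothetical names, and &lt;code&gt;send&lt;/code&gt; stands in for whatever function posts a prompt to the model server.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor


def run_pair(send, model_a, model_b, prompt):
    """Fire the same prompt at two models at once and collect both replies.

    send is any callable taking (model, prompt) and returning a result,
    for example a function that posts to a local Ollama server.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both requests are submitted before either result is awaited
        future_a = pool.submit(send, model_a, prompt)
        future_b = pool.submit(send, model_b, prompt)
        return {model_a: future_a.result(), model_b: future_b.result()}
```

&lt;p&gt;Threads are enough here because each worker spends its time waiting on the HTTP response rather than computing.&lt;/p&gt;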

&lt;h3&gt;
  
  
  Solo Baselines
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;GPU (tok/s)&lt;/th&gt;
&lt;th&gt;CPU (tok/s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;qwen3.6:35b&lt;/td&gt;
&lt;td&gt;36B (MoE)&lt;/td&gt;
&lt;td&gt;17.8&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma4-e2b-abliterated&lt;/td&gt;
&lt;td&gt;4.6B&lt;/td&gt;
&lt;td&gt;42.9&lt;/td&gt;
&lt;td&gt;28.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen3:4b-instruct&lt;/td&gt;
&lt;td&gt;4B&lt;/td&gt;
&lt;td&gt;26.2&lt;/td&gt;
&lt;td&gt;19.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen2.5:1.5b&lt;/td&gt;
&lt;td&gt;1.5B&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;td&gt;53.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fernie-benchmark-results.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fernie-benchmark-results.png" alt="Benchmark results comparison" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Dual-Model Runs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Both on GPU&lt;/strong&gt; -- qwen3.6:35b + gemma4-e2b:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Solo&lt;/th&gt;
&lt;th&gt;Dual&lt;/th&gt;
&lt;th&gt;Performance Hit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;qwen3.6:35b (GPU)&lt;/td&gt;
&lt;td&gt;17.8&lt;/td&gt;
&lt;td&gt;13.1&lt;/td&gt;
&lt;td&gt;-26%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma4-e2b (GPU)&lt;/td&gt;
&lt;td&gt;42.9&lt;/td&gt;
&lt;td&gt;25.3&lt;/td&gt;
&lt;td&gt;-41%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;GPU + tiny CPU&lt;/strong&gt; -- qwen3.6:35b (GPU) + qwen2.5:1.5b (CPU):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Solo&lt;/th&gt;
&lt;th&gt;Dual&lt;/th&gt;
&lt;th&gt;Performance Hit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;qwen3.6:35b (GPU)&lt;/td&gt;
&lt;td&gt;17.8&lt;/td&gt;
&lt;td&gt;14.9&lt;/td&gt;
&lt;td&gt;-16%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen2.5:1.5b (CPU)&lt;/td&gt;
&lt;td&gt;53.4&lt;/td&gt;
&lt;td&gt;26.2&lt;/td&gt;
&lt;td&gt;-51%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;GPU + medium CPU&lt;/strong&gt; -- qwen3.6:35b (GPU) + gemma4-e2b (CPU, &lt;code&gt;num_gpu=0&lt;/code&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Solo&lt;/th&gt;
&lt;th&gt;Dual&lt;/th&gt;
&lt;th&gt;Performance Hit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;qwen3.6:35b (GPU)&lt;/td&gt;
&lt;td&gt;17.8&lt;/td&gt;
&lt;td&gt;13.0&lt;/td&gt;
&lt;td&gt;-27%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma4-e2b (CPU)&lt;/td&gt;
&lt;td&gt;28.7&lt;/td&gt;
&lt;td&gt;13.4&lt;/td&gt;
&lt;td&gt;-53%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;GPU + large-context CPU&lt;/strong&gt; -- qwen3.6:35b (GPU) + qwen3:4b-instruct (CPU, &lt;code&gt;num_gpu=0&lt;/code&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Solo&lt;/th&gt;
&lt;th&gt;Dual&lt;/th&gt;
&lt;th&gt;Performance Hit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;qwen3.6:35b (GPU)&lt;/td&gt;
&lt;td&gt;17.8&lt;/td&gt;
&lt;td&gt;11.6&lt;/td&gt;
&lt;td&gt;-35%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen3:4b-instruct (CPU)&lt;/td&gt;
&lt;td&gt;19.6&lt;/td&gt;
&lt;td&gt;11.1&lt;/td&gt;
&lt;td&gt;-43%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That last combination hit the 35B model hardest (-35%). The 4B instruct model supports 256K context, and its KV cache ballooned to 24.2 GB. Combined with the 35B model's 32 GB GPU allocation, the two models were saturating the shared memory bus.&lt;/p&gt;
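
&lt;p&gt;The Performance Hit column in the tables above is the relative change from solo to dual throughput, rounded to a whole percent. A short sketch that recomputes a few rows from the published numbers (the function name &lt;code&gt;performance_hit&lt;/code&gt; is my own label):&lt;/p&gt;

```python
def performance_hit(solo_tps, dual_tps):
    """Relative throughput change from solo to dual, as a rounded percentage."""
    return round((dual_tps / solo_tps - 1) * 100)


# (solo, dual) tokens/second pairs copied from the tables above
runs = {
    "qwen3.6:35b beside gemma4-e2b, both GPU": (17.8, 13.1),
    "qwen3.6:35b beside qwen3:4b-instruct on CPU": (17.8, 11.6),
    "qwen2.5:1.5b on CPU beside qwen3.6:35b": (53.4, 26.2),
}

for label, (solo, dual) in runs.items():
    print(f"{label}: {performance_hit(solo, dual)}%")
```

&lt;p&gt;These three rows come out to -26%, -35%, and -51%, matching the tables.&lt;/p&gt;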

&lt;h2&gt;
  
  
  Why It Happens: One Bus to Rule Them All
&lt;/h2&gt;

&lt;p&gt;On a discrete GPU setup, the CPU reads model weights from DDR5 over its memory controller while the GPU reads from its own GDDR6 over a completely separate bus (often 300+ GB/s). Two independent pipes, no contention.&lt;/p&gt;

&lt;p&gt;On an APU, both the Zen 4 CPU cores and the RDNA 3 compute units share a single memory controller connected to the same DDR5 DIMMs. Peak bandwidth is ~80 GB/s, and every consumer draws from that one budget.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fernie-memory-architecture.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fernie-memory-architecture.png" alt="Memory architecture bottleneck" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DDR5-5600 (96 GB) -- ~80 GB/s shared
       |
  +----+----+
  |         |
CPU cores  780M iGPU
(Zen 4)    (12 CUs)
  |         |
 model      model
weights    weights
  |         |
  +-- SAME MEMORY CONTROLLER --+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LLM token generation is almost entirely memory-bandwidth-bound. Each generated token requires streaming the model's active weights from memory to the compute units. A 35B MoE model activating 8 experts per token still needs to &lt;em&gt;read&lt;/em&gt; those expert weights from memory every single time. When a CPU-side model is doing the same thing simultaneously, the two streams compete for the same bandwidth.&lt;/p&gt;
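
&lt;p&gt;A back-of-envelope version of that argument, as a rough sketch: if generation is purely bandwidth-bound, tokens/second cannot exceed bus bandwidth divided by the data streamed per token. The 4 GB per token figure and the even split of the bus between two streams are illustrative assumptions, not measurements; real memory arbitration is messier.&lt;/p&gt;

```python
def max_tokens_per_second(bandwidth_gb_s, gb_read_per_token):
    """Upper bound on decode speed if every token must stream this many GB."""
    return bandwidth_gb_s / gb_read_per_token


def implied_gb_per_token(bandwidth_gb_s, tokens_per_second):
    """Most data each token could be streaming at the observed speed."""
    return bandwidth_gb_s / tokens_per_second


BUS_GB_S = 80.0  # the article's ~80 GB/s shared DDR5 figure

# Observed solo speed of the 35B MoE model on the iGPU
print(round(implied_gb_per_token(BUS_GB_S, 17.8), 2))

# Hypothetical: two streams needing 4 GB per token each, splitting the bus evenly
print(max_tokens_per_second(BUS_GB_S / 2, 4.0))
```

&lt;p&gt;At 17.8 tok/s the 35B model can be streaming at most about 4.49 GB per token, and under the hypothetical even split each stream would be capped at 10 tok/s.&lt;/p&gt;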

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fernie-bottleneck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fernie-bottleneck.png" alt="Bottleneck explanation" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even the "best" dual-model result (35B GPU + 1.5B CPU) cost 16% on the big model. The 1.5B model is small enough that its weight reads take only a small share of the bandwidth, yet its own throughput still halved (-51%) because the 35B model was dominating the bus.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Framework Problem
&lt;/h2&gt;

&lt;p&gt;My original goal was a planner-executor agent setup: the 35B model reasons about what to do, a small model handles tool calls. Sounds efficient in theory.&lt;/p&gt;

&lt;p&gt;In practice, agent frameworks are sequential. The planner generates a plan, &lt;em&gt;then&lt;/em&gt; the executor runs a tool, &lt;em&gt;then&lt;/em&gt; the planner evaluates the result. At any given moment, only one model is actively generating. The other sits idle in memory, consuming VRAM or RAM that could instead feed the active model a larger context window.&lt;/p&gt;

&lt;p&gt;Combined with the MoE insight -- the 35B model already runs at small-model speeds -- the dual-model architecture solves a problem that does not exist on this hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Finding Orphan Blobs in Ollama
&lt;/h2&gt;

&lt;p&gt;While investigating model storage during this project, I found 12.9 GB of wasted disk space. Ollama uses content-addressed storage under &lt;code&gt;~/.ollama/models/&lt;/code&gt;, so multiple model tags can reference the same weight blob. But when you delete a model, the blob sometimes lingers.&lt;/p&gt;

&lt;p&gt;Here's how to find orphans:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Collect every blob hash referenced by a manifest&lt;/span&gt;
find ~/.ollama/models/manifests &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s1"&gt;'*'&lt;/span&gt; &lt;span class="nt"&gt;-type&lt;/span&gt; f &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-exec&lt;/span&gt; &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-oh&lt;/span&gt; &lt;span class="s1"&gt;'sha256:[a-f0-9]*'&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt; &lt;span class="se"&gt;\;&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/referenced_blobs.txt

&lt;span class="c"&gt;# 2. List every blob on disk&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; ~/.ollama/models/blobs/ | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s/-/:/g'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/disk_blobs.txt

&lt;span class="c"&gt;# 3. Find blobs on disk that no manifest references&lt;/span&gt;
&lt;span class="nb"&gt;comm&lt;/span&gt; &lt;span class="nt"&gt;-13&lt;/span&gt; /tmp/referenced_blobs.txt /tmp/disk_blobs.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any hash that appears in the output of step 3 is an orphan. There's no &lt;code&gt;ollama prune&lt;/code&gt; command yet, so you delete them manually. On my system this reclaimed nearly 13 GB from a single forgotten blob.&lt;/p&gt;
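
&lt;p&gt;The same three steps can be sketched in Python if you prefer one script over temp files. This assumes the layout the shell commands rely on: manifests that mention digests as &lt;code&gt;sha256:HEX&lt;/code&gt; and blob files named &lt;code&gt;sha256-HEX&lt;/code&gt;. The function names are hypothetical, and the script only lists candidates; it deletes nothing.&lt;/p&gt;

```python
import re
from pathlib import Path

DIGEST = re.compile(r"sha256:[a-f0-9]+")


def referenced_digests(manifest_dir):
    """Collect every sha256 digest mentioned in any manifest file."""
    found = set()
    for path in Path(manifest_dir).rglob("*"):
        if path.is_file():
            found.update(DIGEST.findall(path.read_text(errors="ignore")))
    return found


def orphan_blobs(models_dir):
    """Return blob files on disk that no manifest references."""
    models = Path(models_dir).expanduser()
    referenced = referenced_digests(models / "manifests")
    orphans = []
    for blob in sorted((models / "blobs").iterdir()):
        # Blob files are named sha256-HEX; manifests write sha256:HEX
        if blob.name.replace("-", ":", 1) not in referenced:
            orphans.append(blob)
    return orphans
```

&lt;p&gt;Calling &lt;code&gt;orphan_blobs("~/.ollama/models")&lt;/code&gt; returns the unreferenced files as paths. Anything else sitting in the blobs directory, such as a partially downloaded file, would also show up, so check the list before removing anything.&lt;/p&gt;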

&lt;p&gt;Also worth knowing: &lt;code&gt;qwen3.6:35b&lt;/code&gt;, &lt;code&gt;qwen3.6:latest&lt;/code&gt;, and &lt;code&gt;qwen3.6:35b-nothink&lt;/code&gt; all resolve to the same 23.9 GB blob. Ollama's content-addressing means you're not actually tripling your disk usage by pulling multiple tags of the same weights.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fernie-conclusion-poster.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fjoshgreen-dev.github.io%2Fimages%2Fllm-benchmarks%2Fernie-conclusion-poster.png" alt="Conclusion summary" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're running local LLMs on a shared-memory APU (any AMD APU, any Intel with Arc iGPU, any machine without a discrete GPU), here's the takeaway:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One model at a time.&lt;/strong&gt; The memory bus is your bottleneck, and dual-model inference taxes it regardless of CPU/GPU split.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MoE models are your best friend on this hardware.&lt;/strong&gt; You get large-model reasoning quality at small-model inference cost. No need for a sidecar.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use your surplus RAM for context, not extra models.&lt;/strong&gt; A single 35B MoE with a 64K context window is more useful than two models fighting over bandwidth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch for Ollama's iGPU memory reporting bug&lt;/strong&gt; (&lt;a href="https://github.com/ollama/ollama/issues/14953" rel="noopener noreferrer"&gt;#14953&lt;/a&gt;) -- loading multiple models can trigger OOM crashes because Ollama misjudges available iGPU memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit your blob storage.&lt;/strong&gt; Orphan blobs from deleted models add up fast.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The UM790Pro with 96 GB of DDR5 is genuinely impressive hardware for local inference. 17.8 tok/s from a 35B-class model on an integrated GPU, in a box the size of a paperback. Just don't try to make it do two things at once.&lt;/p&gt;

&lt;p&gt;If you're into 3D printing and web dev like me, check out &lt;a href="https://geometryviewer.com" rel="noopener noreferrer"&gt;GeometryViewer&lt;/a&gt; -- a browser-based 3D model viewer I built that runs great on this same hardware. And you can find my other projects on &lt;a href="https://github.com/joshgreen-dev" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tested on: Minisforum UM790Pro, Ryzen 9 7940HS, 96 GB DDR5-5600, Ollama v0.9.x, Ubuntu Linux.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>minipc</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>Free LLMs on OpenRouter Keep Going 404. I Fixed It With 120 Lines of Python</title>
      <dc:creator>Josh Green</dc:creator>
      <pubDate>Sun, 08 Mar 2026 17:54:08 +0000</pubDate>
      <link>https://dev.to/josh_green_dev/free-llms-on-openrouter-keep-going-404-i-fixed-it-with-120-lines-of-python-43i1</link>
      <guid>https://dev.to/josh_green_dev/free-llms-on-openrouter-keep-going-404-i-fixed-it-with-120-lines-of-python-43i1</guid>
      <description>&lt;p&gt;I built a small pipeline on OpenClaw to stay on top of 3D printing news.&lt;/p&gt;

&lt;p&gt;Nothing fancy — a Python script that pulls from YouTube, RSS feeds, and Reddit, uses a free LLM to summarize what's worth reading, and emails me a digest. I use OpenRouter's free tier because I'm cheap and the models are good enough for summarization.&lt;/p&gt;

&lt;p&gt;It worked great. For about two weeks.&lt;/p&gt;

&lt;p&gt;Then I started getting errors.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;Here's something I didn't fully appreciate until it bit me: free models on OpenRouter change constantly. Models get added, removed, rate-limited into uselessness, or quietly replaced with different versions. If you hardcode your model list — which every tutorial tells you to do — you're building on sand.&lt;/p&gt;

&lt;p&gt;One morning I woke up to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[06:03] LLM HTTP 404 [openai/gpt-oss-120b:free]: model not found
[06:03] LLM HTTP 429 [nousresearch/hermes-3-llama-3.1-405b:free]: rate limited
[06:03] LLM HTTP 404 [mistralai/mistral-small-3.1-24b-instruct:free]: model not found
[06:03] All free models exhausted — returning empty
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Three of my six hardcoded models were failing: two had been removed and one was rate-limited. The pipeline silently produced nothing. I missed a week of content before I noticed.&lt;/p&gt;

&lt;p&gt;Hardcoded lists are technical debt. Free model availability is a moving target. These two facts collide badly.&lt;/p&gt;


&lt;h2&gt;
  
  
  The fix: treat the model list as a live data source
&lt;/h2&gt;

&lt;p&gt;OpenRouter has a public endpoint — no auth required — that returns their full model catalog:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET https://openrouter.ai/api/v1/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;It returns ~346 models right now. Filtering to free ones with decent context windows gives you 10-15 candidates. The question is: which ones are actually worth using?&lt;/p&gt;
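&lt;p&gt;A minimal sketch of that filtering step is below. It assumes the catalog response is a JSON object with a &lt;code&gt;data&lt;/code&gt; list whose entries carry &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;context_length&lt;/code&gt; fields, and that free variants are marked by the &lt;code&gt;:free&lt;/code&gt; suffix seen in the log above. The function names and the 32K minimum context are illustrative choices, not taken from the original script.&lt;/p&gt;

```python
import json
import urllib.request

CATALOG_URL = "https://openrouter.ai/api/v1/models"


def fetch_catalog(url=CATALOG_URL, timeout=30):
    """Download the public catalog and return its list of model dicts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("data", [])


def filter_free_models(catalog, min_context=32_000):
    """Keep models whose id ends in ':free' and whose context is large enough."""
    free = []
    for model in catalog:
        model_id = model.get("id", "")
        context = model.get("context_length") or 0
        if model_id.endswith(":free") and context >= min_context:
            free.append({"id": model_id, "context_length": context})
    return free


# Offline sample shaped like the assumed catalog response (values are made up).
sample = [
    {"id": "vendor/big-70b-instruct:free", "context_length": 131072},
    {"id": "vendor/tiny-3b:free", "context_length": 8192},
    {"id": "vendor/paid-70b-instruct", "context_length": 200000},
]
print(filter_free_models(sample))
```

&lt;p&gt;Keeping the network call in &lt;code&gt;fetch_catalog&lt;/code&gt; and the filtering in a pure function makes the filter easy to test against a saved copy of the catalog.&lt;/p&gt;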

&lt;p&gt;I wanted to rank them. My criteria:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context window&lt;/strong&gt; — longer is better for summarization. A 262K context model can swallow an entire article thread without chunking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model size&lt;/strong&gt; — bigger models write better. A 70B model beats a 7B model for prose quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical reliability&lt;/strong&gt; — has this model actually worked when I've called it before?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last one is the one nobody tracks. So I built tracking.&lt;/p&gt;


&lt;h2&gt;
  
  
  model-registry.py — the discovery layer
&lt;/h2&gt;

&lt;p&gt;The registry script runs once every 6 hours. It:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Checks if the cache (&lt;code&gt;~/.openclaw/free-models.json&lt;/code&gt;) is fresh — if yes, exits in &amp;lt;100ms (just a file stat)&lt;/li&gt;
&lt;li&gt;If stale, hits the OpenRouter catalog and scores every free model:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context_length&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;context_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# caps at 200
&lt;/span&gt;    &lt;span class="n"&gt;size_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_size_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# regex: 405b=200, 70b=140, 8b=50...
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context_score&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;size_score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="3"&gt;
&lt;li&gt;Takes the top 10, writes them to &lt;code&gt;free-models.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Logs a diff — "Added: X, Removed: Y since last run"&lt;/li&gt;
&lt;/ol&gt;
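&lt;p&gt;The scoring snippet above calls &lt;code&gt;get_size_score&lt;/code&gt; without showing it. Here is one way it might work, as a sketch only: the 405b, 70b and 8b scores come from the comment in that snippet, while the 20b tier, the default score, and the rule of taking the largest size token in the id are assumptions made for illustration.&lt;/p&gt;

```python
import re

# Tiers as (minimum billions of parameters, score). Only the 405b, 70b and 8b
# values come from the post's comment; the 20b tier is an illustrative guess.
SIZE_TIERS = [(400, 200), (65, 140), (20, 90), (7, 50)]
DEFAULT_SIZE_SCORE = 20  # hypothetical score when the id names no usable size


def get_size_score(model_id):
    """Score a model by the largest 'NNb' token found in its id."""
    sizes = [float(s) for s in re.findall(r"(\d+(?:\.\d+)?)b\b", model_id.lower())]
    if not sizes:
        return DEFAULT_SIZE_SCORE
    largest = max(sizes)  # e.g. picks 80 over the 'a3b' tag in a MoE model id
    for minimum, score in SIZE_TIERS:
        if largest >= minimum:
            return score
    return DEFAULT_SIZE_SCORE


def score_model(model_id, context_length):
    """Same shape as the post's function: capped context score plus size score."""
    context_score = min(context_length / 1000, 200)  # caps at 200
    return context_score + get_size_score(model_id)


print(score_model("qwen/qwen3-next-80b-a3b-instruct:free", 262144))
```

&lt;p&gt;Under these assumed tiers, a long-context 80B model lands at 340: the context score is capped at 200 and the size score is 140.&lt;/p&gt;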

&lt;p&gt;The diff log is where it gets interesting. On my first run after building this, I discovered two models I'd never heard of that scored in my top 6. One of them — &lt;code&gt;qwen/qwen3-next-80b-a3b-instruct:free&lt;/code&gt; — has a 262K context window and an 80B parameter count. It's now my primary model. It wasn't in any tutorial I'd read.&lt;/p&gt;
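&lt;p&gt;The diff itself is a small set comparison. A rough sketch, with hypothetical function names and a message format modeled on the log line quoted in the list above:&lt;/p&gt;

```python
def diff_models(previous_ids, current_ids):
    """Return (added, removed) model ids, each sorted for stable log output."""
    previous, current = set(previous_ids), set(current_ids)
    return sorted(current - previous), sorted(previous - current)


def format_diff(added, removed):
    """Render a one-line summary of what changed since the last run."""
    return "Added: {}, Removed: {} since last run".format(
        ", ".join(added) or "none", ", ".join(removed) or "none"
    )


old = ["openai/gpt-oss-120b:free", "meta-llama/llama-3.3-70b-instruct:free"]
new = ["meta-llama/llama-3.3-70b-instruct:free", "qwen/qwen3-next-80b-a3b-instruct:free"]
added, removed = diff_models(old, new)
print(format_diff(added, removed))
```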


&lt;h2&gt;
  
  
  model-metrics.py — the performance layer
&lt;/h2&gt;

&lt;p&gt;HTTP 200 doesn't mean the model was useful. A model can return 200 with three sentences of hallucinated nonsense that breaks your JSON parser downstream.&lt;/p&gt;

&lt;p&gt;So I added tracking at two levels:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1 — HTTP success:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;t0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="nf"&gt;record_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;t0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                 &lt;span class="n"&gt;output_len&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HTTPError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;record_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;error_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Level 2 — parse success (&lt;code&gt;parse_ok&lt;/code&gt;):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After every call that expects structured JSON, I record whether the downstream parsing succeeded:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_free_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claim_extraction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;update_parse_ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# output was actually usable
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;update_parse_ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# model returned garbage
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
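&lt;p&gt;Neither &lt;code&gt;record_metric&lt;/code&gt; nor &lt;code&gt;update_parse_ok&lt;/code&gt; is shown in the snippets above. The sketch below is one possible SQLite-backed version, not the original code. The table name, the column layout, and the explicit &lt;code&gt;conn&lt;/code&gt; and &lt;code&gt;row_id&lt;/code&gt; parameters are assumptions; the versions called above take fewer arguments and presumably track the connection and the last call internally.&lt;/p&gt;

```python
import sqlite3
import time

# Hypothetical schema: one row per LLM call.
SCHEMA = """
CREATE TABLE IF NOT EXISTS calls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    model_id TEXT NOT NULL,
    task TEXT NOT NULL,
    success INTEGER NOT NULL,
    latency_ms INTEGER,
    output_len INTEGER,
    error_code TEXT,
    parse_ok INTEGER
)
"""


def open_db(path=":memory:"):
    """Open (or create) the metrics database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_metric(conn, model_id, task, success, latency_ms,
                  output_len=None, error_code=None):
    """Insert one call record and return its row id."""
    cur = conn.execute(
        "INSERT INTO calls (ts, model_id, task, success, latency_ms,"
        " output_len, error_code) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (time.time(), model_id, task, int(success), latency_ms,
         output_len, error_code),
    )
    conn.commit()
    return cur.lastrowid


def update_parse_ok(conn, row_id, ok):
    """Mark whether the output of a recorded call parsed downstream."""
    conn.execute("UPDATE calls SET parse_ok = ? WHERE id = ?", (int(ok), row_id))
    conn.commit()
```

&lt;p&gt;Passing a file path to &lt;code&gt;open_db&lt;/code&gt; instead of the in-memory default is what would let the history persist between runs.&lt;/p&gt;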


&lt;p&gt;&lt;code&gt;parse_ok&lt;/code&gt; is the metric I care about most. It answers: was this model actually useful, not just technically responsive?&lt;/p&gt;
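&lt;p&gt;Turning per-call records into per-model percentages could look something like the sketch below. It assumes both percentages are computed over all calls to a model; the post does not say how &lt;code&gt;p_ok%&lt;/code&gt; is defined, so treat that as a guess. The tuple layout and the &lt;code&gt;summarize&lt;/code&gt; name are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict


def summarize(records):
    """Aggregate (model_id, success, parse_ok, latency_ms) tuples per model."""
    grouped = defaultdict(list)
    for model_id, success, parse_ok, latency_ms in records:
        grouped[model_id].append((success, parse_ok, latency_ms))
    stats = {}
    for model_id, rows in grouped.items():
        calls = len(rows)
        stats[model_id] = {
            "calls": calls,
            # Share of calls that returned without an HTTP error.
            "ok_pct": round(100 * sum(1 for s, _, _ in rows if s) / calls),
            # Share of calls whose output also parsed downstream.
            "p_ok_pct": round(100 * sum(1 for _, p, _ in rows if p) / calls),
            "avg_ms": round(sum(ms for _, _, ms in rows) / calls),
        }
    return stats


# Made-up history: 12 calls, 7 HTTP successes, 5 of which parsed.
records = (
    [("flaky:free", True, True, 1890)] * 5
    + [("flaky:free", True, False, 1890)] * 2
    + [("flaky:free", False, False, 1890)] * 5
)
print(summarize(records))
```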

&lt;p&gt;After a week of pipeline runs, I get a table like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model                                      calls  ok%  p_ok%  avg_ms  errors
meta-llama/llama-3.3-70b-instruct:free       47   94%   88%   1240ms
qwen/qwen3-next-80b-a3b-instruct:free        31   97%   91%   1180ms
openai/gpt-oss-120b:free                     12   58%   42%   1890ms  5×404
nousresearch/hermes-3-llama-3.1-405b:free     8   62%   55%   2100ms  3×404
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The last two models look fine on paper (they're large, they have long context) but they're dying constantly. Their scores get penalized:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_penalty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stats_entry&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stats_entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok_pct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;   &lt;span class="c1"&gt;# heavy penalty
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;               &lt;span class="c1"&gt;# no penalty
&lt;/span&gt;
&lt;span class="n"&gt;final_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;catalog_score&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;score_penalty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;historical_stats&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;When the registry next refreshes, those models sink to the bottom of the fallback chain. Automatically. Without me touching anything.&lt;/p&gt;
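&lt;p&gt;To make the re-ranking concrete, here is a small worked sketch. The thresholds and multipliers are the ones from &lt;code&gt;score_penalty&lt;/code&gt; above; the treatment of models with no history (no penalty), the &lt;code&gt;rank&lt;/code&gt; helper, and the catalog scores are assumptions for illustration.&lt;/p&gt;

```python
# Thresholds and multipliers copied from score_penalty in the post.
PENALTY_STEPS = ((50, 0.3), (70, 0.7), (85, 0.9))


def score_penalty(stats_entry):
    """Return the multiplier for a model's historical HTTP success rate."""
    if stats_entry is None:  # assumption: a model with no history is not penalized
        return 1.0
    for limit, factor in PENALTY_STEPS:
        if limit > stats_entry["ok_pct"]:
            return factor
    return 1.0


def rank(catalog_scores, history):
    """Sort model ids by catalog score times penalty, best first."""
    final = {
        model_id: round(score * score_penalty(history.get(model_id)), 1)
        for model_id, score in catalog_scores.items()
    }
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


# Hypothetical catalog scores and success rates.
catalog_scores = {"big-but-flaky:free": 400, "steady:free": 340, "brand-new:free": 300}
history = {"big-but-flaky:free": {"ok_pct": 58}, "steady:free": {"ok_pct": 97}}
print(rank(catalog_scores, history))
```

&lt;p&gt;In this made-up case the model with the best raw specs drops from first to last: 400 × 0.7 = 280, below both the reliable model at 340 and the untested newcomer at 300.&lt;/p&gt;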


&lt;h2&gt;
  
  
  The result
&lt;/h2&gt;

&lt;p&gt;The pipeline now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discovers new free models within 6 hours of them appearing on OpenRouter&lt;/li&gt;
&lt;li&gt;Drops dead models from the rotation within one pipeline run&lt;/li&gt;
&lt;li&gt;Prioritizes models with proven parse reliability, not just raw specs&lt;/li&gt;
&lt;li&gt;Costs $0.00 extra — one public HTTP GET every 6 hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole thing is ~250 lines across two files. No pip dependencies for the registry itself (stdlib only — &lt;code&gt;json&lt;/code&gt;, &lt;code&gt;urllib&lt;/code&gt;, &lt;code&gt;sqlite3&lt;/code&gt;). The metrics use SQLite so they survive reboots and redeploys.&lt;/p&gt;


&lt;h2&gt;
  
  
  Grab the code
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;model-registry.py&lt;/code&gt; and &lt;code&gt;model-metrics.py&lt;/code&gt; — both standalone, drop them next to any script that calls OpenRouter:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
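&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="c1"&gt;# Replace your hardcoded list with this: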
&lt;/span&gt;&lt;span class="n"&gt;REGISTRY_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;home&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.openclaw&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;free-models.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;_FALLBACK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/llama-3.3-70b-instruct:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_free_models&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;REGISTRY_PATH&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;models&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_FALLBACK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;FREE_MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_free_models&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
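&lt;p&gt;Once the list is loaded, the calling code still has to walk it in order. The sketch below shows one way to do that; &lt;code&gt;call_with_fallback&lt;/code&gt; and the &lt;code&gt;send&lt;/code&gt; callable are hypothetical names, and &lt;code&gt;send&lt;/code&gt; stands in for whatever function actually performs the HTTP request. Unlike the failure described earlier, this version raises when every model fails instead of returning an empty result.&lt;/p&gt;

```python
def call_with_fallback(models, send):
    """Try each model in order; return (model_id, text) for the first success.

    `send` is a hypothetical callable that takes a model id and returns the
    response text, or raises an exception on errors such as HTTP 404 or 429.
    """
    errors = {}
    for model_id in models:
        try:
            return model_id, send(model_id)
        except Exception as exc:  # broad on purpose: any failure moves on
            errors[model_id] = repr(exc)
    raise RuntimeError("All free models exhausted: {}".format(errors))


def fake_send(model_id):
    """Stand-in for a real request: one model is gone, the other answers."""
    if model_id == "dead/model:free":
        raise LookupError("HTTP 404")
    return "summary from " + model_id


print(call_with_fallback(["dead/model:free", "live/model:free"], fake_send))
```

&lt;p&gt;Raising on total failure means a cron job or scheduler sees a non-zero exit rather than a quiet empty digest.&lt;/p&gt;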


&lt;p&gt;Run the registry as a preflight step before any pipeline that uses free models. If the cache is fresh, it exits immediately. If it's stale, it updates in ~1 second.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 model-registry.py &lt;span class="nt"&gt;--max-age&lt;/span&gt; 21600   &lt;span class="c"&gt;# refresh if &amp;gt;6h old&lt;/span&gt;
python3 your-pipeline.py                     &lt;span class="c"&gt;# now uses fresh model list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
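&lt;p&gt;The freshness check behind &lt;code&gt;--max-age&lt;/code&gt; can be as small as a single file stat. A sketch under stated assumptions: the helper names are hypothetical, the cache path is the one mentioned earlier, and the 21600-second default simply mirrors the command above.&lt;/p&gt;

```python
import argparse
import os
import time
from pathlib import Path

CACHE_PATH = Path.home() / ".openclaw" / "free-models.json"


def cache_is_fresh(path, max_age_seconds, now=None):
    """True if `path` exists and was modified within the last max_age_seconds."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return False
    now = time.time() if now is None else now
    return max_age_seconds >= now - mtime


def parse_args(argv=None):
    """Parse the --max-age flag (in seconds) used in the shell example."""
    parser = argparse.ArgumentParser(description="Refresh the free model cache")
    parser.add_argument("--max-age", type=int, default=21600,
                        help="refresh when the cache is older than this many seconds")
    return parser.parse_args(argv)


args = parse_args(["--max-age", "21600"])
if cache_is_fresh(CACHE_PATH, args.max_age):
    print("cache is fresh; nothing to do")
else:
    print("cache is stale or missing; a refresh would run here")
```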




&lt;p&gt;The thing I keep thinking about: I built this to find 3D printing news, including news about the RepRap machines that print their own parts. Then foraging for that news made me realize I needed this algorithm. Now the algorithm helps me find better news about the von Neumann probe itself. It's turtles all the way down, but at least they're free turtles.&lt;/p&gt;



&lt;p&gt;Full code on GitHub Gist: 

&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;





</description>
      <category>python</category>
      <category>llm</category>
      <category>tutorial</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>Never Had Such a Good Grip on Functional Prints Before</title>
      <dc:creator>Josh Green</dc:creator>
      <pubDate>Sat, 07 Mar 2026 12:47:50 +0000</pubDate>
      <link>https://dev.to/josh_green_dev/never-had-such-a-good-grip-on-funcional-prints-before-i-never-had-such-a-good-grip-on-funcional-1mnl</link>
      <guid>https://dev.to/josh_green_dev/never-had-such-a-good-grip-on-funcional-prints-before-i-never-had-such-a-good-grip-on-funcional-1mnl</guid>
      <description>&lt;p&gt;I bought my Bambu Lab P1S last year after moving to Budapest. Cheap rent, fast internet, and suddenly I had space for a hobby that wasn't just staring at VS Code for 14 hours a day.&lt;/p&gt;

&lt;p&gt;Everyone warned me I’d print "a few useful things then revert to benchies and baby Yodas." They weren’t entirely wrong: I definitely have a small army of calibration cubes I should throw out. But prints like the one I saw today remind me why I got into this in the first place.&lt;/p&gt;

&lt;p&gt;Someone &lt;a href="https://lemmy.world/post/43897576" rel="noopener noreferrer"&gt;designed and printed a custom backing for the hand control hoods&lt;/a&gt; on their road bike. Not a phone mount. Not a GoPro adapter. The actual rubberized part your palms rest on for hours during a ride.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem with Stock Parts&lt;/strong&gt;&lt;br&gt;
I spent three years doing Shopify theme development, and one thing stuck with me: most products are designed for the mythical "average user." That works fine if you’re selling t-shirts. It doesn’t work when you’re talking about close contact between human anatomy and high-performance equipment.&lt;/p&gt;

&lt;p&gt;Road bike hoods are a perfect example. They’re mass-manufactured to fit some statistical middle ground of hand size and grip preference. My hands are slightly larger than average — not enough that I need custom everything, but enough that stock hoods always feel slightly off after a century ride.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Print Matters&lt;/strong&gt;&lt;br&gt;
Being able to model your own hood backing and iterate on the shape changes everything. Too thick? Adjust the model. Want more texture in one area? Add it. Need a different angle for your specific bars? Measure twice, print once.&lt;/p&gt;

&lt;p&gt;This is the kind of application that justifies the entire printer for me. Not the ability to make plastic trinkets cheaper than Amazon, but the ability to make things that don't exist for sale anywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Material Question&lt;/strong&gt;&lt;br&gt;
Here's where it gets interesting. Hoods need specific properties — grip when your hands are sweaty, some give so they're comfortable, but enough structure that they don't deform when you're pulling hard on a climb. Weather resistance matters too.&lt;/p&gt;

&lt;p&gt;PLA won't survive a wet ride. ABS might work but it's not exactly pleasant against your skin. TPU seems like the obvious choice, though getting the right Shore hardness would take some testing. Maybe nylon with a soft-touch coating?&lt;/p&gt;

&lt;p&gt;I haven’t printed cycling components myself yet. My P1S mostly runs functional prints — brackets, organizers, the occasional prototype for a friend’s product display. But seeing functional bike parts like this makes me want to branch out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;br&gt;
I need to find a local bike shop that's 3D-printing-curious. Or just suck it up and start modeling my own solutions to the minor annoyances that aren't worth designing and injection-molding, but are absolutely worth a weekend of CAD and a $2 print.&lt;/p&gt;

&lt;p&gt;If you’ve done cycling components, hit me up. I’d love to know what materials actually hold up in the real world — not just in theory, but after six months of road grime and sweat.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
