<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Clément SAILLANT</title>
    <description>The latest articles on DEV Community by Clément SAILLANT (@clment_saillant_753b0b54).</description>
    <link>https://dev.to/clment_saillant_753b0b54</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3885365%2Fbd1bf8ec-dba6-4a59-84e5-24798573dd03.png</url>
      <title>DEV Community: Clément SAILLANT</title>
      <link>https://dev.to/clment_saillant_753b0b54</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/clment_saillant_753b0b54"/>
    <language>en</language>
    <item>
      <title>Open-sourcing a full fine-tuning pipeline for embedded engineering — training toolkit + 35-domain MoE-LoRA model</title>
      <dc:creator>Clément SAILLANT</dc:creator>
      <pubDate>Sat, 18 Apr 2026 02:23:26 +0000</pubDate>
      <link>https://dev.to/clment_saillant_753b0b54/open-sourcing-a-full-fine-tuning-pipeline-for-embedded-engineering-training-toolkit-35-domain-466b</link>
      <guid>https://dev.to/clment_saillant_753b0b54/open-sourcing-a-full-fine-tuning-pipeline-for-embedded-engineering-training-toolkit-35-domain-466b</guid>
      <description>&lt;p&gt;At L'Électron Rare we build FineFab — a local-first, multi-machine AI-native platform for manufacturing and electronics engineering. This week we open-sourced the full fine-tuning pipeline: training toolkit and output model. Here's what it looks like, and why we built it this way.&lt;br&gt;
The frustration that started it&lt;br&gt;
Every embedded engineer I know has the same story with generalist LLMs.&lt;br&gt;
You ask GPT-4 to review an STM32 peripheral configuration and it confidently suggests a timer channel mapping that doesn't exist on that MCU family. You ask Claude to debug a SPICE .AC simulation and it hallucinates .PRINT syntax. You ask Gemini to fix a KiCad footprint and it describes Eagle shortcuts. These aren't edge cases — they're the most common failure mode of big generalist models in narrow technical domains.&lt;br&gt;
After six months of living this in our consulting work — embedded systems for cultural and performance industries, escape rooms, live shows, industrial prototypes — we decided to do something about it.&lt;br&gt;
Two public releases, one week&lt;br&gt;
16/04 — KIKI-Mac_tunner (training toolkit)&lt;br&gt;
MLX fine-tuning toolkit for Mac Studio, designed to distill Claude Opus reasoning into Mistral Large 123B. Apache 2.0. Runs on Apple Silicon, takes advantage of unified memory for the adapter stage.&lt;br&gt;
17/04 — micro-kiki-v3 (model)&lt;br&gt;
A cognitive LLM stack specialized in embedded systems engineering. Not a flat fine-tune — a routed architecture built on top of Qwen3.5-35B-A3B (MoE, 256 experts, 3B active per token).&lt;br&gt;
Both Apache 2.0. The full pipeline is open, not just the artifact.&lt;br&gt;
Architecture — why routed stacks instead of one big fine-tune&lt;br&gt;
The design intuition is simple. Fine-tuning one monolithic model on a mixed embedded corpus smears the distinctive patterns of each sub-discipline. Training one LoRA stack per domain and picking the relevant stack(s) at inference preserves those patterns.&lt;/p&gt;
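&lt;p&gt;As a rough illustration of that intuition, here is a toy sketch of picking the top-k domain stacks for a request. The keyword scorer and the function names (score_domains, select_stacks) are invented stand-ins for the real learned classifier, shown only to make the routing idea concrete.&lt;/p&gt;

```python
# Toy sketch of top-k domain-stack selection. In the real system a trained
# classifier scores all 35 domains; here a keyword counter stands in for it.

KEYWORDS = {
    "stm32": ["stm32", "hal", "timer", "peripheral"],
    "spice": ["spice", ".ac", "netlist", "simulation"],
    "kicad-pcb": ["kicad", "footprint", "pcb"],
    "python": ["python", "def ", "import"],
    "rust": ["rust", "cargo", "borrow"],
    "emc": ["emc", "emi", "shielding"],
}  # illustrative subset of the 35 domains

def score_domains(prompt: str) -> dict[str, float]:
    """Score each domain by keyword hits (stand-in for the classifier)."""
    text = prompt.lower()
    return {d: sum(text.count(k) for k in kws) for d, kws in KEYWORDS.items()}

def select_stacks(prompt: str, k: int = 4) -> list[str]:
    """Return the top-k domains with a nonzero score, best first."""
    scores = score_domains(prompt)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [d for d in ranked[:k] if scores[d] > 0]

print(select_stacks("Review this STM32 timer peripheral config in Python"))
# → ['stm32', 'python']
```

The corresponding LoRA adapters for the selected domains would then be loaded and composed for that request.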

&lt;p&gt;Domain router — classifier selects top-4 among 35 domain-specific LoRA stacks per request.&lt;br&gt;
Base model — Qwen3.5-35B-A3B (MoE 256 experts, 3B active/token). LoRA rank 16 on q/k/v/o projections, top-2 routing per stack.&lt;br&gt;
Null-space projection between stacks reduces catastrophic forgetting when combining domains.&lt;br&gt;
Negotiator (CAMP + Catfish) arbitrates conflicting stack outputs — typical case: STM32 power-on sequencing vs. EMC suppression guidance, both technically correct but domain-priority-dependent.&lt;br&gt;
Anti-bias layer (KnowBias + RBD) before output.&lt;br&gt;
Aeon memory (Atlas graph + Trace log) for cross-session persistence.&lt;/p&gt;
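&lt;p&gt;The null-space projection step can be sketched in a few lines of NumPy. This is an illustration of the general technique (projecting a new stack's weight update so it leaves input directions used by an earlier domain untouched), not micro-kiki's actual implementation; all variable names are illustrative.&lt;/p&gt;

```python
import numpy as np

# Sketch of null-space projection between adapter stacks: project a new
# stack's weight update so it has near-zero effect on input directions
# already "occupied" by a previously trained domain, limiting forgetting.

rng = np.random.default_rng(0)
dim = 8

# Input directions used by a previously trained domain (as columns).
prev_inputs = rng.normal(size=(dim, 3))

# Candidate weight update for the new domain.
delta_w = rng.normal(size=(dim, dim))

# Orthonormal basis of the occupied subspace via QR decomposition.
q, _ = np.linalg.qr(prev_inputs)

# Projector onto the null space of the occupied directions: P = I - Q Q^T.
p_null = np.eye(dim) - q @ q.T

# The projected update annihilates the previous domain's inputs.
delta_w_safe = delta_w @ p_null
print(np.abs(delta_w_safe @ prev_inputs).max())  # ≈ 0 up to float error
```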

&lt;p&gt;262K-token context window, distributed as GGUF, runs on llama.cpp / Ollama / LM Studio.&lt;br&gt;
35 domains covered&lt;br&gt;
Conversation (chat-fr, reasoning), code (Python, TypeScript, C/C++, Rust, shell, SQL), infrastructure (Docker, DevOps, LLM-ops, ML-training), electronics (KiCad DSL, KiCad PCB, SPICE, components, power, EMC, DSP), hardware (embedded, STM32, IoT, PlatformIO), CAD (FreeCAD), web (frontend, backend), plus music-audio, math, security.&lt;br&gt;
35 is pragmatic, not exhaustive. v4 will likely add RF and MEMS.&lt;br&gt;
Dataset — built honestly&lt;br&gt;
clemsail/micro-kiki-v3-dataset — 489K instruction-following examples, Apache 2.0.&lt;/p&gt;

&lt;p&gt;50,116 real Claude CLI sessions captured on our 5-node P2P mesh during actual embedded consulting work (GrosMac Apple M5, Tower 28 threads, CILS i7, KXKM-AI RTX 4090, VM bootstrap).&lt;br&gt;
2,529 Codex/Copilot sessions from 4 workstations.&lt;br&gt;
364,045 examples from 19 filtered open-source HF datasets (CodeFeedback, French-Alpaca, Electronics StackExchange, stm32-hal-dataset, JITX open-components-database, etc.).&lt;br&gt;
Opus teacher distillation for chat-fr and reasoning.&lt;br&gt;
32 original curated seed sets.&lt;/p&gt;
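&lt;p&gt;For a sense of what converting captured sessions into instruction-following examples looks like, here is a minimal sketch. The session format and the helper name (session_to_examples) are hypothetical; the real filter pass also scrubs secrets, deduplicates, and applies quality filters.&lt;/p&gt;

```python
import json

# Hypothetical sketch: pair each user turn with the assistant turn that
# follows it, yielding instruction/output records for fine-tuning.

def session_to_examples(session: list[dict]) -> list[dict]:
    """Turn a chat transcript into instruction-following examples."""
    examples = []
    for prev, cur in zip(session, session[1:]):
        if prev["role"] == "user" and cur["role"] == "assistant":
            examples.append(
                {"instruction": prev["content"], "output": cur["content"]}
            )
    return examples

session = [
    {"role": "user", "content": "Why does my STM32 timer never fire?"},
    {"role": "assistant", "content": "Check that the peripheral clock is enabled..."},
]
print(json.dumps(session_to_examples(session), indent=2))
```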

&lt;p&gt;Two points of honesty about this:&lt;/p&gt;

&lt;p&gt;The Claude CLI logs come from our own work, not clients. Everything went through a filter pass before inclusion.&lt;br&gt;
This is not a Meta-scale dataset. The strength is authenticity — examples map to how engineers actually use assistants in real debugging sessions. The weakness is coverage variance: some domains are thinner than others (DSP, RF, EMC).&lt;/p&gt;

&lt;p&gt;Infrastructure — 5-node P2P mesh&lt;br&gt;
The 50K+ Claude CLI examples were captured across five heterogeneous machines:&lt;br&gt;
GrosMac (Apple M5, 16 GB): dev + P2P bridge, LAN + Tailscale&lt;br&gt;
VM (6.8 GB RAM, 4 CPU): Docker host (29+ containers), P2P bootstrap&lt;br&gt;
Tower (31 GB RAM, 28 threads): Langfuse, LiteLLM, Piper TTS, OpenAI proxy&lt;br&gt;
CILS (16 GB RAM, i7): Ollama inference, most stable node&lt;br&gt;
KXKM-AI (62 GB RAM, RTX 4090): GPU inference, Unsloth, Qdrant, fine-tuning&lt;br&gt;
Ed25519 auth, DHT discovery. The mesh itself is part of the product, not just a side-effect.&lt;br&gt;
What I'd do differently&lt;/p&gt;

&lt;p&gt;Routing is manual right now. You pick which LoRA adapter(s) to load based on your task. Dynamic routing (learned classifier or attention-based expert selection) is on the v4 roadmap.&lt;br&gt;
Benchmark suite is internal. I have a held-out eval set and internal scores, but nothing publicly reproducible yet. v4 will ship a benchmark suite you can run against base Qwen3.5 for a reproducible comparison.&lt;br&gt;
Languages: trained on French + English interleaved. Most of our customer base is francophone. If you need English-only quality, YMMV.&lt;/p&gt;

&lt;p&gt;The meta-story&lt;br&gt;
L'Électron Rare is building FineFab publicly, component by component. Related repos in the ecosystem:&lt;/p&gt;

&lt;p&gt;Kill_LIFE — spec-first agentic methodology (BMAD agents, gates, evidence packs)&lt;br&gt;
mascarade — multi-machine agentic LLM orchestration (P2P mesh, 8 providers)&lt;br&gt;
KiC-AI — AI-powered PCB design assistant for KiCad&lt;br&gt;
prima-cpp — distributed LLM inference, CUDA + ZMQ&lt;/p&gt;

&lt;p&gt;Full org: github.com/L-electron-Rare.&lt;br&gt;
What I want from you&lt;/p&gt;

&lt;p&gt;Benchmarks against base Qwen3.5 / GPT-4 / Claude on embedded-specific tasks. Community runs matter more than my internal eval.&lt;br&gt;
Edge cases where the router picks the wrong stack — feedback directly improves v4.&lt;br&gt;
Memory/inference regressions on your hardware. Q4_K_M works cleanly on Apple Silicon 32 GB+ and RTX 4090; other configs untested.&lt;br&gt;
Domains we missed. We'll add them in v4.&lt;/p&gt;

&lt;p&gt;Everything is Apache 2.0. Fork it, benchmark it, break it. That's the point.&lt;br&gt;
Discussion thread open on HF: micro-kiki-v3/discussions/1.&lt;/p&gt;

&lt;p&gt;"I would rather be a cyborg than a goddess." — Donna Haraway&lt;/p&gt;

</description>
      <category>embedded</category>
      <category>opensource</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
