<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Clément SAILLANT</title>
    <description>The latest articles on DEV Community by Clément SAILLANT (@clment_saillant_753b0b54).</description>
    <link>https://dev.to/clment_saillant_753b0b54</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3885365%2Fbd1bf8ec-dba6-4a59-84e5-24798573dd03.png</url>
      <title>DEV Community: Clément SAILLANT</title>
      <link>https://dev.to/clment_saillant_753b0b54</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/clment_saillant_753b0b54"/>
    <language>en</language>
    <item>
      <title>Open-sourcing a full fine-tuning pipeline for embedded engineering — training toolkit + 35-domain MoE-LoRA model</title>
      <dc:creator>Clément SAILLANT</dc:creator>
      <pubDate>Sat, 18 Apr 2026 02:23:26 +0000</pubDate>
      <link>https://dev.to/clment_saillant_753b0b54/open-sourcing-a-full-fine-tuning-pipeline-for-embedded-engineering-training-toolkit-35-domain-466b</link>
      <guid>https://dev.to/clment_saillant_753b0b54/open-sourcing-a-full-fine-tuning-pipeline-for-embedded-engineering-training-toolkit-35-domain-466b</guid>
      <description>&lt;p&gt;At L'Électron Rare we build FineFab — a local-first, multi-machine AI-native platform for manufacturing and electronics engineering. This week we open-sourced the full fine-tuning pipeline: training toolkit and output model. Here's what it looks like, and why we built it this way.&lt;br&gt;
The frustration that started it&lt;br&gt;
Every embedded engineer I know has the same story with generalist LLMs.&lt;br&gt;
You ask GPT-4 to review an STM32 peripheral configuration and it confidently suggests a timer channel mapping that doesn't exist on that MCU family. You ask Claude to debug a SPICE .AC simulation and it hallucinates .PRINT syntax. You ask Gemini to fix a KiCad footprint and it describes Eagle shortcuts. These aren't edge cases — they're the most common failure mode of big generalist models in narrow technical domains.&lt;br&gt;
After six months of living this in our consulting work — embedded systems for cultural and performance industries, escape rooms, live shows, industrial prototypes — we decided to do something about it.&lt;br&gt;
Two public releases, one week&lt;br&gt;
16/04 — KIKI-Mac_tunner (training toolkit)&lt;br&gt;
MLX fine-tuning toolkit for Mac Studio, designed to distill Claude Opus reasoning into Mistral Large 123B. Apache 2.0. Runs on Apple Silicon, takes advantage of unified memory for the adapter stage.&lt;br&gt;
17/04 — micro-kiki-v3 (model)&lt;br&gt;
A cognitive LLM stack specialized in embedded systems engineering. Not a flat fine-tune — a routed architecture built on top of Qwen3.5-35B-A3B (MoE, 256 experts, 3B active per token).&lt;br&gt;
Both Apache 2.0. The full pipeline is open, not just the artifact.&lt;br&gt;
Architecture — why routed stacks instead of one big fine-tune&lt;br&gt;
The design intuition is simple. Fine-tuning one monolithic model on a mixed embedded corpus smears the distinctive patterns of each sub-discipline. Training one LoRA stack per domain and picking the relevant stack(s) at inference preserves those patterns.&lt;/p&gt;
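&lt;p&gt;As a rough illustration of that intuition, here is a toy sketch of picking the top-k domain stacks for a request. The keyword scorer and the function names (score_domains, select_stacks) are invented stand-ins for the real learned classifier, shown only to make the routing idea concrete.&lt;/p&gt;

```python
# Toy sketch of top-k domain-stack selection. In the real system a trained
# classifier scores all 35 domains; here a keyword counter stands in for it.

KEYWORDS = {
    "stm32": ["stm32", "hal", "timer", "peripheral"],
    "spice": ["spice", ".ac", "netlist", "simulation"],
    "kicad-pcb": ["kicad", "footprint", "pcb"],
    "python": ["python", "def ", "import"],
    "rust": ["rust", "cargo", "borrow"],
    "emc": ["emc", "emi", "shielding"],
}  # illustrative subset of the 35 domains

def score_domains(prompt: str) -> dict[str, float]:
    """Score each domain by keyword hits (stand-in for the classifier)."""
    text = prompt.lower()
    return {d: sum(text.count(k) for k in kws) for d, kws in KEYWORDS.items()}

def select_stacks(prompt: str, k: int = 4) -> list[str]:
    """Return the top-k domains with a nonzero score, best first."""
    scores = score_domains(prompt)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [d for d in ranked[:k] if scores[d] > 0]

print(select_stacks("Review this STM32 timer peripheral config in Python"))
# → ['stm32', 'python']
```

The corresponding LoRA adapters for the selected domains would then be loaded and composed for that request.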

&lt;p&gt;Domain router — classifier selects top-4 among 35 domain-specific LoRA stacks per request.&lt;br&gt;
Base model — Qwen3.5-35B-A3B (MoE 256 experts, 3B active/token). LoRA rank 16 on q/k/v/o projections, top-2 routing per stack.&lt;br&gt;
Null-space projection between stacks reduces catastrophic forgetting when combining domains.&lt;br&gt;
Negotiator (CAMP + Catfish) arbitrates conflicting stack outputs — typical case: STM32 power-on sequencing vs. EMC suppression guidance, both technically correct but domain-priority-dependent.&lt;br&gt;
Anti-bias layer (KnowBias + RBD) before output.&lt;br&gt;
Aeon memory (Atlas graph + Trace log) for cross-session persistence.&lt;/p&gt;
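&lt;p&gt;The null-space projection step can be sketched in a few lines of NumPy. This is an illustration of the general technique (projecting a new stack's weight update so it leaves input directions used by an earlier domain untouched), not micro-kiki's actual implementation; all variable names are illustrative.&lt;/p&gt;

```python
import numpy as np

# Sketch of null-space projection between adapter stacks: project a new
# stack's weight update so it has near-zero effect on input directions
# already "occupied" by a previously trained domain, limiting forgetting.

rng = np.random.default_rng(0)
dim = 8

# Input directions used by a previously trained domain (as columns).
prev_inputs = rng.normal(size=(dim, 3))

# Candidate weight update for the new domain.
delta_w = rng.normal(size=(dim, dim))

# Orthonormal basis of the occupied subspace via QR decomposition.
q, _ = np.linalg.qr(prev_inputs)

# Projector onto the null space of the occupied directions: P = I - Q Q^T.
p_null = np.eye(dim) - q @ q.T

# The projected update annihilates the previous domain's inputs.
delta_w_safe = delta_w @ p_null
print(np.abs(delta_w_safe @ prev_inputs).max())  # ≈ 0 up to float error
```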

&lt;p&gt;262K-token context window, distributed as GGUF, runs on llama.cpp / Ollama / LM Studio.&lt;br&gt;
35 domains covered&lt;br&gt;
Conversation (chat-fr, reasoning), code (Python, TypeScript, C/C++, Rust, shell, SQL), infrastructure (Docker, DevOps, LLM-ops, ML-training), electronics (KiCad DSL, KiCad PCB, SPICE, components, power, EMC, DSP), hardware (embedded, STM32, IoT, PlatformIO), CAD (FreeCAD), web (frontend, backend), plus music-audio, math, security.&lt;br&gt;
35 is pragmatic, not exhaustive. v4 will likely add RF and MEMS.&lt;br&gt;
Dataset — built honestly&lt;br&gt;
clemsail/micro-kiki-v3-dataset — 489K instruction-following examples, Apache 2.0.&lt;/p&gt;

&lt;p&gt;50,116 real Claude CLI sessions captured on our 5-node P2P mesh during actual embedded consulting work (GrosMac Apple M5, Tower 28 threads, CILS i7, KXKM-AI RTX 4090, VM bootstrap).&lt;br&gt;
2,529 Codex/Copilot sessions from 4 workstations.&lt;br&gt;
364,045 examples from 19 filtered open-source HF datasets (CodeFeedback, French-Alpaca, Electronics StackExchange, stm32-hal-dataset, JITX open-components-database, etc.).&lt;br&gt;
Opus teacher distillation for chat-fr and reasoning.&lt;br&gt;
32 original curated seed sets.&lt;/p&gt;
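&lt;p&gt;For a sense of what converting captured sessions into instruction-following examples looks like, here is a minimal sketch. The session format and the helper name (session_to_examples) are hypothetical; the real filter pass also scrubs secrets, deduplicates, and applies quality filters.&lt;/p&gt;

```python
import json

# Hypothetical sketch: pair each user turn with the assistant turn that
# follows it, yielding instruction/output records for fine-tuning.

def session_to_examples(session: list[dict]) -> list[dict]:
    """Turn a chat transcript into instruction-following examples."""
    examples = []
    for prev, cur in zip(session, session[1:]):
        if prev["role"] == "user" and cur["role"] == "assistant":
            examples.append(
                {"instruction": prev["content"], "output": cur["content"]}
            )
    return examples

session = [
    {"role": "user", "content": "Why does my STM32 timer never fire?"},
    {"role": "assistant", "content": "Check that the peripheral clock is enabled..."},
]
print(json.dumps(session_to_examples(session), indent=2))
```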

&lt;p&gt;Two points of honesty about this:&lt;/p&gt;

&lt;p&gt;The Claude CLI logs come from our own work, not clients. Everything went through a filter pass before inclusion.&lt;br&gt;
This is not a Meta-scale dataset. The strength is authenticity — examples map to how engineers actually use assistants in real debugging sessions. The weakness is coverage variance: some domains are thinner than others (DSP, RF, EMC).&lt;/p&gt;

&lt;p&gt;Infrastructure — 5-node P2P mesh&lt;br&gt;
The 50K+ Claude CLI examples were captured across five heterogeneous machines:&lt;br&gt;
GrosMac (Apple M5, 16 GB): dev + P2P bridge, LAN + Tailscale&lt;br&gt;
VM (6.8 GB RAM, 4 CPU): Docker host (29+ containers), P2P bootstrap&lt;br&gt;
Tower (31 GB RAM, 28 threads): Langfuse, LiteLLM, Piper TTS, OpenAI proxy&lt;br&gt;
CILS (16 GB RAM, i7): Ollama inference, most stable node&lt;br&gt;
KXKM-AI (62 GB RAM, RTX 4090): GPU inference, Unsloth, Qdrant, fine-tuning&lt;br&gt;
Ed25519 auth, DHT discovery. The mesh itself is part of the product, not just a side-effect.&lt;br&gt;
What I'd do differently&lt;/p&gt;

&lt;p&gt;Routing is manual right now. You pick which LoRA adapter(s) to load based on your task. Dynamic routing (learned classifier or attention-based expert selection) is on the v4 roadmap.&lt;br&gt;
Benchmark suite is internal. I have a held-out eval set and internal scores, but nothing publicly reproducible yet. v4 will ship a benchmark suite you can run against base Qwen3.5 for a reproducible comparison.&lt;br&gt;
Languages: trained on French + English interleaved. Most of our customer base is francophone. If you need English-only quality, YMMV.&lt;/p&gt;

&lt;p&gt;The meta-story&lt;br&gt;
L'Électron Rare is building FineFab publicly, component by component. Related repos in the ecosystem:&lt;/p&gt;

&lt;p&gt;Kill_LIFE — spec-first agentic methodology (BMAD agents, gates, evidence packs)&lt;br&gt;
mascarade — multi-machine agentic LLM orchestration (P2P mesh, 8 providers)&lt;br&gt;
KiC-AI — AI-powered PCB design assistant for KiCad&lt;br&gt;
prima-cpp — distributed LLM inference, CUDA + ZMQ&lt;/p&gt;

&lt;p&gt;Full org: github.com/L-electron-Rare.&lt;br&gt;
What I want from you&lt;/p&gt;

&lt;p&gt;Benchmarks against base Qwen3.5 / GPT-4 / Claude on embedded-specific tasks. Community runs matter more than my internal eval.&lt;br&gt;
Edge cases where the router picks the wrong stack — feedback directly improves v4.&lt;br&gt;
Memory/inference regressions on your hardware. Q4_K_M works cleanly on Apple Silicon 32 GB+ and RTX 4090; other configs untested.&lt;br&gt;
Domains we missed. We'll add them in v4.&lt;/p&gt;

&lt;p&gt;Everything is Apache 2.0. Fork it, benchmark it, break it. That's the point.&lt;br&gt;
Discussion thread open on HF: micro-kiki-v3/discussions/1.&lt;/p&gt;

&lt;p&gt;"I would rather be a cyborg than a goddess." — Donna Haraway&lt;/p&gt;

</description>
      <category>embedded</category>
      <category>opensource</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
