Disclaimer: this is my personal experience. English is my second language — bear with the rough edges.
Who I am
My name is Davyd. I'm from Russia, and for the last few years I've been running the technical product team for an AI business line. The title on my badge: CPTO AI.
I've never built a personal brand or written articles before — I just worked. Sharing now, while I have the time and the mood for it.
The problem space
Working with GPUs and HPC in today's Russia has meant, over the last few years:
Western AI platforms are off the table. Run:AI (acquired by NVIDIA), NVIDIA Base Command, HPE ML Development Environment, AWS SageMaker — all exist on the market, just not for us.
New hardware is expensive and slow to get. The cost of a mistake grows faster than the budget.
Companies have already bought the racks, but they don't know how to measure utilization, who's using what, or when GPUs are sitting idle. On a yearly horizon, that's hundreds of millions of rubles burned for nothing.
NVIDIA support in the usual sense doesn't exist. Racks, monitoring, FinOps — you assemble it all yourself.
My stance from that context: squeeze the maximum out of what you have. Everything else follows. Working under these constraints is genuinely interesting. It drives me.
Episode 1: Slurm + Kubernetes via CRD — July 2024
In July 2024 I sketched the architecture for bridging Slurm and Kubernetes through CRDs.
Original architecture sketch. File metadata: modified 2024-07-29.
The idea:
K8s is a great general-purpose control plane, but it has no dynamic GPU management: the device-plugin model binds a GPU to a pod for the pod's whole lifetime, with no queues, priorities, or preemption.
Slurm is a mature HPC scheduler with real GPU prioritization and queues.
Bridge them: slurmd registers inside K8s, Slurm jobs materialize as CRD objects. The user sees one API, K8s holds the lifecycle, Slurm distributes the GPUs.
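To make the bridge concrete, here is a sketch of what "a Slurm job materializes as a CRD object" could look like. Everything here is hypothetical: the API group `slurm.example.com`, the kind `SlurmJob`, and the field layout are my illustration of the idea, not Slinky's actual CRD schema.

```python
# Hypothetical shape of a Slurm job surfaced as a Kubernetes custom
# resource. A bridge controller would submit this manifest to the K8s API;
# Slurm would remain the source of truth for scheduling.

def slurm_job_to_custom_resource(job_id: int, partition: str,
                                 gpus: int, script: str) -> dict:
    """Build the custom-resource manifest for one Slurm job."""
    return {
        "apiVersion": "slurm.example.com/v1alpha1",  # hypothetical group
        "kind": "SlurmJob",                          # hypothetical kind
        "metadata": {
            "name": f"slurm-job-{job_id}",
            "labels": {"slurm.example.com/partition": partition},
        },
        "spec": {
            "jobId": job_id,
            "partition": partition,
            "resources": {"nvidia.com/gpu": gpus},
            "batchScript": script,
        },
    }

manifest = slurm_job_to_custom_resource(
    job_id=4217, partition="gpu", gpus=2,
    script="#!/bin/bash\nsrun train.sh")
print(manifest["metadata"]["name"])  # -> slurm-job-4217
```

The point of the shape: the user talks to one API, K8s owns the object lifecycle, and the `spec` carries exactly what Slurm needs to place the job.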
I went to defend this architecture with leadership. No budget, no team. It didn't land.
What happened next: in August 2025, SchedMD, the people who actually build Slurm, started tagging releases of Slinky, their Kubernetes operator for Slurm. The first stable release, v1.0.0, landed on November 20, 2025. That's 16 months after my sketch.
To be honest: I didn't build Slinky before SchedMD did. Their implementation is more detailed and better thought through than my sketch: it has concrete CRD primitives (Controller, NodeSet, LoginSet), and it solves engineering problems I hadn't even scoped: accounting, SSSD, hybrid mode, the full Slurm feature set. It's 1000+ commits of a working operator, not a diagram.
What I had was the right direction and the right integration point: a CRD bridge between K8s and Slurm, with slurmd talking to the K8s API both ways. The exact coupling that Slinky NodeSet is built around today.
I knew Run:AI as a reference architecture. We couldn't buy it; they don't sell to our market. That was the whole driver of the task: you had to invent your own, because you couldn't buy one ready-made.
Lesson: a correct hypothesis without resources and packaging loses to time. You can spot an industry direction 16 months ahead of a GA release from the developers of Slurm themselves — and still not get internal funding. That says less about the hypothesis than about how corporations filter early bets.
Episode 2: AI Services — a shared foundation for a product portfolio
The company I worked at already had an AI product portfolio: computer vision, speech, conversational AI, and others. Each with its own installation, its own ops team, its own monitoring. Each one multi-tenant in its own right.
I proposed consolidating them onto a single infrastructure layer: one K8s installation, GPUs shared via MIG partitioning, shared monitoring and billing. Products stay independent, but they share the infra.
Solo — from architecture to production:
Rewrote the installation of the first product (computer vision) for a modern stack, taught it to work with MIG in K8s.
Proved the multi-tenancy concept on a shared installation: client data doesn't cross, SLAs hold, production infra is sized for scale.
Designed the architecture so that subsequent products plug in with minimal rework — not a new stack for each one, but a new service on an existing foundation.
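The MIG sizing behind a shared installation can be sketched as back-of-the-envelope bin packing. The simplification below, which is mine and not the article's method, treats each A100 as 7 compute slices and tenant profiles as 1g/2g/3g slice requests; real MIG placement has stricter geometry rules, so treat the result as a lower bound.

```python
# First-fit-decreasing bin packing of MIG slice requests onto physical
# GPUs. Each A100 exposes 7 compute slices; a tenant requests a profile
# of 1, 2, or 3 slices. This ignores MIG's real placement geometry and
# memory partitioning, so it is an estimate, not a placement plan.

def gpus_needed(requests: list[int], slices_per_gpu: int = 7) -> int:
    bins: list[int] = []  # free slices remaining on each physical GPU
    for req in sorted(requests, reverse=True):
        for i, free in enumerate(bins):
            if free >= req:
                bins[i] -= req
                break
        else:
            bins.append(slices_per_gpu - req)  # open a new GPU
    return len(bins)

# e.g. three tenants on 2g profiles plus four tenants on 1g profiles:
print(gpus_needed([2, 2, 2, 1, 1, 1, 1]))  # -> 2
```

Even this crude model makes the consolidation argument visible: ten slices of demand fit on two shared GPUs, where per-product installations would each have reserved whole devices.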
The unit economics are simple: N products → 1 ops team, 1 monitoring stack, 1 FinOps process, instead of N of each.
Lesson: in a company with multiple AI products, a shared infra layer gives you more than yet another product. If you can lay it down in advance, the difference in cost of ownership over time is a multiple, not a percentage.
Episode 3: AI Platform — defended idea, team from a pivot, production
Companies kept coming to me with the same pain: "We have GPUs, we don't understand what's going on with them." It rarely turned into real projects — not everyone had the resources for a full platform.
I collected those conversations, did market analysis, put together the decks, and went to defend the idea of a unified AI platform: our own FinOps, real utilization tracking, idle alerts, billing, a marketplace.
Again, no budget. What I got instead was the team from another project that hadn't taken off over several years — they'd been building cloud gaming (renting out remote GPU machines for games). Modern stack, a couple of V100s, real ops experience. I re-pointed them at the new goal.
Peak team size: 20+ engineers. Currently 3, after restructuring.
Audit and cleanup of the inherited stack
Threw out duplicated functionality and unnecessary CRDs.
Replaced JupyterHub with our own spawner — more flexible, native integration with our RBAC.
Dropped OpenCost: it was too limited in how it calculated costs for our setup. We built our own utilization metrics on top of Prometheus + DCGM Exporter.
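Conceptually, the metrics pipeline is simple: DCGM Exporter publishes per-GPU gauges such as DCGM_FI_DEV_GPU_UTIL, Prometheus scrapes them, and the platform aggregates the series. The aggregation below is an illustrative sketch; the idle threshold and the choice of fields are my assumptions, not our production config.

```python
# Aggregate a scraped series of DCGM_FI_DEV_GPU_UTIL samples (0-100)
# into the two numbers FinOps actually asks for: average utilization
# and the fraction of time the GPU sat idle.

IDLE_THRESHOLD = 5  # percent; below this we count the sample as idle

def summarize(samples: list[int]) -> dict:
    idle = sum(1 for s in samples if s < IDLE_THRESHOLD)
    return {
        "avg_util_pct": sum(samples) / len(samples),
        "idle_fraction": idle / len(samples),
    }

print(summarize([0, 0, 80, 90, 100, 0]))
# -> {'avg_util_pct': 45.0, 'idle_fraction': 0.5}
```

The same GPU can look healthy on average utilization and terrible on idle fraction, which is exactly the distinction a cost tool has to surface.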
What we built
A marketplace of open-source AI tools with a single role model. Fixed the ClearML integration at the RBAC level.
Inference and code-assistant services mapped to GPUs: spin up in ~60 seconds with their own UI or API.
Service restarts with state preserved.
Local model storage — the platform runs without access to external registries. Critical for closed environments.
+30% on inference through optimization work for llama.cpp / ollama / vLLM.
Workspace — multiple services on the same network, ~60 seconds to spin up, enables fine-tuning of open-source models.
Local agentic systems with our own RAG and no-code builder; agents can be wired into workspace pipelines.
Real-time dashboards for CPU / RAM / GPU / DISK: availability, utilization, GPU temperature, with minimal refresh latency.
Billing across all resources.
Our own algorithm for calculating GPU utilization efficiency as a percentage — I couldn't find an analog that worked for our infrastructure.
Our own role model across the entire platform.
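On the utilization-efficiency percentage: the article doesn't disclose the actual algorithm, so here is one plausible shape of such a metric, purely as my illustration. It combines how much of the reserved time a GPU actually carried load with how hard it worked while loaded; both the factorization and the inputs are assumptions.

```python
# Illustrative GPU efficiency score: occupancy (busy time over allocated
# time) multiplied by average SM utilization during busy hours. Not the
# algorithm from the article, which the author says had no public analog.

def efficiency_pct(alloc_hours: float, busy_hours: float,
                   avg_util_while_busy: float) -> float:
    """alloc_hours: GPU-hours reserved; busy_hours: hours under real load;
    avg_util_while_busy: mean SM utilization (0-100) during busy hours."""
    if alloc_hours == 0:
        return 0.0
    occupancy = busy_hours / alloc_hours  # share of reserved time used
    return occupancy * avg_util_while_busy  # scales 0-100

# a GPU reserved for 10h, loaded for 6h at 70% average SM utilization:
print(round(efficiency_pct(10, 6, 70), 1))  # -> 42.0
```

A score like this punishes the most expensive failure mode, reserved-but-idle GPUs, which raw utilization averages hide.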
What this gives the business
One installation — dozens of teams.
Transparent FinOps: you can see who consumes what and where things idle.
Zero dependency on an external cloud — critical for regulated environments.
Time for a new team to onboard: hours, not weeks.
What I took from all of this
Infrastructure is the underrated layer of the AI stack. Everyone's writing bots, agents, and the application layer. Almost nobody's working on what all of it runs on. And that's exactly what defines product unit economics over the long run.
Under constraints, the winner is whoever squeezes the most out of the hardware they have, not whoever has the bigger procurement budget. xAI / Anthropic / OpenAI are a different mode. For the rest of us — FinOps and engineering discipline.
A correct idea without packaging and political support loses to time. Slurm + K8s via CRD in July 2024 vs Slinky v1.0.0 in November 2025 — 16 months. And I still walked into budget meetings and walked out empty-handed. That's an important lesson about how fast corporations actually move on early bets.
Where I am now
In Russia, budgets are tightening, part of the industry is leaving, and access to Western tooling isn't getting easier; neither are the political and infrastructural constraints. And in my view, that's exactly what makes the engineering and product experience here meaningful: every decision has to stand on its own two feet, without crutches from ready-made Western services.
For collaboration — reach out.
