Bharath Nelapatla

Posted on May 25

OpenShift Virtualization Migration Advisor — Local-First, Powered by Gemma 4 26B MoE

#devchallenge #gemmachallenge #ai #gemma

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

OpenShift Virtualization Migration Advisor — a local-first assessment tool that ingests legacy hypervisor configurations (VMware .vmx, libvirt domain XML, OVF, RHV/oVirt exports) and produces a structured migration report for moving workloads to Red Hat OpenShift Virtualization.
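Telling those formats apart is the first step; the generated report's header later prints the result ("Source detected: VMware vSphere (.vmx)"). This post doesn't describe the repository's detection logic, so what follows is a hypothetical sketch in Python, the language of the Gradio front end. It checks the file extension first, then sniffs for a marker each format is known to carry. The `detect_source` name and the label strings are illustrative, and the oVirt namespace check is an assumption.

```python
import re
from pathlib import Path


def detect_source(filename: str, text: str) -> str:
    """Guess which hypervisor produced a config file (illustrative heuristic only)."""
    suffix = Path(filename).suffix.lower()
    head = text.lstrip()[:4096]
    lowered = head.lower()

    # .vmx files are flat key = "value" text; virtualHW.version is a standard key.
    if suffix == ".vmx" or "virtualhw.version" in lowered:
        return "VMware vSphere (.vmx)"
    # libvirt domain XML has a <domain ...> root element.
    if re.search(r"<domain[\s>]", head):
        return "libvirt domain XML"
    # OVF descriptors have an <Envelope> root element, often namespace-prefixed.
    if suffix == ".ovf" or re.search(r"<(\w+:)?envelope[\s>]", lowered):
        # Assumption: RHV/oVirt exports can be told apart by an "ovirt" namespace.
        if "ovirt" in lowered:
            return "RHV/oVirt export (OVF)"
        return "OVF"
    return "unknown"
```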

The problem it solves is specific and unglamorous: enterprises consolidating off vSphere and legacy KVM have a discovery bottleneck. Their VM inventories live in config files that contain infrastructure secrets — storage paths, VLAN topology, encryption key references, FIPS posture, licence keys. Sending those to a hosted LLM is a non-starter for regulated workloads.

So I built the assessment to run entirely on the host machine. Paste a config or upload an inventory → get a six-section migration report covering inventory mapping, OpenShift Virt primitive equivalents (VirtualMachine, DataVolume, NetworkAttachmentDefinition, StorageClass), compatibility risk flags, MTV-vs-virt-v2v tooling recommendation, effort sizing, and security posture preservation. Nothing leaves the box.
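The generation step comes down to one request to the Ollama server on the same machine (by default `http://localhost:11434/api/generate`). A minimal sketch, assuming a hypothetical `build_request` helper and system-prompt wording, since the repo's real prompt isn't shown in this post. `model`, `system`, `prompt`, and `stream` are standard fields of that endpoint, and the `[not specified in source]` placeholder mirrors the one in the generated report.

```python
import json

# The six sections named in the post; the system prompt below is a hypothetical stand-in.
SECTIONS = (
    "Inventory Summary",
    "OpenShift Virtualization Equivalents",
    "Compatibility & Risk Flags",
    "Migration Path",
    "Effort & TCO Notes",
    "Security & Compliance",
)


def build_request(config_text: str, source: str, model: str = "gemma4:26b") -> dict:
    """Build a JSON body for a local Ollama /api/generate call."""
    outline = "\n".join(f"{i}. {name}" for i, name in enumerate(SECTIONS, start=1))
    system = (
        "You assess VMs for migration to Red Hat OpenShift Virtualization. "
        "Use only facts present in the supplied configuration and write "
        "'[not specified in source]' for anything missing.\n"
        "Produce exactly these sections, in order:\n" + outline
    )
    return {
        "model": model,
        "system": system,
        "prompt": f"Source detected: {source}\n\n{config_text}",
        "stream": True,
    }


if __name__ == "__main__":
    body = build_request('numvcpus = "4"\nmemsize = "8192"\n', "VMware vSphere (.vmx)")
    print(json.dumps(body, indent=2))
```

Printing the body is a cheap way to audit exactly what the model sees before anything is sent, and the only destination is localhost.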

Demo

The tool generated this report from a single VMware .vmx file, fully on-device using Gemma 4 26B MoE via Ollama. Screenshots below; full generated output included so judges can see the actual model output, not a curated highlight.

Local Gradio UI. Pick a sample → click Generate → report streams in from local Gemma 4.

Six-section migration report streamed from Gemma 4 26B MoE running locally. No cloud API calls.
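Streaming works because Ollama's `/api/generate` returns newline-delimited JSON when `stream` is true. Each object carries a `response` fragment, and the last one has `done: true`. The hypothetical `stream_report` generator below accumulates those fragments. It is a sketch of the technique, not the repo's code.

```python
import json
from typing import Iterable, Iterator, Union


def stream_report(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield the report text accumulated so far from Ollama's NDJSON stream."""
    text = ""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between objects
        chunk = json.loads(raw)
        text += chunk.get("response", "")
        yield text
        if chunk.get("done"):
            break  # the final object carries timing stats, not more text
```

Given the response object from `urllib.request.urlopen(...)`, which iterates line by line, this yields the growing report. A Gradio event handler written as a generator can pass each yielded string straight to an output component, which gives the stream-in behaviour described above.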

Full Report (generated locally)

From samples/vmware-web-tier.vmx — 311.7s on a 16 GB-VRAM laptop with 85% CPU offload

> Source detected: VMware vSphere (.vmx)
> Model: gemma4:26b
> Generated locally in: 311.7s on a consumer laptop GPU + CPU offload

1. Inventory Summary

| Name | vCPU | Memory | Disk(s) | Network(s) | Guest OS | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| web-tier-01 | 4 (2 sockets × 2 cores) | 8192 MiB | web-tier-01.vmdk | VLAN-100-DMZ | rhel9-64 | Production Nginx/PHP-FPM; requires UEFI Secure Boot; vSphere encrypted. |

2. OpenShift Virtualization Equivalents

| Resource | OpenShift Virtualization Primitive | Configuration Details |
| --- | --- | --- |
| Compute | `VirtualMachine` | `spec.template.spec.domain.cpu`: 2 sockets, 2 cores; `spec.template.spec.domain.resources.requests.memory: 8Gi` |
| Firmware | `VirtualMachine` | `spec.template.spec.domain.firmware: efi` (Secure Boot enabled) |
| Storage | `DataVolume` | `storageClassName: [not specified in source]`; requires conversion from `pvscsi` to `virtio-scsi` |
| Network | `NetworkAttachmentDefinition` | Target VLAN-100-DMZ via Multus CNI |

3. Compatibility & Risk Flags

Critical — SCSI Controller Mismatch: Source uses pvscsi. OpenShift Virt requires virtio-scsi. Driver injection or kernel reconfiguration may be necessary if the RHEL 9 initramfs lacks virtio drivers (unlikely for RHEL 9).
Critical — Encryption / KMS Breakage: Source uses vSphere-specific KMS (vsphere-kms-key-001). This encryption cannot be migrated. Data must be decrypted during migration or re-encrypted using OpenShift-native methods (encrypted PVCs).
High — Network Driver Mismatch: vmxnet3 must be replaced with virtio for optimal performance and compatibility.
High — UEFI Secure Boot: OpenShift Virt supports UEFI; the migration pipeline must ensure the VirtualMachine object is correctly configured with firmware: efi and a bootloader compatible with the new virtual hardware.
Medium — CD-ROM dependency: The rhel-9.4.iso is attached. Must be manually re-attached via a ConfigMap or DataVolume if required for post-migration configuration.

4. Migration Path

web-tier-01: MTV (Migration Toolkit for Virtualization).
Justification: MTV is the recommended path for vSphere-to-OpenShift migrations — it automates the conversion of VMDK to DataVolume and handles the mapping of vSphere networks to NetworkAttachmentDefinition.

5. Effort & TCO Notes

Effort (web-tier-01): S
Justification: Single, modern Linux guest (RHEL 9) with standard hardware requirements and no complex PCI passthrough.
TCO / Licensing:
- Requires OpenShift Container Platform (OCP) subscription
- RHEL guest licensing applies (standard RHEL / RHEL for Virtual Datacenters)
Consolidation: If other web-tier nodes exist in the source, migrate them in a single MTV plan to reduce migration window and operational overhead.

6. Security & Compliance

| Source Security Feature | OpenShift Virtualization Equivalent |
| --- | --- |
| vSphere Encryption (KMS) | Encrypted PVCs (via StorageClass / CSI) or `dm-crypt` within the Guest OS |
| VLAN-100-DMZ Isolation | `NetworkAttachmentDefinition` + `NetworkPolicy` for micro-segmentation |
| UEFI Secure Boot | `VirtualMachine` spec `firmware: efi` with Secure Boot enabled |
| Production Workload Isolation | Namespace-level isolation in OpenShift |
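Outside the generated report, it helps to see how section 2's mappings would land as manifests. This is a minimal sketch, assuming hypothetical names (namespace `web`, node bridge `br-dmz`, DataVolume `web-tier-01-root`, helpers `vm_manifest` and `vlan_nad`). It builds a `kubevirt.io/v1` VirtualMachine and a VLAN NetworkAttachmentDefinition as Python dicts that serialize to JSON `oc apply -f` accepts. Two details differ from the report's shorthand. KubeVirt expresses EFI under `firmware.bootloader.efi`, where `secureBoot` is a flag that needs the SMM feature enabled. Kubernetes object names must also be lowercase, so VLAN-100-DMZ becomes `vlan-100-dmz`. The NAD uses the upstream `bridge` CNI plugin; check which bridge plugin type your OpenShift release documents.

```python
import json


def vm_manifest(name: str, namespace: str, sockets: int, cores: int,
                memory_mib: int, secure_boot: bool, nad_name: str,
                datavolume: str) -> dict:
    """Build a kubevirt.io/v1 VirtualMachine (always EFI; Secure Boot optional)."""
    # 8192 MiB -> "8Gi"; fall back to Mi when it isn't a whole number of GiB.
    memory = f"{memory_mib // 1024}Gi" if memory_mib % 1024 == 0 else f"{memory_mib}Mi"
    domain = {
        "cpu": {"sockets": sockets, "cores": cores},
        "resources": {"requests": {"memory": memory}},
        "firmware": {"bootloader": {"efi": {"secureBoot": secure_boot}}},
        "devices": {
            # bus "virtio" is virtio-blk; "scsi" attaches through a virtio-scsi controller.
            "disks": [{"name": "rootdisk", "disk": {"bus": "virtio"}}],
            "interfaces": [{"name": "dmz", "bridge": {}, "model": "virtio"}],
        },
    }
    if secure_boot:
        domain["features"] = {"smm": {"enabled": True}}  # Secure Boot requires SMM
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "runStrategy": "Halted",  # stay stopped until someone starts it
            "template": {"spec": {
                "domain": domain,
                "networks": [{"name": "dmz", "multus": {"networkName": nad_name}}],
                "volumes": [{"name": "rootdisk", "dataVolume": {"name": datavolume}}],
            }},
        },
    }


def vlan_nad(name: str, namespace: str, bridge: str, vlan_id: int) -> dict:
    """Build a NetworkAttachmentDefinition for a VLAN on an existing node bridge."""
    config = {"cniVersion": "0.3.1", "name": name, "type": "bridge",
              "bridge": bridge, "vlan": vlan_id}
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"config": json.dumps(config)},  # CNI config is embedded as a JSON string
    }


if __name__ == "__main__":
    nad = vlan_nad("vlan-100-dmz", "web", "br-dmz", 100)
    vm = vm_manifest("web-tier-01", "web", 2, 2, 8192, True, "vlan-100-dmz", "web-tier-01-root")
    print(json.dumps([nad, vm], indent=2))
```

One row of the last table is also worth verifying before relying on it. Standard Kubernetes `NetworkPolicy` governs the cluster's default pod network, not secondary Multus networks. Isolating traffic on this VLAN therefore needs a mechanism that supports the secondary network type, such as OpenShift's MultiNetworkPolicy where the plugin supports it.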

Code

Repository: https://github.com/Bharathtrainer/openshift-migration-advisor
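The repository is the reference for how ingestion actually works. One complementary idea, which this post does not say the tool implements, is a deterministic pre-pass over the .vmx that cross-checks the model's risk flags. The sketch below parses standard `key = "value"` lines and flags a few well-known keys: `scsiN.virtualDev`, `ethernetN.virtualDev`, a `deviceType` of `cdrom-image`, and `uefi.secureBoot.enabled`. Treating any `encryption.*` key as vSphere encryption is an assumption. Severities copy the generated report, and `parse_vmx`, `cpu_topology`, and `risk_flags` are hypothetical names.

```python
import re


def parse_vmx(text: str) -> dict:
    """Parse .vmx 'key = "value"' lines; keys are lower-cased so lookups ignore case."""
    pattern = re.compile(r'^\s*([^#=\s][^=]*?)\s*=\s*"(.*)"\s*$')
    entries = {}
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            entries[match.group(1).lower()] = match.group(2)
    return entries


def cpu_topology(vmx: dict) -> str:
    """Render vCPUs as in the inventory table, e.g. '4 (2 sockets × 2 cores)'."""
    vcpus = int(vmx.get("numvcpus", "1"))
    cores = int(vmx.get("cpuid.corespersocket", "1"))
    return f"{vcpus} ({vcpus // cores} sockets × {cores} cores)"


def risk_flags(vmx: dict) -> list:
    """Return (severity, message) pairs, most severe first."""
    flags = []
    for key, value in vmx.items():
        v = value.lower()
        if re.fullmatch(r"scsi\d+\.virtualdev", key) and v == "pvscsi":
            flags.append(("Critical", f"{key}: pvscsi controller; target needs virtio"))
        elif re.fullmatch(r"ethernet\d+\.virtualdev", key) and v == "vmxnet3":
            flags.append(("High", f"{key}: vmxnet3 NIC; replace with virtio"))
        elif re.fullmatch(r"(ide|sata)\d+:\d+\.devicetype", key) and v == "cdrom-image":
            flags.append(("Medium", f"{key}: ISO attached as a CD-ROM"))
    if any(key.startswith("encryption.") for key in vmx):  # assumption, see lead-in
        flags.append(("Critical", "vSphere VM encryption; re-encrypt on the target"))
    if vmx.get("uefi.secureboot.enabled", "").lower() == "true":
        flags.append(("High", "UEFI Secure Boot must be reproduced on the target VM"))
    rank = {"Critical": 0, "High": 1, "Medium": 2}
    return sorted(flags, key=lambda flag: rank[flag[0]])
```

If the model omits a flag this pass raises, or raises one this pass can't find, that disagreement is a cheap signal to re-run or review the report.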

How I Used Gemma 4

I chose Gemma 4 26B MoE (gemma4:26b) after starting on 31B Dense and discovering it was the wrong tool for this workload.

The honest path: I picked 31B Dense first because the highest-quality reasoning seemed like the obvious choice for infrastructure assessment. Two problems surfaced on real-world inputs:

Ollama Flash Attention prefill stall on Dense (ollama#15350) hangs the 31B variant on prompts beyond ~3–4K tokens. A multi-VM datacenter inventory blows past that on the first VM. The bug is specific to Dense's hybrid sliding+global attention; MoE handles the same prompts cleanly.
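Active-parameter efficiency. 26B MoE activates ~4B parameters per token versus 31B for Dense. All 26B weights still have to be held in memory, so the gain is mainly per-token compute, which is what keeps generation tolerable when layers spill to CPU. On a consumer laptop GPU, that's the difference between a model that works (with some CPU offload) and one that isn't usable at all.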
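The footprint-versus-compute distinction is easy to put in numbers. This is a back-of-envelope sketch: the ~4B active and 31B dense counts come from the text above, while 4 bits per weight is an assumption, since the post doesn't state the quantization of the `gemma4:26b` tag. The 2 FLOPs per active parameter per token figure is the usual rule of thumb. The helper names are hypothetical.

```python
def weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1e9 bytes); ignores KV cache and runtime overhead."""
    return total_params * bits_per_weight / 8 / 1e9


def gflops_per_token(active_params: float) -> float:
    """Rule of thumb: a forward pass costs about 2 FLOPs per active parameter per token."""
    return 2 * active_params / 1e9


if __name__ == "__main__":
    # Assumption: ~4-bit weights for both models.
    for label, total, active in [("26B MoE", 26e9, 4e9), ("31B Dense", 31e9, 31e9)]:
        print(f"{label}: ~{weight_gb(total, 4):.1f} GB of weights, "
              f"~{gflops_per_token(active):.0f} GFLOPs per token")
```

At that assumed precision the weight footprints are close (13.0 GB versus 15.5 GB, before KV cache), while per-token compute differs by roughly 8x (8 versus 62 GFLOPs). That compute gap is what an offloaded layer stack feels on every token.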

What picking MoE over Dense gave me:

256K context window — enough to ingest an entire small-datacenter inventory in one shot
Stable long-prompt prefill on Ollama's current build
Native reasoning mode via the <|think|> system-prompt token
Workable throughput on consumer hardware — generation runs even when 85% of layers spill to CPU

Honest performance note: the report above generated in 311.7 seconds on a 16 GB-VRAM laptop GPU with 85% CPU offload (ollama ps confirms the split). On a workstation with 24+ GB VRAM the same generation should land in 30–60 seconds. This is exactly the kind of detail you want a tool to expose, not hide — local AI's pitch is data sovereignty, and the tradeoff is hardware-dependent latency. Field engineers running this for offline assessment will accept 5 minutes for a report they can't legally send to a cloud API.
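Exposing that split inside the tool, rather than leaving it to `ollama ps`, is straightforward. Ollama's local `GET /api/ps` endpoint lists loaded models with their total `size` and the part resident in GPU memory, `size_vram`. Below is a minimal sketch with hypothetical helper names; the CPU share is simply whatever isn't in VRAM.

```python
import json
import urllib.request


def offload_split(model: dict) -> tuple:
    """Return (cpu_fraction, gpu_fraction) for one entry of Ollama's /api/ps response."""
    size = model.get("size", 0)
    gpu = model.get("size_vram", 0) / size if size else 0.0
    return 1.0 - gpu, gpu


def loaded_models(host: str = "http://localhost:11434") -> list:
    """List models currently loaded by the local Ollama server (needs it running)."""
    with urllib.request.urlopen(f"{host}/api/ps") as resp:
        return json.load(resp).get("models", [])
```

Calling `offload_split` on each entry from `loaded_models()` yields figures like the 85% CPU split quoted above, ready to print beside the generation time in the report header.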

When MoE is not the right pick: short, single-turn, hard math/code reasoning where Dense's per-token capacity matters more than throughput. For long, structured, enterprise-document reasoning over large configs, MoE wins. That's the call this build makes, and the rationale is documented in the README with the GitHub issue link, not vibes.

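One Gemma 4-specific detail worth flagging: I follow the recommended sampling (temperature=1.0, top_p=0.95, top_k=64) and set OLLAMA_FLASH_ATTENTION=1 + OLLAMA_KV_CACHE_TYPE=q4_0 to keep the KV cache compact enough for a 16K context window, well below the 256K the model supports. Ollama only quantizes the KV cache when flash attention is enabled, so the two variables go together. The sampling values govern output quality; the two environment variables and the 16K cap are the difference between this running at usable speed and not running at all.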
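Those settings live in two different places, which is easy to trip over. The environment variables are read by the Ollama server when it starts. Sampling and context size travel with each request in its `options` object, where `num_ctx` is the context-length option. A sketch with hypothetical helper names:

```python
import os
import subprocess

# Server-side: read by `ollama serve` at startup, not per request.
SERVER_ENV = {
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "q4_0",  # only takes effect with flash attention on
}

# Per-request: sent in the "options" object of /api/generate or /api/chat.
GEMMA_OPTIONS = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
    "num_ctx": 16384,  # the 16K context window
}


def start_ollama() -> subprocess.Popen:
    """Launch a local Ollama server with the settings above (requires `ollama` on PATH)."""
    return subprocess.Popen(["ollama", "serve"], env={**os.environ, **SERVER_ENV})


def with_options(request: dict) -> dict:
    """Return a copy of a request body with the sampling and context options merged in."""
    return {**request, "options": {**request.get("options", {}), **GEMMA_OPTIONS}}
```

If Ollama runs as a background service (the desktop app or a systemd unit) rather than from `start_ollama()`, the variables have to be set in that service's environment instead.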

Built entirely on a laptop. No cloud API key was used at any point in the construction of this submission. The report you see above was generated by Gemma 4 running on the same machine.
