I wrote a Kubernetes operator in Go that talks to a Cisco 4331 router via RESTCONF. It creates VLANs, DHCP pools, and ACLs on the router, all triggered by kubectl apply. Pods get their IPs directly from the router's DHCP server, and inter-VLAN traffic is controlled by real ACLs running on real hardware. This is the full walkthrough.
1. Why?
Kubernetes networking is, by default, flat. Every pod can reach every other pod. That's fine for many workloads, but in plenty of scenarios you actually want segmentation. You want the database on a different network than the web servers. You want firewall rules between them.
There are better tools for this
I want to be honest from the start: if you need network segmentation in Kubernetes today, use Cilium or Calico. They provide NetworkPolicy enforcement, eBPF-based segmentation, encryption, and observability. They work in software, they scale, and thousands of companies run them in production. That's the right answer for most people.
If you're deep in the Cisco world, ACI with Nexus switches is the official enterprise play. It integrates natively with Kubernetes, gives you policy-driven microsegmentation, multi-tenant networking, full visibility. But it requires Nexus 9000 hardware and APIC controllers, and that's a serious investment.
So why did I do this?
I had a Cisco 4331 router and a Catalyst switch on my desk. Not data center gear. Just mid-range networking equipment, the kind of thing you can pick up on eBay for a couple hundred bucks. The 4331 runs IOS-XE 16.12 and has RESTCONF enabled: a REST API for managing the router configuration over HTTPS.
I wanted to see if I could wire that into Kubernetes. Not through some vendor plugin, but through a custom operator that I wrote from scratch. The idea was simple: define VLANs and policies as Kubernetes CRDs, and let the operator program the router to make them real.
A pod says "I belong to VLAN 10." The operator creates the subinterface on the router, sets up DHCP, writes the ACLs, and the pod gets an IP from the router. No CLI sessions. No separate workflow. Just YAML.
This was never about competing with Cilium or replacing ACI. It was about proving that the operator pattern can extend Kubernetes to control physical network infrastructure, even with equipment that wasn't designed for this.
2. Physical Architecture
This runs on real hardware. No simulations, no GNS3, no virtual routers.
The Lab
| Component | Model | IP | Role |
|---|---|---|---|
| Router | Cisco 4331 (IOS-XE 16.12) | 192.168.200.1 | Gateway, DHCP, ACLs, RESTCONF API |
| Switch | Cisco Catalyst | 192.168.200.2 | L2 switching, VLAN trunks |
| Node 1 | Mini PC (8 vCPU, 32 GB) | 192.168.200.11 | K3s server + worker |
| Node 2 | Mini PC (8 vCPU, 24 GB) | 192.168.200.12 | K3s server + worker |
| Node 3 | Mini PC (4 vCPU, 16 GB) | 192.168.200.13 | K3s server + worker |
Network Topology
Two NICs per Node
Each node has two network interfaces. This is key to the whole design.
- NIC1 (enp1s0): Management. Goes to an access port on the switch, VLAN 1, network 192.168.200.0/24. Carries all the K3s traffic: API server, etcd, Flannel overlay.
- NIC2 (enp2s0): Trunk. Goes to a trunk port on the switch that allows VLANs 10, 20, 30. This is the VLAN data plane. Pods attach to this interface via macvlan.
Switch Ports
- Ports connected to NIC2 on each node: trunk mode, allowing tagged traffic for VLANs 10, 20, 30
- Ports connected to NIC1 on each node: access mode, VLAN 1 (management)
- Uplink to router Gi0/0/0: trunk mode (802.1Q), carrying all VLANs
Router on a Stick
The router's GigabitEthernet0/0/0 is a trunk. It has subinterfaces for each VLAN (.10, .20, .30), each one with its own IP, DHCP pool, and ACL. All inter-VLAN traffic goes through the router, where ACLs decide what passes and what gets dropped.
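In classic IOS CLI terms, the per-VLAN configuration the operator ends up producing looks roughly like this (illustrative only; the operator writes it via RESTCONF rather than the CLI, and applying the ACL inbound is my assumption about the direction):

```
interface GigabitEthernet0/0/0.10
 encapsulation dot1Q 10
 ip address 172.16.10.1 255.255.255.0
 ip nat inside
 ip access-group VLAN10_ACL in
```

The `encapsulation dot1Q 10` line is what ties the subinterface to the 802.1Q tag arriving on the trunk.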
Two Networks per Pod
Every pod ends up with two interfaces:
Pod
├── eth0: 10.42.x.x (Flannel overlay: K8s API, DNS, Services)
└── net1: 172.16.x.x (VLAN via macvlan: app data traffic)
Kubernetes internal stuff (API, DNS, service discovery) goes over Flannel. Application traffic between VLANs goes through the physical network, through the router, through real ACLs.
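For context, the second interface is the kind of thing Multus wires up from a NetworkAttachmentDefinition. A minimal sketch of what one could look like for VLAN 10, assuming a macvlan CNI on the node's VLAN subinterface with the DHCP IPAM plugin (names and the `enp2s0.10` master device are my assumptions, not taken from the operator's actual manifests):

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan-10
  namespace: demo
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "enp2s0.10",
      "mode": "bridge",
      "ipam": { "type": "dhcp" }
    }
```

With `"ipam": { "type": "dhcp" }`, the CNI DHCP daemon requests the pod's `net1` address from whatever DHCP server answers on that VLAN, which here is the router.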
3. Step by Step
Starting from a completely clean cluster. No operator installed, no VLANs, no demo apps. Everything from scratch.
3.1 Install the Operator
helm install cisco-restconf-operator ./charts/cisco-restconf-operator \
--set image.tag=v0.8.1 \
--set router.username=admin \
--set router.password='1234'
NAME: cisco-restconf-operator
LAST DEPLOYED: Tue Feb 24 15:08:32 2026
NAMESPACE: default
STATUS: deployed
One command. That's it. Here's what it created:
Three CRDs registered in the Kubernetes API:
$ kubectl get crd | grep cisco
ciscorouterconfigs.cisco.io 2026-02-24T18:08:32Z
ciscovlans.cisco.io 2026-02-24T18:08:32Z
vlanpolicies.cisco.io 2026-02-24T18:08:32Z
The operator pod plus a DaemonSet running on all three nodes (handles VLAN sub-interfaces and the DHCP CNI daemon):
$ kubectl get pods -n cisco-operator-system
NAME READY STATUS
cisco-restconf-operator-controller-manager-74bdd6c5c-x48f2 1/1 Running
cisco-restconf-operator-vlan-setup-6qgdw 2/2 Running
cisco-restconf-operator-vlan-setup-7dj8f 2/2 Running
cisco-restconf-operator-vlan-setup-gldg4 2/2 Running
Two secrets: one with the router credentials, one with TLS certs for the mutating webhook:
$ kubectl get secret -n cisco-operator-system | grep -E "router|webhook"
cisco-restconf-operator-webhook-tls kubernetes.io/tls 2
cisco-router-credentials Opaque 2
And here's the interesting part. The Helm chart also creates a CiscoRouterConfig object, and the operator immediately connects to the router via RESTCONF:
$ kubectl get ciscorouterconfigs -o wide
NAME HOST CONNECTED MESSAGE AGE
default 192.168.200.1 true Connected to LAB-ROUTER (IOS-XE 16.12) 26s
Look at the status:
status:
connected: true
hostname: LAB-ROUTER
lastConnected: "2026-02-24T18:08:35Z"
message: Connected to LAB-ROUTER (IOS-XE 16.12)
version: "16.12"
The operator is alive, talking to the router, and reporting back its hostname and firmware version. It will re-validate the connection every 5 minutes.
3.2 Create VLANs
Three VLANs for the demo:
# 00-vlans.yaml
apiVersion: cisco.io/v1alpha1
kind: CiscoVLAN
metadata:
name: vlan-10
spec:
vlanId: 10
cidr: "172.16.10.0/24"
gateway: "172.16.10.1"
dhcpRange:
start: "172.16.10.100"
end: "172.16.10.200"
---
apiVersion: cisco.io/v1alpha1
kind: CiscoVLAN
metadata:
name: vlan-20
spec:
vlanId: 20
cidr: "172.16.20.0/24"
gateway: "172.16.20.1"
dhcpRange:
start: "172.16.20.100"
end: "172.16.20.200"
---
apiVersion: cisco.io/v1alpha1
kind: CiscoVLAN
metadata:
name: vlan-30
spec:
vlanId: 30
cidr: "172.16.30.0/24"
gateway: "172.16.30.1"
dhcpRange:
start: "172.16.30.100"
end: "172.16.30.200"
$ kubectl apply -f 00-vlans.yaml
ciscovlan.cisco.io/vlan-10 created
ciscovlan.cisco.io/vlan-20 created
ciscovlan.cisco.io/vlan-30 created
A few seconds later, all three are Active:
$ kubectl get ciscovlans -o wide
NAME VLAN-ID CIDR GATEWAY STATE PODS AGE
vlan-10 10 172.16.10.0/24 172.16.10.1 Active 18s
vlan-20 20 172.16.20.0/24 172.16.20.1 Active 18s
vlan-30 30 172.16.30.0/24 172.16.30.1 Active 18s
The status on each VLAN tells you exactly what got created on the router:
status:
state: Active
routerSubinterface: GigabitEthernet0/0/0.10
message: VLAN 10 configured on GigabitEthernet0/0/0.10
lastReconciled: "2026-02-24T18:09:13Z"
So what actually happened on the router? I queried it via RESTCONF to confirm.
Subinterfaces created, each one with its own IP and ip nat inside:
GigabitEthernet0/0/0 (trunk, no IP)
GigabitEthernet0/0/0.10 172.16.10.1/24
GigabitEthernet0/0/0.20 172.16.20.1/24
GigabitEthernet0/0/0.30 172.16.30.1/24
DHCP pools, one per VLAN:
{ "id": "VLAN10_POOL", "network": "172.16.10.0/24", "default-router": "172.16.10.1" }
{ "id": "VLAN20_POOL", "network": "172.16.20.0/24", "default-router": "172.16.20.1" }
{ "id": "VLAN30_POOL", "network": "172.16.30.0/24", "default-router": "172.16.30.1" }
Base ACLs. This is important. By default, each VLAN can talk to itself and to the internet, but is blocked from reaching any other VLAN:
VLAN10_ACL:
10 permit ip 172.16.10.0 0.0.0.255 -> 172.16.10.0 0.0.0.255 (intra-VLAN ok)
20 deny ip 172.16.10.0 0.0.0.255 -> 172.16.20.0 0.0.0.255 (block VLAN 20)
30 deny ip 172.16.10.0 0.0.0.255 -> 172.16.30.0 0.0.0.255 (block VLAN 30)
1000 permit ip any -> any (internet ok)
VLAN20_ACL:
10 permit ip 172.16.20.0 0.0.0.255 -> 172.16.20.0 0.0.0.255
20 deny ip 172.16.20.0 0.0.0.255 -> 172.16.10.0 0.0.0.255
30 deny ip 172.16.20.0 0.0.0.255 -> 172.16.30.0 0.0.0.255
1000 permit ip any -> any
VLAN30_ACL:
10 permit ip 172.16.30.0 0.0.0.255 -> 172.16.30.0 0.0.0.255
20 deny ip 172.16.30.0 0.0.0.255 -> 172.16.10.0 0.0.0.255
30 deny ip 172.16.30.0 0.0.0.255 -> 172.16.20.0 0.0.0.255
1000 permit ip any -> any
One kubectl apply. The router now has subinterfaces, DHCP pools, and ACLs for three VLANs. I didn't type a single command on the router CLI.
3.3 Deploy Applications (Before Policy)
The demo uses three pods:
- postgres on VLAN 20: a PostgreSQL database
- hello-world on VLAN 10: a Flask app that tries to connect to PostgreSQL
- no-hello-world on VLAN 30: the exact same Flask app, but on a different VLAN
# 04-apps.yaml (simplified)
apiVersion: v1
kind: Pod
metadata:
name: hello-world
namespace: demo
annotations:
cisco.io/vlan: vlan-10 # <-- This is all you need
spec:
containers:
- name: hello-world
image: 192.168.100.254:5000/hello-world:v1
env:
- name: DB_HOST
value: "postgres-vlan.demo.svc.cluster.local"
The pod only has cisco.io/vlan: vlan-10. A mutating webhook intercepts pod creation and injects the Multus network annotation automatically. The app uses DNS (postgres-vlan.demo.svc.cluster.local) to find the database. The operator creates a headless Service that resolves to the database's DHCP-assigned VLAN IP.
$ kubectl apply -f 04-apps.yaml
namespace/demo created
pod/postgres created
pod/hello-world created
pod/no-hello-world created
$ kubectl get pods -n demo -o wide
NAME READY STATUS IP NODE
hello-world 1/1 Running 10.42.1.97 node-2
no-hello-world 1/1 Running 10.42.1.98 node-2
postgres 1/1 Running 10.42.0.93 node-1
The VLAN CRDs now show one active pod each:
$ kubectl get ciscovlans -o wide
NAME VLAN-ID CIDR GATEWAY STATE PODS
vlan-10 10 172.16.10.0/24 172.16.10.1 Active 1
vlan-20 20 172.16.20.0/24 172.16.20.1 Active 1
vlan-30 30 172.16.30.0/24 172.16.30.1 Active 1
I port-forwarded both Flask apps to my laptop:
kubectl port-forward pod/hello-world -n demo 8080:8080 &
kubectl port-forward pod/no-hello-world -n demo 8081:8080 &
Open http://localhost:8080 and http://localhost:8081. Both apps have a dashboard that auto-refreshes every second, showing whether the database is reachable.
Both show DB UNREACHABLE. Red badge on both.
// hello-world (VLAN 10) at http://localhost:8080/api/status
{
"db_reachable": false,
"db_error": "connection failed: No route to host",
"net1_ip": "172.16.10.2",
"pod": "hello-world"
}
// no-hello-world (VLAN 30) at http://localhost:8081/api/status
{
"db_reachable": false,
"db_error": "connection failed: No route to host",
"net1_ip": "172.16.30.2",
"pod": "no-hello-world"
}
That's the base ACL doing its job. VLANs 10 and 30 both have deny rules blocking traffic to VLAN 20 where PostgreSQL lives. The segmentation is real, enforced on the router hardware, not in software.
3.4 Apply a VLANPolicy
Now let's selectively open access. We want hello-world (VLAN 10) to reach PostgreSQL (VLAN 20) on TCP 5432, but keep no-hello-world (VLAN 30) blocked.
# 01-policy.yaml
apiVersion: cisco.io/v1alpha1
kind: VLANPolicy
metadata:
name: app-to-db
spec:
source: vlan-10
destination: vlan-20
rules:
- protocol: tcp
ports: [5432]
$ kubectl apply -f 01-policy.yaml
vlanpolicy.cisco.io/app-to-db created
$ kubectl get vlanpolicies -o wide
NAME SOURCE DESTINATION STATE ACL-ENTRIES AGE
app-to-db vlan-10 vlan-20 Applied 10 9s
The operator updated both ACLs on the router:
status:
state: Applied
aclEntries: 10
message: "ACL rules applied: VLAN10_ACL (5 entries), VLAN20_ACL (5 entries)"
Here's what the ACLs look like now on the router. The operator inserted permit rules before the existing deny rules:
VLAN10_ACL (after policy):
10 permit tcp 172.16.10.0 -> 172.16.20.0 eq 5432 << NEW: allow TCP 5432
20 permit ip 172.16.10.0 -> 172.16.10.0 (intra-VLAN)
30 deny ip 172.16.10.0 -> 172.16.20.0 (block everything else to VLAN 20)
40 deny ip 172.16.10.0 -> 172.16.30.0 (block VLAN 30)
1000 permit ip any -> any (internet)
VLAN20_ACL (after policy):
10 permit tcp 172.16.20.0 -> 172.16.10.0 established << NEW: return traffic only
20 permit ip 172.16.20.0 -> 172.16.20.0
30 deny ip 172.16.20.0 -> 172.16.10.0
40 deny ip 172.16.20.0 -> 172.16.30.0
1000 permit ip any -> any
Two things to notice:
- VLAN10_ACL now has a permit tcp ... eq 5432 at sequence 10, before the deny. VLAN 10 can initiate TCP connections to VLAN 20 on the PostgreSQL port.
- VLAN20_ACL has an established rule. This allows return traffic from connections that VLAN 10 started, but VLAN 20 cannot initiate new connections to VLAN 10.
VLAN30_ACL didn't change at all. No policy mentions VLAN 30, so it stays fully blocked.
3.5 The Result
Back to the browsers. The dashboards refresh every second.
hello-world (VLAN 10) at http://localhost:8080
{
"db_reachable": true,
"db_host": "postgres-vlan.demo.svc.cluster.local",
"latency_ms": 7.2,
"net1_ip": "172.16.10.2",
"pod": "hello-world"
}
Green badge. DB REACHABLE, 7.2ms. It can also write and read data:
// GET http://localhost:8080/db
{
"status": "SUCCESS",
"inserted": {
"id": 1,
"created_at": "2026-02-24T18:14:04.158103"
},
"recent_messages": [
{
"content": "Hello from 172.16.10.2 at 18:14:04",
"id": 1
}
]
}
no-hello-world (VLAN 30) at http://localhost:8081
{
"db_reachable": false,
"db_error": "connection failed: timeout expired",
"net1_ip": "172.16.30.2",
"pod": "no-hello-world"
}
Red badge. DB UNREACHABLE. Same Docker image. Same code. Same DB_HOST env var. The only difference is one line in the YAML: cisco.io/vlan: vlan-30 instead of vlan-10. The router drops the traffic.
// GET http://localhost:8081/db
{
"status": "ERROR",
"error": "connection failed: No route to host"
}
4. What's Next
The full source code is on GitHub: github.com/csepulveda/cisco-kubernetes
Let me repeat what I said at the beginning: this is not production software. The code is tied to a specific router model (Cisco 4331), a specific IOS-XE version (16.12), and a specific physical setup. If you actually need network segmentation in Kubernetes, go with Cilium or Calico. If you're in the Cisco ecosystem and have the budget, ACI with Nexus is what it's designed for. Those are real, tested, supported solutions.
This project is something else. It's a lab experiment. A proof of concept. And the thing it proves is this: Kubernetes operators can manage anything that has an API.
The operator pattern is a control loop. It watches for desired state (your CRDs) and makes actual state match. Most operators manage stuff inside the cluster: databases, message queues, certificates. But nothing says it has to stay inside. The reconciliation loop doesn't care whether it's creating a Pod or configuring a physical router 3 meters away.
In this case it's a Cisco router. But it could be a firewall, a load balancer, a DNS provider, a cloud networking service, some legacy system with SOAP endpoints. If it has an API and it has state, you can write an operator for it.
That's what I find exciting about this. I wired a mid-range Cisco 4331 (not a Nexus 9000, not an ACI fabric, just a regular router) into a Kubernetes control plane using nothing but RESTCONF and Go. No vendor SDK, no proprietary integration, no expensive hardware. Just the operator pattern doing what it does best: turning YAML into reality.
Built with Go, Kubebuilder, K3s, and a lot of patience debugging YANG models at 2am.