I wrote a Kubernetes operator in Go that talks to a Cisco 4331 router via RESTCONF. It creates VLANs, DHCP pools, and ACLs on the router, all triggered by kubectl apply. Pods get their IPs directly from the router's DHCP server, and inter-VLAN traffic is controlled by real ACLs running on real hardware. This is the full walkthrough.
1. Why?
Kubernetes networking is, by default, flat. Every pod can reach every other pod. That's fine for many workloads, but in plenty of scenarios you actually want segmentation. You want the database on a different network than the web servers. You want firewall rules between them.
There are better tools for this
I want to be honest from the start: if you need network segmentation in Kubernetes today, use Cilium or Calico. They provide NetworkPolicy enforcement, eBPF-based segmentation, encryption, and observability. They work in software, they scale, and thousands of companies run them in production. That's the right answer for most people.
If you're deep in the Cisco world, ACI with Nexus switches is the official enterprise play. It integrates natively with Kubernetes, gives you policy-driven microsegmentation, multi-tenant networking, full visibility. But it requires Nexus 9000 hardware and APIC controllers, and that's a serious investment.
So why did I do this?
I had a Cisco 4331 router and a Catalyst switch on my desk. Not data center gear. Just mid-range networking equipment, the kind of thing you can pick up on eBay for a couple hundred bucks. The 4331 runs IOS-XE 16.12 and has RESTCONF enabled: a REST API for managing the router configuration over HTTPS.
I wanted to see if I could wire that into Kubernetes. Not through some vendor plugin, but through a custom operator that I wrote from scratch. The idea was simple: define VLANs and policies as Kubernetes CRDs, and let the operator program the router to make them real.
A pod says "I belong to VLAN 10." The operator creates the subinterface on the router, sets up DHCP, writes the ACLs, and the pod gets an IP from the router. No CLI sessions. No separate workflow. Just YAML.
This was never about competing with Cilium or replacing ACI. It was about proving that the operator pattern can extend Kubernetes to control physical network infrastructure, even with equipment that wasn't designed for this.
2. Physical Architecture
This runs on real hardware. No simulations, no GNS3, no virtual routers.
The Lab
| Component | Model | IP | Role |
|---|---|---|---|
| Router | Cisco 4331 (IOS-XE 16.12) | 192.168.200.1 | Gateway, DHCP, ACLs, RESTCONF API |
| Switch | Cisco Catalyst | 192.168.200.2 | L2 switching, VLAN trunks |
| Node 1 | Mini PC (8 vCPU, 32 GB) | 192.168.200.11 | K3s server + worker |
| Node 2 | Mini PC (8 vCPU, 24 GB) | 192.168.200.12 | K3s server + worker |
| Node 3 | Mini PC (4 vCPU, 16 GB) | 192.168.200.13 | K3s server + worker |
Network Topology
Two NICs per Node
Each node has two network interfaces. This is key to the whole design.
- NIC1 (enp1s0): Management. Goes to an access port on the switch, VLAN 1, network 192.168.200.0/24. Carries all the K3s traffic: API server, etcd, Flannel overlay.
- NIC2 (enp2s0): Trunk. Goes to a trunk port on the switch that allows VLANs 10, 20, 30. This is the VLAN data plane. Pods attach to this interface via macvlan.
Switch Ports
- Ports connected to NIC2 on each node: trunk mode, allowing tagged traffic for VLANs 10, 20, 30
- Ports connected to NIC1 on each node: access mode, VLAN 1 (management)
- Uplink to router Gi0/0/0: trunk mode (802.1Q), carrying all VLANs
Router on a Stick
The router's GigabitEthernet0/0/0 is a trunk. It has subinterfaces for each VLAN (.10, .20, .30), each one with its own IP, DHCP pool, and ACL. All inter-VLAN traffic goes through the router, where ACLs decide what passes and what gets dropped.
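In classic IOS CLI terms, the per-VLAN configuration the operator ends up producing looks roughly like this (illustrative only; the operator writes it via RESTCONF rather than the CLI, and applying the ACL inbound is my assumption about the direction):

```
interface GigabitEthernet0/0/0.10
 encapsulation dot1Q 10
 ip address 172.16.10.1 255.255.255.0
 ip nat inside
 ip access-group VLAN10_ACL in
```

The `encapsulation dot1Q 10` line is what ties the subinterface to the 802.1Q tag arriving on the trunk.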
Two Networks per Pod
Every pod ends up with two interfaces:
Pod
├── eth0: 10.42.x.x (Flannel overlay: K8s API, DNS, Services)
└── net1: 172.16.x.x (VLAN via macvlan: app data traffic)
Kubernetes internal stuff (API, DNS, service discovery) goes over Flannel. Application traffic between VLANs goes through the physical network, through the router, through real ACLs.
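For context, the second interface is the kind of thing Multus wires up from a NetworkAttachmentDefinition. A minimal sketch of what one could look like for VLAN 10, assuming a macvlan CNI on the node's VLAN subinterface with the DHCP IPAM plugin (names and the `enp2s0.10` master device are my assumptions, not taken from the operator's actual manifests):

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan-10
  namespace: demo
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "enp2s0.10",
      "mode": "bridge",
      "ipam": { "type": "dhcp" }
    }
```

With `"ipam": { "type": "dhcp" }`, the CNI DHCP daemon requests the pod's `net1` address from whatever DHCP server answers on that VLAN, which here is the router.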
3. Step by Step
Starting from a completely clean cluster. No operator installed, no VLANs, no demo apps. Everything from scratch.
3.1 Install the Operator
helm install cisco-restconf-operator ./charts/cisco-restconf-operator \
--set image.tag=v0.8.1 \
--set router.username=admin \
--set router.password='1234'
NAME: cisco-restconf-operator
LAST DEPLOYED: Tue Feb 24 15:08:32 2026
NAMESPACE: default
STATUS: deployed
One command. That's it. Here's what it created:
Three CRDs registered in the Kubernetes API:
$ kubectl get crd | grep cisco
ciscorouterconfigs.cisco.io 2026-02-24T18:08:32Z
ciscovlans.cisco.io 2026-02-24T18:08:32Z
vlanpolicies.cisco.io 2026-02-24T18:08:32Z
The operator pod plus a DaemonSet running on all three nodes (handles VLAN sub-interfaces and the DHCP CNI daemon):
$ kubectl get pods -n cisco-operator-system
NAME READY STATUS
cisco-restconf-operator-controller-manager-74bdd6c5c-x48f2 1/1 Running
cisco-restconf-operator-vlan-setup-6qgdw 2/2 Running
cisco-restconf-operator-vlan-setup-7dj8f 2/2 Running
cisco-restconf-operator-vlan-setup-gldg4 2/2 Running
Two secrets: one with the router credentials, one with TLS certs for the mutating webhook:
$ kubectl get secret -n cisco-operator-system | grep -E "router|webhook"
cisco-restconf-operator-webhook-tls kubernetes.io/tls 2
cisco-router-credentials Opaque 2
And here's the interesting part. The Helm chart also creates a CiscoRouterConfig object, and the operator immediately connects to the router via RESTCONF:
$ kubectl get ciscorouterconfigs -o wide
NAME HOST CONNECTED MESSAGE AGE
default 192.168.200.1 true Connected to LAB-ROUTER (IOS-XE 16.12) 26s
Look at the status:
status:
connected: true
hostname: LAB-ROUTER
lastConnected: "2026-02-24T18:08:35Z"
message: Connected to LAB-ROUTER (IOS-XE 16.12)
version: "16.12"
The operator is alive, talking to the router, and reporting back its hostname and firmware version. It will re-validate the connection every 5 minutes.
3.2 Create VLANs
Three VLANs for the demo:
# 00-vlans.yaml
apiVersion: cisco.io/v1alpha1
kind: CiscoVLAN
metadata:
name: vlan-10
spec:
vlanId: 10
cidr: "172.16.10.0/24"
gateway: "172.16.10.1"
dhcpRange:
start: "172.16.10.100"
end: "172.16.10.200"
---
apiVersion: cisco.io/v1alpha1
kind: CiscoVLAN
metadata:
name: vlan-20
spec:
vlanId: 20
cidr: "172.16.20.0/24"
gateway: "172.16.20.1"
dhcpRange:
start: "172.16.20.100"
end: "172.16.20.200"
---
apiVersion: cisco.io/v1alpha1
kind: CiscoVLAN
metadata:
name: vlan-30
spec:
vlanId: 30
cidr: "172.16.30.0/24"
gateway: "172.16.30.1"
dhcpRange:
start: "172.16.30.100"
end: "172.16.30.200"
$ kubectl apply -f 00-vlans.yaml
ciscovlan.cisco.io/vlan-10 created
ciscovlan.cisco.io/vlan-20 created
ciscovlan.cisco.io/vlan-30 created
A few seconds later, all three are Active:
$ kubectl get ciscovlans -o wide
NAME VLAN-ID CIDR GATEWAY STATE PODS AGE
vlan-10 10 172.16.10.0/24 172.16.10.1 Active 18s
vlan-20 20 172.16.20.0/24 172.16.20.1 Active 18s
vlan-30 30 172.16.30.0/24 172.16.30.1 Active 18s
The status on each VLAN tells you exactly what got created on the router:
status:
state: Active
routerSubinterface: GigabitEthernet0/0/0.10
message: VLAN 10 configured on GigabitEthernet0/0/0.10
lastReconciled: "2026-02-24T18:09:13Z"
So what actually happened on the router? I queried it via RESTCONF to confirm.
Subinterfaces created, each one with its own IP and ip nat inside:
GigabitEthernet0/0/0 (trunk, no IP)
GigabitEthernet0/0/0.10 172.16.10.1/24
GigabitEthernet0/0/0.20 172.16.20.1/24
GigabitEthernet0/0/0.30 172.16.30.1/24
DHCP pools, one per VLAN:
{ "id": "VLAN10_POOL", "network": "172.16.10.0/24", "default-router": "172.16.10.1" }
{ "id": "VLAN20_POOL", "network": "172.16.20.0/24", "default-router": "172.16.20.1" }
{ "id": "VLAN30_POOL", "network": "172.16.30.0/24", "default-router": "172.16.30.1" }
Base ACLs. This is important. By default, each VLAN can talk to itself and to the internet, but is blocked from reaching any other VLAN:
VLAN10_ACL:
10 permit ip 172.16.10.0 0.0.0.255 -> 172.16.10.0 0.0.0.255 (intra-VLAN ok)
20 deny ip 172.16.10.0 0.0.0.255 -> 172.16.20.0 0.0.0.255 (block VLAN 20)
30 deny ip 172.16.10.0 0.0.0.255 -> 172.16.30.0 0.0.0.255 (block VLAN 30)
1000 permit ip any -> any (internet ok)
VLAN20_ACL:
10 permit ip 172.16.20.0 0.0.0.255 -> 172.16.20.0 0.0.0.255
20 deny ip 172.16.20.0 0.0.0.255 -> 172.16.10.0 0.0.0.255
30 deny ip 172.16.20.0 0.0.0.255 -> 172.16.30.0 0.0.0.255
1000 permit ip any -> any
VLAN30_ACL:
10 permit ip 172.16.30.0 0.0.0.255 -> 172.16.30.0 0.0.0.255
20 deny ip 172.16.30.0 0.0.0.255 -> 172.16.10.0 0.0.0.255
30 deny ip 172.16.30.0 0.0.0.255 -> 172.16.20.0 0.0.0.255
1000 permit ip any -> any
One kubectl apply. The router now has subinterfaces, DHCP pools, and ACLs for three VLANs. I didn't type a single command on the router CLI.
3.3 Deploy Applications (Before Policy)
The demo uses three pods:
- postgres on VLAN 20: a PostgreSQL database
- hello-world on VLAN 10: a Flask app that tries to connect to PostgreSQL
- no-hello-world on VLAN 30: the exact same Flask app, but on a different VLAN
# 04-apps.yaml (simplified)
apiVersion: v1
kind: Pod
metadata:
name: hello-world
namespace: demo
annotations:
cisco.io/vlan: vlan-10 # <-- This is all you need
spec:
containers:
- name: hello-world
image: 192.168.100.254:5000/hello-world:v1
env:
- name: DB_HOST
value: "postgres-vlan.demo.svc.cluster.local"
The pod only has cisco.io/vlan: vlan-10. A mutating webhook intercepts pod creation and injects the Multus network annotation automatically. The app uses DNS (postgres-vlan.demo.svc.cluster.local) to find the database. The operator creates a headless Service that resolves to the database's DHCP-assigned VLAN IP.
$ kubectl apply -f 04-apps.yaml
namespace/demo created
pod/postgres created
pod/hello-world created
pod/no-hello-world created
$ kubectl get pods -n demo -o wide
NAME READY STATUS IP NODE
hello-world 1/1 Running 10.42.1.97 node-2
no-hello-world 1/1 Running 10.42.1.98 node-2
postgres 1/1 Running 10.42.0.93 node-1
The VLAN CRDs now show one active pod each:
$ kubectl get ciscovlans -o wide
NAME VLAN-ID CIDR GATEWAY STATE PODS
vlan-10 10 172.16.10.0/24 172.16.10.1 Active 1
vlan-20 20 172.16.20.0/24 172.16.20.1 Active 1
vlan-30 30 172.16.30.0/24 172.16.30.1 Active 1
I port-forwarded both Flask apps to my laptop:
kubectl port-forward pod/hello-world -n demo 8080:8080 &
kubectl port-forward pod/no-hello-world -n demo 8081:8080 &
Open http://localhost:8080 and http://localhost:8081. Both apps have a dashboard that auto-refreshes every second, showing whether the database is reachable.
Both show DB UNREACHABLE. Red badge on both.
// hello-world (VLAN 10) at http://localhost:8080/api/status
{
"db_reachable": false,
"db_error": "connection failed: No route to host",
"net1_ip": "172.16.10.2",
"pod": "hello-world"
}
// no-hello-world (VLAN 30) at http://localhost:8081/api/status
{
"db_reachable": false,
"db_error": "connection failed: No route to host",
"net1_ip": "172.16.30.2",
"pod": "no-hello-world"
}
That's the base ACL doing its job. VLANs 10 and 30 both have deny rules blocking traffic to VLAN 20 where PostgreSQL lives. The segmentation is real, enforced on the router hardware, not in software.
3.4 Apply a VLANPolicy
Now let's selectively open access. We want hello-world (VLAN 10) to reach PostgreSQL (VLAN 20) on TCP 5432, but keep no-hello-world (VLAN 30) blocked.
# 01-policy.yaml
apiVersion: cisco.io/v1alpha1
kind: VLANPolicy
metadata:
name: app-to-db
spec:
source: vlan-10
destination: vlan-20
rules:
- protocol: tcp
ports: [5432]
$ kubectl apply -f 01-policy.yaml
vlanpolicy.cisco.io/app-to-db created
$ kubectl get vlanpolicies -o wide
NAME SOURCE DESTINATION STATE ACL-ENTRIES AGE
app-to-db vlan-10 vlan-20 Applied 10 9s
The operator updated both ACLs on the router:
status:
state: Applied
aclEntries: 10
message: "ACL rules applied: VLAN10_ACL (5 entries), VLAN20_ACL (5 entries)"
Here's what the ACLs look like now on the router. The operator inserted permit rules before the existing deny rules:
VLAN10_ACL (after policy):
10 permit tcp 172.16.10.0 -> 172.16.20.0 eq 5432 << NEW: allow TCP 5432
20 permit ip 172.16.10.0 -> 172.16.10.0 (intra-VLAN)
30 deny ip 172.16.10.0 -> 172.16.20.0 (block everything else to VLAN 20)
40 deny ip 172.16.10.0 -> 172.16.30.0 (block VLAN 30)
1000 permit ip any -> any (internet)
VLAN20_ACL (after policy):
10 permit tcp 172.16.20.0 -> 172.16.10.0 established << NEW: return traffic only
20 permit ip 172.16.20.0 -> 172.16.20.0
30 deny ip 172.16.20.0 -> 172.16.10.0
40 deny ip 172.16.20.0 -> 172.16.30.0
1000 permit ip any -> any
Two things to notice:
- VLAN10_ACL now has a permit tcp ... eq 5432 at sequence 10, before the deny. VLAN 10 can initiate TCP connections to VLAN 20 on the PostgreSQL port.
- VLAN20_ACL has an established rule. This allows return traffic from connections that VLAN 10 started, but VLAN 20 cannot initiate new connections to VLAN 10.
VLAN30_ACL didn't change at all. No policy mentions VLAN 30, so it stays fully blocked.
3.5 The Result
Back to the browsers. The dashboards refresh every second.
hello-world (VLAN 10) at http://localhost:8080
{
"db_reachable": true,
"db_host": "postgres-vlan.demo.svc.cluster.local",
"latency_ms": 7.2,
"net1_ip": "172.16.10.2",
"pod": "hello-world"
}
Green badge. DB REACHABLE, 7.2ms. It can also write and read data:
// GET http://localhost:8080/db
{
"status": "SUCCESS",
"inserted": {
"id": 1,
"created_at": "2026-02-24T18:14:04.158103"
},
"recent_messages": [
{
"content": "Hello from 172.16.10.2 at 18:14:04",
"id": 1
}
]
}
no-hello-world (VLAN 30) at http://localhost:8081
{
"db_reachable": false,
"db_error": "connection failed: timeout expired",
"net1_ip": "172.16.30.2",
"pod": "no-hello-world"
}
Red badge. DB UNREACHABLE. Same Docker image. Same code. Same DB_HOST env var. The only difference is one line in the YAML: cisco.io/vlan: vlan-30 instead of vlan-10. The router drops the traffic.
// GET http://localhost:8081/db
{
"status": "ERROR",
"error": "connection failed: No route to host"
}
4. What's Next
The full source code is on GitHub: github.com/csepulveda/cisco-kubernetes
Let me repeat what I said at the beginning: this is not production software. The code is tied to a specific router model (Cisco 4331), a specific IOS-XE version (16.12), and a specific physical setup. If you actually need network segmentation in Kubernetes, go with Cilium or Calico. If you're in the Cisco ecosystem and have the budget, ACI with Nexus is what it's designed for. Those are real, tested, supported solutions.
This project is something else. It's a lab experiment. A proof of concept. And the thing it proves is this: Kubernetes operators can manage anything that has an API.
The operator pattern is a control loop. It watches for desired state (your CRDs) and makes actual state match. Most operators manage stuff inside the cluster: databases, message queues, certificates. But nothing says it has to stay inside. The reconciliation loop doesn't care whether it's creating a Pod or configuring a physical router 3 meters away.
In this case it's a Cisco router. But it could be a firewall, a load balancer, a DNS provider, a cloud networking service, some legacy system with SOAP endpoints. If it has an API and it has state, you can write an operator for it.
That's what I find exciting about this. I wired a mid-range Cisco 4331 (not a Nexus 9000, not an ACI fabric, just a regular router) into a Kubernetes control plane using nothing but RESTCONF and Go. No vendor SDK, no proprietary integration, no expensive hardware. Just the operator pattern doing what it does best: turning YAML into reality.
Built with Go, Kubebuilder, K3s, and a lot of patience debugging YANG models at 2am.