1. Context, Scope, and Design Objectives
This document presents a rigorous architectural synthesis of an on‑premises Kubernetes networking design implemented atop a Hyper‑V virtualization substrate. The objective is to construct a networking architecture that compensates for the absence of a cloud provider while remaining conceptually and operationally aligned with cloud‑native paradigms.
The design explicitly targets the following goals:
- Operation in a fully on‑premises environment without reliance on a managed cloud provider
- Functional support for `Service` objects of type `LoadBalancer`
- Clear and enforceable separation of concerns among compute, networking, and ingress responsibilities
- Immediate operational viability coupled with architectural compatibility for future autoscaling at both the pod and node levels
- Conceptual portability, such that the underlying mental model remains valid upon eventual migration to a public cloud environment
The core infrastructural components considered herein are:
- FRR Virtual Machine: An Ubuntu‑based virtual appliance running FRRouting, responsible for fabric‑level routing and control‑plane signaling
- Kubernetes Worker Nodes: Linux virtual machines hosting workloads and participating in routing advertisement
- MetalLB: Deployed in BGP mode to provide load‑balancer semantics in the absence of a cloud‑native implementation
2. Rationale for FRR and MetalLB in On‑Premises Kubernetes
2.1 Absence of a Cloud Provider and Its Consequences
In contrast to managed Kubernetes offerings, an on‑premises Kubernetes deployment lacks the following capabilities by default:
- A managed Layer‑4/Layer‑7 load balancer
- A managed routing plane capable of advertising service reachability
- Automatic integration between Kubernetes service abstractions and the surrounding network fabric
Consequently, the Service abstraction of type LoadBalancer is inert unless explicitly backed by external infrastructure. The burden of implementing these capabilities therefore shifts to the platform architect.
2.2 MetalLB as a Functional Analog to Cloud Load Balancers
MetalLB is introduced to fill this infrastructural void by providing two essential capabilities:
- Allocation of externally reachable virtual IP addresses for Kubernetes services
- Advertisement of reachability for those IPs to the upstream network
MetalLB supports two operational modes:
- Layer‑2 (L2) Mode, based on ARP/NDP announcements
- Border Gateway Protocol (BGP) Mode, based on explicit route advertisement
This architecture deliberately adopts BGP mode, for reasons grounded in scalability, determinism, and alignment with Kubernetes’ distributed systems model.
3. Justification for BGP Mode over L2 Mode (Engineering Perspective)
3.1 Architectural Implications of L2 Mode
In L2 mode, MetalLB assigns ownership of a LoadBalancer IP to a single node. That node responds to ARP requests on behalf of the service, and failover is achieved indirectly via ARP cache invalidation.
From an architectural standpoint, this entails:
- Tight coupling between a network identity (the service IP) and a specific node
- Implicit, non‑contractual ownership semantics
- Dependence on shared mutable state in the form of ARP caches
Such properties undermine predictability, complicate failure analysis, and conflict with Kubernetes’ expectation that nodes are ephemeral and interchangeable.
3.2 Architectural Advantages of BGP Mode
Under BGP mode, service reachability is expressed through explicit route advertisements. Nodes hosting service endpoints announce routes, and the routing fabric (FRR) selects viable forwarding paths. When a node becomes unavailable, its routes are withdrawn in a deterministic and protocol‑defined manner.
This yields several architectural advantages:
- Explicit contracts governing reachability and ownership
- Well‑defined state machines for convergence and failure handling
- Decoupling of node lifecycle events from ingress semantics
The conceptual correspondence between Kubernetes primitives and BGP behavior is direct:
| Kubernetes Event | BGP Effect |
|---|---|
| Pod scheduled | Route advertised |
| Pod terminated | Route withdrawn |
| Node added | BGP peer established |
| Node removed | BGP session torn down |
This symmetry renders BGP mode a natural fit for Kubernetes’ distributed control model.
4. Network Roles and Separation of Responsibilities
4.1 FRR Virtual Machine as a Fabric Router
The FRR virtual machine functions as a stable fabric‑level routing component. Its responsibilities are intentionally constrained and explicitly defined.
It does not act as:
- An Internet gateway
- A NAT device
- A default gateway for Kubernetes nodes
Instead, FRR serves exclusively as a fabric router, providing deterministic Layer‑3 reachability for Kubernetes Service and Pod traffic via BGP.
Network Interface Partitioning
The FRR VM is provisioned with two network interfaces, each mapped to a distinct routing domain:
- `eth0` (External – 192.168.1.0/24)
  - Management access
  - Outbound Internet connectivity (package installation, updates)
  - Connectivity to the broader on‑premises LAN
- `eth1` (Internal – 10.10.0.0/24)
  - Kubernetes fabric
  - BGP peering with Kubernetes nodes
  - Transport for Pod and LoadBalancer traffic
Critically, only internal fabric routes are advertised into BGP, preserving strict separation between management traffic and cluster data paths.
Netplan Configuration (FRR VM)
The following netplan configuration illustrates the canonical interface setup on the FRR VM:
```yaml
# /etc/netplan/50-cloud-init.yaml
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: false
      addresses:
        - 192.168.1.13/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses:
          - 1.1.1.1
          - 8.8.8.8
    eth1:
      dhcp4: false
      addresses:
        - 10.10.0.10/24
```
This configuration enforces a single default route via the external interface, ensuring that internal fabric traffic is never misrouted toward the management plane.
FRR (FRRouting) Configuration Overview
Within FRR, the routing policy is deliberately minimalistic and explicit:
```
router bgp 65001
 bgp router-id 10.10.0.10
 no bgp default ipv4-unicast
 address-family ipv4 unicast
  redistribute connected route-map INTERNAL_ONLY
 exit-address-family
```
Supporting policy objects restrict route advertisement to the Kubernetes fabric only:
```
ip prefix-list INTERNAL_NET seq 10 permit 10.10.0.0/24

route-map INTERNAL_ONLY permit 10
 match ip address prefix-list INTERNAL_NET
```
This ensures that:
- External subnets (e.g., 192.168.1.0/24) are never advertised to Kubernetes nodes
- FRR never advertises a default route or external prefixes, so it cannot inadvertently become the nodes' Internet gateway
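Route-maps in FRR terminate in an implicit deny, so prefixes outside INTERNAL_NET are already rejected. If that fail-closed behavior should be visible in the configuration itself, an explicit terminal clause can be appended; this is an optional refinement, not part of the configuration shown above:

```
! Optional: make the implicit deny explicit
! (route-maps already end in an implicit deny)
route-map INTERNAL_ONLY deny 20
```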
4.2 Kubernetes Nodes as Disposable Compute Elements
Kubernetes nodes are treated strictly as ephemeral compute resources. They host workloads and participate in routing advertisements via MetalLB, but they do not retain long‑term ownership of ingress IPs.
Each node:
- Establishes an explicit BGP peering relationship with FRR
- Advertises LoadBalancer IPs only while hosting active service endpoints
This design reinforces the principle that nodes remain stateless with respect to ingress, enabling both horizontal pod autoscaling and future node‑level autoscaling without architectural refactoring.
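A minimal Service manifest illustrating this behavior is sketched below. The name, namespace, and ports are hypothetical; the relevant detail is `externalTrafficPolicy: Local`, which causes MetalLB to advertise the LoadBalancer IP only from nodes that currently host a ready endpoint, so a node's advertisement is withdrawn as soon as its last endpoint disappears.

```yaml
# Hypothetical Service; only externalTrafficPolicy is load-bearing here.
apiVersion: v1
kind: Service
metadata:
  name: demo-web                   # hypothetical name
  namespace: default
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local     # advertise only from nodes with local endpoints
  selector:
    app: demo-web
  ports:
    - name: http
      port: 80
      targetPort: 8080
```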
5. FRR Configuration: Conceptual Overview
5.1 Governing Principles
The FRR configuration adheres to three guiding principles:
- The routing fabric must be stable and long‑lived
- Compute nodes must remain disposable
- External and internal routing domains must be strictly segregated
5.2 Functional Responsibilities of FRR
- Operate a BGP process under a dedicated autonomous system (AS 65001)
- Accept peering relationships from Kubernetes nodes (AS 65002)
- Advertise only internal fabric connectivity
5.3 On Explicit Peer Configuration
The requirement to explicitly configure BGP neighbors when adding new nodes is intrinsic to BGP's design: peerings are deliberate, explicitly declared contracts rather than discovered relationships, so this characteristic reflects protocol intent rather than architectural deficiency.
Crucially, this constraint does not impede future autoscaling; it merely indicates that automation of peer management has not yet been introduced.
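For concreteness, a hedged sketch of the FRR side of this peering is shown below. The peer-group name is an assumption, and the node addresses follow those used in the architecture diagram later in this document; because the base configuration disables the default IPv4 unicast activation, neighbors must be activated explicitly in the address family.

```
router bgp 65001
 ! Common settings for all Kubernetes nodes (AS 65002)
 neighbor K8S_NODES peer-group
 neighbor K8S_NODES remote-as 65002
 ! One explicit statement per node; repeated whenever a node is added
 neighbor 10.10.0.11 peer-group K8S_NODES
 neighbor 10.10.0.12 peer-group K8S_NODES
 address-family ipv4 unicast
  neighbor K8S_NODES activate
 exit-address-family
```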
6. MetalLB Configuration: Conceptual Overview
In BGP mode, MetalLB:
- Allocates LoadBalancer IPs from a predefined pool
- Advertises those IPs to FRR via BGP
- Withdraws advertisements automatically upon failure or topology changes
MetalLB deliberately refrains from managing FRR configuration or dynamically creating BGP neighbors. This boundary enforces a clean separation between Kubernetes‑level intent and infrastructure‑level policy.
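As a concrete but hedged sketch of how this intent is expressed with MetalLB's CRD-based configuration, the manifests below declare an address pool, a peering toward FRR, and a BGP advertisement. The resource names and the pool range are assumptions chosen for illustration; the peer address and AS numbers follow the values used throughout this document.

```yaml
# Hypothetical MetalLB configuration using the metallb.io CRDs
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lb-pool                    # assumed name
  namespace: metallb-system
spec:
  addresses:
    - 10.10.0.200-10.10.0.250      # assumed range inside the fabric subnet
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: frr-fabric                 # assumed name
  namespace: metallb-system
spec:
  myASN: 65002                     # Kubernetes nodes
  peerASN: 65001                   # FRR
  peerAddress: 10.10.0.10          # FRR eth1
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: lb-advertisement           # assumed name
  namespace: metallb-system
spec:
  ipAddressPools:
    - lb-pool
```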
7. Autoscaling Readiness and Evolutionary Path
7.1 Pod‑Level Autoscaling
Horizontal Pod Autoscaling operates entirely within the Kubernetes control plane and is unaffected by the external routing architecture. The design fully supports HPA without modification.
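For completeness, a minimal HorizontalPodAutoscaler sketch is shown below; the target Deployment name, replica bounds, and CPU threshold are hypothetical. Nothing in it references the routing layer, which is precisely the point.

```yaml
# Hypothetical HPA; entirely independent of FRR / MetalLB
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-web
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```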
7.2 Node‑Level Scaling: Present and Future
At present:
- Nodes are provisioned manually
- BGP peers are added manually
- System behavior remains stable and predictable
In a future automated environment:
- Node provisioning becomes orchestrated
- BGP peering is automated or abstracted
Importantly, the network architecture itself remains unchanged; only the automation layer evolves.
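One possible direction for abstracting the peering side, offered as an assumption rather than a recommendation, is FRR's dynamic-neighbor support: instead of one neighbor statement per node, FRR accepts sessions from any address in the fabric subnet that presents the expected AS.

```
router bgp 65001
 neighbor K8S_NODES peer-group
 neighbor K8S_NODES remote-as 65002
 ! Accept BGP sessions from any fabric address without per-node statements
 bgp listen range 10.10.0.0/24 peer-group K8S_NODES
 address-family ipv4 unicast
  neighbor K8S_NODES activate
 exit-address-family
```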
8. Architectural Continuity Across Cloud Migration
| On‑Premises Component | Cloud Analog |
|---|---|
| FRR VM | Managed cloud router / VPC router |
| MetalLB | Managed cloud LoadBalancer |
| Internal fabric | VPC subnet |
| BGP semantics | Provider‑managed routing |
The architectural mental model therefore persists intact across deployment environments.
9. Architecture Diagram (Textual Representation)
```
        External LAN (192.168.1.0/24)
                    |
                [ eth0 ]
            +----------------+
            |     FRR VM     |
            |    AS 65001    |
            +----------------+
                [ eth1 ]
                    |
       Kubernetes Fabric (10.10.0.0/24)
      ---------------------------------
           |           |           |
       [ Node-1 ]  [ Node-2 ]  [ Node-N ]
       AS 65002    AS 65002    AS 65002
           |           |           |
         Pods        Pods        Pods
```
10. Sequence Diagram: LoadBalancer Traffic Flow
```plantuml
@startuml
actor Client
participant "FRR\nBGP Speaker" as FRR
participant "Kubernetes\nNode" as Node
participant "Pod\nApplication" as Pod

Client -> FRR : TCP SYN to LoadBalancer IP
FRR -> Node : Forward via BGP-selected next hop
Node -> Pod : Service routing (kube-proxy / dataplane)
Pod --> Node : Response payload
Node --> FRR : Return traffic
FRR --> Client : Forward response
@enduml
```
11. Concluding Observations
- BGP mode is selected on the basis of architectural rigor rather than convenience
- FRR serves as a stable routing substrate, not an application gateway
- Nodes remain ephemeral and autoscaling‑compatible
- LoadBalancer semantics closely mirror those of managed cloud environments
- The design is simultaneously operationally sound today and structurally extensible for the future
This architecture prioritizes clarity, determinism, and long‑term evolutionary capacity—attributes essential to robust Kubernetes infrastructure at scale.
PlantUML Diagram
```plantuml
@startuml
skinparam backgroundColor #FFFFFF
skinparam shadowing false
skinparam componentStyle rectangle
skinparam defaultFontName Monospace

title Kubernetes on-prem with MetalLB (BGP) and FRR Fabric Router

'========================
' External Network
'========================
package "External LAN\n192.168.1.0/24\n(Management / Internet)" {
  node "Client\n192.168.1.2" as client
  node "is-kube-01\nControl Plane\neth0: 192.168.1.11" as kube01_ext
  node "is-kube-02\nWorker Node\neth0: 192.168.1.12" as kube02_ext
  node "FRR-VM\neth0: 192.168.1.13\n(No BGP here)" as frr_ext
}

'========================
' Internal Fabric
'========================
package "Kubernetes Internal Fabric\n10.10.0.0/24" {
  node "FRR-VM\nFabric Router\neth1: 10.10.0.10\nAS 65001" as frr_int
  node "is-kube-01\neth1: 10.10.0.11\nkubelet --node-ip\nAS 65002" as kube01_int
  node "is-kube-02\neth1: 10.10.0.12\nkubelet --node-ip\nAS 65002" as kube02_int
  component "MetalLB Speaker\n(on each node)" as metallb
}

'========================
' Kubernetes Objects
'========================
package "Kubernetes Cluster" {
  component "kube-apiserver\n(6443)" as apiserver
  component "kube-proxy\niptables / IPVS" as kubeproxy
  component "Pods\n(Containers)" as pods
}

'========================
' Management / Admin Path
'========================
client --> kube01_ext : SSH / HTTPS / kubectl
client --> kube02_ext : (optional admin)
kube01_ext --> apiserver : control-plane
apiserver --> kube01_ext
apiserver --> kube02_ext

'========================
' Routing Control Plane (BGP)
'========================
kube01_int --> frr_int : BGP (TCP 179)\nAdvertise LB IPs
kube02_int --> frr_int : BGP (TCP 179)\nAdvertise LB IPs
metallb --> kube01_int : Speaker binds\nInternalIP
metallb --> kube02_int

note right of frr_int
  FRR role:
  - BGP fabric router
  - Learns LoadBalancer IPs
  - No NAT
  - No data forwarding
end note

'========================
' Data Plane (Service Traffic)
'========================
frr_int ..> kube01_int : Routing info only\n(NO packets)
frr_int ..> kube02_int : Routing info only\n(NO packets)
kube01_int --> kubeproxy
kube02_int --> kubeproxy
kubeproxy --> pods

note bottom of kubeproxy
  Actual data path:
  Client -> Node holding LB IP
  kube-proxy forwards to Pod
  FRR NOT in packet path
end note

'========================
' Separation of Concerns
'========================
note bottom
  eth0 (192.168.1.x):
  - Internet
  - Admin
  - OS routing / apt
  eth1 (10.10.x.x):
  - Kubernetes fabric
  - BGP
  - MetalLB
end note
@enduml
```
Author's Note
This document was prepared as a general guide and reference note only. The content was drafted with the assistance of an AI system and subsequently reviewed, refined, and curated by the author.

