DEV Community

FirstPassLab

Originally published at firstpasslab.com

Building a Network Digital Twin with Batfish, ContainerLab, and Suzieq — A Practical Guide

A network digital twin is a virtual replica of your production network that lets you test config changes, simulate failures, and validate routing behavior — all before anything touches a live device.

In 2026, you don't need a six-figure vendor platform to get started. Batfish, ContainerLab, and Suzieq are free, open-source tools that cover config analysis, topology emulation, and observability. Here's how to build one from scratch.

What Is a Network Digital Twin?

A digital twin mirrors your actual production network: topology, configurations, routing tables, and optionally live state. Unlike a generic lab, it models your specific environment, so when you propose a BGP route-policy change, the twin tells you exactly which prefixes will be affected.

The critical insight: the twin is the missing layer between your automation pipeline and production. Every proposed change gets validated before deployment.

Three Maturity Levels

Level 1: Static Topology Visualization

What: Always-current map of topology, inventory, and config state.
Tools: NetBox + config backups (Oxidized/RANCID/git) + visualization.
Effort: 1-2 weeks.

Most teams can't accurately answer "show me every device in this VLAN." A static twin solves this with an automated, queryable inventory.
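As a sketch of what "queryable" means here, the function below answers the VLAN question directly from IOS-style config backups. It is a minimal illustration under stated assumptions: the backups are loaded into a hypothetical {hostname: config_text} dict, only switchport access vlan and explicit switchport trunk allowed vlan lists are understood, and trunks that carry every VLAN by default are not counted. A NetBox-backed inventory would answer the same question from its data model instead of by parsing text.

```python
import re
from typing import Dict, List, Set

INTF_RE = re.compile(r"^interface (\S+)")
ACCESS_RE = re.compile(r"^\s+switchport access vlan (\d+)\s*$")
# Only explicit VLAN lists; "all"/"none" and default (allow-all) trunks are not handled
TRUNK_RE = re.compile(r"^\s+switchport trunk allowed vlan (?:add )?([\d,-]+)\s*$")


def parse_vlan_list(spec: str) -> Set[int]:
    """Expand an IOS VLAN list such as '5,8-12' into {5, 8, 9, 10, 11, 12}."""
    vlans: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            vlans.update(range(int(lo), int(hi) + 1))
        elif part:
            vlans.add(int(part))
    return vlans


def devices_in_vlan(configs: Dict[str, str], vlan: int) -> Dict[str, List[str]]:
    """Return {hostname: [interfaces]} for every interface carrying `vlan`."""
    result: Dict[str, List[str]] = {}
    for host, text in configs.items():
        current = None  # interface stanza we are currently inside, if any
        for line in text.splitlines():
            header = INTF_RE.match(line)
            if header:
                current = header.group(1)
                continue
            if not line.startswith((" ", "\t")):
                current = None  # any top-level line ends the interface stanza
                continue
            if current is None:
                continue
            access = ACCESS_RE.match(line)
            trunk = TRUNK_RE.match(line)
            if (access and int(access.group(1)) == vlan) or (
                trunk and vlan in parse_vlan_list(trunk.group(1))
            ):
                result.setdefault(host, []).append(current)
    return result
```

For example, devices_in_vlan(configs, 10) returns each hostname together with the interfaces that carry VLAN 10.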

Level 2: Config-Aware Simulation

What: Analyzes production configs to validate routing, ACLs, and reachability — no traffic required.
Primary tool: Batfish — ingests device configs (Cisco IOS/XE/XR, Junos, Arista EOS), builds a vendor-independent model, and answers structured queries.

What you can validate:

| Query Type | Example | Why It Matters |
|---|---|---|
| Routing analysis | "All BGP routes from AS 65001 after this policy change?" | Catch prefix leaks before they happen |
| ACL/firewall analysis | "Can 10.1.1.5 reach 192.168.1.100:443?" | Validate security without test traffic |
| Differential analysis | "What changes if I apply this config?" | Pre-change impact assessment |
| Compliance checks | "Do all interfaces have descriptions?" | Automated audit readiness |
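The ACL question in the table boils down to first-match evaluation with an implicit deny. The sketch below applies that rule to one packet against one hypothetical, pre-parsed ACL (the AclLine structure is illustrative, not a Batfish type). Batfish does far more: it parses the real vendor syntax and reasons over every filter on the path and over whole header spaces, exposed through questions such as testFilters and searchFilters.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AclLine:
    action: str                     # "permit" or "deny"
    protocol: str                   # "tcp", "udp", or "ip" (matches any protocol)
    src: str                        # source prefix, e.g. "10.1.1.0/24"
    dst: str                        # destination prefix
    dst_port: Optional[int] = None  # None matches any destination port


def evaluate(acl: List[AclLine], protocol: str, src_ip: str, dst_ip: str, dst_port: int) -> str:
    """Return the action of the first matching line, or 'deny' (implicit deny)."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for line in acl:
        if line.protocol not in ("ip", protocol):
            continue
        if src not in ipaddress.ip_network(line.src):
            continue
        if dst not in ipaddress.ip_network(line.dst):
            continue
        if line.dst_port is not None and line.dst_port != dst_port:
            continue
        return line.action  # first match wins; later lines are never consulted
    return "deny"


acl = [
    AclLine("deny", "tcp", "10.1.1.0/24", "192.168.1.100/32", 22),
    AclLine("permit", "tcp", "10.1.1.0/24", "192.168.1.0/24", 443),
]
print(evaluate(acl, "tcp", "10.1.1.5", "192.168.1.100", 443))  # permit
```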

Complementary: ContainerLab for live topology emulation. Define your topology in YAML:

name: dc-fabric
topology:
  nodes:
    spine1:
      kind: ceos
      image: ceos:4.32
    spine2:
      kind: ceos
      image: ceos:4.32
    leaf1:
      kind: ceos
      image: ceos:4.32
    leaf2:
      kind: ceos
      image: ceos:4.32
  links:
    - endpoints: ["spine1:eth1", "leaf1:eth1"]
    - endpoints: ["spine1:eth2", "leaf2:eth1"]
    - endpoints: ["spine2:eth1", "leaf1:eth2"]
    - endpoints: ["spine2:eth2", "leaf2:eth2"]

ContainerLab supports Nokia SR Linux, Arista cEOS, Cisco XRd, Juniper cRPD, and other containerized network operating systems. You can spin up a 20-node DC fabric on a 64GB server in under 5 minutes.
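Typing out a 20-node fabric by hand is error-prone, so generating the topology file is a common approach. A minimal sketch, assuming the same cEOS image and the same spine-to-leaf interface numbering as the dc-fabric example (spineS:ethL connects to leafL:ethS); clab_topology is a hypothetical helper, not part of ContainerLab.

```python
def clab_topology(name: str, spines: int, leaves: int, image: str = "ceos:4.32") -> str:
    """Render a ContainerLab spine-leaf topology file as YAML text."""
    lines = [f"name: {name}", "topology:", "  nodes:"]
    for role, count in (("spine", spines), ("leaf", leaves)):
        for i in range(1, count + 1):
            lines += [f"    {role}{i}:", "      kind: ceos", f"      image: {image}"]
    lines.append("  links:")
    # Every spine connects to every leaf: spineS:ethL <-> leafL:ethS,
    # the same numbering used in the dc-fabric example.
    for s in range(1, spines + 1):
        for leaf in range(1, leaves + 1):
            lines.append(f'    - endpoints: ["spine{s}:eth{leaf}", "leaf{leaf}:eth{s}"]')
    return "\n".join(lines) + "\n"
```

Writing clab_topology("dc-fabric", 4, 16) to a file gives a 4-spine, 16-leaf (20-node) fabric with 64 links, which you can then pass to containerlab deploy -t.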

Effort: 2-4 weeks for Batfish; +1-2 weeks for ContainerLab.

Level 3: Live Telemetry-Fed AIOps Twin

What: Real-time replica with live routing tables, interface counters, and flow data. Enables anomaly detection, predictive capacity planning, and automated root cause analysis.

Open-source: Suzieq — collects and normalizes operational state from multi-vendor devices.
Commercial: Forward Networks, IP Fabric, Cisco Nexus Dashboard, Selector AI.

What Level 3 enables:

  • Anomaly detection — ML models catch BGP flapping before full failure
  • Predictive capacity planning — extrapolate growth trends vs. guessing
  • Automated RCA — correlate events across layers in minutes, not hours
  • Historical replay — rewind network state to diagnose intermittent issues
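Flap detection is easy to picture with a plain rule instead of a model. This is a rule-based stand-in for the ML approach described above, assuming periodic (timestamp, peer, state) samples such as a poller might collect; the state strings and the default of 3 changes within 5 minutes are illustrative, whereas an ML system would learn per-peer baselines.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def flapping_peers(
    samples: Iterable[Tuple[float, str, str]],
    window_s: float = 300.0,
    threshold: int = 3,
) -> List[str]:
    """Flag peers with >= threshold state changes inside any window_s-second window.

    samples are (timestamp, peer, state) tuples, e.g. from periodic polling.
    """
    last_state: Dict[str, str] = {}
    changes: Dict[str, Deque[float]] = defaultdict(deque)
    flagged = set()
    for ts, peer, state in sorted(samples):
        previous = last_state.get(peer)
        last_state[peer] = state
        if previous is None or previous == state:
            continue  # first sighting or unchanged state: not a transition
        window = changes[peer]
        window.append(ts)
        while ts - window[0] > window_s:
            window.popleft()  # drop transitions older than the window
        if len(window) >= threshold:
            flagged.add(peer)
    return sorted(flagged)
```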

Effort: 1-3 months (open-source) or 2-6 weeks (commercial).

Step-by-Step: Building Your First Twin

Step 1: Config Backups

Automate config backups from every L3 device, at least daily. Store in Git for version history:

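# Oxidized config example (~/.config/oxidized/config)
source:
  default: csv
  csv:
    file: /etc/oxidized/router.db   # one "name:model" line per device, e.g. core-rtr-1:ios
    delimiter: ":"
    map:
      name: 0
      model: 1
# Commit every fetched config to a Git repo for version history
output:
  default: git
  git:
    user: Oxidized
    email: oxidized@example.com
    repo: /var/lib/oxidized/configs.git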
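Because the goal is coverage of every L3 device, it helps to check router.db against your inventory. A minimal sketch, assuming the router.db layout from the config above (name in field 0, model in field 1, ":" delimiter) and a hypothetical list of expected hostnames, for example exported from NetBox. It skips blank lines and # comments and reports lines it cannot parse.

```python
from typing import Dict, Iterable, List, Tuple


def parse_router_db(text: str, delimiter: str = ":") -> Tuple[Dict[str, str], List[int]]:
    """Parse router.db lines (name = field 0, model = field 1).

    Returns ({name: model}, [line numbers that could not be parsed]).
    """
    devices: Dict[str, str] = {}
    bad: List[int] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(delimiter)
        if len(fields) < 2 or not fields[0] or not fields[1]:
            bad.append(lineno)
            continue
        devices[fields[0]] = fields[1]
    return devices, bad


def missing_from_backups(expected: Iterable[str], router_db: str) -> List[str]:
    """Inventory devices (e.g. every L3 device) that Oxidized would never fetch."""
    devices, _ = parse_router_db(router_db)
    return sorted(set(expected) - set(devices))
```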

Step 2: Deploy Batfish

docker pull batfish/batfish
docker run -d -p 9997:9997 -p 9996:9996 batfish/batfish
pip install pybatfish

Run your first queries:

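from pybatfish.client.session import Session
from pybatfish.datamodel.flow import HeaderConstraints

bf = Session(host="localhost")
bf.set_network("production")
# The snapshot directory must contain a configs/ subfolder holding the device configs
bf.init_snapshot("/path/to/snapshot", name="current", overwrite=True)

# Find all BGP sessions and their status
bgp_sessions = bf.q.bgpSessionStatus().answer().frame()
print(bgp_sessions)

# Check reachability; startLocation must name a node (or modeled host) in the snapshot
reachability = bf.q.traceroute(
    startLocation="web-server",
    headers=HeaderConstraints(dstIps="10.0.1.100", applications=["mysql"])
).answer().frame()
print(reachability)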
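To turn the BGP query into a pass/fail gate, only the rows of the returned DataFrame are needed. A minimal sketch, assuming the Node, Remote_IP and Established_Status columns of bgpSessionStatus (check them against your pybatfish version); down_sessions is a hypothetical helper.

```python
from typing import Any, Dict, List


def down_sessions(rows: List[Dict[str, Any]]) -> List[str]:
    """Describe every BGP session whose Established_Status is not ESTABLISHED."""
    problems = []
    for row in rows:
        status = row.get("Established_Status")
        if status != "ESTABLISHED":
            problems.append(f"{row['Node']} -> {row['Remote_IP']} ({status})")
    return sorted(problems)
```

In the script above you would call down_sessions(bgp_sessions.to_dict("records")) and exit non-zero if the returned list is not empty.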

Step 3: ContainerLab for Live Testing

bash -c "$(curl -sL https://get.containerlab.dev)"
containerlab deploy -t dc-fabric.yaml

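Load your production configs onto the emulated nodes (for example through each node's startup-config parameter in the topology file) to get real control-plane behavior: OSPF adjacencies form, BGP sessions establish, and failover scenarios become testable.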
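Before failing links in the lab, a quick topology-level check tells you which failures should be survivable at all. A minimal sketch, assuming links in the same "node:interface" endpoint form as the ContainerLab file; reachable and survives_failure are hypothetical helpers. It only proves that a path exists, while the emulated lab shows whether OSPF and BGP actually reconverge onto it.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str]  # ("spine1:eth1", "leaf1:eth1"), as in the topology file


def reachable(links: Iterable[Link], start: str) -> Set[str]:
    """Nodes reachable from start over the given links (BFS; interfaces ignored)."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        node_a, node_b = a.split(":")[0], b.split(":")[0]
        adj[node_a].add(node_b)
        adj[node_b].add(node_a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return seen


def survives_failure(links: List[Link], failed: Link, leaves: List[str]) -> bool:
    """True if every leaf still reaches every other leaf once `failed` is down."""
    remaining = [link for link in links if link != failed]
    return all(set(leaves) <= reachable(remaining, leaf) for leaf in leaves)
```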

Step 4: Suzieq for Operational State

pip install suzieq
sq-poller -I /path/to/inventory.yaml

Query across vendors:

suzieq-cli
> ospf show
> path show namespace=<namespace> src=10.1.1.1 dest=10.2.2.2

Step 5: Wire Into Your Change Workflow

Highest-ROI integration: pre-change validation in CI/CD.

  1. Engineer proposes config change via Git PR
  2. CI pipeline loads proposed config into Batfish
  3. Batfish runs differential analysis + compliance checks
  4. Results posted as PR comments
  5. Reviewer sees impact analysis before approving
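Steps 2 through 4 can be sketched as a route-level diff that also produces the PR comment. A minimal sketch, assuming each snapshot's routes are reduced to (node, prefix, next_hop) tuples, for example from the Node, Network and Next_Hop_IP columns of bf.q.routes() answered on the current and candidate snapshots (column names may vary by version; Batfish can also compute a differential answer itself via answer(snapshot=..., reference_snapshot=...)). check_change and the protected-prefix set are hypothetical, and posting the comment is left to your CI system.

```python
from typing import Iterable, Set, Tuple

Route = Tuple[str, str, str]  # (node, prefix, next_hop)


def check_change(
    before: Iterable[Route], after: Iterable[Route], protected: Set[str]
) -> Tuple[str, bool]:
    """Build a PR comment from two route sets; ok is False if a protected prefix is lost."""
    b, a = set(before), set(after)
    lines = ["### Batfish differential analysis"]
    for label, routes in (("removed", b - a), ("added", a - b)):
        for node, prefix, next_hop in sorted(routes):
            lines.append(f"- {label}: {node} {prefix} via {next_hop}")
    had = {(node, prefix) for node, prefix, _ in b}
    has = {(node, prefix) for node, prefix, _ in a}
    # "Lost" means a node routed the prefix before the change and has no route after it
    lost = sorted(f"{node} {prefix}" for node, prefix in had - has if prefix in protected)
    if lost:
        lines.append("Blocking: protected prefixes lost: " + ", ".join(lost))
    return "\n".join(lines), not lost
```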

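Organizations embedding Batfish in CI/CD significantly reduce change-induced outages, because prefix leaks, ACL mistakes, and broken BGP sessions are caught at review time instead of in production.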

Open-Source vs. Commercial

| Criteria | Open Source | Commercial |
|---|---|---|
| Cost | Free (server resources only) | $50K-$500K+/year |
| Setup time | 2-6 weeks | 2-4 weeks |
| Config analysis | Deep (Batfish) | Deep (Forward) |
| Live state | Good (Suzieq) | Excellent |
| AI/NLP queries | Manual/scripted | Built-in |
| Scale | Hundreds of devices | Thousands |
| CI/CD integration | Native (Python) | API-based |

Recommendation: Start open-source. Batfish + ContainerLab covers 80% of what a twin needs. Evaluate commercial when you need enterprise scale or executive dashboards.

How Digital Twins Enable AIOps

Without a twin, AIOps tools process disconnected telemetry — syslog, SNMP traps, NetFlow — without a behavioral model. With a twin, every alert is contextualized:

"Interface Gi0/0/1 on router-core-1 went down" becomes "the primary path between Site A and Site B is down, traffic is failing over to backup MPLS, latency to cloud increases by 15ms."
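The contextualization step amounts to a lookup from the failed interface to the paths that traverse it. A minimal sketch, assuming a hypothetical service model that lists each service's primary and backup paths as (device, interface) hops; a real twin would derive those paths from its routing model rather than from a hand-written dict, and latency impact is not modeled here.

```python
from typing import Dict, List, Tuple

Hop = Tuple[str, str]  # (device, interface)


def contextualize(alert: Hop, services: Dict[str, Dict[str, List[Hop]]]) -> List[str]:
    """Turn an interface-down alert into service-level impact statements.

    services maps a service name to its paths, e.g. {"primary": [...], "backup": [...]}.
    """
    messages = []
    for service, paths in sorted(services.items()):
        down = {role for role, hops in paths.items() if alert in hops}
        if "primary" not in down:
            continue  # the alert does not touch this service's primary path
        if "backup" in paths and "backup" not in down:
            messages.append(f"{service}: primary path down, failing over to backup")
        else:
            messages.append(f"{service}: primary path down, no healthy backup path")
    return messages
```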

Because every change is pre-tested, teams with a validated twin can push changes daily instead of weekly. Organizations with mature network automation resolve incidents 60-80% faster.

FAQ

What is a network digital twin?
A virtual replica of your production network that lets you simulate changes and predict failures before they hit production.

What open-source tools do I need?
Batfish (config validation), ContainerLab (topology emulation), Suzieq (state collection). Together they cover 80% of twin capabilities.

How much does it cost?
Open-source: free beyond server resources (32-64GB RAM server). Commercial: $50K+/year.

I already use EVE-NG — do I need a twin?
EVE-NG is great for learning. A digital twin goes further — it mirrors your actual production and integrates into CI/CD for automated change validation.




AI Disclosure: This article was adapted from the original with AI assistance. Technical content has been reviewed for accuracy.
