Darian Vance

Originally published at wp.me

Solved: Network automation seems a lot like stitching APIs together

🚀 Executive Summary

TL;DR: Network automation often feels like brittle API stitching due to multi-vendor complexity, lack of a centralized Source of Truth, and reliance on imperative scripting. Solutions range from adopting smarter, idempotent tools like Ansible to building a centralized abstraction layer with a SoT like NetBox, ultimately evolving towards a unified platform mindset for full infrastructure abstraction.

🎯 Key Takeaways

  • Multi-vendor environments create significant challenges due to disparate APIs, authentication methods, and data models, requiring complex ‘glue code’ to integrate.
  • Transitioning from imperative scripting (step-by-step instructions) to a declarative model (defining desired end state) with tools like Ansible drastically improves automation robustness and safety by handling state and failing loudly.
  • Implementing a centralized Source of Truth (SoT) like NetBox or Nautobot decouples intent from implementation, allowing an automation orchestrator to translate high-level data model updates into multi-vendor API calls, reducing manual ‘API stitching’.

Is network automation just a glorified exercise in gluing disparate APIs together? A senior engineer explores why it feels that way and offers three practical solutions, from smarter scripts to building a true platform.

So, Network Automation is Just Stitching APIs Together? Yeah, About That…

I remember a 3 AM call that nearly broke me. We had a simple deployment: spin up a new web server, prod-web-42, add it to the F5 load balancer pool, and open a firewall port. The Python script our team wrote timed out calling the F5 API. But here’s the kicker: the script failed open. It continued on, called the Palo Alto API to open the port, and reported “SUCCESS” in Jenkins. For the next hour, 10% of our production traffic was being black-holed to a server the load balancer didn’t even know existed. Finding that needle in a haystack, with three different vendor UIs open and a junior engineer frantically trying to trace a “simple” script, was a nightmare. That’s when I realized the Reddit post was right: a lot of what passes for “automation” is just brittle, hopeful glue code holding the internet together.

Why It Feels Like You’re Just a “Glue-Code Monkey”

Let’s be honest, the feeling is valid. The core issue isn’t that you’re using APIs; it’s that you’ve been handed a box of mismatched Lego bricks with no instruction manual. Here’s the root of the problem:

  • Multi-Vendor Chaos: Your core switches are Cisco, your firewalls are Palo Alto, your load balancers are F5, and your cloud is AWS. None of these were designed to talk to each other. Their APIs speak different dialects, have different authentication methods, and model the world in fundamentally different ways. You’re not an engineer; you’re a UN translator.
  • The Missing Source of Truth: Where does the “truth” about your network live? If your answer is “a collection of spreadsheets, Confluence pages, and the running configs of the devices themselves,” you have a problem. Without a single, authoritative Source of Truth (SoT), your automation is always reactive. It has to query ten different APIs just to understand the current state before it can even think about making a change.
  • Imperative vs. Declarative: Most simple scripts are imperative. They are a list of steps: “Do A, then do B, then do C.” This is incredibly fragile. If step B fails, what happens? As my 3 AM war story shows, it can be disastrous. The goal is to move to a declarative model, where you define the desired end state and the automation engine is smart enough to figure out the steps to get there.
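To make the imperative-versus-declarative distinction concrete, here is a minimal Python sketch. It is not tied to any vendor API: `reconcile`, `apply_steps`, and the IP addresses are hypothetical names and values chosen for illustration, and device state is modelled as a plain in-memory set. The point is only that a declarative engine derives its steps from the difference between desired and actual state, so running it a second time is harmless.

```python
def reconcile(desired: set[str], actual: set[str]) -> list[tuple[str, str]]:
    """Return the (action, member) steps needed to turn `actual` into `desired`."""
    to_add = sorted(desired - actual)
    to_remove = sorted(actual - desired)
    return [("add", m) for m in to_add] + [("remove", m) for m in to_remove]


def apply_steps(steps: list[tuple[str, str]], actual: set[str]) -> set[str]:
    """Apply steps to a copy of the state (a stand-in for real API calls)."""
    result = set(actual)
    for action, member in steps:
        if action == "add":
            result.add(member)
        elif action == "remove":
            result.discard(member)
        else:
            raise ValueError(f"unknown action: {action}")
    return result


desired_pool = {"10.0.0.41", "10.0.0.42"}
actual_pool = {"10.0.0.40", "10.0.0.41"}

steps = reconcile(desired_pool, actual_pool)
print(steps)  # [('add', '10.0.0.42'), ('remove', '10.0.0.40')]

new_state = apply_steps(steps, actual_pool)
print(reconcile(desired_pool, new_state))  # [] because nothing is left to do
```

An imperative script would hard-code "add 10.0.0.42" and run it regardless of what the pool already contains. Here the steps are a function of the two states, which is what makes a re-run safe.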

Okay, So How Do We Fix It? Three Levels of Sanity

Feeling stuck is normal. But you don’t have to live in a world of brittle Python scripts forever. Here are three ways to evolve your approach, from a quick fix to a complete paradigm shift.

Level 1: The Tactical Fix – Embrace the Glue, but Make it Smarter

You can’t boil the ocean overnight. The quickest win is to stop writing raw Python scripts for everything and adopt a tool that handles the hard parts for you. My tool of choice here is Ansible. Yes, you’re still calling APIs, but you’re doing it in a structured, declarative, and idempotent way.

Instead of a script that says “add server,” you write a playbook that says “this server must be present in this pool.” If it’s already there, Ansible does nothing. If the API call fails, the task fails loudly and, by default, Ansible stops running further tasks against that host. It’s a smarter, safer glue.

Consider this simple Ansible task:

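# Assumes the f5networks.f5_modules collection is installed.
- name: Add new web server to the production pool
  f5networks.f5_modules.bigip_pool_member:
    provider:
      server: "{{ f5_host }}"
      user: "{{ f5_user }}"
      password: "{{ f5_password }}"
      # Certificate validation is disabled for brevity; enable it in production.
      validate_certs: false
    pool: "prod_web_app_pool"
    host: "{{ new_server_ip }}"
    port: 80
    state: present
  # The module calls the BIG-IP REST API, so it runs on the control node.
  delegate_to: localhost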

This is still “API stitching,” but it’s light-years ahead of a custom script. It handles state, it’s easy to read, and it won’t continue if the F5 is unreachable.
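If you want to see what “handles state” and “fails loudly” mean underneath, here is a rough Python sketch of the same behavior. Everything in it is an assumption made for illustration: `FakeLoadBalancer` is an in-memory stand-in, not the F5 SDK or Ansible’s implementation, and `deploy` is a hypothetical two-step workflow modelled on my 3 AM story.

```python
class FakeLoadBalancer:
    """In-memory stand-in for a load balancer API (hypothetical, not the F5 SDK)."""

    def __init__(self, reachable: bool = True):
        self.reachable = reachable
        self.pools: dict[str, set[str]] = {}

    def members(self, pool: str) -> set[str]:
        if not self.reachable:
            raise ConnectionError("load balancer API timed out")
        return self.pools.setdefault(pool, set())


def ensure_member_present(lb: FakeLoadBalancer, pool: str, member: str) -> bool:
    """Idempotent 'state: present': returns True only if a change was made."""
    members = lb.members(pool)  # raises if the API is unreachable
    if member in members:
        return False
    members.add(member)
    return True


def deploy(lb: FakeLoadBalancer, pool: str, member: str, firewall_log: list[str]) -> None:
    """The firewall step runs only if the pool step succeeded; errors propagate."""
    ensure_member_present(lb, pool, member)
    firewall_log.append(f"allow {member}")


lb = FakeLoadBalancer()
first = ensure_member_present(lb, "prod_web_app_pool", "10.0.0.42:80")
second = ensure_member_present(lb, "prod_web_app_pool", "10.0.0.42:80")
print(first, second)  # True False

broken = FakeLoadBalancer(reachable=False)
log: list[str] = []
try:
    deploy(broken, "prod_web_app_pool", "10.0.0.42:80", log)
except ConnectionError as exc:
    print(f"deployment aborted: {exc}")
print(log)  # [] because the firewall was never touched
```

The second call reports no change, and the unreachable load balancer aborts the workflow before the firewall step runs. That is the opposite of a script that fails open.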

Level 2: The Strategic Fix – Build a Centralized Abstraction Layer

This is where you stop being a glue-coder and start being an architect. The problem is that your engineers have to think about the implementation details of every vendor. The fix is to give them a single, unified place to declare their intent.

This is where a Source of Truth (SoT) like NetBox or Nautobot comes in. The workflow completely changes:

  1. An engineer no longer runs a script. They go into the NetBox UI (or use its API) and assign the IP for prod-web-42 to the “Production Web Servers” group.
  2. NetBox fires a webhook.
  3. An automation orchestrator (like Ansible Tower/AWX or a Jenkins job) catches that webhook.
  4. The orchestrator triggers the necessary playbooks, pulling all the required variables (IPs, pool names, firewall zones) directly from NetBox. It then makes the API calls to the F5, the Palo Alto, and maybe even updates your monitoring in Zabbix.

Pro Tip: This approach is powerful, but be warned. Your Source of Truth becomes a critical piece of infrastructure. If NetBox goes down, you can’t make automated changes. Protect it, back it up, and treat it with the respect it deserves.

The engineer is no longer stitching APIs. They are simply updating a data model. The automation platform handles the messy, multi-vendor translation. You have successfully decoupled intent from implementation.
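Here is a minimal sketch of that translation step, written in Python. Treat every name as an assumption: the payload shape is deliberately simplified and is not NetBox’s actual webhook schema, `GROUP_SETTINGS` and the `dmz` zone are hypothetical, and the returned dictionaries only describe the jobs an orchestrator might launch rather than calling any real vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """What the engineer declared in the Source of Truth."""
    hostname: str
    ip: str
    group: str


# Hypothetical mapping from a SoT group to vendor-specific settings.
GROUP_SETTINGS = {
    "Production Web Servers": {"pool": "prod_web_app_pool", "port": 80, "zone": "dmz"},
}


def intent_from_webhook(payload: dict) -> Intent:
    """Extract the intent from a simplified, assumed webhook payload."""
    data = payload["data"]
    return Intent(hostname=data["hostname"], ip=data["ip"], group=data["group"])


def plan_tasks(intent: Intent) -> list[dict]:
    """Translate one intent into the vendor-specific jobs to run, in order."""
    settings = GROUP_SETTINGS[intent.group]  # KeyError if the group is unknown
    return [
        {"target": "f5", "action": "pool_member_present",
         "pool": settings["pool"], "host": intent.ip, "port": settings["port"]},
        {"target": "paloalto", "action": "allow_port",
         "zone": settings["zone"], "host": intent.ip, "port": settings["port"]},
        {"target": "zabbix", "action": "host_monitored", "host": intent.hostname},
    ]


payload = {
    "event": "updated",
    "data": {"hostname": "prod-web-42", "ip": "10.0.0.42",
             "group": "Production Web Servers"},
}
tasks = plan_tasks(intent_from_webhook(payload))
for task in tasks:
    print(task["target"], task["action"])
```

The engineer only ever supplies the hostname, IP, and group. Pool names, ports, and zones come from data, so adding a new vendor means adding a task to the plan, not teaching every engineer a new API.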

Level 3: The ‘Nuke It From Orbit’ Fix – Adopt a Platform Mindset

This is the endgame. At this level, you stop thinking about “network automation” as a standalone task. You integrate it into a single, unified Internal Developer Platform (IDP). The network becomes just another resource that developers can provision on-demand, just like a database or a compute instance.

Instead of a network engineer touching NetBox, a developer defines their entire application stack in a single YAML file:

apiVersion: mycorp.com/v1alpha1
kind: Application
metadata:
  name: user-profile-service
spec:
  replicas: 3
  image: "docker.io/mycorp/user-profile:v1.2.4"
  database:
    type: postgres
    size: medium
  networking:
    expose: true
    port: 443
    path: "/api/users"

When the developer runs kubectl apply -f app.yaml, a controller (built with something like Crossplane or a custom Kubernetes Operator) takes over. This controller is the ultimate API stitcher. It reads this high-level definition and translates it into a dozen low-level API calls:

  • Talk to vSphere to provision 3 VMs.
  • Talk to NetBox/IPAM to assign IPs.
  • Talk to the F5 to create a VIP and pool.
  • Talk to the Palo Alto to generate a dynamic firewall policy.
  • Talk to Vault to issue a certificate.
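As a rough illustration of that fan-out, the Python sketch below expands a manifest like the one above into an ordered list of low-level steps. It is a toy under stated assumptions: a real controller would more likely be written in Go or expressed as a Crossplane composition, `plan_application` is a hypothetical name, and the steps are plain strings rather than actual calls to vSphere, NetBox, F5, Palo Alto, or Vault.

```python
def plan_application(manifest: dict) -> list[str]:
    """Expand a high-level Application manifest into ordered low-level steps."""
    name = manifest["metadata"]["name"]
    spec = manifest["spec"]
    steps = []
    for i in range(spec["replicas"]):
        vm = f"{name}-{i}"
        steps.append(f"vsphere: provision VM {vm}")
        steps.append(f"ipam: assign IP to {vm}")
    net = spec.get("networking", {})
    if net.get("expose"):
        # Exposure-related steps are planned only when the app asks for them.
        steps.append(f"f5: create VIP and pool for {name} on port {net['port']}")
        steps.append(f"paloalto: allow inbound port {net['port']} to {name}")
        steps.append(f"vault: issue certificate for {name}")
    return steps


manifest = {
    "apiVersion": "mycorp.com/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "user-profile-service"},
    "spec": {
        "replicas": 3,
        "networking": {"expose": True, "port": 443, "path": "/api/users"},
    },
}

steps = plan_application(manifest)
for step in steps:
    print(step)
```

Three replicas plus exposure yields nine steps from a manifest the developer wrote without knowing any of those systems exist.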

This is a massive organizational and technical undertaking. You are effectively building your own private cloud experience. But when you get here, the “API stitching” problem is solved for everyone who consumes the platform: the stitching still exists, but it lives in one controller, behind a single platform API that your entire organization uses.

| Approach | Effort | Brittleness | Who It’s For |
| --- | --- | --- | --- |
| Level 1: Smart Glue | Low | Medium | Small teams, or anyone just starting their automation journey. |
| Level 2: SoT & Abstraction | Medium | Low | Growing teams that need consistency, auditability, and scale. |
| Level 3: Platform Mindset | Very High | Very Low | Mature orgs treating infrastructure as a product for internal customers. |

So yes, sometimes network automation feels like stitching APIs together. And in the beginning, it is. But that’s not the destination. It’s the first step on a journey toward building robust, declarative, and eventually, fully abstracted systems that let you and your team focus on architecture, not glue code.


Darian Vance

👉 Read the original article on TechResolve.blog

