
FirstPassLab
Originally published at firstpasslab.com

Why 82% of Network Automation Projects Fall Short (and What the 18% Do Differently)

Network automation is supposed to save time, reduce errors, and scale operations. So why do so few projects fully succeed?

According to Enterprise Management Associates (EMA) research surveying 354 IT professionals, only 18% of network automation initiatives fully succeed. Another 54% report partial success, and 28% have stalled or failed outright.

This isn't a tooling problem — Ansible, Terraform, and NSO all work fine. It's an architecture and organizational problem. Here's what the data actually says.

The Numbers: What Gets Automated and What Doesn't

The NANOG 95 survey (October 2025) stack-ranked automation adoption by task:

| Task | Automation Rate |
|---|---|
| Backups | 88% |
| Device Deployment | 78% |
| Firmware Upgrades | 67% |
| Service Provisioning | 59% |
| Non-Provisioning Config | 54% |
| Firewall Rules | 53% |
| Troubleshooting | 44% |
| Capacity Planning | 39% |
| eBGP & Interconnection | 37% |
| DDoS Response | 31% |

The pattern is clear: simple, repetitive, low-risk tasks are highly automated. Complex, judgment-heavy tasks remain manual. Backups are essentially solved. eBGP peering — which requires understanding business relationships, route policy, and traffic engineering — is still mostly done by hand.
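
To make the split concrete, here is a small Python sketch that sorts the survey figures above into tasks at or above a cut-off and tasks below it. The 50% cut-off is an arbitrary illustration threshold, not something the survey defines.

```python
# Automation rates copied from the NANOG 95 survey table above (percent).
ADOPTION = {
    "Backups": 88,
    "Device Deployment": 78,
    "Firmware Upgrades": 67,
    "Service Provisioning": 59,
    "Non-Provisioning Config": 54,
    "Firewall Rules": 53,
    "Troubleshooting": 44,
    "Capacity Planning": 39,
    "eBGP & Interconnection": 37,
    "DDoS Response": 31,
}

# Assumption: 50 is an illustrative threshold, not a survey-defined boundary.
THRESHOLD = 50


def split_by_adoption(rates, threshold=THRESHOLD):
    """Return (at_or_above, below) task lists, each sorted highest rate first."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    at_or_above = [task for task in ranked if rates[task] >= threshold]
    below = [task for task in ranked if rates[task] < threshold]
    return at_or_above, below


automated, manual = split_by_adoption(ADOPTION)
print("At or above threshold:", automated)
print("Below threshold:", manual)
```

With this threshold, the four tasks that land below the line are exactly the judgment-heavy ones: troubleshooting, capacity planning, eBGP, and DDoS response.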

The Top 5 Reasons Automation Projects Fail

The Itential/EMA research identifies five challenges, all clustered tightly:

1. Integration Difficulties (25%)

The #1 killer. Network automation doesn't exist in isolation — it needs to integrate with:

  • ITSM (ServiceNow, Jira) for change management
  • Monitoring (Prometheus, Datadog, ThousandEyes) for closed-loop remediation
  • Source of truth (NetBox, Nautobot) for inventory and intended state
  • CI/CD pipelines (GitLab, Jenkins) for testing and deployment
  • AAA/RBAC for authorization and audit

Most teams pick a tool and start writing playbooks without designing the integration architecture first. The tool works in isolation but breaks when it needs to talk to everything else.
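
One way to design the integration points before writing playbooks is to describe each external system as a narrow interface and code the automation against those interfaces. The sketch below is only an illustration: `SourceOfTruth`, `ChangeTracker`, `Orchestrator`, and the in-memory stand-ins are hypothetical names, not the APIs of NetBox, ServiceNow, or any other product.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SourceOfTruth(Protocol):
    """Hypothetical interface: anything that can return intended config."""
    def intended_config(self, device: str) -> str: ...


class ChangeTracker(Protocol):
    """Hypothetical interface: anything that can confirm a change is approved."""
    def is_approved(self, change_id: str) -> bool: ...


@dataclass
class InMemorySoT:
    # Stand-in for a real source of truth; maps device name to config text.
    configs: dict

    def intended_config(self, device: str) -> str:
        return self.configs[device]


@dataclass
class InMemoryTracker:
    # Stand-in for a real ITSM system; holds the set of approved change IDs.
    approved: set

    def is_approved(self, change_id: str) -> bool:
        return change_id in self.approved


@dataclass
class Orchestrator:
    sot: SourceOfTruth
    tracker: ChangeTracker
    audit_log: list = field(default_factory=list)

    def plan(self, change_id: str, device: str) -> str:
        """Return the intended config, but only for an approved change."""
        if not self.tracker.is_approved(change_id):
            self.audit_log.append((change_id, device, "rejected"))
            raise PermissionError(f"{change_id} is not approved")
        self.audit_log.append((change_id, device, "planned"))
        return self.sot.intended_config(device)


demo = Orchestrator(
    sot=InMemorySoT({"edge-1": "hostname edge-1"}),
    tracker=InMemoryTracker({"CHG-1001"}),
)
print(demo.plan("CHG-1001", "edge-1"))
```

Because the orchestrator only depends on the two interfaces, a real ITSM or inventory client can replace the in-memory stand-ins later without touching the workflow logic.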

2. Network Complexity & Lack of Standards (24.9%)

Typical environments combine multiple vendors, inconsistent naming conventions, 15 years of organic growth, and devices running six different firmware versions. You can't automate what you can't normalize.
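
Normalization usually starts with small, boring functions. The following is a minimal sketch, assuming a hand-maintained vendor alias table and a lowercase, hyphen-separated hostname convention. Both are hypothetical choices for illustration, and a real inventory would need its own rules.

```python
import re

# Hypothetical alias table; extend it with whatever spellings your inventory contains.
VENDOR_ALIASES = {
    "cisco systems": "cisco",
    "cisco": "cisco",
    "juniper networks": "juniper",
    "juniper": "juniper",
    "arista networks": "arista",
    "arista": "arista",
}


def normalize_vendor(raw):
    """Map a free-text vendor string to a canonical short name, or 'unknown'."""
    key = " ".join(raw.lower().replace(",", " ").replace(".", " ").split())
    key = re.sub(r"\s+(inc|llc|ltd)$", "", key)  # drop a trailing company suffix
    return VENDOR_ALIASES.get(key, "unknown")


def normalize_hostname(raw):
    """Lowercase, drop the domain part, and collapse separators to hyphens."""
    host = raw.strip().lower().split(".")[0]
    return re.sub(r"[^a-z0-9]+", "-", host).strip("-")


print(normalize_vendor("Cisco Systems, Inc."))
print(normalize_hostname("NYC_Core-SW01.example.net"))
```

Returning "unknown" instead of guessing keeps bad records visible so they can be fixed at the source.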

3. Legacy Infrastructure (24.3%)

Devices that only support CLI — no NETCONF, no RESTCONF, no API. Switches running IOS 12.x that can't be upgraded because they support a critical application. Firewalls with undocumented rules nobody wants to touch.
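
For CLI-only devices, automation often falls back to parsing command output as text. Here is a hedged sketch that pulls the major and minor version out of a `show version` banner line with a regular expression. The sample line is illustrative, the function names are hypothetical, and treating unparseable output as legacy is a conservative assumption rather than an established rule.

```python
import re

# Illustrative banner line in the style of classic IOS output; not captured from a real device.
SAMPLE_OUTPUT = (
    "Cisco IOS Software, C3750 Software (C3750-IPBASEK9-M), "
    "Version 12.2(55)SE, RELEASE SOFTWARE (fc2)\n"
)

VERSION_RE = re.compile(r"Version\s+(?P<major>\d+)\.(?P<minor>\d+)")


def parse_ios_version(show_version_text):
    """Return (major, minor) as ints, or None if no version string is found."""
    match = VERSION_RE.search(show_version_text)
    if match is None:
        return None
    return int(match.group("major")), int(match.group("minor"))


def is_legacy_train(show_version_text, newest_legacy_major=12):
    """Flag devices on an old software train (assumed cut-off: major version 12)."""
    version = parse_ios_version(show_version_text)
    if version is None:
        return True  # assumption: unknown output is handled as legacy, to be safe
    return version[0] <= newest_legacy_major


print(parse_ios_version(SAMPLE_OUTPUT))
print(is_legacy_train(SAMPLE_OUTPUT))
```

A check like this lets a workflow route legacy devices to a CLI-based path while newer platforms use NETCONF or RESTCONF.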

4. Tool Complexity (23.7%)

Ansible is "simple" — until you need error recovery, conditional logic across multi-vendor environments, and rollback procedures. Terraform works for cloud infra but gets hairy with network resources. NSO is powerful but has a steep learning curve.
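
The rollback requirement is where much of that complexity comes from. As a tool-agnostic illustration (this is plain Python, not Ansible's or NSO's own mechanism), the sketch below applies a list of steps and, if one fails, undoes the completed steps in reverse order. The `apply_with_rollback` helper and the step tuples are hypothetical.

```python
def apply_with_rollback(steps):
    """Run steps in order; on failure, undo completed steps in reverse and re-raise.

    steps is a list of (name, apply_fn, undo_fn) tuples.
    Returns the names of the applied steps when everything succeeds.
    """
    done = []
    try:
        for name, apply_fn, undo_fn in steps:
            apply_fn()
            done.append((name, undo_fn))
    except Exception:
        for _name, undo_fn in reversed(done):
            undo_fn()
        raise
    return [name for name, _undo in done]


# Usage: a list stands in for device state; the second step fails on purpose.
device_state = []


def failing_push():
    raise RuntimeError("push failed")


demo_steps = [
    ("add vlan", lambda: device_state.append("vlan 100"), lambda: device_state.remove("vlan 100")),
    ("add acl", failing_push, lambda: None),
]

try:
    apply_with_rollback(demo_steps)
except RuntimeError as err:
    print("rolled back after:", err)

print(device_state)
```

Real rollbacks are harder than this, since an undo can itself fail and devices can end up half-configured, which is part of why the tooling feels complex in production.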

5. Data Quality (22.3%)

Automation is only as good as its input data. If your CMDB says a switch is a Nexus 9300 but it's actually a 9500, your playbook generates the wrong config. Source of truth tools (NetBox, Nautobot) solve this — but populating them accurately requires upfront investment most orgs skip.
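
A cheap guard against this is to compare recorded data with what the network reports before generating any config. The sketch below assumes both sides are already available as dictionaries keyed by hostname. The field names and sample values are hypothetical, and how you collect the discovered data is out of scope here.

```python
def find_mismatches(cmdb, discovered, fields=("model", "os_version")):
    """Compare recorded vs. discovered device facts.

    Returns a list of (host, issue, recorded_value, discovered_value) tuples.
    """
    problems = []
    for host in sorted(set(cmdb) | set(discovered)):
        if host not in discovered:
            problems.append((host, "missing_on_network", None, None))
        elif host not in cmdb:
            problems.append((host, "missing_in_cmdb", None, None))
        else:
            for name in fields:
                recorded = cmdb[host].get(name)
                actual = discovered[host].get(name)
                if recorded != actual:
                    problems.append((host, name, recorded, actual))
    return problems


# Hypothetical records mirroring the 9300 vs. 9500 example above.
cmdb_records = {"sw-core-01": {"model": "Nexus 9300", "os_version": "10.2"}}
live_facts = {"sw-core-01": {"model": "Nexus 9500", "os_version": "10.2"}}

for problem in find_mismatches(cmdb_records, live_facts):
    print(problem)
```

Running a check like this as a gate in the pipeline turns a wrong-config incident into a failed pre-check.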

What Separates the 18% That Succeed?

Funding Is the Single Biggest Predictor

The correlation between funding and success is stark:

| Funding Level | Success Rate |
|---|---|
| Fully funded | 80% |
| Adequately funded | ~55% |
| Underfunded | 29% |

"Fully funded" means:

  • Dedicated headcount — at least 1 FTE automation engineer per 500-1000 managed devices
  • Training budget — Python, Ansible, NETCONF/RESTCONF for the team
  • Tool licensing — NSO, Terraform Cloud, CI/CD infra
  • Executive sponsorship — someone who protects the initiative from deprioritization

The underfunded bucket (29% success) typically has "one engineer doing automation on the side of their regular job." That's not an automation initiative; that's a hobby.
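
The headcount guideline above (1 FTE per 500-1000 managed devices) turns into a quick staffing estimate. This is a back-of-the-envelope sketch: rounding up to whole engineers is an assumption, and `fte_range` is a hypothetical helper name.

```python
import math


def fte_range(managed_devices, devices_per_fte_low=500, devices_per_fte_high=1000):
    """Return (min_fte, max_fte) for a device count, rounding up to whole people."""
    if managed_devices <= 0:
        raise ValueError("managed_devices must be positive")
    min_fte = math.ceil(managed_devices / devices_per_fte_high)
    max_fte = math.ceil(managed_devices / devices_per_fte_low)
    return min_fte, max_fte


# Example: a hypothetical estate of 3,200 managed devices.
print(fte_range(3200))
```

For 3,200 devices this gives a range of 4 to 7 dedicated engineers, which is a long way from one person working on automation part time.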

Architecture Before Scripting

Successful projects start with architectural decisions, not playbook writing:

  1. Define the source of truth — where does intended network state live?
  2. Design integration points — how do ITSM, monitoring, and automation communicate?
  3. Establish the workflow — change request → approval → testing → deployment → validation → rollback
  4. Choose the abstraction layer — raw API calls vs. Ansible vs. NSO service models vs. Terraform
  5. Build the testing framework — pyATS, Batfish, or custom validation scripts

Only then write the first playbook.
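
The workflow in step 3 can be pinned down as an explicit state machine before any tooling is chosen. The sketch below is one possible encoding. The state names and allowed transitions are assumptions made for illustration, and a real process may need more branches (for example, re-testing after a rollback).

```python
# Assumed transitions for: change request, approval, testing, deployment, validation, rollback.
ALLOWED = {
    "requested": {"approved", "rejected"},
    "approved": {"tested", "rejected"},
    "tested": {"deployed"},
    "deployed": {"validated", "rolled_back"},
    "validated": set(),
    "rolled_back": set(),
    "rejected": set(),
}


class ChangeRequest:
    """Tracks one change through the workflow and refuses illegal jumps."""

    def __init__(self, change_id):
        self.change_id = change_id
        self.state = "requested"
        self.history = ["requested"]

    def advance(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.change_id}: {self.state} -> {new_state} is not allowed"
            )
        self.state = new_state
        self.history.append(new_state)
        return self


change = ChangeRequest("CHG-2001")
change.advance("approved").advance("tested").advance("deployed").advance("validated")
print(change.history)
```

Encoding the transitions as data makes it impossible to deploy a change that skipped approval or testing, regardless of which tool performs the deployment.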

Start with High-Value, Low-Risk Tasks

The NANOG adoption ranking suggests a natural rollout order, from lowest operational risk to highest:

  • Phase 1 (Months 1-3): Backups, compliance checks, inventory — zero operational risk
  • Phase 2 (Months 3-6): Firmware upgrades, standard deployments — low risk with rollback
  • Phase 3 (Months 6-12): Service provisioning, firewall rules — moderate risk, requires testing
  • Phase 4 (Year 2+): Troubleshooting, eBGP changes, DDoS response — high risk, requires confidence

Jumping straight to Phase 3 or 4 without the foundation is the #1 pattern in stalled projects.
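
The phases above can be enforced with a simple gate that only offers tasks from the earliest unfinished phase. This is a minimal sketch: the phase contents come from the list above, while the `next_tasks` helper and the all-tasks-done rule for moving on are assumptions.

```python
# Phase contents taken from the rollout list above.
PHASES = [
    ("Phase 1", ["backups", "compliance checks", "inventory"]),
    ("Phase 2", ["firmware upgrades", "standard deployments"]),
    ("Phase 3", ["service provisioning", "firewall rules"]),
    ("Phase 4", ["troubleshooting", "eBGP changes", "DDoS response"]),
]


def next_tasks(completed):
    """Return (phase_name, remaining_tasks) for the earliest unfinished phase.

    Assumption: a phase is finished only when all of its tasks are completed.
    Returns (None, []) when every phase is done.
    """
    done = set(completed)
    for phase_name, tasks in PHASES:
        remaining = [task for task in tasks if task not in done]
        if remaining:
            return phase_name, remaining
    return None, []


print(next_tasks(["backups"]))
print(next_tasks(["backups", "compliance checks", "inventory"]))
```

A gate like this makes skipping ahead an explicit decision instead of something that happens by accident.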

The Public Sector Gap

One often-overlooked data point: 95% of public sector network changes are still manual. Government agencies, military networks, and regulated industries lag significantly behind commercial enterprises.

This is both a problem and an opportunity: with so much still done by hand, demand for automation architects in the public sector is likely to grow sharply.

TL;DR

The data is clear:

  • Only 18% of network automation projects fully succeed; the other 82% partially succeed (54%) or stall or fail (28%), mostly due to organizational and architectural problems, not tooling
  • Funding is the #1 predictor — fully funded projects succeed 80% of the time vs 29% for underfunded
  • Integration architecture matters more than scripting skills — design the system first, write playbooks second
  • Start low-risk, build confidence — backups → deployments → provisioning → complex operations

The tools work. The question is whether you have the right architecture and organizational support to make them succeed.


Originally published at FirstPassLab. For more deep dives on network engineering and automation, check out firstpasslab.com.


Disclosure: This article was adapted from the original with AI assistance. All technical content and data citations have been verified by the editorial team.
