FirstPassLab

Posted on Mar 18 • Originally published at firstpasslab.com

Why 82% of Network Automation Projects Fail (and What the 18% Do Differently)

Network automation is supposed to save time, reduce errors, and scale operations. So why do most projects stall, fail, or deliver only part of what was promised?

According to Enterprise Management Associates (EMA) research surveying 354 IT professionals, only 18% of network automation initiatives fully succeed. Another 54% report partial success, and 28% have stalled or failed outright.

This isn't a tooling problem — Ansible, Terraform, and NSO all work fine. It's an architecture and organizational problem. Here's what the data actually says.

The Numbers: What Gets Automated and What Doesn't

The NANOG 95 survey (October 2025) stack-ranked automation adoption by task:

Task	Automation Rate
Backups	88%
Device Deployment	78%
Firmware Upgrades	67%
Service Provisioning	59%
Non-Provisioning Config	54%
Firewall Rules	53%
Troubleshooting	44%
Capacity Planning	39%
eBGP & Interconnection	37%
DDoS Response	31%

The pattern is clear: simple, repetitive, low-risk tasks are highly automated. Complex, judgment-heavy tasks remain manual. Backups are essentially solved. eBGP peering — which requires understanding business relationships, route policy, and traffic engineering — is still mostly done by hand.

The Top 5 Reasons Automation Projects Fail

The Itential/EMA research identifies five challenges, all clustered tightly between 22% and 25%:

1. Integration Difficulties (25%)

The #1 killer. Network automation doesn't exist in isolation — it needs to integrate with:

ITSM (ServiceNow, Jira) for change management
Monitoring (Prometheus, Datadog, ThousandEyes) for closed-loop remediation
Source of truth (NetBox, Nautobot) for inventory and intended state
CI/CD pipelines (GitLab, Jenkins) for testing and deployment
AAA/RBAC for authorization and audit

Most teams pick a tool and start writing playbooks without designing the integration architecture first. The tool works in isolation but breaks when it needs to talk to everything else.
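One way to design the integration architecture first is to write each external system down as an explicit interface before any playbook exists. The following is a minimal sketch under stated assumptions: `SourceOfTruth`, `TicketSystem`, `Monitor`, and `ChangeRunner` are hypothetical names invented for illustration, not the API of ServiceNow, NetBox, or any other product named above.

```python
from dataclasses import dataclass
from typing import Protocol


class SourceOfTruth(Protocol):
    def intended_config(self, device: str) -> str: ...


class TicketSystem(Protocol):
    def is_approved(self, change_id: str) -> bool: ...


class Monitor(Protocol):
    def is_healthy(self, device: str) -> bool: ...


@dataclass
class ChangeRunner:
    """Hypothetical orchestrator: every external system is an explicit dependency."""
    sot: SourceOfTruth
    tickets: TicketSystem
    monitor: Monitor

    def run(self, change_id: str, device: str, push) -> str:
        # Refuse to touch the device without an approved change record.
        if not self.tickets.is_approved(change_id):
            return "blocked: change not approved"
        push(device, self.sot.intended_config(device))
        # Ask monitoring for a verdict after the change.
        if not self.monitor.is_healthy(device):
            return "failed: device unhealthy after change"
        return "ok"


# In-memory stand-ins, for illustration only.
class FakeSot:
    def intended_config(self, device):
        return f"hostname {device}"


class FakeTickets:
    def __init__(self, approved):
        self.approved = approved

    def is_approved(self, change_id):
        return change_id in self.approved


class FakeMonitor:
    def is_healthy(self, device):
        return True
```

Swapping a fake for a real adapter later does not change `ChangeRunner`, which is the point of settling the interfaces before the scripting starts.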

2. Network Complexity & Lack of Standards (24.9%)

Multi-vendor environments, inconsistent naming conventions, 15 years of organic growth, and devices running 6 different firmware versions. You can't automate what you can't normalize.
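A first normalization pass can be quite small. The sketch below assumes inventory records arrive as plain dictionaries. The field names, the `VENDOR_ALIASES` table, and the sample devices are all made up for illustration. It canonicalizes hostnames and vendor spellings, then reports how many distinct software versions each model is running.

```python
from collections import defaultdict

# Hypothetical alias table; a real one would come from your own inventory audit.
VENDOR_ALIASES = {
    "cisco systems": "cisco",
    "juniper networks": "juniper",
}


def normalize(record: dict) -> dict:
    """Return a copy with canonical hostname, vendor, and model spelling."""
    vendor = record["vendor"].strip().lower()
    return {
        "hostname": record["hostname"].strip().lower(),
        "vendor": VENDOR_ALIASES.get(vendor, vendor),
        "model": record["model"].strip().upper(),
        "version": record["version"].strip(),
    }


def version_sprawl(records: list[dict]) -> dict:
    """Map (vendor, model) to the sorted list of distinct versions in use."""
    versions = defaultdict(set)
    for rec in map(normalize, records):
        versions[(rec["vendor"], rec["model"])].add(rec["version"])
    return {key: sorted(vals) for key, vals in versions.items()}


# Illustrative sample data, not taken from any real network.
inventory = [
    {"hostname": "NYC-SW-01 ", "vendor": "Cisco Systems", "model": "c9300", "version": "17.9.4"},
    {"hostname": "nyc-sw-02", "vendor": "cisco", "model": "C9300", "version": "17.6.5"},
    {"hostname": "lon-rtr-01", "vendor": "Juniper Networks", "model": "mx204", "version": "21.4R3"},
]
sprawl = version_sprawl(inventory)
```

Any model that maps to more than one version in `sprawl` is a candidate for standardization before templates are written against it.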

3. Legacy Infrastructure (24.3%)

Devices that only support CLI — no NETCONF, no RESTCONF, no API. Switches running IOS 12.x that can't be upgraded because they support a critical application. Firewalls with undocumented rules nobody wants to touch.
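For CLI-only devices, automation usually falls back to parsing text. The sketch below pulls a software version out of `show version` style output with a regular expression. The `SAMPLE` line is illustrative, and the pattern is an assumption that would need checking against the output of each platform you actually run.

```python
import re

# Assumes the version appears as "Version <string>," somewhere in the output.
VERSION_RE = re.compile(r"Version\s+([^,\s]+)")


def parse_ios_version(show_version_output: str):
    """Return the version string from CLI text, or None if nothing matches."""
    match = VERSION_RE.search(show_version_output)
    return match.group(1) if match else None


def major_release(version: str):
    """Return the leading major number of a version string, or None."""
    match = re.match(r"(\d+)\.", version)
    return int(match.group(1)) if match else None


# Illustrative sample line, not captured from a real device.
SAMPLE = (
    "Cisco IOS Software, C3560 Software (C3560-IPSERVICESK9-M), "
    "Version 12.2(55)SE10, RELEASE SOFTWARE (fc2)"
)
version = parse_ios_version(SAMPLE)
```

Screen scraping like this is brittle, which is one reason legacy devices raise the cost of every later automation step.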

4. Tool Complexity (23.7%)

Ansible is "simple" — until you need error recovery, conditional logic across multi-vendor environments, and rollback procedures. Terraform works for cloud infra but gets hairy with network resources. NSO is powerful but has a steep learning curve.
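The error-recovery and rollback logic mentioned above is where the "simple" label tends to wear off. As a rough sketch of the pattern, independent of any specific tool: snapshot the running state, push the change, validate, and restore the snapshot if either step fails. `FakeDevice` is an in-memory stand-in invented for this example, not a real library class.

```python
class FakeDevice:
    """In-memory stand-in for a device session; not a real library API."""

    def __init__(self, config: str, fail_on: str = ""):
        self.config = config
        self.fail_on = fail_on  # substring that makes push_config raise

    def get_config(self) -> str:
        return self.config

    def push_config(self, config: str) -> None:
        if self.fail_on and self.fail_on in config:
            raise RuntimeError("device rejected config")
        self.config = config


def apply_with_rollback(device, new_config: str, validate) -> bool:
    """Apply new_config; restore the snapshot if the push or validation fails."""
    snapshot = device.get_config()
    try:
        device.push_config(new_config)
        if not validate(device):
            raise RuntimeError("post-change validation failed")
        return True
    except RuntimeError:
        device.push_config(snapshot)  # best-effort restore of the prior state
        return False
```

A production version would also need to handle the case where the restore itself fails, which this sketch deliberately leaves out.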

5. Data Quality (22.3%)

Automation is only as good as its input data. If your CMDB says a switch is a Nexus 9300 but it's actually a 9500, your playbook generates the wrong config. Source of truth tools (NetBox, Nautobot) solve this — but populating them accurately requires upfront investment most orgs skip.
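A cheap guard against the 9300-versus-9500 problem is to compare what the CMDB records with what the devices themselves report, before any config is generated. The sketch below assumes both sides have already been reduced to hostname-to-model dictionaries. How the discovered data is collected is out of scope, and the hostnames are invented.

```python
def find_mismatches(cmdb: dict, discovered: dict) -> list[str]:
    """Compare recorded vs. discovered model per hostname and report drift."""
    problems = []
    for host, recorded_model in sorted(cmdb.items()):
        actual = discovered.get(host)
        if actual is None:
            problems.append(f"{host}: in CMDB but not discovered")
        elif actual != recorded_model:
            problems.append(f"{host}: CMDB says {recorded_model}, device reports {actual}")
    # Devices on the network that the CMDB does not know about.
    for host in sorted(set(discovered) - set(cmdb)):
        problems.append(f"{host}: discovered but missing from CMDB")
    return problems


# Illustrative data only.
cmdb = {"dc1-leaf-01": "Nexus 9300", "dc1-leaf-02": "Nexus 9300"}
discovered = {
    "dc1-leaf-01": "Nexus 9500",
    "dc1-leaf-02": "Nexus 9300",
    "dc1-leaf-03": "Nexus 9300",
}
report = find_mismatches(cmdb, discovered)
```

Running a check like this on a schedule, and failing the pipeline when `report` is non-empty, keeps bad input data from reaching the playbooks.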

What Separates the 18% That Succeed?

Funding Is the Single Biggest Predictor

The correlation between funding and success is stark:

Funding Level	Success Rate
Fully funded	80%
Adequately funded	~55%
Underfunded	29%

"Fully funded" means:

Dedicated headcount — at least 1 FTE automation engineer per 500-1000 managed devices
Training budget — Python, Ansible, NETCONF/RESTCONF for the team
Tool licensing — NSO, Terraform Cloud, CI/CD infra
Executive sponsorship — someone who protects the initiative from deprioritization
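The headcount ratio above translates into a quick estimate. This is only a sketch of the arithmetic. Rounding up to whole engineers is an assumption made here, and `fte_range` is a hypothetical helper name.

```python
import math


def fte_range(managed_devices: int) -> tuple[int, int]:
    """Return (low, high) headcount for the 1 FTE per 500-1000 devices ratio."""
    if managed_devices <= 0:
        return (0, 0)
    low = math.ceil(managed_devices / 1000)  # 1 engineer per 1000 devices
    high = math.ceil(managed_devices / 500)  # 1 engineer per 500 devices
    return (low, high)
```

For example, `fte_range(3200)` returns `(4, 7)`: a 3,200-device network would call for roughly four to seven dedicated automation engineers under this ratio.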

The 29% bucket typically has "one engineer doing automation on the side of their regular job." That's not an automation initiative — that's a hobby.

Architecture Before Scripting

Successful projects start with architectural decisions, not playbook writing:

Define the source of truth — where does intended network state live?
Design integration points — how do ITSM, monitoring, and automation communicate?
Establish the workflow — change request → approval → testing → deployment → validation → rollback
Choose the abstraction layer — raw API calls vs. Ansible vs. NSO service models vs. Terraform
Build the testing framework — pyATS, Batfish, or custom validation scripts

Only then write the first playbook.
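The workflow in the list above can be pinned down as a small state machine before any tooling is chosen. In this sketch the state names mirror the steps listed. The `rejected` and `closed` states and the `Change` class are assumptions added so the flow has defined end points.

```python
# Allowed transitions for the change workflow.
# "rejected" and "closed" are assumed terminal states, not from the list above.
TRANSITIONS = {
    "requested": {"approved", "rejected"},
    "approved": {"tested"},
    "tested": {"deployed", "rejected"},
    "deployed": {"validated", "rolled_back"},
    "validated": {"closed"},
    "rolled_back": {"closed"},
}


class Change:
    """Hypothetical change record that only moves along allowed transitions."""

    def __init__(self, change_id: str):
        self.change_id = change_id
        self.state = "requested"
        self.history = ["requested"]

    def advance(self, new_state: str) -> None:
        """Move to new_state, refusing any step the workflow does not allow."""
        allowed = TRANSITIONS.get(self.state, set())
        if new_state not in allowed:
            raise ValueError(f"{self.state} -> {new_state} is not allowed")
        self.state = new_state
        self.history.append(new_state)
```

Because a change cannot reach `deployed` without passing through `approved` and `tested`, skipping a step becomes a hard error rather than a habit.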

Start with High-Value, Low-Risk Tasks

The NANOG adoption rates point to a sensible order: automate in ascending order of risk.

Phase 1 (Months 1-3): Backups, compliance checks, inventory — zero operational risk
Phase 2 (Months 3-6): Firmware upgrades, standard deployments — low risk with rollback
Phase 3 (Months 6-12): Service provisioning, firewall rules — moderate risk, requires testing
Phase 4 (Year 2+): Troubleshooting, eBGP changes, DDoS response — high risk, requires confidence

Jumping straight to Phase 3 or 4 without the foundation is a recurring pattern in stalled projects.
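The phase ordering can be made explicit as a gate: a task is only eligible once everything in the earlier phases is already automated. This is a minimal sketch using the task names from the phases above. The helper names and the strict all-or-nothing rule are assumptions, and a real roadmap would likely be more flexible.

```python
# Tasks grouped by phase, taken from the phase list above.
PHASES = {
    1: ["backups", "compliance checks", "inventory"],
    2: ["firmware upgrades", "standard deployments"],
    3: ["service provisioning", "firewall rules"],
    4: ["troubleshooting", "eBGP changes", "DDoS response"],
}


def phase_of(task: str) -> int:
    """Return the phase number a task belongs to."""
    for phase, tasks in PHASES.items():
        if task in tasks:
            return phase
    raise KeyError(f"unknown task: {task}")


def ready_to_start(task: str, automated: set) -> bool:
    """True only if every task in all earlier phases is already automated."""
    target = phase_of(task)
    earlier = [t for phase, tasks in PHASES.items() if phase < target for t in tasks]
    return all(t in automated for t in earlier)
```

With only backups automated, `ready_to_start("firewall rules", {"backups"})` returns `False`, which is the jump-ahead case the text warns about.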

The Public Sector Gap

One often-overlooked data point: 95% of public sector network changes are still manual. Government agencies, military networks, and regulated industries lag significantly behind commercial enterprises.

This is both a problem and an opportunity: demand for automation architects in the public sector is likely to grow sharply as these organizations catch up.

TL;DR

The data is clear:

82% of network automation projects fall short of full success (54% partially succeed, 28% stall or fail), mostly due to organizational and architectural problems, not tooling
Funding is the #1 predictor — fully funded projects succeed 80% of the time vs 29% for underfunded
Integration architecture matters more than scripting skills — design the system first, write playbooks second
Start low-risk, build confidence — backups → deployments → provisioning → complex operations

The tools work. The question is whether you have the right architecture and organizational support to make them succeed.

Originally published at FirstPassLab. For more deep dives on network engineering and automation, check out firstpasslab.com.

Disclosure: This article was adapted from the original with AI assistance. All technical content and data citations have been verified by the editorial team.
