We analyzed 1.4 million test executions across 2,616 organizations. Two numbers caught our attention before we even got to the findings:
- AI-assisted generation brings the average time from spec upload to a runnable test suite down to 4 minutes.
- 41% of APIs experience undocumented schema changes within 30 days of initial test creation.
The first one is a reliability problem most teams don't know they have. The second is a workflow shift that's already happening, whether teams plan for it or not.
1. Auth failures
34% of all observed failures are due to authentication and authorization issues: expired tokens, incorrect scopes, and misconfigured headers. Schema and validation errors add another 22%. Actual 5xx server crashes? Under 10%.
Most API regressions are silent contract violations, not outages. If your testing skews toward "does it return 200?", you're missing most of what breaks in production. Understanding API behavior at this level requires going beyond surface-level checks and considering the full API lifecycle, from API design and creation through to API security and governance.
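A silent contract violation is easy to sketch. The check below goes past the status code and validates field presence and types; the field names and expected types are illustrative assumptions, not the report's actual schema:

```python
# Hedged sketch: a contract check that goes beyond "does it return 200?".
# The fields and types below are illustrative assumptions.

EXPECTED_CONTRACT = {
    "id": int,
    "email": str,
    "created_at": str,
}

def check_contract(status_code: int, body: dict) -> list[str]:
    """Return a list of contract violations; an empty list means conformance."""
    violations = []
    if status_code != 200:
        violations.append(f"unexpected status {status_code}")
    for field, expected_type in EXPECTED_CONTRACT.items():
        if field not in body:
            violations.append(f"missing field: {field}")
        elif not isinstance(body[field], expected_type):
            violations.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(body[field]).__name__}"
            )
    return violations

# A 200 response can still violate the contract silently,
# e.g. when `id` quietly drifts from int to string:
drifted = {"id": "42", "email": "a@example.com", "created_at": "2024-01-01"}
print(check_contract(200, drifted))
```

A test that only asserts on the status code reports this drifted response as a pass; the type check is what surfaces the break.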
2. The rise of E2E API testing
58% of organizations now run multi-step API workflow tests. Among enterprise teams, that's 84%. Over 11,200 workflows have been automated on the platform, with teams averaging ~50 workflow runs per week.
API-level workflows validate the same critical paths as UI tests, with less setup, less maintenance, and faster execution. Teams are replacing browser-based regression suites with API workflows as deployment gates. Where manual tests once dominated, development teams now automate tests throughout the API testing process to keep up with the pace of code changes.
3. How teams are testing
AI handles breadth. Engineers handle depth. Neither does the job alone.
Fully AI-generated test suites hit an 82% failure detection rate. When engineers add domain-specific assertions on top, that climbs to 91%.
AI models and AI agents own baseline coverage: status codes, schema validation, and boundary conditions. Humans own the edge cases that require actual system knowledge, multi-step state logic, and complex failure modes. That is the paradigm shift: generative AI and large language models can analyze vast amounts of API specification data and autonomously generate test scripts, but human oversight remains essential for validating test scenarios that require contextual intelligence and domain judgment.
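The layering described above can be made concrete. The sketch below pairs the breadth-first checks a generator typically emits with one human-written domain invariant; the endpoint shape and the refund rule are illustrative assumptions, not the report's actual tooling:

```python
# Hedged sketch of generated baseline checks plus a human-added domain
# assertion. All names and rules here are illustrative assumptions.

def baseline_checks(response: dict) -> list[str]:
    """Breadth: status code and schema presence, the kind AI generators emit."""
    failures = []
    if response.get("status_code") != 200:
        failures.append("status code")
    for field in ("refund_id", "amount", "currency"):
        if field not in response.get("body", {}):
            failures.append(f"schema: missing {field}")
    return failures

def domain_checks(response: dict, original_charge: float) -> list[str]:
    """Depth: an invariant that needs system knowledge — a refund
    can never exceed the original charge."""
    failures = []
    if response["body"]["amount"] > original_charge:
        failures.append("refund exceeds original charge")
    return failures

resp = {"status_code": 200,
        "body": {"refund_id": "r_1", "amount": 120.0, "currency": "USD"}}
# Baseline passes cleanly; the human-added invariant catches the real bug:
print(baseline_checks(resp))                      # []
print(domain_checks(resp, original_charge=100.0))
```

The response is well-formed by every structural measure, so only the domain invariant flags it; that gap is what the 82% vs. 91% detection figures quantify.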
4. Legacy industries have the most complex APIs and the least testing coverage
27% of APIs in the dataset fall into the "complex" tier (25+ fields). These cluster heavily in Healthcare and Oil & Gas, industries where long-lived legacy schemas are common.
These same industries show the lowest CI/CD adoption rates in the dataset. Complexity plus low test coverage is the highest-risk combination in the data. They also tend to rely on historical data and manual tests rather than on agentic AI systems that can adapt dynamically to schema changes and secure access under varying loads, making it harder to guarantee predictable outcomes.
5. What agentic AI systems mean for API testing going forward
This section looks ahead at the broader shift that the data points toward.
The practical implications for development teams:
- Large language models can now interpret API interactions described in natural language, turning API documentation into runnable test cases without manual translation.
- Predictive analytics built on historical data can flag which parts of an API ecosystem are most likely to drift, helping teams prioritize human oversight where it matters most.
- For organizations that generate revenue through external and third-party services, this level of AI capability directly reduces risk at the API gateway layer.
- Just as CI/CD made continuous deployment possible, agentic systems are enabling continuous monitoring and validation of API performance, not just at release time.
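Drift detection of the kind described above reduces to comparing a recorded response shape against the current one. A minimal sketch, with the baseline and current payloads as illustrative assumptions:

```python
# Hedged sketch of schema-drift detection: reduce JSON-like responses to
# type "shapes" and diff them. The payloads below are illustrative assumptions.

def shape_of(value) -> object:
    """Reduce a JSON-like value to a comparable type shape."""
    if isinstance(value, dict):
        return {k: shape_of(v) for k, v in value.items()}
    if isinstance(value, list):
        return [shape_of(value[0])] if value else []
    return type(value).__name__

def diff_shapes(baseline: dict, current: dict) -> list[str]:
    """List fields that were removed, added, or changed type."""
    drift = []
    for key in baseline.keys() - current.keys():
        drift.append(f"removed: {key}")
    for key in current.keys() - baseline.keys():
        drift.append(f"added: {key}")
    for key in baseline.keys() & current.keys():
        if baseline[key] != current[key]:
            drift.append(f"changed: {key} ({baseline[key]} -> {current[key]})")
    return drift

baseline = shape_of({"id": 1, "name": "a", "price": 9.99})
current = shape_of({"id": "1", "name": "a", "tags": []})
print(sorted(diff_shapes(baseline, current)))
```

Run continuously against live responses rather than at release time, a diff like this is what turns the 41% undocumented-change figure from a surprise into an alert.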
Conclusion
The full report covers all of this in more depth, including execution patterns, assertion maturity benchmarks, industry breakdowns, and where AI-assisted testing is and isn't closing the gap.

