David Campbell

From HAR File to Running Load Test in 60 Seconds With AI

The traditional workflow for creating a performance test goes like this. Record a user journey. Import it into your load testing tool. Run it. Watch it fail. Then spend hours doing detective work: tracing dynamic values through request-response chains, writing regex extractors, debugging why the extractor captured the wrong thing, running again, and repeating until the script works.

For a typical enterprise application, that process takes 5-25 hours. Per script. And the script breaks the next time the application changes a token format.

AI replaces this workflow. Not incrementally – structurally. Here is what it actually looks like.

Step 1: Record a HAR File (2 minutes)

Open your browser's dev tools (F12), switch to the Network tab, and navigate through the user journey you want to test. Login, navigate, perform the business action, verify the result, log out. Save the recording as a HAR file. Done.

A couple of recording tips that matter:

Use Firefox over Chrome. Chrome's recent versions default to a sanitised export that strips cookies and auth tokens. It can also drop large response bodies - meaning dynamic values that need correlating may not appear in the file. Firefox produces more complete captures.

For complex enterprise apps, use a proxy recorder (Fiddler, mitmproxy). A proxy captures complete request-response pairs without browser export quirks. Nothing gets dropped, nothing gets sanitised.

HAR files are JSON - structured, parseable, machine-readable. An AI agent can reason about the traffic without format translation. This is a meaningful advantage over tool-specific recording formats like JMX or LoadRunner scripts.
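
Because a HAR file is just JSON, inspecting one takes nothing more than the standard library. A minimal sketch (the filename recording.har is an assumption; the field names follow the HAR 1.2 spec):

```python
import json

# A HAR file is ordinary JSON: one entry per request/response pair.
with open("recording.har", encoding="utf-8") as f:
    har = json.load(f)

for entry in har["log"]["entries"]:
    req, resp = entry["request"], entry["response"]
    print(f'{resp["status"]} {req["method"]} {req["url"]}')
```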

Step 2: Import and Let the Pipeline Work (30-90 seconds)

When you import a HAR file into an AI-powered pipeline, several things happen automatically:

Noise filtering. A typical browser recording contains 200-500 entries. Most are CDN requests, analytics, fonts, and third-party widgets. The pipeline classifies each domain and filters irrelevant traffic. The 40-80 actual API calls and page loads are what matter.
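
A toy version of that filtering step, assuming a small hardcoded blocklist and simple MIME-type checks (the actual pipeline classifies domains rather than relying on a fixed list):

```python
import json
from urllib.parse import urlparse

# Illustrative blocklist - a real pipeline classifies domains, it doesn't hardcode them.
NOISE_DOMAINS = {
    "www.google-analytics.com",
    "fonts.googleapis.com",
    "fonts.gstatic.com",
    "cdn.jsdelivr.net",
}

with open("recording.har", encoding="utf-8") as f:
    entries = json.load(f)["log"]["entries"]

def is_noise(entry):
    host = urlparse(entry["request"]["url"]).hostname or ""
    mime = entry["response"].get("content", {}).get("mimeType", "")
    # Drop third-party domains and static assets that add nothing to a load test.
    return host in NOISE_DOMAINS or mime.startswith(("font/", "image/"))

kept = [e for e in entries if not is_noise(e)]
print(f"kept {len(kept)} of {len(entries)} entries")
```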

Assertion generation. Before any correlation work begins, the pipeline analyses responses and generates validation rules. Status code checks, content-type validation, and - most valuably - soft failure detection. A 200 response containing "success": false in the body. A login form on a page that should show a dashboard. These silent failures are invisible to HTTP status codes. The AI catches them before testing begins.
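
A simplified illustration of the soft-failure idea - the two checks below (an in-band "success": false and an unexpected password field) are examples, not the pipeline's full rule set:

```python
import json

def is_soft_failure(entry):
    """Flag responses that return 200 but signal failure in the body (illustrative checks)."""
    resp = entry["response"]
    if resp["status"] != 200:
        return False
    body = resp.get("content", {}).get("text", "") or ""
    try:
        data = json.loads(body)
        if isinstance(data, dict) and data.get("success") is False:
            return True        # 200 OK wrapping an in-band error
    except ValueError:
        pass
    # A login form where a dashboard was expected is another silent failure.
    return "<form" in body and 'type="password"' in body
```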

Observation. Every value in every response is compared against every subsequent request. The scanner flags values that change and records where each one first appeared. It assigns uniqueness scores: a 36-character UUID scores high (safe to replace globally), a user ID of "1" scores low (needs boundary-aware replacement to avoid corrupting unrelated data).
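
A crude sketch of that observation pass. The scoring heuristic here is an assumption standing in for whatever the pipeline actually uses; the point is the shape of the work - scan response bodies, check whether values reappear in later requests, score how safely each can be replaced:

```python
import json
import re

def uniqueness_score(value: str) -> float:
    """Stand-in heuristic: longer, UUID-shaped values are safer to replace globally."""
    if re.fullmatch(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}", value):
        return 1.0            # 36-character UUID: safe to replace everywhere
    if len(value) >= 16:
        return 0.8
    return 0.2                # short values like "1" need boundary-aware handling

def correlation_candidates(entries):
    """Flag response values that reappear in any later request."""
    candidates = []
    for i, entry in enumerate(entries):
        body = entry["response"].get("content", {}).get("text", "") or ""
        for value in set(re.findall(r"[A-Za-z0-9_-]{4,}", body)):
            if any(value in json.dumps(later["request"]) for later in entries[i + 1:]):
                candidates.append((value, i, uniqueness_score(value)))
    return candidates
```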

Decision and extraction. AI agents evaluate each candidate. For JSON responses, a JSONPath extractor is preferred – precise, readable, resistant to formatting changes. For HTML or text, a regex specialist builds expressions tuned for the specific regex engine of the target tool (JMeter's ORO engine has syntax quirks a generic regex will trip over). Each extraction rule is validated against the recorded data before insertion.
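
For the JSON case, the extract-then-validate step looks roughly like this. The jsonpath-ng library is used here as a stand-in for whatever extractor the target tool supports, and the response body is invented:

```python
import json
from jsonpath_ng import parse  # pip install jsonpath-ng

# A recorded login response (illustrative body).
recorded = '{"auth": {"token": "eyJhbGciOiJIUzI1NiJ9.demo", "expires": 3600}}'

# A JSONPath extractor survives whitespace and key-order changes that break naive regexes.
extractor = parse("$.auth.token")
matches = extractor.find(json.loads(recorded))

# Validate the rule against the recorded data before it goes anywhere near the script.
assert matches and matches[0].value.startswith("eyJ"), "extractor failed on recorded data"
print(matches[0].value)
```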

Replacement. Hardcoded values get swapped for variable references. High-uniqueness values get global replacement. Low-uniqueness values get boundary-aware replacement - the pipeline checks each occurrence in context before committing.
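
A minimal sketch of the difference between the two replacement modes, assuming JMeter-style ${var} references; the boundary rule shown (only replace where the value stands alone after = or /) is an illustration, not the pipeline's exact logic:

```python
import re

def replace_value(text: str, value: str, var: str, unique: bool) -> str:
    """Swap a hardcoded value for a variable reference (JMeter-style ${var} assumed)."""
    ref = "${" + var + "}"
    if unique:
        # High-uniqueness values (UUIDs, tokens) are safe to replace everywhere.
        return text.replace(value, ref)
    # Low-uniqueness values are replaced only where they stand alone,
    # so userId=1 changes but the "1" inside page=10 does not.
    pattern = r'(?<=[=/"])' + re.escape(value) + r'(?=[&/?"]|$)'
    return re.sub(pattern, ref, text)

print(replace_value("/users/1?page=10&userId=1", "1", "user_id", unique=False))
# -> /users/${user_id}?page=10&userId=${user_id}
```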

Proof. The test runs. Every extraction and substitution is checked against actual server responses. Failures are classified: was the observation wrong, the decision wrong, or did the application change? Each classification triggers a different repair path.
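
The three repair paths can be pictured with a small classifier like the one below - the decision logic is illustrative, not the pipeline's:

```python
import re

def classify_failure(pattern: str, recorded_body: str, live_body: str) -> str:
    """Pick a repair path for a failed extraction (illustrative logic)."""
    if re.search(pattern, recorded_body) is None:
        return "observation wrong"    # the value was never where the scanner thought
    if re.search(pattern, live_body):
        return "decision wrong"       # value is present live, so the substitution misfired
    return "application changed"      # recorded and live responses no longer agree
```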

Safety net. A final scan catches anything the main loop missed: NOT_FOUND patterns, dynamic parameters outside the original candidate list, new tokens that appeared during the run. A QA agent validates across six categories: configuration, assertions, correlation, data and variables, scripts, and load profile.

The full pipeline completes in under two minutes for a standard web application. Complex enterprise apps with hundreds of entries: five to ten minutes.

What Comes Out

The output is not a fragile script. It is an engineered test asset:

  • Extractors for every dynamic value, validated against real server responses
  • Assertions including soft failure detection invisible to status codes
  • A QA report flagging remaining issues with proposed fixes
  • A baseline - the pipeline knows what "working" looks like for this application, so when things change, self-healing kicks in

Compare that to the old way:

                             Traditional                  AI Pipeline
Time to working script       5-25 hours                   1-10 minutes
Dynamic values caught        Depends on engineer skill    Comprehensive scan
Assertions                   Usually none                 Auto-generated
Maintenance on app change    Full re-correlation          Self-healing diff

Why HAR Files Over Script Recorders

Most load testing tools ship with proxy-based script recorders. They work. But they bind you to one tool (JMeter's recorder produces JMX, LoadRunner's produces C). A HAR file is tool-agnostic — the same recording can produce a JMeter test plan or a Locust Python script.
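
For illustration, here is roughly what the Locust flavour of that output could look like. The endpoints and payloads are invented, and a real generated script would also carry the extractors and assertions described above:

```python
from locust import HttpUser, task, between

class RecordedJourney(HttpUser):
    wait_time = between(1, 3)

    @task
    def user_journey(self):
        # Login, capture the correlated token, then replay the rest of the journey.
        resp = self.client.post("/api/login", json={"user": "demo", "password": "demo"})
        token = resp.json().get("token")                    # correlated dynamic value
        headers = {"Authorization": f"Bearer {token}"}
        self.client.get("/api/dashboard", headers=headers)
        self.client.post("/api/logout", headers=headers)
```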

There is also a practical issue: JMeter's proxy recorder cannot handle WebSocket connections. When the browser attempts a WebSocket upgrade, the proxy recording stops. Modern applications that rely on real-time communication become partially or entirely unrecordable. A browser-based HAR recording keeps going.

The Honest Limitations

HAR files capture what was sent but not why. They show a token in a request header but not which JavaScript function generated it. For most correlation work, what-was-sent is enough. But as recording technology evolves, capturing the "why" alongside the "what" will unlock new capabilities.

The AI pipeline is not magic. It handles the mechanical, repetitive work — the work that requires thoroughness and patience rather than creativity or judgment. When something genuinely novel appears (a bespoke authentication flow no system has seen before), human expertise still matters. The AI compresses the common case so that human attention can focus where it actually adds value.

Try It Yourself

The quickest way to see this in action:

  1. Open Firefox, hit F12, go to the Network tab
  2. Navigate through any web application (a login flow works well)
  3. Right-click in the Network panel and "Save All as HAR"
  4. Import that file into an AI-powered testing platform

What used to take hours of manual scripting now takes minutes. The time saved is real. What you do with that time - actual performance analysis, tuning, strategic testing - is where the engineering happens.


This article is adapted from AI Performance Engineering: How Agentic AI Is Transforming Load Testing by David Campbell. The book covers the full workflow in depth, including the agent architecture, a time-motion study, self-healing tests, and a build-your-own guide. Available on Leanpub.

LoadMagic offers a free tier if you want to try the HAR-to-test workflow yourself.
