Most k6 guides assume you are on Linux, or at worst on a Mac. They do not cover what happens when your setup is minimal by choice: no WSL, no Docker, and the standard install paths all stall before you get to the first test.
That was the actual situation on a recent retainer engagement. My Windows machine runs lean, and I do not have repo access to the codebase I am testing: I am the QA engineer, not the dev. The workaround is not complicated, but nobody documents it clearly. k6 ships as a standalone binary. You download the Windows zip, extract k6.exe into a folder your user account controls, and call it from Git Bash using the full path to the binary. That is the entire setup. No installer, no package manager required.
The full breakdown at qajourney.net covers the binary setup, the DevTools recon process, the gzip-js compression wall that kills response inspection on Convex-backed apps, and the actual p95 numbers from three staged test runs against live AI endpoints. Worth reading in full if you are walking into a similar engagement.
What stands out from the actual test data is the latency tiering. Direct Convex queries with no AI in the path returned in the 288ms to 800ms range. Medium AI action calls came in around 3 seconds. The heavy AI generation endpoint, a complex multi-part structured output task, landed at a 26-second p95. That number is not a failure; it reflects the actual workload. The only actionable finding was a UX concern: 26 seconds with no feedback state is a problem in its own right, separate from whether the backend is performing acceptably.
Custom metrics per endpoint matter here. The default k6 http_req_duration aggregation would bury the difference between a 288ms query and a 26-second AI action call in a single blended number. Defining a Trend metric per endpoint gives you per-tool visibility in the summary output.
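To see how much a blended number can hide, here is a minimal sketch in plain JavaScript (runnable with Node, outside k6). The sample durations are made up to sit inside the tiers described above, and the endpoint names are hypothetical. The percentile uses the simple nearest-rank method, so k6's own figures may differ slightly. In a k6 script, the per-endpoint equivalent is one Trend from k6/metrics for each endpoint, with each response's duration added to the matching Trend.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of
// all samples at or below it.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank - 1, 0)];
}

// Hypothetical durations in milliseconds, one list per endpoint tier.
const samples = {
  query: [288, 300, 320, 340, 360, 390, 420, 450, 480, 520, 560, 610, 680, 740, 800],
  mediumAi: [2800, 2950, 3000, 3100],
  heavyAi: [26000],
};

// One p95 per endpoint, which is what a per-endpoint metric reports.
const perEndpoint = {};
for (const [name, durations] of Object.entries(samples)) {
  perEndpoint[name] = percentile(durations, 95);
}

// One p95 over everything, which is what a single aggregate reports.
const blended = percentile(Object.values(samples).flat(), 95);

console.log(perEndpoint);
console.log({ blended });
```

With these made-up samples the blended p95 lands in the medium tier, so neither the sub-second queries nor the 26-second call is visible in it. The per-endpoint figures keep all three tiers apart.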
One thing the DevTools path cannot tell you on a Convex backend: the response body. Convex uses gzip-js compression on responses, which the browser does not automatically decompress in the DevTools Response tab. You see raw compressed bytes. The request payload is readable and shows function names, which is useful. The responses are not. Know your escalation path before you hit this. For a retainer engagement, asking the AI dev directly for the function list was the right move. Fiddler Classic or mitmproxy are the proxy alternatives if direct access to the dev team is not available.
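If you do go the proxy route and save a raw response body, a few lines of Node can check whether it is gzip and decode it. This is a minimal sketch under two assumptions that are not confirmed for Convex: that the body is a standard gzip stream, and that it wraps JSON. The function names and the sample payload are made up for illustration.

```javascript
const zlib = require("zlib");

// Standard gzip streams start with the magic bytes 0x1f 0x8b.
function looksLikeGzip(buf) {
  return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
}

// Parse a body as JSON, decompressing first if it looks like gzip.
function decodeBody(buf) {
  const raw = looksLikeGzip(buf) ? zlib.gunzipSync(buf) : buf;
  return JSON.parse(raw.toString("utf8"));
}

// Stand-in for bytes saved from a proxy capture (hypothetical payload).
const captured = zlib.gzipSync(
  JSON.stringify({ status: "success", value: [1, 2, 3] })
);

console.log(looksLikeGzip(captured));
console.log(decodeBody(captured));
```

In practice you would replace the stand-in buffer with the saved capture, for example by reading it with fs.readFileSync. If the magic-byte check fails on a real capture, the assumption about the format is wrong and the dev team is still the faster path.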
Zero errors across 105 checks on a single-VU baseline run confirms the backend is healthy before any concurrent testing is attempted. The concurrent stage, 10 to 50 VUs each firing AI action calls, is pending. That test requires coordination with the dev team first because AI endpoints burn compute credits on every call and a sudden spike looks like an attack to monitoring systems if nobody is warned.
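For the pending concurrent stage, one way to read "10 to 50 VUs" is a single ramp expressed through the k6 stages option. The fragment below is only a sketch of that reading: the durations are placeholders, not values from the engagement, and the real numbers should come out of the conversation with the dev team.

```javascript
// k6 options fragment for a ramped concurrent run.
// All durations are placeholders, not values from the engagement.
export const options = {
  stages: [
    { duration: "1m", target: 10 }, // ramp up to 10 VUs
    { duration: "2m", target: 50 }, // climb from 10 to 50 VUs
    { duration: "1m", target: 0 },  // ramp back down
  ],
};
```

A gradual ramp like this also makes the traffic look less like a sudden spike, though it does not replace warning whoever watches the monitoring.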