There is a specific failure mode in AI-assisted QA work that most tooling discussions skip entirely, and it shows up earliest when you are working solo on a real engagement.
Every new chat session is stateless. You paste the ticket, describe the feature, explain your severity logic, set up the context, and by the time the AI is actually useful, you have rebuilt your methodology from scratch for the third time that week. That is not a workflow problem you fix with better prompts. It is an architecture problem, and the fix is a skill file.
QAJourney has a full breakdown of this system at qajourney.net/ai-qa-workflow-for-real-projects, including the actual skill files as free downloads. The short version: a skill file is a context document you load as a system prompt. It carries your test surface tiers, your three-path testing framework, your bug report format, your severity and priority logic, your Playwright conventions, and an explicit definition of what the AI does and does not get to call. Load it once per session. The AI operates inside your methodology from the first message instead of a blank slate.
The local LLM layer solves a different problem. On a freelance or retainer engagement, tickets contain real product logic and real client data. Sending that to a cloud API on every session is a data exposure question whether or not it rises to a compliance issue. Running Ollama locally with the same skill file as system context keeps the engagement data on the machine. For the output quality required on QA tasks, current 7B to 14B models are sufficient. The cost at zero marginal per token makes it infrastructure rather than a service you pay by the session.
The three-role setup in the workflow: engineer as judgment layer, cloud AI loaded with the skill file for complex reasoning and active session output, local LLM for lightweight tasks and client data work. The skill file is the constant across all three.
The part that took time to internalize: AI dev teams already run their own QA layer. Linters on every change, unit tests automatically, agent-generated Playwright scripts before the PR opens. By the time you see a ticket in review, the obvious paths have already been tested. Your work starts where the agent's testing ends: testing what it built against what a real user would expect, not what the spec said to build. Those two things are not the same.
The billing feature case in the post is the clearest illustration. Every acceptance criterion passed. Code was technically correct. No billing history. No indication of recurring charges. A real user hitting that screen for the first time would have no idea what had happened to their money. The agent built what the ticket specified. The ticket did not specify what users need to orient themselves in a billing context. That is not a code problem. It is a product logic problem, and it only surfaces when a human reads the screen with user intent.
That check is not automatable. It is not a coverage gap the skill file closes. The skill file handles the mechanical work so the judgment layer has room to do the work only a human can do. Coverage mapping, test case generation, bug report scaffolding, edge case expansion, Playwright scaffolds: the system handles all of it faster and more consistently than manual. Reading the product as a user reads it: that is still yours.
The practical takeaway if you are running any kind of solo QA operation or retainer engagement: write down your methodology before you try to load it into an AI. The act of writing it makes the judgment layer visible. You will see exactly what travels into the file and exactly what does not, and the gap between those two things is the actual job description.
Full skill files, both full and lite versions, are available as free downloads on the post.
Top comments (0)