How I built an AI agent to automate BDD test creation in Zephyr

#qa #automation #zephyr

Manually writing BDD test cases from lengthy Product Requirement Documents (PRDs) is a slow, repetitive, but critical task for any QA team. It's a process that begs for automation.

What if we could bridge the gap between product requirements and test management? What if an AI could read a PRD and generate high-quality, ready-to-use Gherkin scenarios for us?

In this post, I'll walk you through how I built "Zephyr AI", a command-line tool that does exactly that. It connects Confluence, Google's Gemini AI, and Zephyr Scale into a seamless, automated pipeline that turns requirements into test cases in minutes.

The High-Level Workflow
The concept behind Zephyr AI is simple. It follows a clear, four-step process to automate the entire workflow:

1.  Fetch Requirement: The tool securely connects to a Confluence page using its API to get the raw PRD text.
2.  Clean & Prepare: It processes the raw, often messy, Confluence HTML, cleaning it up to extract only the meaningful content for the AI.
3.  Generate BDD with AI: The clean text is sent to Gemini with a carefully engineered prompt, asking it to generate comprehensive BDD scenarios.
4.  Create Test Case in Zephyr: The AI-generated Gherkin script is used to automatically create a new, organized test case in Zephyr Scale via its REST API.
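
To make the "Clean & Prepare" step concrete, here is a minimal sketch of turning raw page HTML into plain text. It is not the project's actual code: it uses Python's standard-library `HTMLParser` as a dependency-free stand-in for BeautifulSoup, and the function name `clean_confluence_html` and the tag lists are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are never useful to the model (assumed list).
SKIPPED_TAGS = {"script", "style"}
# Tags that close a block of text, so a line break is emitted after them.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects visible text and inserts line breaks at block boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "br":
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Ignore anything inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def clean_confluence_html(raw_html):
    """Hypothetical helper: strip markup and normalize whitespace."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in parser.text().splitlines())
    return "\n".join(line for line in lines if line)
```

Keeping one block element per line preserves the rough structure of the PRD (headings, list items, table rows) while removing the markup that would otherwise waste prompt tokens.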

You can follow along or check out the final code in the project's GitHub repository: ai-zephyr-bdd-generator

The Tech Stack

This project was built with a straightforward and powerful set of tools:

Language: Python
APIs: Google Gemini API, Confluence API, Zephyr Scale REST API
Key Libraries: requests (for all API communication), BeautifulSoup (for HTML parsing), and python-dotenv (for managing secrets).

requirements.txt:

requests
beautifulsoup4
google-generativeai
python-dotenv
atlassian-python-api
html2text
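
Since the tool talks to three services, it needs several secrets at startup. The sketch below shows one way to validate them up front. The variable names are hypothetical (the project may use different ones), and it reads from `os.environ` directly; with python-dotenv, calling `load_dotenv()` first would populate the environment from a local `.env` file.

```python
import os

# Hypothetical variable names; the original project may use different ones.
REQUIRED_VARS = (
    "CONFLUENCE_URL",
    "CONFLUENCE_API_TOKEN",
    "GEMINI_API_KEY",
    "ZEPHYR_API_TOKEN",
)


def load_settings(env=None):
    """Return the required settings, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing before the first API call gives a single clear error message instead of an authentication failure halfway through the pipeline.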

Challenges I Overcame (The Real Story)
Building this tool was a fantastic learning experience, filled with the kind of "gotchas" every developer can relate to.

Getting high-quality output from an LLM is all about the quality of the prompt. My initial prompts gave me incomplete or generic test cases. The breakthrough came from treating the AI like a junior QA engineer who needs very specific instructions.

Here’s a simplified version of the final prompt structure that worked wonders:

You are a meticulous Quality Assurance Automation Engineer. Your task is to write complete, end-to-end BDD scenarios in Gherkin based on the provided PRD.

**Critical Instructions for Accuracy and Completeness:**
1.  **No Incomplete Scenarios:** Every `Scenario` you write MUST be fully formed.
2.  **Full End-to-End Flow:** Trace the user's journey from `Given` to `Then`.
3.  **Infer Necessary Details:** If the PRD is brief, infer logical details like user roles, balances, error messages, and success confirmations.

Here is the PRD to analyze:
---
[The cleaned PRD text goes here]
---
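
As a rough illustration of how this prompt could be assembled in code, the sketch below fills the template with the cleaned PRD text and adds an optional sanity check on the model's reply. The names `build_prompt` and `find_incomplete_scenarios` are hypothetical, and the check is a simple heuristic that does not account for steps inherited from a `Background` section.

```python
import re

PROMPT_TEMPLATE = """\
You are a meticulous Quality Assurance Automation Engineer. Your task is to write complete, end-to-end BDD scenarios in Gherkin based on the provided PRD.

**Critical Instructions for Accuracy and Completeness:**
1.  **No Incomplete Scenarios:** Every `Scenario` you write MUST be fully formed.
2.  **Full End-to-End Flow:** Trace the user's journey from `Given` to `Then`.
3.  **Infer Necessary Details:** If the PRD is brief, infer logical details like user roles, balances, error messages, and success confirmations.

Here is the PRD to analyze:
---
{prd_text}
---
"""

SCENARIO_RE = re.compile(r"^\s*Scenario(?: Outline)?:\s*(.*)$")
STEP_RE = re.compile(r"^\s*(Given|When|Then)\b")
REQUIRED_STEPS = {"Given", "When", "Then"}


def build_prompt(prd_text):
    """Insert the cleaned PRD text into the prompt template."""
    cleaned = prd_text.strip()
    if not cleaned:
        raise ValueError("PRD text is empty; nothing to analyze")
    return PROMPT_TEMPLATE.format(prd_text=cleaned)


def find_incomplete_scenarios(gherkin):
    """Return names of scenarios lacking a Given, When, or Then step."""
    scenarios = []  # list of (name, set of step keywords seen)
    for line in gherkin.splitlines():
        header = SCENARIO_RE.match(line)
        if header:
            scenarios.append((header.group(1).strip(), set()))
            continue
        step = STEP_RE.match(line)
        if step and scenarios:
            scenarios[-1][1].add(step.group(1))
    return [name for name, steps in scenarios if not REQUIRED_STEPS <= steps]
```

The string returned by `build_prompt` is what would be sent to the model, for example through the `generate_content` method of a `GenerativeModel` in the google-generativeai package. Running the reply through a check like `find_incomplete_scenarios` before uploading is one way to catch truncated output.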

The Zephyr Folder Mystery

The biggest hurdle was getting the newly created test cases into the right folder in Zephyr Scale. My initial request body looked like this, which seemed logical:

// The intuitive, but INCORRECT payload
{
  "projectKey": "VX",
  "name": "My Test Case",
  "folder": {
    "id": 12345678
  }
}

The API accepted the request but silently ignored the folder object, creating the test case at the root level. After a helpful exchange, the SmartBear support team pointed me to the solution: the API documentation specifies a different, top-level parameter:

// The CORRECT payload
{
  "projectKey": "VX",
  "name": "My Test Case",
  "folderId": 12345678
}
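
In Python, the corrected request could be built along these lines. This is a sketch under assumptions, not the project's code: the helper names are hypothetical, the base URL is assumed to be the Zephyr Scale Cloud endpoint, bearer-token authentication is assumed, and the standard-library `urllib` is used in place of `requests` so the example runs without dependencies.

```python
import json
import urllib.request

# Assumed base URL for Zephyr Scale Cloud; verify against your instance.
ZEPHYR_BASE_URL = "https://api.zephyrscale.smartbear.com/v2"


def build_test_case_payload(project_key, name, folder_id=None):
    """Build the request body with folderId as a top-level field."""
    payload = {"projectKey": project_key, "name": name}
    if folder_id is not None:
        # A nested {"folder": {"id": ...}} object is silently ignored.
        payload["folderId"] = int(folder_id)
    return payload


def build_create_request(payload, api_token):
    """Prepare (but do not send) the POST request that creates a test case."""
    return urllib.request.Request(
        ZEPHYR_BASE_URL + "/testcases",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. Because the API ignores unknown fields without an error, it is worth reading the folder back from the response or the Zephyr UI after the first run rather than trusting a success status code.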

Conclusion & What's Next
"Zephyr AI" has already proven to be a massive time-saver, automating the bridge between product requirements and test case management. It ensures that our testing is always aligned with our specs and frees up valuable time for more complex exploratory testing.

The journey isn't over. For Phase 4, I'm planning to build a simple web interface for the tool using Flask or Streamlit, making it accessible to the entire team, not just those comfortable with the command line.
You can find the project here: ai-zephyr-bdd-generator

What do you think? Are there other features I should add? Let me know in the comments below!
