Yaniv

How I Built a 12-Step CI/CD Pipeline That Spins Up MySQL, Flask, and Playwright From Scratch

Setting up CI for a web app is straightforward. Setting up CI for a test automation framework that needs a real database, two backend servers, and a headless browser — all starting from zero on every run — is a different problem.

This is how I built a GitHub Actions pipeline that provisions the entire infrastructure, runs 37 tests across 3 layers, and deploys a historical test report — every time I push to main.

The Problem: "Works On My Machine" Doesn't Scale

Locally, my test framework needs:

  • MySQL 8.0 with a specific schema loaded
  • A JSON Server running on port 3000
  • A Flask API server running on port 5000
  • Playwright with Chromium installed
  • Environment variables for DB credentials and API keys

Running pytest locally assumes all of this is already set up. In CI, nothing exists. Every run starts on a fresh Ubuntu runner with none of it installed.

The challenge isn't running the tests — it's building the world they need to run in.
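In the test code itself, that environment dependency usually boils down to reading connection settings from the environment with local-dev defaults. A minimal sketch — the variable names mirror the workflow's env block, but the repo's actual config module isn't shown:

```python
import os

# Connection settings resolved from the environment, falling back to
# local-development defaults. Names mirror the CI workflow's env vars;
# this is an illustrative sketch, not the repo's actual config module.
DB_CONFIG = {
    "host": os.environ.get("MYSQL_HOST", "127.0.0.1"),
    "port": int(os.environ.get("MYSQL_PORT", "3306")),
    "user": os.environ.get("MYSQL_USER", "test_user"),
    "password": os.environ.get("MYSQL_PASSWORD", "test_password"),
    "database": os.environ.get("MYSQL_DATABASE", "expense_test_db"),
}

BASE_URLS = {
    "json_server": os.environ.get("JSON_SERVER_URL", "http://localhost:3000"),
    "flask_api": os.environ.get("FLASK_API_URL", "http://localhost:5000"),
}
```

With this shape, the same test code runs unchanged locally and in CI — only the environment differs.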

The 12 Steps

Here's the full pipeline, and why each step exists:

Steps 1-3: Foundation

- name: 1. Checkout Code
  uses: actions/checkout@v4

- name: 2. Set up Python 3.13
  uses: actions/setup-python@v5
  with:
    python-version: '3.13'

- name: 3. Set up Node.js 18
  uses: actions/setup-node@v4
  with:
    node-version: '18'

Python for the test framework. Node.js for JSON Server. Nothing surprising here, but note the explicit version pinning — 3.13, not 3.x. CI flakiness often starts with "we let the runner pick the version."

Steps 4-5: Dependencies

- name: 4. Install Python Dependencies
  run: |
    python -m pip install --upgrade pip
    pip install -r requirements.txt

- name: 5. Install Playwright Browsers
  run: playwright install chromium --with-deps

--with-deps is critical. Without it, Playwright installs the browser binary but not the OS-level libraries it needs (libgbm, libasound, etc.). The test run will fail with a cryptic error about missing shared objects.

Step 6: Database Schema

services:
  mysql:
    image: mysql:8.0
    env:
      MYSQL_ROOT_PASSWORD: root_password
      MYSQL_DATABASE: expense_test_db
      MYSQL_USER: test_user
      MYSQL_PASSWORD: test_password
    ports:
      - 3306:3306
    options: >-
      --health-cmd="mysqladmin ping"
      --health-interval=10s
      --health-timeout=5s
      --health-retries=5

# In steps:
- name: 6. Initialize MySQL Schema
  run: |
    sudo apt-get install -y mysql-client
    mysql -h 127.0.0.1 -u test_user -ptest_password expense_test_db < data/init_mysql.sql

The MySQL service container starts alongside the job. The health-cmd ensures the container is actually ready before we try to load the schema. Without health checks, you'll hit "Connection refused" errors roughly 30% of the time.
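The same "wait until it's actually ready" idea can be expressed in plain Python — a minimal sketch of polling a TCP port until it accepts connections (`wait_for_port` is a name I'm inventing for illustration, not something from the repo):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 20.0,
                  interval: float = 0.5) -> bool:
    """Poll until host:port accepts TCP connections, mirroring the
    health-check idea from the MySQL service container."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError until something is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    raise TimeoutError(f"{host}:{port} not ready after {timeout}s")
```

Calling `wait_for_port("127.0.0.1", 3306)` before loading the schema gives you the same guarantee as the `health-cmd`, just inside your own tooling.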

The schema itself is minimal — one table with a CHECK constraint:

CREATE TABLE IF NOT EXISTS expenses (
    id INT PRIMARY KEY AUTO_INCREMENT,
    expense_name VARCHAR(255),
    amount DOUBLE CHECK (amount >= 0),
    date VARCHAR(50),
    category VARCHAR(100)
);

That CHECK constraint is actually a test target — one of my E2E tests validates that negative amounts get rejected at the DB level even though the UI accepts them.
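You can see the same constraint behavior in miniature with SQLite — a stand-in used here purely for illustration (the real test runs against the MySQL 8.0 service container, which also enforces CHECK constraints):

```python
import sqlite3

# Same schema shape as the MySQL table; SQLite enforces CHECK too.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE expenses (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        expense_name TEXT,
        amount REAL CHECK (amount >= 0),
        date TEXT,
        category TEXT
    )
""")

# A valid row is accepted...
conn.execute("INSERT INTO expenses (expense_name, amount) VALUES ('coffee', 4.50)")

# ...but a negative amount is rejected at the DB level, regardless of
# what the UI or API layer let through.
try:
    conn.execute("INSERT INTO expenses (expense_name, amount) VALUES ('refund?', -10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

This is exactly the cross-layer gap the E2E test targets: the UI accepts the value, the database refuses it.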

Steps 7-9: Server Orchestration

- name: 7. Install & Start JSON Server
  run: |
    npm install -g json-server
    json-server --watch json-server/db.json --port 3000 &

- name: 8. Start Flask Server
  env:
    DB_TYPE: mysql
    MYSQL_HOST: 127.0.0.1
  run: |
    python server/app.py &

- name: 9. Wait for Servers to be Ready
  run: |
    curl --retry 10 --retry-delay 2 --retry-connrefused http://localhost:3000/expenses
    curl --retry 10 --retry-delay 2 --retry-connrefused http://localhost:5000/expenses
    echo "All servers are up and running!"

The & at the end of each server command runs it in the background. Step 9 is the safety net — it polls both servers with retry logic until they respond, or fails after 20 seconds.

This is a pattern I see skipped in a lot of CI setups. People start a server and immediately run tests, then wonder why they get intermittent connection errors. Always add a readiness check.
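For contexts where curl isn't available — say, inside a pytest fixture — the same retry loop is a few lines of standard-library Python (a sketch; `wait_for_http` is my name, not the repo's):

```python
import time
import urllib.error
import urllib.request

def wait_for_http(url: str, retries: int = 10, delay: float = 2.0) -> bool:
    """Poll a URL until it responds, roughly matching
    curl --retry --retry-delay --retry-connrefused."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=delay) as resp:
                if resp.status < 500:
                    return True  # server is up and answering
        except (urllib.error.URLError, OSError):
            pass  # not listening yet; wait and retry
        time.sleep(delay)
    raise TimeoutError(f"{url} not ready after {retries} attempts")
```

Either way, the principle is the same: the readiness check fails loudly and early, instead of letting the first test fail mysteriously.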

Step 10: The Actual Tests

- name: 10. Run Tests (exclude Mobile)
  env:
    GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
  run: |
    pytest -m "not mobile" --alluredir=allure-results --ai-analysis

-m "not mobile" excludes tests that require a physical Android device. The remaining 37 tests cover Web (Playwright), API, Database, and cross-layer E2E.

--ai-analysis triggers an optional Groq LLM call on test failures to classify the root cause. The API key is stored as a GitHub Secret.
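Both the mobile marker and the custom flag live in pytest configuration. A conftest.py supporting them might look like this — a sketch only: the repo's actual hooks aren't shown here, and the failure-classification logic itself would hook into test reporting elsewhere:

```python
# conftest.py (illustrative sketch, not the repo's actual file)

def pytest_addoption(parser):
    # Registers the custom --ai-analysis flag so pytest accepts it.
    parser.addoption(
        "--ai-analysis",
        action="store_true",
        default=False,
        help="classify test failures with an LLM call",
    )

def pytest_configure(config):
    # Declares the marker so `-m "not mobile"` is meaningful and
    # doesn't trigger unknown-marker warnings.
    config.addinivalue_line(
        "markers", "mobile: tests that require a physical Android device"
    )
```

Declaring markers explicitly also means a typo like `-m "not moble"` selects nothing instead of silently running everything.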

Steps 11-12: Reporting

- name: 11. Generate Allure Report
  uses: simple-elf/allure-report-action@master
  if: always()
  with:
    allure_results: allure-results
    allure_history: allure-history
    keep_reports: 20

- name: 12. Deploy Allure Report to GitHub Pages
  uses: peaceiris/actions-gh-pages@v3
  if: always()
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    publish_dir: allure-history

if: always() is key — the report is generated even when tests fail. Without this, failures produce no report, which is exactly when you need one most.

keep_reports: 20 maintains the last 20 runs, so you get historical trend analysis in Allure — pass rates over time, flakiness detection, duration trends.

What I Learned Building This

1. Health checks prevent 80% of CI flakiness. The MySQL health-cmd and the curl retry loops in step 9 eliminated almost all intermittent failures. Before adding them, roughly 1 in 4 runs failed due to timing issues.

2. if: always() on reporting steps is non-negotiable. The whole point of CI reports is to understand failures. If the report step only runs on success, it's useless.

3. Service containers beat docker-compose in GitHub Actions. I originally tried running docker-compose up inside the workflow. It works, but it's slower and harder to debug. Native service containers integrate better with the runner's networking.

4. Pin your versions. Python 3.13, Node 18, MySQL 8.0 — not latest, not 3.x. A version bump in a dependency should be a deliberate commit, not a surprise in CI.
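In practice that means exact pins in requirements.txt as well (the version numbers below are illustrative, not the repo's actual pins):

```text
# requirements.txt — exact pins; bump deliberately, in a reviewed commit
pytest==8.3.2
playwright==1.47.0
flask==3.0.3
requests==2.32.3
allure-pytest==2.13.5
```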

The Docker Alternative

For local development, the same test suite runs via Docker Compose with a single command:

docker-compose up --build

This uses a custom entrypoint script that replicates the CI steps — waits for MySQL, starts both servers, runs pytest:

[1/5] Waiting for MySQL...        ✓
[2/5] Starting JSON Server...     ✓
[3/5] Starting Flask Server...    ✓
[4/5] Waiting for servers...      ✓
[5/5] Running tests...
========================= 34 passed, 3 xfailed =========================
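The entrypoint is essentially CI steps 6-10 in shell form. A sketch of what such a script can look like — the paths and commands mirror the workflow above, but this is an assumption, not the repo's actual docker-entrypoint.sh:

```shell
#!/bin/sh
set -e

echo "[1/5] Waiting for MySQL..."
until mysqladmin ping -h "$MYSQL_HOST" --silent; do sleep 2; done

echo "[2/5] Starting JSON Server..."
json-server --watch json-server/db.json --port 3000 &

echo "[3/5] Starting Flask Server..."
python server/app.py &

echo "[4/5] Waiting for servers..."
for url in http://localhost:3000/expenses http://localhost:5000/expenses; do
  curl --retry 10 --retry-delay 2 --retry-connrefused -s "$url" > /dev/null
done

echo "[5/5] Running tests..."
exec pytest -m "not mobile" --alluredir=allure-results
```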

Same tests, same infrastructure, same results — whether it runs in GitHub Actions or on a developer's laptop.

Full Source

The complete workflow file and Docker setup are in the repo:

GitHub: Financial-Integrity-Ecosystem

The CI config is at .github/workflows/ci.yml. The Docker setup is in Dockerfile, docker-compose.yml, and docker-entrypoint.sh.


This is part 2 of a series on building a multi-layer test automation framework. Part 1 covered using Set Theory for cross-layer data integrity validation.

Yaniv Metuku — QA Automation Engineer
