Setting up CI for a web app is straightforward. Setting up CI for a test automation framework that needs a real database, two backend servers, and a headless browser — all starting from zero on every run — is a different problem.
This is how I built a GitHub Actions pipeline that provisions the entire infrastructure, runs 37 tests across 3 layers, and deploys a historical test report — every time I push to main.
The Problem: "Works On My Machine" Doesn't Scale
Locally, my test framework needs:
- MySQL 8.0 with a specific schema loaded
- A JSON Server running on port 3000
- A Flask API server running on port 5000
- Playwright with Chromium installed
- Environment variables for DB credentials and API keys
Running pytest locally assumes all of this is already set up. In CI, nothing exists. Every run starts from an empty Ubuntu container.
The challenge isn't running the tests — it's building the world they need to run in.
The 12 Steps
Here's the full pipeline, and why each step exists:
Steps 1-3: Foundation
```yaml
- name: 1. Checkout Code
  uses: actions/checkout@v4

- name: 2. Set up Python 3.13
  uses: actions/setup-python@v5
  with:
    python-version: '3.13'

- name: 3. Set up Node.js 18
  uses: actions/setup-node@v4
  with:
    node-version: '18'
```
Python for the test framework. Node.js for JSON Server. Nothing surprising here, but note the explicit version pinning — 3.13, not 3.x. CI flakiness often starts with "we let the runner pick the version."
Steps 4-5: Dependencies
```yaml
- name: 4. Install Python Dependencies
  run: |
    python -m pip install --upgrade pip
    pip install -r requirements.txt

- name: 5. Install Playwright Browsers
  run: playwright install chromium --with-deps
```
`--with-deps` is critical. Without it, Playwright installs the browser binary but not the OS-level libraries it needs (libgbm, libasound, etc.), and the test run fails with a cryptic error about missing shared objects.
Step 6: Database Schema
```yaml
services:
  mysql:
    image: mysql:8.0
    env:
      MYSQL_ROOT_PASSWORD: root_password
      MYSQL_DATABASE: expense_test_db
      MYSQL_USER: test_user
      MYSQL_PASSWORD: test_password
    ports:
      - 3306:3306
    options: >-
      --health-cmd="mysqladmin ping"
      --health-interval=10s
      --health-timeout=5s
      --health-retries=5

# In steps:
- name: 6. Initialize MySQL Schema
  run: |
    sudo apt-get install -y mysql-client
    mysql -h 127.0.0.1 -u test_user -ptest_password expense_test_db < data/init_mysql.sql
```
The MySQL service container starts alongside the job. The `--health-cmd` ensures the container is actually ready before we try to load the schema. Without health checks, you'll hit "Connection refused" errors roughly 30% of the time.
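The same "wait until the port actually accepts connections" idea can be expressed in a few lines of Python, which is handy when a health check isn't available. This is a minimal sketch, not code from the repo; the function name and timings are illustrative:

```python
import socket
import time

def wait_for_tcp(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll a TCP endpoint (e.g. MySQL on 3306) until it accepts
    connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A completed TCP handshake means the server is listening.
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(interval)  # not up yet; back off and retry
    return False
```

Note that "port open" is a weaker guarantee than `mysqladmin ping`, which confirms the server is answering protocol-level requests, not just accepting sockets.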
The schema itself is minimal — one table with a CHECK constraint:
```sql
CREATE TABLE IF NOT EXISTS expenses (
    id INT PRIMARY KEY AUTO_INCREMENT,
    expense_name VARCHAR(255),
    amount DOUBLE CHECK (amount >= 0),
    date VARCHAR(50),
    category VARCHAR(100)
);
```
That CHECK constraint is actually a test target — one of my E2E tests validates that negative amounts get rejected at the DB level even though the UI accepts them.
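The shape of that DB-level test can be sketched in plain Python. I'm substituting SQLite for MySQL here so the example runs without a server (both engines enforce CHECK constraints and raise an integrity error on violation); the actual test in the repo talks to MySQL:

```python
import sqlite3

# In-memory stand-in for expense_test_db (SQLite instead of MySQL 8.0).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE expenses (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        expense_name TEXT,
        amount REAL CHECK (amount >= 0),
        date TEXT,
        category TEXT
    )
""")

# A valid expense is accepted.
conn.execute("INSERT INTO expenses (expense_name, amount) VALUES (?, ?)",
             ("coffee", 3.5))

# A negative amount must be rejected by the DB, even if the UI let it through.
try:
    conn.execute("INSERT INTO expenses (expense_name, amount) VALUES (?, ?)",
                 ("refund-bug", -10.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # CHECK (amount >= 0) fired
```

After this runs, `rejected` is `True` and the table still holds exactly one row: the constraint is the last line of defense below the API and UI layers.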
Steps 7-9: Server Orchestration
```yaml
- name: 7. Install & Start JSON Server
  run: |
    npm install -g json-server
    json-server --watch json-server/db.json --port 3000 &

- name: 8. Start Flask Server
  env:
    DB_TYPE: mysql
    MYSQL_HOST: 127.0.0.1
  run: |
    python server/app.py &

- name: 9. Wait for Servers to be Ready
  run: |
    curl --retry 10 --retry-delay 2 --retry-connrefused http://localhost:3000/expenses
    curl --retry 10 --retry-delay 2 --retry-connrefused http://localhost:5000/expenses
    echo "All servers are up and running!"
```
The trailing `&` runs each server command in the background. Step 9 is the safety net: it polls both servers with retry logic until they respond, or fails after 20 seconds (10 retries at a 2-second delay).
This is a pattern I see skipped in a lot of CI setups. People start a server and immediately run tests, then wonder why they get intermittent connection errors. Always add a readiness check.
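For test suites that manage their own fixtures, the same readiness check is easy to do in Python instead of curl. A minimal sketch using only the standard library (the function name and defaults are mine, not from the repo):

```python
import time
import urllib.error
import urllib.request

def wait_for_http(url: str, retries: int = 10, delay: float = 2.0) -> bool:
    """Rough equivalent of `curl --retry --retry-connrefused`:
    poll the URL until it returns a 2xx response, or give up."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused / not ready yet
        time.sleep(delay)
    return False
```

Dropping this into a session-scoped pytest fixture (and failing fast when it returns `False`) turns "intermittent connection errors" into a single, obvious setup failure.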
Step 10: The Actual Tests
```yaml
- name: 10. Run Tests (exclude Mobile)
  env:
    GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
  run: |
    pytest -m "not mobile" --alluredir=allure-results --ai-analysis
```
`-m "not mobile"` excludes tests that require a physical Android device. The remaining 37 tests cover Web (Playwright), API, Database, and cross-layer E2E.
`--ai-analysis` triggers an optional Groq LLM call on test failures to classify the root cause. The API key is stored as a GitHub Secret.
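To show the shape of that classification step without an API key, here is a hypothetical keyword-based stand-in. The real pipeline sends the failure text to a Groq LLM; the bucket names and patterns below are my illustration, not the repo's actual categories:

```python
# Illustrative root-cause buckets -- a deterministic fallback for when
# the LLM call is unavailable. Order matters: first match wins.
FAILURE_PATTERNS = {
    "infrastructure": ("connection refused", "timeout", "could not connect"),
    "test_data": ("integrityerror", "constraint", "duplicate entry"),
    "assertion": ("assertionerror", "expected", "assert"),
}

def classify_failure(traceback_text: str) -> str:
    """Map a failure traceback to a coarse root-cause category."""
    text = traceback_text.lower()
    for category, keywords in FAILURE_PATTERNS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "unknown"
```

Wired into a pytest hook, this lets the Allure report group failures by cause; the LLM version does the same job with far better recall on novel error messages.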
Steps 11-12: Reporting
```yaml
- name: 11. Generate Allure Report
  uses: simple-elf/allure-report-action@master
  if: always()
  with:
    allure_results: allure-results
    allure_history: allure-history
    keep_reports: 20

- name: 12. Deploy Allure Report to GitHub Pages
  uses: peaceiris/actions-gh-pages@v3
  if: always()
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    publish_dir: allure-history
```
`if: always()` is key: the report is generated even when tests fail. Without this, failures produce no report, which is exactly when you need one most.
`keep_reports: 20` retains the last 20 runs, so you get historical trend analysis in Allure: pass rates over time, flakiness detection, duration trends.
What I Learned Building This
1. Health checks prevent 80% of CI flakiness. The MySQL health-cmd and the curl retry loops in step 9 eliminated almost all intermittent failures. Before adding them, roughly 1 in 4 runs failed due to timing issues.
2. if: always() on reporting steps is non-negotiable. The whole point of CI reports is to understand failures. If the report step only runs on success, it's useless.
3. Service containers beat docker-compose in GitHub Actions. I originally tried running docker-compose up inside the workflow. It works, but it's slower and harder to debug. Native service containers integrate better with the runner's networking.
4. Pin your versions. Python 3.13, Node 18, MySQL 8.0 — not latest, not 3.x. A version bump in a dependency should be a deliberate commit, not a surprise in CI.
The Docker Alternative
For local development, the same test suite runs via Docker Compose with a single command:
```bash
docker-compose up --build
```
This uses a custom entrypoint script that replicates the CI steps — waits for MySQL, starts both servers, runs pytest:
```
[1/5] Waiting for MySQL... ✓
[2/5] Starting JSON Server... ✓
[3/5] Starting Flask Server... ✓
[4/5] Waiting for servers... ✓
[5/5] Running tests...
========================= 34 passed, 3 xfailed =========================
```
Same tests, same infrastructure, same results — whether it runs in GitHub Actions or on a developer's laptop.
Full Source
The complete workflow file and Docker setup are in the repo:
GitHub: Financial-Integrity-Ecosystem
The CI config is at `.github/workflows/ci.yml`. The Docker setup is in `Dockerfile`, `docker-compose.yml`, and `docker-entrypoint.sh`.
This is part 2 of a series on building a multi-layer test automation framework. Part 1 covered using Set Theory for cross-layer data integrity validation.
Yaniv Metuku — QA Automation Engineer