What if an AI agent could test your backend by calling API endpoints and verifying results directly in SQL?
I tried exactly that using Claude Code. The agent called API endpoints, inspected a Docker database via SQL, and validated a decision tree of business logic scenarios automatically.
When a single operation triggers cascading changes across multiple entities, the number of scenarios grows quickly. In practice, testing them often means manually calling endpoints in Postman and checking the database state after every operation.
Instead of running tests manually, I wrote down the decision tree of scenarios in a markdown file and handed it to Claude Code. Then I gave it access to a local Docker database and an API service that seeds test data.
Claude executed operations through the API, verified the system state using SQL queries, analyzed the results, and reported PASS / FAIL for each scenario.
In practice, it behaves like an agent running integration tests — but without writing test code.
In this article, I will show how to let Claude Code call your API, inspect a local database, and automatically validate complex decision trees.
This workflow is still experimental, but it demonstrates an interesting direction for backend testing automation — especially in systems with complex business logic and many edge-case combinations.
Tools used in this article:
- Claude Code
- Docker Desktop 29.1.5 (Mac)
- DBeaver — database client
Demo repository: github.com/KamilBuksa/claude-code-local-db-testing
Running a Local Database with Docker
We start by setting up the environment. The repository includes a ready-to-use docker-compose.yml with MariaDB — one command is enough:
docker compose up -d
Once the containers are running, we can connect to the database. In Docker Desktop it should look like this:
Viewing the Database in DBeaver
To see what's happening in the database in real time, I connected DBeaver.
Open DBeaver → New Database Connection → choose MariaDB:

Click Test Connection. It should report a successful connection, for example "Connected (59ms)"; the exact latency will differ. Then click Finish.
After connecting, the full database structure becomes visible (to generate the structure, run npm run start:dev):
Everything is ready.
Domain: HR with Cascading Statuses
The application models a company operating across multiple offices. Employees belong to departments, and departments operate inside buildings. Each entity has its own lifecycle.
Building
- ACTIVE: has at least one active department with employees
- VACANT: all departments are empty or disbanded
- CLOSED: manually closed, does not change automatically

Department
- ACTIVE: has at least one employee
- EMPTY: the last employee left the department (the department still exists)
- DISBANDED: dissolved by an admin or automatically when the last employee becomes deactivated
Employee — ACTIVE or DEACTIVATED. Each employee can belong to multiple departments with roles MANAGER or MEMBER.
The key cascading rule: when a department changes status, the system checks whether the building should change to VACANT. Conversely, when an active department appears, the building returns to ACTIVE.
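The building rule above can be sketched as a small pure function. This is only an illustration of the rule as described, not the repository's actual implementation; the enum and function names are made up for this example.

```python
from enum import Enum


class BuildingStatus(Enum):
    ACTIVE = "ACTIVE"
    VACANT = "VACANT"
    CLOSED = "CLOSED"


class DepartmentStatus(Enum):
    ACTIVE = "ACTIVE"
    EMPTY = "EMPTY"
    DISBANDED = "DISBANDED"


def recompute_building_status(current, department_statuses):
    """Return the building status implied by its departments' statuses."""
    # CLOSED is set manually and never changes automatically.
    if current is BuildingStatus.CLOSED:
        return BuildingStatus.CLOSED
    # One ACTIVE department is enough to keep (or make) the building ACTIVE.
    if any(s is DepartmentStatus.ACTIVE for s in department_statuses):
        return BuildingStatus.ACTIVE
    # Otherwise every department is EMPTY or DISBANDED (or there are none).
    return BuildingStatus.VACANT
```

Calling such a function after every department status change is one simple way to keep the cascade in a single place.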
I defined five edge-case scenarios for three operations:
| # | Operation | Situation | Expected result |
|---|---|---|---|
| 1 | Remove employee from department | The only active department in the building. The only employee voluntarily leaves | Department: EMPTY · Building: VACANT |
| 2 | Disband department | The only department in the building | Department: DISBANDED · Building: VACANT |
| 3 | Disband department | Other departments in the building remain active | Department: DISBANDED · Building: ACTIVE |
| 4 | Deactivate employee | The only active department in the building. The only employee is deactivated | Department: DISBANDED · Building: VACANT · Employee: DEACTIVATED |
| 5 | Deactivate employee | The employee is not the only person in any department | Departments and buildings: ACTIVE · Employee: DEACTIVATED |
What happens step by step:
- The only employee voluntarily leaves the department → the department becomes EMPTY. The building no longer has any active departments, so it becomes VACANT.
- When an admin disbands a department, it becomes DISBANDED. Because it was the only department in the building, the building becomes VACANT.
- Disbanding one department does not affect the building if other departments are still active: the building remains ACTIVE.
- Deactivating the employee automatically disbands the department (DISBANDED) instead of merely emptying it (EMPTY). This is the key difference compared to case 1.
- Deactivating an employee who is not the only member of any department does not trigger any cascade: departments and buildings remain unchanged.
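To make the difference between the three operations concrete, here is a minimal in-memory sketch of the cascade. It is not the repository's code: the class and method names are hypothetical, and it assumes that deactivating an employee also removes that employee from every department.

```python
from dataclasses import dataclass, field


@dataclass
class Building:
    status: str = "ACTIVE"


@dataclass
class Department:
    building: Building
    members: set = field(default_factory=set)  # employee names
    status: str = "ACTIVE"


class Company:
    def __init__(self):
        self.departments = []
        self.employee_status = {}

    def add_department(self, building, members):
        dept = Department(building, set(members))
        dept.status = "ACTIVE" if dept.members else "EMPTY"
        self.departments.append(dept)
        for name in members:
            self.employee_status.setdefault(name, "ACTIVE")
        return dept

    def _refresh_building(self, building):
        # CLOSED buildings are never changed automatically.
        if building.status == "CLOSED":
            return
        has_active = any(
            d.status == "ACTIVE" for d in self.departments if d.building is building
        )
        building.status = "ACTIVE" if has_active else "VACANT"

    def remove_member(self, dept, name):
        # Voluntary leave: an emptied department becomes EMPTY, not DISBANDED.
        dept.members.discard(name)
        if not dept.members and dept.status == "ACTIVE":
            dept.status = "EMPTY"
        self._refresh_building(dept.building)

    def disband(self, dept):
        dept.status = "DISBANDED"
        self._refresh_building(dept.building)

    def deactivate_employee(self, name):
        self.employee_status[name] = "DEACTIVATED"
        for dept in self.departments:
            if name in dept.members:
                dept.members.discard(name)  # assumption: membership is dropped
                if not dept.members:
                    dept.status = "DISBANDED"
                self._refresh_building(dept.building)


# Case 4: the only employee of the only department is deactivated.
company = Company()
office = Building()
it_dept = company.add_department(office, ["John"])
company.deactivate_employee("John")
print(it_dept.status, office.status)  # DISBANDED VACANT
```

Swapping `deactivate_employee("John")` for `remove_member(it_dept, "John")` in the same setup gives EMPTY instead of DISBANDED, which is exactly the contrast between cases 1 and 4.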
Prompt for Claude Code
The decision tree — the full table of scenarios with setups and expected results — is stored in docs/test-cases.md in the repository. Claude has access to it via CLAUDE.md. I used a single short prompt:
Seed the database, then test each scenario from @docs/test-cases.md.
For every case: set up the state, call the endpoint, verify via SQL, report PASS / FAIL.
Claude read the file with the test cases, planned the execution order, and started working.
Claude Code in Action
Claude started by verifying that the API was running:
First, a curl request to /buildings to confirm the service responds. Then it loaded the test data:
The /mock/seed endpoint creates a complete dataset: buildings, departments, employees, and their memberships. Claude saved the returned IDs and proceeded with the tests.
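For readers who prefer a script over raw curl, the same two calls can be sketched with Python's standard library. The base URL and port, and the use of POST for /mock/seed, are assumptions made for this example; check the repository's README for the real values.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: adjust to wherever the API listens


def build_request(method, path, payload=None):
    """Create an HTTP request object, sending JSON only when a payload is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call_api(method, path, payload=None):
    """Send the request and decode a JSON response body, if there is one."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else None


# Usage (requires the API to be running):
# call_api("GET", "/buildings")            # smoke check that the service responds
# seeded = call_api("POST", "/mock/seed")  # assumption: the seed endpoint accepts POST
```

Keeping the IDs returned by the seed call is what lets later steps target specific buildings, departments, and employees.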
To verify the database state, Claude used docker exec to run SQL queries directly:
Claude did not just execute queries — it analyzed the results and understood the context, explaining why the result was correct before moving to the next case.
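The docker exec verification step can be reproduced in a script along these lines. This is a sketch under assumptions: the container name, credentials, database name, and table name in the usage comment are placeholders, and the `mariadb` client binary is assumed to be present in the image (older images ship `mysql` instead).

```python
import subprocess


def build_query_command(container, user, password, database, sql):
    """Build the argv for running one SQL statement inside a container."""
    return [
        "docker", "exec", container,
        "mariadb",               # client binary inside the container
        f"--user={user}",
        f"--password={password}",
        "--batch",               # tab-separated output, one row per line
        "--skip-column-names",   # rows only, no header line
        f"--execute={sql}",
        database,
    ]


def parse_rows(output):
    """Split tab-separated client output into a list of rows."""
    return [line.split("\t") for line in output.splitlines() if line]


def run_query(container, user, password, database, sql):
    """Run the query via docker exec and return the parsed rows."""
    command = build_query_command(container, user, password, database, sql)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_rows(result.stdout)


# Usage (all names below are hypothetical placeholders):
# rows = run_query("hr-db", "root", "root", "hr", "SELECT id, status FROM department")
```

Passing the command as a list avoids shell quoting problems when the SQL contains quotes or spaces.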
First Test: Case 4
Claude began with Case 4 — deactivating the only employee in the department:
State reset, minimal setup: one building, one IT department, and one employee (John) as the only manager. After deactivating John, the expected result is: department → DISBANDED, building → VACANT.
Case 4: PASS. The cascading logic works correctly.
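The PASS / FAIL decision itself is just a comparison between expected and observed statuses. A minimal sketch, with a hypothetical function name and dictionary keys chosen for this example:

```python
def evaluate(case_name, expected, actual):
    """Compare expected and observed statuses; return (verdict, report line)."""
    mismatches = [
        f"{key}: expected {want}, got {actual.get(key)}"
        for key, want in expected.items()
        if actual.get(key) != want
    ]
    verdict = "FAIL" if mismatches else "PASS"
    detail = "; ".join(mismatches)
    line = f"{case_name}: {verdict}" + (f" ({detail})" if detail else "")
    return verdict, line


expected = {"department": "DISBANDED", "building": "VACANT", "employee": "DEACTIVATED"}
observed = dict(expected)  # stand-in for values read back from the database
print(evaluate("Case 4", expected, observed)[1])  # Case 4: PASS
```

Listing each mismatch in the report line makes a FAIL immediately actionable, which is the part of the job Claude did in prose.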
Parallel Tests
After verifying Case 4, I asked Claude to run the remaining scenarios in parallel. It prepared the following cases:
Each test received a separate agent, and all started simultaneously:
Result:
All five scenarios: PASS. The cascading business logic works correctly for every edge case.
Demo Repository
The full demo is publicly available: github.com/KamilBuksa/claude-code-local-db-testing
Simply follow the setup instructions in the README.md, run claude in the repository directory, and type:
Let's play around and test the decision tree.
Claude will read the context from CLAUDE.md, load the decision tree, and guide the entire testing process:
In the screenshot you can see that Claude immediately starts working — it reads the context from CLAUDE.md and begins by seeding the test data.
Important Notes
A few things to keep in mind before trying this yourself:
- Never give Claude Code access to a production database — it's unsafe. In this article we work with an isolated local environment running in Docker.
- The repository uses schema synchronization instead of migrations for quick setup — in production environments you should use migrations.
- All endpoints are public for testing purposes — this is not recommended for real applications.
- Tested on Docker Desktop 29.1.5 on Mac. If the docker command does not work, you likely need a newer version.
Summary
What happened here in short? Claude executed operations through curl calls to the API, verified the resulting database state with SQL queries run via docker exec, interpreted the results in context, and reported PASS/FAIL.
This workflow is still experimental. I had a lot of fun exploring Claude's behavior during this process. In real projects it may be useful to create a database dump from the testing environment and restore it locally to start with realistic data. Preparing the decision tree in a markdown file beforehand is also important.
Personally, I never trust AI 100%, so I would still manually verify critical cases through the UI after connecting the frontend. However, this kind of testing can detect issues earlier and save time.
This approach works particularly well when testing complex decision trees, where the number of scenarios grows quickly and manual testing becomes impractical.
Thanks for reading. Happy coding! 🚀










