"A good tester prevents problems; a great tester finds them." Keith Klain, Director of Quality Engineering at KPMG UK
Introduction
Four disabled tests. One cryptic comment: 'flaky.' When I dove into Keyshade's workspace role tests, I discovered the failures weren't random at all—they revealed a fundamental misunderstanding of how the API validates permissions.
Keyshade is a modern solution for secure, centralized credential and configuration management, designed for developers, DevOps engineers, and open-source communities who need to manage secrets without the headache.
The Issue (#1122)
I decided to continue my open source journey on Keyshade since my blog post going over how I solved issue 998, which was that the Keyshade command line interface’s “keyshade run” command fails to give a proper error that users can understand. Now I decided to explore Keyshade’s API since I have explored how the CLI functions in its codebase.
I found an issue where included a bug in workspace role tests, which meant the problem stemmed from how workspace roles were being set up and asserted in tests, some test fixtures and assertions depended on implicit ordering and shared state, which made the tests fragile and allowed timing or state-dependent flakes to slip into CI (Continuous Integration). By making the setup deterministic and isolating role state for each test (explicitly creating the roles and their relationships, avoiding shared mutable fixtures, and tightening assertions to check the actual role assignments rather than relying on incidental data), the tests now reliably exercise the API behavior they were intended to cover.
The result is a more robust test suite that accurately guards against regressions in role handling for workspaces, reduces CI noise from flaky failures, and gives us greater confidence in future changes to authorization and workspace logic. Here are some key locations in the codebase we found out to be useful in solving the issue:
The apps/api/src/workspace-role/ folder includes the workspace-role feature module — contains controller, service, DTOs, and possibly unit/integration tests. Focus on business logic, dependency wiring, and any in-module role constants.
Examples:
Inspect the service methods (assignRole, removeRole, listRoles) for edge-case logic (no-op on duplicate role, default role applied on creation).
Check module exports and DI config: ensure the controller gets the expected implementation of the role service (mock vs real) in tests.
Example check: verify if role constants are defined here (e.g., 'admin', 'member', 'viewer') and whether they match values used in DB/entities or frontend.
The apps/api/src/workspace-role/workspace-role-controller.ts file includes the controller / HTTP handlers for workspace-role endpoints — review request DTOs/validation, how the controller calls services, status codes, and permission checks performed at the controller layer.
Examples:
Example request mapping: POST /workspaces/:id/roles -> controller.assignRole(req.params.id, req.body). Check expected body shape (userId, role).
Example validation: if controller uses decorators or a schema, show the required fields (e.g., { userId: string, role: 'admin'|'member' }).
Example permission check: controller may call guard like ensureWorkspaceAdmin(currentUser, workspaceId) — confirm where that check happens and what error it throws (403 vs 401).
The apps/api/src/workspace-role/workspace-role/workspace-role.e2e.spec.ts file has end-to-end tests for workspace-role flows — tests that spin up the app (or a test server), seed DB data or fixtures, run real HTTP requests, and assert full-stack behavior and responses.
Examples:
Test setup/teardown: show DB seeding (create users, create workspace) and cleanup steps to avoid cross-test leakage.
Example test case: an e2e test that POSTs role assignment and then GETs membership list to assert the new role is present; include expected status codes and response shapes.
Flakiness checks: look for timeouts, async waits, or hard-coded delays that may make these tests flaky (and thus failing intermittently).
General Diagram Walkthrough of the Issue

Reproducing the Problem
Here are steps regarding how I reproduced the problem before assessing my plan, implementation, and solution
Scenario 1 — should not be able to update the workspace role with environments that do not belong to the project Steps to reproduce the behaviour:
Create (or use) a workspace W.
Create Project A in W and capture PROJECT_A_ID.
Create Project B in W and capture PROJECT_B_ID.
Create Environment A in Project A (ENV_A_ID) and Environment B in Project B (ENV_B_ID).
Create a workspace role R for Project A that includes only ENV_A_ID.
Configure your CLI to target workspace W and authenticate with a token that can update roles.
Use the CLI (or send an API request) to update role R to set environmentIds = [ENV_A_ID, ENV_B_ID].
Observe the response/error shown by the CLI.
Expected behaviour
The API should validate that ENV_B_ID does not belong to Project A and return a 4xx validation error (e.g., 400 or 422) with a clear, formatted message such as: "environment ENV_B_ID does not belong to project PROJECT_A_ID"
The CLI should print that formatted message to the user (no opaque stack trace or generic 500).
The update must be rejected and the role R must remain unchanged (still only ENV_A_ID).
Scenario 2 — should be able to add environment access for projects to the role with READ_WORKSPACE, UPDATE_WORKSPACE_ROLE, READ_PROJECT, READ_ENVIRONMENT authorities Steps to reproduce the behaviour:
Create (or use) a workspace W.
Create Project P in W and ensure it has at least one environment (ENV_P_1, …).
Create a workspace role R (it may initially have no project-level environment access).
Obtain an auth token that has the authorities: READ_WORKSPACE, UPDATE_WORKSPACE_ROLE, READ_PROJECT, READ_ENVIRONMENT.
Configure your CLI to use workspace W and that token.
Use the CLI (or send an API request) to add project-level environment access for Project P to role R (the endpoint that grants project environment access).
Observe the response/error shown by the CLI and then GET the role R to inspect its environment access.
Expected behaviour
The API should accept the request and return 200/201 with a clear success body describing the new project-level environment access (or list of environmentIds that were added).
The CLI should print a readable success message and show the role now has access to Project P environments.
If the token lacks any required authority, the API should return 403 Forbidden with a clear formatted error and the CLI should print that message.
Scenario 3 — should be able to add environment access for projects to the role (permission variation) Steps to reproduce the behaviour:
Create (or use) a workspace W, a Project X with environments, and a role R.
Prepare two auth tokens:
* AUTH\_ADMIN: token with UPDATE\_WORKSPACE\_ROLE (and related read authorities).
* AUTH\_RESTRICTED: token without UPDATE\_WORKSPACE\_ROLE.
Configure your CLI to use workspace W.
With AUTH_ADMIN, run the CLI command (or API request) to add Project X environment access to role R. Verify success.
With AUTH_RESTRICTED, attempt the same CLI command (or API request) to add Project X environment access to role R. Observe the response/error shown by the CLI.
Expected behaviour
With AUTH_ADMIN: API returns 200/201 and the role R gains access to Project X environments; CLI prints a readable success confirmation.
With AUTH_RESTRICTED: API returns 403 Forbidden with a clear formatted message like "missing authority: UPDATE_WORKSPACE_ROLE" and the CLI prints that message; role R must not be modified.
Scenario 4 — should not be able to create workspace role where environments do not belong to the project Steps to reproduce the behaviour:
Create (or use) a workspace W.
Create Project A and Project B in W.
Create Environment B in Project B (ENV_B_ID).
Configure your CLI to target workspace W with a token that can create roles.
Use the CLI (or POST API) to create a workspace role with:
* projectId = PROJECT\_A\_ID
* environmentIds = \[ENV\_B\_ID\]
- Observe the response/error shown by the CLI.
Expected behaviour
The API must reject the create request with a 4xx validation error (e.g., 400 or 422) and a clear message such as: "environment ENV_B_ID does not belong to project PROJECT_A_ID"
The CLI should print that formatted error.
No role should be created.
Keyshade’s Codebase
Here is Keyshade's codebase and API related to this issue, and how would I use the code to diagnose and fix the failing workspace-role tests, displaying the importance of code organization.
High-level overview (relevant subsystems)
-
Schema / client shapes:
- packages/schema/src/workspace-role/index.ts defines the WorkspaceRole payload shape (projects + environments arrays, authorities, etc.). That tells you the expected request/response shape for workspace role create/update operations.
-
API service layer:
-
apps/api/src/workspace-role/workspace-role.service.ts contains the logic that parses project -> environment inputs and applies DB operations. The key function is parseProjectEnvironmentsDBOperation which:
- verifies that an environment slug exists and belongs to the provided project
- throws NotFoundException if an environment is not part of the project
- calls authorizationService.authorizeUserAccessToEnvironment to ensure the acting user has READ_ENVIRONMENT on that environment
-
-
Authorization checks:
- apps/api/src/auth/service/authorization.service.ts and apps/api/src/auth/service/authority-checker.service.ts implement the authority checks and higher-level authorization plumbing. authorizationService exposes methods such as authorizeUserAccessToEnvironment and authorizeUserAccessToSecret; these both call into the authority checker and also verify workspace membership.
-
Tests & client usage:
- packages/api-client/tests/workspace-role.spec.ts contains tests that exercise the workspace-role APIs from a client perspective (create workspace, create project, create environments, then create/update roles).
- apps/platform and controller-instance reference the WorkspaceRoleController from the generated API client; you can use the API client or the CLI to reproduce e2e scenarios.
Why the tests were failing (what the issue description lists)
The failing/disabled tests are focused on cases where environments provided for a workspace role do not belong to the project, and cases that require multiple authorities (READ_WORKSPACE, UPDATE_WORKSPACE_ROLE, READ_PROJECT, READ_ENVIRONMENT).
The service code enforces two separate checks:
1. Ownership check: the environment must belong to the provided project — otherwise parseProjectEnvironmentsDBOperation throws NotFoundException with a message like "Environment X is not part of project Y".
2. Authorization check: after confirming the environment belongs to the project, the service calls authorizationService.authorizeUserAccessToEnvironment which requires the user to have READ\_ENVIRONMENT on that environment (and also checks workspace access).
If tests tried to use an environment that belongs to a different project, the service will intentionally return 404/NotFound before attempting to grant access; tests must assert that behavior.
If tests expected that adding environment access requires specific authorities, the test setup must give the test-user those authorities (or use a workspace-admin) before calling the endpoint.
How to use the codebase to reproduce, diagnose, and fix the tests
- Reproduce the failing case manually (fastest way to see precise error):
* Use packages/api-client/tests/workspace-role.spec.ts as a reference for how tests set up the workspace/project/environment/role chain.
* From an integration/e2e test or an API client, create:
* workspace A
* project P1 inside workspace A
* project P2 inside workspace A
* environment E1 in P1 and environment E2 in P2
* Then attempt to create/update a workspace role that includes project P1 but lists environment E2 (an environment that belongs to P2).
* Observe the response: workspace-role.service.ts is expected to detect the mismatch and return a 404 Not Found with message "Environment ... is not part of project ...".
- Check the authority path:
* If you provide a correct project->environment mapping, but the requestor lacks READ\_ENVIRONMENT (or other expected authorities like UPDATE\_WORKSPACE\_ROLE), authorizationService.authorizeUserAccessToEnvironment will throw Unauthorized/Forbidden. Ensure test user has the required authorities when the test expects success.
- Fix the tests (common issues & fixes):
* If a test was disabled because it used an environment from a different project but expected a different error (or success), restore the test and:
* Either adjust the test to expect NotFound (because parseProjectEnvironmentsDBOperation intentionally rejects cross-project environments).
* Or adjust test fixtures so the environment is created in the same project as the project entry in the request.
* For authority-related tests:
* Make sure the test user has the necessary authorities before calling the API. Grant a role to the test user with READ\_WORKSPACE, UPDATE\_WORKSPACE\_ROLE, READ\_PROJECT, READ\_ENVIRONMENT when the test expects role creation/update to succeed.
* If a test is verifying denial paths (should not be able to create/update), ensure the user has fewer/missing authorities and assert the API returns Unauthorized/Forbidden appropriately.
- Where to change code if the behavior is incorrect:
* If you decide the server should validate authority first (instead of ownership first), modify parseProjectEnvironmentsDBOperation ordering; but be careful: the current behavior (check membership/ownership first, then authorization) is consistent with tests that expect NotFound for cross-project envs.
- If messages or status codes are inconsistent with tests, adjust the thrown exceptions in parseProjectEnvironmentsDBOperation (NotFoundException) or in authorizationService (Unauthorized/Forbidden) to align with the test expectations — but prefer changing tests to reflect correct domain behavior unless an API bug is proven.
- Useful files to inspect or instrument:
* packages/schema/src/workspace-role/index.ts (payload schema)
* apps/api/src/workspace-role/workspace-role.service.ts (parseProjectEnvironmentsDBOperation and create/update logic)
* apps/api/src/auth/service/authorization.service.ts (authorizeUserAccessToEnvironment)
* apps/api/src/auth/service/authority-checker.service.ts (lower-level authority checks)
* packages/api-client/tests/workspace-role.spec.ts (test setup and expected assertions)
Concrete actionable checklist to resolve the issue in the repository
Re-enable the disabled tests.
-
For each disabled test, verify which of the two server behaviors it is validating:
- Ownership validation (environment belongs to project) → expect 404 Not Found. Ensure test builds a request with a mis-matched environment and asserts a 404 and the message text if desired.
Authorization validation (user needs the listed authorities) → ensure test grants the required authorities to the requester when expecting success; when expecting failure, ensure the user lacks at least one required authority and assert Unauthorized/Forbidden.
-
Run the tests locally (or in CI):
- The workspace-role e2e test file: packages/api-client/tests/workspace-role.spec.ts is a good starting point.
- Add logging in workspace-role.service.ts parseProjectEnvironmentsDBOperation if you need to trace why an environment was rejected.
If the test failure shows unexpected status codes or messages, adjust the service's thrown exceptions to match the documented API or update the tests to assert the documented behavior.
-
If you change server code, add unit tests in apps/api for parseProjectEnvironmentsDBOperation to cover:
- environment not in project → NotFound
- environment in project but user lacks READ_ENVIRONMENT → Unauthorized/Forbidden
- environment in project and user has authority → success path produces correct DB ops
Short example of the expected behavior (mapping to the tests listed in the issue)
-
"should not be able to update the workspace role with environments that do not belong to the project":
- Call update role with project P1 + env from P2 → server should return 404 Not Found (environment not part of project).
-
"should be able to add environment access for projects to the role with READ_WORKSPACE, UPDATE_WORKSPACE_ROLE, READ_PROJECT, READ_ENVIRONMENT authorities":
- Ensure requestor has those authorities (assign a role in test fixture) → call create/update and assert success and that the role contains the project->environment mapping in response.
-
"should be able to add environment access for projects to the role":
- Same as above but likely using normal granted authorities; ensure fixtures grant appropriate permissions.
-
"should not be able to create workspace role where environments do not belong to the project":
- Create role payload with mismatched environment → expect 404 Not Found.
Where to look in code to make changes or confirm behavior
-
parseProjectEnvironmentsDBOperation in:
- apps/api/src/workspace-role/workspace-role.service.ts
-
authorization plumbing:
- apps/api/src/auth/service/authorization.service.ts
- apps/api/src/auth/service/authority-checker.service.ts
-
request/response schema:
- packages/schema/src/workspace-role/index.ts
-
tests:
- packages/api-client/tests/workspace-role.spec.ts
Use Case
Goal: Create/update a workspace role that grants access to specific projects and, within those projects, specific environments.
Client payload shape (what the API expects)
1. name, description, authorities (array), projectEnvironments: \[{ projectSlug, environmentSlugs: \[slug\] }\]
- Server behavior (what the code enforces)
1. For each projectEnvironments entry, resolve projectSlug -> projectId (getProjectSlugToIdMap).
2. For each environmentSlug in environmentSlugs:
* Verify that environmentSlug exists AND belongs to that project (prisma.environment.findFirst where slug + projectId). If not, throw NotFoundException (404) with a body "Environment not found" / "Environment X is not part of project Y".
- Authorize the acting user has READ_ENVIRONMENT for that environment (authorizationService.authorizeUserAccessToEnvironment). This also verifies workspace-level access and permitted authorities (via authorityCheckerService).
- Finally, upsert projectWorkspaceRoleAssociation rows connecting roleId<>projectId and connect the environment slugs.
Why the workspace-role tests were failing
-
The failing/disabled tests covered two distinct types of validation:
- Ownership validation: environment must belong to the project in the request. If not, server should reject with 404 (Not Found).
- Authorization validation: even if environment belongs to project, the acting user must have proper authorities (READ_ENVIRONMENT, UPDATE_WORKSPACE_ROLE, READ_PROJECT, READ_WORKSPACE_ROLE) to add environment access to a role.
-
Typical causes for failures:
- Tests that assert 404 were previously commented out or used incorrect fixtures (e.g., environment not created, or created but in same project), so the server behavior didn't match test expectations.
- Tests that assert successful updates weren't ensuring the test actor had the required authorities (so server returned Unauthorized/401/403 instead of 200).
-
My PR re-enabled those tests and fixed fixtures by:
- Creating environments in the correct projects (dev, stage).
- Explicitly setting the membership role authorities (e.g., update the workspace member role to include READ_ENVIRONMENT and UPDATE_WORKSPACE_ROLE) when the test expected updates to succeed.
- Using consistent test header 'x-e2e-user-email' (alice.email) for requests.
Walkthrough of the two failing test categories and what to change
- "Environment doesn't belong to the project" (ownership)
-
Server side: parseProjectEnvironmentsDBOperation checks project membership:
- It runs prisma.environment.findFirst({ where: { slug: environmentSlug, projectId } })
- If not found, it throws: throw new NotFoundException(constructErrorBody('Environment not found', Environment ${environmentSlug} is not part of project ${pe.projectSlug}))
-
Tests expecting 404 must:
- Create a workspace and two projects in that workspace.
- Ensure the environment used in the request was created in THE OTHER project, or not created at all (but prefer creating in other project to avoid false positives).
- Make a POST/PUT with projectSlug set to projectA and environmentSlugs containing an environment slug that actually belongs to projectB.
- Assert statusCode === 404.
- "Adding environment access requires specific authorities" (authorization)
Server side: after ownership check, parseProjectEnvironmentsDBOperation calls: await this.authorizationService.authorizeUserAccessToEnvironment({ user, slug: environmentSlug, authorities: [Authority.READ_ENVIRONMENT] })
That code path will call authorityCheckerService.checkAuthorityOverEnvironment which checks the acting user's permitted authorities for that environment (it can be from project-level or workspace-level roles).
-
Tests expecting success must ensure the acting user has:
- READ_WORKSPACE (if required by the flow), UPDATE_WORKSPACE_ROLE, READ_PROJECT, READ_ENVIRONMENT — in practice, the tests can mutate an existing workspace role for the test user to include these authorities and also connect appropriate project/environment associations so the user has project-scoped authority.
Exact test changes — enable one disabled test and adjust fixture
Below is a concrete enabled test snippet for packages/api-client/tests/workspace-role.spec.ts that re-enables the “should not be able to create workspace role where environments do not belong to the project” scenario.
This patch enables the test and makes the fixture explicit: create two projects, create an environment in projectB, then attempt to create a workspace role that references projectA + the environment slug from projectB. The test asserts 404.
packages/api-client/tests/workspace-role.spec.ts
// (This is the test snippet to add/enable within the existing test suite)
it('should not be able to create workspace role where environments do not belong to the project', async () => {
// Precondition: two projects exist in the workspace
// projects[0] -> projectA, projects[1] -> projectB (test fixture established earlier in the file)
// Create an environment in projectB that should NOT be valid for projectA
await prisma.environment.create({
-
Notes about placement:
- Put this enabled test near the other workspace-role create tests after the workspace + project fixtures are created and available.
- Ensure projects variable references are the ones created during beforeAll/beforeEach steps in that test file.
A directly analogous enabled e2e test (what PR 1156 did)
-
PR 1156 enabled very similar tests in apps/api/src/workspace-role/workspace-role.e2e.spec.ts — it:
- Created dev/stage environments in the proper projects,
- Updated the Member workspace role to add project associations and required authorities for the actor,
- Re-enabled the previously commented tests and asserted status codes 200 or 404 as appropriate.
Small server-side unit test to lock in parseProjectEnvironmentsDBOperation behavior
I recommend adding a unit test that calls parseProjectEnvironmentsDBOperation (or exercises the public create/update API) with a project-environment mismatch and asserts a NotFoundException is thrown.
Example test file (unit) to confirm rejection for cross-project environment:
apps/api/src/workspace-role/workspace-role.service.spec.ts
import { Test, TestingModule } from '@nestjs/testing'
import { WorkspaceRoleService } from './workspace-role.service'
import { AuthorizationService } from '@/auth/service/authorization.service'
import { PrismaService } from '@/prisma/prisma.service'
import { NotFoundException } from '@nestjs/common'
Proposed server code changes — minimal / optional
The current server code in parseProjectEnvironmentsDBOperation is already correct about ownership-first validation (it checks slug + projectId before calling authorizeUserAccessToEnvironment). The PR you provided does not require a behavior change on the server — tests should reflect the server semantics.
If you did want to change server behavior, the main two options are:
1. Keep current semantics (ownership check first, then authorization). Pros: deterministic 404 for cross-project envs; avoids running authorization queries for environment slugs that don't belong to the provided project. Cons: reveals that the environment slug exists in a different project (server replies 404 with message referencing project mismatch).
2. Run authorization first (i.e., check the caller's permissions to read the environment before checking project membership). Pros: could collapse some error channels and avoid disclosing cross-project existence if the caller lacks permission. Cons: often you still need to fetch the environment by slug (so you still learn about its existence) — and the authorization check typically requires fetching the environment anyway to know project/workspace id. It also changes the status codes returned for the cross-project case (may return 401 instead of 404), which is a breaking API semantics change for clients that currently rely on 404 in that case.
- Recommendation: do NOT change the ordering of checks unless there's a clear security/privacy reason. The current behavior (404 on ownership mismatch) matches the tests in PR 1156 and is explicit. Prefer to fix test fixtures and grant/revoke authorities in test setup rather than altering server semantics.
Why the PR's approach (what you already did in PR 1156) is correct
-
You re-enabled tests and fixed fixtures instead of changing server semantics. Specifically:
- Created missing environments in the appropriate projects before calling the API.
- Updated the test user/role authorities where tests expected success.
- Asserted the correct status codes: 404 for ownership mismatch and 200 for successful updates when authorities exist.
This is minimal and non-breaking.
Trade-offs if you change server behavior (ownership vs authorization ordering)
-
Changing the order to authorize first:
- May return 401/403 instead of 404 for the cross-project environment case — tests and clients must be updated.
- Could obscure whether an environment exists in another project (slight security benefit).
- Could introduce additional database lookups if you need to check membership as well; may not yield significant performance gain.
-
Keeping ownership-first (current):
- Clear, deterministic 404 response for mismatched environment ownership.
- Tests remain simple: assert 404 for the cross-project case.
- Slight information disclosure (you show that environment exists but in a different project), which may be acceptable in an internal system like Keyshade.
Summary of exact artifacts to change (concrete)
Add/enable the test in packages/api-client/tests/workspace-role.spec.ts as above (or in apps/api/src/workspace-role/workspace-role.e2e.spec.ts — PR 1156 already did the same).
Add a small unit test in apps/api/src/workspace-role/workspace-role.service.spec.ts to lock in behavior of parseProjectEnvironmentsDBOperation for cross-project envs (example above).
-
Ensure any tests that expect successful updates explicitly:
- create the environments in the projects they want to reference
- grant the test actor role(s) the required authorities (UPDATE_WORKSPACE_ROLE, READ_PROJECT, READ_ENVIRONMENT) before calling create/update
Challenges
I reviewed PR #1156 and walked the tests and service code. Below I list concrete challenges we encountered while making this PR and practical mitigations. I narrate what I checked and what I'd recommend next.
What I did (quick recap)
Inspected the workspace-role service logic (parseProjectEnvironmentsDBOperation) and the authorization flow.
Reviewed the test files that were re-enabled/modified in the PR (workspace-role e2e/spec files).
Confirmed the PR fixes were basically test-fixture and authority-setup issues (creating environments in right projects, granting the test user required authorities), not major server-behavior changes.
Verified the PR merged successfully but noted a set of practical challenges that often show up in this work.
Challenges encountered (with examples and mitigations)
- Test fixture brittleness and DB state coupling
Problem: E2E tests depend on precise workspace/project/environment setup and roles. Small differences in fixture creation order or leftover DB rows cause flakiness.
Example: A test assumed "production" env exists in projectA but CI used a different test run that already had production in projectB -> unexpected 200 instead of 404.
-
Mitigation:
- Always create and assert fixtures in test setup (do not assume names).
- Use randomized slugs (e.g., slug-<timestamp>-<random>) to avoid collisions across parallel CI jobs.
- Add teardown/cleanup that removes created rows or run tests inside isolated databases/transactions.
- Authority/permission complexity in tests
Problem: Tests that check success paths require several authorities (READ_WORKSPACE, UPDATE_WORKSPACE_ROLE, READ_PROJECT, READ_ENVIRONMENT). If the test doesn't explicitly grant them, the server returns Unauthorized and the test fails.
Example: Test enabling environment access but not updating the Member role to include UPDATE_WORKSPACE_ROLE.
-
Mitigation:
- Explicitly set up the actor’s roles/authorities per test case.
- Add helper functions in tests for "grantMemberAuthoritiesForEnvironmentAccess" to make fixtures clear.
- Keep negative/positive authority tests separate and deterministic.
- Merge conflicts (simple but annoying)
Problem: Enabling tests and modifying the same test files or fixtures on different branches led to conflicts (e.g., two branches each uncommented different blocks of tests).
Example: Two branches changed apps/api/src/workspace-role/workspace-role.e2e.spec.ts: one enabled a test, another refactored a beforeEach. Merge resulted in conflict markers around large commented blocks.
-
Mitigation:
- Rebase frequently on the develop branch to reduce long-lived divergences.
- Split large test-enabling changes into smaller PRs (enable + fix fixtures + authority wiring).
- When resolving conflicts, run the full test suite locally before pushing the merge commit.
- Creating temporary test files or helper tests "to be removed later" causing issues
Problem: Developers often create throwaway tests / temporary files to debug behavior and forget to remove them. These can be picked up by CI and cause nondeterministic runs or duplicate test names.
Example: Added apps/api/src/workspace-role/tmp-debug.spec.ts to iterate quickly, later removed locally but remained in another branch → CI fails or duplicate DB setup occurs.
-
Mitigation:
- Avoid committing throwaway test files. If necessary, keep them in local-only branches or protect with a naming convention excluded from CI.
- Add CI test discovery rules to ignore test files matching tmp-*.spec.ts, if you must have temporary tests.
- Review diffs carefully before commit and run git status to ensure no debug/test files are included.
- Test duplication across packages (api-client vs apps/api)
Problem: Similar workspace-role tests existed in both packages/api-client/tests and apps/api e2e test files; enabling one copy and forgetting to update the other caused inconsistent behavior.
Example: One test in packages/api-client used the API client and different headers than the e2e server-side test; runtime differences led to mismatched assertions.
-
Mitigation:
- Consolidate canonical e2e tests in one place (prefer apps/api e2e that hits the server directly), keep api-client tests focused on client behavior.
- If duplication is unavoidable, keep test expectations aligned and use shared helpers/mocks.
- Race conditions in setup/cleanup (parallel tests)
Problem: Creating projects/environments concurrently across parallelized test jobs (or even within same job) created timing-based failures.
-
Mitigation:
- Make tests idempotent and isolate data per test.
- Use unique slugs per test run.
- Consider serializing tests that share DB resources or use test DB instances per worker.
- Error/status-code semantics and client expectations
Problem: Tests rely on specific status codes (404 vs 401). If server-side code ordering changes (ownership check vs authorization check) the codes change and many tests break.
Example: If you move authorization first, a cross-project environment case might return 401 instead of 404.
-
Mitigation:
- Keep server semantics stable; if a change is required, update all tests and document the change.
- Add unit tests that assert the contract (e.g., cross-project environment -> NotFound).
- CI flakiness from environment variables and headers
Problem: Some tests rely on headers like 'x-e2e-user-email'. If CI environment variable email differs or parallel runs reuse the same email, tests can mutate shared state.
-
Mitigation:
- Generate a test-specific email per run (e.g., alice+<run-id>@keyshade.xyz).
- Make test harness read and propagate headers from the same fixture source.
- Commit noise and history (multiple tiny commits while fixing test)
Problem: WIP commits to enable tests, debug, revert, add files, remove files created a messy branch and bigger diffs, increasing review overhead and merge conflicts.
-
Mitigation:
- Squash local fixup commits before opening PR or use interactive rebase to tidy commits.
- Keep PRs small and focused: one PR for enabling tests and fixtures, another for refactors.
- Accidental API contract changes vs test fixes
Problem: When tests fail, there’s a temptation to change server code to make tests pass instead of fixing tests. That can mask real bugs or break clients.
-
Mitigation:
- Prefer fixing tests/fixtures if the server behavior matches documented contract.
- If server needs change, write unit tests to cover new contract and get reviewer approval.
Concrete, practical steps I recommend next
-
Add a small checklist to PR templates for test-enabling PRs:
- Confirm unique slugs in fixtures
- Confirm actor authorities are set explicitly
- Run full test suite locally (or run the specific e2e group) before pushing
- Clean up any temporary test files
Add a helper in tests for deterministic unique slugs and user emails to avoid collisions in CI.
Consider adding a lint/CI guard that fails the build when "tmp-*.spec.ts" or "debug" test files are found in the commit.
If you expect future merges to touch the same big e2e files, coordinate branches or split PRs (e.g., create "enable tests" PR and a follow-up "refactor fixtures" PR).
Solution
Here is a summarized plan of my solution:
Lock the contract with unit tests by adding unit tests that assert parseProjectEnvironmentsDBOperation rejects cross-project environment slugs with NotFound. This prevents accidental server reordering from changing semantics. Make tests deterministic by ensuring every test explicitly creates its fixtures (workspace, project, environment) and set unique slugs per run, which will explicitly grant the required authorities to the test actor in success-path tests.
Prevent accidental temporary test files by adding the pre-commit hook above and a CI check to refuse pushes that include tmp/debug test files. Keep good merge strategies and PR hygiene by keeping PRs small and focused (enable tests separately from refactors). Make sure to rebase frequently and squash debug commits and add a PR/TODO checklist that requires: unique slugs, explicit authority setup, no temporary files, and local e2e run before requesting review. Consolidate test suites by deciding canonical location for workspace-role e2e tests (prefer apps/api e2e) and remove duplicates, or centralize test helpers. Here is my final PR solving the issue over a month:
Conclusion
In conclusion, Keyshade’s clean, well-structured codebase and deterministic tests make development faster, safer, and more collaborative. Clear intent, consistent conventions, and reliable CI build trust between maintainers and contributors—enabling focused changes, quicker reviews, and confident releases.
Top comments (0)