Optimizing Test Design: Mutation Testing, ISP, and CFG in the GBIM Project

Vincent Davis Leonard


TL;DR

I applied three test design optimization techniques (Input Space Partitioning (ISP), Control Flow Graph (CFG) analysis, and mutation testing) to the four main features I manage in the GBIM project: account registration, account activation via token, account verification by admin, and approval or rejection of submissions. The results: 33 new backend tests, 3 frontend commits, an expanded mutation scope, and Stryker threshold enforcement added to the CI pipeline.


Tools and Methods Used

1. Input Space Partitioning (ISP)

ISP (Ammann & Offutt, Introduction to Software Testing, 2016) is a technique that divides the input domain of a function into equivalence classes. These classes are groups of values expected to be treated similarly by the system. Instead of trying all combinations, we select one representative value per partition.

I used the base-choice coverage strategy. We choose one valid value as a baseline and then vary one characteristic per test. This approach is efficient without falling into combinatorial explosion.

Tools used

  • Manual analysis of the source code (authentication/serializers.py, RegisterForm.tsx)
  • Annotation # ISP <characteristic>.<partition> in each test for traceability

Examples of partitioned characteristics for the Register feature

| Characteristic | Partitions |
|---|---|
| Email | valid, no @, active duplicate, inactive duplicate, >254 chars, whitespace |
| Password | <8 chars, exactly 8 without digits, no uppercase, valid strong |
| Role | KAPRODI, GURU_BESAR, ADMIN (blocked), invalid enum |
| Activation token | valid fresh, expired, already used, malformed, missing |
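To make base-choice coverage concrete, here is a minimal, hypothetical sketch: the validator, the role set, and the error codes are simplified stand-ins for the real serializer in authentication/serializers.py. We fix one fully valid baseline and vary exactly one characteristic per test.

```python
# Hypothetical, simplified stand-in for the real serializer validation,
# used only to illustrate base-choice coverage.
ALLOWED_ROLES = {"KAPRODI", "GURU_BESAR"}  # ADMIN may not self-register

def validate_registration(email: str, password: str, role: str) -> list[str]:
    """Return a list of failed characteristics (empty list = valid)."""
    errors = []
    if "@" not in email or len(email) > 254 or email != email.strip():
        errors.append("email")
    if (len(password) < 8
            or not any(c.isdigit() for c in password)
            or not any(c.isupper() for c in password)):
        errors.append("password")
    if role not in ALLOWED_ROLES:
        errors.append("role")
    return errors

# Base choice: every characteristic takes its "valid" partition.
BASE = dict(email="a@b.ac.id", password="Str0ngPass", role="KAPRODI")

def test_base_choice_is_valid():
    # ISP baseline: all characteristics valid
    assert validate_registration(**BASE) == []

def test_email_missing_at_symbol():
    # ISP email.no_at: vary only the email characteristic
    assert validate_registration(**{**BASE, "email": "not-an-email"}) == ["email"]

def test_role_admin_blocked():
    # ISP role.admin_blocked: vary only the role characteristic
    assert validate_registration(**{**BASE, "role": "ADMIN"}) == ["role"]
```

Because only one characteristic changes per test, a failure points directly at the partition that broke, which is exactly the efficiency argument for base-choice over full combination.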

2. Control Flow Graph (CFG) Analysis

A CFG (Ammann & Offutt, ch. 7) represents the program's execution flow as a graph in which each node is a basic block and each edge is a possible transfer of control between blocks. From the CFG we identify prime paths: simple paths (no repeated nodes) that are not proper subpaths of any other simple path.

Target module: _validate_transition (pengajuan/services.py, lines 66-75)

The actual code:

```python
66: def _validate_transition(self, previous_status: str, new_status: str) -> None:
67:     if (previous_status, new_status) not in ALLOWED_TRANSITIONS:
68:         raise ValidationError(
69:             {
70:                 "status": (
71:                     f"Transisi status dari '{previous_status}' ke '{new_status}' "
72:                     "tidak diperbolehkan."
73:                 )
74:             }
75:         )
```

The CFG of this code (nodes 1-5 below are the basic blocks, edges the execution flow; a Mermaid rendering is planned in the cfg/ folder):

Prime paths:

| Path | Condition | Tests |
|---|---|---|
| 1→2→3→5 | transition is not in ALLOWED_TRANSITIONS | test_disetujui_to_menunggu_raises, test_menunggu_to_menunggu_raises, etc. |
| 1→2→4→5 | transition is in ALLOWED_TRANSITIONS | test_menunggu_to_disetujui, test_menunggu_to_ditolak, etc. |

This state-machine CFG has 4 legal transitions and 5 illegal transitions, all of which must have tests.
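The arithmetic behind "4 legal, 5 illegal": with three statuses there are 3 × 3 = 9 ordered pairs (self-loops included), so 9 minus 4 legal leaves 5 illegal. The sketch below is a hypothetical reconstruction (the real module defines four legal pairs and may store them differently; only the two transitions named in this post are shown, so the illegal count here is 7 rather than 5):

```python
from itertools import product

# Hypothetical stand-in for pengajuan/services.py: only the two legal
# transitions named in this post are listed (the real table has four).
ALLOWED_TRANSITIONS = {
    ("MENUNGGU", "DISETUJUI"),
    ("MENUNGGU", "DITOLAK"),
}
STATUSES = {"MENUNGGU", "DISETUJUI", "DITOLAK"}

def validate_transition(previous_status: str, new_status: str) -> None:
    # Prime path 1→2→3→5: membership test fails, the error is raised.
    if (previous_status, new_status) not in ALLOWED_TRANSITIONS:
        raise ValueError(
            f"Transition from '{previous_status}' to '{new_status}' not allowed."
        )
    # Prime path 1→2→4→5: membership test passes, fall through.

# Enumerating every illegal pair makes the test list exhaustive by
# construction instead of relying on someone remembering each case.
ILLEGAL = set(product(STATUSES, STATUSES)) - ALLOWED_TRANSITIONS
# 9 ordered pairs - 2 legal = 7 illegal in this sketch
# (9 - 4 = 5 with the project's real transition table)
```

A parametrized test over ILLEGAL covers the raising prime path for every forbidden pair in one go.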

Tools

  • Manual source code analysis + Mermaid diagram (planned in the cfg/ folder)
  • Annotation # CFG: from_state→to_state in the tests

3. Mutation Testing

Mutation testing (Jia & Harman, IEEE TSE 2011) measures the quality of a test suite by injecting small defects or mutants into the source code. Examples include changing > to >=, or removing a condition. The process then checks if the test suite detects it (meaning the mutant is "killed"). The mutation score is calculated by dividing the killed mutants by the total mutants.
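As a toy illustration (not taken from the project), here is a ROR (relational operator replacement) mutant next to the boundary test that kills it, plus the score formula:

```python
# Toy illustration of a ROR mutant and the test that kills it.

def is_positive(x: int) -> bool:
    return x > 0             # original code

def is_positive_mutant(x: int) -> bool:
    return x >= 0            # ROR mutant: > replaced by >=

def test_zero_boundary_kills_mutant():
    # Only the boundary input x = 0 tells the two versions apart; a
    # suite that never probes the boundary lets this mutant survive.
    assert is_positive(0) is False
    assert is_positive_mutant(0) is True  # mutant misbehaves here: killed

# Mutation score: killed mutants divided by total mutants.
def mutation_score(killed: int, total: int) -> float:
    return killed / total
```

A suite that only checks is_positive(5) and is_positive(-5) would pass against both versions, which is exactly why mutation testing flags it as weaker than its line coverage suggests.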

Tools

  • Backend: mutmut (Python), with operators including AOR, LCR, ROR, and statement deletion
  • Frontend: Stryker Mutator (JS/TS), with operators including arithmetic, logical, equality, string, and array

Both tools are complementary. The mutmut tool is more aggressive in statement deletion, while Stryker is richer in JS/TS level operators.


Application to the Project and Evidence of Improvement

ISP Application

Before this sprint, the test_register_serializers.py test only covered the happy path and one or two errors. After the ISP audit, the following changes were made.

New backend tests added

| File | New Partitions | Count |
|---|---|---|
| test_register_serializers.py | email whitespace, email >254 chars, inactive duplicate email, password exactly 8, password no digit, password no uppercase, password whitespace, role ADMIN blocked, role null, telephone invalid format | 13 |
| test_activation_views.py | token malformed, account already active | 2 |
| test_views_admin_account_verification.py | filter role invalid enum, filter status invalid enum, search no match, pagination beyond max | 4 |
| test_views_admin_account_verification_detail.py | approve AKTIF (idempotent), approve DITOLAK (reactivation), reject DITOLAK, reject non-existent, unauthorized non-admin | 9 |

New frontend tests added

| File | New Partition |
|---|---|
| RegisterForm.test.tsx | API 429 rate-limit response (ISP ApiError.status.429) |
| useUpdateStatusPengajuan.test.ts | MENUNGGU to DITOLAK transition (CFG happy path DITOLAK) |

CFG Application

Previously, StatusChangeService._validate_transition was only tested for 2 to 3 legal transitions. After the CFG analysis, I added tests for all 5 illegal transitions.


```python
# CFG MENUNGGU → MENUNGGU (illegal self-loop)
def test_validate_transition_menunggu_to_menunggu_raises(self):
    ...

# CFG DISETUJUI → MENUNGGU (illegal backward)
def test_validate_transition_disetujui_to_menunggu_raises(self):
    ...
```

A total of 5 new CFG tests for illegal transitions were added along with annotations on the existing tests.

mutmut Scope Expansion

Previously, the pyproject.toml file only mutated pengajuan/views/ and kegiatan/views/. Now it includes the following paths.

```toml
paths_to_mutate = [
    "pengajuan/views/views_admin.py",
    "pengajuan/views/views_kaprodi.py",
    "kegiatan/views/views_kegiatan.py",
    "statistik_prodi/views.py",
    "authentication/serializers.py",   # new addition
    "authentication/services.py",      # new addition
    "authentication/views.py",         # new addition
    "pengajuan/services.py",           # new addition
    "pengajuan/serializers.py",        # new addition
]
```

Stryker Threshold Enforcement

The following configuration was added to stryker.config.mjs.

```js
thresholds: { high: 80, low: 70, break: 70 },
```

This means the frontend CI pipeline will automatically fail if the mutation score drops below 70 percent. This ensures the test quality does not regress.


Benefits, Concrete Data, and Best Practices

Quantitative Data

| Metric | Before | After |
|---|---|---|
| BE tests (auth and pengajuan scope) | ~26 tests in target files | +33 = ~59 tests |
| FE tests (3 target files) | 97 tests | +2 = 99 tests |
| mutmut scope (paths_to_mutate) | 4 paths (views only) | 9 paths (views, services, serializers) |
| Stryker threshold (FE) | none | break 70, high 80 |
| Documented ISP partitions (BE) | ~5 (implicit) | 28 explicit and annotated |
| CFG paths covered (StatusChangeService) | 2-3 legal | 4 legal and 5 illegal |

Connection to Literature

ISP by Ammann & Offutt (2016)
Ammann and Offutt define Input Space Partitioning as the division of an input domain into partitions where each must be represented by at least one test. The base-choice coverage strategy I used is their recommendation for balancing coverage with efficiency.

Mutation Testing by Jia & Harman (2011)
The survey by Jia and Harman shows that the mutation score is a more reliable predictor of test suite quality than statement coverage. They also documented that equivalent mutants (mutants that are semantically identical to the original code) are a major challenge. I document these cases when found.

Petrović & Ivanković (ICSE 2021) on Mutation Testing
This paper reports the results of deploying mutation testing at scale at Google. Developers who receive mutation testing feedback consistently write better tests. The Stryker threshold I implemented follows this principle by maintaining the mutation score as an automatic quality gate.

Meszaros, xUnit Test Patterns (2007)
Test smells such as Assertion Roulette (multiple assertions without messages) and Obscure Test (tests that are difficult to understand) are anti-patterns I avoid. Every new test has one clear assertion and an ISP or CFG comment.

Google Testing Blog on Mutation Testing at Google (2018)
Google recommends focusing on mutants in frequently changing code rather than chasing the raw mutation score. This aligns with targeting the auth and pengajuan features in this sprint.

Best Practices Followed

  1. Each-choice minimum with base-choice for crucial parameters (Ammann & Offutt recommendation)
  2. Mutation score as a quality gate rather than an optional metric (Google engineering practice)
  3. Test isolation via override_settings and locmem cache for rate limiter tests to avoid flakiness
  4. Annotation-based traceability (# ISP, # CFG) to allow coverage auditing without reading all the code
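Point 3, in Django terms, wraps the rate-limiter tests in @override_settings(CACHES=...) pointing at the locmem backend so every test gets a throwaway cache. A framework-free sketch of the same isolation idea (the limiter class and key names here are hypothetical):

```python
# Hypothetical in-memory rate limiter: each test constructs its own
# instance, so counter state cannot leak between tests (no flakiness).
class InMemoryRateLimiter:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: dict[str, int] = {}

    def allow(self, key: str) -> bool:
        # Count this request and allow it while at or under the limit.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit

def test_rate_limit_blocks_fourth_request():
    limiter = InMemoryRateLimiter(limit=3)   # fresh store per test
    assert all(limiter.allow("ip-1") for _ in range(3))
    assert limiter.allow("ip-1") is False    # fourth request throttled

def test_other_key_is_independent():
    limiter = InMemoryRateLimiter(limit=3)
    limiter.allow("ip-1")
    assert limiter.allow("ip-2") is True     # separate counter per key
```

The design choice is the same either way: state that a rate limiter accumulates must be scoped to one test, never shared through a process-wide cache.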

Critique of Previous Testing and Measurable Improvements

Anti-Patterns Found in the Old Test Suite

Anti-Pattern 1: Happy-Path-Only Register Serializer Test

Location: authentication/tests/test_register_serializers.py (before this sprint)

Problem: The registration test only verified that valid data passed the serializer. There were no tests for the following scenarios.

  • Passwords with a length of exactly 7 (boundary condition that should fail) versus exactly 8 (should pass)
  • Emails that are already registered but not yet activated (which behave differently from active ones)
  • The ADMIN role which should not be able to self-register

Why this is weak: A mutant changing len(password) < 8 to len(password) <= 8 or len(password) < 7 would survive, because no test could distinguish the difference. This is a classic Boundary Value Analysis gap, as described in Myers' The Art of Software Testing (1979).

Fix: Added test_password_exactly_seven_chars_invalid, test_email_already_registered_inactive, and test_role_admin_cannot_register with the annotation # ISP password.length_boundary.
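The boundary pair in isolation (a simplified stand-in; the real checks live in the serializer):

```python
# Simplified stand-in for the serializer's length rule, shown to make
# the 7-vs-8 boundary pair explicit.
MIN_PASSWORD_LENGTH = 8

def password_length_ok(pw: str) -> bool:
    return len(pw) >= MIN_PASSWORD_LENGTH

def test_password_exactly_seven_chars_invalid():
    # ISP password.length_boundary: one below the boundary must fail
    assert password_length_ok("A1bcdef") is False   # 7 characters

def test_password_exactly_eight_chars_valid():
    # ISP password.length_boundary: exactly at the boundary must pass
    assert password_length_ok("A1bcdefg") is True   # 8 characters
```

A mutant turning >= into > is killed by the 8-character test, and a mutant lowering the limit to 7 is killed by the 7-character test, so both sides of the boundary are needed.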

Anti-Pattern 2: StatusChangeService Did Not Test Illegal Transitions

Location: pengajuan/tests/test_services.py (before this sprint)

Problem: There were only tests for legal transitions (MENUNGGU to DISETUJUI and MENUNGGU to DITOLAK). There were no tests verifying that backward transitions (DISETUJUI to MENUNGGU) or self-loops (MENUNGGU to MENUNGGU) raise an exception.

Why this is weak: A mutant that weakens the membership check against ALLOWED_TRANSITIONS, or deletes the raise statement entirely, would survive, because no test exercised an illegal transition. A state machine that is not tested exhaustively could allow status transitions that corrupt data integrity.

Following the principles from Meszaros (xUnit Test Patterns), tests must verify error behavior just as rigorously as happy-path behavior.

Fix: Added 5 CFG tests for illegal transitions, with assertions that ensure exceptions are raised.

Anti-Pattern 3: Frontend Tests Did Not Cover API Error Variants

Location: tests/features/authentication/components/RegisterForm.test.tsx

Problem: The registration form tests only mocked the success scenario (201) and generic errors. There were no tests for the following situations.

  • HTTP 429 rate limit response which should display a throttling message
  • HTTP 400 with field-specific errors which should map to the correct fields

Why this is weak: A Stryker mutant changing the HTTP status-check condition would survive. This is also an Over-Mocked Service smell in the sense of Meszaros, as overly generic mocks do not exercise the real branch logic.

Fix: Added test_register_form_shows_rate_limit_error with the annotation // ISP ApiError.status.429.

Measurable Improvements (Before and After)

| Dimension | Before | After | Delta |
|---|---|---|---|
| Annotated ISP partitions | 0 (implicit) | 28 (explicit) | +28 |
| Covered CFG illegal transitions | 0 | 5 | +5 |
| mutmut scope (auth and pengajuan) | 0 paths | 5 new paths | baseline capture enabled |
| Stryker threshold (FE) | none | break 70, high 80 | active CI guard |
| BE test methods (auth and pengajuan) | ~26 | ~59 | +33 |
| FE test methods (3 target files) | 97 | 99 | +2 |

Connection to Industry Standards

Google researchers (Petrović et al., ICSE 2021) found that mutation testing is most effective when integrated into the developer workflow as automated feedback rather than just a final report. The Stryker threshold I set implements this pattern by automatically blocking any merge request to staging if the mutation score regresses.

The Stryker Mutator whitepaper (2023) recommends setting coverageAnalysis to "perTest" (which is already active in the config) to isolate mutants to the specific tests that cover them. This reduces false positives and execution time.


Commit Links

BE-GBM MR !158

Commit messages:
b7564fc5 chore(testing) expand mutmut scope to authentication and pengajuan services
0c774595 [GREEN] test(auth) add ISP partitions for register serializer, activation, and admin verification
a2def037 [GREEN] test(pengajuan) add CFG prime path coverage for StatusChangeService state machine

MR link: https://gitlab.cs.ui.ac.id/ppl-fasilkom-ui/2026/kelas-d/group1-gb/be-gbm/-/merge_requests/158

fe-gbm MR !142

Commit messages:
f18590fe chore(testing) enforce Stryker mutation threshold at 80% high and 70% break
42d9b028 [GREEN] test(auth) add ISP partitions for RegisterForm and useActivation hook
19328728 [GREEN] test(pengajuan) add CFG branch coverage for useUpdateStatusPengajuan

MR link: https://gitlab.cs.ui.ac.id/ppl-fasilkom-ui/2026/kelas-d/group1-gb/fe-gbm/-/merge_requests/142

References

  1. Ammann, P. & Offutt, J. (2016). Introduction to Software Testing (2nd ed.). Cambridge University Press. (ISP ch.6, Graph Coverage ch.7).
  2. Jia, Y. & Harman, M. (2011). "An Analysis and Survey of the Development of Mutation Testing." IEEE Transactions on Software Engineering, 37(5), 649-678.
  3. Petrović, G., Ivanković, M., Fraser, G., & Just, R. (2021). "Does Mutation Testing Improve Testing Practices?" ICSE 2021.
  4. Petrović, G. & Ivanković, M. (2018). "State of Mutation Testing at Google." Google Testing Blog.
  5. Meszaros, G. (2007). xUnit Test Patterns: Refactoring Test Code. Addison-Wesley.
  6. Stryker Mutator. (2023). "Mutation Testing in Practice." stryker-mutator.io.
  7. Myers, G.J. (1979). The Art of Software Testing. Wiley. (Boundary Value Analysis).
