Vincent Davis

Posted on May 11 • Edited on May 12

Optimizing Test Design: Mutation Testing, ISP, and CFG in the GBIM Project

#cicd #computerscience #softwareengineering #testing

Vincent Davis Leonard

TL;DR

I applied three test design optimization techniques (Input Space Partitioning or ISP, Control Flow Graph or CFG Analysis, and Mutation Testing) to the four main features I manage in the GBIM project. These features include account registration, account activation via token, account verification by admin, and the approval or rejection of submissions. The results include 33 new tests in the backend, 3 commits in the frontend, an expanded mutation scope, and the addition of Stryker threshold enforcement in the frontend mutation testing configuration.

Tools and Methods Used

1. Input Space Partitioning (ISP)

ISP (Ammann & Offutt, Introduction to Software Testing, 2016) is a technique that divides the input domain of a function into equivalence classes. These classes are groups of values expected to be treated similarly by the system. Instead of trying all combinations, we select one representative value per partition.

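I used the Base Choice Coverage strategy. First, choose one valid, typical block (the base choice) for every characteristic. Then vary one characteristic per test while keeping the others at their base values. The number of tests grows with the sum of the partitions rather than their product, so the approach stays efficient and avoids combinatorial explosion.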
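The base-choice construction is mechanical enough to sketch in a few lines. The Python below is a minimal illustration, not the project's tooling. `base_choice_tests` is a hypothetical helper. The block labels are taken from the Register partition table, and choosing the valid email, strong password, and KAPRODI role as base choices is an assumption.

```python
from math import prod
from typing import Dict, List, Tuple

# Partition labels from the Register table; the base blocks are an assumption.
CHARACTERISTICS: Dict[str, List[str]] = {
    "Email": ["valid", "no_at", "over_254_chars", "whitespace"],
    "Password": ["under_8_chars", "exactly_8_valid", "exactly_7_invalid",
                 "no_digit", "no_uppercase", "valid_strong"],
    "Role": ["KAPRODI", "GURU_BESAR", "ADMIN_blocked", "invalid_enum"],
}
BASE: Dict[str, str] = {"Email": "valid", "Password": "valid_strong", "Role": "KAPRODI"}


def base_choice_tests(
    characteristics: Dict[str, List[str]], base: Dict[str, str]
) -> List[Tuple[str, Dict[str, str]]]:
    """Base Choice Coverage: the base test plus one test per non-base block,
    each test changing exactly one characteristic away from the base."""
    tests = [("base", dict(base))]
    for name, blocks in characteristics.items():
        for block in blocks:
            if block == base[name]:
                continue  # the base block is already exercised by the base test
            variant = dict(base)
            variant[name] = block
            tests.append((f"{name}.{block}", variant))
    return tests


if __name__ == "__main__":
    tests = base_choice_tests(CHARACTERISTICS, BASE)
    all_combinations = prod(len(blocks) for blocks in CHARACTERISTICS.values())
    print(len(tests), "base-choice tests vs", all_combinations, "combinations")
```

With these three characteristics, base choice needs 12 tests, compared with 96 for all combinations.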

Tools used

Manual analysis of the source code (authentication/serializers.py, RegisterForm.tsx)
Annotation # ISP: <characteristic>.<partition> in each test for traceability

Examples of partitioned characteristics for the Register feature

Characteristic	Partition
Email	valid, no `@`, >254 char, whitespace
Password	<8 char, exactly 8 valid, exactly 7 invalid, no digit, no uppercase, valid strong
Role	`KAPRODI`, `GURU_BESAR`, `ADMIN` (blocked), invalid enum
Activation Token	valid fresh, expired, already used, malformed, missing

The base choice is a concrete, valid KAPRODI registration payload, not just an abstract test idea:

self.valid_data = {
    "email": "kaprodi@univ.ac.id",
    "password": "Password123!",
    "name": "Bapak Kaprodi",
    "role": RoleChoices.KAPRODI,
    "telephone": "08123456789",
    "perguruan_tinggi": "Universitas Indonesia",
    "program_studi": "Ilmu Komputer",
    "provinsi": "Jawa Barat",
    "kabupaten_kota": "Depok",
}

Each ISP test copies this base input and changes one characteristic. For example, the password boundary is tested by comparing exactly 8 characters against exactly 7 characters:

def test_serializer_accepts_password_exactly_8_chars(self):
    # ISP: Password.exactly_8_chars_valid
    data = self.valid_data.copy()
    data["password"] = "Passw0rd"

    serializer = RegisterSerializer(data=data)

    self.assertTrue(serializer.is_valid())


def test_serializer_rejects_password_exactly_7_chars(self):
    # ISP: Password.exactly_7_chars_invalid
    data = self.valid_data.copy()
    data["password"] = "Pas0rd!"

    serializer = RegisterSerializer(data=data)

    self.assertFalse(serializer.is_valid())
    self.assertIn("password", serializer.errors)

2. Control Flow Graph (CFG) Analysis

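CFG (Ammann & Offutt, ch. 7) represents a program's execution flow as a graph. Each node is a basic block (a straight-line sequence of statements), and each edge is a possible transfer of control between blocks. From the CFG we derive prime paths: simple paths (no repeated node, except that the first and last may coincide) that are not a proper subpath of any other simple path. Prime path coverage requires every prime path to be toured by at least one test.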
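Prime paths can be enumerated mechanically: list every simple path, then drop those that are a proper subpath of another. The sketch below is one straightforward way to do it in Python. It is exponential in the worst case but fine for small methods. The node numbering for `_validate_transition` mirrors the prime path table for that method, and the node meanings in the comment are inferred from that table's conditions. The loop graph is a made-up example, included only to show how cycles are handled.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]
Path = Tuple[int, ...]


def simple_paths(graph: Graph) -> List[Path]:
    """All simple paths: no repeated node, except a path may end on its start node."""
    found: List[Path] = []
    stack: List[Path] = [(node,) for node in graph]
    while stack:
        path = stack.pop()
        found.append(path)
        if len(path) > 1 and path[0] == path[-1]:
            continue  # a closed cycle cannot be extended any further
        for nxt in graph.get(path[-1], []):
            if nxt not in path or nxt == path[0]:
                stack.append(path + (nxt,))
    return found


def is_proper_subpath(short: Path, long: Path) -> bool:
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def prime_paths(graph: Graph) -> List[Path]:
    """Simple paths that are not a proper subpath of any other simple path."""
    paths = set(simple_paths(graph))
    return sorted(p for p in paths if not any(is_proper_subpath(p, q) for q in paths))


# Inferred meaning: 1 = entry, 2 = membership check, 3 = raise, 4 = fall-through, 5 = exit
VALIDATE_TRANSITION_CFG: Graph = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: []}
# Hypothetical while loop: 1 = entry, 2 = loop test, 3 = loop body, 4 = exit
LOOP_CFG: Graph = {1: [2], 2: [3, 4], 3: [2], 4: []}

if __name__ == "__main__":
    print(prime_paths(VALIDATE_TRANSITION_CFG))  # [(1, 2, 3, 5), (1, 2, 4, 5)]
    print(prime_paths(LOOP_CFG))
```

For an acyclic graph like this one, the prime paths are exactly the entry-to-exit paths. Two tests (one legal and one illegal transition) therefore already tour both of them. Loops are where the criterion starts to demand more.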

Target method: _validate_transition (pengajuan/services.py, lines 66-75)

The actual code

def _validate_transition(self, previous_status: str, new_status: str) -> None:
    # Reject any (from, to) pair that is not listed in ALLOWED_TRANSITIONS.
    if (previous_status, new_status) not in ALLOWED_TRANSITIONS:
        raise ValidationError(
            {
                "status": (
                    # "Status transition from '...' to '...' is not allowed."
                    f"Transisi status dari '{previous_status}' ke '{new_status}' "
                    "tidak diperbolehkan."
                )
            }
        )

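Figure (not reproduced here): CFG of _validate_transition. Nodes are basic blocks and edges are possible control flow. Node 1 is the method entry, node 2 is the membership check against ALLOWED_TRANSITIONS, node 3 is the raise of ValidationError, node 4 is the implicit fall-through when the transition is allowed, and node 5 is the method exit.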

Prime paths

Path	Condition	Test
1→2→3→5	transition is not in `ALLOWED_TRANSITIONS`	`test_disetujui_to_menunggu_raises`, `test_menunggu_to_menunggu_raises`, etc.
1→2→4→5	transition is in `ALLOWED_TRANSITIONS`	`test_menunggu_to_disetujui`, `test_menunggu_to_ditolak`, etc.

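At the state-machine level, ALLOWED_TRANSITIONS defines 4 legal transitions, and there are at least 5 illegal ones. Every one of them needs a test, because each (from, to) pair drives one of the two prime paths above with different data.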
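Because `_validate_transition` is a single membership check, covering both prime paths is cheap. The real work is choosing which (from, to) pairs to feed it. The Python sketch below enumerates every pair. It assumes only three statuses and only the two legal transitions this post names explicitly (MENUNGGU to DISETUJUI and MENUNGGU to DITOLAK). The project's real ALLOWED_TRANSITIONS has 4 entries, so its counts differ. `TransitionNotAllowed` is a hypothetical stand-in for the ValidationError raised by the real service, so the example runs without Django.

```python
from itertools import product
from typing import List, Tuple

Pair = Tuple[str, str]

STATUSES = ("MENUNGGU", "DISETUJUI", "DITOLAK")
# Illustrative subset only; the project's real ALLOWED_TRANSITIONS has 4 entries.
ALLOWED_TRANSITIONS = {("MENUNGGU", "DISETUJUI"), ("MENUNGGU", "DITOLAK")}


class TransitionNotAllowed(Exception):
    """Stand-in for the ValidationError raised by the real service."""


def validate_transition(previous_status: str, new_status: str) -> None:
    if (previous_status, new_status) not in ALLOWED_TRANSITIONS:
        raise TransitionNotAllowed(f"{previous_status} -> {new_status}")


def classify_all_pairs() -> Tuple[List[Pair], List[Pair]]:
    """Run every (from, to) pair through the check; each pair is one test case."""
    legal: List[Pair] = []
    illegal: List[Pair] = []
    for pair in product(STATUSES, repeat=2):
        try:
            validate_transition(*pair)  # tours prime path 1→2→4→5
            legal.append(pair)
        except TransitionNotAllowed:    # tours prime path 1→2→3→5
            illegal.append(pair)
    return legal, illegal


if __name__ == "__main__":
    legal, illegal = classify_all_pairs()
    print(len(legal), "legal /", len(illegal), "illegal")
```

In a Django TestCase, the same loop would typically wrap each pair in `self.subTest(...)`. That way, one failing pair does not hide the others.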

Tools

Manual source code analysis + Mermaid diagram (planned in the cfg/ folder)
Annotation # CFG: from_state→to_state in the tests

3. Mutation Testing

Mutation testing (Jia & Harman, IEEE TSE 2011) measures the quality of a test suite by injecting small syntactic changes, called mutants, into the source code. Examples include changing > to >= or removing a condition. The suite is then run against each mutant, and if at least one test fails, the mutant is "killed". The mutation score is the number of killed mutants divided by the number of non-equivalent mutants. In practice, tools often simply divide by the total number of mutants.
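The score itself is simple arithmetic. The Python helper below is a minimal sketch of the definition above; the function name is ours, not mutmut's. It is checked against the backend numbers reported later in this post, where no mutants were set aside as equivalent.

```python
def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """Mutation score in percent: killed / (total - equivalent)."""
    non_equivalent = total - equivalent
    if non_equivalent <= 0:
        raise ValueError("need at least one non-equivalent mutant")
    if not 0 <= killed <= non_equivalent:
        raise ValueError("killed must be between 0 and the non-equivalent count")
    return 100.0 * killed / non_equivalent


if __name__ == "__main__":
    print(mutation_score(31, 40))  # 77.5 (backend, before)
    print(mutation_score(33, 40))  # 82.5 (backend, after)
```

Declaring a survivor equivalent shrinks the denominator, so each such claim should be argued case by case rather than assumed.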

Tools

Backend: mutmut (Python), with operators including AOR, LCR, ROR, and statement deletion
Frontend: Stryker Mutator (JS/TS), with operators including arithmetic, logical, equality, string, and array

The two tools complement each other: mutmut is more aggressive with statement deletion, while Stryker offers a richer set of JS/TS-level operators.

Application to the Project and Evidence of Improvement

ISP Application

Before this sprint, test_register_serializers.py covered only the happy path and one or two error cases. After the ISP audit, the following changes were made.

New backend tests added

File	New Partitions	Count
`test_register_serializers.py`	email whitespace, email >254 char, inactive duplicate email, password exactly 8, password no digit, password no uppercase, password whitespace, role ADMIN blocked, role null, telephone invalid format	13
`test_activation_views.py`	token malformed, account already active	2
`test_views_admin_account_verification.py`	filter role invalid enum, filter status invalid enum, search no match, pagination beyond max	4
`test_views_admin_account_verification_detail.py`	approve AKTIF (idempotent), approve DITOLAK (reactivation), reject DITOLAK, reject non-existent, unauthorized non-admin	9

The new backend tests do more than add files: each one explicitly encodes an ISP partition. For example, the following tests cover email formatting, the password boundary, and blocked roles in RegisterSerializer:

def test_serializer_rejects_whitespace_only_email(self):
    # ISP: Email.whitespace_only
    data = self.valid_data.copy()
    data["email"] = "   "

    serializer = RegisterSerializer(data=data)

    self.assertFalse(serializer.is_valid())
    self.assertIn("email", serializer.errors)


def test_serializer_accepts_password_exactly_8_chars(self):
    # ISP: Password.exactly_8_chars_valid
    data = self.valid_data.copy()
    data["password"] = "Passw0rd"

    serializer = RegisterSerializer(data=data)

    self.assertTrue(serializer.is_valid())


def test_serializer_rejects_admin_registration(self):
    # ISP: Role.admin_blocked
    admin_data = self.valid_data.copy()
    admin_data.pop("telephone", None)
    admin_data["role"] = RoleChoices.ADMIN

    serializer = RegisterSerializer(data=admin_data)

    self.assertFalse(serializer.is_valid())
    # Expected message: "Admin role registration is not allowed through this endpoint."
    self.assertEqual(
        str(serializer.errors["role"][0]),
        "Registrasi peran Admin tidak diizinkan melalui endpoint ini.",
    )

The same ISP style was also applied outside registration. For activation, malformed tokens are tested as a separate token partition:

def test_activation_rejects_malformed_token(self):
    # ISP: Token.malformed
    response = self.client.post(self.activation_url, {"token": "not-a-valid-token"})

    self.assertEqual(response.status_code, status.HTTP_400_BAD_REQUEST)

For admin verification, invalid filters and empty search results are tested explicitly so the admin list endpoint does not silently accept invalid query parameters:

def test_filter_status_invalid_enum_returns_bad_request(self):
    # ISP: Filter.status_invalid_enum
    self.client.force_authenticate(user=self.admin)

    response = self.client.get(self.url, {"status": "UNKNOWN"})

    self.assertEqual(response.status_code, status.HTTP_400_BAD_REQUEST)

New frontend tests added

File	New Partitions
`RegisterForm.test.tsx`	API 429 rate limit response (ISP ApiError.status.429)
`useUpdateStatusPengajuan.test.ts`	MENUNGGU to DITOLAK transition (CFG happy-path DITOLAK)

For the frontend, the newly added ISP partition checks that a rate-limit response is displayed as a formatted client error, not as an unstructured JSON dump:

// ISP: ApiError.status.429 - rate limited response shows formatted body
it("renders formatted body for a 429 rate-limit error", () => {
  const apiErr = new ApiError(429, "Too Many Requests", {
    detail: "Terlalu banyak percobaan.",
  });

  mockUseRegister.mockReturnValue({
    register: mockRegister,
    loading: false,
    error: apiErr,
    data: null,
  });

  render(<RegisterForm />);

  const alert = screen.getByRole("alert");
  expect(alert).toHaveTextContent("detail: Terlalu banyak percobaan.");
  expect(alert).not.toHaveTextContent("{");
});

CFG Application

Previously, StatusChangeService._validate_transition was only tested for 2 to 3 legal transitions. After the CFG analysis, I added tests for all 5 illegal transitions.

def test_illegal_transition_menunggu_to_menunggu_raises_validation_error(self):
    # CFG: MENUNGGU→MENUNGGU (self-loop)
    self.pengajuan.status = Pengajuan.Status.MENUNGGU
    self.pengajuan.save(update_fields=["status"])
    service = self._build_service()

    with self.assertRaises(ValidationError) as ctx:
        service.update_status(self.pengajuan, Pengajuan.Status.MENUNGGU)

    self.assertIn("status", ctx.exception.detail)
    self.assertEqual(len(self.fake_notifier.calls), 0)


def test_illegal_transition_disetujui_to_menunggu_raises_validation_error(self):
    # CFG: DISETUJUI→MENUNGGU (backward)
    self.pengajuan.status = Pengajuan.Status.DISETUJUI
    self.pengajuan.save(update_fields=["status"])
    service = self._build_service()

    with self.assertRaises(ValidationError) as ctx:
        service.update_status(self.pengajuan, Pengajuan.Status.MENUNGGU)

    self.assertIn("status", ctx.exception.detail)
    self.assertEqual(len(self.fake_notifier.calls), 0)

A total of 5 new CFG tests for illegal transitions were added along with annotations on the existing tests.

Mutation Testing for Developer Confidence

We realized that 100% line coverage can be a vanity metric when the assertions exercising that code are weak. To make our TDD process more rigorous and more useful to developers, we integrated mutation testing.

Mutation testing deliberately inserts small, artificial bugs (mutants) into the original code. If a mutant survives the test suite, some behavior is executed but never actually checked by an assertion. Hunting down surviving mutants showed us exactly which assertions needed strengthening.

A Note on Engineering Pragmatism: Setting the 80% Threshold

We deliberately set an 80% mutation score as the baseline threshold for this project. A 100% score is ideal, but practical software engineering requires balancing test confidence with delivery speed. A score above 80% gives us strong confidence that our core business logic and API contracts are well protected.
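In Stryker, an "80% threshold" can mean two different things. The `thresholds` option has `high`, `low`, and `break` values. `high` and `low` only affect how the report is colored, while a score below `break` makes the run exit with a non-zero code. The Python function below is a minimal sketch of that interpretation, not Stryker code, and the band names are our own labels. The project values (high 80, break 70) come from the metrics table. `low` defaults to 60 here, an assumption based on Stryker's documented default.

```python
from typing import Optional, Tuple


def stryker_thresholds(
    score: float, high: float = 80.0, low: float = 60.0, break_at: Optional[float] = None
) -> Tuple[str, bool]:
    """Sketch of how Stryker interprets `thresholds`.
    Returns (report band, whether the run fails). `break` is a Python keyword, hence break_at."""
    if score >= high:
        band = "high"      # shown green in the report
    elif score >= low:
        band = "warning"   # shown orange
    else:
        band = "danger"    # shown red
    fails_run = break_at is not None and score < break_at
    return band, fails_run


if __name__ == "__main__":
    print(stryker_thresholds(80.29, high=80, break_at=70))  # ('high', False)
    print(stryker_thresholds(65.0, high=80, break_at=70))   # ('warning', True)
```

With high 80 and break 70, a score of 75% would still pass the run but be flagged as a warning. Only scores below 70% actually fail it.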

We carefully review the remaining survivors and accept them only when they are equivalent, non-critical, or have low business impact. For example, Stryker generated a mutant that removed the .trim() call from our empty-field validation (formData.email.trim() !== "").

Although this is technically a logic change, writing a dedicated test that enters only blank spaces (" ") just to kill this mutant adds little value. Such input is still rejected by the subsequent regex check and by the stricter backend validation. Accepting survivors like this one keeps the test suite effective without over-engineering it.
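The argument for accepting the .trim() survivor can be checked with a small model. The Python sketch below mirrors the frontend logic under explicit assumptions. `EMAIL_PATTERN` is a hypothetical stand-in for the form's real regex, which this post does not show. `form_accepts` is a simplified model of the validation order. Both the original and the mutant reject whitespace-only input. They can differ only in which error message the user sees, which is why the mutant is low-impact rather than strictly equivalent.

```python
import re
from typing import Callable

# Hypothetical stand-in for the form's email regex (not the project's actual pattern).
EMAIL_PATTERN = re.compile(r"[^\s@]+@[^\s@]+\.[^\s@]+")


def is_filled(value: str) -> bool:
    return value.strip() != ""  # original: formData.email.trim() !== ""


def is_filled_mutant(value: str) -> bool:
    return value != ""          # Stryker mutant: .trim() removed


def form_accepts(email: str, filled_check: Callable[[str], bool]) -> bool:
    """Simplified model: required-field check first, then the format check."""
    return filled_check(email) and EMAIL_PATTERN.fullmatch(email) is not None


if __name__ == "__main__":
    for email in ["   ", "kaprodi@univ.ac.id"]:
        print(repr(email), form_accepts(email, is_filled), form_accepts(email, is_filled_mutant))
```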

A. Backend Side Using mutmut

To test our backend, we used mutmut. Initially, the tool generated 40 mutants across our codebase. Our original test suite managed to kill 31 of them, leaving 9 survivors and resulting in a baseline mutation score of 77.5%.

Analyzing these survivors showed that our assertions were not strict enough. For example, one mutant survived by renaming the response dictionary key "detail" to "XXdetailXX". We killed it by updating the "Submission not found" test to explicitly require self.assertIn("detail", response.data). After these targeted improvements, the suite killed 33 mutants, reducing the survivors to 7 and raising the mutation score to 82.5%.
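Why did the old assertion let the "XXdetailXX" mutant through? The Python sketch below models the situation without Django. `not_found_response` is a hypothetical stand-in for the view, and the 404 status and message text are assumptions. A status-only check passes for both the original and the mutant, so the mutant survives. Requiring the "detail" key, as the updated test does, kills it.

```python
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str]]


def not_found_response(mutated: bool = False) -> Response:
    # mutmut's string mutation wraps the literal: "detail" -> "XXdetailXX".
    key = "XXdetailXX" if mutated else "detail"
    return 404, {key: "Not found."}  # status code and message are assumptions


def status_only_check(response: Response) -> bool:
    status, _ = response
    return status == 404


def detail_key_check(response: Response) -> bool:
    # Mirrors self.assertIn("detail", response.data) from the updated test.
    status, body = response
    return status == 404 and "detail" in body


if __name__ == "__main__":
    original, mutant = not_found_response(), not_found_response(mutated=True)
    print("original passes both:", status_only_check(original) and detail_key_check(original))
    print("status-only kills mutant:", not status_only_check(mutant))
    print("detail-key kills mutant:", not detail_key_check(mutant))
```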

B. Frontend Side Using Stryker Mutator

On the frontend side, we used Stryker Mutator. Because of the more complex UI logic, it generated 280 mutants. The test suite killed 223 of them and 55 survived, and Stryker reported a mutation score of 80.29%, just above our 80% target. This figure is not simply 223/280 (79.64%). Stryker counts timed-out mutants as detected and excludes invalid mutants (compile or runtime errors) from the denominator, which presumably accounts for the remaining 2 mutants.
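Stryker's score is computed over valid mutants only. Detected mutants (killed plus timed out) are divided by detected plus undetected mutants (survived plus no coverage), and compile and runtime errors are left out. The sketch below implements that formula in Python; the function is ours, not a Stryker API. The naive 223/280 gives 79.64%. The second call uses a hypothetical split of the 2 unaccounted-for mutants (1 timeout, 1 compile error). That split is one combination consistent with the reported 80.29%, not a figure from the actual report.

```python
def stryker_mutation_score(killed: int, timeout: int, survived: int, no_coverage: int) -> float:
    """Stryker-style score in percent. Invalid mutants (compile/runtime errors)
    are not passed in, so they drop out of the denominator."""
    detected = killed + timeout
    valid = detected + survived + no_coverage
    return 100.0 * detected / valid


if __name__ == "__main__":
    naive = 100.0 * 223 / 280
    print(round(naive, 2))  # 79.64
    # Hypothetical breakdown, for illustration only: 1 timeout, 1 compile error.
    print(round(stryker_mutation_score(223, 1, 55, 0), 2))  # 80.29
```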

Benefits, Concrete Data, and Best Practices

Quantitative Data

Metric	Before	After
Backend mutation score with mutmut	31/40 killed, 9 survived = 77.5%	33/40 killed, 7 survived = 82.5%
Frontend mutation score with Stryker	no baseline score reported (80% target)	223/280 killed, 55 survived; reported score 80.29%, above the 80% `high` threshold
mutmut scope (paths_to_mutate)	4 paths (views only)	9 paths (views, services, serializers)
Stryker threshold FE	none	break 70, high 80
Documented ISP partitions (BE)	~5 (implicit)	28 explicit and annotated
CFG paths covered (StatusChangeService)	2 to 3 legal	4 legal and 5 illegal

Connection to Literature

ISP by Ammann & Offutt (2016)
Ammann and Offutt define Input Space Partitioning as dividing an input domain into disjoint blocks that together cover the whole domain. Their coverage criteria require each block to be represented by at least one test. Base Choice Coverage, which I used, is one of the criteria they present for balancing coverage against the number of tests.

Mutation Testing by Jia & Harman (2011)
The survey by Jia and Harman shows that the mutation score is a more reliable predictor of test suite quality than statement coverage. They also documented that equivalent mutants and low-value survivors are major practical challenges. In this project, I reviewed surviving mutants instead of blindly chasing a 100% score; for example, the Stryker .trim() survivor was accepted because stricter backend validation already rejects the same invalid input class.

Petrović et al. (ICSE 2021) on Mutation Testing
This paper reports results from deploying mutation testing at scale at Google. Developers who receive mutation testing feedback tend to write more and better tests over time. The Stryker threshold follows this principle by turning the mutation score into an explicit project-level quality threshold instead of leaving it as an optional report.

Meszaros, xUnit Test Patterns (2007)
Test smells such as Assertion Roulette (several assertions in one test, making it hard to tell which one failed) and Obscure Test (tests that are difficult to understand) are anti-patterns I avoid. Every new test checks a single behavior with a small number of focused assertions and carries an ISP or CFG comment.

Google Testing Blog on Mutation Testing at Google (2018)
Google recommends focusing on killed mutants per time rather than the raw mutation score. This means prioritizing mutants in frequently changing code, which aligns with the auth and pengajuan features in this sprint.

Best Practices Followed

Each-choice minimum with base-choice for crucial parameters (Ammann & Offutt recommendation)
Mutation score as a project quality threshold rather than a vanity metric (Google engineering practice)
Test isolation via override_settings and locmem cache for rate limiter tests to avoid flakiness
Annotation-based traceability (# ISP, # CFG) to allow coverage auditing without reading all the code
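Annotation-based traceability pays off when the annotations are machine-readable. The sketch below is a minimal Python audit that assumes the comment formats used in this post's examples: `# ISP: <characteristic>.<partition>`, `# CFG: <from>→<to>`, and `// ISP: ...` in TypeScript. The colon is treated as optional. The script and its names are hypothetical, not part of the repository.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches "# ISP: Email.whitespace_only", "// ISP: ApiError.status.429", "# CFG: MENUNGGU→MENUNGGU".
ANNOTATION = re.compile(r"(?:#|//)\s*(ISP|CFG):?\s*(\S+)")


def collect_annotations(source: str) -> List[Tuple[str, str]]:
    """Return (kind, tag) pairs for every ISP/CFG annotation in a file's text."""
    return ANNOTATION.findall(source)


def summarize(sources: List[str]) -> Counter:
    """Count annotations per kind across several files."""
    return Counter(kind for text in sources for kind, _ in collect_annotations(text))


if __name__ == "__main__":
    backend = """
def test_serializer_rejects_whitespace_only_email(self):
    # ISP: Email.whitespace_only
def test_illegal_transition_menunggu_to_menunggu_raises_validation_error(self):
    # CFG: MENUNGGU→MENUNGGU (self-loop)
"""
    frontend = "// ISP: ApiError.status.429 - rate limited response shows formatted body"
    print(collect_annotations(backend))
    print(summarize([backend, frontend]))
```

Run over the real test directories, a report like this makes it easy to spot partitions listed in the ISP tables that no test claims.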

Critique of Previous Testing and Measurable Improvements

Anti-Patterns Found in the Old Test Suite

Anti-Pattern 1: Happy-Path-Only Register Serializer Test

Location: authentication/tests/test_register_serializers.py (before this sprint)

Problem: The registration test only verified that valid data passed the serializer. There were no tests for the following scenarios:

Passwords with a length of exactly 7 (boundary condition that should fail) versus exactly 8 (should pass)
Emails that are already registered but not yet activated (which behave differently from active ones)
The ADMIN role, which should not be able to self-register

Why this is weak: A mutant changing len(password) < 8 to len(password) <= 8 or len(password) < 7 would survive, because no test could distinguish the difference. This is a classic Boundary Value Analysis gap, as described in Myers' The Art of Software Testing (1979).

Fix: Added test_serializer_accepts_password_exactly_8_chars, test_serializer_rejects_password_exactly_7_chars, test_serializer_rejects_email_already_registered_inactive, and test_serializer_rejects_admin_registration with # ISP annotations for traceability.
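The boundary argument can be checked mechanically. The Python sketch below models only the length rule, not the full RegisterSerializer. Each test is a (password, expected_accepted) pair, using the passwords from the tests above. A mutant is killed when at least one test's expectation fails against it. The 8-character test kills the `<= 8` mutant, and the 7-character test kills the `< 7` mutant.

```python
from typing import Callable, Dict, List, Tuple

Validator = Callable[[str], bool]  # returns True when the password is accepted
Suite = List[Tuple[str, bool]]     # (password, expected_accepted)

VARIANTS: Dict[str, Validator] = {
    "original (len < 8 rejected)": lambda pw: not (len(pw) < 8),
    "mutant (len <= 8 rejected)": lambda pw: not (len(pw) <= 8),
    "mutant (len < 7 rejected)": lambda pw: not (len(pw) < 7),
}

HAPPY_PATH_ONLY: Suite = [("Password123!", True)]
WITH_BOUNDARIES: Suite = HAPPY_PATH_ONLY + [("Passw0rd", True), ("Pas0rd!", False)]


def is_killed(validator: Validator, suite: Suite) -> bool:
    """A variant is killed if any test's expectation does not hold for it."""
    return any(validator(pw) != expected for pw, expected in suite)


if __name__ == "__main__":
    for name, validator in VARIANTS.items():
        print(f"{name}: killed by happy path={is_killed(validator, HAPPY_PATH_ONLY)}, "
              f"with boundaries={is_killed(validator, WITH_BOUNDARIES)}")
```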

Anti-Pattern 2: StatusChangeService Did Not Test Illegal Transitions

Location: pengajuan/tests/test_services.py (before this sprint)

Problem: There were only tests for legal transitions (MENUNGGU to DISETUJUI, and MENUNGGU to DITOLAK). No test verified that backward transitions (DISETUJUI to MENUNGGU) or self-loops (MENUNGGU to MENUNGGU) raise an exception.

Why this is weak: No test ever took the illegal branch of _validate_transition, so a mutant that disables the check (for example, by deleting the raise) would survive. A state machine that is not tested exhaustively could allow status transitions that corrupt data integrity.

Following the principles from Meszaros (xUnit Test Patterns), tests must verify error behavior just as rigorously as happy-path behavior.

Fix: Added 5 CFG tests for illegal transitions, each asserting that a ValidationError is raised. The examples above also check that no notification is sent.
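The same kill check applies to the state machine. In the Python sketch below, `validate` models the membership check and `validate_raise_deleted` models a mutant with the raise removed. Both are simplified stand-ins that raise ValueError instead of ValidationError. The transition set is an illustrative subset (only the two legal transitions named above). A legal-only suite cannot tell the two functions apart, but a single illegal-transition test kills the mutant.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
Suite = List[Tuple[Pair, bool]]  # (transition, should_raise)
Check = Callable[[str, str], None]

ALLOWED = {("MENUNGGU", "DISETUJUI"), ("MENUNGGU", "DITOLAK")}  # illustrative subset


def validate(previous: str, new: str) -> None:
    if (previous, new) not in ALLOWED:
        raise ValueError(f"{previous} -> {new} is not allowed")


def validate_raise_deleted(previous: str, new: str) -> None:
    """Mutant: the raise statement is gone, so every transition is accepted."""
    return None


def passes(check: Check, pair: Pair, should_raise: bool) -> bool:
    try:
        check(*pair)
    except ValueError:
        return should_raise
    return not should_raise


def survives(check: Check, suite: Suite) -> bool:
    return all(passes(check, pair, should_raise) for pair, should_raise in suite)


LEGAL_ONLY: Suite = [(("MENUNGGU", "DISETUJUI"), False), (("MENUNGGU", "DITOLAK"), False)]
WITH_ILLEGAL: Suite = LEGAL_ONLY + [(("MENUNGGU", "MENUNGGU"), True), (("DISETUJUI", "MENUNGGU"), True)]

if __name__ == "__main__":
    print("mutant survives legal-only suite:", survives(validate_raise_deleted, LEGAL_ONLY))
    print("mutant survives with illegal tests:", survives(validate_raise_deleted, WITH_ILLEGAL))
```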

Anti-Pattern 3: Frontend Tests Did Not Cover API Error Variants

Location: tests/features/authentication/components/RegisterForm.test.tsx

Problem: The registration form tests only mocked the success scenario (201) and generic errors. There were no tests for the following situations:

HTTP 429 rate limit response, which should display a throttling message
HTTP 400 with field-specific errors, which should map to the correct fields

Why this is weak: A Stryker mutant changing the HTTP status check condition would survive. The tests also suffered from over-mocking: the mocks were so generic that they never exercised the component's real branch logic.

Fix: Added the RegisterForm 429 rate-limit test with the annotation // ISP: ApiError.status.429.

Measurable Improvements (Before and After)

Dimension	Before	After	Delta
Annotated ISP partitions	0 (implicit)	28 (explicit)	+28
Covered CFG illegal transitions	0	5	+5
Mutmut scope (auth and pengajuan)	0 paths	5 new paths	baseline capture enabled
Backend mutation score	77.5%	82.5%	+5 percentage points
Frontend mutation score	80% target	80.29%	above the 80% high threshold (break is 70%)
Stryker threshold FE	none	break 70, high 80	configured project threshold

Connection to Industry Standards

Google researchers (Petrović et al., ICSE 2021) found that mutation testing is most effective when integrated into the developer workflow as automated feedback rather than just a final report. The Stryker threshold I set implements this pattern by giving the team a concrete mutation-score threshold to enforce when mutation testing is run.

The Stryker Mutator whitepaper (2023) recommends setting coverageAnalysis to "perTest" (already active in our config), so that each mutant is run only against the tests that cover it. This mainly reduces execution time. Mutants that no test covers are reported as having no coverage instead of being run against the whole suite.

Commit Links

be-gbm MR !158

Commit	Message
`b7564fc5`	`chore(testing): expand mutmut scope to authentication and pengajuan services`
`0c774595`	`[GREEN] test(auth): add ISP partitions for register serializer, activation, and admin verification`
`a2def037`	`[GREEN] test(pengajuan): add CFG prime path coverage for StatusChangeService state machine`

MR Link: https://gitlab.cs.ui.ac.id/ppl-fasilkom-ui/2026/kelas-d/group1-gb/be-gbm/-/merge_requests/158

fe-gbm MR !142

Commit	Message
`f18590fe`	`chore(testing): enforce Stryker mutation threshold at 80% high and 70% break`
`42d9b028`	`[GREEN] test(auth): add ISP partitions for RegisterForm and useActivation hook`
`19328728`	`[GREEN] test(pengajuan): add CFG branch coverage for useUpdateStatusPengajuan`

MR Link: https://gitlab.cs.ui.ac.id/ppl-fasilkom-ui/2026/kelas-d/group1-gb/fe-gbm/-/merge_requests/142

References

Ammann, P. & Offutt, J. (2016). Introduction to Software Testing (2nd ed.). Cambridge University Press. (ISP ch.6, Graph Coverage ch.7).
Jia, Y. & Harman, M. (2011). "An Analysis and Survey of the Development of Mutation Testing." IEEE Transactions on Software Engineering, 37(5), 649-678.
Petrović, G., Ivanković, M., Fraser, G., & Just, R. (2021). "Does Mutation Testing Improve Testing Practices?" ICSE 2021.
Petrović, G. & Ivanković, M. (2018). "State of Mutation Testing at Google." Google Testing Blog.
Meszaros, G. (2007). xUnit Test Patterns: Refactoring Test Code. Addison-Wesley.
Stryker Mutator. (2023). "Mutation Testing in Practice." stryker-mutator.io.
Myers, G.J. (1979). The Art of Software Testing. Wiley. (Boundary Value Analysis).

DEV Community