DEV Community

Karthik Gundu

Building an A/B Testing Prototype in RUXAILAB (Without Breaking the System)

There’s a big difference between building a feature in isolation and adding one to a living product.

This A/B testing system started as a fairly straightforward idea: let researchers define experiments, assign participants to variants deterministically, log behavior, and visualize basic results. On paper, that sounds clean. In practice, I was integrating it into an existing Vue 3 + Vuex + Firebase application called RUXAILAB, with its own routing conventions, store patterns, Firebase setup, and a very real history of “this already works, don’t accidentally break it.”

That changed the nature of the task completely.

The goal wasn’t just to make an A/B testing demo. It was to build an MVP that felt native to the current system, respected the architecture already in place, and could survive the usual realities of frontend reactivity, Firebase emulators, callable functions, and local development drift. By the end, the feature worked end to end: create experiment, assign user, log event, view dashboard. But getting there was much more about integration and debugging than just writing new code.

Understanding the Existing System

Before building anything, I spent time reading the codebase to understand how RUXAILAB was structured.

The application already had a fairly established shape. The frontend used Vue 3 with Vuex for state management, Vue Router for navigation, and Firebase for backend infrastructure. On the backend side, Firebase wasn’t just used as a database. It was doing a lot of platform work already: Firestore for persistence, Cloud Functions for server-side logic, and Auth for identity.

The first interesting thing I noticed was that the project already had an experiments area in the codebase. It wasn’t fully aligned with the MVP I needed to build, but it was enough to tell me that the right move was not to invent a second parallel system. That’s always a trap in mature codebases. You think you’re moving fast by starting fresh, but what you’re actually doing is creating a future cleanup problem.

So instead of replacing everything, I treated the existing project structure as a constraint and an advantage. I kept the experiments feature inside the established frontend pattern, reused the Vuex registration approach already present in the root store, and integrated with the project’s existing Firebase callable function style instead of introducing a separate API layer.

That early decision saved a lot of pain later. The work became less about “how do I build A/B testing?” and more about “how do I make A/B testing feel like it belongs here?”

Designing the A/B Testing System

At a high level, the system had four responsibilities.

First, experiments needed to be configurable. A researcher should be able to define a study ID, specify variants, set allocation weights, and save the experiment.

Second, assignment needed to be deterministic. If the same participant revisited the same experiment, they should land in the same variant every time. That immediately ruled out anything random at render time. Variant selection had to be stable and backend-backed.

Third, events needed to be recorded in a way that could support analytics later. For the MVP, that meant a simple event collection with variant, metric, value, and timestamp.

Fourth, the system needed a minimal analytics surface. Not a full statistical engine yet, but enough to verify that the experiment was alive and behaving correctly: how many users got each variant, and how many events each variant generated.
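To make the first responsibility concrete, here is a rough sketch of what an experiment document covering those fields might look like, along with the kind of sanity check a service layer could run before saving. The field names are illustrative, not the exact RUXAILAB schema.

```javascript
// Hypothetical shape of an experiment document in Firestore.
const experiment = {
  studyId: "landing-cta-test",
  variants: [
    { id: "control", name: "Original CTA" },
    { id: "variant-b", name: "New CTA copy" },
  ],
  // Allocation weights per variant ID; they should sum to 1.
  allocation: { control: 0.5, "variant-b": 0.5 },
  createdAt: Date.now(),
};

// A quick validation pass a service layer might run before persisting.
function isValidExperiment(exp) {
  const total = Object.values(exp.allocation).reduce((a, b) => a + b, 0);
  return (
    exp.variants.length >= 2 &&
    exp.variants.every((v) => exp.allocation[v.id] !== undefined) &&
    Math.abs(total - 1) < 1e-9
  );
}
```

Checking that the weights sum to one up front avoids silently skewed traffic splits later.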

That led to a design with a fairly clean split:

  • A frontend experiments module to handle UI, routing, and Vuex state
  • A Firestore service layer for persistence concerns
  • A controller layer to coordinate Firestore and callable functions
  • Firebase callable functions for assignment and aggregation
  • A small Python analysis stub to leave room for future statistical work

One design decision that mattered a lot was keeping the assignment logic server-driven. It would have been easy to hash on the client and just write the result to Firestore, but that would have made the experiment contract much weaker. By putting assignment in a callable function, I kept the logic centralized and deterministic from one source of truth.

At the same time, I added a Firestore fallback path on the client for local development. That was not the original plan, but it became necessary after hitting emulator issues. In the end, it made the system more resilient.
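The fallback path follows a simple try-the-callable-first pattern. This is a sketch of the shape, not the actual implementation; `callAssignVariant` and `assignViaFirestore` are hypothetical stand-ins for the real callable invocation and the client-side Firestore path.

```javascript
// "Callable first, Firestore fallback" pattern for local resilience.
async function assignWithFallback(callAssignVariant, assignViaFirestore, payload) {
  try {
    return await callAssignVariant(payload);
  } catch (err) {
    // In local development the Functions emulator may be unavailable;
    // fall back to a direct Firestore-based assignment instead of failing.
    return await assignViaFirestore(payload);
  }
}
```

The key property is that the happy path in production is unchanged; the fallback only activates when the callable itself fails.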

Implementation Details

The Experiment Module

I created a dedicated feature module under src/features/experiments/ and split it into the usual layers: components, views, controllers, store, and services.

The UI was intentionally minimal. One view handled experiment creation and dashboard display. Another view acted as the experiment study route, where a participant is assigned a variant and sees variant-specific content. I didn’t want to overdesign the interface because the point of this MVP was the experimentation flow, not a polished analytics product.

The Vuex module became the backbone of the feature. I modeled three pieces of state:

  • experiments
  • assignments
  • summaries

That was a meaningful improvement over the earlier shape because it matched the actual domain more closely. Instead of a generic “current assignment” or a single shared metrics blob, the store now tracked assignments per experiment and summaries keyed by experiment ID. That made the feature far easier to reason about once multiple experiments were in play.
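As a minimal sketch, the state shape described above might look like the following. It is written as a plain Vuex-style module object so the domain shape is easy to see; the names are illustrative rather than the exact store code.

```javascript
// Sketch of the experiments Vuex module: state keyed per experiment ID.
const experimentsModule = {
  namespaced: true,
  state: () => ({
    experiments: [], // experiment documents loaded for the researcher
    assignments: {}, // experimentId -> { userId, variantId }
    summaries: {},   // experimentId -> aggregated dashboard data
  }),
  mutations: {
    SET_EXPERIMENTS(state, experiments) {
      state.experiments = experiments;
    },
    SET_ASSIGNMENT(state, { experimentId, assignment }) {
      state.assignments[experimentId] = assignment;
    },
    SET_SUMMARY(state, { experimentId, summary }) {
      state.summaries[experimentId] = summary;
    },
  },
};
```

Keying `assignments` and `summaries` by experiment ID is what keeps multiple concurrent experiments from stepping on each other's state.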

Deterministic Assignment

The assignment flow was implemented with a Firebase callable function named assignVariant.

The logic was straightforward conceptually: if an assignment already existed for a user and experiment, return it. Otherwise, hash the combination of userId and experimentId, convert that hash into a bucket, and walk through the experiment’s allocation distribution until a variant is selected.

What mattered here was not the math itself, but the guarantee: the same input pair always yields the same output.

That guarantee is what makes A/B testing trustworthy. Without it, returning participants can drift between variants, and the experiment stops being an experiment.
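The bucketing logic can be sketched roughly like this, assuming a simple FNV-1a string hash. The real assignVariant callable may hash differently; the point is only that the same (userId, experimentId) pair always lands in the same bucket.

```javascript
// FNV-1a hash of a string, returned as an unsigned 32-bit integer.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Deterministic variant selection: hash -> bucket in [0, 1] -> walk weights.
function assignVariant(userId, experimentId, variants) {
  // variants: [{ id, weight }], weights summing to 1
  const bucket = fnv1a(`${userId}:${experimentId}`) / 0xffffffff;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v.id;
  }
  return variants[variants.length - 1].id; // guard against float drift
}
```

Because the hash input combines user and experiment, one user can land in different variants across different experiments while staying stable within each one.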

For the participant side of the prototype, I used a lightweight mock user ID persisted in local storage. That kept the flow simple while still preserving deterministic assignment behavior.
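A helper along these lines covers the mock ID. The storage object is injected so the function works with `window.localStorage` in the browser but stays testable elsewhere; the key name is hypothetical.

```javascript
// Read-or-create a mock participant ID from a localStorage-like object.
function getMockUserId(storage) {
  const KEY = "ab_mock_user_id"; // hypothetical storage key
  let id = storage.getItem(KEY);
  if (!id) {
    id = `user_${Math.random().toString(36).slice(2, 10)}`;
    storage.setItem(KEY, id);
  }
  return id;
}
```

In the app this would be called as `getMockUserId(window.localStorage)`, so repeat visits reuse the same ID and deterministic assignment holds.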

Event Logging

Event logging was built as another callable-backed operation: logEvent.

The event model was intentionally small. Each event stores the experiment ID, variant ID, metric name, value, and timestamp. For the MVP, that was enough to track things like study start, CTA interaction, and task completion.
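A sketch of that event shape, with the kind of minimal validation a logEvent path might do. Field names are illustrative, and the real callable may validate more strictly.

```javascript
// Build a minimal experiment event document from the fields described above.
function buildEvent({ experimentId, variantId, metric, value }) {
  if (!experimentId || !variantId || !metric) {
    throw new Error("experimentId, variantId and metric are required");
  }
  return {
    experimentId,
    variantId,
    metric,            // e.g. "study_start", "cta_click", "task_complete"
    value: value ?? 1, // default to a simple count
    timestamp: Date.now(), // server-side this would be a server timestamp
  };
}
```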

I wanted event logging to be extremely boring from an engineering perspective. That’s a compliment. Analytics systems get dangerous when they become too clever too early. For an MVP, boring is good. Predictable writes, simple fields, easy querying.

The participant study route logs key interactions, and the dashboard rehydrates the aggregated view from stored events.

Dashboard

The dashboard was designed to answer the immediate “is this experiment doing what I think it’s doing?” questions.

For each variant, it shows traffic allocation, assignment counts, total event counts, and metric breakdowns. I also added a simple chart so you can see assignment and event distribution at a glance.
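The aggregation behind those numbers is straightforward. This is a rough sketch of the per-variant rollup, assuming events shaped like the model above, not the exact dashboard code.

```javascript
// Summarize raw event documents into per-variant totals and metric counts.
function summarizeEvents(events) {
  const summary = {};
  for (const e of events) {
    const v = (summary[e.variantId] ??= { totalEvents: 0, metrics: {} });
    v.totalEvents += 1;
    v.metrics[e.metric] = (v.metrics[e.metric] || 0) + 1;
  }
  return summary;
}
```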

This wasn’t meant to be a statistical significance engine yet. It was meant to be an operational dashboard for verifying the experiment loop. When you create an experiment, open the study route, trigger events, and then refresh the dashboard, you can see the full flow reflected back.

That’s the point where the system stops being abstract architecture and starts feeling real.


Challenges Faced and How They Were Solved

This part was the most interesting, because most of the actual engineering effort went into fixing integration problems rather than writing greenfield code.

The first major issue was a Vue recursive update bug. The experiment creation form used v-model with a variant editor component, and the syncing logic between parent and child looked harmless at first. But it was emitting fresh arrays and immediately re-consuming them in a watcher cycle, which caused a maximum recursive updates error inside VForm.

That kind of bug is classic Vue pain—nothing looks obviously wrong until the reactivity graph starts feeding itself. The fix was to normalize and compare the variant payload before emitting updates. In other words, only emit when the value had actually changed. Once the loop was broken, the form stopped crashing.
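The shape of the fix can be sketched like this, with `emit` standing in for Vue's emit function. The exact normalization in the real component differs, but the principle is the same: diff before emitting.

```javascript
// Only emit the variant payload when it has actually changed, breaking the
// parent/child feedback loop that caused the recursive update error.
function makeVariantEmitter(emit) {
  let lastSerialized = null;
  return function emitIfChanged(variants) {
    // Normalize to a stable, comparable shape before diffing.
    const normalized = variants.map((v) => ({ id: v.id, weight: v.weight }));
    const serialized = JSON.stringify(normalized);
    if (serialized === lastSerialized) return false; // no-op: loop broken here
    lastSerialized = serialized;
    emit("update:modelValue", normalized);
    return true;
  };
}
```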

The second major issue was the Firebase internal error, and this one turned out to be trickier than it looked. Initially it seemed like the experiments callable itself might be broken. But after tracing the logs, the real problem was broader: the Functions emulator wasn’t booting the codebase correctly at all.

The root cause was an unrelated eager import of nodemailer in the email function module. Since the local functions/node_modules dependencies weren’t installed correctly, the emulator failed while loading the functions bundle. That meant experiment callables never really came online, and from the frontend everything surfaced as a generic internal error.

This is the kind of issue that reminds you why local dev can be deceptively hard. The failure wasn’t in the experiment logic—it was in the boot path of a different part of the backend. I fixed it by making the nodemailer import lazy so the email feature only loads that dependency when it’s actually used. That allowed unrelated functions, including the experiment endpoints, to load normally.
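The lazy-import pattern is worth showing in isolation. This is a generic cached loader rather than the actual email module code, so the sketch runs without nodemailer installed.

```javascript
// Defer a module load until first use and cache the result afterwards.
function lazy(loader) {
  let cached;
  return () => (cached ??= loader());
}

// Hypothetical usage inside the email function module:
//   const getMailer = lazy(() => require("nodemailer"));
//   ...later, only when an email is actually sent:
//   const transport = getMailer().createTransport(config);
```

With this shape, a missing dependency only fails the code path that needs it, instead of taking down the whole functions bundle at boot.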

There was also a region alignment issue. The frontend Firebase Functions client was being initialized without matching the backend configuration. That mismatch can be subtle because the code still looks valid, but requests silently go to the wrong place. I updated the frontend initialization so the client and backend were aligned.

The emulator setup had its own configuration drift too. The local .env file pointed the Functions emulator to one port while the project’s Firebase config expected another. That kind of mismatch is easy to miss and frustrating to debug because nothing in the application code looks wrong. Fixing the .env and .env.example values brought the local environment back into alignment.
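The fix itself is just keeping two files in agreement. Variable names here are hypothetical; the actual project's env keys differ.

```
# firebase.json is the source of truth for the emulator:
#   "emulators": { "functions": { "port": 5001 } }

# .env / .env.example must point the frontend at the same port:
VITE_FUNCTIONS_EMULATOR_PORT=5001
```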

I also spent time making the frontend fail more gracefully. At one point, backend failures were bubbling up as uncaught runtime overlays in the browser. That’s a rough developer experience—and an even worse user experience. I replaced those crash paths with proper error handling and toast notifications so failures become visible but not destructive.

Finally, I had to address a product-level integration issue: A/B testing was still marked as “Coming Soon” in the method selection UI. That meant I had a working experiments module behind the scenes, but the main study creation flow was still routing users away from it. I updated the method selector so A/B testing was enabled and routed into the dedicated experiments area instead of the generic test creation wizard.

That was an important reminder that “the feature works” and “the feature is reachable” are not the same thing.

Final Working Flow

By the end of the work, the feature loop was complete.

A researcher can create an experiment with variants and allocation weights. A participant entering the study route gets assigned deterministically to a variant. Their interactions generate experiment events. The dashboard then reflects assignment distribution and event counts by variant.

That flow sounds simple when written in one paragraph, but getting it stable required touching state management, routing, Firebase Functions, Firestore access patterns, local emulator behavior, and frontend reactivity.

That’s what made the project satisfying. It wasn’t just about making the happy path run once. It was about making the entire path hold together.

Key Learnings

One of the biggest takeaways from this work is that integration complexity is often more important than feature complexity.

None of the individual parts of this system were especially exotic. A form, a hash function, a few Firestore collections, a dashboard, some callable functions. But once those pieces had to live inside an existing product with existing assumptions, every seam mattered.

I also came away reminded that debugging is architecture work. It’s easy to think of debugging as “cleanup after implementation,” but in reality it often reveals whether the system boundaries make sense. The Firestore fallback for local reliability, the cleaner separation between controller and service layers, and the improved error handling all came directly from problems encountered during debugging.

And maybe most importantly, I was reminded that a working MVP is not the same as a disposable prototype. Even when building something minimal, if it’s integrated into a real system, it deserves the same respect you’d give any production feature: clear boundaries, understandable state, graceful failure modes, and a path for future extension.

Conclusion

Building this A/B testing MVP inside RUXAILAB ended up being a lot more than adding an experiments screen.

It was an exercise in reading an existing architecture carefully, extending it without fighting it, and solving the kind of real-world issues that never show up in idealized system diagrams. The final result is a working experimentation flow with deterministic assignment, event tracking, dashboard visibility, and a foundation for future statistical analysis.

But more than that, it now feels like part of the product rather than a feature bolted onto the side.

And to me, that’s usually the difference between code that merely runs and engineering that actually lands.
