Model-driven telemetry

This is still WIP...

Summary: The team I'm on at work has committed to adopting xState as our state management library. The state machine paradigm aligns well with our app, or really with any Prev ↔︎ Next data model.

An interesting advantage of having xState model the state of an app is that it exposes an opportunity to tightly couple events with that app’s telemetry, meaning the logging/tracking/reporting around those events. In addition, with the visual representation of a state machine - something xState provides out-of-the-box - we can easily see where in our app we have telemetry coverage.

Background

“The only thing that should happen inside event listeners is sending an event”

– David Khourshid, author of xState

One of the many patterns that make our codebase hard to hold in your head is the littering of arbitrary logging calls throughout it. Sometimes we’re firing these calls inside click-event handlers…

onClickHandler = async () => {
    try {
        setErrorMessage('')
        Bugsnag.leaveBreadcrumb('Auth.resendSignUp', undefined, 'request')

…where they don’t belong.

Anything that takes place as a consequence of UI code should be delegated to the parts of your app with “brains”. In this case, state. In xState:

pauseButton.onClick = (e: MouseEvent) => {
  xStateService.send({ type: 'PAUSE' })
}

Sometimes the calls are made based on inline conditionals, and that further obfuscates things:

const onClick = (isLoading: boolean) => {
    if (isLoading) {
        // ... Analytics.record(...) fires only on this branch
    }
}

Sometimes they’re called from "upstream" locations (business logic-y code), or from "helpers" (utils). This “sprinkle” technique is super confusing and leads to both redundancy and poor coverage. You’ll find cases where there are multiple calls to Braze, Pinpoint, what-have-you for the same thing: a user clicks to submit a form, so in the click handler we tell Pinpoint that the user is trying to sign up, with the form data as part of the payload. Upstream, at the API call, we tell Pinpoint again that the user is trying to sign up, this time with the JSON body of the POST - which is the same content we just sent from the UI. I believe some of these 3rd-party services rate-limit. Even if they don’t, we may well partner with one in the future that does, and duplicate calls would be an unnecessary expense.

Another common transgression is logging by way of useEffect, which results in 11 useEffects existing in the same component because, well, we have 11 things to track/log. (I’m intentionally setting aside, for the time being, the Next.js-imposed requirement of asserting where the code is running.)

useEffect(() => {
    Bugsnag.leaveBreadcrumb('post-verification target', state, 'state')
    void Analytics.record({
        name: PostVerificationAnalyticsIds.ACTION_TARGET_REPORT,
        attributes: state,
    })
}, [state])

All of this makes it very difficult to reason about what’s happening when, and in what order, inside the component (as is the case whenever useEffect is overused, for whatever reason).

The last and perhaps most important thing to call out is that laying out our logging (telemetry) calls in this manner gives us only fragile confidence that we do, in fact, have a proper understanding of the health and efficacy of the Enroll app.

Possible Answer?

We can greatly boost our telemetry confidence by getting all of our telemetry “in the same room” so that, at a glance, we can see what we’re capturing and when. Instead of having to (literally) follow the path of a DOM event to see what we’re capturing, what if we could look at a visual representation of the application’s state and see that “oh yeah, we’re covering that”? Turns out we can!
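To make “in the same room” a bit more concrete, here is a rough sketch of the idea, not what we actually ship. It assumes a made-up sign-up flow: the event names, the tracking names, and the uncoveredEvents helper are all hypothetical. The point is only that once every telemetry rule lives in one table keyed by event, a coverage gap becomes a lookup instead of a code hunt.

```typescript
// Hypothetical event names for a sign-up flow; none of these come from the Enroll codebase.
type SignUpEvent = 'SUBMIT' | 'RESEND_CODE' | 'VERIFY' | 'CANCEL';

// Every event the (imaginary) machine can receive.
const allEvents: SignUpEvent[] = ['SUBMIT', 'RESEND_CODE', 'VERIFY', 'CANCEL'];

// The single "room": every telemetry rule lives in this one table.
// The string values are placeholder tracking names, not real analytics IDs.
const telemetryByEvent: Partial<Record<SignUpEvent, string>> = {
  SUBMIT: 'sign_up_submitted',
  RESEND_CODE: 'sign_up_code_resent',
  VERIFY: 'sign_up_verified',
};

// Returns the events that nothing in the table tracks.
function uncoveredEvents(
  events: SignUpEvent[],
  table: Partial<Record<SignUpEvent, string>>,
): SignUpEvent[] {
  return events.filter((event) => table[event] === undefined);
}

// In this made-up table, only CANCEL has no telemetry attached.
console.log(uncoveredEvents(allEvents, telemetryByEvent));
```

A visualizer gives you the same answer as a picture; this is just the plain-data version of that glance.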

Event based telemetry

The xState documentation hints at the idea of centering telemetry around application state, though it doesn’t discuss it formally. More specifically, it’s centered around the events that determine that state. It makes sense if you think about it. We don’t really give a shit that we’ve entered some function call. Why would we record that? Errors need to be caught and UI efficacy needs to be recorded, but when we talk about tracing a user of our app through the process, we should be doing so from the perspective of the user experience, and that means coupling telemetry to events and state.
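As a minimal sketch of what “coupling telemetry to events and state” could look like, here is a hand-rolled stand-in for a state machine. To be clear about the assumptions: this is not xState’s API, and the states, events, createService, and the captured array are all invented for illustration. The only thing it is meant to show is that the send function is the one place a telemetry record gets produced, so UI handlers never call a logger directly.

```typescript
// A hand-rolled stand-in for a state machine; this is NOT xState's API.
// All state and event names below are hypothetical.
type FlowState = 'idle' | 'submitting' | 'verified' | 'failed';
type FlowEvent =
  | { type: 'SUBMIT' }
  | { type: 'SUCCESS' }
  | { type: 'FAILURE'; reason: string };

// Transition table: which event moves which state where.
const transitions: Record<FlowState, Partial<Record<FlowEvent['type'], FlowState>>> = {
  idle: { SUBMIT: 'submitting' },
  submitting: { SUCCESS: 'verified', FAILURE: 'failed' },
  verified: {},
  failed: { SUBMIT: 'submitting' },
};

type TelemetryRecord = { event: FlowEvent['type']; from: FlowState; to: FlowState };
type TelemetrySink = (record: TelemetryRecord) => void;

// The service owns the state and reports exactly once per accepted event.
function createService(initial: FlowState, report: TelemetrySink) {
  let current: FlowState = initial;
  return {
    get state(): FlowState {
      return current;
    },
    send(event: FlowEvent): FlowState {
      const next = transitions[current][event.type];
      // Event is not valid in the current state: ignore it and report nothing.
      if (next === undefined) return current;
      report({ event: event.type, from: current, to: next });
      current = next;
      return current;
    },
  };
}

// Stand-in sink that just collects records; a real one would forward
// to whatever logging or analytics service is in use.
const captured: TelemetryRecord[] = [];
const service = createService('idle', (record) => {
  captured.push(record);
});

service.send({ type: 'SUBMIT' });
service.send({ type: 'SUCCESS' });
service.send({ type: 'SUBMIT' }); // ignored: 'verified' has no SUBMIT transition
console.log(captured); // two records, one per accepted event
```

Because the sink is a single function, the duplicate “user is trying to sign up” reports described earlier have nowhere to come from: one accepted event, one record.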

Implementation

The way I’ve laid this out for the survey in Enroll 1.5 is as follows. xState…
