DEV Community

Cover image for Perfect Reactive Dependency Tracking
Jin
Jin

Posted on • Originally published at mol.hyoo.ru

Perfect Reactive Dependency Tracking

All states in our application must be interconnected and thus form a connected directed acyclic graph (DAG). Each node in this graph has two types of neighbors:

  • Pub(lisher) - the source of truth on the basis of which this state is built.
  • Sub(scriber) - a dependent state for which the given is the source of truth.

The list of publishers is usually updated entirely. If some node did not appear in the new list, then an unsubscribe should occur. The composition of subscribers changes arbitrarily, but connections between publishers and subscribers are always two-way.

We need a list of publishers so that we know who to unsubscribe from. And a list of subscribers to notify them about changes in a timely manner. These links have other useful uses. In particular, when debugging, you can walk through them and see who really depends on whom.

Fibers

Let's combine the state, the formula for calculating it, lists of neighbors and other meta information into one abstraction: fiber is a memoizable task with reactive revalidation. Naively in code it can be represented like this:

// Meta Size: 216B+
// Edge Cost: 16B
// Allocations: 6
interface Fiber<
    State = unknown,
    Host = unknown,
    Args = unknown[]
> extends Object { // 12B
    host: Host // 4B for calculation context
    task: ( this: Host, args: Args )=> State // 4B for calculation function
    args: Args // 4B + 16B + 8B+ for calculation arguments
    state: State // 4B for calculation result
    cursor: number // 4B for current status or publisher tracking position
    pubs: Set< Fiber > // 4B + 16B + 60B+ for publishers
    subs: Set< Fiber > // 4B + 16B + 60B+ for subscribers
}
Enter fullscreen mode Exit fullscreen mode

But there are the following problems:

  • Consumes a lot of memory.
  • There is a lot of memory allocation going on.
  • Data is highly fragmented in memory.
  • Constant rehashing of internal hash tables occurs.
  • Basic operations, although they usually occur in O(1), but are heavy in themselves.
  • To determine publishers requiring unsubscription, several memory allocations are required.

The most optimal thing is to get rid of hash tables and combine all dynamic data in one single array, which will require only 3 memory allocations (for the object itself, for the static part of the array and dynamic):

// Meta Size: 64B+
// Edge Cost: 16B
// Allocations: 3
interface Fiber<
    State = unknown,
    Host = unknown,
    Args = unknown[]
> extends Object { // 12B
    host: Host // 4B for context
    task: ( this: Host, args: Args )=> State // 4B for calculation
    data: Array< Args[number] | Fiber | number > // 28B+ for arguments, publishers, subscribers
    state: State // 4B for result
    cursor: number // 4B for current status or publisher tracking position
    pub_from: number // 4B for offset of publishers in data
    sub_from: number // 4B for offset of subscribers in data
}
Enter fullscreen mode Exit fullscreen mode

As you can see, we moved all parts that vary in size (arguments, publishers, subscribers) into array elements, leaving only indices in the object fields that divide the array into sections. This allowed us to reduce reactivity overhead by several times.

It would be possible to save even more memory, at the same time reducing the number of memory allocations to 2, if you inherit from a native array rather than aggregate it, but this is somewhat slower, clutters the object with array methods, and is not very convenient for debugging.

Graphically, the structure of connections between two neighbors can be represented by the following example, where fiber A depends on B:

Each link occupies two array elements: the first contains a link to its neighbor, and the second contains the index at which the reverse link is located in this neighbor. Thus, all our connections are not just two-way, but we always know the offsets in the array, which makes it easy to add, remove and move them around the array in O(1).

Note that the subscribers always lie in a continuous, unordered heap at the end. But publishers are always ordered, so they may contain holes. These holes appear from temporary publishers, which self-destruct after calculating the subscriber, so these holes do not grow uncontrollably.

It can also be noted that the same fibers can be linked several times. This happens when the same fiber is accessed several times from another of the same fiber. This could have been avoided, but the benefits gained would not outweigh the costs of deduplication. In any case, we can recommend, if possible, to save the value received from the channel into a local variable, so as not to yank the reactive machinery once again:

title_double() {
    const title = this.title()
    return `${title}:${title}`
}
Enter fullscreen mode Exit fullscreen mode

Publishers

Sometimes we don’t need a whole fiber with its formulas, states, etc., but something simpler. For example, we want to make some arbitrary state observable and that’s it. Let's throw out everything unnecessary so that the minimal node of our reactive graph - the publisher remains:

// Meta Size: 44B
// Edge Cost: 16B
// Allocations: 3
interface Pub extends Object { // 12B 
    data: Array< Args[number] | Fiber | number > // 28B
    sub_from: number // 4B
}
Enter fullscreen mode Exit fullscreen mode

To demonstrate how it works, let's make a regular local variable reactive:

const pub = new $mol_wire_pub

let _counter = 0

const counter = ( next?: number )=> {

    if( next === undefined ) {
        pub.promote()
        return _counter
    }

    if( Object.is( next, _counter ) ) return _counter

    pub.emit()

    return _counter = next
}
Enter fullscreen mode Exit fullscreen mode

Now, if we work with this variable through the channel we created, then:

  • When reading, the current subscriber will automatically subscribe to our publisher.
  • When recording, all subscribers of the publisher will be notified of the change.

Thus, we can make any state reactive, even one that does not belong to us. For example, the current page address:

const pub = new $mol_wire_pub

window.addEventListener( 'popstate', ()=> pub.emit() )
window.addEventListener( 'hashchange', ()=> pub.emit() )

const href = ( next?: string )=> {

    if( next === undefined ) {
        pub.promote()
    } else if( document.location.href !== next ) {
        document.location.href = next
        pub.emit()
    }

    return document.location.href
}
Enter fullscreen mode Exit fullscreen mode

Let's assemble this module into a separate bundle and post it in NPM: mol_wire_pub. It allows you to make any of your structures observable, adding only 1.5KB to the weight of your library.

So, for example, we have our own implementation of CRDT $hyoo_crowd, which can be used the old fashioned way, without reactivity, but in a reactive environment it is automatically becomes observable without any dancing with a tambourine.

Therefore, if you are the author of a library that works with state, then I encourage you to add integration with mol_wire_pub so that your users do not have a headache about tracking changes. To illustrate, let's make an ordinary set observable. To do this, we will inherit from the native one and overload all methods:

namespace $ {
    export class $mol_wire_set< Value > extends Set< Value > {

        pub = new $mol_wire_pub

        // Some accessors

        has( value: Value ) {
            this.pub.promote()
            return super.has( value )
        }

        get size() {
            this.pub.promote()
            return super.size   
        }

        // Some mutators

        add( value: Value ) {
            if( super.has( value ) ) return this
            super.add( value )
            this.pub.emit()
            return this
        }

        delete( value: Value ) {
            const res = super.delete( value )
            if( res ) this.pub.emit()
            return res
        }

    }

}
Enter fullscreen mode Exit fullscreen mode

As you can see, all methods can be divided into two types: reading and changing. In the reading ones we call .promote(), and in the changing ones - .emit(), if the changes actually happened.

We have omitted some methods here. The complete set can be found in the sources $mol_wire_set.

Subscribers

To keep track of observable states, we may need another lightweight abstraction - a subscriber:

// Meta Size: 52B
// Edge Cost: 16B
// Allocations: 3
interface Sub extends Object { // 12B
    data: Array< Args[number] | Fiber | number > // 28B
    sub_from: number // 4B
    pub_from: number // 4B
    cursor: number // 4B
}
Enter fullscreen mode Exit fullscreen mode

Typically, the subscriber itself can act as a publisher, and the logic of two-way links is common, so it makes sense to combine them:

// Meta Size: 52B
// Edge Cost: 16B
// Allocations: 3
interface PubSub extends Pub { // 44B
    pub_from: number // 4B
    cursor: number // 4B
}
Enter fullscreen mode Exit fullscreen mode

Now, let's look at the dependency tracking loop. The idea here is simple:

  • The subscriber places itself in the global channel $mol_wire_auto() and sets its cursor to 0.
  • Then it reaches out to publishers in a roundabout way via .promote().
  • The publisher looks into that global channel and automatically links to the subscriber at its cursor position and increments it.
  • If there is already another publisher in this position, it flies to the end of the list of publishers.
  • If there is already a subscriber in this place, then it is first deported to the end of the array.
  • At the end, the subscriber cleans up the tails, removing unnecessary publishers, and filling their place with subscribers from the end of the array.

As a result, we receive an organized, up-to-date list of publishers and automatic unsubscribe from publishers who are no longer on this list.

You may have realized that you could immediately unsubscribe from an unnecessary publisher, and not temporarily move it to the end. But if you do this, then if the subscriber’s position changes, it may happen that first there will be an unsubscribe, and only then a subscription. And if we were the only subscriber to this publisher, then it will self-destruct, which can be much more difficult than rearranging several elements in an array.

A simple example of a subscriber working with publishers:

const susi = new $mol_wire_pub_sub
const pepe = new $mol_wire_pub
const lola = new $mol_wire_pub

const backup = susi.track_on() // Begin auto wire
try {
    touch() // Auto subscribe Susi to Pepe and sometimes to Lola
} finally {
    susi.track_cut() // Unsubscribe Susi from unpromoted pubs
    susi.track_off( backup ) // Stop auto wire
}

function touch() {

    // Dynamic subscriber
    if( Math.random() < .5 ) lola.promote()

    // Static subscriber
    pepe.promote()

}
Enter fullscreen mode Exit fullscreen mode

Please note that we first backup the previous value of the global variable, and then restore it from the backup. This is necessary so that auto-tracking can be nested within each other.

It would be possible to wrap the logic for working with backup and try-finally in a wrapper around the closure, but then we would have a lot of intermediate functions on the stack, which would significantly reduce the depth of calls available to the application programmer.

It would be possible to return not a link, but a lightweight object with a rollback method:

const tracker = susi.track()
try {
    touch()
} finally {
    tracker.stop( !!'cut needless pubs' )
}
Enter fullscreen mode Exit fullscreen mode

But then we would get extra memory allocation in very hot code.

We should also talk about the .track_cut() call. When we finish tracking publishers, we have a choice: to unsubscribe from publishers that we previously tracked but have not reached now, or not. So, this method goes through untouched publishers and performs these unsubscribes.

It usually makes sense to call it when a computation has completed normally. If we exited the calculation without completing it ahead of schedule (for example, we returned control because we were waiting for a download from the server), then it is better not to unsubscribe from anyone, since we may soon need those publishers when the download is completed.

Top comments (0)