Programming language of the future Part 5: Data and Ownership

A large part of state management is data management: who owns the data? How are resource limits tracked? How is data allocated and destroyed?

References

Our system has objects transparently migrating from server to server, so raw pointers are not an option; we have references instead. A reference becomes invalid if the object it points to is destroyed, which leaves us with two options: guarantee at the compiler and runtime level that every reference stays valid while it is in use, through lifetime tracking and a borrow checker, or force the programmer to handle the case of an invalid reference on every access.

The latter is easier if we also introduce "owning references": there can be only one per object, and it is always valid. In fact, an object declared as a plain field of another object is held by an owning reference and can be accessed directly. But this is limiting: what if a field should hold an object only some of the time, and a null reference at other times? Then you need a special container, such as an Option or a List, and you have to handle the case of an absent object on every access.

An object is always referenced by an ID; in single-machine systems, that ID is often simply its memory location. We need to reference the same object across different machines and from different places in memory, so we need a proper, location-independent ID, such as a hexadecimal number. But the language never needs to expose it: references can look like regular references, with the IDs kept under the hood. An added benefit is that the "no such ID" case can be handled cleanly, because every access is a lookup that can simply come back empty instead of following a dangling pointer.
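
To make ID-based references concrete, here is a minimal sketch in Python, used only to model the idea. It assumes a hypothetical in-process `Runtime` table that maps opaque hexadecimal IDs (generated with `uuid.uuid4().hex`) to live objects; a real implementation would spread that table across machines. What it shows is that dereferencing is a lookup, so a destroyed object comes back as `None` rather than as a dangling pointer.

```python
from __future__ import annotations

import uuid
from typing import Any, Optional


class Runtime:
    """Hypothetical runtime table mapping opaque IDs to live objects.

    A real system would spread this table across machines; here it is a
    single in-process dict, purely for illustration.
    """

    def __init__(self) -> None:
        self._objects: dict[str, Any] = {}

    def register(self, obj: Any) -> Ref:
        obj_id = uuid.uuid4().hex  # location-independent hexadecimal ID
        self._objects[obj_id] = obj
        return Ref(self, obj_id)

    def destroy(self, obj_id: str) -> None:
        self._objects.pop(obj_id, None)

    def lookup(self, obj_id: str) -> Optional[Any]:
        return self._objects.get(obj_id)


class Ref:
    """A reference whose ID stays under the hood; every access is a lookup."""

    def __init__(self, runtime: Runtime, obj_id: str) -> None:
        self._runtime = runtime
        self._id = obj_id  # never exposed to user code in the imagined language

    def get(self) -> Optional[Any]:
        # "No such ID" is an ordinary None, not a dangling pointer.
        return self._runtime.lookup(self._id)

    def destroy(self) -> None:
        self._runtime.destroy(self._id)


rt = Runtime()
doc = rt.register({"title": "draft"})
print(doc.get())  # {'title': 'draft'}
doc.destroy()
print(doc.get())  # None
```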

Memory Management

This brings us to memory management. I believe objects should be created either implicitly, under the hood (for example, for the fields of another object), or explicitly, as elements of an owning container. Memory management then boils down to adding elements to containers and removing them, with the trees of owned objects cleaned up automatically. Because every object has exactly one owner, owning references can never form a cycle, so removing an object can safely destroy everything it owns.
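
A minimal sketch of owning containers with automatic cleanup, again modeled in Python. `Obj`, `OwningList`, `add_field` and the global `LIVE` table are hypothetical names standing in for the runtime. Fields are created implicitly by their parent, list elements explicitly by the container, and removing an element destroys its whole owned subtree.

```python
from __future__ import annotations

import itertools

_ids = itertools.count(1)
LIVE: dict[int, Obj] = {}  # hypothetical runtime table of live objects


class Obj:
    """An object that exclusively owns the objects in `owned`."""

    def __init__(self, name: str) -> None:
        self.id = next(_ids)
        self.name = name
        self.owned: list[Obj] = []  # owning references: one owner per object
        LIVE[self.id] = self

    def add_field(self, name: str) -> Obj:
        # Fields are created implicitly and owned by their parent object.
        child = Obj(name)
        self.owned.append(child)
        return child

    def destroy(self) -> None:
        # Ownership forms a tree, so this depth-first walk reaches each
        # owned object exactly once and can never loop.
        for child in self.owned:
            child.destroy()
        self.owned.clear()
        LIVE.pop(self.id, None)


class OwningList:
    """Hypothetical owning container: removing an element destroys it."""

    def __init__(self) -> None:
        self._items: list[Obj] = []

    def add(self, name: str) -> Obj:
        obj = Obj(name)  # created explicitly as an element of the container
        self._items.append(obj)
        return obj

    def remove(self, obj: Obj) -> None:
        self._items.remove(obj)
        obj.destroy()  # the whole owned subtree is cleaned up automatically


users = OwningList()
alice = users.add("alice")
profile = alice.add_field("profile")
profile.add_field("avatar")
print(len(LIVE))  # 3
users.remove(alice)
print(len(LIVE))  # 0
```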

This does mean a split in the language: there is an owning reference and a weak reference, an owning Option and a weak Option, an owning List and a weak List. But I'll argue that this is worth it. We need neither a garbage collector nor reference counting, we can return references to things and pass them all around the system, and we know that when an object disappears from its owning collection, that case is handled. Owning collections are like pools of objects, out of which we can build arbitrary graphs of objects organised for a purpose, and then tear them down safely.
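
The weak side of the split can be sketched the same way. Assuming a hypothetical `Pool` as the owning collection and a hypothetical `Weak` reference that resolves through it, objects can point at each other in arbitrary graphs, cycles included, because weak edges never own anything. Tearing an object down is just removing it from the pool, and every weak access checks for absence.

```python
from __future__ import annotations

import itertools
from typing import Optional


class Pool:
    """Hypothetical owning collection: the sole owner of its objects."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._objects: dict[int, Node] = {}

    def create(self, name: str) -> Weak:
        obj_id = next(self._ids)
        self._objects[obj_id] = Node(name)
        return Weak(self, obj_id)

    def remove(self, ref: Weak) -> None:
        self._objects.pop(ref.obj_id, None)

    def get(self, obj_id: int) -> Optional[Node]:
        return self._objects.get(obj_id)


class Weak:
    """Weak reference: freely copyable, resolved on every access."""

    def __init__(self, pool: Pool, obj_id: int) -> None:
        self.pool = pool
        self.obj_id = obj_id

    def get(self) -> Optional[Node]:
        return self.pool.get(self.obj_id)


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.edges: list[Weak] = []  # a weak List: targets may be gone

    def live_neighbours(self) -> list[str]:
        # The absent case is handled on every access.
        names = []
        for ref in self.edges:
            target = ref.get()
            if target is not None:
                names.append(target.name)
        return names


pool = Pool()
a, b, c = pool.create("a"), pool.create("b"), pool.create("c")
a.get().edges.append(b)
b.get().edges.extend([a, c])  # a cycle a <-> b is fine: weak edges never own
pool.remove(c)
print(b.get().live_neighbours())  # ['a']
```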

Resource limits

Technically, a system doesn't need resource limits to be correct. A computation uses resources, such as memory and TCP ports, and by using them, it declares that it needs them. Whenever the system runs out of resources, it can stop the computation. As long as the state is internally consistent at the moment the computation is stopped, the system stays correct. But resource limits are still useful: for example, a system may be able to run 10 light computations concurrently, or 1 heavy one, and the programmer should be able to decide which behavior is correct. That decision may depend on, say, different paid tiers of user licenses, so we want to expose a mechanism for it.

I can think of at least three kinds of limits, and there may be more: a counter, a capped collection, and a time-based limit (a token bucket). A counter is managed by objects explicitly, through increments and decrements. A capped collection can read its cap from another field, and only allows adding objects while there is room. A token bucket hands out tokens that objects can claim, and refills automatically over time.
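
The three kinds of limits could look roughly like this, sketched in Python under stated assumptions: `Counter`, `CappedList`, `TokenBucket` and `Account` are hypothetical names, the capped collection reads its cap through a callable standing in for "another field", and the token bucket takes the current time as a parameter instead of reading a clock, which keeps the sketch deterministic.

```python
from __future__ import annotations

from typing import Any, Callable


class Counter:
    """Limit managed explicitly by objects: increment and decrement."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.value = 0

    def increment(self) -> bool:
        if self.value >= self.limit:
            return False  # refuse instead of exceeding the limit
        self.value += 1
        return True

    def decrement(self) -> None:
        self.value = max(0, self.value - 1)


class CappedList:
    """Collection whose cap is re-read from another field on every insert."""

    def __init__(self, cap: Callable[[], int]) -> None:
        self._cap = cap
        self.items: list[Any] = []

    def try_add(self, item: Any) -> bool:
        if len(self.items) >= self._cap():
            return False
        self.items.append(item)
        return True


class TokenBucket:
    """Time-based limit: claims consume tokens, which refill at a fixed rate."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float) -> None:
        self.capacity = capacity
        self.rate = refill_per_sec
        self.tokens = capacity
        self.last = now  # time is passed in explicitly, not read from a clock

    def try_claim(self, now: float, amount: float = 1.0) -> bool:
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens < amount:
            return False
        self.tokens -= amount
        return True


class Account:  # hypothetical object whose field holds a license-tier cap
    max_sessions = 2


workers = Counter(limit=1)
print(workers.increment(), workers.increment())  # True False

account = Account()
sessions = CappedList(lambda: account.max_sessions)
print([sessions.try_add(s) for s in ("s1", "s2", "s3")])  # [True, True, False]
account.max_sessions = 3  # upgrading the tier raises the cap immediately
print(sessions.try_add("s3"))  # True

bucket = TokenBucket(capacity=2, refill_per_sec=1, now=0.0)
print([bucket.try_claim(now=0.0) for _ in range(3)])  # [True, True, False]
print(bucket.try_claim(now=1.0))  # True: one token refilled after 1 second
```

Refusing with `False` rather than raising is a choice made for this sketch; a language could just as well make the caller wait until capacity frees up.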

If APIs with such resource limiters become standard practice, programmers gain the flexibility to define the correct behavior of their systems without sacrificing anything for it.

Dependency Injection

Lastly, I want to discuss a mechanism that is worth supporting at the language level: dependency injection. Strictly speaking, it doesn't need language support: you can construct objects and pass them as parameters when initializing other objects. But objects in our language are long-lived, so there is a real possibility that a dependency lives a shorter life than the objects that use it. It would be nice to have a mechanism for swapping dependencies at runtime.

First, we would allow objects to declare dependencies with the keyword "depends", a name for the dependency, and a type. Such a dependency is always a weak reference. Within the object, the field can be used like any other weak reference, with the understanding that it can be swapped out at any moment. Then, on the provider side, fields can be marked "provides", and they can be weak references as well. Associating a provider with a user of the dependency can then be a simple field assignment.
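
Finally, a sketch of `depends`/`provides` wiring, modeled in Python. Since these keywords exist only in the imagined language, comments mark which field plays which role; `Registry`, `Weak`, `ConsoleLogger`, `LoggingService` and `OrderHandler` are hypothetical names. The dependency is a weak reference, wiring is a plain field assignment, and swapping a provider at runtime is just another assignment.

```python
from __future__ import annotations

from typing import Optional


class Registry:
    """Hypothetical runtime table of live objects, keyed by opaque ID."""

    def __init__(self) -> None:
        self.objects: dict[str, object] = {}

    def add(self, obj_id: str, obj: object) -> Weak:
        self.objects[obj_id] = obj
        return Weak(self, obj_id)

    def destroy(self, obj_id: str) -> None:
        self.objects.pop(obj_id, None)


class Weak:
    """Weak reference, resolved through the registry on every access."""

    def __init__(self, registry: Registry, obj_id: str) -> None:
        self.registry = registry
        self.obj_id = obj_id

    def get(self) -> Optional[object]:
        return self.registry.objects.get(self.obj_id)


class ConsoleLogger:
    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.lines: list[str] = []

    def log(self, message: str) -> None:
        self.lines.append(f"{self.prefix}{message}")


class LoggingService:
    """Provider side: `logger` plays the role of `provides logger`."""

    def __init__(self, logger: Weak) -> None:
        self.logger = logger


class OrderHandler:
    """User side: `logger` plays the role of `depends logger`, always weak."""

    def __init__(self) -> None:
        self.logger: Optional[Weak] = None  # wired later by field assignment

    def handle(self, order: str) -> bool:
        target = self.logger.get() if self.logger is not None else None
        if target is None:
            return False  # dependency missing or destroyed: handled explicitly
        target.log(f"handled {order}")
        return True


rt = Registry()
old = LoggingService(rt.add("log-1", ConsoleLogger("v1: ")))
new = LoggingService(rt.add("log-2", ConsoleLogger("v2: ")))

handler = OrderHandler()
handler.logger = old.logger  # wiring is a plain field assignment
print(handler.handle("A"))  # True
rt.destroy("log-1")  # the provided object goes away
print(handler.handle("B"))  # False
handler.logger = new.logger  # swap the dependency at runtime
print(handler.handle("C"))  # True
print(rt.objects["log-2"].lines)  # ['v2: handled C']
```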

Conclusion

This concludes the theoretical part of the blog series. I have outlined my vision; now it's time to actually build it. I'm sure my ideas will be adjusted as I build it, but it's still better to have a plan than not. Thank you for reading.