Fathom: On RPC

This is the third part of a series of devlogs. I invite you to take a look at the first post for the project brief if you're confused about why Fathom exists at all.

You can also keep track of the project's progress at the GitHub repository for the project.

Hello! It's been a hot minute. I lost power for a week, and had a whole host of other issues to deal with these past few days. Nevertheless, here we are!

In honour of the finalization of the HTTP/3 spec (IETF RFC 9114) yesterday, I figured I'd announce (such as it is) my decision to write a 'custom' RPC implementation over QUIC, the new transport protocol that HTTP/3 uses in place of TCP. This decision is the result of several days' thinking and experimentation, and I'd like to discuss my reasoning in this post. At its most basic, it boils down to performance, comprehension, and control.

However, before we start, I want to give a brief explanation of what RPC is, and why it's important. Simply, RPC stands for Remote Procedure Call, and is an abstraction over some form of call-and-response mechanism that allows two (or more) processes to interact with each other, much like class methods allow two objects to interact with one another. As long as one object (or process) implements an interface, other entities can interact with that object or process through that interface. It's like Object Oriented Programming except your objects are processes!
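To make the "objects are processes" idea a little more concrete, here is a minimal sketch in Rust. Everything in it is hypothetical and for illustration only: the KeyValue trait, the Request and Response messages, and the Loopback transport are names I made up for this example, not Fathom's actual API. The transport simply calls the server in-process, where a real system would serialize the message and send it over a connection such as a QUIC stream.

```rust
use std::collections::HashMap;

/// The shared interface. Both the real implementation and the client
/// stub implement it, so callers cannot tell them apart.
/// (Hypothetical example interface, not part of Fathom.)
trait KeyValue {
    fn get(&mut self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: &str);
}

/// Messages that would cross the process boundary.
#[derive(Debug, Clone, PartialEq)]
enum Request {
    Get { key: String },
    Put { key: String, value: String },
}

#[derive(Debug, Clone, PartialEq)]
enum Response {
    Value(Option<String>),
    Done,
}

/// Server side: the actual implementation of the interface.
#[derive(Default)]
struct Store {
    map: HashMap<String, String>,
}

impl KeyValue for Store {
    fn get(&mut self, key: &str) -> Option<String> {
        self.map.get(key).cloned()
    }
    fn put(&mut self, key: &str, value: &str) {
        self.map.insert(key.to_owned(), value.to_owned());
    }
}

/// Server side: turn an incoming message back into a method call.
fn dispatch(service: &mut impl KeyValue, req: Request) -> Response {
    match req {
        Request::Get { key } => Response::Value(service.get(&key)),
        Request::Put { key, value } => {
            service.put(&key, &value);
            Response::Done
        }
    }
}

/// Anything that can carry a request away and bring a response back.
trait Transport {
    fn round_trip(&mut self, req: Request) -> Response;
}

/// Assumption: an in-process stand-in for the network.
struct Loopback<S: KeyValue> {
    service: S,
}

impl<S: KeyValue> Transport for Loopback<S> {
    fn round_trip(&mut self, req: Request) -> Response {
        dispatch(&mut self.service, req)
    }
}

/// Client side: a stub that turns method calls into messages.
struct Client<T: Transport> {
    transport: T,
}

impl<T: Transport> KeyValue for Client<T> {
    fn get(&mut self, key: &str) -> Option<String> {
        match self.transport.round_trip(Request::Get { key: key.to_owned() }) {
            Response::Value(value) => value,
            // Mismatched reply; a real stub would surface an error here.
            Response::Done => None,
        }
    }
    fn put(&mut self, key: &str, value: &str) {
        // The reply carries no data, so it is ignored in this sketch.
        let _ = self.transport.round_trip(Request::Put {
            key: key.to_owned(),
            value: value.to_owned(),
        });
    }
}
```

A caller would build Client { transport: Loopback { service: Store::default() } } and then call put and get on it exactly as it would on a local Store. Swapping Loopback for a networked transport would not change the calling code, which is the whole appeal of the abstraction.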

With that out of the way, let's talk about the biggest reason a custom RPC framework is needed. Most simply, it is control. I want to be able to plunge through the depths of the RPC code to understand why something performs the way that it does, and then be able to do something about it. It is my experience so far that the larger the library and the greater its complexity, the more opaque it is to the developer. Creating a custom framework permits me to implement the absolute simplest code necessary to achieve the functionality that I need without having to worry about everything else that comes with the library.

Everything else follows from that decision. Latency, throughput, and power consumption (all grouped together into 'performance') are more easily understood and dissected when you have control over the code base and can freely plumb the depths whenever you feel like it, making changes as best suits your use case. And the comprehensibility of a single-use code base is almost always superior to that of a general-purpose library.

This might all sound like NIH syndrome, and indeed I certainly detect some whiffs of that kind of reasoning myself. However, since this is a learning project, I am prepared to tolerate some of that kind of reasoning here, at least for the moment.

Whilst that may be the case, I think it is nevertheless important to discuss what this means for the project as a whole. First and foremost, this will severely impact the velocity of the project. The time before user-facing features become visible is extended until the basic RPC framework is complete, a non-trivial task by itself. This has knock-on effects, including the increasing likelihood of waning interest from you, the reader, as the project moves more slowly than is perhaps desirable. It also increases the likelihood that something else will come up and, with the project still offering limited functionality, I will drop it in favour of something else.

These are all important considerations to make, and I am uncertain how to handle them. If you have any experiences and stories about make/use decisions, or anything else you'd like to share, please leave a comment!

Moving on from RPC, I think it's worthwhile to discuss how things have been going in a more general sense. For one thing, you may note that there appears to have been a consolidation of effort into app-server, especially in the test branch. This is a phenomenon that I have noticed often in my projects, where all efforts at organizing things top-down inevitably fail. You'd think I'd have learned by now!

I think that this is a symptom of a sort of uninformed flailing that I tend to do when I don't really know what's going on. Early in a project, when the true complexity of what the project seeks to accomplish is not evident and only the broad-strokes components are visible, it's easy to partition things into logical sections. However, this kind of partitioning inevitably affects the way I think about modules, with mental frames that have to be 'loaded' and 'unloaded' when moving between them. Just like context switches, except defined artificially with no real benefit early on.

As projects grow, it's nice to be able to reason about small units of code independently (just as with microservices), but early development really seems to suffer, even more than I initially thought. With Fathom, every change I made in terms of API primitives or separation of concerns inevitably led me to touch many separate crates all at once, imposing multiple context switches and massive overhead. So I decided to once again contract the code, this time entirely within app-server and app-desktop. At the current pace of development, anything that can speed things up is very welcome.

As for the tasks for the coming days, the priority is going to be getting the basic RPC system up and running. I have (many) ideas about how it's going to work, and the next post will probably be a write-up about its design. The goal here is to have a final, runnable example working by the end of the week, and a write-up complete and posted sometime next week.