<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benjamin Kampmann</title>
    <description>The latest articles on DEV Community by Benjamin Kampmann (@gnunicorn).</description>
    <link>https://dev.to/gnunicorn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F351998%2Ffeba6203-e9c4-48ff-9119-0676fe6b0ab3.png</url>
      <title>DEV Community: Benjamin Kampmann</title>
      <link>https://dev.to/gnunicorn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gnunicorn"/>
    <language>en</language>
    <item>
      <title>Fighting the Client Spaghetti Monster with Rust Traits</title>
      <dc:creator>Benjamin Kampmann</dc:creator>
      <pubDate>Tue, 09 Dec 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/gnunicorn/fighting-the-client-spaghetti-monster-with-rust-traits-9cd</link>
      <guid>https://dev.to/gnunicorn/fighting-the-client-spaghetti-monster-with-rust-traits-9cd</guid>
      <description>&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt;: In Rust, "trait composition" is a neat way to keep code clean where a lot of components come together and need to be piped up, and to avoid spaghettification.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gt2mceztw1oqezz4v8c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gt2mceztw1oqezz4v8c.jpg" alt="colorful Spaghetti, nicely sorted in brown paper bags" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="http://www.gnunicorn.org/writings/spaghetti-monster-clients-rust-traits-final-boss/" rel="noopener noreferrer"&gt;Originally posted on my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A major part of my almost two-decade-long career in programming has been spent working on “SDKs” in Rust, by which I mean building and maintaining complex systems as libraries that other developers use to build applications on top of. I did this back at Immmer (now defunct); for Parity, with Substrate Core/Client as well as its inner on-chain application SDK; on the matrix-rust-sdk; and last but not least at &lt;a href="https://acter.global/" rel="noopener noreferrer"&gt;Acter&lt;/a&gt; for the Acter App and then the &lt;a href="https://hellozoe.app/" rel="noopener noreferrer"&gt;Zoe&lt;/a&gt; (&lt;a href="https://github.com/acterglobal/zoe-relay" rel="noopener noreferrer"&gt;relay&lt;/a&gt;) system.&lt;/p&gt;

&lt;p&gt;For a while, but especially during the latest iteration, I have been wondering about that highest-layer architecture. How to design that client, where all these subcomponents are piped together. How to design it in a way that stays flexible for yourself as well as for others, yet robust and ideally testable. How to avoid spaghettification of the client, even if the underlying components are complex trait-based systems themselves.&lt;/p&gt;

&lt;p&gt;As we have a lot of surface area to cover, I will not discuss traits themselves much (&lt;a href="https://doc.rust-lang.org/book/ch10-02-traits.html" rel="noopener noreferrer"&gt;check the corresponding chapter in the excellent Rust book&lt;/a&gt; if you are looking for that) but assume you understand traits and trait bounds and have implemented them in Rust. I will throw around some almost-real code and examples without much explanation and expect the reader to parse and understand them without much help, as I want to focus on the higher-level “how do we use this” architecture perspective.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traits in SDKs
&lt;/h2&gt;

&lt;p&gt;As with any big task, the best way to tackle it is to split it into smaller, manageable tasks and implement these one by one. The same is true for building up large SDKs. They often contain various components, like a storage layer; network or communication components; some internal state machine for the actual domain-specific logic; and maybe some developer-facing API or even UI components. To make implementation more manageable, it is commonplace to split them up into separate, independent components, sometimes even separate crates, and provide an outer interface.&lt;/p&gt;

&lt;p&gt;In the SDK world you often find that these components internally need to be pluggable themselves, though. A storage component, for example, might be implemented with an embedded SQLite for mobile apps, with some SQL backend service or NoSQL database on the server, and with IndexedDB in the browser (with Wasm). Generally, the outer composed system doesn’t really have to care which of these is being used, so that choice can be left to the component itself. A common way to provide this abstraction is to define a trait for that lowest layer and have the various specific backends implement it. The layer directly above, and the layers on top of that, can then focus on their own side of things.&lt;/p&gt;

&lt;p&gt;This also nicely allows implementations that come with their own dependencies to be pulled in only when needed, or to be compiled only for the targets that actually use them, and lets you introduce new implementations into production gradually via feature flags. It’s a pretty neat way of organizing the code. In the Matrix SDK, for example, we have such a layer for implementing storage, and (though not strictly because of the trait) the SDK even provides a macro that generates the entire test suite against your custom implementation.&lt;/p&gt;
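
&lt;p&gt;To make the idea concrete, here is a minimal sketch, assuming a hypothetical &lt;code&gt;MessageStore&lt;/code&gt; trait and two toy backends (&lt;code&gt;MemoryStore&lt;/code&gt;, &lt;code&gt;LogStore&lt;/code&gt;) standing in for real ones like SQLite or IndexedDB. The generic &lt;code&gt;check_store_contract&lt;/code&gt; function only illustrates the idea of running one shared test suite against every implementation. It is not the Matrix SDK’s actual macro.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::collections::BTreeMap;

/// Hypothetical storage interface: higher layers only ever talk to this trait.
pub trait MessageStore {
    fn save(&amp;amp;mut self, id: u64, body: String);
    fn load(&amp;amp;self, id: u64) -&amp;gt; Option&amp;lt;String&amp;gt;;
}

/// Toy backend 1: a sorted in-memory map (stand-in for e.g. SQLite on mobile).
#[derive(Default)]
pub struct MemoryStore {
    rows: BTreeMap&amp;lt;u64, String&amp;gt;,
}

impl MessageStore for MemoryStore {
    fn save(&amp;amp;mut self, id: u64, body: String) {
        self.rows.insert(id, body);
    }
    fn load(&amp;amp;self, id: u64) -&amp;gt; Option&amp;lt;String&amp;gt; {
        self.rows.get(&amp;amp;id).cloned()
    }
}

/// Toy backend 2: an append-only log (stand-in for another target's store).
#[derive(Default)]
pub struct LogStore {
    log: Vec&amp;lt;(u64, String)&amp;gt;,
}

impl MessageStore for LogStore {
    fn save(&amp;amp;mut self, id: u64, body: String) {
        self.log.push((id, body));
    }
    fn load(&amp;amp;self, id: u64) -&amp;gt; Option&amp;lt;String&amp;gt; {
        // The newest entry for an id wins, so search from the back.
        self.log.iter().rev().find(|(i, _)| *i == id).map(|(_, b)| b.clone())
    }
}

/// One shared "contract" every backend must satisfy.
pub fn check_store_contract&amp;lt;S: MessageStore&amp;gt;(mut store: S) {
    assert_eq!(store.load(1), None);
    store.save(1, "hello".to_string());
    store.save(1, "hello again".to_string());
    assert_eq!(store.load(1).as_deref(), Some("hello again"));
}

fn main() {
    check_store_contract(MemoryStore::default());
    check_store_contract(LogStore::default());
    println!("both backends pass the same contract");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;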

&lt;h3&gt;
  
  
  To the mock
&lt;/h3&gt;

&lt;p&gt;Having these traits brings another nice benefit: mocking. As the higher-level components might have their own logic (like caching or ordering), testing them often requires setting up the lower-level component(s) as well. If you instead define that interface in a trait, you can implement various mock types to test a range of scenarios for your functions and focus on this specific logic. What sounds tedious at first becomes a breeze with the help of crates like &lt;a href="https://docs.rs/mockall/latest/mockall/" rel="noopener noreferrer"&gt;&lt;code&gt;mockall&lt;/code&gt;&lt;/a&gt;. It’s a lot easier, and often faster, than setting up that lower-level layer just to test that the component pulls the objects from the store and returns them sorted regardless of the underlying order.&lt;/p&gt;
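
&lt;p&gt;This rough sketch covers that last scenario, assuming a hypothetical &lt;code&gt;ItemSource&lt;/code&gt; trait and a &lt;code&gt;SortedView&lt;/code&gt; component. A hand-written mock returns items in a scrambled order, so the sorting logic can be tested without any real store. &lt;code&gt;mockall&lt;/code&gt;’s &lt;code&gt;#[automock]&lt;/code&gt; attribute can generate a comparable mock type for you. It is written out by hand here to keep the example dependency-free.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Hypothetical lower-level trait.
pub trait ItemSource {
    fn items(&amp;amp;self) -&amp;gt; Vec&amp;lt;(u64, String)&amp;gt;;
}

/// The component under test: returns the ids of whatever source it wraps, sorted.
pub struct SortedView&amp;lt;S: ItemSource&amp;gt; {
    source: S,
}

impl&amp;lt;S: ItemSource&amp;gt; SortedView&amp;lt;S&amp;gt; {
    pub fn new(source: S) -&amp;gt; Self {
        Self { source }
    }

    pub fn sorted_ids(&amp;amp;self) -&amp;gt; Vec&amp;lt;u64&amp;gt; {
        let mut ids: Vec&amp;lt;u64&amp;gt; = self.source.items().into_iter().map(|(id, _)| id).collect();
        ids.sort_unstable();
        ids
    }
}

/// Hand-written mock: canned data in a deliberately scrambled order.
struct MockSource(Vec&amp;lt;(u64, String)&amp;gt;);

impl ItemSource for MockSource {
    fn items(&amp;amp;self) -&amp;gt; Vec&amp;lt;(u64, String)&amp;gt; {
        self.0.clone()
    }
}

fn main() {
    let view = SortedView::new(MockSource(vec![
        (3, "c".into()),
        (1, "a".into()),
        (2, "b".into()),
    ]));
    assert_eq!(view.sorted_ids(), vec![1, 2, 3]);

    // Edge case: an empty source yields an empty result.
    let empty = SortedView::new(MockSource(Vec::new()));
    assert!(empty.sorted_ids().is_empty());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;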

&lt;h3&gt;
  
  
  Middleware-ing
&lt;/h3&gt;

&lt;p&gt;Similarly, by having the traits define the interfaces, you can add functionality in a middleware fashion, similar to what many web servers do. Think of a caching layer on top of the database as an example. That caching layer can wrap anything implementing the trait while also implementing the trait itself. That way you can implement an LRU cache or similar, regardless of the underlying storage type. As the interface is just the same trait again, you can mock the lower layer, ensuring good test coverage of exactly what this layer does. Further, you can plug this “middleware” into the higher-level layer without any further changes. This &lt;a href="https://github.com/acterglobal/a3/blob/f72a67a65f1f9710313dd91126712a03ab39e277/native/media-cache-wrapper/src/lib.rs" rel="noopener noreferrer"&gt;is how we implemented a storage layer for the Rust SDK that splits off media storage&lt;/a&gt; (before that was added to the SDK itself) and keeps it at a different path (in the mobile’s “cache” directory), while passing everything else along to whatever inner database system was being used (e.g., SQLite).&lt;/p&gt;
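
&lt;p&gt;Here is a minimal sketch of such a middleware, assuming a hypothetical &lt;code&gt;Store&lt;/code&gt; trait. &lt;code&gt;Cached&amp;lt;S&amp;gt;&lt;/code&gt; wraps any &lt;code&gt;Store&lt;/code&gt; and is itself a &lt;code&gt;Store&lt;/code&gt;. For brevity it caches in an unbounded &lt;code&gt;HashMap&lt;/code&gt; rather than a real LRU. The counting inner store plays the role of the mocked lower layer, which lets the test check that repeated lookups never reach it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical storage trait.
pub trait Store {
    fn get(&amp;amp;self, key: &amp;amp;str) -&amp;gt; Option&amp;lt;String&amp;gt;;
}

/// Middleware: wraps any `Store` and implements `Store` itself, so the layers
/// above can use it without changes. Unbounded here; a real one might be an LRU.
pub struct Cached&amp;lt;S: Store&amp;gt; {
    inner: S,
    cache: RefCell&amp;lt;HashMap&amp;lt;String, Option&amp;lt;String&amp;gt;&amp;gt;&amp;gt;,
}

impl&amp;lt;S: Store&amp;gt; Cached&amp;lt;S&amp;gt; {
    pub fn new(inner: S) -&amp;gt; Self {
        Self { inner, cache: RefCell::new(HashMap::new()) }
    }
}

impl&amp;lt;S: Store&amp;gt; Store for Cached&amp;lt;S&amp;gt; {
    fn get(&amp;amp;self, key: &amp;amp;str) -&amp;gt; Option&amp;lt;String&amp;gt; {
        if let Some(hit) = self.cache.borrow().get(key) {
            return hit.clone(); // cache hit: the inner store is not touched
        }
        let value = self.inner.get(key);
        self.cache.borrow_mut().insert(key.to_string(), value.clone());
        value
    }
}

/// Mocked lower layer that counts how often it is actually queried.
struct CountingStore {
    calls: Cell&amp;lt;u32&amp;gt;,
}

impl Store for CountingStore {
    fn get(&amp;amp;self, key: &amp;amp;str) -&amp;gt; Option&amp;lt;String&amp;gt; {
        self.calls.set(self.calls.get() + 1);
        Some(format!("value-for-{key}"))
    }
}

fn main() {
    let store = Cached::new(CountingStore { calls: Cell::new(0) });
    assert_eq!(store.get("a").as_deref(), Some("value-for-a"));
    assert_eq!(store.get("a").as_deref(), Some("value-for-a"));
    assert_eq!(store.inner.calls.get(), 1); // second lookup came from the cache
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;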

&lt;h3&gt;
  
  
  But specific, sometimes
&lt;/h3&gt;

&lt;p&gt;Now, on the traits you only want to expose the common interface, of course. But specific implementations sometimes still have APIs to fine-tune or configure certain things, like the path of the SQLite database. You don’t want to put these on the traits, as they are implementation-specific and pointless for other implementations. But since traits are implemented on specific types, your concrete types can still add these helper functions, and as the higher-level API / SDK you often just use feature flags to decide whether to expose them.&lt;/p&gt;
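
&lt;p&gt;This small sketch shows that split, with hypothetical names. The shared &lt;code&gt;Storage&lt;/code&gt; trait stays minimal, while the SQLite-like backend (a stand-in, with no real database involved) keeps its &lt;code&gt;path()&lt;/code&gt; helper in an inherent &lt;code&gt;impl&lt;/code&gt;. Generic code only sees the trait. Code that holds the concrete type can still reach the extras.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::path::{Path, PathBuf};

/// Hypothetical shared interface every backend implements.
pub trait Storage {
    fn name(&amp;amp;self) -&amp;gt; &amp;amp;'static str;
}

/// Stand-in for an SQLite-backed store; only the path is modeled here.
pub struct SqliteStorage {
    path: PathBuf,
}

// Implementation-specific API lives in an inherent impl, not on the trait.
impl SqliteStorage {
    pub fn open(path: impl Into&amp;lt;PathBuf&amp;gt;) -&amp;gt; Self {
        Self { path: path.into() }
    }
    pub fn path(&amp;amp;self) -&amp;gt; &amp;amp;Path {
        &amp;amp;self.path
    }
}

impl Storage for SqliteStorage {
    fn name(&amp;amp;self) -&amp;gt; &amp;amp;'static str {
        "sqlite"
    }
}

/// A second backend with no notion of a file path at all.
pub struct MemoryStorage;

impl Storage for MemoryStorage {
    fn name(&amp;amp;self) -&amp;gt; &amp;amp;'static str {
        "memory"
    }
}

/// Generic code can only use what the trait offers.
fn describe&amp;lt;S: Storage&amp;gt;(storage: &amp;amp;S) -&amp;gt; String {
    format!("using {}", storage.name())
}

fn main() {
    let sqlite = SqliteStorage::open("/tmp/app.db");
    // Code holding the concrete type can still use the extras.
    assert_eq!(sqlite.path(), Path::new("/tmp/app.db"));
    assert_eq!(describe(&amp;amp;sqlite), "using sqlite");
    assert_eq!(describe(&amp;amp;MemoryStorage), "using memory");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;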

&lt;h2&gt;
  
  
  Composing over many traits
&lt;/h2&gt;

&lt;p&gt;Now that you understand the complexity and usage of these subcomponents, think about how you tie them all together in the &lt;code&gt;Client&lt;/code&gt;. It needs to connect these components and move messages from one component to another: e.g. to get the messages that just came in from the network to the internal state machine, and to hand the state machine’s results to the storage layer so that some of these changes get persisted. Of course you want the client to be as flexible over the specific implementations as possible, since most of that higher-level code doesn’t really differ whether the message comes over LoRa, QUIC or libp2p. Nor does it matter to the client whether the data is stored in an SQLite database or in IndexedDB.&lt;/p&gt;

&lt;p&gt;But at times you have interdependencies, so the Rust compiler needs to make sure that the message type the network layer returns is the one the state machine accepts. This is where things often spaghettify.&lt;/p&gt;

&lt;p&gt;At the beginning that feels reasonable, but over time it grows, and the more things are pluggable, the more generics you need to add. The client needs one generic, then another, then another… You move from single letters to entire words, and then run out of words. Sooner than you think, it becomes incomprehensible. Not to mention the ever-growing tree of trait bounds you have to repeat everywhere you expose that client, which is your main external API surface, so you expose it &lt;em&gt;a lot&lt;/em&gt;. Brave are those who then need to add another bound (like &lt;code&gt;Send&lt;/code&gt;) to any of the lower traits…&lt;/p&gt;

&lt;p&gt;“There must be a better way”, you think to yourself …&lt;/p&gt;

&lt;h2&gt;
  
  
  The three paths of the enlightenment
&lt;/h2&gt;

&lt;p&gt;As always, you have a few options, each with its own benefits and trade-offs, to manage this more nicely. You can use type aliases, &lt;code&gt;Box&amp;lt;dyn Trait&amp;gt;&lt;/code&gt; it, or compose a trait with associated types. Let’s look at them one by one, in order of increasing complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Type alias
&lt;/h3&gt;

&lt;p&gt;The first thing that probably comes to mind is to alias some of the type definitions to make things a bit cleaner. You’d still have some components that are generic over some sub-traits, e.g. &lt;code&gt;struct GenericStateMachine&amp;lt;S: StateT, M: MessageT&amp;gt;&lt;/code&gt;, which implement most of the concrete logic, but for the production environment you’d have an alias like &lt;code&gt;type NativeClientStateMachine = GenericStateMachine&amp;lt;NativeState, TcpMessage&amp;gt;;&lt;/code&gt; that you can use.&lt;/p&gt;

&lt;p&gt;Depending on how you organize your code, the final client could itself end up being a &lt;code&gt;type NativeTcpClient = GenericClient&amp;lt;NativeClientStateMachine, NativeClientStorage, TcpProtocol&amp;gt;;&lt;/code&gt;. You could even have a builder that, depending on the target, returns one type or the other, with both exposing the same API implemented via the traits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;Builder&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;#[cfg(target_arch&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"wasm"&lt;/span&gt;&lt;span class="nd"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;WasmClient&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;//&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;#[cfg(not(target_arch&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"wasm"&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;NativeTcpClient&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;//&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;GenericClient&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;StateMachine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;state_machine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;StateMachine&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;//&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Storage&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;//&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you all the benefits of concrete types, including access to the actual types, so the consumer’s code can even make implementation-specific calls, and compilation fails if they try to do that against a type that doesn’t implement them (e.g. because they picked a different target arch). Of course, this only works as long as the compiler doesn’t force you to specify &lt;em&gt;which&lt;/em&gt; exact type you are expecting but can still infer it itself.&lt;/p&gt;
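
&lt;p&gt;Here is a compiling sketch of the alias approach, with hypothetical types. &lt;code&gt;NativeTcpClient&lt;/code&gt; pins &lt;code&gt;GenericClient&lt;/code&gt; to concrete components, so the returned client offers both the shared, trait-based API and the TCP-only &lt;code&gt;port()&lt;/code&gt; helper, and both are type-checked. A Wasm build would provide its own alias and constructor behind &lt;code&gt;#[cfg(target_arch = "wasm32")]&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Hypothetical component traits.
pub trait StorageT {
    fn count(&amp;amp;self) -&amp;gt; usize;
}
pub trait ProtocolT {
    fn name(&amp;amp;self) -&amp;gt; &amp;amp;'static str;
}

pub struct MemoryStorage(Vec&amp;lt;String&amp;gt;);
impl StorageT for MemoryStorage {
    fn count(&amp;amp;self) -&amp;gt; usize {
        self.0.len()
    }
}

pub struct TcpProtocol {
    port: u16,
}
impl TcpProtocol {
    /// TCP-only helper, deliberately not part of `ProtocolT`.
    pub fn port(&amp;amp;self) -&amp;gt; u16 {
        self.port
    }
}
impl ProtocolT for TcpProtocol {
    fn name(&amp;amp;self) -&amp;gt; &amp;amp;'static str {
        "tcp"
    }
}

/// Most client logic is written once, against the traits.
pub struct GenericClient&amp;lt;S: StorageT, P: ProtocolT&amp;gt; {
    storage: S,
    protocol: P,
}

impl&amp;lt;S: StorageT, P: ProtocolT&amp;gt; GenericClient&amp;lt;S, P&amp;gt; {
    pub fn protocol(&amp;amp;self) -&amp;gt; &amp;amp;P {
        &amp;amp;self.protocol
    }
    pub fn summary(&amp;amp;self) -&amp;gt; String {
        format!("{} items via {}", self.storage.count(), self.protocol.name())
    }
}

/// The alias pins the concrete types for native targets.
pub type NativeTcpClient = GenericClient&amp;lt;MemoryStorage, TcpProtocol&amp;gt;;

pub fn build_native(port: u16) -&amp;gt; NativeTcpClient {
    GenericClient { storage: MemoryStorage(Vec::new()), protocol: TcpProtocol { port } }
}

fn main() {
    let client = build_native(4433);
    assert_eq!(client.summary(), "0 items via tcp"); // shared, trait-based API
    assert_eq!(client.protocol().port(), 4433); // concrete-only API, still type-checked
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;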

&lt;p&gt;However, you end up with rather lengthy lists of type aliases to manage, especially if you wrap middlewares as described before, and they can be hard to parse and follow: &lt;a href="https://github.com/acterglobal/zoe-relay/blob/972738d4e1e08dbfc672b6c38b33e9ad4dcc8e7a/crates/client/src/client.rs#L38C1-L45C67" rel="noopener noreferrer"&gt;just check this &lt;code&gt;ZoeClientAppManager&lt;/code&gt;, which itself wraps a bunch of aliases&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ZoeClientStorage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SqliteMessageStorage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ZoeClientSessionManager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SessionManager&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ZoeClientStorage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZoeClientMessageManager&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ZoeClientGroupManager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GroupManager&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ZoeClientMessageManager&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ZoeClientAppManager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="n"&gt;AppManager&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ZoeClientMessageManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZoeClientGroupManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZoeClientStorage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ZoeClientMessageManager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MultiRelayMessageManager&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ZoeClientStorage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ZoeClientBlobService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MultiRelayBlobService&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ZoeClientStorage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ZoeClientFileStorage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FileStorage&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ZoeClientBlobService&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigating this tree isn’t easy. Especially when debugging, you can easily end up at the wrong layer and wonder why your changes aren’t showing up.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;dyn Trait&lt;/code&gt;s
&lt;/h3&gt;

&lt;p&gt;A common idea that might come to mind is to wrap the specific implementation in a new type that holds it internally as a &lt;a href="https://doc.rust-lang.org/std/keyword.dyn.html" rel="noopener noreferrer"&gt;&lt;code&gt;dyn Trait&lt;/code&gt;&lt;/a&gt;, provided the trait can be made &lt;a href="https://doc.rust-lang.org/reference/items/traits.html#dyn-compatibility" rel="noopener noreferrer"&gt;&lt;code&gt;dyn&lt;/code&gt; compatible&lt;/a&gt; (formerly known as “object safety”). In practice the value most likely has to be wrapped in a &lt;code&gt;Box&lt;/code&gt;, an &lt;code&gt;Arc&lt;/code&gt; or similar; if that is already happening anyway, this might not be a problem. If the overhead of dynamic dispatch is acceptable, this can be a viable solution.&lt;/p&gt;

&lt;p&gt;This is exactly how the Matrix Rust SDK implements the storage layer: by wrapping the &lt;a href="https://github.com/matrix-org/matrix-rust-sdk/blob/238e4e8a87baad51bcfd44c619f0caa985472cc3/crates/matrix-sdk-base/src/store/mod.rs#L179-L180" rel="noopener noreferrer"&gt;specific implementation in an &lt;code&gt;Arc&amp;lt;dyn StateStore&amp;gt;&lt;/code&gt;&lt;/a&gt; and then exposing a &lt;code&gt;StateStore&lt;/code&gt; interface without any generics.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;dyn&lt;/code&gt;s come with another drawback: the compiler forgets all notion of the concrete type. While this can be cheaper in terms of code size (as generic functions aren’t monomorphized for each type), it also means that our specific type “is gone”. Any other methods that this type implements outside of the trait become inaccessible. For storage in the Matrix SDK that seems to be acceptable, as the only &lt;a href="https://github.com/matrix-org/matrix-rust-sdk/blob/main/crates/matrix-sdk/src/client/builder/mod.rs#L234-L285" rel="noopener noreferrer"&gt;implementation-specific tuning happens in the builder setup&lt;/a&gt;, &lt;em&gt;before&lt;/em&gt; it is passed on as a &lt;code&gt;StateStore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But something as simple as reading implementation-specific configuration back from that value at runtime is no longer possible through the trait object, even if the underlying type implements such a method and you know for sure which type it is.&lt;/p&gt;
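
&lt;p&gt;This is a simplified sketch of the &lt;code&gt;dyn&lt;/code&gt; approach, with hypothetical names; it is not the Matrix SDK’s actual code. The non-generic &lt;code&gt;Store&lt;/code&gt; wrapper holds an &lt;code&gt;Arc&amp;lt;dyn StateStore&amp;gt;&lt;/code&gt;, so no type parameters leak into the public API. Implementation-specific setup has to happen &lt;em&gt;before&lt;/em&gt; the concrete type is erased. Afterwards, only the trait’s methods are reachable.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::sync::Arc;

/// Hypothetical dyn-compatible store trait.
pub trait StateStore: Send + Sync {
    fn get(&amp;amp;self, key: &amp;amp;str) -&amp;gt; Option&amp;lt;String&amp;gt;;
}

/// Stand-in for a concrete backend with an implementation-specific helper.
pub struct SqliteStore {
    path: String,
}

impl SqliteStore {
    pub fn path(&amp;amp;self) -&amp;gt; &amp;amp;str {
        &amp;amp;self.path
    }
}

impl StateStore for SqliteStore {
    fn get(&amp;amp;self, key: &amp;amp;str) -&amp;gt; Option&amp;lt;String&amp;gt; {
        Some(format!("{key}@{}", self.path)) // toy logic, no real database
    }
}

/// Non-generic wrapper: the concrete type is erased behind the trait object.
#[derive(Clone)]
pub struct Store {
    inner: Arc&amp;lt;dyn StateStore&amp;gt;,
}

impl Store {
    pub fn new(inner: impl StateStore + 'static) -&amp;gt; Self {
        Self { inner: Arc::new(inner) }
    }
    pub fn get(&amp;amp;self, key: &amp;amp;str) -&amp;gt; Option&amp;lt;String&amp;gt; {
        self.inner.get(key)
    }
}

fn main() {
    let sqlite = SqliteStore { path: "/tmp/state.db".into() };
    // Inspect or tune implementation-specific bits *before* erasing the type.
    assert_eq!(sqlite.path(), "/tmp/state.db");

    let store = Store::new(sqlite);
    assert_eq!(store.get("k").as_deref(), Some("k@/tmp/state.db"));
    // `store.inner.path()` would not compile: `dyn StateStore` only exposes
    // the trait's methods.
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;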

&lt;h3&gt;
  
  
  Trait Composition
&lt;/h3&gt;

&lt;p&gt;If dynamic dispatch isn’t feasible or the specific types still need to be available, and the alias list grows too long and becomes too tedious to update, you might come up with a trait that combines all the types. I call this a composing trait. Rather than having a generic client with an ever-growing list of generics, you define a trait that specifies the concrete types via associated types. This is what we did in the Parity SDK and its on-chain Wasm state machine.&lt;/p&gt;

&lt;p&gt;The idea is to create a new &lt;code&gt;trait Configuration&lt;/code&gt; that defines all the requirements as associated types, and to have the client reference only that trait. The client can still return aliased or generic sub-types, but they are then instantiated for that specific configuration. Like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;Configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Storage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StorageC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NetworkC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;StateMachine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StateMachineC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Configuration&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;state_machine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;GenericStateMachine&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;C&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;StateMachineC&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;//&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;GenericStorage&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;C&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;StorageC&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;//&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, in reality this is rarely as clean. Often you find yourself needing to define the interdependencies as well. For example, the network needs to give you a specific &lt;code&gt;MessageT&lt;/code&gt; that the state machine actually understands. Even if you use a &lt;code&gt;trait&lt;/code&gt; here, the compiler will enforce that both sides use the same concrete type. As a result, even very low-level trait definitions end up popping up in your highest-level configuration, so that you can cross-reference them via the associated types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;MessageT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Sized&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;StorageC&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;NetworkC&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;next_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;StateMachineC&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Storage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StorageC&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;Configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Storage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StorageC&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NetworkC&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;StateMachine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StateMachineC&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;network&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;GenericNetwork&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Network&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;GenericStorage&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Storage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;state_machine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;GenericStateMachine&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;StateMachine&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nice and clean, but you can already see how this gets more complex as these traits grow. In particular, when you have to change some of them, the change quickly ripples through the entire system, producing hairy bounds that fail with very verbose error messages. Let’s add an &lt;code&gt;ErrorT&lt;/code&gt; type that our client yields whenever any of the inner components yields an error, so the client wraps all the inner error types. We add&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;ErrorT&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;StorageC&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ErrorT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;//.. to all types&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// and on the config:&lt;/span&gt;
&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;Configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;

    &lt;span class="c1"&gt;// gee, this is verbose...&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ErrorT&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="nb"&gt;From&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Storage&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;StorageC&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="nb"&gt;From&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;StateMachine&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;StateMachineC&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="nb"&gt;From&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Network&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;NetworkC&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s a bit verbose, but reasonable overall, and it lets us reduce the generics on the client from many to just one. Nice. It becomes trickier when you actually implement these types, as you need to make sure all the types match up correctly. And dragging around this massive &lt;code&gt;Configuration&lt;/code&gt; is a pain, especially for mock-testability as described before: we have to mock all the associated types, which creates a lot of glue code.&lt;/p&gt;

&lt;p&gt;So instead, what I end up doing is keeping anything with actual logic referring to the generics directly, so you can mock and test those specific pieces, and making the final &lt;code&gt;Client&amp;lt;C: Configuration&amp;gt;&lt;/code&gt; merely a holder that passes the associated types along as generics to the specific internal types.&lt;/p&gt;

&lt;p&gt;In practice it can become even trickier if you have some of these configurations on several layers. For example, in the &lt;a href="https://github.com/paritytech/substrate/blob/033d4e86cc7eff0066cd376b9375f815761d653c/client/service/src/builder.rs#L79-L93" rel="noopener noreferrer"&gt;Parity Substrate Codebase&lt;/a&gt;, to allow all clients to build on reusable CLI tooling, there is a Service that can construct your client. That service requires a configuration for the network and the like, but only a subset of what a full node needs; as a result, the full node’s configuration needs to be a superset of the service’s. But that is a really advanced scenario, and if you have any good ideas for improving that situation, I am all ears.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Combined Composition
&lt;/h2&gt;

&lt;p&gt;As so often, enlightenment isn’t picking one solution but combining wisely.&lt;/p&gt;

&lt;p&gt;What you probably end up doing is a combination of these composition types. For example, in the Rust Matrix SDK, at a lower level the pluggable storage is held via a &lt;code&gt;dyn Trait&lt;/code&gt;, while at a higher level you might compose a client with a “trait composition” that allows any other (Rust) developer to plug in and replace any of the components as they please, including yourself for platform- or target-specific implementations.&lt;/p&gt;

&lt;p&gt;By keeping any actual logic in the separate components, with specific traits for easy mocked testing, and using the “client” merely as the place where all these pipes come and plug together, you can rely on the compiler’s type checks to ensure the correctness of the types being piped, while the mock tests cover all the actual logic. And integration tests should cover the end-to-end functionality of the client regardless.&lt;/p&gt;

&lt;p&gt;To wrap things up nicely, you can hide that &lt;code&gt;Client&amp;lt;C&amp;gt;&lt;/code&gt; behind a type alias, which in turn is held by a &lt;code&gt;struct FfiClient(NativeClient);&lt;/code&gt; on which you expose a fully typed, generics-free API to the world outside of Rust. Put on a bow and ship it :)&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Credits&lt;/em&gt;:  Photo taken by &lt;a href="https://unsplash.com/@spenas88" rel="noopener noreferrer"&gt;Gabriel&lt;/a&gt; (who is available for hire) and &lt;a href="https://unsplash.com/photos/brown-packs-in-blue-textile-ztpMeg3rYQ4" rel="noopener noreferrer"&gt;published on unsplash.com under a free license&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>architecture</category>
      <category>traits</category>
      <category>tech</category>
    </item>
    <item>
      <title>Beware of the DashMap deadlock</title>
      <dc:creator>Benjamin Kampmann</dc:creator>
      <pubDate>Fri, 29 Mar 2024 23:00:00 +0000</pubDate>
      <link>https://dev.to/acter/beware-of-the-dashmap-deadlock-lij</link>
      <guid>https://dev.to/acter/beware-of-the-dashmap-deadlock-lij</guid>
      <description>&lt;p&gt;Rust is famously build for the multi-threaded-processor world. From its core ownership-enforcement-model up to the type-based &lt;code&gt;Sync&lt;/code&gt; + &lt;code&gt;Send&lt;/code&gt;-types, all is around allowing the compiler to ensure memory safety and consistency across thread boundaries. And though the &lt;code&gt;std&lt;/code&gt; also has collections (like &lt;code&gt;HashMap&lt;/code&gt; and &lt;code&gt;BTreeSet&lt;/code&gt;), Atomics and Locks, once you start building real programs with Rust, probably some &lt;code&gt;tokio&lt;/code&gt; for async-support as well, these are not always sufficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  DashMap for the Win
&lt;/h2&gt;

&lt;p&gt;No wonder you can pick from a handful of libraries offering concurrent collections (and quite a feat that is), with &lt;code&gt;DashMap&lt;/code&gt; being among the most popular, with a whopping 52 million downloads on crates.io at the time of this writing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://crates.io/crates/dashmap"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zQf-wSac--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/http://www.gnunicorn.org/assets/posts/beware-of-the-dashmap-deadlock-screenshotx15ppalrhabysr821cd7.png" alt="Screenshot of the crates.io entry for dashmap" width="800" height="166"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;52 million downloads, years since its first publication, updated just a few months ago: that looks like a reasonably sound library. So you start playing with it and find that its API is convenient and seems to work across &lt;code&gt;await&lt;/code&gt;s, async and all the things you've ever dreamed of. So you use it as the main caching layer for the transient state machine of models within the core business logic of your application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deadlocks? Really?
&lt;/h2&gt;

&lt;p&gt;It is only a long while later that the first reports come in: &lt;a href="https://github.com/acterglobal/a3/issues/958"&gt;sparse at first&lt;/a&gt; and &lt;a href="https://github.com/acterglobal/a3/issues/1264"&gt;unclear in their origin&lt;/a&gt;, but sometimes, it seems, your state machine doesn't process the incoming events. Or, more precisely, &lt;a href="https://github.com/acterglobal/a3/pull/1479"&gt;as you learn when digging into it&lt;/a&gt;: &lt;strong&gt;their futures never resolve&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As it turns out, it has been known &lt;a href="https://github.com/xacrimon/dashmap/issues/243#issuecomment-1368180321"&gt;since at least December 2022&lt;/a&gt; that you can use &lt;code&gt;DashMap&lt;/code&gt; in a way that causes deadlocks, &lt;em&gt;and without the compiler detecting them&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  A primer on deadlocks
&lt;/h3&gt;

&lt;p&gt;If you have never heard of deadlocks, let me give you a minimal, rough (and overly simplified) cut of the problem: sometimes memory is accessed by multiple threads, and clearly, if two of them write at the same time, this can cause problems. Thus the concept of "locks" was created: small guards around the memory that you need to acquire before you can write to it. While it is locked, no other thread can write to it, so the others have to wait for their turn. This ensures they all write one at a time and not in between one another.&lt;/p&gt;

&lt;p&gt;Now, how long you hold that lock is your prerogative, and there are several problems with holding a lock for a long time. For example: what if your thread panics while holding the lock? In Rust this is usually referred to as a "poisoned lock" (you might have seen &lt;a href="https://doc.rust-lang.org/std/sync/struct.PoisonError.html"&gt;that error in the std&lt;/a&gt;), and how to deal with it depends on the specific code.&lt;/p&gt;

&lt;p&gt;In this case, we are looking at a so-called &lt;em&gt;deadlock situation&lt;/em&gt;. This can easily be caused even by a single thread: you hold the lock, and your code, running on the same thread, for whatever reason tries to acquire the same lock &lt;em&gt;while still holding it&lt;/em&gt;. Execution stops, as the thread waits on a lock that it is holding itself, which therefore can never be released.&lt;/p&gt;

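&lt;p&gt;With the &lt;code&gt;std&lt;/code&gt; lock types, the compiler catches at least the async flavor of this scenario (yay): a &lt;code&gt;std::sync::MutexGuard&lt;/code&gt; is not &lt;code&gt;Send&lt;/code&gt;, so a future that holds one across an &lt;code&gt;await&lt;/code&gt;-point fails to compile wherever a &lt;code&gt;Send&lt;/code&gt; future is required (e.g. &lt;code&gt;tokio::spawn&lt;/code&gt;). Not so with &lt;code&gt;DashMap&lt;/code&gt;. As DashMap &lt;em&gt;actively&lt;/em&gt; allows locks to be held across &lt;code&gt;await&lt;/code&gt;-points (that is kinda its jam), the compiler has no way to figure out that this might lead to a deadlock.&lt;/p&gt;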
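&lt;p&gt;To make the single-thread case from the primer concrete, here is a minimal sketch. It uses the &lt;code&gt;std&lt;/code&gt; &lt;code&gt;Mutex&lt;/code&gt; and &lt;code&gt;HashMap&lt;/code&gt; rather than &lt;code&gt;DashMap&lt;/code&gt;, and the names (&lt;code&gt;cache&lt;/code&gt;, &lt;code&gt;"room"&lt;/code&gt;) are made up for illustration. The deadlocking shape appears only in comments, because running it would hang or panic. The working version copies the value out inside a short scope, so the guard is dropped before the lock is taken again.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;
```rust
use std::collections::HashMap;
use std::sync::Mutex;

fn main() {
    // Illustration only: std's Mutex around a HashMap, not DashMap.
    let cache = Mutex::new(HashMap::new());
    cache.lock().unwrap().insert("room", 1u32);

    // Deadlock-prone shape (deliberately NOT executed): the first guard is
    // still alive when the same thread asks for the lock again. std leaves
    // the exact outcome unspecified, but the second lock() call will not
    // return normally (it may deadlock or panic).
    //
    //     let guard = cache.lock().unwrap();
    //     let current = guard.get("room").copied();
    //     cache.lock().unwrap().insert("room", current.unwrap() + 1);

    // Safe shape: read the value inside a short scope, so the guard is
    // dropped at the end of the block, before the lock is taken again.
    let current = {
        let guard = cache.lock().unwrap();
        guard.get("room").copied()
    };
    cache.lock().unwrap().insert("room", current.unwrap_or(0) + 1);

    // The temporary guard here lives only until the end of this statement.
    let value = cache.lock().unwrap().get("room").copied();
    println!("{:?}", value);
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;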

&lt;h2&gt;
  
  
  How to avoid that problem
&lt;/h2&gt;

&lt;p&gt;The best advice is the one Alice already gave in her post from January 2022:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is especially important to follow the advice about &lt;strong&gt;never locking it in async code&lt;/strong&gt; when using &lt;code&gt;dashmap&lt;/code&gt; for this reason.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While this is good advice, it isn't really mentioned in the docs of &lt;code&gt;DashMap&lt;/code&gt;, and considering that nothing looks wrong in the &lt;a href="https://github.com/xacrimon/dashmap/issues/243#issuecomment-1370273098"&gt;examples&lt;/a&gt; showing &lt;a href="https://github.com/xacrimon/dashmap/issues/243#issuecomment-1368184568"&gt;the problem&lt;/a&gt; when you read the code, &lt;strong&gt;this is quite the footgun&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, the &lt;a href="https://github.com/acterglobal/a3/blob/9615edd751103eff5ef09404cb979e9c2c683424/native/core/src/store.rs#L190-L225"&gt;code in question in our case doesn't even hold any locks over &lt;code&gt;await&lt;/code&gt;-points&lt;/a&gt;, yet it seems to deadlock in some race condition scenarios.&lt;/p&gt;

&lt;p&gt;And you only find out about it after long debugging sessions and digging through the GitHub issues of that dependency. Taking all that into account, plus the fact that there is no real way to write tests or otherwise automatically ensure it isn't reintroduced by a later change to the code, I consider this pretty harmful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do about it?
&lt;/h2&gt;

&lt;p&gt;Well, supposedly this is fixed in the next big iteration of DashMap, &lt;a href="https://github.com/xacrimon/dashmap/issues/150"&gt;which is said to have async support by getting rid of locks entirely&lt;/a&gt;, but with the issue open since 2021 and most of the ideas for avoiding the locks dismissed for now, there is no telling &lt;em&gt;when&lt;/em&gt; this will come, if ever. What I have seen most people referencing that issue do, and &lt;a href="https://github.com/acterglobal/a3/pull/1479"&gt;what we also ended up doing&lt;/a&gt;, is to replace or at least remove DashMap from the code base.&lt;/p&gt;

&lt;p&gt;In our case we replaced it with the up-and-coming &lt;a href="https://crates.io/crates/scc"&gt;scc&lt;/a&gt;, which uses a different locking concept and has the additional benefit of being faster. Others have opted for &lt;code&gt;cachemap2&lt;/code&gt; or replaced it with a std lock around a &lt;code&gt;HashMap&lt;/code&gt;: there, at least, the compiler rejects a guard held across an &lt;code&gt;await&lt;/code&gt;-point in a future that must be &lt;code&gt;Send&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  No disrespect
&lt;/h3&gt;

&lt;p&gt;I am not writing this post to shit on the authors of DashMap, nor its contributors or maintainers. Building an async-safe, lock-free-ish collection is a hard task, one that I wouldn't even want to attempt myself. I still don't understand why this deadlocks internally, nor would I consider trying to patch it; the fact that they haven't done so yet leads me to believe this isn't an easy thing to do. As such, I don't think anyone should be mad at them either, call them names or do any of the other nasty things the internet does to people who have lost its favor.&lt;/p&gt;

&lt;p&gt;I am raising this issue because this is a widely used library, probably the most popular concurrent hashmap, and this is a severe problem that you should know about when using it. That's why I spent a significant part of this post explaining the core problem and how to avoid it. So, if you use DashMap and want to continue using it, you now know what to look out for.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Edit 2024-04-01&lt;/em&gt;: Edited for clarity, based on &lt;a href="https://lobste.rs/s/xz6daj/beware_dashmap_deadlock"&gt;corresponding feedback&lt;/a&gt; and removed a misleading quote.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>deadlock</category>
      <category>dashmap</category>
      <category>tech</category>
    </item>
    <item>
      <title>Adding a new Ghost via docker-compose to your traefik setup</title>
      <dc:creator>Benjamin Kampmann</dc:creator>
      <pubDate>Fri, 02 Feb 2024 19:31:30 +0000</pubDate>
      <link>https://dev.to/acter/adding-a-new-ghost-via-docker-compose-to-your-traefik-setup-4lc6</link>
      <guid>https://dev.to/acter/adding-a-new-ghost-via-docker-compose-to-your-traefik-setup-4lc6</guid>
      <description>&lt;p&gt;Sometimes the easiest and quickest way to try (or even deploy) a new service is by using the recommended docker-compose-setup that they often have as an example. But if you have an existing infrastructure, like we do with the great &lt;a href="https://github.com/mother-of-all-self-hosting/mash-playbook"&gt;mother of all self-hosting&lt;/a&gt; ansible playbooks, this isn't always easy to integrate. In particular when that infrastructure is managed and started and stopped independently from the additional docker-compose you intend to add. Lucky, who is running their out-most proxy using traefik, because with just a few extra labels your docker-compose becomes available TLS-certs included.&lt;/p&gt;

&lt;p&gt;Fortunately for us the MASH-playbook uses traefik and so adding a Ghost setup for testing was quick and easy. Let's look at the docker-compose (for our fictional &lt;code&gt;blog.example.org&lt;/code&gt;-address) and then we'll explain some of the specific aspects to address:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.1'&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;ghost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghost:5-alpine&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# see https://ghost.org/docs/config/#configuration-options&lt;/span&gt;
      &lt;span class="na"&gt;database__client&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
      &lt;span class="na"&gt;database__connection__host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db&lt;/span&gt;
      &lt;span class="na"&gt;database__connection__user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;database__connection__password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SOME_PRIVATE_ROOT_PASSWORD&lt;/span&gt;
      &lt;span class="na"&gt;database__connection__database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghost&lt;/span&gt;
      &lt;span class="c1"&gt;# this url value is just an example, and is likely wrong for your environment!&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.example.org&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data/ghost:/var/lib/ghost/content&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;traefik&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;aliases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;blog-example-org&lt;/span&gt;

    &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traefik.enable=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traefik.docker.network=traefik&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traefik.http.routers.blog-example-org.rule=Host(`blog.example.org`)&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traefik.http.services.blog-example-org.loadbalancer.server.port=2368&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traefik.http.routers.blog-example-org.entrypoints=web-secure&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traefik.http.routers.blog-example-org.tls=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traefik.http.routers.blog-example-org.tls.certResolver=default&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traefik.http.routers.blog-example-org.service=blog-example-org&lt;/span&gt;
  &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql:8.0&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_ROOT_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SOME_PRIVATE_ROOT_PASSWORD&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_DATABASE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghost&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data/db:/var/lib/mysql&lt;/span&gt;


&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;traefik&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alright, there are a few things here that have changed compared to the default example from Ghost. We will ignore the Ghost- and MySQL-specific changes, as they aren't that relevant and are only included for completeness.&lt;/p&gt;

&lt;h2&gt;
  
  
  The networks
&lt;/h2&gt;

&lt;p&gt;First and foremost, we have the additional &lt;code&gt;networks&lt;/code&gt;-section at the bottom of the configuration with two networks: &lt;code&gt;default&lt;/code&gt;, which we use for this specific service, and the other one bridging to the &lt;code&gt;traefik&lt;/code&gt;-service, which is marked as &lt;code&gt;external: true&lt;/code&gt;, telling docker to use the already existing network. This must be the network the dockerized &lt;code&gt;traefik&lt;/code&gt; is using. In the case of MASH this is simply called &lt;code&gt;traefik&lt;/code&gt; as well.&lt;/p&gt;

&lt;p&gt;Secondly, we need to add the &lt;code&gt;networks&lt;/code&gt;-section to both our services: any internal service is only on the &lt;code&gt;default&lt;/code&gt; network, while the exposed service must also be on the &lt;code&gt;traefik&lt;/code&gt;-network. Here we also give it a specific DNS alias within that network for traefik to route the traffic to.&lt;/p&gt;

&lt;h2&gt;
  
  
  the &lt;code&gt;traefik&lt;/code&gt; labels
&lt;/h2&gt;

&lt;p&gt;When &lt;code&gt;traefik&lt;/code&gt; is set up to use docker labels, which is the case in our MASH setup, we can just label our service with a few fields, and the &lt;code&gt;traefik&lt;/code&gt; service will automatically recognize it and configure the routing appropriately. Let's go through them one by one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;traefik.enable=true&lt;/code&gt;: tells traefik to route this container. Depending on your setup this might not be needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;traefik.docker.network=traefik&lt;/code&gt;: the network traefik is on&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;traefik.http.routers.blog-example-org.rule=Host(`blog.example.org`)&lt;/code&gt;: the actual hostname we want this service to be available under, between the backticks. Note that we are creating a custom traefik &lt;code&gt;router&lt;/code&gt; called &lt;code&gt;blog-example-org&lt;/code&gt; for this; all the following router configuration uses that router prefix.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;traefik.http.services.blog-example-org.loadbalancer.server.port=2368&lt;/code&gt;: the port on this service the traffic should be routed to&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;traefik.http.routers.blog-example-org.entrypoints=web-secure&lt;/code&gt;: if traefik has multiple entrypoints, which ones this router should serve. In the MASH case we want this to be available via &lt;code&gt;https&lt;/code&gt;, which is named &lt;code&gt;web-secure&lt;/code&gt; in our setup.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;traefik.http.routers.blog-example-org.tls=true&lt;/code&gt;: to enable TLS for this router&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;traefik.http.routers.blog-example-org.tls.certResolver=default&lt;/code&gt;: use the certificate resolver named &lt;code&gt;default&lt;/code&gt;. In MASH this means we are using Let's Encrypt certificates.&lt;/li&gt;
&lt;li&gt;
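&lt;code&gt;traefik.http.routers.blog-example-org.service=blog-example-org&lt;/code&gt;: the traefik service to route the traffic to. The value refers to the service defined by the &lt;code&gt;traefik.http.services.blog-example-org&lt;/code&gt; label above, which here happens to share its name with the DNS alias we gave in the network configuration.&lt;/li&gt;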
&lt;/ul&gt;

&lt;h2&gt;
  
  
  up and go
&lt;/h2&gt;

&lt;p&gt;And that's about it. Assuming the DNS name already resolves to your server and your traefik is already running, a &lt;code&gt;docker compose up -d&lt;/code&gt; is all it takes: a short time later (a bit longer if traefik needs to fetch the certificates for the first time), the service will be routed through and available at &lt;code&gt;blog.example.org&lt;/code&gt;. Neat!&lt;/p&gt;

</description>
      <category>traefik</category>
      <category>docker</category>
      <category>ghost</category>
      <category>devops</category>
    </item>
    <item>
      <title>Six niche tips for shipping Flutter MacOS builds</title>
      <dc:creator>Benjamin Kampmann</dc:creator>
      <pubDate>Thu, 11 Jan 2024 11:00:00 +0000</pubDate>
      <link>https://dev.to/acter/six-niche-tips-for-shipping-flutter-macos-builds-10cg</link>
      <guid>https://dev.to/acter/six-niche-tips-for-shipping-flutter-macos-builds-10cg</guid>
      <description>&lt;p&gt;Ever since we started shipping &lt;a href="https://next.acter.global"&gt;Acter&lt;/a&gt; to the Apple iOS AppStore, we wanted to have it on the Apple MacOS Store as well. With us building it on Rust and Flutter this should have been quite an easy feat as both have native support for MacOS. Yet actually shipping it was multiple months of try and error—with the last month spent on just a tiny problem caused by the Github Actions Runners. these are six niche tips we wished someone had told us before, that would have saved us months of work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Nightly builds as the baseline
&lt;/h2&gt;

&lt;p&gt;For testing and internal distribution, we had been running nightly builds of Acter for a few months already. They would automatically be created &lt;a href="https://github.com/acterglobal/a3/blob/e540ef02a640b90b5880b126f89d13a59d7fb409/.github/workflows/nightly.yml#L30-L31"&gt;every night at 3am&lt;/a&gt; (hence the name) &lt;a href="https://github.com/acterglobal/a3/blob/e540ef02a640b90b5880b126f89d13a59d7fb409/.github/workflows/nightly.yml#L34-L52"&gt;if changes had been found on the &lt;code&gt;main&lt;/code&gt; branch&lt;/a&gt;. Our build consists of two parts: first the internal Rust library, followed by a simple &lt;code&gt;flutter build $target&lt;/code&gt;. So obviously, we have &lt;a href="https://github.com/acterglobal/a3/blob/e540ef02a640b90b5880b126f89d13a59d7fb409/.github/workflows/nightly.yml#L70-L124"&gt;created a nice Github Actions Matrix&lt;/a&gt; to reuse as much as possible. I am not going into too much detail here, and the action setup has probably changed by the time you read this, but I have linked the specific sections for the record.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. MacOS is iOS but different—and Google won’t tell you
&lt;/h2&gt;

&lt;p&gt;For the release build of &lt;em&gt;iOS&lt;/em&gt; to work, we needed to sign the app. As a matter of fact, the flutter build won’t really work if you don’t have the necessary signatures set up. For the iOS nightly we use an Ad-Hoc setup with a few pre-configured internal devices; for the release we used the distribution profiles, both stored as &lt;a href="https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions"&gt;environment secrets&lt;/a&gt;, as so many tutorials recommend. For MacOS, signatures aren’t necessary to build and distribute the app, presumably because MacOS app development pre-dates signed builds. Thus our nightly builds didn’t have any setup for that yet.&lt;/p&gt;

&lt;p&gt;Another important difference is on the Flutter side. While Flutter’s &lt;code&gt;build ipa&lt;/code&gt; offers the option &lt;code&gt;--export-options-plist=PATH&lt;/code&gt;, allowing you to override certain plist information &lt;em&gt;for that specific build&lt;/em&gt;, no such option exists in &lt;code&gt;flutter build macos&lt;/code&gt;. This means that all the configuration inside the macos folder is, and must be, used as-is. That is a bit annoying, as it means we can’t easily make a local release build without the signatures anymore, but that’s what it is.&lt;/p&gt;

&lt;p&gt;One annoying side effect of Flutter being much more popular for building mobile apps is that when you google for information on the Apple setup, you’ll almost exclusively find questions and problems about iOS. The answers then recommend things like the &lt;code&gt;export-options&lt;/code&gt; command or other obscure settings you are supposed to change via Xcode, which either don’t do anything for the desktop version or don’t even exist there. Google really doesn’t help you when you get stuck with your Flutter MacOS build.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Switching from Github environment secrets to git-crypt for signatures and profiles
&lt;/h2&gt;

&lt;p&gt;One thing we wish we had done earlier was to switch to &lt;code&gt;git-crypt&lt;/code&gt; instead of storing signatures and provisioning profiles in Github secret environment variables. Many tutorials and setups out there recommend Github secrets for storing, well, secret information like the provisioning profiles and the certificates exported from the keychain. A companion script then writes them into the Github Action build environment. That is all fine and dandy if you only do that setup once and rarely change it. But I always found it annoying that a build could start failing or passing without any trace of the change in the Git history. Once you go beyond managing a single profile, the scripts often fall apart and the growing number of environment variables becomes very confusing. It is also super easy to mess up the base64 conversion, because it was so long ago that you last did it.&lt;/p&gt;

&lt;p&gt;Rather than storing profiles, the keystore and similar file-based secrets in secret environment variables, we switched to &lt;a href="https://www.agwa.name/projects/git-crypt/"&gt;&lt;code&gt;git-crypt&lt;/code&gt;&lt;/a&gt;. It is a git extension that transparently encrypts a configurable subset of files before they are committed to the repo. That makes the files simple to update while keeping them available right next to the code. Instead of extracting each secret from the environment into a file, we just install git-crypt and keep its (base64-encoded) key as the one Action secret. We then use that key to decrypt the files:&lt;/p&gt;

&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Unlock git-crypt&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;matrix.with_apple_cert&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;brew install git-crypt&lt;/span&gt;
    &lt;span class="s"&gt;echo "$" | base64 --decode &amp;gt; .github/assets/git-crypt-key&lt;/span&gt;
    &lt;span class="s"&gt;git-crypt unlock .github/assets/git-crypt-key&lt;/span&gt;
    &lt;span class="s"&gt;echo "Files found:"&lt;/span&gt;
    &lt;span class="s"&gt;git-crypt status -e&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
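
&lt;p&gt;The unlock step assumes the key was exported and turned into a secret once beforehand. Below is a minimal sketch of that one-time setup, assuming &lt;code&gt;git-crypt&lt;/code&gt; is installed on a trusted developer machine. &lt;code&gt;mark_encrypted&lt;/code&gt; and &lt;code&gt;encode_secret&lt;/code&gt; are hypothetical helpers, and the &lt;code&gt;.github/assets/**&lt;/code&gt; pattern simply mirrors the paths used in this post:&lt;/p&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;# One-time setup on a trusted machine with git-crypt installed (sketch):
#   git-crypt init
#   git-crypt export-key ./git-crypt-key    # never commit this file!

# Hypothetical helper: append a git-crypt rule for a path pattern to
# .gitattributes; matching files are then stored encrypted in the repo.
mark_encrypted() {
  printf '%s filter=git-crypt diff=git-crypt\n' "$1" | tee -a .gitattributes
}

# Hypothetical helper: print a (binary) key file as single-line base64,
# ready to be pasted into a Github Actions secret.
encode_secret() {
  cat "$1" | base64 | tr -d '\n'
}

# Usage:
#   mark_encrypted '.github/assets/**'
#   encode_secret ./git-crypt-key
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The base64 step is needed because the exported key is a binary file, while Github secrets hold text. The &lt;code&gt;base64 --decode&lt;/code&gt; in the unlock step reverses it.&lt;/p&gt;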

&lt;p&gt;Once the files are decrypted, we use the common script to import the certificate from the now-decrypted file into a temporary keychain. Technically we wouldn’t even need the extra password on that file anymore, but it doesn’t hurt either:&lt;/p&gt;

&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Install the Apple certificate and provisioning profile&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install the Apple certificates&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;matrix.with_apple_cert&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;P12_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.BUILD_CERTS_P12_PASSWORD }}&lt;/span&gt;
    &lt;span class="na"&gt;KEYCHAIN_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.KEYCHAIN_PASSWORD }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;echo "starting in $RUNNER_TEMP"&lt;/span&gt;
    &lt;span class="s"&gt;# create variables&lt;/span&gt;
    &lt;span class="s"&gt;CERTIFICATE_PATH=".github/assets/build_certificates.p12"&lt;/span&gt;
    &lt;span class="s"&gt;KEYCHAIN_PATH="$RUNNER_TEMP/app-signing.keychain-db"&lt;/span&gt;
    &lt;span class="s"&gt;echo "vars set"&lt;/span&gt;
    &lt;span class="s"&gt;# import certificate and provisioning profile from secrets&lt;/span&gt;
    &lt;span class="s"&gt;# create temporary keychain&lt;/span&gt;
    &lt;span class="s"&gt;echo "creating keychain"&lt;/span&gt;
    &lt;span class="s"&gt;security create-keychain -p "$KEYCHAIN_PASSWORD" "$KEYCHAIN_PATH"&lt;/span&gt;
    &lt;span class="s"&gt;echo "setting keychain"&lt;/span&gt;
    &lt;span class="s"&gt;security set-keychain-settings -lut 21600 "$KEYCHAIN_PATH"&lt;/span&gt;
    &lt;span class="s"&gt;echo "unlocking keychain"&lt;/span&gt;
    &lt;span class="s"&gt;security unlock-keychain -p "$KEYCHAIN_PASSWORD" "$KEYCHAIN_PATH"&lt;/span&gt;
    &lt;span class="s"&gt;# import certificate to keychain&lt;/span&gt;
    &lt;span class="s"&gt;echo "importing certificate"&lt;/span&gt;
    &lt;span class="s"&gt;security import "$CERTIFICATE_PATH" -P "$P12_PASSWORD" -A -t cert -f pkcs12 -k "$KEYCHAIN_PATH"&lt;/span&gt;
    &lt;span class="s"&gt;echo "listing keychains"&lt;/span&gt;
    &lt;span class="s"&gt;security list-keychain -d user -s "$KEYCHAIN_PATH"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And finally, we just take all the now-decrypted provisioning profile files and copy them to where they need to be. All files for all builds live in the git repo. Sweet.&lt;/p&gt;
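
&lt;p&gt;Below is a minimal sketch of that copy step. It assumes the decrypted profiles live in a folder called &lt;code&gt;.github/assets/provisioning_profiles&lt;/code&gt; (a hypothetical name). It also assumes they should end up in &lt;code&gt;~/Library/MobileDevice/Provisioning Profiles&lt;/code&gt;, the directory the clean-up step empties again. MacOS profiles use the &lt;code&gt;.provisionprofile&lt;/code&gt; extension and iOS ones &lt;code&gt;.mobileprovision&lt;/code&gt;. &lt;code&gt;install_profiles&lt;/code&gt; is likewise a made-up helper name:&lt;/p&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;# Sketch: copy all decrypted provisioning profiles into the per-user
# profile directory. Both defaults are assumptions; pass other paths as arguments.
install_profiles() {
  src="${1:-.github/assets/provisioning_profiles}"
  dest="${2:-$HOME/Library/MobileDevice/Provisioning Profiles}"
  mkdir -p "$dest"
  count=0
  for profile in "$src"/*.provisionprofile "$src"/*.mobileprovision; do
    [ -e "$profile" ] || continue   # the pattern matched nothing
    cp "$profile" "$dest/"
    count=$((count + 1))
  done
  echo "installed $count provisioning profile(s) into $dest"
}

# Usage inside the workflow's run step:
#   install_profiles
&lt;/code&gt;&lt;/pre&gt;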

&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Install the Apple certificate and provisioning profile&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install the Apple certificates&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;matrix.with_apple_cert&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;P12_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.BUILD_CERTS_P12_PASSWORD }}&lt;/span&gt;
    &lt;span class="na"&gt;KEYCHAIN_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.KEYCHAIN_PASSWORD }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;echo "starting in $RUNNER_TEMP"&lt;/span&gt;
    &lt;span class="s"&gt;# create variables&lt;/span&gt;
    &lt;span class="s"&gt;CERTIFICATE_PATH=".github/assets/build_certificates.p12"&lt;/span&gt;
    &lt;span class="s"&gt;KEYCHAIN_PATH="$RUNNER_TEMP/app-signing.keychain-db"&lt;/span&gt;
    &lt;span class="s"&gt;echo "vars set"&lt;/span&gt;
    &lt;span class="s"&gt;# import certificate and provisioning profile from secrets&lt;/span&gt;
    &lt;span class="s"&gt;# create temporary keychain&lt;/span&gt;
    &lt;span class="s"&gt;echo "creating keychain"&lt;/span&gt;
    &lt;span class="s"&gt;security create-keychain -p "$KEYCHAIN_PASSWORD" "$KEYCHAIN_PATH"&lt;/span&gt;
    &lt;span class="s"&gt;echo "setting keychain"&lt;/span&gt;
    &lt;span class="s"&gt;security set-keychain-settings -lut 21600 "$KEYCHAIN_PATH"&lt;/span&gt;
    &lt;span class="s"&gt;echo "unlocking keychain"&lt;/span&gt;
    &lt;span class="s"&gt;security unlock-keychain -p "$KEYCHAIN_PASSWORD" "$KEYCHAIN_PATH"&lt;/span&gt;
    &lt;span class="s"&gt;# import certificate to keychain&lt;/span&gt;
    &lt;span class="s"&gt;echo "importing certificate"&lt;/span&gt;
    &lt;span class="s"&gt;security import "$CERTIFICATE_PATH" -P "$P12_PASSWORD" -A -t cert -f pkcs12 -k "$KEYCHAIN_PATH"&lt;/span&gt;
    &lt;span class="s"&gt;echo "listing keychains"&lt;/span&gt;
    &lt;span class="s"&gt;security list-keychain -d user -s "$KEYCHAIN_PATH"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Last but not least, don’t forget to clean all of that up, regardless of whether the build failed or succeeded!&lt;/p&gt;

&lt;pre class="highlight"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Clean up keychain and provisioning profile&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ always() }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;security delete-keychain $RUNNER_TEMP/app-signing.keychain-db&lt;/span&gt;
    &lt;span class="s"&gt;rm ~/Library/MobileDevice/Provisioning\ Profiles/*&lt;/span&gt;
    &lt;span class="s"&gt;rm .github/assets/git-crypt-key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This makes it super easy to update all that data. Got a new provisioning profile? Just put it into that folder. Updated the keystore? Just export the p12 file with the same password again. No base64 conversion, no copying into Github Secrets, just &lt;code&gt;git commit &amp;amp;&amp;amp; git push&lt;/code&gt;. &lt;em&gt;*chef’s kiss*&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Github Search is the hidden champion
&lt;/h2&gt;

&lt;p&gt;One of Github’s most underrated features is its search. Github is the biggest crowd-sourced code knowledge base in the world. It also holds the largest collection of its own configuration files, workflow YAMLs among them. That makes its search truly amazing. Not everyone who found the hack to make something work will blog about it or write a Stack Overflow answer (this post almost didn’t get written either). But if it works, there is a high chance they commit it and it ends up in a Github repo, discoverable via search.&lt;/p&gt;

&lt;p&gt;Similar to Google, Github’s search has &lt;a href="https://github.com/search/advanced"&gt;many advanced options&lt;/a&gt;. We were looking for alternative ways of doing the Flutter build within Actions, and the query &lt;a href="https://github.com/search?q=path%3A.github%2Fworkflow+flutter+macos&amp;amp;type=code"&gt;&lt;code&gt;path:.github/workflow flutter macos&lt;/code&gt;&lt;/a&gt; was the key to unlocking a treasure trove of knowledge. Mind you, just because code is committed doesn’t necessarily mean it runs. But it is how we first found out about the git-crypt idea! And it’s also how we found the final upload pattern we ended up using.&lt;/p&gt;

&lt;p&gt;Seriously, if you are ever stuck on some Github Action configuration that others have probably already attempted, try the Github search. Google doesn’t know about even a fraction of it, and with the advanced search you can make your queries very specific to your problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. MacOS apps need all their binaries signed
&lt;/h2&gt;

&lt;p&gt;One particularly nasty difference between the iOS and MacOS Flutter builds is that the latter doesn’t really manage the signing properly for you. Signing the Mac app also works differently. On iOS you create an &lt;code&gt;ipa&lt;/code&gt; file (effectively a ZIP file), which is then signed as a whole (oversimplified). The “Mac App”, however, is actually a directory with the extension &lt;code&gt;.app&lt;/code&gt;. You can’t really “sign” a directory. Instead, the people at Apple decided that you must sign each binary within that app directory and provide these signatures inside the directory. This is hidden somewhere in the docs, but if you try to Google for this information, you will only find iOS fixes (see No. 1). So I am telling you now.&lt;/p&gt;

&lt;p&gt;For most apps that is fairly irrelevant, but we ship a bunch of binaries, our own included. We found &lt;a href="https://github.com/acterglobal/a3/commit/6263d5990921ea104daba0abacab0c48dda9e135"&gt;a script that iterates through the final app and signs each binary with the provided credentials&lt;/a&gt;. We then added it to the regular Xcode shell-script build process for release builds. That means that at the end of &lt;code&gt;flutter build macos&lt;/code&gt;, we now have an &lt;code&gt;Acter.app&lt;/code&gt; directory with all the proper signatures included. Yay.&lt;/p&gt;
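
&lt;p&gt;We used the script linked above. Purely to illustrate the idea, here is a much-simplified sketch of “sign every nested binary, then the bundle”. &lt;code&gt;sign_app_bundle&lt;/code&gt; is a hypothetical name, and it only handles frameworks and dylibs under &lt;code&gt;Contents/Frameworks&lt;/code&gt;. Entitlements and any extra &lt;code&gt;codesign&lt;/code&gt; options your app may need are left out. The &lt;code&gt;CODESIGN&lt;/code&gt; override exists only so the loop can be dry-run with &lt;code&gt;echo&lt;/code&gt;:&lt;/p&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical sketch: sign nested code inside-out, then the .app itself.
# Set CODESIGN=echo for a dry run that only prints the commands.
sign_app_bundle() {
  app="$1"        # path to the built Acter.app
  identity="$2"   # name of the signing certificate in the keychain
  # -depth lists a directory's contents before the directory itself, so
  # frameworks nested inside other frameworks get signed first.
  find "$app/Contents/Frameworks" -depth \( -name '*.framework' -o -name '*.dylib' \) -print |
    while IFS= read -r item; do
      "${CODESIGN:-codesign}" --force --sign "$identity" "$item" || exit 1
    done || return 1
  # Sign the outer bundle last, once everything inside carries a signature.
  "${CODESIGN:-codesign}" --force --sign "$identity" "$app"
}
&lt;/code&gt;&lt;/pre&gt;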

&lt;h2&gt;
  
  
  5. Build code ages quickly
&lt;/h2&gt;

&lt;p&gt;One problem you’ll face with both Github and Google search results is that the infrastructure you are building with and against changes constantly. Several tutorials we found recommended ways of packaging or uploading the app that were outdated or no longer supported in the latest version. (This was even worse for building the Windows app.) Figuring out which approach is the latest recommended one, and thus hopefully the longest-lasting code you could write, is a tedious and annoying process. Very often you don’t know something isn’t supported anymore until you have installed and tried the command. But when you find a novel approach you might want to try out, a few tricks help:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;check the official docs to see whether it is still supported;&lt;/li&gt;
&lt;li&gt;if it is on Github, see when it was last run;&lt;/li&gt;
&lt;li&gt;on StackOverflow and many blogs, you can quickly gauge whether it is a new or rather old idea.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unfortunately, in this space, old often means less likely to still work.&lt;/p&gt;

&lt;p&gt;For us, the latest (at the time of writing) recommended way to package and submit the Flutter MacOS app to the Mac App Store is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;build the release version of the app with &lt;code&gt;flutter build macos&lt;/code&gt;; make sure all binaries are signed (see above)&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;productbuild&lt;/code&gt; to create a modern &lt;code&gt;.pkg&lt;/code&gt; and have it signed: &lt;code&gt;productbuild --component Acter.app /Applications --sign "$APPLE_SIGN_CERTNAME" Acter.pkg&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;then use &lt;code&gt;altool&lt;/code&gt; to upload the &lt;code&gt;.pkg&lt;/code&gt; to the Mac App Store using a private_key credential (which we stored with git-crypt, of course):&lt;/li&gt;
&lt;/ol&gt;

&lt;pre class="highlight"&gt;&lt;code&gt;      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload to App Store&lt;/span&gt;
     &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;APPLE_API_KEY_BASE64&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.APPLE_API_KEY_BASE64 }}&lt;/span&gt;
       &lt;span class="na"&gt;APPLE_API_KEY_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.APPLE_API_KEY_ID }}&lt;/span&gt;
       &lt;span class="na"&gt;APPLE_ISSUER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.APPLE_ISSUER_ID }}&lt;/span&gt;
     &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
       &lt;span class="s"&gt;mkdir private_keys&lt;/span&gt;
       &lt;span class="s"&gt;echo -n "$APPLE_API_KEY_BASE64" | base64 --decode --output "private_keys/AuthKey_$APPLE_API_KEY_ID.p8"&lt;/span&gt;
       &lt;span class="s"&gt;ls -ltas private_keys&lt;/span&gt;
       &lt;span class="s"&gt;xcrun altool --upload-app --type macos --file acter-macosx-${{ needs.tags.outputs.tag }}.pkg \&lt;/span&gt;
           &lt;span class="s"&gt;--bundle-id global.acter.a3 \&lt;/span&gt;
           &lt;span class="s"&gt;--apiKey "$APPLE_API_KEY_ID" \&lt;/span&gt;
           &lt;span class="s"&gt;--apiIssuer "$APPLE_ISSUER_ID"&lt;/span&gt;
     &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;    

&lt;h2&gt;
  
  
  6. Github artifacts are not a proper package mechanism: measure twice
&lt;/h2&gt;

&lt;p&gt;With that, we were all set and everything should have worked. Yet Apple kept rejecting our app. The rejection never came at upload time but during the post-processing on their servers: a few hours after each upload, we’d receive an email listing a batch of errors.&lt;/p&gt;

&lt;p&gt;We had gotten several similar-looking emails (especially the top part) before, back when the signatures were not properly set up for each binary. So we assumed something was wrong with that part again. Weirdly enough, when we did the entire process manually (rather than via Github Action), it all worked fine and Apple didn’t reject our submission. So we downloaded the latest &lt;code&gt;Acter.app&lt;/code&gt; from the build artifacts to see if we could sign and submit that. The download was larger than usual (220 MB rather than the usual 140 MB we saw for most builds before), but we didn’t think much of it. Indeed, when we packaged and uploaded this version, Apple rejected it again.&lt;/p&gt;

&lt;p&gt;So we looked inside the &lt;code&gt;Acter.app&lt;/code&gt; built by the Github Action: it is just a folder after all (even though the MacOS Finder hides that behind &lt;code&gt;right click -&amp;gt; Show Package Contents&lt;/code&gt;). Right away we noticed something odd: all binaries of the frameworks appeared to be in there &lt;em&gt;twice&lt;/em&gt;. There was one copy under &lt;code&gt;$framework/Versions/Current&lt;/code&gt; and another under &lt;code&gt;$framework/Versions/A&lt;/code&gt;. That certainly explained why the app was about twice the size. Interestingly, our nightly builds didn’t show this behavior. There, &lt;code&gt;Current&lt;/code&gt; was a symlink to &lt;code&gt;A&lt;/code&gt; in each framework, as we’d expect it to be. So although the nightly build system was the baseline we started from, we must have altered something along the way.&lt;/p&gt;

&lt;p&gt;Then it hit us: &lt;strong&gt;the nightly job packages the &lt;code&gt;.app&lt;/code&gt; folder as a &lt;code&gt;tar.bz&lt;/code&gt; directly and attaches it to the Github release from within the build job. The publishing action, in contrast, stores the folder as a Github Artifact, which a second job later downloads and submits to the store&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why does that matter? After all, the Github Artifact is just stored as a ZIP archive (to save disk space). Yes, but when zipping, symbolic links &lt;em&gt;are resolved&lt;/em&gt; by default. Because the entire folder gets zipped, the usual symlink &lt;code&gt;Versions/Current -&amp;gt; Versions/A&lt;/code&gt; is &lt;em&gt;resolved&lt;/em&gt;, and the files are stored twice. Yet the signature is only stored once, and only for &lt;code&gt;Versions/A&lt;/code&gt; (not for &lt;code&gt;Versions/Current&lt;/code&gt;). So when we download that zipped version, we get an &lt;code&gt;.app&lt;/code&gt; folder in which each framework version is stored twice. Only one of the two copies has a signature, and the folder is about twice the size. Looking at the error messages sent by Apple, the last batch of errors even hints at that problem.&lt;/p&gt;
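
&lt;p&gt;The effect is easy to reproduce outside of Github. Below is a minimal sketch using a miniature stand-in for a framework: &lt;code&gt;Demo.framework&lt;/code&gt;, with a copy of &lt;code&gt;/bin/sh&lt;/code&gt; as its “binary”. A copy that follows symlinks, as &lt;code&gt;cp -RL&lt;/code&gt; does here, turns &lt;code&gt;Versions/Current&lt;/code&gt; into a second full copy. A &lt;code&gt;tar&lt;/code&gt; round trip, which is essentially what the nightly job does, keeps it a symlink:&lt;/p&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;# Miniature framework layout: Versions/A holds the binary and
# Versions/Current is a relative symlink to A, as in a real .framework.
demo=$(mktemp -d)
fw="Acter.app/Contents/Frameworks/Demo.framework"
mkdir -p "$demo/$fw/Versions/A"
cp /bin/sh "$demo/$fw/Versions/A/Demo"   # stand-in binary
ln -s A "$demo/$fw/Versions/Current"

# 1. Copying while following symlinks duplicates the framework contents,
#    just like the resolved artifact we downloaded.
cp -RL "$demo/Acter.app" "$demo/resolved.app"

# 2. tar stores the symlink itself, so the layout survives the round trip.
tar -cf "$demo/app.tar" -C "$demo" Acter.app
mkdir "$demo/extracted"
tar -xf "$demo/app.tar" -C "$demo/extracted"

ls -l "$demo/extracted/$fw/Versions"      # Current is still a symlink to A
ls -l "$demo/resolved.app/Contents/Frameworks/Demo.framework/Versions"   # Current is a real directory
&lt;/code&gt;&lt;/pre&gt;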

&lt;p&gt;Finding that issue, on and off, took us a month. Yet the fix was small and trivial. We moved the &lt;code&gt;productbuild&lt;/code&gt; step that creates the &lt;code&gt;.pkg&lt;/code&gt; file from the publishing job into the build job. Now we store that &lt;code&gt;.pkg&lt;/code&gt;-&lt;em&gt;file&lt;/em&gt; as the artifact. Problem solved.&lt;/p&gt;
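
&lt;p&gt;A cheap check on the &lt;code&gt;.app&lt;/code&gt; right before packaging can catch this kind of problem before Apple does. Below is a minimal sketch, assuming the standard versioned framework layout. &lt;code&gt;check_framework_symlinks&lt;/code&gt; is a hypothetical helper, not part of our actual workflow:&lt;/p&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical guard: fail if any framework's Versions/Current is not a symlink,
# which is what a symlink-resolving copy or archive leaves behind.
check_framework_symlinks() {
  app="$1"
  status=0
  for fw in "$app"/Contents/Frameworks/*.framework; do
    [ -d "$fw/Versions" ] || continue   # no match, or an unversioned bundle
    if [ ! -L "$fw/Versions/Current" ]; then
      echo "not a symlink: $fw/Versions/Current"
      status=1
    fi
  done
  return "$status"
}

# Usage, e.g. as the last step of the build job:
#   check_framework_symlinks path/to/Acter.app
&lt;/code&gt;&lt;/pre&gt;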




&lt;p&gt;These are just a few things we wish we had known before. Do you have any additional tips for these (apparently niche) Flutter MacOS build systems that you wish someone had told you earlier? Let us know below!&lt;/p&gt;

</description>
      <category>flutter</category>
      <category>acter</category>
      <category>deploy</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Hunting down a non-determinism-bug in our Rust Wasm build</title>
      <dc:creator>Benjamin Kampmann</dc:creator>
      <pubDate>Fri, 10 Jul 2020 16:12:38 +0000</pubDate>
      <link>https://dev.to/gnunicorn/hunting-down-a-non-determinism-bug-in-our-rust-wasm-build-4fk1</link>
      <guid>https://dev.to/gnunicorn/hunting-down-a-non-determinism-bug-in-our-rust-wasm-build-4fk1</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note: Together with a few colleagues, I will be hosting &lt;a href="https://www.reddit.com/r/rust/comments/hr7kdw/parity_technologies_ama_we_are_developers_of_some/"&gt;an AMA on the Rust subreddit Wednesday, July 15th (today!)&lt;/a&gt;. Come join us, if you have questions on this or any other part of our code base or how we handle things at &lt;a href="https://www.parity.io/"&gt;Parity Technologies&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We recently learned that the WebAssembly build in our system isn't deterministic any longer. This is a short summary of how we chased down the bug. We hope it helps others facing similar issues and gives some guidance on what to try and how this kind of thing works.&lt;/p&gt;

&lt;h2&gt;
  
  
  A bit of background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Substrate / our stack
&lt;/h3&gt;

&lt;p&gt;At &lt;a href="https://parity.io"&gt;Parity&lt;/a&gt; we are building software to run the next generation of the Web: a trust-less secure Web 3.0. Most notably we are building the &lt;a href="https://polkadot.network/"&gt;Polkadot Network&lt;/a&gt; based on &lt;a href="https://substrate.dev/"&gt;Substrate&lt;/a&gt;. Substrate is natively compatible with Polkadot, thus making it simple to secure your blockchain and communicate with Polkadot’s network. We have built everything using Rust which means no legacy codebase and it's open source. &lt;/p&gt;

&lt;h4&gt;
  
  
  Wasm Runtimes
&lt;/h4&gt;

&lt;p&gt;A key architectural feature of Substrate is that the chain-specific state transition function (STF) is separated from the rest of the node. The binary WebAssembly (wasm) blob is stored on chain (at the key &lt;code&gt;:code&lt;/code&gt;). This separation allows for network-wide upgrades of the state transition function independently of the rest of the client. These upgrades happen through on-chain governance mechanisms. The client uses a wasm runtime, executing updated wasm blobs as the community decides to implement an upgrade. These mechanisms allow Substrate-based blockchains to evolve without the complications of hard forks. We call this &lt;code&gt;forkless-upgrades&lt;/code&gt;, as participants can stay in sync with the network without having to upgrade their client.&lt;/p&gt;

&lt;p&gt;We achieve this by building the Rust runtime code in a separate crate for the &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt; target. That produces a Wasm blob, which we then store on chain.&lt;/p&gt;

&lt;h4&gt;
  
  
  Deterministic builds
&lt;/h4&gt;

&lt;p&gt;Wasm is overall a great invention, providing a thin multi-platform abstraction over the machine language target. Thus the wasm output from a compiler can easily be translated into the actual machine code needed for execution. However, WebAssembly is not human-readable, which makes it hard to audit for the people who are supposed to vote on the chain upgrade. Upgrading a blockchain's state transition function can't be undone, so voting on an opaque blob doesn't exactly instill confidence in its users.&lt;/p&gt;

&lt;p&gt;We approach this problem through "deterministic builds". The Rust compiler itself does not, as of now, guarantee that builds are deterministic, i.e. that compiling the same code twice yields the same binary. However, Parity has managed to solve this problem for our code. So, &lt;em&gt;at least&lt;/em&gt; within the same environment (OS, compiler, installed libraries), builds are reasonably deterministic for a set of targets, and &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt; is among them. This allows us to produce a Docker image (and its build description) confirming that the Rust source files indeed result in the binary blob proposed as the next chain runtime. Using the Docker image, auditors can look at the source code directly and do not have to wade through the wasm32 bytecode.&lt;/p&gt;




&lt;p&gt;They used to, but we recently discovered that two consecutive builds of the runtime code no longer yielded the same wasm blobs. That is how we realized that our build process was broken.&lt;/p&gt;

&lt;p&gt;Unfortunately, we didn't yet have tooling in place to alert us about the problem, so we didn't know what introduced the bug. However, we knew for sure when it last worked: 2020-03-21T16:55:27Z. Here is what I did with that information to track down the problem:&lt;/p&gt;

&lt;h2&gt;
  
  
  0. The test
&lt;/h2&gt;

&lt;p&gt;The first step towards the fix was devising a test that reliably reproduces the problem. In our case this was rather simple: build a specific package in the project (&lt;code&gt;node-runtime&lt;/code&gt;) twice and compare the resulting wasm. We used the SHA256 hash of the compiler output to do that. &lt;br&gt;
The specific steps are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; node-runtime   &lt;span class="c"&gt;# build the first time&lt;/span&gt;
sha256sum target/release/wbuild/target/wasm32-unknown-unknown/release/&lt;span class="k"&gt;*&lt;/span&gt;.wasm &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; checksum.sha256      &lt;span class="c"&gt;# store the hash&lt;/span&gt;
cargo clean     &lt;span class="c"&gt;# clean up the artifacts&lt;/span&gt;
cargo build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; node-runtime   &lt;span class="c"&gt;# build again&lt;/span&gt;
sha256sum &lt;span class="nt"&gt;-c&lt;/span&gt; checksum.sha256    &lt;span class="c"&gt;# are the builds identical?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
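
&lt;p&gt;Wrapped into a reusable function, the same test could look like the sketch below. &lt;code&gt;builds_are_identical&lt;/code&gt; is a hypothetical helper. It assumes GNU coreutils' &lt;code&gt;sha256sum&lt;/code&gt; is available (on macOS, &lt;code&gt;shasum -a 256&lt;/code&gt; is the usual substitute). It succeeds only when both builds hash identically, which also makes it usable as a &lt;code&gt;git bisect run&lt;/code&gt; script:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: clean, build, hash; clean, build, hash; compare.
# Returns 0 if identical, 1 if the outputs differ, 2 if a step failed.
builds_are_identical() {
  clean_cmd="$1"   # e.g. "cargo clean"
  build_cmd="$2"   # e.g. "cargo build --release -p node-runtime"
  pattern="$3"     # glob of the artifacts to compare
  sh -c "$clean_cmd" || return 2
  sh -c "$build_cmd" || return 2
  first=$(sha256sum $pattern) || return 2    # unquoted on purpose: expands the glob
  sh -c "$clean_cmd" || return 2
  sh -c "$build_cmd" || return 2
  second=$(sha256sum $pattern) || return 2
  [ "$first" = "$second" ] || return 1
}

# Usage for the runtime from this post:
#   builds_are_identical "cargo clean" "cargo build --release -p node-runtime" \
#     'target/release/wbuild/target/wasm32-unknown-unknown/release/*.wasm'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;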



&lt;h2&gt;
  
  
  1. Nightly builds
&lt;/h2&gt;

&lt;p&gt;Within our Wasm build we use a few features that are not yet stabilized in the Rust compiler, so we need the nightly version to use them. Nightly moves fast: features are added left and right and often have unexpected side effects, including on determinism. So it was quite possible that upgrading nightly broke our build.&lt;/p&gt;

&lt;p&gt;Luckily, all old Rust nightly builds remain available. With &lt;code&gt;rustup&lt;/code&gt; and &lt;code&gt;cargo&lt;/code&gt;, the amazing tooling Rust provides, it is easy to install and compile with any compiler version. So we first installed the old version via:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rustup install nightly-2020-03-19 # install nightly
rustup target add wasm32-unknown-unknown --toolchain nightly-2020-03-19 # add the wasm target to that toolchain
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This installs the last nightly version we know to have produced deterministic builds, along with its wasm target. (Note: despite the name, a nightly isn't released &lt;em&gt;every&lt;/em&gt; night; often the compiler or some of its components fail to build and that version is skipped.)&lt;/p&gt;

&lt;p&gt;Now when building the crate we just add &lt;code&gt;+nightly-2020-03-19&lt;/code&gt; to the cargo command to tell it which version to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo +nightly-2020-03-19 build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; node-runtime   &lt;span class="c"&gt;# build the first time&lt;/span&gt;
sha256sum target/release/wbuild/target/wasm32-unknown-unknown/release/&lt;span class="k"&gt;*&lt;/span&gt;.wasm &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; checksum.sha256      &lt;span class="c"&gt;# store the checksum&lt;/span&gt;
cargo clean     &lt;span class="c"&gt;# clean up the build&lt;/span&gt;
cargo +nightly-2020-03-19 build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; node-runtime   &lt;span class="c"&gt;# build again&lt;/span&gt;
sha256sum &lt;span class="nt"&gt;-c&lt;/span&gt; checksum.sha256    &lt;span class="c"&gt;# are the builds identical?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, that wasn't it: with the current version of our code, two builds using the nightly from back then don't produce identical wasm either. But what about the older version?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nightly builds on old version&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Checking out the old version of the code and building it with the old compiler was indeed deterministic. So what if we change only the compiler?&lt;/p&gt;

&lt;p&gt;Building the old code with the latest compiler also yields a deterministic build. &lt;/p&gt;

&lt;p&gt;So a change in the compiler wasn't what broke determinism. This doesn't rule out a compiler bug. It only means that whatever triggered it wasn't introduced on their side but by changes on ours.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. &lt;code&gt;git bisect&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;One of the great things about using tools that are popular among developers is that others add tooling on top that makes your life easier. One lesser-known but super useful example, which has long been part of mainline &lt;code&gt;git&lt;/code&gt;, is &lt;code&gt;git bisect&lt;/code&gt;. It helps you identify which changeset introduced a bug by running a binary search over the commit history.&lt;/p&gt;

&lt;p&gt;Essentially, you tell &lt;code&gt;git bisect&lt;/code&gt; which commit you know to be good (&lt;code&gt;git bisect good&lt;/code&gt;) and which one you know to be bad (&lt;code&gt;git bisect bad&lt;/code&gt;). It then checks out the changeset in the middle of the two. You perform your test and report the result via the same &lt;code&gt;git bisect good&lt;/code&gt; or &lt;code&gt;git bisect bad&lt;/code&gt; command. It then jumps to the middle between the latest known good and bad commits and checks that out. And so on, until &lt;code&gt;git bisect&lt;/code&gt; tells you &lt;code&gt;&amp;lt;commit&amp;gt; is the first bad commit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git bisect run&lt;/code&gt; even lets you specify a command that either succeeds or fails, and &lt;code&gt;git&lt;/code&gt; then performs all the steps automatically. (Unfortunately, I didn't have such a script ready at the time and was building through a more complicated rsync-based remote build system, but being able to run the whole bisection by itself is pretty awesome.)&lt;/p&gt;
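&lt;p&gt;A script for &lt;code&gt;git bisect run&lt;/code&gt; communicates through its exit code: 0 marks the commit as good, 125 tells git to skip it (for instance because it doesn't build), and any other code from 1 to 127 marks it as bad. A hypothetical core for our determinism check could look like this in Rust, where the two inputs are the wasm blobs of two consecutive builds of the same commit, or &lt;code&gt;None&lt;/code&gt; if a build failed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Exit codes interpreted by `git bisect run`.
const GOOD: i32 = 0;
const BAD: i32 = 1; // any of 1..=127 except 125 means "bad"
const SKIP: i32 = 125;

/// Hypothetical check: compare two builds of the same commit.
fn bisect_exit_code(first: Option&amp;lt;&amp;amp;[u8]&amp;gt;, second: Option&amp;lt;&amp;amp;[u8]&amp;gt;) -&amp;gt; i32 {
    match (first, second) {
        (Some(a), Some(b)) if a == b =&amp;gt; GOOD, // byte-identical: deterministic
        (Some(_), Some(_)) =&amp;gt; BAD,            // the builds differ
        _ =&amp;gt; SKIP,                            // a build failed: can't judge
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A complete script would read both build outputs with &lt;code&gt;std::fs::read&lt;/code&gt; and end with &lt;code&gt;std::process::exit(code)&lt;/code&gt;. Returning 125 rather than 1 for a failed build matters: otherwise a commit that merely fails to compile would be blamed for the nondeterminism.&lt;/p&gt;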

&lt;p&gt;Git bisect identified &lt;a href="https://github.com/paritytech/polkadot/commit/b361171329213ce41c75ef55d93fb952a4f6c034"&gt;a pretty large PR&lt;/a&gt; as the culprit. "Pretty large" not only because it adds a significant amount of code itself, but also because it is the first to activate a few features in the default runtime we are testing against, namely &lt;code&gt;session_historical&lt;/code&gt;, which we already suspected to be related to this issue.&lt;/p&gt;

&lt;p&gt;Unfortunately, this still didn't yield that &lt;em&gt;one line&lt;/em&gt; on our side that caused it; it could still be any number of things. One way forward would be to activate one feature after another to move closer and closer to the actual source. But that isn't as easy as it sounds, so we opted for wasm introspection first.&lt;/p&gt;

&lt;p&gt;Git bisect is a powerful tool and is often crucial for narrowing down the scope of the search for a bug. As this episode illustrates, however, it seldom pinpoints a specific line of code by itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Introspecting the Wasm
&lt;/h2&gt;

&lt;p&gt;The wasm output from the Rust compiler is a binary blob, but like any modern standard, it has a few introspection features built in. Most importantly, there is a 1-to-1 text representation, called WAT, that it can be converted to and back from without problems. And the WebAssembly Binary Toolkit (WABT) provides the handy &lt;code&gt;wasm2wat&lt;/code&gt; tool for exactly that.&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;wasm2wat&lt;/code&gt; on the wasm output of the first build and then again on the second build gives us a human-readable representation of each binary. Then we can diff the two wat versions to identify what changed between them (some lines are omitted and marked as &lt;code&gt;...&lt;/code&gt; for legibility):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;--- node_runtime_1_1.wat    2020-07-06 09:43:09.122043953 +0200
&lt;/span&gt;&lt;span class="gi"&gt;+++ node_runtime_1_2.wat    2020-07-06 09:43:14.178717225 +0200
&lt;/span&gt;&lt;span class="p"&gt;@@ -791052,10 +791052,10 @@&lt;/span&gt;
                                             i32.add
                                             call $_ZN86_$LT$sp_trie..node_header..NodeHeader$u20$as$u20$parity_scale_codec..codec..Encode$GT$9encode_to17h7b42619bd7dc163dE
                                             local.get 15
&lt;span class="gd"&gt;-                                            br_if 11 (;@9;)
&lt;/span&gt;&lt;span class="gi"&gt;+                                            br_if 12 (;@8;)
&lt;/span&gt;                                             i32.const 0
                                             local.set 1
&lt;span class="gd"&gt;-                                            br 14 (;@6;)
&lt;/span&gt;&lt;span class="gi"&gt;+                                            br 13 (;@7;)
&lt;/span&gt;                                           end
                                           local.get 0
                                           i32.const 1
&lt;span class="p"&gt;@@ -791092,10 +791092,10 @@&lt;/span&gt;
                                           i32.add
                                           call $_ZN86_$LT$sp_trie..node_header..NodeHeader$u20$as$u20$parity_scale_codec..codec..Encode$GT$9encode_to17h7b42619bd7dc163dE
                                           local.get 15
&lt;span class="gd"&gt;-                                          br_if 11 (;@8;)
&lt;/span&gt;&lt;span class="gi"&gt;+                                          br_if 10 (;@9;)
&lt;/span&gt;                                           i32.const 0
                                           local.set 1
&lt;span class="gd"&gt;-                                          br 12 (;@7;)
&lt;/span&gt;&lt;span class="gi"&gt;+                                          br 13 (;@6;)
&lt;/span&gt;                                         end
                                         i32.const 1
                                         local.set 17
&lt;span class="p"&gt;@@ -966366,6 +966366,6 @@&lt;/span&gt;
   (export "__data_end" (global 1))
   (export "__heap_base" (global 2))
   ...
&lt;span class="gd"&gt;-  (data (;0;) (i32.const 1048576) "\04\80\e9\1b\80\14\10\00\80\14\10\00")
&lt;/span&gt;&lt;span class="gi"&gt;+  (data (;0;) (i32.const 1048576) "R\8a\11v\80\14\10\00\80\14\10\00")
&lt;/span&gt;   (data (;1;) (i32.const 1048592) " \00\10\00\17\00\00\00\ee\02\00\00\05\00\00\00src/liballoc/raw_vec.rs\00\c7\00\10\00F\00\00\00b\01\00\00\13\00\00\00J\00\00\00\04\00\00\00\04\00\00\00K\00\00\00L\00\00\00M\00\00\00a formatting trait implementation returned an error\00J\00\00\00\00\00\00\00\01\00\00\00N\00\00\00\b4\00\10\00\13\00\00\00J\02\00\00\05\00\00\00src/liballoc/fmt.rs/rustc/15812785344d913d779d9738fe3cca8de56f71d5/src/libcore/fmt/mod.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The diff is rather short. With just a bit of scrolling through &lt;a href="https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format"&gt;the great Mozilla Wasm Explainer&lt;/a&gt; and the &lt;a href="https://webassembly.github.io/spec/core/text/index.html"&gt;extensive and excellent wasm documentation by the original working group&lt;/a&gt;, we quickly learn that the first four changes only concern branch labels, which might simply be numbered differently because of compiler internals triggered by the last change.&lt;/p&gt;

&lt;p&gt;That last one is odd, though: it is the data placed at a fixed memory address (&lt;code&gt;1048576&lt;/code&gt;), and its first bytes differ between the two builds. What makes this particularly odd is that when we search for that number in the original &lt;code&gt;wat&lt;/code&gt; file, we find a global initialized with it that is marked as mutable (&lt;code&gt;mut&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  (table (;0;) 343 343 funcref)
  (global (;0;) (mut i32) (i32.const 1048576))
  (global (;1;) i32 (i32.const 1284452))
  (global (;2;) i32 (i32.const 1284452))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
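&lt;p&gt;To see what actually differs, we can decode the start of data segment 0 the way wasm reads it: linear memory is little-endian, so the first four bytes form one 32-bit integer. In wat strings, &lt;code&gt;\hh&lt;/code&gt; is a byte in hex and any other character stands for its ASCII value, so &lt;code&gt;R&lt;/code&gt; is 0x52 and &lt;code&gt;v&lt;/code&gt; is 0x76. A small Rust illustration with the bytes copied from the diff:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// First four bytes of `(data (;0;) (i32.const 1048576) ...)` in each build,
/// copied from the diff: "\04\80\e9\1b" and "R\8a\11v".
const BUILD_1: [u8; 4] = [0x04, 0x80, 0xe9, 0x1b];
const BUILD_2: [u8; 4] = [0x52, 0x8a, 0x11, 0x76];
/// The next eight bytes, "\80\14\10\00\80\14\10\00", match in both builds.
const SHARED: [u8; 4] = [0x80, 0x14, 0x10, 0x00];

/// Wasm linear memory is little-endian.
fn as_u32(bytes: [u8; 4]) -&amp;gt; u32 {
    u32::from_le_bytes(bytes)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Build 1 starts with &lt;code&gt;0x1be98004&lt;/code&gt; and build 2 with &lt;code&gt;0x76118a52&lt;/code&gt;, while the rest of the segment holds the same value, &lt;code&gt;0x00101480&lt;/code&gt;, twice. So exactly one 32-bit word at address &lt;code&gt;1048576&lt;/code&gt; differs between the builds.&lt;/p&gt;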



&lt;p&gt;That is pretty weird, considering the context in which we are running this blob. Remember that the blob is stored on-chain, transparent to everyone. For the execution of each block, the wasm memory is reset and a new instance is created. In general, mutable globals are pointless for the wasm we are producing, because they would only live for the duration of the execution of a single block. If you want the runtime to persist state between blocks, you have to call into the external database-storage functions. Because of this, we generally don't have any mutable &lt;code&gt;lazy_statics&lt;/code&gt; or the like in our code.&lt;/p&gt;

&lt;p&gt;That doesn't mean that globals are pointless for our wasm: the compiler can use them for optimisation. For example, every time we create a new &lt;code&gt;Vec&lt;/code&gt;, it reads the default values from a global address. This is quicker and needs less storage than pasting the values everywhere.&lt;/p&gt;

&lt;p&gt;But what is causing this memory address to be allocated here? Why would it be mutable, and why with a different value for every build?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Digging deeper&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
So we had to track down where it is used. The compiler wouldn't mark it as mutable if it weren't changed at least once. Searching, we find that it is read 8 times, but once we also see it being written to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
(func $_ZN14pallet_session10historical20ProvingTrie$LT$T$GT$12generate_for17hb9e80633994986a6E (type 2) (param i32 i32)
    (local i32 i32 i64 i64 i32 i32 i32 i32 i32 i32 i64 i64 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32 i32)
    global.get 0
    i32.const 928
    i32.sub
    local.tee 2
    global.set 0
    block  ;; label = @1
      block  ;; label = @2
        block  ;; label = @3
          block  ;; label = @4
            block  ;; label = @5
              block  ;; label = @6
                block  ;; label = @7
                  block  ;; label = @8
                    block  ;; label = @9
                      i32.const 1
                      call $__rust_alloc
                      local.tee 3
                      i32.eqz
                      br_if 0 (;@9;)
                      local.get 3
                      i32.const 0
                      i32.store8
                      i32.const 0
                      i32.const 0
                      i64.load32_u offset=1048576
                      i64.const 6364136223846793005
                      i64.mul
                      local.get 2
                      i32.const 640
                      i32.add
                      i64.extend_i32_u
                      i64.add
                      i64.const 31
                      i64.rotl
                      local.tee 4
                      i64.store32 offset=1048576
                      local.get 2
                      i32.const 8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even without knowing too much WebAssembly ourselves, we have a few interesting hints in here: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the function name is &lt;code&gt;_ZN14pallet_session10historical20ProvingTrie$LT$T$GT$12generate_for17hb9e80633994986a6E&lt;/code&gt;, &lt;a href="https://github.com/paritytech/substrate/blob/802a0d0b0ade796a3b2d4663212518315923fe8a/frame/session/src/historical/mod.rs#L171-L209"&gt;which translates to &lt;code&gt;ProvingTrie::generate_for&lt;/code&gt; in the &lt;code&gt;session::historical&lt;/code&gt; module of our code base&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;the deep indentation and the many nested blocks and labels indicate that this sits within heavily nested control flow, such as a loop of loops.&lt;/li&gt;
&lt;/ol&gt;
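&lt;p&gt;The arithmetic in that snippet can also be read back without much WebAssembly knowledge. Transliterated into Rust (a sketch of what the instructions compute, not the original source), it loads the 32-bit value at address &lt;code&gt;1048576&lt;/code&gt;, multiplies it by 6364136223846793005 (the well-known 64-bit LCG multiplier from Knuth's MMIX), adds an address computed from &lt;code&gt;local 2&lt;/code&gt; plus 640, rotates the result left by 31 bits, and writes the low 32 bits back to the same address:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Multiplier from the wat above (Knuth's MMIX LCG constant).
const MULTIPLIER: u64 = 6364136223846793005;

/// Transliteration of the wat instructions, not the original Rust code.
/// `seed`: the u32 at address 1048576; `addr`: the value of `local 2` + 640.
/// Returns the 64-bit result (kept in `local 4`) and the u32 stored back.
fn update_seed(seed: u32, addr: u32) -&amp;gt; (u64, u32) {
    let h = u64::from(seed)            // i64.load32_u offset=1048576
        .wrapping_mul(MULTIPLIER)      // i64.mul (wraps, like wasm)
        .wrapping_add(u64::from(addr)) // i64.extend_i32_u + i64.add
        .rotate_left(31);              // i64.const 31 + i64.rotl
    (h, h as u32)                      // local.tee 4 + i64.store32 (low 32 bits)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the multiplier is odd, different starting values always produce different 64-bit results, so two builds that start from different bytes at that address diverge from the very first call. Writing the value back also shows that this is state the code mutates at runtime.&lt;/p&gt;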

&lt;p&gt;Looking at the code, we can identify three mutable objects, but they are only mutable locally; none of them is global. At least not as far as we can see, because in the Wasm, something clearly is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finding the source&lt;/strong&gt;&lt;br&gt;
Now that we have a precise point of focus, we can begin troubleshooting. Unfortunately, &lt;code&gt;git&lt;/code&gt; won't help us here. We could trace back changes to that code base, but as it was only activated in the specific change set we've already identified, there is little that will help identify the source of the error.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Trait&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ProvingTrie&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;generate_for&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validators&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;'static&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;IntoIterator&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;T&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ValidatorId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;T&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;FullIdentification&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;MemoryDB&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;trie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;TrieDBMut&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;full_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;validators&lt;/span&gt;&lt;span class="nf"&gt;.into_iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.enumerate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;SessionModule&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;load_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="nb"&gt;None&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;};&lt;/span&gt;

                &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;full_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;full_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

                &lt;span class="c1"&gt;// map each key to the owner index.&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key_id&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nn"&gt;T&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Keys&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;key_ids&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="nf"&gt;.get_raw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;key_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.using_encoded&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;
                        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="nf"&gt;.using_encoded&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;trie&lt;/span&gt;&lt;span class="nf"&gt;.insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                    &lt;span class="p"&gt;);&lt;/span&gt;

                    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="nf"&gt;.map_err&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="s"&gt;"failed to insert into trie"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;

                &lt;span class="c1"&gt;// map each owner index to the full identification.&lt;/span&gt;
                &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="nf"&gt;.using_encoded&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;full_id&lt;/span&gt;&lt;span class="nf"&gt;.using_encoded&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;trie&lt;/span&gt;&lt;span class="nf"&gt;.insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
                    &lt;span class="nf"&gt;.map_err&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="s"&gt;"failed to insert into trie"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ProvingTrie&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When looking at the code, there are three main objects involved, and a minimal understanding of the code base helps. Two of them, the &lt;code&gt;root&lt;/code&gt; and the &lt;code&gt;MemoryDB&lt;/code&gt;, are passed to the third, the trie. When &lt;code&gt;trie.insert&lt;/code&gt; is called (as in &lt;code&gt;let _ = i.using_encoded(|k| full_id.using_encoded(|v| trie.insert(k, v)))&lt;/code&gt;), which is the only place where we can actually see state being mutated, the trie likely calculates a new root and then passes the element through to &lt;code&gt;MemoryDB&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/paritytech/trie/blob/95fd3b5d73a147e357fbb49222e5500309c08d56/memory-db/src/lib.rs#L110-L119"&gt;So then, how is memory DB implemented&lt;/a&gt;?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MemoryDB&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="k"&gt;where&lt;/span&gt;
    &lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;KeyHasher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;KF&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;KeyFunction&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;KF&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hashed_null_node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nn"&gt;H&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;null_node_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;_kf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PhantomData&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;KF&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It uses a &lt;code&gt;HashMap&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;HashMap&lt;/code&gt;s are tricky in an environment like wasm. I will not go into the details here, but many HashMaps use simple, fast hash functions, and if an attacker can choose keys that collide, the performance of the HashMap degrades to the point of a DoS of the whole process. Any decent modern &lt;code&gt;HashMap&lt;/code&gt; therefore mixes some randomness into how it hashes keys; Rust's &lt;code&gt;std&lt;/code&gt; implementation does this, for example. However, no source of randomness exists within a &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt; environment, making generic (and in particular the &lt;code&gt;std&lt;/code&gt;) &lt;code&gt;HashMap&lt;/code&gt;s unsafe for user-controlled input.&lt;/p&gt;
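&lt;p&gt;Rust's &lt;code&gt;std&lt;/code&gt; illustrates both sides of this trade-off. The sketch below hashes a key the way a &lt;code&gt;HashMap&lt;/code&gt; would, once with &lt;code&gt;RandomState&lt;/code&gt; (the default, initialized with random keys) and once with &lt;code&gt;BuildHasherDefault&amp;lt;DefaultHasher&amp;gt;&lt;/code&gt;, which always starts from the same fixed state. The helper names are made up for this example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hash `key` with a fresh hasher from `builder`, as a HashMap does per lookup.
fn hash_with&amp;lt;S: BuildHasher, K: Hash + ?Sized&amp;gt;(builder: &amp;amp;S, key: &amp;amp;K) -&amp;gt; u64 {
    let mut hasher = builder.build_hasher();
    key.hash(&amp;amp;mut hasher);
    hasher.finish()
}

/// `RandomState::new()` is initialized with random keys: hashes are not
/// reproducible across instances or program runs, defeating precomputed collisions.
fn randomized_hash(key: &amp;amp;str) -&amp;gt; u64 {
    hash_with(&amp;amp;RandomState::new(), key)
}

/// A fixed starting state: reproducible, but anyone who knows the algorithm
/// can precompute colliding keys.
fn fixed_hash(key: &amp;amp;str) -&amp;gt; u64 {
    hash_with(&amp;amp;BuildHasherDefault::&amp;lt;DefaultHasher&amp;gt;::default(), key)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that the standard library documents &lt;code&gt;DefaultHasher&lt;/code&gt;'s algorithm as unspecified and subject to change between Rust releases, so even the "fixed" variant is only reproducible for a given toolchain.&lt;/p&gt;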

&lt;p&gt;Well, this isn't input that users could easily control and use to bomb our hashmap, so that part is fine. But the fact that modern implementations require a source of randomness is worth investigating. For a &lt;code&gt;no_std&lt;/code&gt; environment like our wasm runtime, &lt;code&gt;MemoryDB&lt;/code&gt; uses the &lt;a href="https://github.com/paritytech/trie/blob/95fd3b5d73a147e357fbb49222e5500309c08d56/memory-db/src/lib.rs#L39-L43"&gt;great implementation provided by the &lt;code&gt;hashbrown&lt;/code&gt; crate&lt;/a&gt;, at &lt;a href="https://github.com/paritytech/trie/blob/memory-db-v0.21.0/memory-db/Cargo.toml#L14"&gt;version 0.6.3 with default features disabled, as the &lt;code&gt;Cargo.toml&lt;/code&gt; reveals&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It is important to note that &lt;code&gt;default-features&lt;/code&gt; are disabled, because otherwise &lt;a href="https://github.com/rust-lang/hashbrown/blob/4e7acb5aed8ecdd93fc6f4dc9fcf6d9b8cede39d/Cargo.toml#L42"&gt;hashbrown would activate the &lt;code&gt;compile-time-rng&lt;/code&gt; feature in ahash&lt;/a&gt;, the hasher it uses internally. If ahash's default features are activated, the &lt;a href="https://github.com/tkaitchuck/aHash/blob/6cf0438e39cd429b78faa98c63784946abad0043/Cargo.toml#L29"&gt;hasher pulls in &lt;code&gt;const-random&lt;/code&gt;&lt;/a&gt; to generate a random seed for the hasher &lt;em&gt;at compile time&lt;/em&gt;, for at least &lt;em&gt;some&lt;/em&gt; randomness.&lt;/p&gt;

&lt;p&gt;This matches the problem we are debugging: a global value that is updated at runtime, but whose initial value is different on every compile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature leaking&lt;/strong&gt;&lt;br&gt;
With the default features deactivated, how could it still have snuck into our build?&lt;/p&gt;

&lt;p&gt;Well, looking just one line lower in the &lt;code&gt;Cargo.toml&lt;/code&gt; reveals the answer. To fix a different compatibility bug, &lt;code&gt;MemoryDB&lt;/code&gt; pins the &lt;code&gt;ahash&lt;/code&gt; crate to a non-broken version, and in doing so it activates ahash's default features. Cargo features are additive: if any crate in the build activates a feature of a dependency, every use of that dependency within the build gets the feature. And that is how it leaked into our build.&lt;/p&gt;
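&lt;p&gt;The effect of additive features can be modelled in a few lines. The following is a deliberately simplified Rust sketch of Cargo's feature unification, not Cargo's actual resolver: every request for the same crate is merged, so the build sees the union of all requested features. The default feature list for &lt;code&gt;ahash&lt;/code&gt; used in the example is an assumption, reduced to the one feature that matters here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::collections::{BTreeMap, BTreeSet};

/// Enabled features per crate.
type Features = BTreeMap&amp;lt;&amp;amp;'static str, BTreeSet&amp;lt;&amp;amp;'static str&amp;gt;&amp;gt;;

/// One dependency declaration, e.g. `ahash = { default-features = false }`.
struct DepRequest {
    krate: &amp;amp;'static str,
    default_features: bool,
    features: &amp;amp;'static [&amp;amp;'static str],
}

/// Merge all requests: features only ever get added, never removed.
/// `defaults` maps a crate to the features in its `default` set.
fn unify(
    requests: &amp;amp;[DepRequest],
    defaults: &amp;amp;BTreeMap&amp;lt;&amp;amp;'static str, Vec&amp;lt;&amp;amp;'static str&amp;gt;&amp;gt;,
) -&amp;gt; Features {
    let mut enabled = Features::new();
    for req in requests {
        let set = enabled.entry(req.krate).or_default();
        set.extend(req.features.iter().copied());
        if req.default_features {
            if let Some(list) = defaults.get(req.krate) {
                set.extend(list.iter().copied());
            }
        }
    }
    enabled
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With only a request like hashbrown's (defaults off), &lt;code&gt;compile-time-rng&lt;/code&gt; stays off; add a second request like the version pin (defaults on) and it is switched on for the whole build, including hashbrown's use of ahash. Disabling default features in one place achieves nothing if any other dependent still enables them.&lt;/p&gt;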

&lt;p&gt;Boom.&lt;/p&gt;
&lt;h2&gt;
  
  
  4. Fixing it
&lt;/h2&gt;

&lt;p&gt;As you might have noticed, I was linking to specific older commits of these crates: partly to keep the links correct in the long term, partly because those are the specific versions in use, but also because the problem is already partially addressed in newer versions. &lt;br&gt;
The latest &lt;code&gt;ahash&lt;/code&gt; no longer includes the feature in its defaults; it has to be activated explicitly. And by &lt;a href="https://github.com/paritytech/trie/commit/117f9efcd7dcbf36137977f53760721b8986b433"&gt;just removing the dependency pin in MemoryDB&lt;/a&gt; (which isn't needed anymore) and releasing a new version, the problem is gone here, too.&lt;/p&gt;

&lt;p&gt;The test from step 0 is there to prove it. That very same test &lt;a href="https://github.com/paritytech/substrate/pull/6597"&gt;has now been added as a proper CI check&lt;/a&gt; that every PR must pass, so we notice early if we ever break it again.&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Down the label-rabbit-hole
&lt;/h2&gt;

&lt;p&gt;But it doesn't pass. The PR fixes the issue, yet the CI check complains. Running the script directly and diffing the &lt;code&gt;.wat&lt;/code&gt; files tells us why:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;--- node_runtime_1_1.wat    2020-07-06 09:43:09.122043953 +0200
&lt;/span&gt;&lt;span class="gi"&gt;+++ node_runtime_1_2.wat    2020-07-06 09:43:14.178717225 +0200
&lt;/span&gt;&lt;span class="p"&gt;@@ -791052,10 +791052,10 @@&lt;/span&gt;
                                             i32.add
                                             call $_ZN86_$LT$sp_trie..node_header..NodeHeader$u20$as$u20$parity_scale_codec..codec..Encode$GT$9encode_to17h7b42619bd7dc163dE
                                             local.get 15
&lt;span class="gd"&gt;-                                            br_if 11 (;@9;)
&lt;/span&gt;&lt;span class="gi"&gt;+                                            br_if 12 (;@8;)
&lt;/span&gt;                                             i32.const 0
                                             local.set 1
&lt;span class="gd"&gt;-                                            br 14 (;@6;)
&lt;/span&gt;&lt;span class="gi"&gt;+                                            br 13 (;@7;)
&lt;/span&gt;                                           end
                                           local.get 0
                                           i32.const 1
&lt;span class="p"&gt;@@ -791092,10 +791092,10 @@&lt;/span&gt;
                                           i32.add
                                           call $_ZN86_$LT$sp_trie..node_header..NodeHeader$u20$as$u20$parity_scale_codec..codec..Encode$GT$9encode_to17h7b42619bd7dc163dE
                                           local.get 15
&lt;span class="gd"&gt;-                                          br_if 11 (;@8;)
&lt;/span&gt;&lt;span class="gi"&gt;+                                          br_if 10 (;@9;)
&lt;/span&gt;                                           i32.const 0
                                           local.set 1
&lt;span class="gd"&gt;-                                          br 12 (;@7;)
&lt;/span&gt;&lt;span class="gi"&gt;+                                          br 13 (;@6;)
&lt;/span&gt;                                         end
                                         i32.const 1
                                         local.set 17
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although our bug is fixed, the label differences are &lt;em&gt;not&lt;/em&gt;; they persist. Contrary to what we assumed, they were not caused by the bug we fixed; they are a bug of their own.&lt;/p&gt;

&lt;p&gt;Darn.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;br&lt;/code&gt; and &lt;code&gt;br_if&lt;/code&gt; are branch instructions: &lt;code&gt;br n&lt;/code&gt; jumps to the &lt;code&gt;n&lt;/code&gt;-th enclosing label, counted outward from the innermost one, which for a &lt;code&gt;block&lt;/code&gt; means continuing right after its &lt;code&gt;end&lt;/code&gt; (and for a &lt;code&gt;loop&lt;/code&gt;, back at its start); &lt;code&gt;br_if&lt;/code&gt; does the same, but only if the value on top of the stack is non-zero. Considering that there are no other diffs, the control flow itself doesn't seem to differ; the changes appear to stem only from how the compiler orders these labels. Most likely this happens in the compiler's final processing steps, specifically by storing items in a non-deterministic collection internally (e.g. a hashmap).&lt;/p&gt;
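&lt;p&gt;The &lt;code&gt;(;@N;)&lt;/code&gt; comments that &lt;code&gt;wasm2wat&lt;/code&gt; prints make the diff easier to decode. Judging by the snippet further up (&lt;code&gt;br_if 0 (;@9;)&lt;/code&gt; directly inside &lt;code&gt;block ;; label = @9&lt;/code&gt;), blocks are numbered from the outside in, starting at &lt;code&gt;@1&lt;/code&gt;, while the branch operand counts outward from the innermost block. So at nesting depth &lt;em&gt;d&lt;/em&gt;, &lt;code&gt;br n&lt;/code&gt; targets &lt;code&gt;@(d - n)&lt;/code&gt;. A small Rust helper under that assumption:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Absolute label (wasm2wat's `@N`) targeted by `br relative` at nesting
/// `depth`, assuming the outermost block is `@1`. `None` if out of range.
fn target_label(depth: u32, relative: u32) -&amp;gt; Option&amp;lt;u32&amp;gt; {
    depth.checked_sub(relative)
}

/// Inverse: the nesting depth of a branch annotated `br relative (;@label;)`.
fn depth_of(relative: u32, label: u32) -&amp;gt; u32 {
    relative + label
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Applied to the diff, every changed branch sits at the same depth in both builds: 11 + 9, 12 + 8, 14 + 6 and 13 + 7 are all 20 in the first hunk, while 11 + 8, 10 + 9, 12 + 7 and 13 + 6 are all 19 in the second. What differs is only which of two adjacent enclosing blocks each branch names, and the two hunks change in opposite directions.&lt;/p&gt;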

&lt;p&gt;The most common place where this happens is in the optimization steps. If we build the &lt;code&gt;wasm/wat&lt;/code&gt; in debug mode (without the &lt;code&gt;--release&lt;/code&gt; flag), the diff doesn't show any changes, which strongly indicates that the problem is introduced somewhere in the optimization phase.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. How the sausage is made
&lt;/h3&gt;

&lt;p&gt;Rust, and the Rust compiler in particular, does not implement a whole compiler pipeline from scratch; it is built on top of a compiler toolchain called LLVM. This allows Rust to focus on implementing its own syntax and features. Most of the heavy lifting, such as optimization and machine code generation, is left to the impressive suite of LLVM compiler tooling.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;rustc -vV&lt;/code&gt; even tells you as much:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ rustc +nightly -vV
rustc 1.46.0-nightly (8ac1525e0 2020-07-07)
binary: rustc
commit-hash: 8ac1525e091d3db28e67adcbbd6db1e1deaa37fb
commit-date: 2020-07-07
host: x86_64-unknown-linux-gnu
release: 1.46.0-nightly
LLVM version: 10.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
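&lt;p&gt;When switching between nightlies, it helps to check which LLVM a toolchain was built against. A tiny helper can pull that line out of the &lt;code&gt;rustc -vV&lt;/code&gt; output. The name &lt;code&gt;llvm_version&lt;/code&gt; is our own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
# Print the LLVM version a rustc toolchain was built against.
# Reads the output of "rustc -vV" on stdin, e.g.: rustc +nightly -vV | llvm_version
llvm_version() {
  sed -n 's/^LLVM version: //p'
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the toolchain above, &lt;code&gt;rustc +nightly -vV | llvm_version&lt;/code&gt; prints &lt;code&gt;10.0&lt;/code&gt;.&lt;/p&gt;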



&lt;p&gt;LLVM is a modern and very flexible compiler framework with a huge range of supported platforms and compile targets. It achieves this by having each language compile into an abstract intermediate representation called &lt;code&gt;llvm-ir&lt;/code&gt;. That IR is then fed to LLVM and compiled to machine code for the requested target. Because LLVM ships a WebAssembly backend, Rust was one of the first languages to support compiling to wasm directly.&lt;/p&gt;
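&lt;p&gt;If you want to see that intermediate representation for your own crate, rustc can write it out alongside the normal output via &lt;code&gt;--emit&lt;/code&gt;. Below is a minimal sketch, wrapped in a helper function whose name is our own. The &lt;code&gt;.ll&lt;/code&gt; files should end up in the &lt;code&gt;deps/&lt;/code&gt; folder of the target directory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
# Sketch (hypothetical helper): have rustc write the LLVM IR (.ll files) in
# addition to the normal build output. $1 is the cargo package name.
emit_llvm_ir() {
  cargo rustc -p "$1" --release --target=wasm32-unknown-unknown -- --emit=llvm-ir,link
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Listing &lt;code&gt;link&lt;/code&gt; next to &lt;code&gt;llvm-ir&lt;/code&gt; keeps the regular artifact, so cargo still finds the output it expects.&lt;/p&gt;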

&lt;p&gt;In addition, most release-level optimizations don't actually happen within the Rust compiler but in the LLVM framework. Though LLVM is mostly hidden from sight, for cases such as ours rustc lets us pass arguments through to it to get more data to inspect. By swapping &lt;code&gt;cargo build&lt;/code&gt; for &lt;code&gt;cargo rustc&lt;/code&gt;, we can append flags for the compiler after &lt;code&gt;--&lt;/code&gt;. In our case we wanted to see the state after each optimization pass. So we add &lt;code&gt;-- -C llvm-args=-print-after-all&lt;/code&gt; and redirect the entire output into a file.&lt;/p&gt;

&lt;p&gt;We quickly noticed that the output is too big to work with, and that running LLVM in parallel on multiple threads produces unparsable output. Adding &lt;code&gt;-Z no-parallel-llvm&lt;/code&gt; fixes the latter, but the output is still unwieldy. Fortunately, the random relabeling happens in the same function every time. Isolating that exact function therefore also gives us a very small test case for the compiler bug.&lt;/p&gt;
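&lt;p&gt;Besides isolating the function by hand, LLVM can narrow the dumps down itself. Its &lt;code&gt;-filter-print-funcs&lt;/code&gt; option restricts the &lt;code&gt;-print-after-all&lt;/code&gt; output to the named functions. The sketch below combines it with the flags above. The helper name is hypothetical, and we have not verified the filter against the exact toolchain used here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
# Sketch (hypothetical helper): dump the IR after every LLVM pass, restricted
# to one function. -filter-print-funcs limits -print-after-all to the given
# (mangled) function name; unverified on the toolchain used in this post.
# $1 = package, $2 = mangled symbol, $3 = log file (the dumps go to stderr).
dump_passes_for() {
  cargo rustc -p "$1" --release --target=wasm32-unknown-unknown -- \
    -C llvm-args=-print-after-all \
    -C "llvm-args=-filter-print-funcs=$2" \
    -Z no-parallel-llvm 2> "$3"
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;-Z&lt;/code&gt; flag requires a nightly toolchain.&lt;/p&gt;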

&lt;h3&gt;
  
  
  7. Digging into the llvm output
&lt;/h3&gt;

&lt;p&gt;Looking at the &lt;code&gt;.wat&lt;/code&gt; again and tracing back from the differing output, we find that it is located in&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(func $_ZN7trie_db9triedbmut18TrieDBMut$LT$L$GT$12commit_child17h1a851bf4aa72b1bdE (type 4) (param i32 i32 i32 i32)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



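&lt;p&gt;To create a reproducible test, we need a symbol that tells us which concrete types the function is actually instantiated with. The legacy symbol above only shows the generic parameter &lt;code&gt;L&lt;/code&gt; plus a hash. Here, a compiler option helps: passing &lt;code&gt;-Zsymbol-mangling-version=v0&lt;/code&gt; switches rustc to the newer v0 mangling scheme, which encodes the generic arguments. As a result, we get a much more descriptive symbol in our &lt;code&gt;.wat&lt;/code&gt;:&lt;br&gt;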
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(func $_RINvNtCs2KrsVm9iLGa_4core3ptr13drop_in_placeINtNtCsfzX9CsLwTkO_7trie_db9triedbmut9TrieDBMutINtCs7x3GksGMAjA_7sp_trie6LayoutNtNtCs42nTExHBz75_10sp_runtime6traits11BlakeTwo256EEECsdbcsBhSPY2w_12node_runtime (type 3) (param i32)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which we can then pass to &lt;code&gt;rustfilt&lt;/code&gt; (&lt;code&gt;cargo install rustfilt&lt;/code&gt;) to make it human readable again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ rustfilt _RINvNtCs2KrsVm9iLGa_4core3ptr13drop_in_placeINtNtCsfzX9CsLwTkO_7trie_db9triedbmut9TrieDBMutINtCs7x3GksGMAjA_7sp_trie6LayoutNtNtCs42nTExHBz75_10sp_runtime6traits11BlakeTwo256EEECsdbcsBhSPY2w_12node_runtime
core::ptr::drop_in_place::&amp;lt;trie_db::triedbmut::TrieDBMut&amp;lt;sp_trie::Layout&amp;lt;sp_runtime::traits::BlakeTwo256&amp;gt;&amp;gt;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
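&lt;p&gt;To find such candidates without scrolling through the whole file, one can list every function symbol in the &lt;code&gt;.wat&lt;/code&gt; and demangle them in one go. Below is a minimal sketch, assuming one &lt;code&gt;(func ...)&lt;/code&gt; header per line as shown above. &lt;code&gt;wat_functions&lt;/code&gt; is a made-up name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
# Sketch (hypothetical helper): print the symbol of every function defined in
# a .wat file, one per line, e.g. to pipe it into rustfilt.
wat_functions() {
  # Matches lines like:  (func $SYMBOL (type 4) (param i32 ...
  sed -n 's/^ *(func \$\([^ ]*\).*/\1/p' "$1"
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;wat_functions runtime.wat | rustfilt&lt;/code&gt; prints one demangled name per line. Here &lt;code&gt;runtime.wat&lt;/code&gt; stands in for your file.&lt;/p&gt;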



&lt;p&gt;Alrighty! This is specific enough to create a minimal build that we can analyze. We replace our crate's &lt;code&gt;lib.rs&lt;/code&gt; with a static reference to exactly this function and add the relevant dependencies to the &lt;code&gt;Cargo.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#![cfg_attr(not(feature&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"std"&lt;/span&gt;&lt;span class="nd"&gt;),&lt;/span&gt; &lt;span class="nd"&gt;no_std)]&lt;/span&gt;

&lt;span class="nd"&gt;#[no_mangle]&lt;/span&gt; &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;FOO&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nn"&gt;trie_db&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;triedbmut&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TrieDBMut&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;'static&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="nn"&gt;sp_trie&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Layout&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;sp_runtime&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;traits&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BlakeTwo256&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;core&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;ptr&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;drop_in_place&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



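&lt;p&gt;We now have a tiny crate that triggers the bug in our code path. Because the static is exported via &lt;code&gt;#[no_mangle]&lt;/code&gt;, the compiler has to instantiate and keep exactly this &lt;code&gt;drop_in_place&lt;/code&gt; monomorphization, and hardly anything else ends up in the build.&lt;/p&gt;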

&lt;p&gt;Now, to analyze the LLVM output, we compile this with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo rustc &lt;span class="nt"&gt;-p&lt;/span&gt; tiny-package &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;--target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;wasm32-unknown-unknown &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;-C&lt;/span&gt; llvm-args&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nt"&gt;-print-after-all&lt;/span&gt; &lt;span class="nt"&gt;-Z&lt;/span&gt; no-parallel-llvm 2&amp;gt; llvm-log-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a clean example, we can now compare the LLVM output of two runs again. The first difference is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-  successors: %bb.55(0x80000000); %bb.55(100.00%)
+  successors: %bb.58(0x80000000); %bb.58(100.00%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As assumed, the internal order of processing changes between runs. The difference first appears in the dump after the step &lt;code&gt;# *** IR Dump After WebAssembly Fix Irreducible Control Flow ***:&lt;/code&gt;, whose source &lt;a href="https://github.com/llvm/llvm-project/blob/a6d8a055e92eb4853805d1ad1be0b1a6523524ef/llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp#L236"&gt;we find in the LLVM code here&lt;/a&gt;. Generally speaking, this has to be a compiler bug. Given identical input, a compiler pass should produce identical output. This step restructures exactly the kind of control flow whose labels we see changing, and evidently it does not.&lt;/p&gt;
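&lt;p&gt;Pinpointing that step by eye in two large logs is tedious. The sketch below automates it. It finds the first line where two &lt;code&gt;-print-after-all&lt;/code&gt; logs differ and prints the most recent &lt;code&gt;IR Dump After&lt;/code&gt; header before it. The function name is our own, and it assumes both logs come from otherwise identical builds.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
# Sketch (hypothetical helper): print the "IR Dump After" header of the pass
# dump in which two -print-after-all logs first differ.
first_diverging_pass() {
  if cmp -s "$1" "$2"; then
    echo "logs are identical"
    return 0
  fi
  # The first line of diff output looks like "45c45", "45a46" or "44,45d43";
  # its leading number is a line position in the first log.
  line=$(diff "$1" "$2" | head -n 1 | sed 's/[^0-9].*//')
  # Remember the latest dump header and print it once we reach that line.
  awk -v n="$line" '/IR Dump After/ { pass = $0 } NR == n { print pass; exit }' "$1"
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run on the logs of two builds, e.g. &lt;code&gt;llvm-log-1&lt;/code&gt; and a second run's log, it should print the WebAssembly Fix Irreducible Control Flow header shown above.&lt;/p&gt;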

&lt;p&gt;Once you've found a bug in an external repo, the first thing to do is to check whether the issue has already been reported or patched. And indeed, we find a &lt;a href="https://github.com/llvm/llvm-project/commit/3648370a79235ddc7a26c2db5b968725c320f6aa"&gt;commit to master&lt;/a&gt; addressing it, which has not been released yet.&lt;/p&gt;

&lt;p&gt;As expected, LLVM also stores some of its internal state in collections whose iteration order is not stable between runs (e.g. hash maps). To make the order in which items are processed and yielded deterministic, they are sorted before processing. Before this patch, this wasn't reliably done under certain conditions, namely when "fixing up a set of mutual loop entries". It appears that our code produced exactly the kind of &lt;code&gt;llvm-ir&lt;/code&gt; that triggers this bug.&lt;/p&gt;

&lt;p&gt;After applying the patch to a local copy of LLVM inside &lt;code&gt;rustc&lt;/code&gt; and doing multiple runs, we can confirm: &lt;strong&gt;this is the bug and this commit fixes it&lt;/strong&gt;. Although the bug was probably present in older versions of the compiler, only our new code base actually triggered the code path leading to it.&lt;/p&gt;
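&lt;p&gt;Such "multiple runs" can be scripted as well: build repeatedly and count how many distinct binaries come out. The helper below is a minimal sketch with a hypothetical name. It assumes that &lt;code&gt;PKG&lt;/code&gt; holds the package name, that &lt;code&gt;WASM&lt;/code&gt; holds the path of the built file, and that GNU coreutils' &lt;code&gt;sha256sum&lt;/code&gt; is available. A count of 1 means every run produced an identical &lt;code&gt;.wasm&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
```shell
# Sketch (hypothetical helper): build $1 times from scratch and print how many
# distinct .wasm files came out. Assumes PKG (package name) and WASM (path of
# the built file) are set and that sha256sum (GNU coreutils) is available.
distinct_builds() {
  runs="$1"
  hashes=""
  i=0
  while [ "$i" -lt "$runs" ]; do
    cargo clean -p "$PKG" || return   # force a full rebuild every time
    cargo build -p "$PKG" --release --target=wasm32-unknown-unknown || return
    hashes="$hashes $(sha256sum "$WASM" | cut -d ' ' -f 1)"
    i=$((i + 1))
  done
  # $hashes is unquoted on purpose: word splitting yields one hash per line.
  printf '%s\n' $hashes | sort -u | wc -l
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Hashing rather than diffing keeps the check cheap even for many runs. Only when the count is above 1 is it worth diffing two of the binaries.&lt;/p&gt;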

&lt;h3&gt;
  
  
  8. Backporting the compiler fix
&lt;/h3&gt;

&lt;p&gt;Though the fix was merged into LLVM back in February, it wasn't part of LLVM 10 nor of any of the &lt;code&gt;10.0.1&lt;/code&gt; release candidates; it will probably be part of the upcoming LLVM 11 release. Unfortunately, Rust is rather slow in picking up and porting to the latest version of LLVM. However, the Rust team is very open to backports into its LLVM fork, and these migrate rather quickly into the nightly version of Rust, which is what we are most interested in. As a result, we can expect a fixed wasm32 nightly compiler pretty soon.&lt;/p&gt;

&lt;p&gt;As of the time of writing the &lt;a href="https://github.com/rust-lang/llvm-project/pull/68"&gt;backport patch has been accepted and merged&lt;/a&gt; and we are just waiting for &lt;a href="https://github.com/rust-lang/rust/blob/e59b08e62ea691916d2f063cac5aab4634128022/.gitmodules#L37"&gt;rustc to update its submodule&lt;/a&gt;. 🤞&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Credits&lt;/em&gt;: Gratitude to &lt;a href="https://github.com/eddyb"&gt;eddyb&lt;/a&gt; for their tremendous help, especially on the compiler side of things: knowing of and fiddling with the LLVM args and pinning down the bug in LLVM ❤️! Also thanks to the &lt;a href="https://unsplash.com/collections/9338870/antarctic-expeditions-"&gt;Museums of Victoria for sharing their Arctic Expedition Photos for free for reuse on unsplash&lt;/a&gt;, which I used as the header picture.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>webassembly</category>
      <category>blockchain</category>
      <category>debugging</category>
    </item>
  </channel>
</rss>
