DEV Community: knut

Moving Math

knut — Sun, 10 May 2026 07:48:08 +0000

I made a command-line program in Rust to quiz my kid on basic math, then added animations to it. Adding animations was a pretty interesting task, so I wrote about it on my other blog.

Decoding Arbitrary Data in Wireshark

knut — Thu, 16 Apr 2026 07:16:21 +0000

I wrote a post on my main blog about how to import data from a text file into Wireshark and decode it using one of its built-in parsers. This can be useful when you get the data from, say, a log file produced by your application rather than from a regular capture file.

Rust, WASM, and LOK

knut — Thu, 02 May 2024 04:42:29 +0000

(Cross-posted from my main blog)

Have you played LOK? It's a puzzle game in the form of a grid of squares with letters in them. You pick cells from the board in certain orders to activate "keywords" and then execute the effect of the keyword. The objective of each puzzle is to black out all the squares on the grid.

I purchased the physical book a while back and finished it recently. I had a lot of fun, especially with some of the later levels that have pretty cool reveals. There's also a video game version of LOK in development, which got me thinking about how someone would make that, given the types of puzzles in the book. I love developing in Rust and had been thinking about experimenting with WASM to see its capabilities and how Rust works with it, so I decided to combine the two into a side project I uncreatively called LOK-Wasm.

I'll give a little overview of the architecture, but I think the more interesting parts are the game design aspects and how to set up the interface for the player.

These are of course spoilers for the game's mechanics, but if you want to try it out, with the author's permission, I have it hosted at https://knutaf.com/lok_wasm. And if you want to see the source code, I have it hosted on my GitHub.

Rust + WASM

First of all, a quick rundown of what WASM is. It stands for WebAssembly. In essence, similar to how Java compiles down to a bytecode that is run by a Java Virtual Machine, WebAssembly is a different bytecode format that runs in the browser. Many different languages can compile into WASM, and Javascript can interface with it like a module. In my case, I wrote a lot of the source code in Rust and compiled it down to a WASM module, then called into it from Javascript.

Could I have written the whole thing in Javascript? Sure, but what would be the fun in that?

As with a lot of the Rust ecosystem, there's a Rust/WASM tutorial and book, and it's very good. I was able to follow the tutorial to get up and running with development pretty easily... though this entire ecosystem kind of has way too many moving parts. Webpack? NPM? Node.js? Just for this? It's a bit heavyweight, but I guess like 90% of the development in the world uses stuff like this now, huh?

Using that tutorial, I was also able to run npm run build to produce the distribution package of static files I then uploaded to my web site for sharing. I wrote this paragraph in here as much for me as for you, dear readers, because I had to go looking on the Internet to figure out this magic incantation. Probably someone well-versed in NPM packages would have known this right away, but alas, I'm not that guy.

I was able to easily upload the few files it generated to my web hosting provider. I did have to set up the MIME type for the WASM file there, by adding the following to my .htaccess file:

AddType application/wasm .wasm

The Rust/WASM interface

Having gone through the tutorial, I needed to decide what the architecture of my game was going to be. My main guiding principle was that I wanted most of the code to be in Rust, especially the logic that implemented the rule checking. I decided I was OK having some static HTML and CSS for display, and Javascript to "render" objects exposed by the WASM module into the right HTML.

The LOK puzzles are all grid-based, so I grabbed an old Grid object I had previously written in Rust. Following the Rust tutorial, I exposed the contents of the grid as an array of u8 so that the Javascript side could map the region of memory for fast, direct access. I like this sort of thing in general, so I wanted to see if I could make it work.
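The mapped-memory idea boils down to keeping the whole grid in one contiguous buffer and handing Javascript a pointer and a length. Here's a minimal std-only sketch of that shape; the Grid type, its field names, and the one-byte-per-cell encoding are assumptions for illustration, and in the real WASM build the impl block would also carry #[wasm_bindgen].

```rust
/// A hypothetical row-major grid with one byte per cell.
pub struct Grid {
    width: usize,
    height: usize,
    cells: Vec<u8>,
}

impl Grid {
    pub fn new(width: usize, height: usize, fill: u8) -> Grid {
        Grid { width, height, cells: vec![fill; width * height] }
    }

    pub fn width(&self) -> usize {
        self.width
    }

    pub fn height(&self) -> usize {
        self.height
    }

    /// Row-major index of (row, col).
    fn index(&self, row: usize, col: usize) -> usize {
        assert!(row < self.height && col < self.width);
        row * self.width + col
    }

    pub fn set(&mut self, row: usize, col: usize, value: u8) {
        let i = self.index(row, col);
        self.cells[i] = value;
    }

    /// Pointer to the first cell. Across the WASM boundary this is an offset into
    /// the module's linear memory that JS can view directly.
    pub fn cells_ptr(&self) -> *const u8 {
        self.cells.as_ptr()
    }

    /// Number of cells, i.e. the length of the JS-side view.
    pub fn len(&self) -> usize {
        self.cells.len()
    }
}
```

On the Javascript side, the tutorial's pattern is roughly new Uint8Array(memory.buffer, grid.cells_ptr(), grid.len()). One caveat with this design: the view goes stale if the Vec ever reallocates, which is one more reason a fixed-size u8 layout gets awkward once each cell needs richer state.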

Well, I was able to keep that for a while, but as I implemented more of the LOK rules, I ended up having too much state I needed to store on each BoardCell object, so I abandoned the mapped u8 array idea and just exposed a Board::get accessor to fetch a cell by row/col. I'm sure this incurs marshaling overhead, but my n is so small that there's no way it matters--LOK puzzles just aren't that big. Maybe 10x10 at most.

So I had just two classes exposed: Board and BoardCell. Board had getters like the height and width and of course fetching each cell. It also had actions, the verbs allowable by LOK puzzles. BoardCell only exposed getters necessary for rendering the puzzle in HTML: things like whether a cell should be interactable, what letter to draw on it, and how it should be styled.
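Here's a rough std-only sketch of that accessor-style interface. The BoardCell fields and getter names (letter, is_blackened, css_class) and the from_rows constructor are hypothetical, since the post doesn't list them; in the WASM build both types would be annotated with #[wasm_bindgen], and each get call hands a copy of the cell across the boundary, which is where the marshaling cost comes from.

```rust
/// Hypothetical per-cell state; the real BoardCell tracks more than this.
#[derive(Clone, Debug, PartialEq)]
pub struct BoardCell {
    letter: char,
    blackened: bool,
}

impl BoardCell {
    /// Letter to draw in the cell.
    pub fn letter(&self) -> char {
        self.letter
    }

    /// Whether the cell is currently blackened.
    pub fn is_blackened(&self) -> bool {
        self.blackened
    }

    /// CSS class names for rendering (names are illustrative).
    pub fn css_class(&self) -> String {
        if self.blackened { "cell black".to_string() } else { "cell".to_string() }
    }
}

pub struct Board {
    width: usize,
    height: usize,
    cells: Vec<BoardCell>,
}

impl Board {
    /// Builds a board from rows of text, assuming every row has the same length.
    pub fn from_rows(rows: &[&str]) -> Board {
        let height = rows.len();
        let width = rows.first().map_or(0, |r| r.chars().count());
        let cells = rows
            .iter()
            .flat_map(|r| r.chars())
            .map(|letter| BoardCell { letter, blackened: false })
            .collect();
        Board { width, height, cells }
    }

    pub fn width(&self) -> usize {
        self.width
    }

    pub fn height(&self) -> usize {
        self.height
    }

    /// Returns a copy of the cell at (row, col); the JS side calls this once per cell.
    pub fn get(&self, row: usize, col: usize) -> BoardCell {
        assert!(row < self.height && col < self.width);
        self.cells[row * self.width + col].clone()
    }
}
```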

The Javascript code iterates over all the cells in the Board, fetches each one, and generates a table with tr and td in it, applying different CSS classes to the cells depending on the current state of each one.

One important part of the UI is the ability for the player to enter a puzzle. I just used a textarea and let the player type into it in a monospace font. If you want a 2x3 puzzle, just enter two rows of characters with 3 characters in each row. An underscore indicates a blank cell and a dash is a gap in the board. That's all you need to represent LOK puzzles. I thought about having a grid-based editor, but when I thought about wrangling all the event listeners and CSS styles, I just felt tired, so I didn't.
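Parsing that text format takes only a few lines of Rust. This is a sketch, not the project's Board::new: the Cell enum is hypothetical, rows are assumed to be newline-separated and all the same width, and since the real error type isn't shown, errors here are plain Strings.

```rust
/// Hypothetical cell kinds for the textarea format.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Cell {
    Letter(char),
    Blank, // written as '_'
    Gap,   // written as '-'
}

/// Parses a puzzle like "LOK\n_-_" into rows of cells.
pub fn parse_puzzle(text: &str) -> Result<Vec<Vec<Cell>>, String> {
    let rows: Vec<Vec<Cell>> = text
        .lines() // also strips a trailing '\r' from each line
        .map(str::trim) // tolerate stray leading/trailing spaces
        .filter(|line| !line.is_empty())
        .map(|line| {
            line.chars()
                .map(|c| match c {
                    '_' => Cell::Blank,
                    '-' => Cell::Gap,
                    other => Cell::Letter(other),
                })
                .collect()
        })
        .collect();

    let width = match rows.first() {
        Some(row) => row.len(),
        None => return Err("puzzle is empty".to_string()),
    };
    // Every row must match the first row's width.
    if let Some(i) = rows.iter().position(|row| row.len() != width) {
        return Err(format!("row {} has {} cells, expected {}", i, rows[i].len(), width));
    }
    Ok(rows)
}
```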

The other verbs I implemented in the UI were the different actions you can take in the LOK puzzle, an undo button, and of course one to submit for checking if you have a proper solution entered.

Undo

The undo functionality was implemented in an inefficient but simple way. Every time the player makes a move in the puzzle, I copy the entire grid plus the move they made, modify the new copy of the board, and push them onto a stack. Undo simply pops the top of the stack, so the previous board state is now the thing to render. That also pops the latest move, keeping the board state consistent with the list of moves that got it there.

/// Marks the specified cell as blackened and tracks this move in the solution.
pub fn blacken(&mut self, row: usize, col: usize) {
    assert!(row < self.grid.height());
    assert!(col < self.grid.width());

    // Make a copy of the entire board and store that with the move, for easy undo.
    let target_rc = RC(row, col);
    let mut new_grid = self.get_latest().clone();
    new_grid[&target_rc].blacken();

    self.moves.push(BoardStep {
        mv: Move::Blacken(target_rc.clone()),
        grid: new_grid,
    });
}

/// Removes the latest move from the solution.
pub fn undo(&mut self) {
    let _ = self.moves.pop();
}
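To see the snapshot approach end to end, here's a trimmed, self-contained version. The grid is reduced to a Vec<bool> of blackened flags, and get_latest falling back to the starting grid when there are no moves is my assumption, since its body isn't shown above.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Move {
    Blacken(usize, usize),
}

/// One solution step: the move plus a full copy of the grid after applying it.
struct BoardStep {
    mv: Move,
    grid: Vec<bool>, // true = blackened; row-major
}

pub struct Board {
    width: usize,
    height: usize,
    initial: Vec<bool>,
    moves: Vec<BoardStep>,
}

impl Board {
    pub fn new(width: usize, height: usize) -> Board {
        Board { width, height, initial: vec![false; width * height], moves: Vec::new() }
    }

    fn index(&self, row: usize, col: usize) -> usize {
        assert!(row < self.height && col < self.width);
        row * self.width + col
    }

    /// Grid after the latest move, or the starting grid if no moves were made.
    fn get_latest(&self) -> &Vec<bool> {
        self.moves.last().map_or(&self.initial, |step| &step.grid)
    }

    pub fn blacken(&mut self, row: usize, col: usize) {
        let i = self.index(row, col);
        let mut new_grid = self.get_latest().clone();
        new_grid[i] = true;
        self.moves.push(BoardStep { mv: Move::Blacken(row, col), grid: new_grid });
    }

    /// Pops the move and its snapshot together; a no-op when there are no moves.
    pub fn undo(&mut self) {
        let _ = self.moves.pop();
    }

    pub fn is_blackened(&self, row: usize, col: usize) -> bool {
        self.get_latest()[self.index(row, col)]
    }

    pub fn moves(&self) -> Vec<Move> {
        self.moves.iter().map(|step| step.mv).collect()
    }
}
```

The memory cost is one full grid per move, which is fine at LOK sizes (maybe 10x10 cells times a few dozen moves).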

Checking the solution

I did most of the game logic for checking the solution with test-driven development. It was easy to do it this way because of the simple nature of the code with no external dependencies. My tests set up a puzzle and then enter in the moves to solve it. The checker returns a result that indicates if it was successful, or if not, specifically why it failed and at which move number in the attempted solution. This let me author tests for every specific failure type.

#[test]
fn lok1x4_correct() {
    let mut board = Board::new("LOK_").unwrap();
    board.blacken(0, 0);
    board.blacken(0, 1);
    board.blacken(0, 2);
    board.blacken(0, 3);
    assert_eq!(board.check_solution(), SR::Correct);
}

#[test]
fn lok1x5_unsolvable_out_of_order() {
    let mut board = Board::new("LKO_").unwrap();
    board.blacken(0, 0);
    board.blacken(0, 2);
    board.blacken(0, 1);
    board.blacken(0, 3);
    assert_eq!(
        board.check_solution(),
        SR::ErrorOnMove(1, ME::BlackenNotConnectedForKeyword)
    );
}

In whatever projects I work on, especially at work, I'm a big fan of having different error codes for different codepaths. If I had it my way, every error would uniquely identify which call site it came from. I managed to do that with this project, so the test code very nicely describes the expectation in each case.
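The tests above use SR and ME, which look like short aliases for the checker's result and move-error types. Here's a guess at their shape to illustrate the one-variant-per-call-site idea. Only Correct, ErrorOnMove, and BlackenNotConnectedForKeyword come from the tests; the full type names and the other variants are hypothetical. I'm also assuming move numbers are zero-based, which fits the out-of-order test flagging move 1, the second blacken.

```rust
/// Why a particular move was rejected. Ideally each variant is returned from
/// exactly one place in the checker, so a failing test points straight at it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MoveError {
    BlackenNotConnectedForKeyword,
    BlackenAlreadyBlackened, // hypothetical
    BlackenOnGap,            // hypothetical
}

/// Overall verdict from the solution checker.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SolutionResult {
    Correct,
    /// Zero-based index of the offending move, and why it failed.
    ErrorOnMove(usize, MoveError),
    /// Every move was legal, but cells were left over (hypothetical).
    Incomplete,
}

// Short aliases like the ones the tests use.
use self::MoveError as ME;
use self::SolutionResult as SR;
```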

The checker iterates over the ordered list of moves that the player (or test) entered and modifies a copy of the puzzle grid for each step. It also keeps track of what state it's in--like gathering a keyword or executing a specific keyword, accompanied by any extra info about the current state. Rust's enum types are great for this.

enum BoardState {
    GatheringKeyword(String, Vec<Move>),
    ExecutingLOK,
    /// ...
}

Each move is checked against the current state to decide if it's permitted or not. Finally, after all the moves are done, it checks the overall grid remaining to make sure it's complete.
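Here's a toy version of that loop, just to show the state-machine shape. It knows exactly one keyword, requires each gathered letter to be orthogonally adjacent to the previous one, and treats "executing" the keyword as consuming one more blacken of any cell. Those rules are placeholders I made up for illustration, not LOK's actual rules, and all the names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum MoveError {
    AlreadyBlackened,
    NotConnectedForKeyword,
    NotAKeywordPrefix,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Correct,
    ErrorOnMove(usize, MoveError),
    Incomplete,
}

/// Letters gathered so far plus the cells they came from, or
/// "keyword complete, its effect must be applied next".
enum State {
    GatheringKeyword(String, Vec<(usize, usize)>),
    ExecutingKeyword,
}

const KEYWORD: &str = "LOK"; // the only keyword this toy knows

fn adjacent(a: (usize, usize), b: (usize, usize)) -> bool {
    a.0.abs_diff(b.0) + a.1.abs_diff(b.1) == 1
}

/// `grid` holds one string of letters per row; `moves` are (row, col) blackens.
pub fn check(grid: &[&str], moves: &[(usize, usize)]) -> Verdict {
    let letters: Vec<Vec<char>> = grid.iter().map(|r| r.chars().collect()).collect();
    let mut black: Vec<Vec<bool>> = letters.iter().map(|r| vec![false; r.len()]).collect();
    let mut state = State::GatheringKeyword(String::new(), Vec::new());

    for (i, &(r, c)) in moves.iter().enumerate() {
        if black[r][c] {
            return Verdict::ErrorOnMove(i, MoveError::AlreadyBlackened);
        }
        state = match state {
            State::GatheringKeyword(mut word, mut cells) => {
                if let Some(&last) = cells.last() {
                    if !adjacent(last, (r, c)) {
                        return Verdict::ErrorOnMove(i, MoveError::NotConnectedForKeyword);
                    }
                }
                word.push(letters[r][c]);
                cells.push((r, c));
                if word == KEYWORD {
                    State::ExecutingKeyword
                } else if KEYWORD.starts_with(word.as_str()) {
                    State::GatheringKeyword(word, cells)
                } else {
                    return Verdict::ErrorOnMove(i, MoveError::NotAKeywordPrefix);
                }
            }
            // Placeholder effect for this toy: consume one blacken of any cell.
            State::ExecutingKeyword => State::GatheringKeyword(String::new(), Vec::new()),
        };
        black[r][c] = true;
    }

    // After the last move: every cell black, and no keyword left half-done.
    let all_black = black.iter().flatten().all(|&b| b);
    match state {
        State::GatheringKeyword(word, _) if word.is_empty() && all_black => Verdict::Correct,
        _ => Verdict::Incomplete,
    }
}
```

Because every rejection returns from its own spot with its own variant, the two tests from earlier map cleanly onto it: "LOK_" with four blackens in order is Correct, and "LKO_" fails on move 1 because (0, 2) isn't next to (0, 0).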

Code Coverage

As of the time I'm writing this blog, the main file is 2665 lines long, of which 63% is tests. I wrote a lot of tests! Towards the end, I also tried out Rust's code coverage support to see what tests I'd missed. It was cool because it found some tests that weren't catching the conditions I expected, as well as areas I'd forgotten to cover.

Rust's code coverage support is heavily based on what LLVM provides. There's a pretty decent suite of tools, and they're runnable from cargo (Rust's build tool). There's a good section of the book on code coverage, including how to instrument your code, generate data, and analyze data. I was able to follow it pretty easily.

One tweak is that the book suggests cargo-binutils is optional, but I couldn't really find a good way to run the tools without it.

Here's the script I ended up using for generating a coverage report:

del default_*.prof*
cargo test
cargo profdata -- merge -sparse default_*.profraw -o default.profdata
cargo cov -- show -Xdemangler=rustfilt target\debug\deps\lok_wasm-4d4ce0b412053771.exe --instr-profile=default.profdata --output-dir=coverage --format=html

After a few iterations of generating reports and adding tests, I managed to get 98% line coverage, and the only remaining parts were bits exclusive to the UI. Not bad at all. I don't think I've ever managed that with a project at work, heh.

Mobile version

I was pleasantly surprised to find that it Just Worked on my phone's browser. I guess that's the power of WASM?? I don't know. Actually I had to add one magical incantation:

<meta name="viewport" content="width=device-width, initial-scale=1" charset="utf-8" />

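Apparently this fixes font scaling on mobile browsers. As far as I can tell, it tells the phone's browser to lay the page out at the device's actual width, rather than rendering it on a wide, desktop-sized virtual canvas and shrinking the result. Without it, the font sizes were all over the place: way too big in some places, way too small in others.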

SPOILER WARNING for next section

The sections after this one are full of spoilers for the LOK mechanics. The original game does a great job of guiding you through learning how to play, gradually ramping you up on more complicated mechanics. It's masterfully done, so if you were thinking of playing, I recommend not reading the next section onwards.

Without spoilers, I can say that my number one guiding principle was to have my version reveal as little as possible about how the mechanics work. It is an oracle: you can submit a solution and get a result of yes/no, but if you want to know how the mechanics work, well, I've deliberately tried to make that difficult.

OK, now go away. Unless you've already beaten the game, in which case, welcome!

Gathering Keywords

Reminder: spoilers from here on out. OK, let's go. So a big part of LOK is keywords. The main verb that players have for a good chunk of the book is "blackening" out a cell. If you black out all the squares in the right order that make up a keyword, you then get to (have to, actually) use the keyword's effect.

So of course I had to add a "blacken" verb to the UI. However, unless I took care with it, I could accidentally leak information about how the mechanics work.

When you blacken a square, it does change color. However, you can click on it multiple times. Normally this would be a no-op, but I don't want to reveal that, so instead the letter in the cell gets a little superscript showing how many times it was clicked on. This lets the player know their click was registered and also lets them know their "undo" was registered, if they hit Undo.
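One way to get that superscript without storing anything extra is to derive it from the move list, so Undo updates it for free. This is a sketch assuming moves are stored as (row, col) pairs; both functions are hypothetical, and in the real project the Javascript side does the rendering.

```rust
/// How many times the player has blackened (row, col), counted straight from
/// the move list, so popping a move on Undo automatically lowers the count.
pub fn click_count(moves: &[(usize, usize)], row: usize, col: usize) -> usize {
    moves.iter().filter(|&&m| m == (row, col)).count()
}

/// Cell text: the letter, plus a superscript count once it has been clicked.
pub fn cell_label(letter: char, clicks: usize) -> String {
    if clicks == 0 {
        letter.to_string()
    } else {
        format!("{}<sup>{}</sup>", letter, clicks)
    }
}
```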

I guess I could have made it a toggle, even though that's not valid in the game either, but that would be misleading: a player might think the toggle acts like Undo, and I didn't implement it that way.

When the player has successfully blackened out a keyword, there's no feedback in the UI that they've done something interesting. The solution checker is expecting the player to execute the keyword correctly next, but the only feedback the player gets is a "yes" or "no" when they click the "Check!" button, not when they interact with the board.

Conductors and Paths

Implementing the keywords LOK, TLAK, and TA was pretty straightforward. They all use only the "blacken" verb. The next choice came when implementing the conductor, 'X'. Conductors allow you to gather a keyword by selecting cells that aren't in a straight line; you can "bend" the line at right angles.

In terms of interface, I could have chosen to just let the player pick the cells of the keyword without requiring them to enter any additional information about what path they're using. In fact that was my first thought. But then I started thinking about having to write the pathfinding code to verify that the player chose a valid path and realized: hey! why should I do this? The player should be telling me what path they want to use, and I can verify if it's valid. Let them do the hard work, right?
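Verifying a path the player hands you really is much less work than finding one: you only walk consecutive pairs. This sketch assumes a path is a list of (row, col) cells and checks only that each step moves to an orthogonally adjacent cell without revisiting one; the actual conductor rules (such as where a bend may happen) are left out, and the function name is hypothetical.

```rust
use std::collections::HashSet;

/// Returns the index of the first bad step in `path`, or None if every step
/// moves to an orthogonally adjacent, not-yet-visited cell.
pub fn first_invalid_step(path: &[(usize, usize)]) -> Option<usize> {
    let mut seen = HashSet::new();
    for (i, &cell) in path.iter().enumerate() {
        if !seen.insert(cell) {
            return Some(i); // revisited a cell
        }
        if i > 0 {
            let prev = path[i - 1];
            let dist = prev.0.abs_diff(cell.0) + prev.1.abs_diff(cell.1);
            if dist != 1 {
                return Some(i); // not a single orthogonal step
            }
        }
    }
    None
}
```

Returning the index of the failing step, rather than a bool, fits the same "tell me exactly where it went wrong" style as the checker's error codes.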

The other advantage of forcing the player to enter the path is that there's less chance of accidentally getting a puzzle correct without fully demonstrating an understanding of the solution. In other words, the player could enter an underspecified solution and get lucky.

The downside is that it reveals information about the mechanics, namely that the concept of a "path" exists. The way I presented it in the UI is there is a set of radio buttons corresponding to different modes, like "blacken" and "mark path". When the player uses "mark path" on a cell, it gets a border and increments the superscript on it--just enough to let them know that their click had some effect, but probably not too much about what it means.

Editing Cells

The final major UI change I had to make was to allow players to edit the contents of cells. Why? Because the BE keyword and the ? wildcard allow for it. I added a new mode, called "Edit". The player can freely edit whenever they want. I only allow changing characters to a single other character, but without much restriction on what they can change into. I hoped the freedom conferred by this mode didn't give them any information about why they'd want to do this or what the rules around it are.
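The "single character, few restrictions" rule is easy to enforce right where the player's input comes in. A sketch, where the Move::Edit variant and parse_edit are hypothetical (the post doesn't say how edits are recorded internally):

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum Move {
    Blacken(usize, usize),
    Edit(usize, usize, char), // hypothetical: change a cell's letter
}

/// Turns what the player typed into an Edit move, accepting exactly one
/// non-whitespace character and nothing else. No check on *which* character.
pub fn parse_edit(row: usize, col: usize, input: &str) -> Option<Move> {
    let mut chars = input.trim().chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => Some(Move::Edit(row, col, c)),
        _ => None,
    }
}
```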

I had a lot of fun with this project. It was simple and I learned some things. And of course I enjoy writing things in Rust. Go buy and play LOK! It's really good!!

netcrab: a networking tool

knut — Sat, 14 Oct 2023 07:10:00 +0000

(Cross-posted from my main blog)

Before I get to this project known as netcrab, I thought it'd be fun to share some history from Xbox's past... call it the origin story of this tool. Let's go back in time a little bit. The year was 2012 and I had joined the Xbox console operating system team a year or so before. We'd wrapped up working on one of the last major updates for the Xbox 360 and were well underway with the next project, the thing that would eventually release as the Xbox One.

I worked on the networking team, and the architecture of the Xbox One was wildly different from the 360. The Xbox One consisted of three virtual machines: a host, a shared partition (for the system UI, etc.), and an exclusive game partition. They all had to share a single network adapter, and so they had a whole lot of new code for this virtualized networking system to multiplex 3 VMs' worth of network traffic through a single adapter. To make matters even more fun, the host VM, the one that actually had access to the physical network adapters and ran their drivers, was (and still is) relatively lightweight. A lot of networking features are just not there, like, uh, DHCP, IP fragmentation, and more.

All of that to say, back then I was doing a lot of debugging of extremely simple networking things, like whether the box even gets an IP address, did it send a packet, did it receive a packet, why did the firewall reject this, and why did the firewall allow this? I had a need for a simple networking tool I could use as a TCP client, TCP server, UDP listener, or UDP sender.

Well, such a tool already exists, of course, and has existed for a million years. It's called netcat and is well known to Unix people. There were two problems with it for me, though:

I didn't want to deal with license issues integrating it into tooling at work. I'm not sure what the license is, but Microsoft was much more wary about open source projects back then.
That Xbox host VM I mentioned earlier does not run off-the-shelf programs. You have to recompile your program from source for it to work.

So I took matters into my own hands--because I love having tools--and wrote my own replacement called netfeline. It got the job done for me at work, but it had just one problem: it is private to Microsoft, so I can't share it with anyone. And I'm not sure I'd want to; it's not my best work by a long shot. Now we arrive at April 2023 and I'm trying to get better at Rust and finally got the itch to rewrite netcat as an open source project.

Early Choices

Right from the beginning I knew I wanted to try using Rust's async/await functionality, but the current state of async programming in Rust is a bit weird. The language and compiler support keywords like async and await, but the standard library doesn't provide an async runtime, which is needed to actually execute asynchronous tasks. The Rust async book has a good chapter on the state of the ecosystem.

So I started by using Tokio, a popular async runtime. The docs and samples helped me get a simple outbound TCP connection working. The Rust async book also had a lot of good explanations, both practical and digging into the details of what a runtime does.

When I work on projects, I like to add breadth first before depth. I want to stretch the code as broadly as it needs to go and see minimal functionality across all the features I want before I polish them. I find this helps me sort out all the structural questions I have. As I'll describe, I had to do a lot of stretching and restructuring throughout this project.

To start with, I got a TCP client and server and UDP listener and sender all minimally working. This set up the code to handle four major, simple scenarios: TCP/UDP and listener/sender, all of which have slightly different ways to work with them.
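For reference, here are the four basic scenarios in their simplest form. Tokio's TcpListener, TcpStream, and UdpSocket closely mirror the std::net types, so to keep this sketch dependency-free it uses blocking std::net instead of Tokio; the function names are made up for illustration.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream, UdpSocket};

/// TCP listener: accept one connection and read until the peer closes it.
pub fn tcp_listen_once(listener: &TcpListener) -> io::Result<Vec<u8>> {
    let (mut conn, _peer) = listener.accept()?;
    let mut buf = Vec::new();
    conn.read_to_end(&mut buf)?;
    Ok(buf)
}

/// TCP client: connect and send; dropping the stream closes the connection.
pub fn tcp_send(addr: SocketAddr, data: &[u8]) -> io::Result<()> {
    let mut conn = TcpStream::connect(addr)?;
    conn.write_all(data)
}

/// UDP listener: wait for a single datagram on an already-bound socket.
pub fn udp_recv_one(socket: &UdpSocket) -> io::Result<Vec<u8>> {
    let mut buf = [0u8; 1500];
    let (n, _from) = socket.recv_from(&mut buf)?;
    Ok(buf[..n].to_vec())
}

/// UDP sender: no connection at all, just fire a datagram at the address.
pub fn udp_send(addr: SocketAddr, data: &[u8]) -> io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?; // any local port
    socket.send_to(data, addr)?;
    Ok(())
}
```

Even in this tiny form you can see the "slightly different ways to work with them": TCP has accept/connect and a byte stream with an end, while UDP has no connection and preserves message boundaries.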

Wrestling with user input

The Tokio library I am using also provides wrappers for stdin and stdout. Unfortunately I found that this didn't work well for me. From Tokio's stdin docs:

This handle is best used for non-interactive uses, such as when a file is piped into the application. For technical reasons, stdin is implemented by using an ordinary blocking read on a separate thread, and it is impossible to cancel that read. This can make shutdown of the runtime hang until the user presses enter.
For interactive uses, it is recommended to spawn a thread dedicated to user input and use blocking IO directly in that thread.

Well, I wanted an interactive mode to work. You should be able to start a server on one end and a client on the other, and if you push a key on your keyboard, the other end should see it pop up. Tokio's stdin implementation had two problems:

if the program was about to exit due to, say, the socket being closed by the peer, it wouldn't exit until you pressed a key. Unacceptable.
if you pushed a key, it didn't get transmitted until you hit Enter. Boooo.

To address the first problem, I ended up having to take the docs' advice and put those blocking reads on my own thread. By "blocking read", I mean my thread calls a read function that will sit there and wait (A.K.A. "block") until there is data available to be read (because the user pressed a key). The first problem exists because the Tokio runtime won't shut down until all the tasks it's waiting on complete, and one of them will be stuck in a blocking read call until the user hits a key. But by putting it on a std::thread, it's not managed by the Tokio runtime, and Rust is happy to tear it down in the middle of a blocking call at process exit time.
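A minimal sketch of that dedicated-thread pattern, using only std: a plain OS thread does blocking reads and forwards each chunk over a channel. The function name is hypothetical, it's generic over Read so it can be tested without a keyboard, and to feed a Tokio task you'd likely swap the std channel for a Tokio one (for example, tokio::sync::mpsc's Sender::blocking_send from the thread).

```rust
use std::io::Read;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Spawns an OS thread (not a Tokio task) that does blocking reads from `input`
/// and forwards each chunk over a channel. If the process exits while this
/// thread is blocked in `read`, nothing waits for it.
pub fn spawn_input_reader<R: Read + Send + 'static>(mut input: R) -> Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut buf = [0u8; 1024];
        loop {
            match input.read(&mut buf) {
                Ok(0) | Err(_) => break, // EOF or error: stop reading
                Ok(n) => {
                    if tx.send(buf[..n].to_vec()).is_err() {
                        break; // the receiving side has gone away
                    }
                }
            }
        }
        // `tx` is dropped here, which closes the channel for the receiver.
    });
    rx
}
```

In the interactive case you'd call spawn_input_reader(std::io::stdin()).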

For the second problem, I found a useful crate called console. It can read one character at a time without the user needing to hit Enter. It has a weird bug on Unix-type systems, though, so netcrab currently defaults to the -i stdin-nochar input mode there.

All these arguments

By this time I had already gotten tired of parsing arguments by myself and had looked for something to help with that. I found a really dang good argument parsing library called clap. What makes it so cool is it's largely declarative for common uses. You simply mark up a struct with attributes, and the parser automatically generates the usage and all the argument parsing code.

Here's a snippet of one of the parts of netcrab's args as an example. This lets the user configure the random number generator for producing random byte streams sent to connected sockets. It exposes three arguments that the user could pass: --rsizemin NUM, --rsizemax NUM, and --rvals binary or --rvals ascii.

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug, clap::ValueEnum)]
enum RandValueType {
    /// Binary data
    #[value(name = "binary", alias = "b")]
    Binary,

    /// ASCII data
    #[value(name = "ascii", alias = "a")]
    Ascii,
}

#[derive(Args, Clone)]
#[group(required = false, multiple = true)]
struct RandConfig {
    /// Min size for random sends
    #[arg(long = "rsizemin", default_value_t = 1)]
    size_min: usize,

    /// Max size for random sends
    #[arg(long = "rsizemax", default_value_t = 1450)]
    size_max: usize,

    /// Random value selection
    #[arg(long = "rvals", value_enum, default_value_t = RandValueType::Binary)]
    vals: RandValueType,
}

Here are a few tips about clap, for me to remember and for you to maybe learn.

It's not super straightforward what attributes are available to apply. If you have a #[group(...)], you can use attributes that match any of the getters in ArgGroup.
If you have an #[arg(...)] you can use ones from Arg.
A #[command(...)] corresponds to Command.
If you want to use an enum value in args, remember to add the attribute #[derive(clap::ValueEnum)] or else you'll get cryptic compiler errors.
#[command(flatten)] can be applied to pull in all of a struct's fields into the usage while retaining the nested nature in the struct.
If you want to define your own -h argument (for example, to give it your own help text), add #[command(disable_help_flag = true)] to turn off the built-in one.
If you are using the "derive" parser like I did, but you want to execute one of the methods on Command, you can call YourArgs::command().whatever() (with the clap::CommandFactory trait in scope).

Customizing your sockets

One of the useful things to do when testing a networking stack is customizing various socket options, and there are quite a lot of them. Tokio exposes a few to customize on the TcpSocket, TcpStream, and UdpSocket objects, but by no means all of them.

On the other hand, the socket2 crate, upon which Tokio's sockets are built, exposes quite a lot of them from many of the different option groups: joining multicast groups, enabling broadcast, setting TTL, etc. I didn't quite know which ones I wanted to expose in command line args, but I wanted to set myself up for success by getting access to the socket2::Socket.

Unfortunately, I didn't see a clear way to convert from a Tokio socket to that underlying socket2 socket. All I could see were FromRawFd and FromRawSocket, which could be joined up with the Tokio socket's AsRawFd/AsRawSocket. Well, I pushed forward with this and committed the following crime:

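use std::mem::ManuallyDrop;
#[cfg(unix)]
use std::os::unix::io::{AsRawFd, FromRawFd};
#[cfg(windows)]
use std::os::windows::io::{AsRawSocket, FromRawSocket};

let socket2 = ManuallyDrop::new(unsafe {
    #[cfg(windows)]
    let s = socket2::Socket::from_raw_socket(socket.as_raw_socket());

    #[cfg(unix)]
    let s = socket2::Socket::from_raw_fd(socket.as_raw_fd());

    s
});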

At the point I take the raw FD/socket and create a socket2::Socket on top of it, the socket2::Socket takes ownership of the handle and will close it when it goes out of scope. That would be bad, because it would shut down my original Tokio socket. So I had to work around it with an object that Rust provides called ManuallyDrop, which inhibits running the object's destructor.
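Here's a tiny std-only demonstration of what ManuallyDrop buys you. FakeHandle is a hypothetical stand-in for a socket whose destructor would close the handle; here it just counts "closes".

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Stand-in for an owning socket handle: dropping it "closes" the handle,
/// recorded here by bumping a counter.
pub struct FakeHandle<'a> {
    closes: &'a Cell<u32>,
}

impl Drop for FakeHandle<'_> {
    fn drop(&mut self) {
        self.closes.set(self.closes.get() + 1);
    }
}

/// Lets `f` use `value` but never runs its destructor, the same trick as
/// wrapping the borrowed socket2::Socket so it can't close Tokio's handle.
pub fn with_undropped<T>(value: T, f: impl FnOnce(&T)) {
    let wrapper = ManuallyDrop::new(value);
    f(&*wrapper);
    // `wrapper` goes out of scope here, but T's Drop impl is never called.
}
```

Note that this deliberately leaks whatever the wrapped value owns. In netcrab's case that's the point: the underlying handle is still owned (and eventually closed) by the Tokio socket.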

This solution was ugly but worked, and I had to dip into unsafe APIs for the first time in the project, which made me a bit sad. Is it impossible to write a program of any reasonable complexity in Rust without resorting to unsafe calls?

I tried to hint just now that there actually is a clean way to do this. A recurring theme in this project is finding the right tool for the job. The right tool in this case is socket2::SockRef, which lets you set all those socket options on anything that implements AsFd/AsSocket without taking ownership or even requiring a mutable reference. No more unsafe calls either. It's exactly what I needed. I stumbled on it like five minutes ago as part of writing this blog.

The moral of the story is: if you find yourself banging your head against the wall fighting the borrow checker or bringing in unsafe code, look a little harder: most mature libraries have fixes for these rough edges already.

Group talk expansion

At some point I thought I was nearly done: I had the main scenarios of inbound and outbound traffic working, and a bunch of extra scenarios like being an echo server, generating random data to send out, sending broadcast and multicast traffic...

I was just about to call it "feature complete", but then I was browsing around that netcat web page and noticed an interesting feature: they called it "broker mode". It allows multiple clients to be connected at the same time and forwards traffic between all of them.

Suddenly I had a vision of a problem I wanted to be able to solve. I do a lot of my work on a laptop in my home, VPN'd into Microsoft corpnet. I have an Xbox devkit and a test PC at home. Sometimes I hit a bug on a device at home and want to have someone at work take a look at it.

Windbg has a feature called debugger remotes. One end, which is actually attached to the thing being debugged, is called the "debugger server". It can open a port and allow a "debugger client" to connect to it and operate it remotely. In the context of my home/work setup, my home PC can connect to work resources but not the other way around (restrictive VPN firewall), so I thought it would be cool if netcrab could support working around that and allowing a debugger client at work to connect to a debugger server at home.

What we need is a listener at work (since it can accept connections from home) that can accept connections from both home and the remote side of the debugger, and a connector at home that connects to both the local debugger session and the work listener. These two forwarders should be able to send traffic between the debugger session and the remote debugger.

More than one connection

There was a big hurdle to making this work: everything in the program up to this point was narrowly focused on only one remote connection. The program structure had no notion of more than one remote endpoint being connected. In order to expand the breadth of scenarios it can cover, I had to rewrite most of the guts of netcrab.

To work on it in phases, I started off bringing up support for just having more than one incoming connection at a time, postponing the feature of traffic forwarding. The code would create a listening socket and call accept on it. With async programming, the accept call is put in a separate task, and I wait until it completes.

Before that rewrite, when the first accept completes because of an incoming connection, that socket is handled until it closes, then the program either issues another accept to handle another client or exits, depending on user preference. With only one TCP connection at a time to manage, I didn't have to think about having multiple tasks for handling different connections at the same time. Life was good.

My first attempt to expand this went very poorly. I already had a function called handle_tcp_stream that created an async "task" object called a "future" that drove the input and output of the socket to completion, so I figured all I needed to do was call handle_tcp_stream on each newly accepted connection and stuff the future into a Vec.

I had the right idea but had not yet found the right tool for the job. Putting these futures in a Vec doesn't work because Rust doesn't let you both modify the list for adding to it and asynchronously modify it for removing completed futures from it. This requires mutably borrowing the Vec twice, and that's disallowed by Rust. By the way, around this time I found this good article about ways of thinking about mutability in Rust.

At some point I got the feeling I was barking up the wrong tree, and so with some searching I stumbled upon the right tool for the job, FuturesUnordered. Let's see:

a set of futures that can complete in any order
can add to it without a mutable borrow (wow)
automatically removes a future from the list when one completes (wow)

Suddenly my original idea was simple: every time I accept a new connection, I stuff it in a FuturesUnordered collection of ongoing connections and just await the next one finishing.

Everything is sinks and streams

Once I had multiple connections at the same time, the next step was to enable forwarding between them, and by the way I can't forget local input and output, which also should go to and from all connections.

Before we get too deep into the router's guts, it's worth explaining about streams and sinks, which feature prominently in Rust async programming and consequently in netcrab. A Stream is basically an asynchronous Iterator. While Iterator::next produces subsequent items synchronously, Stream::next returns a future that asynchronously produces the next item. Just like an Iterator, there are many convenience methods to modify the data as it emerges from the Stream (e.g. map, filter, etc.).

A Sink is the opposite: an object that can receive values and asynchronously handle them. You call Sink::send and await the transmission completing. The Sink is generic over the type of value it accepts. You can call its with method to add an "adapter" in front of the sink, either to change the data type the Sink accepts or to intercept and process items before the Sink handles them.

A common thing to do is send all the data from some stream to some other sink. That is done using the send_all method.

Where, in Rust, do sinks and streams come from? Well, anything that implements the AsyncWrite trait can be turned into a Sink by using the FramedWrite helper. Likewise with AsyncRead and FramedRead producing a Stream.

And where do you get an AsyncWrite or AsyncRead from? Well, Tokio provides them in many places. For example, you can call TcpStream::split, and you get one for each direction: writing to or reading from the socket asynchronously.

In practice, it looks a little like this:

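use futures::{SinkExt, StreamExt};
use tokio_util::codec::{BytesCodec, FramedRead, FramedWrite};

// `split` returns the (reader, writer) halves of the socket.
let (socket_reader, socket_writer) = tcp_socket.split();
let socket_sink = FramedWrite::new(socket_writer, BytesCodec::new());

// FramedRead yields io::Result<BytesMut>; `freeze` converts each BytesMut to a
// Bytes so it can be cheaply copied around.
let mut socket_stream =
    FramedRead::new(socket_reader, BytesCodec::new()).map(|res| res.map(|bm| bm.freeze()));

// `send_all` needs the stream's error type to match router_sink's.
router_sink.send_all(&mut socket_stream).await?;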

Another way to get a sink and stream is to use an mpsc channel. It can be bounded, with a fixed capacity, or unbounded, allocating from the heap as needed. MPSC stands for "Multiple-Producer, Single-Consumer", so one of its coolest properties is that the sink part can be cloned, letting many parts of your program feed data into the same channel. This was a tool I reached for a lot. Maybe too much, but we'll get to that later.
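Netcrab uses async mpsc channels, but the multiple-producer idea is the same one std::sync::mpsc offers, so here's a minimal std-only sketch with threads standing in for async tasks. The helper name gather_from_producers is made up for illustration. In std, channel() is the unbounded flavor and sync_channel(n) is the bounded one.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: spawns `producers` threads that each send
/// `per_producer` items into one shared channel, then drains them all
/// on the single consumer side.
fn gather_from_producers(producers: usize, per_producer: usize) -> Vec<(usize, usize)> {
    // Unbounded channel: sends never block; the queue grows on the heap as needed.
    let (tx, rx) = mpsc::channel::<(usize, usize)>();

    let mut handles = Vec::new();
    for id in 0..producers {
        // Each producer gets its own clone of the sending half.
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for n in 0..per_producer {
                // The producer tags items with its own id; the channel
                // itself doesn't record which clone sent what.
                tx.send((id, n)).expect("receiver dropped");
            }
        }));
    }

    // Drop the original sender so the receiver sees the channel close
    // once every producer's clone has been dropped too.
    drop(tx);

    for h in handles {
        h.join().expect("producer panicked");
    }

    // The single consumer drains everything, in arrival order.
    rx.iter().collect()
}
```

Each producer's items arrive in the order that producer sent them, but items from different producers interleave arbitrarily.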

This isn't strictly about sinks and streams, but I want to talk about Bytes for a second, since it's such a simple and cool object. In its most common case, it's a reference-counted, heap-allocated buffer, so it's cheap to copy around. It has other fancy things like avoiding reference counting for static allocations, but in the context of netcrab, a FramedRead with the BytesCodec produces BytesMut instances (which can be converted to a read-only Bytes cheaply), so all of the channels use them to pass data around without incurring buffer copies everywhere.
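The bytes crate isn't part of std, but the core idea (a heap buffer behind a reference count, so "copying" only bumps a counter) can be sketched with std's Arc<[u8]>. Treat this as an analogy rather than how Bytes works internally, since Bytes also supports cheap slicing and static data; share_buffer is a hypothetical helper.

```rust
use std::sync::Arc;

/// Hypothetical helper: builds a buffer once, then hands out `n` "copies".
/// Every clone shares the same heap allocation; only the count changes.
fn share_buffer(data: &[u8], n: usize) -> (Arc<[u8]>, Vec<Arc<[u8]>>) {
    // One allocation and one copy of `data` happen here...
    let original: Arc<[u8]> = Arc::from(data);
    // ...and none happen here: clone() just increments the strong count.
    let copies = (0..n).map(|_| Arc::clone(&original)).collect();
    (original, copies)
}
```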

An aside: I am a big fan of writing technical blogs because the process of writing makes me think about things to change or improvements to make. Expounding about BytesMut above helped me remember that I had several places where I made a temporary buffer, filled it, and then created a Bytes from it, incurring a buffer copy.

I made a change to instead fill a BytesMut directly, then freeze it, to remove that buffer copy. Unfortunately, profiling didn't show any change.

The router is also sinks and streams

I started conceptualizing a "router". At its core it is:

a single Stream of data from various sources (local I/O and multiple sockets)
a piece of code that examines the source of each chunk of data and decides where to forward it
a collection of Sinks so that it could forward data wherever it should go
a way for the rest of the program to tell the router that a new socket has just connected

I used this blog post to finally get around to learning a tiny bit of Mermaid so I could make this chart. It's neat but does not give you enough control to make diagrams look just like you want.

[Diagram: the router's inputs, forwarding logic, and socket sinks]

The input type to the router is SourcedBytes. What is that? It's something I added: a Bytes plus a SocketAddr. The reason I need that is the mpsc Sink can be cloned so multiple things can feed into it, but the Stream side of it doesn't indicate which one of the Sink clones inserted each element; I have to bundle that in myself.

By the way, the diagram says Sender and Receiver for readability, but I actually used UnboundedSender and UnboundedReceiver because I was OK with spending more memory for higher throughput and to avoid handling errors with the fixed-size channels being full.

Just like the router requires the remote address to accompany each data buffer, it also stores each socket sink in a map indexed by the remote address. The router can now implement some simple forwarding logic:

examine the source address of a data buffer
enumerate all the known socket sinks and send to each one that doesn't have the same remote address

That's it. That's "hub" mode.
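That forwarding rule is small enough to sketch with std types alone. Here std::sync::mpsc senders stand in for the per-socket sinks, Vec<u8> stands in for Bytes, and forward_hub is a hypothetical name; SourcedBytes mirrors the description above of data plus the remote address it came from.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::mpsc::Sender;

/// A buffer tagged with the remote address of the socket it came from.
struct SourcedBytes {
    data: Vec<u8>,
    source: SocketAddr,
}

/// Hub mode: send the buffer to every known socket except its source.
/// Returns how many sockets it was forwarded to.
fn forward_hub(msg: &SourcedBytes, sinks: &HashMap<SocketAddr, Sender<Vec<u8>>>) -> usize {
    let mut forwarded = 0;
    for (addr, sink) in sinks {
        // Never echo data back to the socket that produced it.
        if *addr == msg.source {
            continue;
        }
        // A send error means that socket's receiver is gone; skip it.
        if sink.send(msg.data.clone()).is_ok() {
            forwarded += 1;
        }
    }
    forwarded
}
```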

Revisiting windbg remotes with hub mode

Equipped with hub mode, I tested out my idea. I'll show the spew just from a test all on localhost. This is the point of view of the work PC, not the home PC. I'll also add annotations.

// Listen on port 55001. Use forwarding mode "hub". Squash output
>nc -L *:55001 --fm hub -o none

// Successfully listening on that port.
Listening on [::]:55001, protocol TCP, family IPv6
Listening on 0.0.0.0:55001, protocol TCP, family IPv4

// Incoming connection from the "home" machine's netcrab instance, which is also connected to the debugger server and doing hub mode.
Accepted connection from 192.168.1.150:60667, protocol TCP, family IPv6

// Incoming connection from windbg debugging client, running on this same machine.
Accepted connection from 127.0.0.1:60668, protocol TCP, family IPv4

// Wait, what's this? Another?
Accepted connection from 127.0.0.1:60669, protocol TCP, family IPv4

What I discovered is that windbg (and many other programs) make multiple connections, for some application-specific reason. Whatever the reason, hub mode won't work for them, because traffic from one socket is forwarded to all other sockets. You end up with cross-talk that will surely confuse any application.

[Diagram: hub mode cross-talk when one application makes multiple connections]

Introducing "channels" mode

What you really want is something more like a tunnel. Two sockets to remote machines are associated with each other as two ends of a tunnel (or "channel", as I called the feature). Traffic is forwarded between these two endpoints without any cross-talk with other sockets.

[Diagram: channels mode, pairing sockets as the two ends of a tunnel]

I already had the code to manage multiple sockets and decide which ones should be forwarded data. For hub mode, I had a broadcast-type policy implemented, and now I needed to add the necessary bookkeeping to use a different forwarding policy. A channel at its core is just a grouping of two remote endpoints, so I created this thing called a ChannelMap.

struct ChannelMap {
    channels: HashMap<SocketAddr, SocketAddr>,
}

When a new socket showed up, the router would try to add it to the channel map by passing in the new SocketAddr. The criteria for selecting the socket at the other end of a new channel are:

the other socket must not be part of a channel already, and
the other socket must be from a different IP address

That second criterion is a bit weird, but without it you can create channels contained within this machine, which aren't useful.
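Here's one way those two criteria could be applied, as a std-only sketch. It assumes the map stores both directions of each channel so either endpoint can look up its partner; try_add, partner, and the candidates parameter are hypothetical rather than netcrab's actual API.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

struct ChannelMap {
    // Assumption: both directions are stored, so either end finds its partner.
    channels: HashMap<SocketAddr, SocketAddr>,
}

impl ChannelMap {
    fn new() -> Self {
        ChannelMap { channels: HashMap::new() }
    }

    /// Tries to pair `new_addr` with one of `candidates` (other connected sockets).
    /// A partner must not already be in a channel and must have a different IP.
    fn try_add(&mut self, new_addr: SocketAddr, candidates: &[SocketAddr]) -> Option<SocketAddr> {
        let partner = candidates.iter().copied().find(|other| {
            *other != new_addr
                && !self.channels.contains_key(other)
                && other.ip() != new_addr.ip()
        })?;
        self.channels.insert(new_addr, partner);
        self.channels.insert(partner, new_addr);
        Some(partner)
    }

    /// The other end of the channel that `addr` belongs to, if any.
    fn partner(&self, addr: &SocketAddr) -> Option<SocketAddr> {
        self.channels.get(addr).copied()
    }
}
```

Using the addresses from the windbg spew: the first local debugger connection pairs with the remote netcrab instance, and a second local connection finds nobody eligible, because the remote one is taken and the first local one shares its IP.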

And of course, the router had to choose to consult the channel map instead of using the broadcast policy when in channel mode.

To support the debugger client scenario where you need multiple outbound connections, I added a convenience feature to create multiple outbound connections easily. The user can append "xNNN" to the port number, like localhost:55000x13 to create 13 outbound connections to the same host.
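Parsing that suffix is plain string work. A sketch, assuming the count follows an 'x' after the last ':' (which keeps IPv6 literals like [::1] intact); parse_target_count is a hypothetical name, and netcrab's real parser may behave differently.

```rust
/// Hypothetical: splits "host:portxNNN" into ("host:port", NNN).
/// Without a suffix the count is 1. Returns None if there's no port
/// or the count is malformed or zero.
fn parse_target_count(target: &str) -> Option<(&str, u32)> {
    // Only look after the last ':' so hostnames containing 'x'
    // (like "example.com") aren't misread as a count suffix.
    let (host, port_part) = target.rsplit_once(':')?;
    match port_part.split_once('x') {
        // No suffix: a single connection to the target as written.
        None => Some((target, 1)),
        Some((port, count)) => {
            let count = count.parse::<u32>().ok().filter(|&c| c > 0)?;
            // Slice the original string to keep "host:port" without the suffix.
            let end = host.len() + 1 + port.len();
            Some((&target[..end], count))
        }
    }
}
```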

Applications that connect through a channel expect it to be transparent: if they disconnect, the disconnection should propagate all the way to the "server" end. So when a socket in a channel disconnects, the router needs to "forward" that disconnection to the other end of the channel. To allow connecting through the channel again, I had to add the ability to automatically reconnect closed outbound connections: the new -r argument, which is the analog of -L (listen again after a client disconnects).

With these features, the channels scenario worked smoothly with windbg.

Socket address to route address

The ability to create multiple outgoing connections actually threw a wrinkle into the router. Above I pasted code that used a SocketAddr (the remote address) as a shorthand for a socket identifier, a way to figure out which socket produced an incoming piece of data. That doesn't work if you make multiple outgoing connections to the same remote host. See this spew:

>nc localhost:55000x3
Targets:
3x localhost:55000
    [::1]:55000
    127.0.0.1:55000

Connected from [::1]:50844 to [::1]:55000, protocol TCP, family IPv6
Connected from [::1]:50845 to [::1]:55000, protocol TCP, family IPv6
Connected from [::1]:50846 to [::1]:55000, protocol TCP, family IPv6

Here I'm connecting three times to the same remote host. Notice that the remote address is the same for all of them. If all I'm tracking on each socket is the remote address, how do I tell the difference between any packet originating from the remote address [::1]:55000? Right, I can't. I need to store a tuple of the local and remote addresses to uniquely identify a socket.

Not a big deal. I created a new type and used it in any place a SocketAddr was previously used to uniquely identify a socket.

struct RouteAddr {
    local: SocketAddr,
    peer: SocketAddr,
}
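To serve as a map key, RouteAddr needs Hash and Eq; the derive list below is my assumption, not copied from netcrab. This std-only sketch checks the problem from the spew above: the three connections share one peer address but produce three distinct RouteAddr values. distinct_routes_and_peers is a hypothetical helper.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Identifies one socket by both of its endpoints.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct RouteAddr {
    local: SocketAddr,
    peer: SocketAddr,
}

/// Hypothetical: counts distinct sockets (by local+peer) versus distinct
/// peer addresses for a list of (local, peer) connection pairs.
fn distinct_routes_and_peers(conns: &[(SocketAddr, SocketAddr)]) -> (usize, usize) {
    let routes: HashSet<RouteAddr> = conns
        .iter()
        .map(|&(local, peer)| RouteAddr { local, peer })
        .collect();
    let peers: HashSet<SocketAddr> = conns.iter().map(|&(_, peer)| peer).collect();
    (routes.len(), peers.len())
}
```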

It did mean that now every piece of traffic flowing through the router had, umm, SIXTY FOUR BYTES extra attached to it!? Hold on, I'll be right back.

...four nights later...

Whew, I fixed that. I replaced it with a 2-byte route identifier. Though, as much as I was hoping it would show some improvement, especially when handling small packets, I wasn't able to measure any real difference. Either my laptop is too fast or it doesn't end up mattering.

It did add some complexity, since now I have to maintain a mapping between these short route IDs and the real route address, but I like having an identifier that doesn't also double as the socket address, so I'm going to keep it.
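A sketch of that bookkeeping, assuming IDs are handed out sequentially and never recycled (the real scheme may differ). RouteTable and its methods are hypothetical, and RouteAddr is repeated so the block stands alone.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct RouteAddr {
    local: SocketAddr,
    peer: SocketAddr,
}

/// Hypothetical two-way mapping between 2-byte route IDs and full route addresses.
#[derive(Default)]
struct RouteTable {
    // u32 so we can tell when all 65,536 u16 IDs have been used.
    next_id: u32,
    by_id: HashMap<u16, RouteAddr>,
    by_addr: HashMap<RouteAddr, u16>,
}

impl RouteTable {
    /// Returns the existing ID for `addr`, or assigns a new one.
    /// Returns None once every u16 ID has been handed out.
    fn get_or_assign(&mut self, addr: RouteAddr) -> Option<u16> {
        if let Some(&id) = self.by_addr.get(&addr) {
            return Some(id);
        }
        if self.next_id > u16::MAX as u32 {
            return None;
        }
        let id = self.next_id as u16;
        self.next_id += 1;
        self.by_id.insert(id, addr);
        self.by_addr.insert(addr, id);
        Some(id)
    }

    /// Looks up the full address for an ID carried alongside some data.
    fn addr_of(&self, id: u16) -> Option<RouteAddr> {
        self.by_id.get(&id).copied()
    }

    /// Forgets a route when its socket closes.
    fn remove(&mut self, id: u16) {
        if let Some(addr) = self.by_id.remove(&id) {
            self.by_addr.remove(&addr);
        }
    }
}
```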

Removing mpsc channel per socket

Going back to the diagram of the router from before, you might have noticed an overabundance of mpsc channels. I wasn't kidding when I said I used them a lot. They were a very convenient tool for creating sinks and streams without fighting Rust too much: every socket got one, the router got one, and local input got one.

The router sends into each socket's channel, and the Stream side of that channel is forwarded to the socket using send_all. That's a pipe that only exists to make Rust happy. Each socket already exposes a write side, tokio::net::tcp::WriteHalf, which FramedWrite can turn into a sink. It feels like it should be possible to bypass the middleman and have the router send directly to the WriteHalf.

So I tried that. And promptly got suplexed by the borrow checker and/or lifetime errors (can't remember exactly what error I hit). I was passing the ReadHalf and WriteHalf to two different futures, and both require mutably borrowing the TcpStream. Just like when I tried to store futures in a Vec, it was never gonna work.

This was yet another case of not having the right tool for the job, which turned out to be TcpStream::into_split, which consumes the TcpStream instance and gives you "owned" versions of the read and write half. These can be passed around freely, since the original TcpStream object they came from has been consumed rather than borrowed. With that, I could remove a nice layer of queuing from the architecture.

Removing a queue also reduced memory usage in cases where the socket was producing data faster than the router could consume it. The "unbounded" version of the mpsc channel allocates extra storage in this case, and, true to its name, memory usage sometimes grew to over 1 GB. Big yikes. Anyway, here's the new diagram.

[Diagram: the router after removing the per-socket mpsc channels]

Oh, and making this change increased throughput about 2x.

Removing mpsc channel for local input

While writing this blog, that mpsc channel for local input also kept bugging me. Surely there has to be a way to remove that one too, right? The reason it is a channel is to have a unified model for all input modes. Each input mode is represented by a stream: the "random data" input mode is an iterator that produces random bytes, wrapped in a stream. The "fixed data" input mode is an iterator that produces the same value, wrapped as a stream. Likewise, reading from stdin came from a stream. So the router code would just do a send_all from the local input stream to its main sink and process local input just like any other socket.

In other words, I've been saying "all local input modes are a stream", but really what I've done is construct a model where I have to force all local input through streams. What if I can find a different commonality that fits more naturally without forcing?

What if I said "all local input is a future that sends to the router sink and ends when the local input is done"? Before, I had the type LocalIoStream, defined like this:

type LocalIoStream<'a> = Pin<Box<dyn futures::Stream<Item = std::io::Result<Bytes>> + 'a>>;

In short, that's a stream that produces Bytes objects. I tried to change it from a stream to a future, like this:

// A future that represents work that drives the local input to completion. It is used with any `InputMode`, regardless
// of how input is obtained (stdin, random generation, etc.).
type LocalInputDriver = Pin<Box<dyn FusedFuture<Output = std::io::Result<()>>>>;

This is a future (a task that completes asynchronously) with no meaningful result: its output is just an io::Result<()> saying whether it finished successfully.

It's slightly unfortunate that the type of LocalInputDriver doesn't imply anything about its functionality, like the fact that it's supposed to send data to the router sink. But it's all in the name of performance, so I can live with it.

In practice, creating a local input driver is usually the same as creating a local input stream, just with an added router_sink.send_all(&mut stream) call at the end of the future.

I mean, except for reading from actual stdin, which is done on a separate task with individual router_sink.send() calls, and the future ends when stdin hits EOF.
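The stdin driver's shape can be approximated with std threads: read chunks from any Read, send each chunk to the router, and finish with Ok(()) at end-of-file. In this sketch a std mpsc Sender stands in for the router sink, spawn_input_driver is a made-up name, and each read's result is sent as-is rather than accumulated up to a full chunk size.

```rust
use std::io::{self, Read};
use std::sync::mpsc::Sender;
use std::thread::{self, JoinHandle};

/// Hypothetical: reads `input` in chunks of up to `chunk_size` bytes and sends
/// each chunk to `router`. The handle yields Ok(()) once the input hits EOF.
fn spawn_input_driver<R>(mut input: R, chunk_size: usize, router: Sender<Vec<u8>>) -> JoinHandle<io::Result<()>>
where
    R: Read + Send + 'static,
{
    thread::spawn(move || {
        let mut buf = vec![0u8; chunk_size];
        loop {
            let n = match input.read(&mut buf) {
                Ok(n) => n,
                // A signal interrupted the read; just try again.
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };
            if n == 0 {
                // EOF: the driver is done. Dropping `router` lets the receiver notice.
                return Ok(());
            }
            // Send only the bytes actually read, not the whole scratch buffer.
            if router.send(buf[..n].to_vec()).is_err() {
                // The router went away, so there's nobody left to feed.
                return Ok(());
            }
        }
    })
}
```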

And with that, here's the final diagram of netcrab's architecture. Not a lot of cruft left to cut out.

[Diagram: final netcrab architecture]

What about UDP?

I may not have said it explicitly, but everything I talked about before with the router was actually in the context of TCP sockets. UDP works a little differently, enough so that I annoyingly couldn't reuse the TcpRouter object and instead had to write almost the same router functionality again.

The main difference is that a TcpStream implicitly embeds both the local and remote addresses, whereas a UdpSocket only includes the local address. Well, you could constrain a UDP socket to a single implicit peer by calling connect, but then it only receives datagrams from that peer, so that's no good.

So whereas with TCP you have one separated stream for each remote endpoint and each one can end when a disconnect happens, with UDP you have a small set of locally-bound sockets that never end, and an association with a remote peer can be created at any time when the first packet arrives from it. Also, every UDP send needs to include the destination address.
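A small std-only sketch of those differences: one bound socket, peers discovered through recv_from, and every send naming its destination with send_to. Echoing the datagram back and the name handle_one_datagram are illustrative assumptions, not netcrab's UDP router.

```rust
use std::collections::HashSet;
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical: receives one datagram, records the sender as a known peer
/// (an "association" is created the first time an address shows up),
/// and echoes the data back. Returns whether this peer was new.
fn handle_one_datagram(socket: &UdpSocket, peers: &mut HashSet<SocketAddr>) -> io::Result<bool> {
    let mut buf = [0u8; 1500];
    // recv_from tells us who sent it; the socket itself has no fixed peer.
    let (n, from) = socket.recv_from(&mut buf)?;
    let is_new = peers.insert(from);
    // Every UDP send names its destination explicitly.
    socket.send_to(&buf[..n], from)?;
    Ok(is_new)
}
```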

It's just different enough that all the TCP router code can't be reused. I had to write analogous code for the UDP paths, heavily patterned off of the TCP router. Frustrating.

Listening and connecting

When I first started this project, there were four major "modes":

do_tcp_connect
do_udp_connect
do_tcp_listen
do_udp_listen

In other words, I had a clean separation between scenarios with outbound connections and ones with inbound connections. Once I had the router in place and a much better structure to the code, I could re-examine that. In an outbound connection scenario I basically resolve some hostnames, establish connections, and then throw the resulting streams into the router to manage. For inbound, I create some listening sockets and throw any inbound connection into the router.

Why not have one function that does both inbound and outbound, depending on arguments? If the user asks to listen on some port (-L), then start up the listening sockets. If the user passes hostnames to resolve, then do that. Feed everything into the router. With this change, I now have just do_tcp and do_udp.

Here's an example of a place where this proxy-like feature could come in handy. Say you want to force a certain application's connection to go out of a certain network adapter. Let's say that adapter has the local address 192.168.1.150. You can do netcrab --fm channels -s 192.168.1.150 -L *:55000 target-host:55000. That will set up a channel listening on port 55000 and forwarding to target-host over port 55000 using the local adapter with address 192.168.1.150.

Iterators are too fast

An interesting problem I ran into was that my input modes that involved iterators like rand and fixed produced data so quickly that they stalled all other processing. I don't quite know what policy caused this, but I found one workaround is inserting a yield_now call in every iterator step.

How big a `BytesMut` do I want?

I'll end with one last topic, which is kind of fun. Earlier I talked about using BytesMut to create a buffer, fill it, then turn it immutable and pass it around. One place this is used is when reading from stdin. The program lets the user choose what size of data to accumulate from stdin before sending it to the network: the "chunk size" or "send size".

In theory I could allocate a single BytesMut with enough space for several chunks and freeze only the most recently filled chunk, then start writing into the next chunk. BytesMut::split lets you do that. It costs CPU to allocate and free memory, so this would reduce the number of allocations. Here's basically what the scratch buffer then looks like.

// Allocate 4 chunks of space at a time.
let alloc_size = chunk_size * 4;
let mut read_buf = BytesMut::with_capacity(alloc_size);

// Manually set the length because we know we're going to fill it.
unsafe { read_buf.set_len(chunk_size) };

// ... Later, when sending a chunk:

// split() moves the filled bytes into a new BytesMut (frozen here into a Bytes)
// but leaves the remaining capacity in read_buf.
let next_chunk = read_buf.split().freeze();

// If there isn't room for another full chunk, allocate a new buffer.
if read_buf.capacity() < chunk_size {
    read_buf = BytesMut::with_capacity(alloc_size);
}

unsafe { read_buf.set_len(chunk_size) };

I had this working fine. The memory usage looks big, but I was pushing hundreds of MB/sec read from disk, so that was expected.

Then I noticed a function called reserve. It has an interesting note that it will try to reclaim space from the buffer if it can find a region of the buffer that has no further Bytes objects still alive referring to it.

I thought this was pretty cool. Imagine not having to reallocate new chunks of memory, but instead automatically getting to reuse space you had previously allocated. So I swapped the with_capacity calls above for reserve to see if that trick ever kicked in.

Well, uh, the graph ended up looking like this instead.

So clearly my calling pattern with this buffer is such that I always have some overlapping use of it when it comes time to reserve more space, so it just grows and grows. And of course it didn't come with any perf benefit either, so I had to fall back to using with_capacity, which was just fine.

Conclusion

I wrote a lot. It kind of meandered. It was a tour through a bunch of the internals. It was written as much for me as for you (I fixed at least three things due to writing this). Am I a good Rust programmer now? Absolutely not. Did I learn something in the course of making netcrab? Absolutely yes. And I got a useful tool out of it, too.

A Cautionary Rust Tale About IO Redirection

knut — Sun, 24 Sep 2023 05:43:38 +0000

(Cross-posted from my main blog)

Yesterday I hit an absolutely baffling bug with some Rust code I'm working on. Illustrating it is a bit contrived, but it's a cautionary tale, and there's not much I like more than sharing cautionary tales. I'm going to illustrate the bug I hit in a toy executable instead of in the context of netcrab, which is unnecessarily complicated.

So there I was, trying to add a feature to netcrab to allow piping input and output to an executed command on the local machine. I'm getting input from a network socket and want to transfer that over to the child process's stdin; and the child process is producing stdout output that I want to send back over the network.

So I start by creating the child process. In my case I was trying to use an interactive program like cmd.exe.

let mut child = std::process::Command::new("cmd.exe")
    .stdin(std::process::Stdio::piped())
    .stdout(std::process::Stdio::piped())
    .stderr(std::process::Stdio::null())
    .spawn()
    .expect("Failed to execute command");

I set the input and output to be piped so I can get access to them. Next I spawn a thread to read from the child process's stdout and write it out somewhere; in the example it'll just go to the main program's stdout.

// Extract the ChildStdout object and send it to be drained.
spawn_child_stdout_reader(child.stdout.take().unwrap());

// ...

fn spawn_child_stdout_reader(mut child_stdout: ChildStdout) -> JoinHandle<()> {
    // Create a new thread.
    std::thread::Builder::new()
        // In the thread, loop forever.
        .spawn(move || loop {
            let mut buf = [0u8];
            // Keep trying to read from the child's stdout.
            match child_stdout.read(&mut buf) {
                // If we get a 0-byte read, the child's stdout ended.
                Ok(0) => {
                    eprintln!("Child stdout exited!");
                    return;
                }
                Ok(_num_bytes_read) => {
                    // If we got data, write it to the main stdout.
                    std::io::stdout().write(&buf);
                }
                Err(e) => eprintln!("err reading from child stdout: {}", e),
            }
        }).unwrap()
}

I ran this, and the child stdout exited right after printing out the initial prompt. It didn't make sense why it would close stdout so early.

At some point I decided to try adding in the stdin handling to see if it would help. Yes, I know this could be written much better with generics. I'll include that at the end.

spawn_child_stdin_writer(child.stdin.take().unwrap());

// ...

fn spawn_child_stdin_writer(mut child_stdin: ChildStdin) -> JoinHandle<()> {
    std::thread::Builder::new()
        .spawn(move || loop {
            let mut buf = [0u8];
            match std::io::stdin().read(&mut buf) {
                Ok(0) => { eprintln!("stdin exited!"); return; }
                Ok(_num_bytes_read) => {
                    child_stdin.write(&buf);
                }
                Err(e) => eprintln!("err writing to child stdin: {}", e),
            }
        }).unwrap()
}

And this actually helped. Why?? In my quest to figure it out, I started commenting out bits and pieces to narrow it down. But I was sleepy, so I didn't do a good job of it. I ended up at some point with this slight change.

fn spawn_child_stdin_writer(mut child_stdin: ChildStdin) -> JoinHandle<()> {
    std::thread::Builder::new()
        .spawn(move || loop {
            let mut buf = [0u8];
            match std::io::stdin().read(&mut buf) {
                Ok(0) => { eprintln!("stdin exited!"); return; }
                Ok(_num_bytes_read) => {
                    // TODO: re-enable
                    //child_stdin.write(&buf);
                }
                Err(e) => eprintln!("err writing to child stdin: {}", e),
            }
        }).unwrap()
}

And this version of the code I sort of forgot about and left in, becoming more and more confused about why my child stdout was exiting so fast.

The reveal

So what was happening here? Here's what's up.

Fundamentally, if you close the child's stdin stream, the child sees end-of-file the next time it reads from it, and an interactive program like cmd.exe responds by exiting, which closes its stdout stream too. This should only happen with programs that actually try to read from stdin, and not with programs that tolerate stdin being closed without also closing stdout. I admit I don't have 100% understanding of this. With interactive programs, you typically get both streams or neither, bucko.

So how is the child stdin getting dropped? This is where it gets contrived and painful.

First, I started off only implementing the stdout reader portion, so when the child process object went out of scope, the stdin handle, not having been extracted using take(), got dropped along with it. Oops.

Next, when I added the stdin writer thread, it started working again, because something was now using the child stdin.

Finally, in my blundering attempts to understand what was happening, I commented out the child_stdin.write(&buf); line in spawn_child_stdin_writer. Even though child_stdin was moved into the function as an argument, I no longer referenced it in the thread's closure, so the move || closure didn't capture it, and it got dropped when spawn_child_stdin_writer returned instead of living on in the thread. OH COME ON.
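The rule that bit me is that a move closure only takes ownership of variables its body actually uses. Here's a std-only sketch that makes the drop timing visible; Tracker is a hypothetical stand-in for ChildStdin whose Drop sets a flag, and both function names are made up.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for ChildStdin: records when it gets dropped.
struct Tracker(Arc<AtomicBool>);

impl Drop for Tracker {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Like spawn_child_stdin_writer with the write commented out: `tracker` is
/// never mentioned in the closure, so `move` doesn't capture it, and it is
/// dropped when this function returns, while the thread is still running.
/// (rustc warns about the unused variable, just like in the story.)
fn spawn_without_using(tracker: Tracker) -> JoinHandle<()> {
    thread::spawn(move || thread::sleep(Duration::from_millis(50)))
}

/// Mentioning `tracker` inside the closure moves it into the thread,
/// so it lives until the thread finishes.
fn spawn_using(tracker: Tracker) -> JoinHandle<()> {
    thread::spawn(move || {
        let _keep_alive = &tracker;
        thread::sleep(Duration::from_millis(50));
    })
}
```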

Caution

You know, when I write C++ code, I want to be real explicit about everything. For example, I insist on avoiding auto at all times because I've seen how it makes code ambiguous. But in Rust, I trust the language a lot and happily let it type deduce and do all kinds of other things automatically because it seems to have its act together.

But dang now Rust, I'm going to have to start being more caref--what? What's that? Huh? No, I didn't see any warning messag--yeah, of course I look at the warning messages. Fine, I'll go look.

warning: unused variable: "child_stdin"
  --> src\main.rs:24:33
   |
24 | fn spawn_child_stdin_writer(mut child_stdin: ChildStdin) -> JoinHandle<()> {
   |                                 ^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: "_child_stdin"

Shut up.

Appendix: Better reader/writer thread

As promised, here's a cleaner way to express the concept of spinning up a thread that drains a source and sends it all to a sink. Not efficient, but illustrative. We could all use some illustrations in our lives.

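use std::io::{Read, Write};
use std::thread::JoinHandle;

fn spawn_read_write_thread<TSource, TSink>(mut source: TSource, mut sink: TSink) -> JoinHandle<()>
where
    TSource: Read + Send + 'static,
    TSink: Write + Send + 'static,
{
    std::thread::Builder::new()
        .spawn(move || loop {
            // One byte at a time: simple, not efficient.
            let mut buf = [0u8];
            match source.read(&mut buf) {
                Ok(0) => { eprintln!("Source closed!"); return; }
                Ok(_) => {
                    // write_all + flush so interactive output shows up promptly.
                    if let Err(e) = sink.write_all(&buf).and_then(|_| sink.flush()) {
                        eprintln!("Err writing to sink: {}", e);
                        return;
                    }
                }
                Err(e) => { eprintln!("Err reading from source: {}", e); return; }
            }
        })
        .unwrap()
}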

A Framework Laptop Hacking Story

knut — Thu, 22 Jun 2023 17:32:29 +0000

(Cross-posted from my main blog)

I bought a Framework Laptop a few months ago. I was really drawn to the idea of a laptop that I could customize and which was built with repairability in mind. That's a really great stance to take, and it didn't hurt that the laptop build and specs looked good, so I went for it. It was also cool that I could buy it with no memory, no hard drive, no power adapter, and no operating system, and supply those things separately. In general I was very happy with it, but it had one little behavior that bugged me. This is a long journey, but I hope it contains some useful information.

The keyboard problem

I noticed sometimes when typing or editing files, keys wouldn't repeat properly when held down. I noticed it most when pressing and releasing combinations of keys in quick succession. Eventually I figured out a simple and concise repro. You can try this at home on your computer and compare the behavior:

Press and hold a key, say 'A'
Press and hold another key, say 'B'. Now you have two keys held down.
Release 'A' but keep 'B' held down

Expected: since you still have the 'B' key held down, it should continue emitting repeated keystrokes.
Actual: no keystrokes are being emitted.

I noticed this reproed not only in all programs in Windows, but even when navigating BIOS settings, which gave me a strong feeling that it was either a hardware or firmware issue.

Support response

Framework support engaged readily with me on it and immediately replaced my Input Cover free of charge, no questions asked. I was already sort of skeptical about whether this would work, but I didn't think it was worth turning down this free troubleshooting.

Sure enough, the problem kept happening with the new Input Cover. By this point I had read some of the source code and was pretty sure I had spotted the bug. In keyboard_8042.c, there's a function called keyboard_state_changed that is called for each key when it changes state between being pressed or not pressed. It has the following code in it:

if (is_pressed) {
    keyboard_wakeup();
    set_typematic_key(scan_code, len);
    task_wake(TASK_ID_KEYPROTO);
} else {
    clear_typematic_key();
}

If a key is pressed, set it up as the new "typematic" (i.e. repeated) key. That sets up timers so that it is automatically sent at some configured interval to do the repeating. However, if it is released, clear the typematic configuration so that no key is being repeated, irrespective of which key that was. In other words, releasing any key will stop any other key repeating.
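To make the difference concrete, here's a tiny model in Rust rather than the EC's C. Typematic and its methods are hypothetical; release_ec mirrors the else branch above, and release_conventional is one plausible alternative rule, not code from Framework or the ChromeOS EC.

```rust
/// Hypothetical model: tracks which key, if any, is currently auto-repeating.
#[derive(Default)]
struct Typematic {
    repeating: Option<u8>, // scan code of the repeating key
}

impl Typematic {
    fn press(&mut self, key: u8) {
        // Both behaviors agree: the newest pressed key becomes the repeating one.
        self.repeating = Some(key);
    }

    /// Mirrors the EC code: releasing any key clears the repeat.
    fn release_ec(&mut self, _key: u8) {
        self.repeating = None;
    }

    /// Alternative: only stop repeating if the released key is the one repeating.
    fn release_conventional(&mut self, key: u8) {
        if self.repeating == Some(key) {
            self.repeating = None;
        }
    }
}
```

Running the repro (press A, press B, release A) through both, the EC version leaves nothing repeating, while the alternative keeps B repeating.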

Bug or no bug, this isn't how basically every other computer I've used in my life works. I reported the code behavior to Framework support, who checked with their firmware engineer and unsurprisingly said "by design", because the same code is present in the upstream ChromeOS Embedded Controller repo.

So every ChromeOS-based computer will have this behavior. That's a lot of devices. Is it a bug? At this point, who knows. It's definitely a behavior difference. Is it desirable? Some people are probably used to it by now, so there isn't a clear-cut answer. I doubt this behavior will get changed at this point. So, what am I to do?

Framework Hacking

One of the coolest things about the Framework laptop is that because they published their embedded controller (EC) source code, it's possible to rebuild and reflash the firmware, which includes this keyboard code. Although Framework itself doesn't publish guides on this (more thoughts on this later), members of the community like DHowett have. He has a whole series of posts on how to rebuild the EC and flash it. In fact I haven't found any other comprehensive resources like that. It's really amazing stuff, and I'm thankful for it.

I started working my way through the guide on modifying the EC firmware. It was going pretty smoothly. As an aside, I installed WSL so that I had a reasonable dev environment, because I'm a scrub who does all his dev work on Windows. I'd never used WSL before, and I was amazed at how easy it was to follow Linux-centric instructions pretty much to the letter. You get a fully fledged Linux environment running in a Windows command prompt. It's actually magic.

Anyway, with the success there, I was able to spit out "ec.bin", which I could flash to my device. At this point, I promptly chickened out.

Hacking with safety

I am way too risk-averse to potentially brick my $2000+ laptop due to a coding bug or some issue with the firmware image I produce. I had to have a backup plan in case something went wrong. I took to the Framework forums to ask if others had tools or suggestions on a backup plan.

It turns out I might be more risk-averse than anyone else...? Haha... or everyone else already has the knowhow to fix a problem like this on their own if it arises. I have just dabbled in embedded programming, so that is probably the case.

There are basically three potential backup routes that I could see:

use the JECDB header to connect a SWD probe like a Picoprobe and debug the firmware
use a UART debug adapter from the USB-C port
connect a flash programmer directly to the flash chip that holds the EC firmware and write a backup to it

The JECDB header (labeled JSWDB) is not populated and is so tiny that it would require microsoldering. I don't have a hot air station or any experience here, so I wasn't too keen to try that, although it would involve some new toys. Who doesn't love new toys?

The UART console would be cool and useful. Someone actually used to make a thing called a SuzyQable that would do this. Actually DHowett made a limited run of similar functionality in a Framework laptop expansion card.

But most people on the forum suggested getting a flash programmer and interfacing directly with the flash chip.

Equipment for chip flashing

I didn't realize this before, but many flash chips speak protocols that are well known by tools, so you can take an unpowered one, touch the pins with a connector to the programmer, and read, erase, and write contents. There is a very good write-up I referenced a lot about unbricking a Chromebook (which this laptop sort of is, in this respect).

From that guide, I learned some of the equipment I would need: the CH341a flash programmer itself, potentially a voltage adjuster, and a chip clip or other type of connection to the exposed pins on the mainboard. Not too surprisingly, it's possible to find a bundle with all of this flash programming equipment together. The bundle came with a SOIC-8 chip clip. I wasn't sure yet if that would be the right type.

Here's a useful pic of the programmer, taken from that Unbricking page. I had to consult it a bunch of times to remember the orientation of the pins.

Finding the flash chip, part 1

The unbricking guide of course didn't tell me where specifically to find the flash chip on my laptop's motherboard, so I had to go hunting. Its advice was to look for Winbond chips, and I found one on the top side.

This one looks like a Winbond 25R256JVEN. The chip package is a WSON-8 8mm x 6mm. The SOIC-8 clip, while it had the right dimensions and spacing between pins, isn't physically compatible with the extremely flush mounting that WSON-8 has; the chip's leads don't stick out far enough for a chip clip to attach to it. I would need to buy a test probe instead.

Another important point is the voltage level that the chip requires. From the Winbond web page I saw that Vcc is 2.7V - 3.6V, so it would accept 3.3V from the flash programmer and I wouldn't need to use the 1.8V voltage adjuster.

I found a ton of test probes on AliExpress. Here's the one I ended up buying; its listing title is "2023 DFN8 QFN8 WSON8 Chip Probe Line Read Write Burning Test Adapter Socket 1.27 6x8 5x6 for CH341A TL866 RT809H/F Programmer". I had actually ordered another one from eBay before that, but it arrived damaged with one pin bent, which made it effectively useless. You know, it's really frustrating that all this test probe does is hold 8 pins in a particular shape. It has no logic, nothing complicated at all about it. But I don't have any other way of holding 8 wires against the chip's pins all at the same time, so I'm beholden to it. Curse these inadequate human hands! And it's doubly frustrating because the only place I can get one of these is overseas, so I had to wait 2-3 weeks for it to arrive.

When it finally arrived, I was able to use a program called AsProgrammer with the CH341a flash programmer to read the contents of the flash chip. Here are some things I noticed about this tool:

if you don't have the test probe firmly touching the flash chip's leads, it won't detect the flash chip type correctly
however, once you start the "Read IC" operation to dump the contents, if your test probe doesn't have firm contact, it will silently read zeroes

Therefore, after dumping the flash it's important to run a "Verify IC" command, which compares the dumped buffer with the flash contents. If it fails, it means you moved the test probe during either the "Read IC" or the "Verify IC" operation. Likewise, whenever you write to the flash chip, you need to run "Verify IC" afterward to make sure you didn't lose contact during the write. Even with a pretty steady hand, I messed up the read operation a couple of times. Honestly, it's terrible, and I would really rather have a better way.
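
AsProgrammer's "Verify IC" does this comparison for you, but the same idea works on dump files you've saved. Here's a minimal Rust sketch, assuming you've read the chip twice into two buffers; first_mismatch and longest_zero_run are names I made up for illustration, and the data in main is fake. Since erased NOR flash reads back as 0xFF, a long stretch of 0x00 is a hint (not proof) that the probe lost contact during a read.

```rust
/// Offset of the first byte where two dumps differ, or None if they are
/// identical. A length difference counts as a mismatch at the shorter length.
fn first_mismatch(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

/// (start, length) of the longest run of 0x00 bytes in `data`.
fn longest_zero_run(data: &[u8]) -> (usize, usize) {
    let mut best = (0, 0);
    let mut start = 0;
    let mut len = 0;
    for (i, &b) in data.iter().enumerate() {
        if b == 0 {
            if len == 0 {
                start = i; // a new run of zeros begins here
            }
            len += 1;
            if len > best.1 {
                best = (start, len);
            }
        } else {
            len = 0;
        }
    }
    best
}

fn main() {
    // Made-up data: the second read "lost contact" partway through.
    let read1 = [0xffu8, 0x12, 0x34, 0x56, 0x78, 0x9a];
    let read2 = [0xffu8, 0x12, 0x00, 0x00, 0x00, 0x00];
    match first_mismatch(&read1, &read2) {
        Some(off) => println!("dumps differ at offset {:#x}", off),
        None => println!("dumps match"),
    }
    let (start, len) = longest_zero_run(&read2);
    println!("longest zero run: {} bytes at offset {:#x}", len, start);
}
```

With real dumps, you'd load each file with std::fs::read and pass the resulting Vec<u8> buffers in.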

Top-side 32 MB flash chip

Anyway, on to the chip contents. First, I've shared online everything I dumped from the mainboard, plus close-up pictures. I found a real dearth of pictures of the under-side of the mainboard, so I took a bunch of close-up pics so people can see which ICs are there. They're not the best quality, but they're better than nothing.

I wasn't quite able to figure out what this flash chip was for. It was much bigger than the 524,288 bytes (512 KiB) that the EC firmware image normally is. I did find a copy of the EC firmware image in it at offset 0x1000, but I'm not sure why it's there. I uploaded my backup of this chip under the name "top-side_near-cmos-battery_Winbond_25R256JVEN.bin", but I didn't feel this was the important backup to make.
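
Finding the EC image inside the bigger dump is a plain subsequence search. This is a naive sketch, assuming both images fit in memory; find_subslice is a hypothetical helper, and the data in main is made up rather than taken from my dumps.

```rust
/// Offset of the first occurrence of `needle` in `haystack` (naive scan).
fn find_subslice(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // windows(0) would panic, and a needle longer than the haystack can't match.
    if needle.is_empty() || needle.len() > haystack.len() {
        return None;
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    // Made-up stand-ins: a 2 KiB "image" planted at 0x1000 in a 16 KiB "dump"
    // that is otherwise 0xff (the erased state of NOR flash).
    let image: Vec<u8> = (0..=255u8).cycle().take(0x800).collect();
    let mut dump = vec![0xffu8; 0x4000];
    dump[0x1000..0x1000 + image.len()].copy_from_slice(&image);

    match find_subslice(&dump, &image) {
        Some(off) => println!("image found at offset {:#x}", off),
        None => println!("image not found"),
    }
}
```

For real files, both buffers would come from std::fs::read. The naive scan trades speed for simplicity, which seems fine for a one-off check.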

So if this isn't the EC firmware flash, where is that? I peeked under all the components on the top side of the board that I could (fan, etc.) but couldn't find any other flash chips. I tried looking at the schematic, and while it mentions two flash ROMs, it doesn't mention where to find them. Reluctantly, I took out the mainboard and checked the under-side.

Under-side flash chips

I found three identical Winbond 25Q80DVIG chips on the under side. One of them was quite close to the MEC1521 embedded controller, so I took a guess that this one contains the EC firmware.

These are in a WSON-8 6mm x 5mm package, different from the 8mm x 6mm package of the top-side chip. It's a good thing I had test probes for both sizes. This one also accepts 3.3V. I dumped this one under the name "under-side_bottom-left_Winbond_25Q80DVIG.bin". Finally, when I compared its contents to the dump I got from ECTool, it was an exact match.

I also dumped and uploaded the other two flash chips on the under side, though I couldn't tell what they're for. Judging from their strings, they seem to hold firmware for other components, but I couldn't pin down which ones. I also couldn't find these flash parts in the schematic, so I'm not sure what's up with that.
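
"Looking at strings" here means something like the Unix strings utility: scanning a binary for runs of printable ASCII. Here's a minimal Rust sketch of that idea, assuming a minimum run length of 4 (the GNU strings default, and my own choice here); printable_strings is a hypothetical helper and the input bytes are made up.

```rust
/// Extracts (offset, text) for every run of printable ASCII (including space)
/// that is at least `min_len` bytes long.
fn printable_strings(data: &[u8], min_len: usize) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut current = String::new();
    for (i, &b) in data.iter().enumerate() {
        if b.is_ascii_graphic() || b == b' ' {
            if current.is_empty() {
                start = i; // remember where this run began
            }
            current.push(b as char);
        } else {
            // A non-printable byte ends the run; keep it only if long enough.
            if current.len() >= min_len {
                out.push((start, std::mem::take(&mut current)));
            }
            current.clear();
        }
    }
    if current.len() >= min_len {
        out.push((start, current));
    }
    out
}

fn main() {
    // Made-up bytes; a real run would use std::fs::read on one of the dumps.
    let data: &[u8] = b"\x00\x01hx30_v0.0.1-7a61a89\x00ab\xffTEST";
    for (offset, s) in printable_strings(data, 4) {
        println!("{:#08x}: {}", offset, s);
    }
}
```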

At this point I felt like I had a route to restoring a backup if I had to, so I felt ready to proceed.

Detour - other problems

At this point I reassembled the laptop and... it didn't turn on and wouldn't charge. I saw forum posts relating to the 11th gen Intel mainboard having some issue, but since I have a 12th gen Intel laptop I didn't think it would apply. I tried stuff like trickle charging with a non-PD USB-C adapter, but it didn't help.

I saw a suggestion to try popping the CMOS battery out and back in. I, uh, tried to do that, but managed to snap the receptacle. Within a day, Framework shipped out a replacement mainboard, free of charge, shipping included. I was blown away by the quality of customer service.

A good baseline

When I was getting ready to flash again, I noticed an issue about the compiler version used to build the firmware binary. I followed its advice, but more importantly I noticed that the issue had recently been fixed, and in the resolution, the maintainer says "Next release (hx20 3.19, hx30 3.07) will include them". It reminded me of something crucial: the Framework EC firmware source code repo doesn't have any particular indication of its level of stability at any given commit. Which commits could be considered fully tested releases? What if the head of the branch introduces a bug that they're working on fixing?

When I build my fix, I want to apply it as a delta on top of something I know is fully tested, or at least good enough for them to release. As part of my spelunking through firmware images earlier, I pulled all the strings out of the different firmware images to get clues about what they were. The very first string in the EC firmware image is "hx30_v0.0.1-7a61a89". That looks suspiciously like a commit hash. Can I look it up in the Framework EC repo? Hey, look, it sure is a valid commit!
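
Pulling the pieces out of that string is easy to sketch. The format here is inferred from this single string, so treat parse_ec_version and its field names (board, version, commit) as my own guesses rather than a documented format.

```rust
/// Pieces of a version string shaped like "hx30_v0.0.1-7a61a89" (format guessed).
#[derive(Debug, PartialEq)]
struct EcVersion<'a> {
    board: &'a str,
    version: &'a str,
    commit: &'a str,
}

fn parse_ec_version(s: &str) -> Option<EcVersion<'_>> {
    // "hx30" | "0.0.1-7a61a89"
    let (board, rest) = s.split_once("_v")?;
    // "0.0.1" | "7a61a89" (split on the last dash)
    let (version, commit) = rest.rsplit_once('-')?;
    // A short git hash should be non-empty and hex digits only.
    if commit.is_empty() || !commit.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(EcVersion { board, version, commit })
}

fn main() {
    match parse_ec_version("hx30_v0.0.1-7a61a89") {
        Some(v) => println!(
            "board {}, version {}, try: git checkout {}",
            v.board, v.version, v.commit
        ),
        None => println!("unrecognized version string"),
    }
}
```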

With this I could git checkout 7a61a89 and then create my topic branch with my fix from here. This version was clearly good enough for them to ship in-box, and that's a pretty good quality bar.

Flashing with ECTool

Finally with the new mainboard and the laptop operational again, I was ready to use ECTool to flash my bug fixed firmware. It actually flashed correctly without a hitch. All the stuff I did before was to prepare for the worst, but it didn't happen.

Well, it wasn't completely without issues. I had spent so much time worrying about the flashing procedure that I was surprised to find my bug fix not only failed to fix the bug but almost made it impossible to reflash without using my backup. Whenever I pressed a key (without holding it down), there would be a slight delay, and then it would start repeating and never stop. Ittt madeeee itttt harddd tooo typeeee stufffff.....

No problem, let me just use ECTool to reflash the backup. Uh oh, the first thing ECTool does is say "press any key to abort". Due to my keys repeating, it kept aborting! Finally after some panicking, I figured out I could press Shift after hitting Enter, and it wouldn't count for its "press any key" logic. With that, I was able to reflash a backup. As a side note, I have a fork of ECTool that adds an option to avoid the "press any key to abort" behavior. I'll see about getting this feature into ECTool proper.

Debugging the keyboard fix

Now that I've managed to dig myself out of the hole my bug created, I need to debug and figure out why my fix didn't work. Let's dig into the code. Here is the entirety of the fix:

if (is_pressed) {
    keyboard_wakeup();
    set_typematic_key(scan_code, len);
    task_wake(TASK_ID_KEYPROTO);
} else {
    // FIX STARTS HERE
    // Only clear typematic key if that is the key being released. This fixes
    // an issue where if keys A and then B are both held down at the same time,
    // and the user releases A, B will also stop repeating.
    if (len == typematic_len &&
        memcmp(scan_code, typematic_scan_code, len) == 0) {
        clear_typematic_key();
    }
    // FIX ENDS HERE
}

When a key is pressed, set_typematic_key is called, which sets global variables storing the scancode of the key that should be repeated. My thinking was that when a key is released, I could compare the scancode being released against the stored one and clear the typematic scancode only if they match.

From the observed behavior with this buggy change, clear_typematic_key is somehow not being called on key release. Here were the first-level causes I could think of:

1. the function containing this code isn't being called at all
2. len doesn't match
3. scan_code doesn't match

I was able to rule out #1 pretty quickly by looking through the code that calls this. I was also fairly sure this code had already been running on both key press and key release before my change.

Options 2 and 3 are interesting. I don't know what scan_code really is. Is it identical when a key is pressed versus released? I need to look at the code that creates the scancode value:

ret = matrix_callback(row, col, is_pressed, scancode_set, scan_code, &len);

I notice right away that is_pressed is a parameter. It's possible that the resultant scancode embeds a pressed/released bit inside. Let's look deeper.

static
enum ec_error_list
matrix_callback(
    int8_t row,
    int8_t col,
    int8_t pressed,
    enum scancode_set_list code_set,
    uint8_t *scan_code,
    int32_t *len
    ) {
    uint16_t make_code;
// ...
    scancode_bytes(make_code, pressed, code_set, scan_code, len);
// ...
}

Still plumbing through pressed...

static
void
scancode_bytes(
    uint16_t make_code,
    int8_t pressed,
    enum scancode_set_list code_set,
    uint8_t *scan_code,
    int32_t *len
    ) {
// ...
    if (pressed) {
        scan_code[(*len)++] = make_code;
    } else {
        scan_code[(*len)++] = 0xf0;
        scan_code[(*len)++] = make_code;
    }
// ...
}

Aha, my suspicion was correct! The scancode for a key being pressed is different from the scancode for the same key being released (the release version carries a 0xf0 prefix), so my attempted fix could never match.
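
To make the mismatch concrete, here's a Rust sketch that mirrors only the excerpted branch of scancode_bytes for a one-byte make code (the real function has more going on in the elided parts), plus the length-and-bytes comparison my first fix did. The make code value 0x1c is just an example.

```rust
/// Simplified port of the excerpted branch: one-byte make codes only.
fn scancode_bytes(make_code: u8, pressed: bool) -> Vec<u8> {
    if pressed {
        vec![make_code]
    } else {
        vec![0xf0, make_code] // release = 0xf0 prefix + make code
    }
}

/// What the first fix effectively checked on release: same length, same bytes.
fn release_matches_typematic(typematic: &[u8], released: &[u8]) -> bool {
    typematic.len() == released.len() && typematic == released
}

fn main() {
    let make_code = 0x1c; // example value for illustration
    let pressed = scancode_bytes(make_code, true);
    let released = scancode_bytes(make_code, false);
    println!("press: {:02x?}, release: {:02x?}", pressed, released);
    println!(
        "would clear typematic: {}",
        release_matches_typematic(&pressed, &released)
    );
}
```

Since the stored press sequence is one byte and the release sequence is two, the length check alone fails every time.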

A working fix

Instead of comparing the scancode directly (which won't work, because it's different when releasing a key), I can use the make_code value, which seems to more directly indicate the key without incorporating the pressed/released state.
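
The idea of the fix can be modeled in a few lines. This isn't the actual firmware patch, just a Rust model in which the typematic state remembers the make code of the repeating key; Typematic and on_key are hypothetical names, and the make codes are example values.

```rust
/// Hypothetical model of the typematic (auto-repeat) state, keyed by make code.
#[derive(Default)]
struct Typematic {
    make_code: Option<u16>,
}

impl Typematic {
    fn on_key(&mut self, make_code: u16, pressed: bool) {
        if pressed {
            // The most recently pressed key becomes the one that repeats.
            self.make_code = Some(make_code);
        } else if self.make_code == Some(make_code) {
            // Only stop repeating if the released key is the repeating one.
            self.make_code = None;
        }
    }
}

fn main() {
    let mut t = Typematic::default();
    t.on_key(0x1c, true); // press key A
    t.on_key(0x32, true); // press key B while A is held
    t.on_key(0x1c, false); // release A: B should keep repeating
    println!("repeating make code: {:x?}", t.make_code);
}
```

Because press and release both carry the same make code, the comparison works in both directions, which the raw scancode bytes could not.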

Here's a link to a working fix. I also threw in a defensive measure for testing, to clear the typematic settings after the key has been repeating for N seconds straight. I built it, flashed it with ECTool, and my laptop keyboard behavior is now perfect.
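
The defensive timeout can be sketched the same way. The post doesn't say what N was, so MAX_REPEAT = 10 seconds below is a placeholder, and passing the current time in as a Duration (rather than reading a clock) is my choice to keep the logic easy to test; Repeat and should_keep_repeating are hypothetical names.

```rust
use std::time::Duration;

/// Placeholder for "N seconds"; the real value isn't stated.
const MAX_REPEAT: Duration = Duration::from_secs(10);

struct Repeat {
    make_code: u16,
    started: Duration, // time (e.g. since boot) when repeating began
}

/// Called on each repeat tick. Returns whether the key should keep repeating,
/// clearing the state once it has repeated for MAX_REPEAT or longer.
fn should_keep_repeating(state: &mut Option<Repeat>, now: Duration) -> bool {
    let expired = match state.as_ref() {
        Some(r) => now.saturating_sub(r.started) >= MAX_REPEAT,
        None => return false, // nothing is repeating
    };
    if expired {
        *state = None;
        return false;
    }
    true
}

fn main() {
    let mut state = Some(Repeat { make_code: 0x1c, started: Duration::from_secs(0) });
    if let Some(r) = &state {
        println!("repeating make code {:#x}", r.make_code);
    }
    for secs in [1u64, 5, 9, 10, 11] {
        let keep = should_keep_repeating(&mut state, Duration::from_secs(secs));
        println!("t={}s keep repeating: {}", secs, keep);
    }
}
```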

Feedback for Framework Computers

This process was a real journey. My laptop was out of commission for like three months off and on while waiting for equipment or replacement parts. Framework did an amazing job with their customer service, but there are still things I'd like to see them do to make life easier for customers like me who want to customize it in the future.

Publish an official guide to safe EC firmware flashing. This probably requires also doing some of the other things in this list.
Populate the JECDB/JSWDB header on the mainboard out of the box, so that if we brick a laptop we can more easily debug and fix it.
Productize and sell a UART debugger expansion card, like the one DHowett made.
Publish official pictures of the mainboard for reference.
Update the schematics to include all the components on the mainboard, for example all four flash chips I found.
Add tags or branches for releases in the EC firmware repo, so we can know which commits are good places to make deltas on top of.
Allow certain kinds of modifications under warranty. I acknowledge this is a bit of a stretch though.

I'm hoping this blog post contains some useful information in this niche space, and that the flash chip dumps and mainboard pics I uploaded may help others who don't feel like taking apart their laptops.

Java and Console Character Encodings

knut — Sun, 26 Mar 2023 07:48:12 +0000

(Cross-posted from my main blog)

So I got nerd sniped by my buddy Snoopy the other daaaaay...

He's studying CS in Europe and is writing a program for an assignment where he has to input some characters from the command line on Windows and process them. The relevant part of the program is pretty simple. It's like this:

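import java.util.Scanner;
public class PrintBytes {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String input = sc.next();
        // Print each UTF-16 char of the first token as hex, one per line
        for (int i = 0; i < input.length(); i++) {
            System.out.print(String.format("%02x", (int)input.charAt(i)));
            System.out.println();
        }
    }
}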

So he runs it and enters a non-ASCII character: š (that's U+0161). The output he gave me is this:

>java PrintBytes
š
00

Now that's weird. I am pretty sure this is not a null character. I expected to see either the UTF-16 value (161) or the UTF-8 bytes (c5 a1). This was about the time I felt the uncontrollable urge to get involved.
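
For reference, here are the values I was expecting, computed in Rust. The character is written as the escape \u{161} so the source file's own encoding can't mangle it; utf8_bytes and utf16_units are just small helpers for this example.

```rust
/// UTF-8 bytes of a string (Rust strings are UTF-8 internally).
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}

/// UTF-16 code units of a string, the form Windows "W" APIs use.
fn utf16_units(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}

fn main() {
    let s = "\u{161}"; // š
    let c = s.chars().next().unwrap();
    println!("code point: U+{:04X}", c as u32);
    println!("UTF-16:     {:04x?}", utf16_units(s));
    println!("UTF-8:      {:02x?}", utf8_bytes(s));
}
```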

Default Codepage Issues

I downloaded the JDK and tried it on my machine.

>java PrintBytes
š
73

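Well, that's weird too. Oh, my console codepage was still the default legacy Windows codepage, whereas his was set to UTF-8. (0x73 is a plain ASCII "s", so it looks like the accent was simply dropped.) I used chcp to change mine to 65001, which is UTF-8, and got the same odd zero result.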

Redirected input from a file

Next test: what if I read the same input from a file instead?

>java PrintBytes < input.txt
c5
a1

Hey, that's correct. That's the UTF-8 representation of it. So something is weird with how Java is reading from an interactive command line compared to file input, even when both come through stdin.

How does Rust do it?

Next test, let's see how it does in Rust.

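use std::io::Read;

fn main() {
    // Read raw bytes from stdin and print each one in hex.
    for b in std::io::stdin().bytes() {
        let val = b.unwrap();
        match val {
            0x0d => println!(), // CR: end of the typed line
            0x0a => (),         // LF: skip, CR already printed a newline
            _ => println!("{:#04x}", val), // width 4 counts "0x", so 5 prints as 0x05
        }
    }
}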

The output is good:

>target\debug\printbytes.exe
š
0xc5
0xa1

So Rust is doing it right interactively. The Rust code actually checks if stdin is currently a console handle and calls ReadConsoleW, otherwise calling ReadFile, which handles regular file I/O just fine.

Snoopy also tried writing the equivalent program in Python, and it also did it right. So Java seems to be doing something wrong under certain conditions... but what's the reason?

Finding the answer

A good starting point might be to check the Rust source. My first guess was that somewhere I'd see a call to ReadFile on the stdin handle, but instead I see the lowest level Windows call it makes is to a function I'm not familiar with, ReadConsoleW.

Reading the docs, it references something about ANSI compatibility:

ReadConsole reads keyboard input from a console's input buffer. It behaves like the ReadFile function, except that it can read in either Unicode (wide-character) or ANSI mode.

I found another link that gives a good comparison between ReadFile and ReadConsole. It confirms that ReadConsoleA (the ANSI version) only reads ANSI characters, but ReadConsoleW can read Unicode characters. Rust reads UTF-16 code units (the "W" functions use wide characters, which on Windows means UTF-16) and then translates them internally into UTF-8, since its string type is natively UTF-8.
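
The conversion step itself is simple to show. This is not Rust's actual standard library code, just the same transformation using the public String::from_utf16_lossy, assuming the buffer holds UTF-16 code units like a ReadConsoleW call would return; utf16_to_utf8 is a hypothetical helper. Characters outside the Basic Multilingual Plane arrive as surrogate pairs, which is why converting one unit at a time isn't enough.

```rust
/// Converts UTF-16 code units into UTF-8 bytes, replacing unpaired
/// surrogates with U+FFFD.
fn utf16_to_utf8(units: &[u16]) -> Vec<u8> {
    String::from_utf16_lossy(units).into_bytes()
}

fn main() {
    // Pretend these code units came back from a wide-character console read.
    let units: [u16; 3] = [0x0161, 0x000d, 0x000a]; // "š\r\n"
    for b in utf16_to_utf8(&units) {
        println!("{:#04x}", b);
    }
}
```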

Confirming with C++

The easiest way to confirm this was to write a little C++ program, going straight to the source. Depending on its arguments, it calls either ReadFile or ReadConsoleW:

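#include <windows.h>
#include <cstdint>
#include <cstdio>

int main(int argc, char**) {
    uint16_t c = 0; // zero-initialized: the ReadFile path fills only one byte
    DWORD numRead = 0; // required by ReadFile when no OVERLAPPED is passed
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    if (argc == 1) {
        ReadFile(in, &c, 1, &numRead, nullptr); // byte-oriented read
    } else {
        ReadConsoleW(in, &c, 1, &numRead, nullptr); // one UTF-16 code unit
    }

    printf("%04x\n", static_cast<unsigned>(c));
    return 0;
}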

First here's ReadFile mode:

>printbytes_c.exe
š
0000

And then ReadConsoleW mode:

>printbytes_c.exe -c
š
0161

0x0161 is the UTF-16 encoding of U+0161, so ReadConsoleW does seem to return proper Unicode. Interestingly, ReadConsoleA showed the same behavior as ReadFile.

Conclusion

The behavior is a little unfortunate in Windows, but it seems to be fairly well documented. Most languages seem to be doing a proper job of handling this, but Java isn't. We can even see it in the debugger. I don't have proper symbols, but at least the top of the stack seems to resolve pretty clearly.

0:004> k
 # Child-SP          RetAddr               Call Site
00 00000016`03ffce28 00007fff`7157c7f4     KERNEL32!ReadFile
01 00000016`03ffce30 00007fff`7157bd76     java!handleRead+0x20
02 00000016`03ffce70 00007fff`71572641     java!JNI_OnLoad+0x196
03 00000016`03ffef00 00000171`9146a02e     java!Java_java_io_FileInputStream_readBytes+0x1d

So Java... do better. Have a way to properly handle Unicode interactive console input. Maybe it does...? A Java expert would probably know, but I can't find it on the Internet with any obvious searches. But also this problem is Windows-specific, so Windows... why you gotta be this way? In conclusion, computers are bad.
