DEV Community: djmitche

Review: Bazel in Depth (Richard Johnson)

djmitche — Thu, 28 May 2026 20:18:15 +0000

Bazel is a Google-derived build system with some interesting properties but a steep learning curve. In hopes of getting up that curve, I bought a few books recently, among them Bazel in Depth: Definitive Reference for Developers and Engineers by Richard Johnson. I thought I might share my reactions in case they are helpful.

tl;dr: don't bother.

The text is meandering and repetitive. Every chapter has at least one multi-page superlative-laden discourse on hermetic builds. The chapter on dependency management spends many pages talking about lockfiles (which Bazel doesn't use).

The text rarely gets into details -- it is definitively not a reference, but more of a verbose treatise on build systems as a general concept.

When it does get into details, they're generally wrong. For example, some of the Starlark examples use enddef to end a function definition. That's not a thing. The book mentions the @rule decorator for rule definitions. Nope. Apparently list comprehensions aren't allowed? They are, and are quite common. Apparently while is allowed? No, very much not.

I'm almost certain this was written by AI based on a skimpy outline, and charitably given the most cursory review. The language echoes all of the usual slop tropes---it's not just a book full of slop, it's a work of fictional genius! You will learn:

What Bazel might be - the answer will surprise you! Because it's wrong!
That everything comes in lists of three - not one, not two. Four is right out.
The --action_env flag - this revolutionary flag allows passing environment variables into the hermetic build process!

Clipboards, Terminals, and Linux

djmitche — Fri, 29 Dec 2023 00:00:45 +0000

I've recently switched to Neovim, and with it begun using the terminal mouse support. But, this has the side-effect that I can't just click-and-drag to select text in the terminal anymore -- Neovim controls that as well.

Which leads me to clipboards. Linux has two of them! Adding to the interest, I typically use Neovim remotely, via an SSH connection to a Tmux session. And on my Linux system, I use urxvt as my terminal program. All of these are very UNIX-y tools, and somehow they all need to play nicely together.

With this sort of challenge, typically the best approach is to learn as much as possible about the different parts, do some experimentation to see how they work in practice, and then try to stitch together a working solution. Sometimes Reddit posts or other information you come across can be useful, but as often as not it's wrong or outdated, so using it without understanding it might just make things more confusing.

In the interest of sharing, here's some of what I've learned.

Security

A brief word about security: clipboards often have passwords in them, especially if you use an external password manager like LastPass. So it's worth thinking carefully about what un-trusted software might be able to read from your clipboard.

Depending on what you do with the information in this post, you might enable any binary you run to access your clipboard from any host you SSH to. Maybe that's OK in your situation, but it is worth thinking about.

X Clipboards

X Windows has lots of "selections", but the two we'll be concerned with are PRIMARY and CLIPBOARD. Most times you select some text, that's automatically moved to the PRIMARY selection. When you specifically request copying a value (i.e., control-C), the value is put into the CLIPBOARD selection. Note that this means selecting some text and pressing Control-C will usually put that text in both selections!

The common xclip utility can be used to get or set clipboards. And, there's a tiny little utility script called clipnotify that I found really useful for debugging this. I cloned and built it, then ran it in a loop:

./clipnotify -l | while read x; do
  clear
  for s in primary clipboard; do
    echo == $s
    xclip -o -selection $s
    echo
  done
done

I ran this in a dedicated terminal so I could easily see what was in the X selections at all times.

OSC-52

OSC-52 is how information about clipboards is transmitted over a terminal. This is an ANSI terminal escape, so a bunch of bytes beginning with ESC that mean something to a terminal emulator. There are a bunch of OSC codes, but 52 is the one that handles clipboards. Documentation is a little hard to find, so I'll summarize how it works here.

Set Clipboard

Sending the sequence <ESC>]52;<board>;<content><BEL> to the terminal sets clipboard <board> to <content>. Here <ESC> is 0x1b and <BEL> is 0x07, sometimes written \e and \a, respectively. The <content> is base64-encoded. The <board> can be c (clipboard) or p (primary) on Linux, but other systems such as macOS only support c.

You can experiment with this with printf in the shell:

printf $'\e]52;c;%s\a' "$(echo foobar | base64)"

This will put "foobar" in your clipboard.
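
For scripting, the same bytes can be assembled in Python. This is a minimal sketch; the name osc52_set is my own, not part of any library.

```python
import base64


def osc52_set(content: bytes, board: str = "c") -> bytes:
    """Build the OSC-52 "set clipboard" sequence for the given board."""
    payload = base64.b64encode(content)
    # <ESC>]52;<board>;<base64 content><BEL>
    return b"\x1b]52;" + board.encode("ascii") + b";" + payload + b"\x07"
```

Writing the result to sys.stdout.buffer (and flushing) sends it to the terminal without anything re-encoding the escape bytes.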

Get Clipboard

Sending the sequence <ESC>]52;<board>;?<BEL> to the terminal will "query" the value of <board>. The terminal responds with a sequence like the "set clipboard" sequence above, containing the content of the requested board.

⸩ printf $'\e]52;p;?\a' ; sleep 1; echo
^[]52;;Zm9vYmFyCg==^G

This makes a bit of a mess of the terminal, but you can see the response, and that base64 decodes to the value in the PRIMARY selection.
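
Decoding a reply like that by hand gets old quickly, so here is a small sketch of a parser. The name parse_osc52 is hypothetical, and it only handles BEL-terminated replies; some terminals may end the sequence differently.

```python
import base64


def parse_osc52(reply: bytes) -> tuple[str, bytes]:
    """Split a BEL-terminated OSC-52 sequence into (board, decoded content)."""
    prefix, terminator = b"\x1b]52;", b"\x07"
    if not (reply.startswith(prefix) and reply.endswith(terminator)):
        raise ValueError("not a BEL-terminated OSC-52 sequence")
    # What remains is "<board>;<base64 payload>"; the board may be empty.
    board, _, payload = reply[len(prefix):-len(terminator)].partition(b";")
    return board.decode("ascii"), base64.b64decode(payload)
```

Fed the reply shown above, this returns an empty board name and the bytes foobar followed by a newline.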

Advertising Support for OSC-52

Terminals advertise their support for functionality via some really arcane symbols. The particular piece that indicates clipboards are supported is Ms=\E]52;%p1%s;%p2%s\007. This is typically found in the "terminfo" database, keyed by the terminal name, which an application finds in $TERM:

⸩ echo $TERM
rxvt-unicode-256color
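
To make those symbols a little less arcane: in a capability string of the form \E]52;%p1%s;%p2%s\007, \E stands for ESC, \007 for BEL, and %p1%s / %p2%s mean "insert the first / second parameter as a string". The sketch below expands only those four tokens. It is not a general terminfo interpreter, and expand_ms is a name I made up.

```python
def expand_ms(capability: str, board: str, payload: str) -> str:
    """Expand just the escapes that appear in an Ms-style capability."""
    return (capability
            .replace(r"\E", "\x1b")        # \E is ESC
            .replace(r"\007", "\x07")      # octal 007 is BEL
            .replace("%p1%s", board)       # first parameter: which board
            .replace("%p2%s", payload))    # second parameter: base64 data
```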

Bracketed Paste Mode

While in principle "paste" into a terminal application can just pretend the pasted data was typed on a keyboard, in practice this can go very badly. For example, if pasting into Vim when it's not in insert mode, the result is usually a mess. Even in insert mode, if autoindent is enabled when you are pasting code, the indentation is not what's expected. Vim has a paste option for this, but in fact there's a better way: bracketed paste mode.

An application using the terminal "sets" this mode, after which the terminal will prefix pasted content with <ESC>[200~ and suffix it with <ESC>[201~.

You can see this with a quick little Python program:

import sys

sys.stdout.write('\x1b[?2004h') # set bracketed-paste mode
sys.stdout.flush()
try:
    while True:
        c = sys.stdin.read(1)
        if c:
            print(repr(c))
        else:
            break
finally:
    sys.stdout.write('\x1b[?2004l') # reset the mode
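
On the receiving side, an application has to pick the pasted text out of its input. Here is a rough sketch of that, assuming the standard markers <ESC>[200~ (start) and <ESC>[201~ (end); split_paste is a hypothetical helper, not something Vim or any library provides.

```python
PASTE_START = "\x1b[200~"
PASTE_END = "\x1b[201~"


def split_paste(data: str) -> list[tuple[str, str]]:
    """Split terminal input into ("typed", text) and ("pasted", text) chunks."""
    chunks = []
    while data:
        start = data.find(PASTE_START)
        if start == -1:
            chunks.append(("typed", data))
            break
        if start > 0:
            chunks.append(("typed", data[:start]))
        rest = data[start + len(PASTE_START):]
        end = rest.find(PASTE_END)
        if end == -1:
            # Unterminated paste: treat everything remaining as pasted.
            chunks.append(("pasted", rest))
            break
        chunks.append(("pasted", rest[:end]))
        data = rest[end + len(PASTE_END):]
    return chunks
```

An editor can then insert the "pasted" chunks literally, skipping autoindent and key mappings, while handling "typed" chunks as usual.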

urxvt

The urxvt terminal emulator doesn't support OSC-52 out of the box. Honestly, it does very little out of the box! But there's a tiny script, 52-osc that adds this support. This is actually nice, because you can see what it does:

sub on_osc_seq {
    my ($term, $op, $args) = @_;
    return () unless $op eq 52;

    my ($clip, $data) = split ';', $args, 2;
    if ($data eq '?') {
        my $data_free = $term->selection();
        Encode::_utf8_off($data_free); # XXX
        $term->tt_write("\e]52;$clip;".encode_base64($data_free, '')."\a");
    }
    else {
        my $data_decoded = decode_base64($data);
        Encode::_utf8_on($data_decoded); # XXX
        $term->selection($data_decoded, $clip =~ /c/);
        $term->selection_grab(urxvt::CurrentTime, $clip =~ /c/);
    }

    ()
}

Breaking that down, it's hooking into the OSC sequences, and specifically number 52. If it gets a query ('?'), it gets the current selection from $term->selection() and sends that back to the app running in the terminal in an OSC-52 escape, using $term->tt_write(). Otherwise, it decodes the data and sets the X selection with $term->selection(data, clipboard). The /c/ is a regular expression matching c.

Putting the printf and clipnotify bits from above together shows that when <board> is c then this plugin updates the CLIPBOARD selection, otherwise PRIMARY. This still works over an SSH connection to a remote host.
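
If Perl isn't your thing, here is roughly the same logic as a Python model. It is only a sketch: the selections dict stands in for the X selections, handle_osc52 is my own name, and I am assuming that the plugin's argument-less $term->selection() call reads PRIMARY.

```python
import base64
from typing import Optional


def handle_osc52(args: str, selections: dict[str, str]) -> Optional[str]:
    """Return a reply for a query; otherwise update a selection in place."""
    board, _, data = args.partition(";")
    if data == "?":
        # Query: reply with the current selection (assumed to be PRIMARY).
        encoded = base64.b64encode(selections["PRIMARY"].encode()).decode()
        return f"\x1b]52;{board};{encoded}\x07"
    # Set: any board containing "c" targets CLIPBOARD, otherwise PRIMARY.
    target = "CLIPBOARD" if "c" in board else "PRIMARY"
    selections[target] = base64.b64decode(data).decode()
    return None
```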

Unfortunately, because this is accomplished with a plugin, the terminal info for urxvt doesn't advertise this support.

Copy and Paste

Urxvt has a built-in plugin, selection-to-clipboard, to copy every selection to the clipboard. In fact, it populates both the PRIMARY and CLIPBOARD selections.

Right-click will paste from the PRIMARY selection. With the following in .Xresources, shift-ctrl-V will paste from the CLIPBOARD selection.

Rxvt.keysym.Shift-Control-V: eval:paste_clipboard

Either paste option supports bracketed-paste mode.

tmux

I typically run tmux on remote systems, so that I can leave everything running while my laptop is asleep, or if I lose my network connection. Tmux is basically a terminal emulator that runs in a terminal: you can run other things in a tmux window, and tmux lets you switch between those windows, show them on different systems, etc. But, that means that tmux is intercepting terminal escape codes, whether they're for cursor positioning or clipboard management. It is then using (possibly different) escape codes to draw the tmux UI on your terminal.

In its default settings, the printf above won't do anything in tmux when running inside urxvt, because urxvt doesn't advertise support for it. That can be fixed in .tmux.conf:

set-option -ga terminal-overrides ',rxvt-uni*:XT:Ms=\E]52;%p1%s;%p2%s\007'

Note that this "fix" only works for things running in tmux. I do everything in tmux (even locally), so it's fine for me.

Buffers

Tmux has a concept of buffers which are named bits of text that can be injected into an application as if they were keyboard input. Tmux also has a "copy mode" where keyboard navigation can be used to select text that will be put into a buffer. The concept is pretty general, although I don't know what would be built from it.

Tmux is designed around the idea that you would only use buffers for copy/paste. That isn't especially practical, since for example web browsers tend not to be terminal applications!

Clipboard Integration

Tmux has a wiki page about clipboards. The main integration is set-clipboard. If this is "external", then tmux will issue an OSC-52 sequence to update the system clipboard whenever a buffer is set. If this is on, then tmux will additionally accept OSC-52 sequences from applications running inside it.

set-option -g set-clipboard on

With both of these options set, repeating the same experiment within a tmux session reveals that any OSC-52 sequence now updates only the PRIMARY selection. That is, the implementation of set-clipboard only updates the PRIMARY selection.

Furthermore, running the get-clipboard printf from above confirms that tmux only returns its own paste buffer -- it never queries the external terminal emulator for the value of the system clipboard.

Even tmux's support for external commands only supports copy operations. There's no way to feed data on the system clipboard into tmux.

Bracketed Paste

When an application in a tmux pane has bracketed paste enabled, tmux will enable it in the parent terminal. It will also "pass through" the brackets from that parent terminal when a paste is performed. Basically, it just works.

Tmux also has a -p option to its paste-buffer command to bracket the buffer contents.

Neovim

Neovim's :help clipboard lays out the situation pretty clearly, and it works quite nicely out of the box. It uses xsel to interact directly with X selections when $DISPLAY is set, and it uses tmux commands when running in tmux. Its checkhealth command makes it easy to see what it's chosen:

vim run locally, outside of tmux: xsel
vim run locally, in tmux: xsel
vim run via SSH, outside of tmux: no support
vim run via SSH, in tmux: tmux

Here, I'm not forwarding X11 via SSH. If I do so, and install xsel remotely, then Neovim will use xsel in all four of the above situations.

Neovim does enable bracketed paste mode.

Summary

All in all, this is not pretty!

Two clipboards (selections), handled slightly differently by each application -- so maybe copy in one app doesn't use the same selection that paste does in another. That's certainly more than ctrl-c/ctrl-v muscle memory can deal with, especially for those who also use more .. ahem .. user-friendly systems on a daily basis.

Most people think of having one clipboard.

Tmux is a pretty substantial impediment here, too, basically rendering tmux's buffers useless, since they can't mirror the "one clipboard".

At any rate, this continues my habit of writing posts as a way to learn about a new topic. I hope this information is useful to others as well!

Collaborating with CLs in Chromium

djmitche — Mon, 23 Oct 2023 19:11:22 +0000

EDITED to address issues with downstream CLs when the colleague's CL is updated.

I'm working closely with some of my colleagues on a project, meaning that we are frequently working on functionality that depends on CLs still in review, written by other people. The Chromium docs on how to handle this are out-of-date and contradictory, so in the fine blogging tradition of making a post so I don't forget something, here it is.

How do I make a CL that depends on another CL?

When you git cl upload, depot-tools looks in the Git history for your HEAD commit, until it finds another branch. If that branch has dependency information (unclear if that's in .git/config or a Change-Id: .. header in its commit message), then the upload will link the CLs.

First, get a local copy of the parent CL, and give it an appropriate branch name. This is my pattern, but the name is only local, so do what suits you best!

git cl patch -b bug-$NUM-$USERNAME-$SUMMARY $GERRIT_URL

This may give an error about hashes not matching, but that's OK, since you won't be uploading to that CL. I suspect this has something to do with the committer name / email.

Then, make a new branch with that one as the upstream.

git new-branch --upstream-current bug-$NUM-$MY_SUMMARY

Make your changes. When you're ready to upload, do so as usual:

git cl upload --cp

That --cp is telling the command to not try to upload all of the other CLs in the stack.

Updating the Parent CL

When the CL you're depending on is updated, you need to use precisely the right formulation of git cl patch or things will go wildly off the rails.

Switch to the branch containing that parent CL and run

git cl patch --force --reapply

Do not use --pull. I think that does something like pulling origin/main and rebasing on top of it? At any rate, it's a bad idea.

Chromium Spelunking: Connecting to Proxies

djmitche — Thu, 28 Sep 2023 20:39:57 +0000

Having successfully fetched a URL with the Chromium network stack, it's time to return to the reason I began this journey: how does Chromium connect to proxies?

Proxy Review

First, a quick review of HTTP proxying.

In the beginning, there was a simple protocol to proxy an HTTP transaction: connect to a proxy and, instead of sending just a path in the request line, send the entire URL. For example:

GET http://httpbin.org/uuid HTTP/1.1

The proxy server then makes an outbound connection to httpbin.org, sends GET /uuid HTTP/1.1 with the headers supplied by the client, and then relays the response back to the client.

This sort of proxy exposes all of the details of the transaction to the proxy (or to anyone, if not using TLS for the connection to the proxy). If the origin URL has scheme https, the proxy will use TLS to connect to the origin, but will still handle the transaction content in cleartext. This has some obvious downsides, but an advantage is that the proxy can cache responses, saving bandwidth. This was a popular use of proxies in the aughts, but is far less common now that bandwidth is cheap and privacy is important.
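
The difference between a direct request and this kind of proxied request comes down to the request line. A quick sketch (request_line is a hypothetical helper, and it ignores headers entirely):

```python
from urllib.parse import urlsplit


def request_line(method: str, url: str, via_proxy: bool) -> str:
    """Return the HTTP/1.1 request line for a direct or proxied request."""
    if via_proxy:
        target = url  # the proxy needs the whole URL to find the origin
    else:
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"
```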

An alternative method, CONNECT, was defined about 25 years ago. It creates a "tunnel" through the proxy which can carry arbitrary data to and from another host. The request looks like

CONNECT httpbin.org:443 HTTP/1.1

The proxy server makes a TCP connection to the given host and port, sends back a response header, and then forwards bytes bidirectionally without further analyzing them. In fact, that protocol doesn't have to be HTTP (but it must use TCP, which will become important later).
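
Building the tunnel request from a URL might look like the sketch below. The name connect_request and the default-port table are mine, and a real client would add more headers (proxy authentication, for one).

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def connect_request(url: str) -> bytes:
    """Build a minimal CONNECT request for the host and port of `url`."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    authority = f"{parts.hostname}:{port}"
    # The path is deliberately left out: the proxy only learns host:port.
    return (f"CONNECT {authority} HTTP/1.1\r\n"
            f"Host: {authority}\r\n"
            "\r\n").encode("ascii")
```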

To test proxying, I'm using tinyproxy, running a very simple config on port 8080. This does not support SPDY (HTTP/2), which is a complication I don't really want to consider at this point, but the analysis ends up quite similar to HTTP/1.

Configuring a Proxy

So, how does the network service decide to use a proxy for a request? It's complicated.

Let's start at the bottom. net::ProxyServer defines an actual server to connect to. It has a scheme, host, and port. One of those schemes is "DIRECT" which means do not use a proxy. The others include HTTP, HTTPS, SOCKS, and QUIC, and just describe how to connect to the proxy server.

net::ProxyList represents a list of ProxyServer instances and handles fallback from one to the next. This allows, for example, an enterprise to configure traffic to go via a local proxy but fall back to DIRECT when that proxy is down, such as when using a caching proxy.
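
The fallback idea itself is simple. Here is a sketch of it, not Chromium's implementation: first_working and try_connect are hypothetical names, and the real ProxyList does more bookkeeping than this.

```python
from typing import Callable, Optional


def first_working(proxies: list[str],
                  try_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first entry that connects, trying entries in order."""
    for proxy in proxies:
        if try_connect(proxy):
            return proxy
    return None  # every entry failed, including DIRECT if it was listed
```

With a list like ["cache.corp.example:3128", "DIRECT"] (a made-up proxy name) and the caching proxy down, this falls through to "DIRECT".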

Scoping out another level, net::ProxyConfig represents a configuration for when to use which proxies. This can be manually configured or can refer to a PAC script that defines the configuration.

The ProxyConfig comes from a net::ProxyConfigService, which is used by a net::ProxyResolutionService to determine the ProxyList to use for a given URL. There's a lot of complexity here that we won't get into, including auto-configuration of proxies and downloading and executing PAC scripts, but the end result is a ProxyList.

For churl, we want to hard-code a proxy, so I updated the ProxyConfigService I defined earlier in this series to return a config pointing to a proxy server:

class ProxyConfigServiceHardCoded : public net::ProxyConfigService {
 public:
  // ProxyConfigService implementation:
  void AddObserver(Observer* observer) override {}
  void RemoveObserver(Observer* observer) override {}
  net::ProxyConfigService::ConfigAvailability GetLatestProxyConfig(
      net::ProxyConfigWithAnnotation* config) override {
    auto traffic_annotation = kHardCodedProxyTrafficAnnotation;
    auto proxy_config = net::ProxyConfig::CreateDirect();
    auto& proxy_rules = proxy_config.proxy_rules();
    proxy_rules.ParseFromString("localhost:8080");
    *config = net::ProxyConfigWithAnnotation(proxy_config, traffic_annotation);
    return CONFIG_VALID;
  }
};

Tracing a Simple Request

OK, let's see how this works. My strategy for this sort of investigation is to add lots of debugging output -- at the beginning of each relevant function, and sometimes at key points in longer functions. Then I can follow execution within the source, comparing to the debugging output. I prefer this approach over using a debugger like GDB because I find it more efficient. Use the tools you prefer!

I'll be making a request to https://ip.cow.org using tinyproxy running on http://localhost:8080. Because the origin URL is https, this should use CONNECT.

HTTP Cache

We saw in previous posts that the URLRequest ends up using an HttpTransactionFactory::CreateTransaction to create an HttpCache::Transaction, and starting it. That class has a rather large number of states, but adding some logging in DoLoop shows the sequence of states (formatted to fit your screen):

HttpCache::Transaction -> STATE_GET_BACKEND
HttpCache::Transaction -> STATE_GET_BACKEND_COMPLETE
HttpCache::Transaction -> STATE_INIT_ENTRY
HttpCache::Transaction -> STATE_OPEN_OR_CREATE_ENTRY
HttpCache::Transaction -> STATE_OPEN_OR_CREATE_ENTRY_COMPLETE
HttpCache::Transaction -> STATE_ADD_TO_ENTRY
HttpCache::Transaction -> STATE_ADD_TO_ENTRY_COMPLETE
HttpCache::Transaction -> STATE_SEND_REQUEST
[0925/201309.849178:ERROR:churl_bin.cc(89)] OnConnected
HttpCache::Transaction -> STATE_SEND_REQUEST_COMPLETE
HttpCache::Transaction -> STATE_FINISH_HEADERS
HttpCache::Transaction -> STATE_SUCCESSFUL_SEND_REQUEST
HttpCache::Transaction -> STATE_OVERWRITE_CACHED_RESPONSE
HttpCache::Transaction -> STATE_CACHE_WRITE_RESPONSE
HttpCache::Transaction -> STATE_CACHE_WRITE_RESPONSE_COMPLETE
HttpCache::Transaction -> STATE_TRUNCATE_CACHED_DATA
HttpCache::Transaction -> STATE_TRUNCATE_CACHED_DATA_COMPLETE
HttpCache::Transaction -> STATE_PARTIAL_HEADERS_RECEIVED
HttpCache::Transaction -> STATE_FINISH_HEADERS
HttpCache::Transaction -> STATE_FINISH_HEADERS_COMPLETE
[0925/201310.106633:ERROR:churl_bin.cc(165)] OnResponseStarted
[0925/201310.106669:ERROR:churl_bin.cc(171)] Got HTTP response code 200
HttpCache::Transaction -> STATE_NETWORK_READ_CACHE_WRITE
HttpCache::Transaction -> STATE_NETWORK_READ_CACHE_WRITE_COMPLETE
HttpCache::Transaction -> STATE_NETWORK_READ_CACHE_WRITE
HttpCache::Transaction -> STATE_NETWORK_READ_CACHE_WRITE_COMPLETE

So this provides a map to focus on what's going on in this particular request. Looking at the code, the backend- and entry-related items are just looking for values in the cache, of which there are none. The OnConnected callback in the URLRequest delegate occurs during the STATE_SEND_REQUEST segment. So let's dig in there.

HttpCache::Transaction::DoSendRequest calls cache_->network_layer_->CreateTransaction(..). That network_layer_ is another HttpTransactionFactory. The code-search for that class shows a few classes that extend it, and a little debug printing reveals that this layer is an HttpNetworkLayer, and CreateTransaction returns an HttpNetworkTransaction.
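
For anyone who hasn't met this style of state machine, here is a toy version in Python showing where the one-line log goes to produce a state trace like the one earlier in this section. The states and handlers are made up for illustration, and unlike the real code this sketch ignores asynchronous completion.

```python
from enum import Enum, auto


class State(Enum):
    SEND_REQUEST = auto()
    SEND_REQUEST_COMPLETE = auto()
    READ_HEADERS = auto()
    READ_HEADERS_COMPLETE = auto()
    NONE = auto()


class Transaction:
    """Toy state machine; each handler names the state to run next."""

    def __init__(self):
        self.next_state = State.SEND_REQUEST
        self.trace = []

    def do_loop(self):
        while self.next_state is not State.NONE:
            state, self.next_state = self.next_state, State.NONE
            # The single logging line that yields the state trace.
            self.trace.append(f"Transaction -> STATE_{state.name}")
            getattr(self, "do_" + state.name.lower())()
        return self.trace

    def do_send_request(self):
        self.next_state = State.SEND_REQUEST_COMPLETE

    def do_send_request_complete(self):
        self.next_state = State.READ_HEADERS

    def do_read_headers(self):
        self.next_state = State.READ_HEADERS_COMPLETE

    def do_read_headers_complete(self):
        pass  # leaves next_state as NONE, which ends the loop
```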

HTTP Network Layer

This class also has a large collection of states, but the same trick helps us find the right one:

HttpNetworkTransaction -> STATE_NOTIFY_BEFORE_CREATE_STREAM
HttpNetworkTransaction -> STATE_CREATE_STREAM
HttpNetworkTransaction -> STATE_CREATE_STREAM_COMPLETE
HttpNetworkTransaction -> STATE_CONNECTED_CALLBACK
[0925/205926.726872:ERROR:churl_bin.cc(89)] OnConnected
HttpNetworkTransaction -> STATE_CONNECTED_CALLBACK_COMPLETE
HttpNetworkTransaction -> STATE_INIT_STREAM
HttpNetworkTransaction -> STATE_INIT_STREAM_COMPLETE
HttpNetworkTransaction -> STATE_GENERATE_PROXY_AUTH_TOKEN
HttpNetworkTransaction -> STATE_GENERATE_PROXY_AUTH_TOKEN_COMPLETE
HttpNetworkTransaction -> STATE_GENERATE_SERVER_AUTH_TOKEN
HttpNetworkTransaction -> STATE_GENERATE_SERVER_AUTH_TOKEN_COMPLETE
HttpNetworkTransaction -> STATE_INIT_REQUEST_BODY
HttpNetworkTransaction -> STATE_INIT_REQUEST_BODY_COMPLETE
HttpNetworkTransaction -> STATE_BUILD_REQUEST
HttpNetworkTransaction -> STATE_BUILD_REQUEST_COMPLETE
HttpNetworkTransaction -> STATE_SEND_REQUEST
HttpNetworkTransaction -> STATE_SEND_REQUEST_COMPLETE
HttpNetworkTransaction -> STATE_READ_HEADERS
HttpNetworkTransaction -> STATE_READ_HEADERS_COMPLETE
[0925/225438.616076:ERROR:churl_bin.cc(165)] OnResponseStarted
[0925/225438.616113:ERROR:churl_bin.cc(171)] Got HTTP response code 200
HttpNetworkTransaction -> STATE_READ_BODY
HttpNetworkTransaction -> STATE_READ_BODY_COMPLETE
HttpNetworkTransaction -> STATE_READ_BODY
HttpNetworkTransaction -> STATE_READ_BODY_COMPLETE

The STATE_CONNECTED_CALLBACK is just calling the OnConnected callback. The more interesting bit is in the states before that, STATE_CREATE_STREAM(_COMPLETE). The important bit of this state seems to be calling HttpStreamFactory::RequestStream. This calls back through a delegate method named OnStreamReady, and in this case HttpNetworkTransaction itself is the delegate. Adding a debug print in its OnStreamReady method shows that used_proxy_info includes localhost:8080, so that stream involves the proxy.

HTTP Stream

The HttpStreamFactory class has three nested helper classes: JobFactory, Job, and JobController. The HttpStreamFactory::JobFactory class is trivial: it has a CreateJob method that calls the Job constructor. I suspect this was done as a kind of dependency injection to support testing the JobController.

HttpStreamFactory::JobController is a bit more interesting: it has a small state machine that simply resolves the proxy and then creates some jobs. The proxy resolution simply calls the ProxyResolutionService described above. Some debug prints confirm that this returns a ProxyInfo containing localhost:8080.

The job controller manages several jobs that run in parallel, implementing the "happy eyeballs" that I mentioned in the Life and Times post. All of these jobs are implemented with the same class. I would have expected different job subclasses per job type. Anyway, since we're not using QUIC or pre-connecting or any of that stuff, we'll just focus on the "MAIN" job.

Among many parameters, the HttpStreamFactory::Job constructor takes a ProxyInfo, so we can look for where that is used.

HttpStreamFactory::Job -> STATE_START
HttpStreamFactory::Job -> STATE_WAIT
HttpStreamFactory::Job -> STATE_WAIT_COMPLETE
HttpStreamFactory::Job -> STATE_INIT_CONNECTION
HttpStreamFactory::Job -> STATE_INIT_CONNECTION_COMPLETE
HttpStreamFactory::Job -> STATE_CREATE_STREAM
HttpStreamFactory::Job -> STATE_CREATE_STREAM_COMPLETE

The STATE_WAIT(_COMPLETE) states are related to the job controller's coordination of multiple parallel jobs. The interesting bit is STATE_INIT_CONNECTION, in the HttpStreamFactory::Job::DoInitConnectionImpl method. This method embodies dozens of concerns -- in my opinion this is a perfect example of how not to implement something like this. But, ignoring QUIC, SPDY, WebSockets, TLS, PRECONNECT, and all the rest, it comes down to a call to InitSocketHandleForHttpRequest, passing along the ProxyInfo and a plethora of additional arguments.

Let's take a moment here to notice the shift from deeply nested Java-style factories and controllers to a plain old C-style function. There's probably some interesting history here, perhaps in who wrote which bits of this code, or when they were written.

HTTP Connection Pools

InitSocketHandleForHttpRequest, or more accurately InitSocketPoolHelper, gets a pool from the current HttpNetworkSession with session->GetSocketPool(socket_pool_type, proxy_info.proxy_server()). In this case the socket_pool_type is NORMAL_SOCKET_POOL, so this amounts to a call to ClientSocketPoolManagerImpl::GetSocketPool passing the proxy server through which the connection should be made (which might be ProxyServer::Direct() when not using a proxy). The function also creates a ClientSocketPool::GroupId built from the endpoint URL (ip.cow.org in this case) and a few partitioning parameters.

Summarizing, then, the HttpNetworkSession stores a connection pool for each proxy server (including direct), and within each pool indexes connections by group ID.
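
That two-level indexing can be modelled in a few lines. This is only a sketch of the data structure as I understand it: SocketPools and its methods are hypothetical names, plain strings stand in for ProxyServer and GroupId, and there is none of the limit or timeout handling the real pools have.

```python
from collections import defaultdict


class SocketPools:
    """Idle connections keyed first by proxy server, then by group ID."""

    def __init__(self):
        self._pools = defaultdict(lambda: defaultdict(list))

    def release(self, proxy: str, group_id: str, conn: object) -> None:
        """Return a connection to its pool for later reuse."""
        self._pools[proxy][group_id].append(conn)

    def acquire(self, proxy: str, group_id: str, connect):
        """Reuse an idle connection if there is one, else make a new one."""
        idle = self._pools[proxy][group_id]
        return idle.pop() if idle else connect()
```

A connection released for ("localhost:8080", "https://ip.cow.org") is only ever handed back for that same pair; the same group ID under "DIRECT" gets a fresh connection.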

When a socket in a socket pool is claimed, that claim is represented by a ClientSocketHandle, an empty instance of which is among the parameters to InitSocketHandleForHttpRequest, which calls ClientSocketHandle::Init(..).

This Init method calls the pool's RequestSocket method. There are two implementations of this method, one of which is for WebSockets, so in this case we're calling TransportClientSocketPool::RequestSocket and on to TransportClientSocketPool::RequestSocketInternal. Assuming that there are no existing connections in the pool, and there are free slots to create new connections, this makes a new connection.

Creating a Connection

This occurs with another set of jobs and job factories, this time with subclasses.

ClientSocketPool::CreateConnectJob uses a ConnectJobFactory to create a ConnectJob, passing along the origin URL (endpoint) and proxy server.

ConnectJobFactory::CreateConnectJob examines the proxy server and, in the case that it's not direct (and HTTP-like, not SOCKS) defers to an HttpProxyConnectJob::Factory, which simply creates an HttpProxyConnectJob, a subclass of ConnectJob. This, too, has a large set of states, although only a few are used in this situation:

HttpProxyConnectJob -> STATE_BEGIN_CONNECT
HttpProxyConnectJob -> STATE_TRANSPORT_CONNECT
HttpProxyConnectJob -> STATE_TRANSPORT_CONNECT_COMPLETE
HttpProxyConnectJob -> STATE_HTTP_PROXY_CONNECT
HttpProxyConnectJob -> STATE_HTTP_PROXY_CONNECT_COMPLETE

Checking the implementation of those states, STATE_TRANSPORT_CONNECT involves creating a TransportConnectJob (since the connection to localhost:8080 is not using HTTPS). But at this point my head is starting to spin at the number of nested "jobs", so I'll stop here and assume that TransportConnectJob::Connect does what it says on the tin: connects to the host (the proxy server) specified in the HttpProxySocketParams.

Initializing the Connection

The next state, STATE_HTTP_PROXY_CONNECT, wraps the socket returned from the TransportConnectJob in an HttpProxyClientSocket and calls its Connect method. And no surprise, there's another state machine here:

HttpProxyClientSocket -> STATE_GENERATE_AUTH_TOKEN
HttpProxyClientSocket -> STATE_GENERATE_AUTH_TOKEN_COMPLETE
HttpProxyClientSocket -> STATE_SEND_REQUEST
HttpProxyClientSocket -> STATE_SEND_REQUEST_COMPLETE
HttpProxyClientSocket -> STATE_READ_HEADERS
HttpProxyClientSocket -> STATE_READ_HEADERS_COMPLETE

We're not using proxy authentication (which is an additional complication sprinkled evenly over this entire stack!), so the interesting state here is STATE_SEND_REQUEST. This calls out to the ProxyDelegate, if one is configured, and then calls ProxyClientSocket::BuildTunnelRequest which finally does something recognizable: creates a "CONNECT" request line, with the host and port for the endpoint (so, CONNECT ip.cow.org:443 HTTP/1.1 in this example).

The next state is STATE_READ_HEADERS, which reads the response from the proxy. If that's a 200 OK, then the socket is connected through the proxy and to the endpoint, and from here on out can be treated just like a socket connected directly to the endpoint.
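
The check on the proxy's reply amounts to reading the status line. A sketch of that check follows; tunnel_established is a hypothetical name, and I've accepted any 2xx status rather than only 200, which is what the HTTP specification allows for CONNECT.

```python
def tunnel_established(response_head: bytes) -> bool:
    """True if the proxy's reply to CONNECT has a 2xx status code."""
    status_line = response_head.split(b"\r\n", 1)[0].decode("latin-1")
    parts = status_line.split(" ", 2)  # version, status code, reason phrase
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        return False
    return parts[1].isdigit() and 200 <= int(parts[1]) < 300
```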

Re-Surfacing

So, let's trace the result back up through the stack. The interleaving of the logging added above helps quite a bit here:

HttpProxyConnectJob -> STATE_HTTP_PROXY_CONNECT_COMPLETE
HttpStreamFactory::Job -> STATE_INIT_CONNECTION_COMPLETE
HttpStreamFactory::Job -> STATE_CREATE_STREAM
HttpStreamFactory::Job -> STATE_CREATE_STREAM_COMPLETE
HttpNetworkTransaction -> STATE_CREATE_STREAM_COMPLETE
HttpNetworkTransaction -> STATE_CONNECTED_CALLBACK
[0926/173834.596927:ERROR:churl_bin.cc(89)] OnConnected

STATE_HTTP_PROXY_CONNECT_COMPLETE calls the parent class's SetSocket to use the socket prepared earlier.

HttpStreamFactory::Job gets the result wrapped in a ClientSocketHandle. As always, it handles a half-dozen concerns in STATE_INIT_CONNECTION_COMPLETE, then wraps that in an HttpBasicStream in STATE_CREATE_STREAM.

The HttpNetworkTransaction STATE_CREATE_STREAM_COMPLETE then calls the OnConnected callback, which results in a debug log message in churl.

From that point, there's no further special handling of proxies -- this is a socket carrying an HTTP stream, like any other.

What's Next

This will be the last post on this topic -- I've learned the things I wanted to learn already. However, there are certainly more things to explore:

What happens when making a simple proxy request, rather than tunneled?
What happens when a proxy tunnel fails?
What happens when a proxy requires authentication and the browser must prompt the user?
What happens when a proxy implements QUIC?

All of these are handled somewhere in the stack, but I've skipped over them to try to reduce the breadth of knowledge I had to understand. And it was still quite broad!

ffizz: Build a Beautiful C API in Rust

djmitche — Tue, 20 Jun 2023 01:56:24 +0000

Foreign Function Interface, FFI, is an umbrella term for interfacing between programming languages. Most languages support a way to interface with C: C-style function calls, C-compatible memory layouts for data types, and so on. Interfacing two languages that are not C -- for example, Python to Rust -- typically involves gluing both languages together with some C code.

The choice of C as the lingua franca for communication between modern programming languages is, I think, one of the great tragedies of the history of computing.

Rust FFI Today

Rust supports two kinds of FFI: calling into Rust from another language; and calling into another language from Rust. Most of the thought and tooling that exists right now is organized around the second kind. For example, bindgen is a popular tool that generates useful Rust wrappers from a C or C++ header file.

The tooling for the first kind -- calling Rust from another language -- is a bit less developed, and tends to rely on code generation that doesn't necessarily produce a natural C API. cbindgen, uniffi, cxx, and Diplomat all take this course.

Natural C APIs

It gets a bad reputation, but C can actually be a pleasure to write, when using a nicely designed API. For example, libcurl provides a C API to support making HTTP requests from C. It's carefully and thoughtfully designed to minimize surprises and make correct usage easy. See, for example, curl_slist_append, a succinct, efficient tool for creating lists of strings to pass to the API.

I don't know of any authoritative document, but in my experience good C APIs have a few properties:

Allocate and free functions, for each type. In libcurl, these are curl_easy_init and curl_easy_cleanup. A C programmer will know that they must allocate a new object before using it, and that once that object is freed, it cannot be used again.
An "owner" for every allocated object, responsible for freeing it and making sure it isn't freed while still in use. Ownership semantics are usually described in comments.
Integer return values with negative numbers signaling an error and zero or positive values indicating success.
Selective use of "output parameters" to support functions that have multiple results. For example, int query_execute(query_t *query, rowset_t **rowset_out) probably returns a negative error or positive number of rows matching the query, and writes a pointer to a newly allocated rowset_t in *rowset_out.
Clear documentation of thread safety: what functions can be called concurrently, and what data structures can be accessed from multiple threads.
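To make those properties concrete, here is a minimal sketch built around the query_execute signature from the list above. Everything in it is hypothetical: query_t, rowset_t, query_new, query_free and rowset_free are names I made up for illustration, and the "query" just counts characters as a stand-in for real work. It's written as C++ with extern "C" so that the declarations are what a C caller would see.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

extern "C" {

// Hypothetical types; a real API would usually keep these opaque.
typedef struct query_t { const char* needle; } query_t;
typedef struct rowset_t { int count; } rowset_t;

// Allocate a query. The caller owns the result and must pass it to
// query_free exactly once. `needle` is borrowed and must outlive the query.
query_t* query_new(const char* needle) {
  query_t* q = static_cast<query_t*>(std::malloc(sizeof(query_t)));
  if (q) q->needle = needle;
  return q;
}

// Free functions accept NULL, like free() itself.
void query_free(query_t* q) { std::free(q); }
void rowset_free(rowset_t* r) { std::free(r); }

// Returns a negative error code, or the number of matching rows. On success,
// *rowset_out points to a newly allocated rowset_t owned by the caller.
int query_execute(query_t* query, rowset_t** rowset_out) {
  if (!query || !query->needle || !rowset_out) return -1;
  rowset_t* rows = static_cast<rowset_t*>(std::malloc(sizeof(rowset_t)));
  if (!rows) return -2;
  rows->count = static_cast<int>(std::strlen(query->needle));  // stand-in
  *rowset_out = rows;
  return rows->count;
}

}  // extern "C"

// The call sequence a C programmer would expect: allocate, use, free.
int example_usage() {
  query_t* q = query_new("abc");
  rowset_t* rows = nullptr;
  int n = query_execute(q, &rows);
  assert(n >= 0);
  rowset_free(rows);
  query_free(q);
  return n;
}
```

The ownership rules live only in the comments, which is exactly the part that no compiler checks.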

In general, the Rust FFI tools mentioned above do not generate a very natural C API. At best, they generate a C interface to the Rust API, with the expectation that the C developer will understand both the Rust API and how it is represented in C.

Going it Alone

The alternative is to forget about the tools and create the perfect API by hand. This is not easy!

You'll need to write a C header file, complete with types, function declarations, and extensive documentation comments.

Then you'll need to implement those functions in Rust with extern "C", and write Rust struct definitions to match the C types. Careful: nothing verifies that the header declarations and Rust function and type signatures match, and type layouts differ across architectures.

You'll also need to ensure that the C API does not create undefined behavior for the Rust code. All of those extern "C" functions are unsafe, after all. This usually involves writing clear but concise instructions in the header, and then convincing yourself that any C code satisfying those instructions maintains the invariants of the Rust code.

I set out on this course with taskchampion-lib about three years ago. It quickly became clear that some tooling would help.

Ffizz

Thus was born ffizz. This is a collection of tools for building natural C APIs in Rust.

The simplest is ffizz-header, which supports building a header file from doc comments in the Rust source. While it's still up to the API designer to ensure that the C and Rust function declarations match, that's much easier when they are just a few lines apart in the same source file.

Strings are a very common data type, and Rust and C handle them differently, so passing strings back and forth can be a lot of work and is an easy place to introduce bugs.
The ffizz-string crate supplies a FzString type that manages conversion between types on demand, eliminating a lot of boilerplate and providing simple safety requirements.

Finally, ffizz-passby provides utility types to handle common methods of passing data across C API boundaries:

Pass-by-value: values are copied as necessary when passed to or returned from functions. This is typically used for Rust types that implement Copy.
Boxed objects: values that are allocated and freed and always referenced by a pointer. In Rust, these use Box<T>, while C uses a raw pointer.
Unboxed objects: values that can be placed on the stack or in another struct. This storage location must be initialized before it is used, and is typically passed to Rust functions as a raw pointer. When the caller is finished with the value, it must be released, leaving the storage uninitialized again.

As an example, FzString is an unboxed object: it is up to the C caller to allocate enough space to store the value (which may contain a pointer and two 64-bit integers), initialize the space, and finally call a function like fz_string_free to free the associated memory when the value is no longer needed.
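Here is a rough sketch of what the unboxed pattern looks like from the caller's side. This is not the ffizz API: demo_string_t, demo_string_init and demo_string_free are hypothetical names, and the layout is an assumption chosen only to show caller-provided storage being initialized, used, and released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

extern "C" {

// Hypothetical unboxed type: the caller provides the storage (on the stack,
// in another struct, ...) and the library only initializes and releases it.
typedef struct demo_string_t {
  char* ptr;
  std::size_t len;
} demo_string_t;

// Initialize caller-provided storage with a copy of `s`.
// Returns 0 on success and a negative value on error.
int demo_string_init(demo_string_t* out, const char* s) {
  if (!out || !s) return -1;
  std::size_t len = std::strlen(s);
  char* copy = static_cast<char*>(std::malloc(len + 1));
  if (!copy) return -2;
  std::memcpy(copy, s, len + 1);
  out->ptr = copy;
  out->len = len;
  return 0;
}

// Release the memory owned by the value. Afterwards the storage counts as
// uninitialized and must not be used until it is initialized again.
void demo_string_free(demo_string_t* s) {
  if (!s) return;
  std::free(s->ptr);
  s->ptr = nullptr;
  s->len = 0;
}

}  // extern "C"

std::size_t example_usage() {
  demo_string_t s;  // storage lives on the caller's stack
  int rc = demo_string_init(&s, "hello");
  assert(rc == 0);
  std::size_t len = s.len;
  demo_string_free(&s);
  return len;
}
```

A boxed version of the same type would instead return a demo_string_t* from an allocate function, and the caller would never see the struct's size.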

Give It a Try!

If you've worked on this sort of interface before, please give ffizz a try, even if just experimentally. I've tried to solve general problems and not just what I needed for Taskchampion, but surely my imagination has come up short somehow. Let me know! Leave an issue on GitHub, or ping me on Mastodon at @djmitche@mastodon.social.

Chromium Spelunking: The IO Thread

djmitche — Tue, 06 Jun 2023 15:18:28 +0000

The error at the end of the last post looks like this (I've omitted further lines of backtrace as they're not relevant):

[0605/175824.724186:ERROR:churl_bin.cc(190)] URLRequestContext created: 0x7fb224002df0
[0605/175824.725417:ERROR:churl_bin.cc(198)] calling start     
[0605/175824.730763:ERROR:churl_bin.cc(200)] started     
[0605/175824.734067:FATAL:current_thread.cc(197)] Check failed: sequence_manager. 
#0 0x7fb23107ca8c base::debug::CollectStackTrace()
#1 0x7fb2310332da base::debug::StackTrace::StackTrace()                                                
#2 0x7fb231033295 base::debug::StackTrace::StackTrace()       
#3 0x7fb230d575f9 logging::LogMessage::~LogMessage()           
#4 0x7fb230d02bac logging::(anonymous namespace)::DCheckLogMessage::~DCheckLogMessage()
#5 0x7fb230d02bd9 logging::(anonymous namespace)::DCheckLogMessage::~DCheckLogMessage()
#6 0x7fb230d028bd logging::CheckError::~CheckError() 
#7 0x7fb230ed6cc2 base::CurrentIOThread::Get()     
#8 0x7fb232194f2d net::SocketPosix::Connect()                                                            
#9 0x7fb232199906 net::TCPSocketPosix::Connect()

Looking at the failing DCHECK:

CurrentIOThread CurrentIOThread::Get() {
  auto* sequence_manager = GetCurrentSequenceManagerImpl();
  DCHECK(sequence_manager);
  DCHECK(sequence_manager->IsType(MessagePumpType::IO));
  return CurrentIOThread(sequence_manager);
}

suggests that this is complaining that it's not running in the IO thread. That seems reasonable -- SocketPosix::Connect is, indeed, an IO operation. URLRequest is a pretty low-level tool, and other components are generally expected to use URLFetcher instead. Still, it's notable that URLRequest documents that all uses must be in the same thread but not that the thread must be the IO thread.

OK, Run It On the IO Thread

I suppose the easiest thing to do is run the whole Fetch method on the IO thread. It took me some time to figure out how to create an IO thread. I tried using TestIOThread but it's not defined in non-test cases. However, its implementation is simple enough that I can just duplicate it:

int main(int argc, char *argv[]) {
  // ...
  base::Thread io_thread("IO Thread");
  CHECK(io_thread.StartWithOptions(
      base::Thread::Options(base::MessagePumpType::IO, 0)));
  io_thread.task_runner()->PostTask(
      FROM_HERE, base::BindOnce(&Churl::Fetch, base::Unretained(&churl)));
  base::PlatformThread::Sleep(base::Seconds(5));
}

Amazingly, the request completes!

[0605/193023.588918:ERROR:churl_bin.cc(68)] OnConnected
[0605/193025.549050:ERROR:churl_bin.cc(144)] OnResponseStarted
[0605/193025.549318:ERROR:churl_bin.cc(154)] Read completed immediately
[0605/193025.549342:ERROR:churl_bin.cc(170)] OnReadCompleted with 281 bytes_read
[0605/193025.549373:ERROR:churl_bin.cc(172)] GOT: {
  "args": {}, 
  "headers": {
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "Dustin's Experiment", 
    "X-Amzn-Trace-Id": "Root=1-647e37cf-251697116275c21857a3f844"
  }, 
  "origin": "34.86.234.200", 
  "url": "http://httpbin.org/get"
}

So that's pretty cool! 🥳

A Bad Ending

Unfortunately, once that 5-second sleep expires:

[0605/193351.314239:FATAL:thread_checker.cc(21)] Check failed: checker.CalledOnValidThread(&bound_at). 
#0 0x7fa54a87ca8c base::debug::CollectStackTrace()
#1 0x7fa54a8332da base::debug::StackTrace::StackTrace()
#2 0x7fa54a833295 base::debug::StackTrace::StackTrace()
#3 0x7fa54a5575f9 logging::LogMessage::~LogMessage()
#4 0x7fa54a502bac logging::(anonymous namespace)::DCheckLogMessage::~DCheckLogMessage()
#5 0x7fa54a502bd9 logging::(anonymous namespace)::DCheckLogMessage::~DCheckLogMessage()
#6 0x7fa54a5028bd logging::CheckError::~CheckError()
#7 0x7fa54a7b9029 base::ScopedValidateThreadChecker::ScopedValidateThreadChecker()
#8 0x7fa54b89bdeb net::URLRequest::~URLRequest()
#9 0x7fa54b89c229 net::URLRequest::~URLRequest()
#10 0x55723b34422c std::__Cr::default_delete<>::operator()()
#11 0x55723b34414a std::__Cr::unique_ptr<>::reset()
#12 0x55723b343759 std::__Cr::unique_ptr<>::~unique_ptr()
#13 0x55723b3415aa Churl::~Churl()
#14 0x55723b340ccd main

So main is returning, at which point it destroys the Churl instance, which is destroying the URLRequest instance, which was created on the IO thread. Chromium generally expects objects to be used (and destroyed) in a single thread, and that's what the thread-checker is checking.
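The idea behind that check can be sketched with nothing but the standard library. This is not Chromium's ThreadChecker, just a toy with the same shape: ThreadAffinity and Request are hypothetical names, and instead of crashing it returns a bool so the outcome is easy to observe.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Remembers the thread it was created on.
class ThreadAffinity {
 public:
  ThreadAffinity() : owner_(std::this_thread::get_id()) {}
  bool CalledOnValidThread() const {
    return std::this_thread::get_id() == owner_;
  }

 private:
  std::thread::id owner_;
};

// Stand-in for an object like URLRequest that expects to live on one thread.
class Request {
 public:
  bool SafeToDestroyHere() const { return affinity_.CalledOnValidThread(); }

 private:
  ThreadAffinity affinity_;
};

// Creates a Request on a worker thread, then asks from the calling thread
// whether it would be safe to destroy it here. That mirrors main()
// destroying an object that was created on the IO thread.
bool CreatedElsewhereIsSafeHere() {
  std::unique_ptr<Request> request;
  std::thread worker([&] { request = std::make_unique<Request>(); });
  worker.join();
  assert(request);
  return request->SafeToDestroyHere();  // false: wrong thread
}
```

The real checker does this comparison in the destructor and turns a mismatch into the fatal DCHECK shown above.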

I suspect that one way to fix this would be to refactor things so that the Churl instance is destroyed on the IO thread. But that actually doesn't sound like a very interesting challenge -- it's probably easiest in this experiment to just call exit(0) when the fetch is done (in OnReadCompleted).

By the way, I wondered if I could drop the thread pool now that everything's in the IO thread. But, no, at least one thing involved in fetching a URL requires a thread pool - net::RunHaveOnlyLoopbackAddressesJob. So, we'll need both.

What's Next?

So, we've successfully fetched a URL (and learned some stuff along the way). If you'd like to see the code, it's in this CL. It's not pretty, and please don't emulate it, but maybe it helps clarify something in one of these posts that wasn't clear.

Next up, I'd like to start learning how proxies are used in this process. That will involve a new delegate, at least. Once I've got a basic HTTP proxy working, I'll want to experiment with various HTTP versions inside and outside of the proxy (HTTP/2 over HTTP/3, etc.).

Chromium Spelunking: A Stuck Task

djmitche — Mon, 05 Jun 2023 18:03:53 +0000

The last installment was a bit of a detour from our task -- fetching a URL. But, it was a deliberate one, to learn about things I didn't understand well enough. Before that, I had gotten churl up to the point where it crashed because its delegate was NULL. So, let's give it a delegate!

Delegates

Delegates are a common pattern in Chromium, forming a way for lower-level code to interact with higher-level code. It's a kind of dependency inversion. For example, an embedder like Chrome or Fuchsia provides a delegate (several, in fact) to the content layer, which that content layer uses to call back for embedder-related features like autofill.
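Stripped of everything Chromium-specific, the pattern looks roughly like this. All of the names (FetcherDelegate, Fetcher, RecordingDelegate) are hypothetical; the point is only that the lower-level class defines the interface and the higher-level code implements it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The lower-level code defines the interface it wants to call back into.
class FetcherDelegate {
 public:
  virtual ~FetcherDelegate() = default;
  // Optional hook with a default implementation.
  virtual void OnConnected() {}
  // Required: every delegate must handle data.
  virtual void OnData(const std::string& chunk) = 0;
};

// The lower-level code knows only the interface, not who implements it.
class Fetcher {
 public:
  explicit Fetcher(FetcherDelegate* delegate) : delegate_(delegate) {
    assert(delegate_);
  }
  void Run() {
    delegate_->OnConnected();
    delegate_->OnData("hello");
  }

 private:
  FetcherDelegate* delegate_;  // not owned; must outlive the Fetcher
};

// The higher-level code supplies the behavior.
class RecordingDelegate : public FetcherDelegate {
 public:
  void OnData(const std::string& chunk) override { chunks.push_back(chunk); }
  std::vector<std::string> chunks;
};
```

Usage is just `RecordingDelegate d; Fetcher f(&d); f.Run();`, after which d.chunks holds whatever the fetcher reported.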

In this case, we're providing a delegate to the URLRequest, a subclass of net::URLRequest::Delegate. Happily, most of the methods have default implementations, and in fact the only required method is OnReadCompleted. My delegate overrides that method as well as OnConnected and OnResponseStarted, just to log when those events occur. In fact, I copied from TestDelegate, including the buf_ stuff and the Read calls.

So, at this point I have a delegate:

class ChurlDelegate : public net::URLRequest::Delegate {
 public:
  ChurlDelegate()
    : buf_(base::MakeRefCounted<net::IOBuffer>(kBufferSize)) {}
  ~ChurlDelegate() override {}
  int OnConnected(net::URLRequest* request,
                  const net::TransportInfo& info,
                  net::CompletionOnceCallback callback) override {
    DLOG(ERROR) << "OnConnected";
    return net::URLRequest::Delegate::OnConnected(request, info, std::move(callback));
  }

  void OnResponseStarted(net::URLRequest* request, int net_error) override {
    DLOG(ERROR) << "OnResponseStarted";
    if (net_error != net::OK) {
      DLOG(ERROR) << "Error: " << net_error;
      request->Cancel();
      return;                                                                                                                                                                                                      
    }
    // TODO: print response and headers?

    int bytes_read = request->Read(buf_.get(), kBufferSize);
    if (bytes_read >= 0) {
      DLOG(ERROR) << "Read completed immediately";
      OnReadCompleted(request, bytes_read);
    } else if (bytes_read != net::ERR_IO_PENDING) {
      DLOG(ERROR) << "Error from READ: " << bytes_read;
      request->Cancel();
      return;
    }
  }

  void OnReadCompleted(net::URLRequest* request, int bytes_read) override {
    DLOG(ERROR) << "OnReadCompleted with " << bytes_read << " bytes_read";
    do {
      DLOG(ERROR) << "GOT: " << buf_->data();                                                                                                                                                                      
      bytes_read = request->Read(buf_.get(), kBufferSize);
    } while (bytes_read > 0);
  }
 private:
  scoped_refptr<net::IOBuffer> buf_;
};

and the Churl class:

class Churl {
 public:
  void Fetch() {
    net::URLRequestContextBuilder url_request_context_builder;
    url_request_context_builder.set_user_agent("Dustin's Experiment");
    url_request_context_builder.set_proxy_config_service(
      std::make_unique<ProxyConfigServiceDirect>());

    auto url_request_context = url_request_context_builder.Build();

    auto url_request = url_request_context->CreateRequest(
        GURL("http://httpbin.org/get"), // TODO: get this from command line
        net::RequestPriority::HIGHEST,
        &delegate_,                                                                                                                                                                                                
        net::NetworkTrafficAnnotationTag(net::MutableNetworkTrafficAnnotationTag(TRAFFIC_ANNOTATION_FOR_TESTS)),
        false);
    DLOG(ERROR) << "calling start";
    url_request->Start();
    DLOG(ERROR) << "started";
  }

 private:
  std::unique_ptr<net::URLRequestContext> url_request_context_;
  ChurlDelegate delegate_;
};

and all of this created in main with

  Churl churl;
  base::ThreadPool::CreateSingleThreadTaskRunner({})->PostTask(
      FROM_HERE, base::BindOnce(&Churl::Fetch, base::Unretained(&churl)));

Note that the delegate is passed to CreateRequest as a raw pointer. It's up to me to make sure that the delegate continues to exist for the duration of that request, and it will crash otherwise. Ask me how I know!

I put it into an instance variable in Churl, since that Churl instance lives until the end of main. The receiver for the Fetch method needs to be included in the task callback, and that's done with base::Unretained to indicate that the callback does not "own" the pointer.

An Easy Fix

So, the first run fails with this:

[0525/191840.310712:FATAL:command_line.cc(247)] Check failed: current_process_commandline_.

A little exploring of other "utility" tools like root_store_tool suggests

  base::CommandLine::Init(argc, argv);

And indeed, this gets things running -- sort of.

[0530/190907.523753:ERROR:churl_bin.cc(190)] URLRequestContext created: 0x7f4874003780
[0530/190907.524119:ERROR:churl_bin.cc(198)] calling start    
[0530/190907.526291:ERROR:churl_bin.cc(200)] started

That's it -- no logging from OnConnected. Maybe not so easy?

Finding a Stuck Task

I am sort of old-fashioned, and I'd rather debug things by running them and watching the output, than using fancy tools like lldb. My usual tactic is to add lots of debug logging (in Chromium, DLOG(ERROR) << "got here 1") to narrow down where a problem occurs. In this case, Start() started things off, but got "stuck" somewhere.

Looking at URLRequest::Start, it creates a new job with a job factory, and I can verify that it does, indeed call StartJob(..) with this new job by adding a debug print after that line.

That job is a subclass of URLRequestJob, but which subclass? Codesearch gives a list of subclasses, and URLRequestHttpJob seems a likely suspect. Adding some debug prints there confirms this guess. URLRequestJob then calls job->Start().

URLRequestHttpJob does its work in a series of callback tasks. Each possible bit of work is a separate method, and any conditionals or loops are accomplished "manually" by conditionally calling or posting a task for a different method. It's almost like assembly language, tracing out all of the conditional jumps and control-flow cycles. In this case, adding debug prints to the beginning of each method quickly shows that StartTransactionInternal is the last method called.

`URLRequestHttpJob::Start`

The StartTransactionInternal method seems to set up a transaction and call its Start method with a callback bound to URLRequestHttpJob::OnStartCompleted. If the Start method finishes immediately, it calls OnStartCompleted directly. This answers my question from the first post: when a function takes a callback, but does not return ERR_IO_PENDING, it also does not invoke the callback and expects its caller to do so. This seems like it would be a source of bugs, if one of those two control-flow paths is rarely taken.
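To spell out that convention, here's a toy version using only the standard library. kOk and kIoPending are stand-ins for net::OK and net::ERR_IO_PENDING, and StartWork, StartAndHandle and the global queue are hypothetical names; none of this is Chromium code.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

constexpr int kOk = 0;
constexpr int kIoPending = -1;  // stand-in for net::ERR_IO_PENDING

using Callback = std::function<void(int)>;

std::queue<std::function<void()>> g_pending;  // toy task queue

// If the work finishes immediately, return the result and do NOT run the
// callback. Otherwise return kIoPending and run the callback later.
int StartWork(bool finishes_immediately, Callback callback) {
  if (finishes_immediately) return kOk;
  g_pending.push([cb = std::move(callback)] { cb(kOk); });
  return kIoPending;
}

// The caller has to handle both paths itself.
void StartAndHandle(bool finishes_immediately, int* completions) {
  Callback on_done = [completions](int rv) {
    assert(rv == kOk);
    ++*completions;
  };
  int rv = StartWork(finishes_immediately, on_done);
  if (rv != kIoPending) on_done(rv);  // synchronous path: call it ourselves
}

// Runs the deferred callbacks, like a task runner would.
void RunPending() {
  while (!g_pending.empty()) {
    auto task = std::move(g_pending.front());
    g_pending.pop();
    task();
  }
}
```

Forgetting the `if (rv != kIoPending)` line is the bug I was worried about: everything works until the day the operation completes synchronously.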

Anyway, OnStartCompleted is never called, and one more debug print shows that Start is returning ERR_IO_PENDING so Start should be calling it.

I briefly wondered if this was another problem with threads and tasks, where the process was exiting before the task had a chance to run. Maybe FlushForTesting sees no tasks running while some kind of IO is pending, and returns? I added a sleep in the main task (base::PlatformThread::Sleep(base::Seconds(5))) and OnStartCompleted still did not run, so it's nothing quite that simple.

So let's dive one layer deeper. URLRequestHttpJob is using a factory to make an instance of an interface, in this case an HttpTransactionFactory creating an HttpTransaction. There's no easy way to print the class name of an object, but codesearch can (usually) list the subclasses of an interface, so a bit of guess-and-check works. My first guess was an HttpNetworkTransaction, but a debug print in its Start method did not appear. Then I remembered that the HttpCache is a "read-through" cache, and a debug print confirms that the HttpTransaction is an HttpCache::Transaction.

`HttpTransaction::Start`

The HttpCache::Transaction class uses the DoLoop pattern. I used a quick vim macro to print the states as they occurred:

[0525/195452.887052:ERROR:http_cache_transaction.cc(861)] STATE_GET_BACKEND
[0525/195452.887096:ERROR:http_cache_transaction.cc(866)] STATE_GET_BACKEND_COMPLETE
[0525/195452.887189:ERROR:http_cache_transaction.cc(870)] STATE_INIT_ENTRY
[0525/195452.887230:ERROR:http_cache_transaction.cc(875)] STATE_OPEN_OR_CREATE_ENTRY
[0525/195452.887406:ERROR:http_cache_transaction.cc(880)] STATE_OPEN_OR_CREATE_ENTRY_COMPLETE
[0525/195452.887450:ERROR:http_cache_transaction.cc(902)] STATE_ADD_TO_ENTRY

So, it's stuck after DoAddToEntry. More debug prints there show that it's returning the rv from cache_->AddTransactionToEntry. cache_ is an HttpCache, so off we go!
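For anyone who hasn't seen the DoLoop pattern, here is a minimal sketch of its shape. It borrows a few state names from the log above, but the Transaction class, its four states and the constants are a simplified stand-in, not the real HttpCache::Transaction.

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr int kOk = 0;
constexpr int kIoPending = -1;  // stand-in for net::ERR_IO_PENDING

class Transaction {
 public:
  enum State {
    STATE_NONE,
    STATE_GET_BACKEND,
    STATE_GET_BACKEND_COMPLETE,
    STATE_ADD_TO_ENTRY,
    STATE_ADD_TO_ENTRY_COMPLETE,
  };

  int Start() {
    next_state_ = STATE_GET_BACKEND;
    return DoLoop(kOk);
  }

  // Called when a pending operation finishes; resumes the loop.
  int OnIOComplete(int result) { return DoLoop(result); }

  std::vector<std::string> log;  // states visited, in order

 private:
  int DoLoop(int result) {
    int rv = result;
    do {
      State state = next_state_;
      next_state_ = STATE_NONE;
      switch (state) {
        case STATE_GET_BACKEND:
          log.push_back("STATE_GET_BACKEND");
          next_state_ = STATE_GET_BACKEND_COMPLETE;
          rv = kOk;
          break;
        case STATE_GET_BACKEND_COMPLETE:
          log.push_back("STATE_GET_BACKEND_COMPLETE");
          next_state_ = STATE_ADD_TO_ENTRY;
          rv = kOk;
          break;
        case STATE_ADD_TO_ENTRY:
          log.push_back("STATE_ADD_TO_ENTRY");
          // Asynchronous step: remember where to resume, then bail out.
          next_state_ = STATE_ADD_TO_ENTRY_COMPLETE;
          rv = kIoPending;
          break;
        case STATE_ADD_TO_ENTRY_COMPLETE:
          log.push_back("STATE_ADD_TO_ENTRY_COMPLETE");
          break;  // rv carries the result passed to OnIOComplete
        case STATE_NONE:
          assert(false && "DoLoop entered with no next state");
          break;
      }
    } while (rv != kIoPending && next_state_ != STATE_NONE);
    return rv;
  }

  State next_state_ = STATE_NONE;
};
```

In this sketch, Start() returns kIoPending with STATE_ADD_TO_ENTRY as the last logged state, and nothing else happens until somebody calls OnIOComplete. That is the situation in the log above: the loop is fine, but the call that should resume it never arrives.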

`HttpCache::AddTransactionToEntry`

  // Adds a transaction to an ActiveEntry. This method returns ERR_IO_PENDING
  // and the transaction will be notified about completion via its IO callback.
  // In a failure case, the callback will be invoked with ERR_CACHE_RACE.
  int AddTransactionToEntry(ActiveEntry* entry, Transaction* transaction);

So, what's the transaction's "IO callback" in this case? It has an io_callback_ property, and elsewhere in http_cache.cc I see transaction->io_callback().Run(OK) so at a guess that's what it's calling. But, what is that set to at the time? The HttpCache::Transaction constructor sets this to a binding for the OnIOComplete method, and it seems it's only set to other things in order to support testing. And that just calls DoLoop. And that would log. So, what's going on in the cache that causes it not to call the IO callback?

Adding some more debug prints to HttpCache, it seems that ProcessQueuedTransactions is executing, and posting a task, but HttpCache::OnProcessQueuedTransactions never executes.

I got stuck here for a bit, and decided I'd put in enough work to ask for help. Pairing with a colleague, we pored over the code, looking for what I might have missed. Nothing jumped out as obviously wrong.

As an experiment, we tried posting another task that just contained a lambda expression logging a message. That task ran! So, this isn't an issue of the thread pool not running tasks.

  base::SingleThreadTaskRunner::GetCurrentDefault()->PostTask(
      FROM_HERE,
      base::BindOnce(&HttpCache::OnProcessQueuedTransactions, GetWeakPtr(),
                     base::UnsafeDanglingUntriaged(entry)));

The GetWeakPtr() in the PostTask invocation is curious, though -- what happens if that pointer becomes invalid before the task runs? Replacing it with base::Unretained(this) "fixes" the issue: the callback gets called. So it seems like the task machinery detects that a weak pointer has been invalidated and drops the task depending on it -- a pretty smart way to avoid use-after-free errors! That means that the HttpCache is getting destroyed before this callback can begin.
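The same behavior can be imitated with std::weak_ptr. Note the assumption: base::WeakPtr is not tied to shared ownership the way std::weak_ptr is, so this only mirrors the "drop the task if the target is gone" part. Cache, PostWeak and RunAll are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

std::queue<std::function<void()>> g_tasks;  // toy task queue

class Cache {
 public:
  explicit Cache(int* calls) : calls_(calls) { assert(calls_); }
  void OnProcessQueued() { ++*calls_; }

 private:
  int* calls_;
};

// Post a task bound to a weak reference. If the target has been destroyed
// by the time the task runs, the task silently does nothing.
void PostWeak(std::weak_ptr<Cache> weak) {
  g_tasks.push([weak] {
    if (auto cache = weak.lock()) cache->OnProcessQueued();
  });
}

void RunAll() {
  while (!g_tasks.empty()) {
    auto task = std::move(g_tasks.front());
    g_tasks.pop();
    task();
  }
}
```

Post a task, destroy the Cache, then call RunAll(): the task still runs, but the method is never called and nothing is logged. That silence is what made this one hard to spot.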

`HttpCache` Lifetime

So, let's see if we can figure out what's creating that and how long it lasts. It looks like HttpCache is a subclass of HttpTransactionFactory, and the URLRequestContext references it as http_transaction_factory_. That's set by URLRequestContextBuilder.

Ah, the URLRequestContext is in a local variable so it is being destroyed while the URLRequest is still in progress. So that's no good. Rearranging things to keep the context as an instance variable of the Churl class fixes the issue.

Which is not to say it's working, but that's a job for the next post in the series:

[0605/175824.724186:ERROR:churl_bin.cc(190)] URLRequestContext created: 0x7fb224002df0
[0605/175824.725417:ERROR:churl_bin.cc(198)] calling start     
[0605/175824.730763:ERROR:churl_bin.cc(200)] started     
[0605/175824.734067:FATAL:current_thread.cc(197)] Check failed: sequence_manager. 
#0 0x7fb23107ca8c base::debug::CollectStackTrace()
#1 0x7fb2310332da base::debug::StackTrace::StackTrace()                                                
#2 0x7fb231033295 base::debug::StackTrace::StackTrace()       
#3 0x7fb230d575f9 logging::LogMessage::~LogMessage()           
#4 0x7fb230d02bac logging::(anonymous namespace)::DCheckLogMessage::~DCheckLogMessage()
#5 0x7fb230d02bd9 logging::(anonymous namespace)::DCheckLogMessage::~DCheckLogMessage()
#6 0x7fb230d028bd logging::CheckError::~CheckError() 
#7 0x7fb230ed6cc2 base::CurrentIOThread::Get()     
#8 0x7fb232194f2d net::SocketPosix::Connect()                                                            
#9 0x7fb232199906 net::TCPSocketPosix::Connect()

[update: it turns out it's also necessary to keep the URLRequest in an instance variable]

Chromium Spelunking: Threads and Tasks

djmitche — Thu, 18 May 2023 00:10:02 +0000

In the last post, I "solved" a problem with incorrect usage of task runners, but I still don't feel like I understand how these things work. I happen to have this simple little binary with few moving pieces, so this seemed like a good time to try to learn a bit more.

The Question

I have the following code, which is working fine (meaning it crashes because there's no delegate):

base::ThreadPoolInstance::CreateAndStartWithDefaultParams("churl");
auto task_runner = base::ThreadPool::CreateSingleThreadTaskRunner({base::TaskPriority::USER_VISIBLE});
task_runner->PostTask(FROM_HERE, base::BindOnce(Task));

The "Threading and Tasks" document says

A task that can run on any thread and doesn’t have ordering or mutual exclusion requirements with other tasks should be posted using one of the base::ThreadPool::PostTask*() functions.

My Task() is such a task, so I tried changing the last line to

base::ThreadPool::PostTask(FROM_HERE, base::BindOnce(Task));

This fails. When run directly, it fails with a SIGSEGV from a NULL pointer, but when run under gdb it fails with the familiar Check failed: has_sequenced_context || !post_task_success. I don't know why this doesn't appear when run directly -- maybe the SIGSEGV occurs while trying to print the message? But I'll let that mystery remain.

Deep Background

So, I've misunderstood something here. Perhaps it's time to just start gathering information in hopes I can piece together the big picture. I did this in the first post in the series, but this time I won't include all of my findings inline.

However, my process may be interesting. I began with another re-read of "Threading and Tasks", this time jotting down a few notes for each section but more importantly keeping a running list of questions. As I came across an answer to a question, I'd include that in the notes and cross off the question. I find that this helps me to focus my thinking a little bit: when I have a question, I can write it down and set it aside; and when I find new information, I can compare it to the list of questions in case it answers one or two.

It also means that, after working for a while, any remaining questions on the list are good questions to ask someone with more expertise.

Here's a partial list of questions I jotted down:

Why does base::ThreadPool::PostTask fail?
What is a "Sequence" and how is it different from a "Virtual Thread"?
Can there be multiple task queues per thread?
How are tasks from multiple queues in a thread handled?
When a function takes a callback or delegates, what runner/queue does it use to call those?
How do the PostTask..AndReply methods work?
If there is one SequenceManager per thread, how do thread pools work? Are there multiple SequenceManagers all competing for tasks from the same TaskQueue?
How do the "current defaults" work for SequencedTaskRunner and SingleThreadTaskRunner?

Once I finished reading the document, I started looking at the code itself. I've learned that, in general, this is the slowest way to learn things about Chromium: most source files are sparsely commented and not ordered in a way that helps a newcomer to understand them. I suspect IDEs make this worse, especially in .cc files, as they happily index the functions in a file regardless of the order they appear. But there are usually a few useful tidbits of information to be found. I used code search to find the declarations of various classes I'd read about, spending more time on those that had more to offer.

In the end, I had a few questions left, and I took those to Gabriel Charette (gab@) [1]. His answers prompted some more reading and more questions (and a link to some non-public resources), but helped me make progress quickly! Asking the right person the right questions is an effective way to navigate a low-information environment, but it is difficult to do well. I'm working on it!

Answers

Why does base::ThreadPool::PostTask fail?

I'll answer this at the end.

What is a "Sequence" and how is it different from a "Virtual Thread"?

They are different names for the same thing. A virtual thread is conceptually a sequence of tasks executed one after the other, with the effects of one task being visible to the next. Tasks in a sequence may execute on separate threads. A Sequence implements this, by managing a TaskQueue (and a heap of delayed tasks). More on this below.

Can there be multiple task queues per thread?

Yes. Leading to the followup question:

How are tasks from multiple queues in a thread handled?

SequenceManager handles selecting tasks from those multiple TaskQueues using a TaskQueueSelector.

SequenceManager's doc comment suggests that it multiplexes tasks "into a single backing sequence", but this does not appear to be the case. Instead, ThreadControllers call SequenceManager::SelectNextTask (an override of a method from SequencedTaskSource). That returns the best task to run next, and that task gets run.

When a function takes a callback or delegates, what runner/queue does it use to call those?

This question was sort of off-topic. The answer is usually written somewhere in the docs for the class, and is usually one of the "named threads" like the UI thread or IO thread.

How do the PostTask..AndReply methods work?

PostTask - just run a task (on the task runner associated with the method receiver).
PostDelayedTask - similar to PostTask, but running the task after a delay. This adds a lot of complexity to the implementation, but is pretty simple to use.
PostTaskAndReply - post a task in the runner associated with the method receiver, and when that completes post a second task in the sequence that called PostTaskAndReply. This provides a way to do work on a background thread and then resume the current sequence when it is done.
PostTaskAndReplyWithResult - like PostTaskAndReply, but passes the return value of the first task callback as an argument to the second callback. This provides an even more RPC-like call out to a background thread.
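The reply mechanics are easier to see in a toy model. ToyRunner is a hypothetical single queue standing in for a task runner, and unlike the real API, which finds the calling sequence on its own, this sketch takes the origin runner as an explicit argument.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a task runner: a queue that is drained by hand.
class ToyRunner {
 public:
  void PostTask(std::function<void()> task) { tasks_.push(std::move(task)); }

  // Runs everything queued so far and returns how many tasks ran.
  int RunUntilIdle() {
    int ran = 0;
    while (!tasks_.empty()) {
      auto task = std::move(tasks_.front());
      tasks_.pop();
      task();
      ++ran;
    }
    return ran;
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Run `task` on `target`; when it finishes, post `reply` with its result
// back to `origin`.
void PostTaskAndReplyWithResult(ToyRunner& target, ToyRunner& origin,
                                std::function<int()> task,
                                std::function<void(int)> reply) {
  assert(task && reply);
  target.PostTask(
      [&origin, task = std::move(task), reply = std::move(reply)] {
        int result = task();
        origin.PostTask([reply, result] { reply(result); });
      });
}
```

The reply never runs on the target runner; it is just another task posted to the origin, which is why the calling sequence's ordering guarantees still hold for it.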

All of these imply some kind of TaskRunner (plain, sequenced, or single-threaded). Task runners are just the gateway to send tasks into the scheduler, and the kind of task runner dictates the execution mode for the task.

If there is one SequenceManager per thread, how do thread pools work? Are there multiple SequenceManagers all competing for tasks from the same TaskQueue?

There's quite a bit of complicated work done to support both prioritizing lots of tasks in a single (named) thread and distributing work evenly in thread pools. SequenceManager is only used for named threads.

One of the key abstractions that bridges both named threads and thread pools is Sequence. This contains a queue of tasks that must be executed sequentially.

In a single thread, lots of Sequences can exist at the same time, and SequenceManager takes care of treating them all fairly.

In a thread pool, Sequences are kept in a priority queue shared among all the workers, and popped from that queue one by one. Popping sequences, rather than tasks, from the queue supports the robust guarantee that the tasks in a sequence are executed sequentially. Still, a worker thread only executes one task from a sequence, re-queuing the sequence if it's not empty.
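Here's how I picture one step of a pool worker, as a single-threaded simulation. The Sequence struct, the integer priority and RunOneTask are my own simplifications, not the real scheduler types, and a real pool would guard the shared queue with a lock.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// A sequence: tasks that must run one after another, in order.
struct Sequence {
  int priority = 0;  // higher runs first (an assumption of this sketch)
  std::deque<std::function<void()>> tasks;
};

struct ByPriority {
  bool operator()(const std::shared_ptr<Sequence>& a,
                  const std::shared_ptr<Sequence>& b) const {
    return a->priority < b->priority;
  }
};

using SequenceQueue =
    std::priority_queue<std::shared_ptr<Sequence>,
                        std::vector<std::shared_ptr<Sequence>>, ByPriority>;

// One step of a worker: pop the best sequence, run exactly one task from it,
// and re-queue the sequence if it still has work. While a sequence is popped,
// no other worker can see it, which keeps its tasks strictly sequential.
bool RunOneTask(SequenceQueue& queue) {
  if (queue.empty()) return false;
  std::shared_ptr<Sequence> seq = queue.top();
  queue.pop();
  assert(!seq->tasks.empty());
  auto task = std::move(seq->tasks.front());
  seq->tasks.pop_front();
  task();
  if (!seq->tasks.empty()) queue.push(seq);
  return true;
}
```

Because the sequence goes back into the shared queue after each task, its next task may well be picked up by a different worker thread, which matches the earlier point that tasks in a sequence may execute on separate threads.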

How do the "current defaults" work for SequencedTaskRunner and SingleThreadTaskRunner?

In a named thread, these are fixed and are just defaults for the SequenceManager. They can all be the same object, since the named thread is a single thread and thus implicitly runs things sequentially.

In a pool, the entire "environment" in which a task runs is set up and torn down by TaskTracker::RunTask, and this includes the CurrentDefaultHandle for both SequencedTaskRunner and SingleThreadTaskRunner.

Which of these is set depends on the "execution mode" of the current task. So, if the current task was run via a SequencedTaskRunner, then its execution mode is kSequenced, and only the SequencedTaskRunner::CurrentDefaultHandle will be set during execution of the task.

This makes sense when you consider that a task may be invoking functions down a dependency chain. In churl, we're invoking URLRequest functions which call into more specific //net libraries. All of those may need to schedule callbacks in the "current" sequence, and it's possible that one of the deeply buried dependencies needs to be single-threaded or (in the case that began this whole journey!) sequenced. Using the different CurrentDefaultHandles at least makes such a thing crash immediately if the requirement isn't met, rather than subtly introducing race conditions. It'd be nice if that could be caught at compile time [2]!
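A toy model of that set-up and tear-down, in plain C++, might look like the following. Everything here is hypothetical (`ScopedFlag`, `RunTask`, `DeepDependency`, the two flags), and it assumes that a single-threaded task also counts as sequenced; the real check crashes rather than returning a string:

```cpp
#include <functional>
#include <string>

// Execution modes named after the ones mentioned above; local to this sketch.
enum class ExecutionMode { kParallel, kSequenced, kSingleThread };

// Hypothetical per-thread "current default" slots, reduced to booleans.
thread_local bool g_has_sequenced_default = false;
thread_local bool g_has_single_thread_default = false;

// RAII helper: sets a flag for its lifetime, then restores the old value.
class ScopedFlag {
 public:
  ScopedFlag(bool& flag, bool value) : flag_(flag), previous_(flag) {
    flag_ = value;
  }
  ~ScopedFlag() { flag_ = previous_; }
  ScopedFlag(const ScopedFlag&) = delete;
  ScopedFlag& operator=(const ScopedFlag&) = delete;

 private:
  bool& flag_;
  bool previous_;
};

// Sets up the environment for one task, runs it, and tears it down again.
// Assumption: kSingleThread implies a sequenced context as well.
void RunTask(ExecutionMode mode, const std::function<void()>& task) {
  ScopedFlag sequenced(g_has_sequenced_default,
                       mode != ExecutionMode::kParallel);
  ScopedFlag single_thread(g_has_single_thread_default,
                           mode == ExecutionMode::kSingleThread);
  task();
}

// Code buried in a dependency chain that needs a single-threaded context.
std::string DeepDependency() {
  return g_has_single_thread_default
             ? "ok"
             : "CHECK failed: needs single-threaded context";
}

// Runs DeepDependency inside a task with the given execution mode.
std::string RunDependencyIn(ExecutionMode mode) {
  std::string outcome;
  RunTask(mode, [&outcome] { outcome = DeepDependency(); });
  return outcome;
}
```

In this model, `RunDependencyIn(ExecutionMode::kSequenced)` reports the failed check immediately, while `RunDependencyIn(ExecutionMode::kSingleThread)` returns "ok".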

Why does base::ThreadPool::PostTask fail? for real this time

Briefly: my top-level Task includes its dependencies, so it doesn't match the description "doesn't have ordering or mutual exclusion requirements" in the text quoted at the top of this post.

The base::ThreadPool::PostTask API is a shortcut to base::ThreadPool::CreateTaskRunner(...)->PostTask. Which is to say, it runs the task in the thread pool's TaskRunner. A TaskRunner makes no guarantees about ordering, meaning that there is no "current" SequencedTaskRunner when the task is running.

The base::ThreadPool::CreateSingleThreadTaskRunner({}) method gets a new SingleThreadTaskRunner (actually, it may be a singleton, but that's not important) that knows how to schedule work into the thread pool. Posting the task on that runs the task in the thread pool, in a single-threaded context.

Conclusion

I feel like I understand the situation quite a bit better now:

I can avoid making mistakes that would lead to concurrency errors.
I can understand and debug task-related errors like the one at the top of this post.
I can reason about the behavior of code that uses tasks.

There's a lot more detail about jobs, efficiency (both in terms of CPU time and power consumption), responsiveness, fairness, object ownership, and so on that was interesting to learn about, but ultimately doesn't serve these goals, so I've omitted it here.

I'll be making a CL to update and expand the documentation somewhat, in hopes of making this more efficient for the next person. Leaving things better than you found them is a good habit to cultivate!

Next time, we'll get back to the task at hand: loading a URL. I promise.

[1] All of the errors in this post are mine, not Gabriel's! All of them. All mine.

[2] The distinction between "Sequenced" and "SingleThread" is basically what Rust's Send bound represents. Rust's async support is conceptually similar to task scheduling, in that an executor schedules polls of various futures. Some executors poll all futures in a single thread, meaning that the futures do not need to be Send. But more "advanced" executors use a pool of worker threads and work-stealing to allow futures to be polled from multiple threads (so the futures must be Send), but guarantee that it won't happen simultaneously (so the futures need not be Sync). I don't know of an executor that can handle both Send and non-Send futures.

Chromium Spelunking: Creating a Request

djmitche — Thu, 11 May 2023 16:52:41 +0000

In the last post, I built an executable which created a URLRequestContext and then exited. Not very exciting, but it was a bit of work! In this post, I'll be using that instance to create an actual URLRequest.

`URLRequestContext::CreateRequest`

A skim of url_request_context.h shows a helpful-looking function, CreateRequest.

This takes a few straightforward arguments plus a NetworkTrafficAnnotationTag, and the comments focus on that type. Unfortunately, network_traffic_annotation.h doesn't contain a lot of comments that might help me understand what this is and what it does. Running codehistory on the file doesn't show much, either. From what I can guess, this is some kind of annotation that can be found in the source code and reviewed or audited. Anyway, in one of the commits that modified this file I find net::MutableNetworkTrafficAnnotationTag(TRAFFIC_ANNOTATION_FOR_TESTS). That didn't work immediately, but I see that the MutableNetworkTrafficAnnotationTag has an explicit cast operator to NetworkTrafficAnnotationTag, which does.

The types work out, and the compile succeeds, but it seems I've still not squared away the task runners:

[0509/212941.923727:FATAL:url_request.cc(597)] Check failed: base::SingleThreadTaskRunner::HasCurrentDefault().

I thought I had handled this in the last post, so this was a bit disappointing. One of the things about Chromium that I'm adjusting to is that a lot of the information is stored in people, and not as data. So, I asked a few people via chat, and their responses improved my understanding of things a little. One suggestion was to look at other one-off executables like net_watcher, or to use base::test::TaskEnvironment. But I wanted to learn what was going on, rather than just getting things done. With a little more digging, I determined that SingleThreadTaskRunner is a subclass of SequencedTaskRunner that is capable of running tasks on the same thread, if they have some reason to be tied together like that. This is contrary to some other suggestions that SingleThreadTaskRunner is only for testing!

Anyway, switching

-  auto thread_runner = base::ThreadPool::CreateSequencedTaskRunner(
+  auto thread_runner = base::ThreadPool::CreateSingleThreadTaskRunner(
       {base::TaskPriority::USER_VISIBLE});

fixed this issue.

Next Up

This was a pretty short post, mostly because I was only able to devote a few minutes at a time to this work. Next up, I will call the Start method on the URLRequest, which will require a URLRequest::Delegate. Once the header is read, I'll call Read to read the response body. The end is in sight!

Chromium Spelunking: Churl

djmitche — Mon, 08 May 2023 18:47:41 +0000

At the end of my last post I indicated I would try to build a simple executable to fetch a URL, similar to curl. churl sounds like a suitably silly name for it.

This post is going to be a little on the long side -- it's leaning into the "lightly edited lab notebook" format, in hopes of highlighting some of the places I got stuck.

An Executable

I want to focus on my goals here -- understanding the network stack, specifically proxies and QUIC -- so I don't want to get distracted with things like how to set up Ninja to create a new executable. So, I went looking for something that was near to what I wanted and which I could copy/paste and modify. I got lucky! There is a quic_client executable that seems quite close, really. However, it's specific to QUIC and I want to make something that begins at URLRequest. Still, this gives me an easy way to create a new executable in net/BUILD.gn and do a simple "Hello World":

int main(int argc, char* argv[]) {
  DLOG(ERROR) << "Hello, world.";
}

Getting to URLRequestContext

From the previous post, step one is going to be getting a URLRequestContext up and running, and that requires a builder.

ThreadPoolInstance

Naively trying to create such a thing gets me an error because I haven't initialized a thread pool. The network stack does seem to use a lot of the task-posting magic, so I suppose that's necessary! A bit of guessing based on base/test/task_environment.cc leads me to

base::ThreadPoolInstance::Set(std::make_unique<base::internal::ThreadPoolImpl>("my histogram"));

and on to the next error. I am hoping to only have to make two or three of these "wild guesses" before things start working. Beyond that point, when something doesn't work it will be almost impossible for me to figure out which of the n>3 things I do not understand might be malfunctioning.

SequencedTaskRunner

Next up, I need a SequencedTaskRunner to run tasks on that thread pool:

[0508/164407.189292:FATAL:post_task_and_reply_impl.cc(158)] Check failed: has_sequenced_context || !post_task_success.

Checking sequenced_task_runner.h shows a lot of docs about the type, but not how to construct it. The bottom of the doc comments mentions some "theoretical implementations", suggesting that this is an abstract base class, and in fact a few methods are = 0. So, where are the implementations? Ordinarily I'd use CodeSearch for this, but it seems that the 91 subclasses are more than its little brain can handle.

I continued on a little down this path, but I am learning that when I find myself reading the Chromium source, I'm looking in the wrong place. In fact, the nice Threading and Tasks document includes snippets that create task runners:

auto thread_runner = base::ThreadPool::CreateTaskRunner({base::TaskPriority::USER_VISIBLE});

The same error occurs, I think because while this creates a new task runner, it does not set it as the default. It looks like CurrentDefaultHandle is an RAII wrapper for setting a task runner as the default, so let's try

base::SequencedTaskRunner::CurrentDefaultHandle active_thread_pool(thread_runner);

This call itself fails:

[0508/171849.025900:FATAL:sequenced_task_runner.cc(99)] Check failed: task_runner_->RunsTasksInCurrentSequence().

The comment docs for RunsTasksInCurrentSequence suggest that it is meant to be called within a task, so maybe I need to run the remainder of churl in a task in this runner?

void Task() {
  net::URLRequestContextBuilder url_request_context_builder;
  url_request_context_builder.set_user_agent("Dustin's Experiment");

  auto url_request_context = url_request_context_builder.Build();

  DLOG(ERROR) << "Hello, world." << url_request_context;
}

int main(int argc, char* argv[]) {
  base::ThreadPoolInstance::Set(std::make_unique<base::internal::ThreadPoolImpl>("my histogram"));

  auto thread_runner = base::ThreadPool::CreateSequencedTaskRunner({base::TaskPriority::USER_VISIBLE});
  thread_runner->PostTask(FROM_HERE, base::BindOnce(Task));
}

Indeed, that compiles, but since main exits immediately after posting the task, it doesn't print "Hello, world."

Referring to the Threading and Tasks document again, I tried base::RunLoop().RunUntilIdle(), but this seems to require that it be called with the current loop set, and that seems a bit circular. Lower down I see base::RunLoop::Run, which will run until a QuitClosure is run. That sounds better, but ultimately has the same issue:

[0508/173608.004631:FATAL:single_thread_task_runner.cc(44)] Check failed: handle. Error: This caller requires a single-threaded context (i.e. the current task needs to run from a SingleThreadTaskRunner). If you're in a test refer to //docs/threading_and_tasks_testing.md.

Further down the document:

// To block until all tasks posted to thread pool are done running:
base::ThreadPoolInstance::Get()->FlushForTesting();

But that just hangs without actually running the task. Even further down the document, there's a section entitled "Using ThreadPool in a New Process", and it turns out that this is what I'm trying to do. It uses

base::ThreadPoolInstance::CreateAndStartWithDefaultParams("process_name");

instead of the initialization I was using. I don't really understand the difference, but this appears to make things work and gets the next error from net::URLRequestContextBuilder::Build:

[0508/174429.973862:FATAL:configured_proxy_resolution_service.cc(800)] Check failed: proxy_config_service.

Proxy Config Service / Proxy Resolution Service

Proxies! At least that's relevant to my interests.

Following the traceback for that failure:

#6 0x7f00db30f76f logging::CheckError::~CheckError()
#7 0x7f00dd28f8f7 net::ConfiguredProxyResolutionService::CreateUsingSystemProxyResolver()
#8 0x7f00dd63b1b7 net::URLRequestContextBuilder::CreateProxyResolutionService()
#9 0x7f00dd637485 net::URLRequestContextBuilder::Build()

it looks like Build is passing a NULL unique_ptr for proxy_config_service to CreateProxyResolutionService, and adding some debug prints confirms that. In fact, the debug prints confirm that BUILDFLAG(IS_LINUX) is not true, despite building this on a Linux system. I don't know a quick way to fix that, so hopefully I can work around it.

Commenting out the build conditionals in url_request_context_builder.cc gets a different error, I think to do with ProxyConfigService::CreateSystemProxyConfigService requiring a SingleThreadTaskRunner when we're using a SequencedTaskRunner, so maybe that's a dead-end.

Maybe I can provide a stubbed-out implementation of these types? It looks like ProxyConfigService can create a ProxyResolutionService, or I can just specify a ProxyResolutionService directly. Looking for subclasses of both abstract base classes, I found a local ProxyConfigServiceDirect which just always returns DIRECT. That looks useful, so I'll copy-paste it (and add the necessary namespaces):

// Config getter that always returns direct settings.
class ProxyConfigServiceDirect : public net::ProxyConfigService {
 public:
  // ProxyConfigService implementation:
  void AddObserver(Observer* observer) override {}
  void RemoveObserver(Observer* observer) override {}
  net::ProxyConfigService::ConfigAvailability GetLatestProxyConfig(
      net::ProxyConfigWithAnnotation* config) override {
    *config = net::ProxyConfigWithAnnotation::CreateDirect();
    return CONFIG_VALID;
  }
};

void Task() {
  net::URLRequestContextBuilder url_request_context_builder;
  url_request_context_builder.set_proxy_config_service(
    std::make_unique<ProxyConfigServiceDirect>());
  // ..
}

and with that, I've successfully built a URLRequestContext!

Checking In

OK, so I've made some progress today. A lot of this has been trial-and-error. There are two problems with this approach. First, it means I don't really learn how things work -- for example, I don't know why so many options for setting up a task runner failed. But I can add an item to my task list to learn more about that some day, and carry on with my current project.

Second, the distinction between "OK" and "Error" in this kind of trial-and-error is not always clear. In this case, I thought I had finished the creation of the thread pool successfully, but it turned out I had to revisit this and do it a different way. That was down to luck, and with a few more repetitions of trial-and-error, the universe of unknown things that could be wrong is so large that no amount of luck is enough to make progress.

The tools also let me down a bit here: CodeSearch was overwhelmed and unable to give me answers. I omitted it above, but I fell back to git grep. Like a pocket-knife, it's a tool that isn't fancy but is always there and always works. Another go-to tool that I used here was debug prints (in this case, DLOG(ERROR) << "HERE 1"; and incrementing the number). It's not especially cool, but it provides reliable, ground-truth information.

The code itself let me down a little, too. Classes in base seem to be well-commented, but with deep technical detail and not answers to questions like "how do you create an instance". Maybe that's OK -- it's a lot to ask from code comments -- but it's a reminder to me that I need to look outside of the code for higher-level documentation.

Chromium Spelunking: Life and Times

djmitche — Fri, 05 May 2023 16:21:30 +0000

In the last post, I read through one of the guides to the network stack and summarized my findings, with two more to go. In this post, I'll cover those two subsequent documents, and then plot how I'll start digging in deeper.

Life of a URLRequest

This document is a top-down summary of how URLs are fetched, meaning it begins with some function that says "here's a URL, go get it" and probably ends with some details of TCP connections and HTTP transactions. I tend to think in the opposite order: bottom-up. So, I want to understand how TCP connections are handled and what the API is for that implementation. Once I've got that down, I want to know how the next higher layer (HTTP?) operates and what its API is. And so on.

Preliminaries

This document begins with some general observations, which may help when it comes time to unravel how to find instances of the dozens of classes involved here.

URLRequestContext is the top-level entry point for loading a URL, and creates URLRequest instances. It seems like it encapsulates the "top half" of the network stack, down to where actual network connections occur.
That second level is encapsulated in HttpNetworkSession, which handles network streams, socket pools, and so on.
Following a pattern that is common to Chromium, sets of callbacks for users of the network stack are bundled together in "Delegate" classes, in this case URLRequest::Delegate (specific to a request) and NetworkDelegate (global to the URLRequestContext).

There are some details about how other parts of Chromium communicate with the network stack via Mojo, but for the moment my focus is within that boundary, so I'll ignore that. In fact, that makes quite a bit of this document irrelevant to our purposes.

Tip to Toe and Back

network::URLLoader (part of the network Mojo service, judging by the network:: namespace) creates a URLRequest. This is handed to network::ResourceScheduler to actually start the request. This suggests that a URLRequest doesn't start immediately on creation -- something to look out for later.
URLRequest gets an implementation of URLRequestJob from the URLRequestJobFactory. Specifically, that will be a URLRequestHttpJob instance.
URLRequestHttpJob attaches cookies to the request (and probably some other stuff!) and then makes an HttpCache::Transaction and activates it. It seems the HTTP cache is a read-through cache, as on a miss the cache is responsible for the next steps:
Use the HttpNetworkLayer to create a new HttpNetworkTransaction. The document says it "transparently wraps" this object, but it's unclear what that might mean.
HttpNetworkTransaction then gets an HttpStream from the HttpStreamFactory.

I imagine that by the time we have an HttpStream, we're in the lower of the two "big layers", but I don't see any mention of HttpNetworkSession here. Presumably HttpStream is an abstraction for a connection that can carry requests and responses, but doesn't get into the specifics of HTTP versions or connection mechanisms. Continuing with the process of creating an HttpStream (assuming the simple case with no pre-existing sockets):

HttpStreamFactory::Job needs to get a client socket (which it will store in a ClientSocketHandle) from the ClientSocketPoolManager. It sounds like this object is where proxies might get hooked in, probably with some recursive calls, but in this simple case it relies on the TransportClientSocketPool. I suppose "Transport" here means over an actual HTTP/x protocol on a network connection (so, not proxied). There's a ClientSocketPoolBase and ClientSocketPoolBaseHelper involved here, too - are you getting some strong Java vibes here?

In this case the pool is empty, so it needs to create a new connection, via a TransportConnectJob (there's that word "job" again..). This will handle DNS resolution, which is probably fascinating with the advent of DoH but out of scope for me at the moment.
The HttpStreamFactory::Job gets the connection object (wrapped in a ClientSocketHandle) and creates an HttpBasicStream. I'm guessing this is a subclass of HttpStream, as it passes this back to the HttpNetworkTransaction.
The HttpNetworkTransaction then passes the request header and body to HttpBasicStream, which uses an HttpStreamParser to write the headers and body to the stream. That's an interesting use of a "parser", but OK.
The HttpStreamParser then waits for the response header, parses it, and sends it back up the stack: HttpNetworkTransaction, HttpCache::Transaction (which probably caches a copy, if possible), and URLRequestHttpJob (which saves cookies), and URLRequest.

This section mentions HTTP/1.x, so it's possible that H2 and QUIC diverge from this process somewhere before this point.
The body is read by passing buffers all the way up and down the stack.
Once the request is complete, HttpNetworkTransaction determines whether the connection is reusable -- depending on headers in the connection, the response, and so on -- and either returns it to the pool or destroys it.

All of that seems comprehensible enough to provide a scaffolding for understanding this later. I've noted a few questions that I'd like to answer, too:

What is a "job"? This seems like a pattern like factories and builders, but maybe more specific to the network stack or Chromium (like delegates).
Where do H2 and QUIC diverge in this process?
What do things look like, at this level of detail, when there's a proxy involved?
Where does TLS fit in?

Happily, most of these are covered in the remainder of the document.

Ownership (??!)

The next bit of the document contains a comically complex ownership diagram that seems to combine ownership, inheritance, templating, and interfaces. It has footnotes for additional information that does not appear "clearly" in the diagram! Perhaps this will be a useful reference for me later as I try to avoid introducing use-after-free or double-free bugs.

Socket Pools

Socket pools are keyed by a "group name", such that connections with the same group name can be used interchangeably. This is made up of a host, port, protocol, and "privacy mode".
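A minimal sketch of keying idle sockets by group, assuming the four parts listed above. `GroupKey`, `SocketPool`, `TakeIdle`, and `Release` are hypothetical names for illustration, and sockets are reduced to integer ids:

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical group key built from the four parts named above.
struct GroupKey {
  std::string host;
  int port;
  std::string protocol;  // e.g. "https"
  bool privacy_mode;

  // Lexicographic ordering so the key can be used in std::map.
  bool operator<(const GroupKey& other) const {
    return std::tie(host, port, protocol, privacy_mode) <
           std::tie(other.host, other.port, other.protocol,
                    other.privacy_mode);
  }
};

// Hypothetical pool: idle socket ids are stored per group.
class SocketPool {
 public:
  // Returns an idle socket id for the group, or -1 if the caller would
  // have to open a new connection.
  int TakeIdle(const GroupKey& key) {
    auto it = idle_.find(key);
    if (it == idle_.end() || it->second.empty()) return -1;
    int id = it->second.back();
    it->second.pop_back();
    return id;
  }

  // Returns a socket to the pool so a later request in the same group
  // can reuse it.
  void Release(const GroupKey& key, int socket_id) {
    idle_[key].push_back(socket_id);
  }

 private:
  std::map<GroupKey, std::vector<int>> idle_;
};
```

Two keys that differ only in privacy mode land in different groups, so a socket released under one is never handed out for the other.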

Sockets aren't OS-level sockets, and it seems there are a number of implementations of sockets, all with their own pools. In fact, these can be layered, so a higher-level socket utilizes a lower-level socket. I suppose the obvious case here is a TLS socket utilizing a TCP socket. ConnectJob is another "job" implementation here, in this case performing the operations to initiate a socket connection.
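The layering idea can be sketched with a small interface and a wrapper that owns the socket beneath it. The class names `Socket`, `TcpSocket`, and `TlsSocket` are hypothetical and say nothing about how the real classes are declared:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical socket interface shared by every layer.
class Socket {
 public:
  virtual ~Socket() = default;
  virtual std::string Describe() const = 0;
};

// Lowest layer in this sketch.
class TcpSocket : public Socket {
 public:
  std::string Describe() const override { return "tcp"; }
};

// Higher layer: owns a lower-level socket and builds on top of it.
class TlsSocket : public Socket {
 public:
  explicit TlsSocket(std::unique_ptr<Socket> lower)
      : lower_(std::move(lower)) {}
  std::string Describe() const override {
    return "tls over " + lower_->Describe();
  }

 private:
  std::unique_ptr<Socket> lower_;
};
```

Constructing `TlsSocket(std::make_unique<TcpSocket>())` and calling `Describe()` yields "tls over tcp"; because both layers share one interface, callers need not know how deep the stack is.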

There are some details here of the class relationships that I will want to refer back to.

Proxies

HttpStreamFactory::Job uses a "Proxy Service" to determine which proxies to use for a request. Each proxy then exposes a socket pool for connections via that proxy, and HttpStreamFactory gets a socket from that pool.

HTTP/2

HTTP/2 (still called SPDY in the class names, after its predecessor) has a slightly different "shape" from HTTP/1.x. It works over a TCP connection just like HTTP/1.x, and can be activated during TLS negotiation. It allows multiple, concurrent streams in a single session (= TCP connection). The network stack will multiplex multiple concurrent requests over a single session, but it appears that's not done via another layer of connection pooling. Rather, the HttpStreamFactory::Job creates a SpdySession and from that a SpdyHttpStream, which it passes to the HttpNetworkTransaction. But it's not clear from the text how an existing SpdySession would be used to create a new SpdyHttpStream.

There's some extra optimization here to avoid making multiple TCP connections to a server that supports HTTP/2.

QUIC

QUIC (the transport beneath HTTP/3) has a very different shape from HTTP/1.x. To begin with, it operates over UDP, not TCP. A server's support for QUIC is advertised in headers, so the browser must "remember" which servers support QUIC and try to connect with QUIC when that server is next used.

When a server supports QUIC, HttpStreamFactory will "race" two jobs - one for QUIC and one for all previous protocols -- and pick the one that gets a stream first. This strategy is reminiscent of the "happy eyeballs" algorithm for IPv4 and IPv6. It gets the best performance for the user at the cost of "wasting" some connections.
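The racing idea can be illustrated with a deterministic toy, with no threads or sockets involved. `Job`, `Poll`, and `Race` are hypothetical, and each job simply becomes ready after a fixed number of polls:

```cpp
#include <string>
#include <vector>

// Hypothetical job: reports a stream after a fixed number of polls.
struct Job {
  std::string protocol;
  int polls_until_ready;

  // Returns true once the job has a stream to hand over.
  bool Poll() { return --polls_until_ready <= 0; }
};

// Polls every job in turn and returns the protocol of the first one to
// report a stream. Ties go to the job listed first. Returns an empty
// string if there are no jobs.
std::string Race(std::vector<Job> jobs) {
  if (jobs.empty()) return "";
  for (;;) {
    for (Job& job : jobs) {
      if (job.Poll()) return job.protocol;
    }
  }
}
```

The losing job's work is simply discarded here, which mirrors the "wasted" connection cost mentioned above.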

Proxy support in Chrome

I set out to read this document in the previous post, but on closer inspection it's not especially relevant. It mostly covers how proxies are configured, and mostly from the perspective of someone doing the configuring.

It does link to crbug 969859 where support for QUIC proxies was disabled by default. As with many Chromium bugs, it and the blocked/blocking bugs are pretty low on details!

Next Steps

That exhausts the "obvious" sources of documentation, although I'm sure I'll find some more as I proceed. Chromium development has a common practice of putting documentation in Google Docs documents. These are usually (but not always) linked from somewhere (a CL, a bug, or maybe in the source), and they are sometimes publicly readable (I won't be able to comment on anything that is not). These documents are generally "design documents", so they discuss a proposed change along with alternatives and potential impacts. What they do not do is document how things work -- they generally only make sense if you understand the state of the codebase before the proposed change, and if no subsequent change has been made to the same code.

I hope it's clear why this situation is a nightmare from an approachability perspective!

I have two next steps in mind:

Begin exploring the code from the bottom up (so, beginning with some of the simpler socket pool implementations). I have written a useful script to help me dig up the "hidden documentation" for a piece of code, so I'll be interested to see how that works in practice.
Try to write a curl-like utility that embeds the network stack and fetches the URL given on the command line. I expect this will be a substantial amount of work -- I think it involves building a new "embedder" and likely implementing lots of complex delegate methods -- but I might learn something from the attempt even if I don't finish it.

So far I've just been passively "absorbing" information, and that's typically not a great way to learn, so I am inclined to get a start on the curl-like utility just to get my fingers on the keyboard for a bit.

Chromium Spelunking: Getting Started

djmitche — Mon, 24 Apr 2023 21:23:29 +0000

Introduction

I'm now working day-to-day on Chromium, the open-source code-base behind Chrome. It's a C++ codebase with a healthy dose of Java and JS thrown in, although I'm mostly in the C++ bits of the codebase. It's been a steep learning curve, and I'd like to start documenting that learning curve with a few goals in mind:

Structure my own thinking and learning about Chromium.
Help others who are beginning to work on Chromium.
Start to identify some ways that Chromium could improve its approachability.

That term will become a theme. An approachable codebase is one where it is easy for a newcomer to get started. I think this is an important aspect of any codebase, but especially open source codebases. Developers move from project to project all the time, as I just did. An approachable codebase lets a developer get going quickly. It benefits existing developers, too: if new developers can find answers to questions, then more experienced developers don't have to spend time answering those questions.

In the last few weeks, several of my questions have garnered answers of the form "I don't work on Chromium anymore, but ..." While I feel for these experienced engineers haunted by the ghosts of projects past, if the codebase was more approachable then I wouldn't have to ask them!

What to Expect

I'll be blogging about my spelunking as it happens. Think of this as a lightly-edited lab notebook: all of the false starts, incorrect assumptions, and missed connections are here for you to see. I'll try to use complete sentences, explain things clearly, and organize my thoughts into a coherent order within each post.

The Task

The project I'm gearing up for involves adding some additional functionality to Chromium's network stack, to allow it to proxy QUIC connections over other QUIC connections. It's OK if you don't know what that means just yet -- I only have the vaguest sense myself. For the moment, I need to know how the network implementation is put together, so that I can see what parts I will need to modify.

Big Picture

My first step is to get the "big picture": the major components and patterns. With this information, I can start looking at the details, confident that I understand where those details fit. This is a "top-down" approach. And I typically begin by looking for developer documentation. In the case of the Chromium network layer, I found the following:

Chrome Network Stack Common Coding Patterns - an overview of how the code in //net is built. This will help me understand code as I begin reading it.
Life of a URLRequest - a step-by-step tour of the network stack's main job: fetching a URL. This document names a lot of classes and methods that will be useful later.
Proxy Support in Chrome - more detail on the network stack's proxy support.

A few general lessons from the coding patterns:

Lots of functions in the stack's API use variants of the libc return value style: negative numbers are error codes, positive numbers indicate success, perhaps as a byte count, and zero can indicate simple success or EOF, depending on context.
Some functions can either finish synchronously or asynchronously. As a baseline, a C syscall like write(2) on a non-blocking descriptor will fail with EAGAIN when it would otherwise block, and the expectation is that it will be called again when the application believes the write might succeed. This would continue until the call is successful. The Chromium functions all take a callback, and it's unclear from this description whether a synchronous completion invokes this callback, or expects the caller to do so. So, I'll need to figure that out, and document it.
It's common for network types to be structured as state machines, with a DoLoop at the core calling DoXxx methods based on a next_state_ instance variable until either it must wait for something to complete (ERR_IO_PENDING) or the request is complete. There's a bit more data about what is responsible for calling the callback which I'll need to explore as I start reading the code.
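The three patterns above fit together in a small state machine. The following is a sketch in plain C++, not code from //net: `FakeTransaction`, its states, and the constants `kOk`, `kErrIoPending`, and `kErrFailed` are all local to the example, and the question of who invokes the completion callback is deliberately left out:

```cpp
#include <string>
#include <vector>

// Result convention for this sketch: negative = error, zero = success,
// positive = byte count. The names and values are local to the example.
constexpr int kOk = 0;
constexpr int kErrIoPending = -1;
constexpr int kErrFailed = -2;

class FakeTransaction {
 public:
  // Runs until the work completes (kOk or an error) or until a step has
  // to wait for I/O (kErrIoPending).
  int Start() {
    next_state_ = State::kConnect;
    return DoLoop(kOk);
  }

  // Resumes the loop with the result of the operation that was pending.
  int OnIOComplete(int result) { return DoLoop(result); }

  const std::vector<std::string>& log() const { return log_; }

 private:
  enum class State { kNone, kConnect, kConnectComplete, kSend, kSendComplete };

  int DoLoop(int result) {
    int rv = result;
    do {
      State state = next_state_;
      next_state_ = State::kNone;
      switch (state) {
        case State::kConnect: rv = DoConnect(); break;
        case State::kConnectComplete: rv = DoConnectComplete(rv); break;
        case State::kSend: rv = DoSend(); break;
        case State::kSendComplete: rv = DoSendComplete(rv); break;
        case State::kNone: break;
      }
    } while (rv != kErrIoPending && next_state_ != State::kNone);
    return rv;
  }

  // Pretends the connect is asynchronous.
  int DoConnect() {
    log_.push_back("connect");
    next_state_ = State::kConnectComplete;
    return kErrIoPending;
  }
  // Stops the machine on error, otherwise moves on to sending.
  int DoConnectComplete(int result) {
    if (result < 0) return result;
    next_state_ = State::kSend;
    return kOk;
  }
  // Pretends the write finished synchronously with 5 bytes written.
  int DoSend() {
    log_.push_back("send");
    next_state_ = State::kSendComplete;
    return 5;
  }
  int DoSendComplete(int result) {
    log_.push_back("sent");
    return result < 0 ? result : kOk;
  }

  State next_state_ = State::kNone;
  std::vector<std::string> log_;
};
```

Calling `Start()` returns `kErrIoPending` after the connect step; a later `OnIOComplete(kOk)` drives the remaining states synchronously and returns `kOk`, while `OnIOComplete(kErrFailed)` stops the machine and returns the error.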

The "Life of a URLRequest" document is long and detailed, so I'll save that and the proxy document for the next post.