Figuring out how `curl` stores configurations

Safia Abdalla

In my last blog post, I started diving into the code base for the curl command line tool.

Sidebar: I don’t know if I’ve been consistent about spelling "command line" correctly throughout my blog posts. It turns out the proper way to spell it is as two separate words. I believe I’ve done it incorrectly and spelled it as one word in a couple of blog posts. Anyways, just a fun fact.

At the end of the last blog post, I drafted up a list of questions that I wanted to answer in my analysis of the core code of the command line tool.

  1. What calls to libcurl are made in the operate function?
  2. What preprocessing, if any, is done to the arguments passed to the operate function?

So I headed over to src/tool_operate.c (link) and started diving into the code.

Another sidebar: When I reference a filename, I’m usually linking directly to the source file on GitHub. I just realized that my blog theme doesn’t make this obvious when I style the text using a monospaced font. From now on, I’ll reference the files with the “(link)” parenthetical.

In the operate (link) function, the first couple of lines are responsible for parsing the arguments that are passed to the curl command. This should come as no surprise to anyone who’s been following along with these blog posts for a while.

  /* Parse .curlrc if necessary */
  if((argc == 1) ||
     (!curl_strequal(argv[1], "-q") &&
      !curl_strequal(argv[1], "--disable"))) {
    parseconfig(NULL, config); /* ignore possible failure */

...
      else if(res == PARAM_ENGINES_REQUESTED)
        tool_list_engines(config->easy);
      else if(res == PARAM_LIBCURL_UNSUPPORTED_PROTOCOL)
        result = CURLE_UNSUPPORTED_PROTOCOL;
      else
        result = CURLE_FAILED_INIT;
    }
    else {
#ifndef CURL_DISABLE_LIBCURL_OPTION
      if(config->libcurl) {
        /* Initialise the libcurl source output */
        result = easysrc_init();
      }
#endif

From the last blog post, I learned that the config parameter referenced here holds the global configuration. As the comment and the condition in the snippet above show, it is populated from the .curlrc file unless the first argument is -q or --disable.
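To make that first condition a little more concrete, here is a minimal sketch of the check in isolation. The names demo_strequal and demo_should_parse_curlrc are hypothetical and are not part of curl; demo_strequal is only a stand-in for curl_strequal, assuming an ASCII case-insensitive comparison.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for curl_strequal: ASCII case-insensitive equality. */
bool demo_strequal(const char *a, const char *b)
{
  while(*a && *b) {
    if(tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return false;
    a++;
    b++;
  }
  return *a == *b; /* equal only if both strings ended together */
}

/* Mirrors the condition in the snippet: read .curlrc when there are no
   arguments, or when the FIRST argument is neither -q nor --disable. */
bool demo_should_parse_curlrc(int argc, char *argv[])
{
  assert(argc >= 1 && argv != NULL);
  return (argc == 1) ||
         (!demo_strequal(argv[1], "-q") &&
          !demo_strequal(argv[1], "--disable"));
}
```

One thing the sketch makes visible is that only argv[1] is inspected, so in this simplified version a -q that appears later on the command line does not stop the file from being read.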

Once the parameters have been parsed, the main operations of the function commence. First, the function grabs the first OperationConfig object from the global config object.

size_t count = 0;
struct OperationConfig *operation = config->first;

I looked through the curl docs to learn a little bit more about what this OperationConfig object is. I found a definition of the OperationConfig struct (file). Essentially, it contains the specific attributes that are used when making a transfer request. This includes things like the user-agent to use in an HTTP request or the port to use in an FTP request. This made sense to me, but one of the things that intrigued me was the definition of the GlobalConfig object (referenced as config in the code snippet above). Below is a snippet of the parts that interested me.

struct GlobalConfig {
  ...
  struct OperationConfig *first;
  struct OperationConfig *current;
  struct OperationConfig *last; /* Always last in the struct */
};

So it looks like it stores three different OperationConfig objects under the names first, current and last. Why is this the case? When I saw that the OperationConfig object was referenced using the config->first statement, I figured that it might be some linked-list type structure. This suspicion was confirmed when I re-read the definition of the OperationConfig struct. Specifically, the last lines of the struct definition.

struct OperationConfig {
  ...
  struct OperationConfig *prev;
  struct OperationConfig *next; /* Always last in the struct */
};

This makes sense now. The OperationConfig objects are organized as a linked list. Given a GlobalConfig called config, you can reach the second OperationConfig object through config->first->next, the third through config->first->next->next, and so on. Since this is a doubly linked list, you can reach the second-to-last item through config->last->prev. Cool!

So now I know what this code is doing, but why does it exist? Why would you need to maintain a list of the configuration details for each operation? To be honest, I was a little bit lost on the best approach for answering this question. I think exploring how the GlobalConfig and OperationConfig structs are used throughout the code base would be a waste of time and might lead me on tangents. I don’t wanna replay what went down with the Git code base read (yikes).

Instead, I wondered if I should look through the commit logs of the code base to track down why the configuration might be structured this way. I found this commit, which mentions that multiple operations might be executed with curl. An operation is something like a single HTTP request. With curl, you can send multiple requests using a single command. The doubly-linked list structure is used to keep the configuration options for each operation separate.
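To convince myself of how the first, last, prev, and next pointers fit together, here is a minimal sketch of the same shape. DemoGlobal, DemoOperation, and the demo_ functions are hypothetical, stripped-down stand-ins, not curl's actual structs or helpers, and the url field is only an illustrative per-operation setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for OperationConfig: one node per operation. */
struct DemoOperation {
  const char *url;            /* illustrative per-operation setting */
  struct DemoOperation *prev;
  struct DemoOperation *next;
};

/* Hypothetical stand-in for GlobalConfig: owns the list of operations. */
struct DemoGlobal {
  struct DemoOperation *first;
  struct DemoOperation *current;
  struct DemoOperation *last;
};

/* Append a new operation at the tail; returns NULL if allocation fails. */
struct DemoOperation *demo_append(struct DemoGlobal *g, const char *url)
{
  struct DemoOperation *op;
  assert(g != NULL);
  op = calloc(1, sizeof(*op));
  if(!op)
    return NULL;
  op->url = url;
  op->prev = g->last;         /* link back to the old tail */
  if(g->last)
    g->last->next = op;       /* link the old tail forward */
  else
    g->first = op;            /* the list was empty */
  g->last = op;
  return op;
}

/* Walk the list from first to last and count the nodes. */
size_t demo_count(const struct DemoGlobal *g)
{
  size_t n = 0;
  const struct DemoOperation *op;
  for(op = g->first; op; op = op->next)
    n++;
  return n;
}

/* Free every node and reset the list to empty. */
void demo_free_all(struct DemoGlobal *g)
{
  struct DemoOperation *op = g->first;
  while(op) {
    struct DemoOperation *next = op->next;
    free(op);
    op = next;
  }
  g->first = g->current = g->last = NULL;
}
```

After three calls to demo_append on an empty DemoGlobal, g.first->next is the second operation and g.last->prev points at that same node, which is the relationship described above.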

So this was a good way to answer the second question that I outlined above. The configuration for curl is operation-dependent and stored in a doubly-linked list. In the next blog post, I’ll try to answer question number one by looking at the next couple of lines of code.

        /* Perform each operation */
        while(!result && config->current) {
          result = operate_do(config, config->current);

          config->current = config->current->next;

          if(config->current && config->current->easy)
            curl_easy_reset(config->current->easy);
        }
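As a preview, the shape of that loop can be sketched without libcurl at all. Everything here is a hypothetical stand-in: demo_operate_do just records that it ran and returns the node's canned result, and I'm assuming that current starts out pointing at first, since the snippet above doesn't show where it is initialized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one operation's configuration. */
struct DemoOperation {
  int fail;                   /* non-zero: pretend this transfer fails */
  int done;                   /* set to 1 once the operation has run */
  struct DemoOperation *next;
};

/* Hypothetical stand-in for the global configuration. */
struct DemoGlobal {
  struct DemoOperation *first;
  struct DemoOperation *current;
};

/* Stand-in for operate_do: mark the operation as run, return its result. */
int demo_operate_do(struct DemoOperation *op)
{
  op->done = 1;
  return op->fail ? 1 : 0;
}

/* Run operations in list order, stopping at the first non-zero result. */
int demo_run_all(struct DemoGlobal *g)
{
  int result = 0;
  assert(g != NULL);
  g->current = g->first;      /* assumption: start at the head of the list */
  while(!result && g->current) {
    result = demo_operate_do(g->current);
    g->current = g->current->next;
  }
  return result;
}
```

Because of the !result check, the operations run one after another and any operation queued behind a failed one never runs in this sketch.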

See you next time!

Safia Abdalla

@captainsafia

I make open source at @nteractio, make software at @Microsoft, and write books and blogs. Dream big and follow through even bigger.
