DEV Community


How to write three times fewer lines of code when doing load testing

Author: Sergey Kononenko

The key concept of load testing is automating everything that can be automated. Take a tool, write a configuration and a test scenario, then run a simulation of an actual load. The less code the better.

Automating load testing is not as difficult as it may seem at first glance. All it takes is the right tool.

In this article, I will show how I reduced the code of my testing utility threefold without any performance losses. I'm also going to explain why Yandex.Tank combined with Pandora didn't work for me.

What is load testing

My name is Sergey, and I'm a developer on the architecture team at Tarantool. Tarantool is an in-memory computing platform designed to handle exceptionally high loads, up to hundreds of thousands of RPS. That makes load testing essential for us, so I perform it every day. I am sure that almost everybody knows precisely why load testing matters, but let's review the basics just in case. The results of load testing show how your system behaves in different scenarios:

  • What parts of the system are idle in what cases?

  • What is the approximate request response time?

  • At what load does the system become unstable?

  • What part of the system causes malfunctions?

  • What part of it puts a limit on the overall performance?

Why we need special tools for load testing

When developing an application on Tarantool, we often have to test the performance of a stored procedure. The application accesses the procedure over the iproto binary protocol. Not every language can be used to test over iproto. There are Tarantool connectors for a number of languages, and you have to write your tests in one of them.

Most testing tools only support HTTP, which is not an option for us. Sure, we could add HTTP endpoints on top and make the best of it, but that wouldn't help the end user. Since clients call the stored procedures directly over iproto, testing via HTTP isn't representative.

Common load testing tools

At first, we considered a popular tool called JMeter. However, we were not impressed by its performance: it's written in Java and therefore memory-hungry and slow. Besides, we used it to test via HTTP, which meant indirect testing through special endpoints. Then we tried writing custom Go utilities for each project, which was a road to nowhere: there's no point in writing code over and over only to throw it away once testing is complete. That's no systematic approach. Let me reiterate that we want to automate as much as we can in load testing. That's how we got to Yandex.Tank and Pandora, a combination that seemed like the perfect tool satisfying all the requirements:

  • It can easily be adapted to any project.

  • It's fast, since Pandora is written in Go.

  • Our team has a lot of experience with Go, so working out the scenarios won't be a problem.

But there were also disadvantages.

Why we stopped using Yandex.Tank

Our time with Yandex.Tank was brief, and here are a few key reasons we gave up on it.

Lots of utility code. The Pandora wrapper that allows you to work with Tarantool contains ~150 lines of code, most of which don't bear any testing logic.

Constant source code recompilation. We ran into this problem when we had to keep loading the system while generating varying amounts of data. We couldn't find a convenient external way to control the data generation parameters, and pre-generation wasn't an option, so every time the data changed, we had to recompile the loader. Such manipulations could spawn up to 20 loader binaries per test scenario.

Scarce data when using standalone Pandora. Yandex.Tank is a wrapper that provides a pretty neat metrics visualization. Pandora is the engine that generates the load. Effectively, we were using two different tools, which was not always convenient (thankfully, we have Docker).

Configuration file options are not very intuitive. JSON and YAML configurations are a sensitive topic per se. But it becomes really unpleasant when it isn't clear how an option works depending on the values. For us, startup was such an option. It produced the same results on entirely different values, making it difficult to assess the system's actual performance.

All that created the following situation in one of our projects:

  • huge piles of source code

  • unclear metrics

  • overly complicated configuration.


What led us to k6

k6 is a load testing tool written in Go, just like Pandora, so performance is nothing to worry about. What's appealing about k6 is its modularity, which helps avoid constant source code recompilation. With k6, we write modules to access the Tarantool interface and to do other things like generating data. Since modules are independent of one another, there is no need to recompile each of them. Instead, you customize data generation parameters within a scenario written in... JavaScript! Yep, that's right. No more JSON or YAML configurations: k6 testing scenarios are code!

The scenario can be divided into stages, each of which models a different type of load. If you alter the scenario, there's no need to recompile the k6 binary, since the binary and the scenario don't depend on each other. That gives you two fully independent components, each written in a real programming language. You can finally forget about configurations and just write code.

Our application

This testing application in Lua stores information about car models. I use this application to test database writes and reads. The application has two main components, API and Storage. The API component gives the user HTTP controls for reading and writing, while Storage is responsible for the application's interaction with the database. Here is the interaction scenario: the user sends a request, and the controls call the database functions necessary to process that request. Check out the application on GitHub.

Getting k6 to work with the application

To create a k6 Tarantool interaction module, we first need to write a Go module using the xk6 framework. This framework provides tools for writing custom k6 modules. First, register the module so that k6 can work with it. We also need to define a new type and its receiver functions, that is, methods to call from the JavaScript scenario:

package tarantool

import (
    "github.com/tarantool/go-tarantool"
    "go.k6.io/k6/js/modules"
)

func init() {
    modules.Register("k6/x/tarantool", new(Tarantool))
}

// Tarantool is the k6 Tarantool extension
type Tarantool struct{}

We can already use this module, but it doesn't do much yet. Let's program it to connect to a Tarantool instance and to invoke the Call function provided by the Go connector:

// Connect creates a new Tarantool connection
func (Tarantool) Connect(addr string, opts tarantool.Opts) (*tarantool.Connection, error) {
    if addr == "" {
        addr = "localhost:3301"
    }
    conn, err := tarantool.Connect(addr, opts)
    if err != nil {
        return nil, err
    }
    return conn, nil
}

// Call invokes a registered Tarantool function
func (Tarantool) Call(conn *tarantool.Connection, fnName string, args interface{}) (*tarantool.Response, error) {
    resp, err := conn.Call(fnName, args)
    if err != nil {
        return nil, err
    }
    return resp, nil
}

The full code of the module can be found in this GitHub repo.

This code is already far more compact than what Pandora requires for working with Tarantool. The Pandora version had about 150 lines of code, and now we have 30. However, we haven't implemented any logic yet. Spoiler alert: we're going to end up with ~50 lines of code. k6 will take care of everything else.

Interacting with the module from a scenario

First, we'll import that custom module into our scenario:

import tarantool from "k6/x/tarantool";

Now let's create a connection:

const conn = tarantool.connect("localhost:3301");

connect is the receiver function we declared in our module. If you want to pass connection options, provide them as the second parameter in a plain JSON object. All that's left is to declare the testing stages and launch the test:

export const setup = () => {
  tarantool.insert(conn, "cars", [1, "cadillac"]);
};

export default () => {
  console.log(tarantool.call(conn, "box.space.cars:select", [1]));
};

export const teardown = () => {
  tarantool.delete(conn, "cars", "pk", [1]);
};

There are three testing stages in this example:

  • setup is performed before the test. Here is where you prepare the data or display an information message.

  • default, which is the main test scenario.

  • teardown is performed after the test is completed. Here you can erase the test data or display another information message.

After the test is launched and finished, you will see an output like this:

[Screenshot: k6 run output]

Here is what you can learn from this output:

  • What scenario is running.

  • Whether the data is being written to the console or aggregated via InfluxDB.

  • Scenario parameters.

  • Scenario console.log output.

  • Execution process.

  • Metrics.

The most interesting metrics here are iteration_duration, representing latency, and iterations, representing the total number of iterations performed and their average number per second — the desired RPS.
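As a quick sanity check on these numbers: the average RPS k6 reports is simply the total iteration count divided by the test duration. A tiny standalone Go sketch (the function name is mine, not part of k6):

```go
package main

import "fmt"

// averageRPS derives requests per second from the totals that
// k6 prints under the iterations metric.
func averageRPS(totalIterations int, durationSeconds float64) float64 {
	return float64(totalIterations) / durationSeconds
}

func main() {
	// e.g. 600,000 iterations over a one-minute test
	fmt.Println(averageRPS(600000, 60)) // 10000
}
```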

How about something more substantial?

Let's create a test bench consisting of three nodes, two of which are combined in a cluster. The third node hosts the k6 load generator and a Docker container with InfluxDB and Grafana, where we'll send the metrics and visualize them.

[Diagram: test bench layout]

Each cluster node will look like this:

[Diagram: cluster node layout]

We don't place a storage and its replica on the same node: if the first storage is on the first node, its replica is on the second node. Our space (basically a table in Tarantool) will have three fields: car_id, bucket_id, and model. We'll create a primary key based on car_id and another index based on bucket_id:


local car = box.schema.space.create('car', {
    format = {
        {'car_id', 'string'},
        {'bucket_id', 'unsigned'},
        {'model', 'string'},
    },
    if_not_exists = true,
})

car:create_index('pk', {
    parts = {'car_id'},
    if_not_exists = true,
})

car:create_index('bucket_id', {
    parts = {'bucket_id'},
    unique = false,
    if_not_exists = true,
})

Let's test the creation of car objects. To do so, we're going to write a k6 module for generating data. Earlier, I mentioned 30 lines of utility code, and here are the remaining 20 lines of test logic:


var bufferData = make(chan map[string]interface{}, 10000)

func (Datagen) GetData() map[string]interface{} {
    return <-bufferData
}

func (Datagen) GenerateData() {
    go func() {
        for {
            data := generateData()
            bufferData <- data
        }
    }()
}

func generateData() map[string]interface{} {
    data := map[string]interface{}{
        "car_id": uniuri.NewLen(5),
        "model":  uniuri.NewLen(5),
    }

    return data
}

I left out the initialization function and the definition of the type whose receiver functions we invoke from the JavaScript scenario. Interestingly, we can work with channels without losing any data: one function writes to bufferData while another reads from the channel, and since a read simply blocks until a value is available, no data is lost.
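To see why nothing gets lost, here is a standalone Go sketch of the same producer/consumer pattern (the names are illustrative, not from the actual module): the reader blocks on the channel until the background writer delivers a value.

```go
package main

import "fmt"

// bufferData mirrors the buffered channel from the module above.
var bufferData = make(chan int, 10000)

// generate fills the channel from a background goroutine, like GenerateData.
func generate(n int) {
	go func() {
		for i := 0; i < n; i++ {
			bufferData <- i
		}
	}()
}

// getData reads one value, like GetData: the receive blocks until
// a value is available, so no item is ever skipped.
func getData() int {
	return <-bufferData
}

func main() {
	generate(10)
	sum := 0
	for i := 0; i < 10; i++ {
		sum += getData()
	}
	fmt.Println(sum) // 45: all ten produced values were received
}
```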

generateData is an internal function that creates a car model and its id; it is not exposed to the module's JavaScript interface. GenerateData launches a goroutine so that we always have enough data ready for insertion. The test scenario for this bench looks like this:


import datagen from "k6/x/datagen";
import tarantool from "k6/x/tarantool";

const conn1 = tarantool.connect("172.19.0.2:3301");
const conn2 = tarantool.connect("172.19.0.3:3301");

const baseScenario = {
  executor: "constant-arrival-rate",
  rate: 10000,
  timeUnit: "1s",
  duration: "1m",
  preAllocatedVUs: 100,
  maxVUs: 100,
};

export let options = {
  scenarios: {
    conn1test: Object.assign({ exec: "conn1test" }, baseScenario),
    conn2test: Object.assign({ exec: "conn2test" }, baseScenario),
  },
};

export const setup = () => {
  console.log("Run data generation in the background");
  datagen.generateData();
};

export const conn1test = () => {
  tarantool.call(conn1, "api_car_add", [datagen.getData()]);
};

export const conn2test = () => {
  tarantool.call(conn2, "api_car_add", [datagen.getData()]);
};

export const teardown = () => {
  console.log("Testing complete");
};

It got a little bigger. There's a new options variable that allows us to configure testing behavior. I created two scenarios and a dedicated function for each one. Since the cluster consists of two nodes, we need to test simultaneous connections to both. With a single function, which was the default earlier, you can't expect the cluster to be fully loaded: every time unit you'd send a request to the first router while the second one is idle, then to the second while the first is idle, and performance drops. This can be prevented, though, and we'll get back to it soon.

Now let's take a look at our testing scenarios. Under executor, we specify what type of testing we want to launch. If this value is set to constant-arrival-rate, the scenario will simulate a constant load. Suppose we want to produce 10,000 RPS for 100 virtual users during one minute. Let's use the database, not the console, to output the results, so that the information is then displayed on the dashboard:

[Screenshot: Grafana dashboard with test results]

With the objective of 10,000 RPS, we got only 8,600 RPS, which is not so bad. There was likely just not enough computing power on the client machine where the loader was located. I performed this test on my MacBook Pro (Mid 2020). Here is the data on latency and virtual users:

[Screenshot: latency and virtual user graphs]

What about flexibility?

As far as flexibility is concerned, everything is great. Scenarios can be modified to add checks and thresholds, collect extra metrics, and more. In addition, you can structure the scenarios in one of the ways described below:

n connections — n scenarios

It is the basic scenario that we've discussed above:

const conn1 = tarantool.connect("172.19.0.2:3301");
const conn2 = tarantool.connect("172.19.0.3:3301");

const baseScenario = {
  executor: "constant-arrival-rate",
  rate: 10000,
  timeUnit: "1s",
  duration: "1m",
  preAllocatedVUs: 100,
  maxVUs: 100,
};

export let options = {
  scenarios: {
    conn1test: Object.assign({ exec: "conn1test" }, baseScenario),
    conn2test: Object.assign({ exec: "conn2test" }, baseScenario),
  },
};

n connections — 1 scenario

In this scenario, the connection to be tested is selected randomly at each iteration, so the load spreads across all declared connections:


const conn1 = tarantool.connect("172.19.0.2:3301");
const conn2 = tarantool.connect("172.19.0.3:3301");

const conns = [conn1, conn2];

const getRandomConn = () => conns[Math.floor(Math.random() * conns.length)];

export let options = {
  scenarios: {
    conntest: {
      executor: "constant-arrival-rate",
      rate: 10000,
      timeUnit: "1s",
      duration: "1m",
      preAllocatedVUs: 100,
      maxVUs: 100,
    },
  },
};

// With no exec specified, k6 runs the default function,
// which picks a random connection on every iteration.
export default () => {
  tarantool.call(getRandomConn(), "api_car_add", [datagen.getData()]);
};

This scenario can be reduced to a single connection. To do so, we need to set up a TCP balancer (nginx, envoy, haproxy), but that's a story for another day.

n connections — n scenarios + restrictions and checks

You can use restrictions (thresholds in k6 terms) to control the obtained metrics. If the 95th percentile latency is greater than 100 ms, the test is considered unsuccessful. You can set several restrictions for one parameter. You can also add checks, for example, to see what percentage of requests reached the server. The rate is expressed as a number between 0 and 1:


const conn1 = tarantool.connect("172.19.0.2:3301");
const conn2 = tarantool.connect("172.19.0.3:3301");

const baseScenario = {
  executor: "constant-arrival-rate",
  rate: 10000,
  timeUnit: "1s",
  duration: "10s",
  preAllocatedVUs: 100,
  maxVUs: 100,
};

export let options = {
  scenarios: {
    conn1test: Object.assign({ exec: "conn1test" }, baseScenario),
    conn2test: Object.assign({ exec: "conn2test" }, baseScenario),
  },
  thresholds: {
    iteration_duration: ["p(95) < 100", "p(90) < 75"],
    checks: ["rate = 1"],
  },
};
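To make the p(95) expression concrete, here is how a percentile cutoff can be evaluated over collected latencies. This is my own nearest-rank sketch with made-up numbers, not k6's actual implementation:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile of latencies (in ms),
// using the nearest-rank method.
func percentile(latencies []float64, p float64) float64 {
	sorted := append([]float64(nil), latencies...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	return sorted[rank]
}

func main() {
	latencies := []float64{40, 49, 55, 58, 60, 63, 72, 80, 95, 110}
	p95 := percentile(latencies, 95)
	// A threshold like "p(95) < 100" would fail on this sample.
	fmt.Println(p95, p95 < 100) // 110 false
}
```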

n connections — n scenarios + restrictions and checks + sequential launch

The sequential launch scenario is the most sophisticated of those described in this article. Suppose you want to check n stored procedures without loading the system with all of them at once. In this case, you can specify when each test starts, as we do for the second scenario via startTime. Keep in mind, however, that the first scenario may still be running at that moment. You can cap how long its iterations are allowed to linger via the gracefulStop parameter. If you set gracefulStop to 0 seconds, the first scenario is guaranteed to have stopped by the time the second one starts:


const conn1 = tarantool.connect("172.19.0.2:3301");
const conn2 = tarantool.connect("172.19.0.3:3301");

const baseScenario = {
  executor: "constant-arrival-rate",
  rate: 10000,
  timeUnit: "1s",
  duration: "10s",
  gracefulStop: "0s",
  preAllocatedVUs: 100,
  maxVUs: 100,
};

export let options = {
  scenarios: {
    conn1test: Object.assign({ exec: "conn1test" }, baseScenario),
    conn2test: Object.assign({ exec: "conn2test", startTime: "10s" }, baseScenario),
  },
  thresholds: {
    iteration_duration: ["p(95) < 100", "p(90) < 75"],
    checks: ["rate = 1"],
  },
};

Performance in comparison to Yandex.Tank + Pandora

We compared both tools on the application described above. Yandex.Tank loaded the router CPU by 53% and the storage CPU by 32%, yielding 9,616 RPS. As for k6, it loaded the router CPU by 54% and the storage CPU by 40%, producing 9,854 RPS. These are the average data from 10 test runs.

Why is that so? Both Pandora and k6 are written in Go. However, despite these similar fundamentals, k6 allows you to test applications in a more programming-like manner.

Conclusion

k6 is a simple tool. Once you've learned how to use it, you can reconfigure it for any project and spend fewer resources. Start by creating a core module, and then attach logic to it. There's no need to rewrite tests from scratch because you can use modules from other projects.

k6 is also a lean tool for load testing. My test logic with the wrapper fit within just 50 lines of code. You can write custom modules to suit your business logic, scenarios, and client requirements.

k6 is about programming, not configuration files. You can try k6 out here and play around with the sample application here.

Get Tarantool on our website and feel free to ask questions in our Telegram chat.

Links

  1. Tarantool binary protocol

  2. More about k6

  3. The code of my testing application

  4. A framework for writing your own k6 modules

  5. A k6 module to interact with Tarantool

  6. A sandbox where you can try out the application and get a taste of testing with k6

Discussion (3)

Okyn

This is truly interesting experience feedback, though I'll need more than one reading... :) Thank you for sharing!

Vadim Kolobanov

The topic is difficult for me, but interesting :) Fix "aythor" to "author", it's a typo.

tarantool (Author)

Thank you for your interest in the article and the attentive reading! :)