Zenika

Add chaos in your network!

Victor Gallet ・4 min read

An important rule in production is to never trust your network! There will always be problems, and Netflix clearly understood this when it created Chaos Monkey in 2011.
In this article, we will focus on chaos in the network, for example to verify that:

  • an application is resilient to network latency
  • a website still offers a comfortable experience despite a limited bandwidth
  • a distributed system cannot go into split-brain.

tc is a Linux command to manage network traffic. For example:

tc qdisc add dev eth0 root netem delay 200ms

This command adds 200 milliseconds of latency to the packets leaving the eth0 interface. Let’s dissect it.

As the Linux man page for tc puts it, qdisc is

short for 'queueing discipline' and it is elementary to understanding traffic control.
Whenever the kernel needs to send a packet to an interface, it is enqueued to the qdisc configured for that interface.
Immediately afterwards, the kernel tries to get as many packets as possible from the qdisc, for giving them to the network adaptor driver.

add is to add a new rule.

dev eth0 means the rule is applied to the device, the network interface eth0.

root attaches the qdisc at the root of the interface's outgoing traffic. In this case, the rule is applied to all outgoing packets.

netem means Network Emulator. It’s the tool to add a behaviour.

delay 200ms is the rule to apply. Here it’s a 200 milliseconds latency.

Once the rule is applied, we can list all the rules applied to eth0.

tc qdisc show dev eth0

And to delete it:

tc qdisc del dev eth0 root
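
The add and delete commands above can be wrapped so that the rule never outlives the experiment. This is only a sketch: the with_delay function and the TC override variable are names invented for illustration and are not part of tc. Setting TC to "echo tc" prints the commands instead of running them, which helps when trying the script without root privileges.

```shell
#!/bin/sh
# Hypothetical helper, not part of tc: apply a netem delay, run a command,
# then remove the rule whatever the command's exit status was.
# TC defaults to the real tc binary; TC="echo tc" gives a dry run.
TC="${TC:-tc}"

with_delay() {
    dev="$1"
    delay="$2"
    shift 2
    # Same command as in the article, with device and delay as parameters.
    $TC qdisc add dev "$dev" root netem delay "$delay" || return 1
    "$@"
    status=$?
    # Clean up so the latency does not linger on the interface.
    $TC qdisc del dev "$dev" root
    return "$status"
}

# Example (needs root; example.org is a placeholder target):
#   with_delay eth0 200ms ping -c 3 example.org
```

The function returns the exit status of the wrapped command, so it can be dropped into a test script that fails when the application under test does not cope with the latency.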

In the first example, the latency impacts the whole network, including an SSH connection. To impact only a port, an IP address or a range of IP addresses rather than all the traffic, it’s possible to use the class concept of qdisc.
By default, the pfifo_fast qdisc is divided into 3 bands: 0, 1 and 2. This command shows these bands:

tc qdisc ls

qdisc noqueue 0: dev lo root refcnt 2 
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

On my computer, eth0 has 3 bands with a priority map of 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1.
The band used is determined by the packet's TOS field. TOS stands for Type Of Service: it is simply four bits indicating the kind of priority requested.

Binary Decimal  Meaning
-----------------------------------------
1000   8         Minimize delay (md)
0100   4         Maximize throughput (mt)
0010   2         Maximize reliability (mr)
0001   1         Minimize monetary cost (mmc)
0000   0         Normal Service

This table lists the meaning of the TOS bits used to determine the band. The tc-prio man page explains in more detail how it works.
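
To make the priority map more concrete, here is a small sketch of the lookup. It assumes, following the tc-prio man page, that the kernel first turns the TOS bits into a packet priority between 0 and 15, and that entry N of the priomap gives the band for priority N. The band_for_priority function is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: look up the band for a Linux packet priority (0-15)
# in a priomap such as the one printed by "tc qdisc ls".
band_for_priority() {
    prio="$1"
    shift
    # The remaining arguments are the priomap entries; entry N is the band
    # used for priority N.
    i=0
    for band in "$@"; do
        if [ "$i" -eq "$prio" ]; then
            echo "$band"
            return 0
        fi
        i=$((i + 1))
    done
    # Priority outside the map.
    return 1
}

# With the priomap shown for eth0:
#   band_for_priority 6 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1   prints 0
#   band_for_priority 0 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1   prints 1
```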

First of all, a rule is added to change the priority map. All the traffic will go to the first band.

tc qdisc add dev eth0 root handle 1: prio priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Next, this rule defines the behaviour to add, but only on the second band (class 1:2, since the prio qdisc exposes its bands as classes 1:1, 1:2 and 1:3). Here it’s a delay of 1000 milliseconds.

tc qdisc add dev eth0 parent 1:2 handle 20: netem delay 1000ms

Finally, a filter redirects all the traffic matching a destination IP to that second band. To check that it works correctly, a simple ping to that IP does the trick.

tc filter add dev eth0 parent 1:0 protocol ip u32 match ip dst <my ip> flowid 1:2

ping <my ip>
64 bytes from <server> (<my ip>): icmp_seq=143 ttl=62 time=1000 ms
64 bytes from <server> (<my ip>): icmp_seq=144 ttl=62 time=1000 ms
64 bytes from <server> (<my ip>): icmp_seq=145 ttl=62 time=1000 ms
64 bytes from <server> (<my ip>): icmp_seq=146 ttl=62 time=1000 ms
64 bytes from <server> (<my ip>): icmp_seq=147 ttl=62 time=1001 ms
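
Rather than reading the ping output by eye, the round-trip time can be extracted in a script. The sketch below assumes the usual Linux ping output format with a "time=... ms" field, as in the lines above; min_rtt_ms is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the smallest
# round-trip time, in milliseconds, found in the "time=... ms" fields.
min_rtt_ms() {
    awk -F'time=' '
        /time=/ {
            split($2, parts, " ")      # parts[1] is the number before " ms"
            rtt = parts[1] + 0         # force a numeric value
            if (!seen || rtt < min) { min = rtt; seen = 1 }
        }
        END { if (seen) print min; else exit 1 }
    '
}

# Example (192.0.2.10 is a placeholder address):
#   ping -c 5 192.0.2.10 | min_rtt_ms
```

If no reply line is found, the function exits with a non-zero status, which lets a test script tell "no answer" apart from "low latency".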

The rule is correctly applied. Here the filter matches an IP address, but it also works with a subnet mask and/or a port. For example:

tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dst <my ip> match ip dport <my port> 0xffff flowid 1:2
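
The three steps of this section (prio qdisc, netem on class 1:2, u32 filter) can be gathered into one helper. This is a sketch under assumptions: delay_to_host and the TC override variable are invented names, and the destination is matched by IP address or subnet only. With TC set to "echo tc" the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper gathering the three steps above. TC="echo tc" prints
# the commands instead of running them (running them for real needs root).
TC="${TC:-tc}"

delay_to_host() {
    dev="$1"
    dst="$2"      # an IP address, or a subnet such as 192.0.2.0/24
    delay="$3"
    # 1. prio qdisc whose priomap sends all traffic to the first band (1:1).
    $TC qdisc add dev "$dev" root handle 1: prio \
        priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 || return 1
    # 2. netem delay attached to class 1:2 only.
    $TC qdisc add dev "$dev" parent 1:2 handle 20: netem delay "$delay" || return 1
    # 3. u32 filter steering packets for the destination into class 1:2.
    $TC filter add dev "$dev" parent 1:0 protocol ip u32 \
        match ip dst "$dst" flowid 1:2
}

# Example (needs root; 192.0.2.10 is a placeholder address):
#   delay_to_host eth0 192.0.2.10 1000ms
```

If one of the steps fails, the qdiscs already created stay in place; tc qdisc del dev eth0 root removes the whole tree.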

Other options

Here are some other possibilities.

netem delay 100ms 10ms

This rule adds 100 milliseconds of delay, plus or minus 10 milliseconds, with a uniform distribution.

netem loss 10%

10% of packets are lost.

netem duplicate 50%

50% of packets are duplicated.

netem corrupt 5%

5% of packets are corrupted.
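
The fragments above are the tail of a full tc qdisc command, and netem accepts several of them in a single rule. The sketch below shows one way to pass them through; apply_netem and the TC override variable are hypothetical names, and the rule is attached to the root of the device as in the first example of this article.

```shell
#!/bin/sh
# TC="echo tc" prints the command instead of running it (which needs root).
TC="${TC:-tc}"

# Hypothetical helper: attach netem to the root of a device with any mix of
# the options listed above, e.g. "delay 100ms 10ms" or "loss 10%".
apply_netem() {
    dev="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: apply_netem DEV NETEM_OPTIONS..." >&2
        return 2
    fi
    $TC qdisc add dev "$dev" root netem "$@"
}

# Example: latency with jitter and packet loss in one rule.
#   apply_netem eth0 delay 100ms 10ms loss 10%
```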

Limit bandwidth

Finally, the Token Bucket Filter (TBF) qdisc makes it possible to limit the outgoing bandwidth.

tbf rate 20kbit buffer 1600 limit 3000

The outgoing bandwidth is limited to 20 kilobits per second.
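
As with netem, the tbf fragment is the tail of a full tc qdisc command. The sketch below attaches it to the root of a device and gives a rough idea of what such a limit means for a transfer. Both function names and the TC override variable are hypothetical, and the estimate assumes that 1 kbit is 1000 bits and ignores protocol overhead.

```shell
#!/bin/sh
# TC="echo tc" prints the command instead of running it (which needs root).
TC="${TC:-tc}"

# Hypothetical helper: the article's TBF parameters applied to a device.
limit_bandwidth() {
    dev="$1"
    rate="$2"     # e.g. 20kbit
    $TC qdisc add dev "$dev" root tbf rate "$rate" buffer 1600 limit 3000
}

# Rough lower bound, in whole seconds (rounded up), on the time needed to
# send a payload at a given rate in kbit per second.
transfer_seconds() {
    bytes="$1"
    rate_kbit="$2"
    bits=$((bytes * 8))
    rate=$((rate_kbit * 1000))
    echo $(( (bits + rate - 1) / rate ))
}

# transfer_seconds 100000 20   prints 40
```

At 20 kbit per second, a 100 000 byte page therefore needs at least 40 seconds, which gives a concrete sense of the "limited bandwidth" scenario mentioned at the start of the article.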

A big thanks to Marc Barret for his time and proofreading.

Photo by American Public Power Association on Unsplash
