DEV Community

Franck Pachot for YugabyteDB

Simulate Clock Skew in Docker Container

In real deployments without atomic clocks, clocks synchronized by NTP can still drift, and servers in a distributed system can show a clock skew of hundreds of milliseconds. A simple way to test this in a Docker lab is to override the clock_gettime function with a fake one. Here is an example with a 2-node RF1 YugabyteDB cluster (YugabyteDB is a PostgreSQL-compatible Distributed SQL database).

I create a yb network and start the first node, yb1, in the background:

docker network create yb
docker run -d  --rm --network yb --hostname yb1 -p 7000:7000 yugabytedb/yugabyte yugabyted start --background=false --tserver_flags="TEST_docdb_log_write_batches=true"


I start a shell in a second node:

docker run -it --rm --network yb --hostname yb2              yugabytedb/yugabyte bash


In this container, I wait until yb1 is up, then start yb2, which joins yb1:

until postgres/bin/pg_isready -h yb1.yb ; do sleep 1 ; done
yugabyted start --join yb1.yb --tserver_flags="TEST_docdb_log_write_batches=true"


The page at http://localhost:7000/tablet-server-clocks shows the same Physical Time for both containers because they run on the same host.

I install gcc and compile a fake_clock_gettime.so that overrides clock_gettime, calls the original one, and subtracts 499 milliseconds from its result:

cat > fake_clock_gettime.c <<'C'
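#define _GNU_SOURCE
#include <dlfcn.h>
#include <time.h>
int clock_gettime(clockid_t clk_id, struct timespec *tp)
{
  // skew to subtract, in milliseconds (the arithmetic below assumes less than 1000)
  static const int skew_millisecond = 499;
  static int (*origin_clock_gettime)(clockid_t, struct timespec *);
  // look up the real clock_gettime on the first call
  if (!origin_clock_gettime) {
    origin_clock_gettime = (int (*)(clockid_t, struct timespec *)) dlsym(RTLD_NEXT, "clock_gettime");
  }
  // call it; the result is a local variable (not static) so threads do not share it
  int ret = origin_clock_gettime(clk_id, tp);
  if (ret != 0) return ret;
  // subtract the skew (for every clock id), borrowing one second if tv_nsec would go negative
  if (tp->tv_nsec >= skew_millisecond * 1000000) {
    tp->tv_nsec -= skew_millisecond * 1000000;
  } else {
    tp->tv_sec -= 1;
    tp->tv_nsec += 1000000000 - skew_millisecond * 1000000;
  }
  return ret;
}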
C

dnf install -y gcc

gcc -o fake_clock_gettime.so -fPIC -shared fake_clock_gettime.c -ldl

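The subtraction in this library only works for a skew below one second, because it borrows at most one second from tv_sec. As a minimal sketch of how the same arithmetic could be generalized to larger offsets, here is a hypothetical helper, timespec_sub_ms (this name is mine, not part of the library above), that assumes the input tv_nsec is already in the range 0 to 999999999:

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: subtract skew_ms milliseconds from *tp.
   Assumes tp->tv_nsec is normalized, i.e. in [0, NSEC_PER_SEC). */
void timespec_sub_ms(struct timespec *tp, long skew_ms)
{
  assert(skew_ms >= 0);
  /* whole seconds of skew come out of tv_sec */
  tp->tv_sec -= skew_ms / 1000;
  /* the remaining milliseconds come out of tv_nsec */
  tp->tv_nsec -= (skew_ms % 1000) * 1000000L;
  /* borrow one second when the nanosecond part went negative */
  if (tp->tv_nsec < 0) {
    tp->tv_sec -= 1;
    tp->tv_nsec += NSEC_PER_SEC;
  }
}
```

With a skew of 499 this gives the same result as the two branches in fake_clock_gettime.c, and a value such as 1500 moves the clock back by one and a half seconds.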

This library can be loaded with LD_PRELOAD, and I test it by calling date:

[root@yb2 yugabyte]# date +"%T:%N" ; LD_PRELOAD=$PWD/fake_clock_gettime.so date +"%T:%N" ; date +"%T:%N"
21:31:44:015385334
21:31:43:518894559
21:31:44:020271039
[root@yb2 yugabyte]# date +"%T:%N" ; LD_PRELOAD=$PWD/fake_clock_gettime.so date +"%T:%N" ; date +"%T:%N"
21:31:45:955772746
21:31:45:459189786
21:31:45:960576587


When date is called with the library preloaded, it shows a time about half a second earlier than the calls before and after it.
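The same check can be done from inside a single process. The sketch below compares the libc clock_gettime, which LD_PRELOAD can override, with the raw Linux system call, which bypasses any preloaded library. The function names timespec_diff_ms and observed_skew_ms are hypothetical, and the use of syscall(SYS_clock_gettime, ...) assumes a 64-bit Linux container like the one used here:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Difference a - b in whole milliseconds (truncated toward zero).
   Both inputs are expected to be normalized timespec values. */
long long timespec_diff_ms(struct timespec a, struct timespec b)
{
  assert(a.tv_nsec >= 0 && a.tv_nsec < 1000000000L);
  assert(b.tv_nsec >= 0 && b.tv_nsec < 1000000000L);
  long long diff_ns = (long long)(a.tv_sec - b.tv_sec) * 1000000000LL
                    + (long long)(a.tv_nsec - b.tv_nsec);
  return diff_ns / 1000000LL;
}

/* Skew seen by this process: libc clock_gettime minus the raw system call.
   Returns 0 on success and stores the skew in *skew_ms, or -1 on failure. */
int observed_skew_ms(long long *skew_ms)
{
  struct timespec via_libc, via_syscall;
  if (clock_gettime(CLOCK_REALTIME, &via_libc) != 0) return -1;
  if (syscall(SYS_clock_gettime, CLOCK_REALTIME, &via_syscall) != 0) return -1;
  *skew_ms = timespec_diff_ms(via_libc, via_syscall);
  return 0;
}
```

Called from a small main that prints the result, observed_skew_ms should report roughly -499 when the binary runs with LD_PRELOAD=$PWD/fake_clock_gettime.so and about 0 without it.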

I restart YugabyteDB on yb2 with this hack:

yugabyted stop
LD_PRELOAD=$PWD/fake_clock_gettime.so yugabyted start


After the restart, the tablet-server-clocks page shows the clock skew of yb2 in both the Physical Time and the Hybrid Time.

I run some workload that involves tablets in both nodes to get some Lamport logical clock synchronization:

/home/yugabyte/postgres/bin/ysql_bench -i -h $(hostname) -s 10


With the messaging between the nodes, the Physical Time still shows a clock skew, but the Logical Time is synchronized.
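To illustrate why messages between nodes bring the logical part of the clock together while the physical clocks stay apart, here is a minimal sketch of a generic hybrid logical clock. It is not YugabyteDB's implementation: the hybrid_time struct and the functions ht_tick and ht_receive are hypothetical names that follow the textbook rule of taking the maximum of the local clock, the last local timestamp, and the timestamp carried by the incoming message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hybrid timestamp: physical microseconds plus a logical counter. */
typedef struct {
  int64_t  physical_us;
  uint32_t logical;
} hybrid_time;

static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }

/* Timestamp for a local event or an outgoing message. */
hybrid_time ht_tick(hybrid_time last, int64_t wall_us)
{
  assert(wall_us >= 0);
  hybrid_time next;
  if (wall_us > last.physical_us) {
    /* the wall clock moved ahead: adopt it and reset the counter */
    next.physical_us = wall_us;
    next.logical = 0;
  } else {
    /* the wall clock is behind the last timestamp: only the counter advances */
    next.physical_us = last.physical_us;
    next.logical = last.logical + 1;
  }
  return next;
}

/* Timestamp after receiving a message stamped with `remote`. */
hybrid_time ht_receive(hybrid_time last, hybrid_time remote, int64_t wall_us)
{
  assert(wall_us >= 0);
  hybrid_time next;
  next.physical_us = max64(wall_us, max64(last.physical_us, remote.physical_us));
  if (next.physical_us == last.physical_us && next.physical_us == remote.physical_us) {
    next.logical = (last.logical > remote.logical ? last.logical : remote.logical) + 1;
  } else if (next.physical_us == last.physical_us) {
    next.logical = last.logical + 1;
  } else if (next.physical_us == remote.physical_us) {
    next.logical = remote.logical + 1;
  } else {
    next.logical = 0;
  }
  return next;
}
```

In this sketch, a node whose wall clock is 499 milliseconds behind adopts the sender's timestamp as soon as it receives a message, and its later local timestamps keep advancing from there even though its wall clock is still late.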

If you are curious, here is more information about clock synchronisation in distributed databases: https://www.yugabyte.com/blog/evolving-clock-sync-for-distributed-databases/
