Franck Pachot for YugabyteDB

Posted on Aug 19, 2021 • Edited on Sep 19, 2021

pgbench --client --jobs

#postgres #pgbench

In this post I'll explain, with examples, two pgbench options used when running concurrent activity: --client and --jobs. I find their names misleading: --client is about the number of server-side connections and --jobs about the number of client-side threads 🤨

Added after listening to Creston Jamison review:
For a benchmarking tool, I like to think about resources and where they are allocated. A database connection (--client) takes resources on the database server, and application threads (--jobs) take resources on the database client. That's why I find those names misleading.

I'm using the short options in the examples below. Here are the equivalents:


pgbench --help | grep -E -- " -[stTcjfn],"

  -n, --no-vacuum          do not run VACUUM during initialization
  -s, --scale=NUM          scaling factor
  -f, --file=FILENAME[@W]  add script FILENAME weighted at W (default: 1)
  -c, --client=NUM         number of concurrent database clients (default: 1)
  -j, --jobs=NUM           number of threads (default: 1)
  -n, --no-vacuum          do not run VACUUM before tests
  -s, --scale=NUM          report this scale factor in output
  -t, --transactions=NUM   number of transactions each client runs (default: 10)
  -T, --time=NUM           duration of benchmark test in seconds

To run something simple and predictable, I'll use a custom script which simply waits one second in the database: select pg_sleep(1). And because I like one-liners, I pass it through STDIN:


pgbench -T 30 -nf /dev/stdin <<< "select pg_sleep(1)"

transaction type: /dev/stdin
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 1005.139 ms
tps = 0.994887 (including connections establishing)
tps = 0.995271 (excluding connections establishing)

Without surprise, it ran at about 1 transaction per second, given that 1 thread runs the 1-second transactions one after the other through 1 client connection.
Those are the defaults, -j 1 -c 1. I'll now run with different values.

`-c` `--client` number of concurrent database clients (default: 1)

This is the most important option to control the load on the database. Each client is a connection to the DB, which means a backend process, and transactions are executed concurrently. Let's run the same as above, now with 2 clients:

pgbench -j 1 -c 2 -T 30 -nf /dev/stdin <<< "select pg_sleep(1)"

transaction type: /dev/stdin
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
duration: 30 s
number of transactions actually processed: 60
latency average = 1004.676 ms
tps = 1.990692 (including connections establishing)
tps = 1.990951 (excluding connections establishing)

I can achieve 2 transactions per second now, still with the 1 second latency query. Increasing the number of clients will linearly increase the throughput if there are no bottlenecks elsewhere. This is where we say "it scales".

Because a pg_sleep(1) doesn't take a lot of resources, I still have nearly the same latency with 100 connections:

pgbench -j 1 -c 100 -T 30 -nf /dev/stdin <<< "select pg_sleep(1)"

transaction type: /dev/stdin
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 1
duration: 30 s
number of transactions actually processed: 2900
latency average = 1056.748 ms
tps = 94.629933 (including connections establishing)
tps = 94.652406 (excluding connections establishing)
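
The three runs above are consistent with a simple rule of thumb: when nothing else is a bottleneck, throughput is roughly the number of clients divided by the latency. Here is a minimal sketch of that arithmetic, where `expected_tps` is a hypothetical helper name of mine, not something pgbench provides:

```shell
# Hypothetical helper: ideal throughput when the clients are the only limit.
# tps = clients / latency, with the latency given in milliseconds.
expected_tps() {
  awk -v c="$1" -v l="$2" 'BEGIN { printf "%.2f\n", c * 1000 / l }'
}

# Client counts and average latencies reported by the three runs above
expected_tps 1 1005.139     # prints 0.99
expected_tps 2 1004.676     # prints 1.99
expected_tps 100 1056.748   # prints 94.63
```

The results are close to the tps that pgbench reported. It is only an estimate: it ignores connection time and assumes every client is always busy.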

Asynchronous libpq

But you can see that I still have 1 thread here (--jobs=1). How can 1 client thread (aka --jobs) run transactions concurrently through 100 connections (aka server sessions, aka --client)?

Here is a trace of interesting system calls about the communication with the database, with 3 clients from 1 thread:

strace -T -s 1000 -e trace=sendto,recvfrom,pselect6 -yy -o /dev/stdout pgbench -j 1 -c 3 -t 1 -nf /dev/stdin <<< "select pg_sleep(1)" | grep 5432

sendto(3<TCPv6:[[::1]:41360->[::1]:5432]>, "Q\0\0\0\30select pg_sleep(1)\n\0", 25, MSG_NOSIGNAL, NULL, 0) = 25 <0.000116>
recvfrom(3<TCPv6:[[::1]:41360->[::1]:5432]>, 0xaaac8fb02dd0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000008>
sendto(4<TCPv6:[[::1]:41362->[::1]:5432]>, "Q\0\0\0\30select pg_sleep(1)\n\0", 25, MSG_NOSIGNAL, NULL, 0) = 25 <0.000037>
recvfrom(4<TCPv6:[[::1]:41362->[::1]:5432]>, 0xaaac8fb11750, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000007>
sendto(5<TCPv6:[[::1]:41364->[::1]:5432]>, "Q\0\0\0\30select pg_sleep(1)\n\0", 25, MSG_NOSIGNAL, NULL, 0) = 25 <0.000026>
recvfrom(5<TCPv6:[[::1]:41364->[::1]:5432]>, 0xaaac8fb1c180, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000005>

pselect6(6, [3<TCPv6:[[::1]:41360->[::1]:5432]> 4<TCPv6:[[::1]:41362->[::1]:5432]> 5<TCPv6:[[::1]:41364->[::1]:5432]>], NULL, NULL, NULL, NULL) = 2 (in [4 5]) <1.001285>

recvfrom(4<TCPv6:[[::1]:41362->[::1]:5432]>, "T\0\0\0!\0\1pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0\n\0\1\0\0\0\0C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 16384, 0, NULL, NULL) = 65 <0.000009>
sendto(4<TCPv6:[[::1]:41362->[::1]:5432]>, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5 <0.000327>
recvfrom(5<TCPv6:[[::1]:41364->[::1]:5432]>, "T\0\0\0!\0\1pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0\n\0\1\0\0\0\0C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 16384, 0, NULL, NULL) = 65 <0.000009>
sendto(5<TCPv6:[[::1]:41364->[::1]:5432]>, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5 <0.000029>
pselect6(4, [3<TCPv6:[[::1]:41360->[::1]:5432]>], NULL, NULL, NULL, NULL) = 1 (in [3]) <0.000009>
recvfrom(3<TCPv6:[[::1]:41360->[::1]:5432]>, "T\0\0\0!\0\1pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0\n\0\1\0\0\0\0C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 16384, 0, NULL, NULL) = 65 <0.000014>
sendto(3<TCPv6:[[::1]:41360->[::1]:5432]>, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5 <0.000035>

I can clearly see 3 calls sendto(...->...5432...Q...select pg_sleep(1) returning immediately. Then pselect6(...:5432...:5432...:5432)...<1.001285> waiting for the first response from one of them, which takes 1 second. And then receiving the results with recvfrom(...) from each one.

Those are asynchronous calls, and I know many developers who have been waiting for this for years in other databases.

If I add -k to strace I can get the call stack:

sendto(5<TCPv6:[[::1]:42120->[::1]:5432]>, "Q\0\0\0\30select pg_sleep(1)\n\0", 25, MSG_NOSIGNAL, NULL, 0) = 25 <0.000031>
 > /usr/lib64/libpthread-2.28.so(__send+0x34) [0x11a2c]
 > /usr/lib64/libpq.so.5.13(pqsecure_raw_write+0x6f) [0x1f52f]
 > /usr/lib64/libpq.so.5.13(pqSendSome+0x77) [0x19547]
 > /usr/lib64/libpq.so.5.13(PQsendQuery+0x7b) [0x14f3b]
 > /usr/bin/pgbench(threadRun+0x12e7) [0x84b7]
 > /usr/bin/pgbench(main+0x16a7) [0x4867]
 > /usr/lib64/libc-2.28.so(__libc_start_main+0xe3) [0x20e63]
 > /usr/bin/pgbench(_start+0x33) [0x5653]
 > /usr/bin/pgbench(_start+0x33) [0x5653]
 > No DWARF information found
recvfrom(5<TCPv6:[[::1]:42120->[::1]:5432]>, 0xaaadd068c180, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000004>
 > /usr/lib64/libpthread-2.28.so(recv+0x34) [0x11864]
 > /usr/lib64/libpq.so.5.13(pqsecure_raw_read+0x3b) [0x1f233]
 > /usr/lib64/libpq.so.5.13(pqReadData+0xab) [0x192fb]
 > /usr/lib64/libpq.so.5.13(PQconsumeInput+0x23) [0x1536b]
 > /usr/bin/pgbench(threadRun+0x853) [0x7a23]
 > /usr/bin/pgbench(main+0x16a7) [0x4867]
 > /usr/lib64/libc-2.28.so(__libc_start_main+0xe3) [0x20e63]
 > /usr/bin/pgbench(_start+0x33) [0x5653]
 > /usr/bin/pgbench(_start+0x33) [0x5653]
 > No DWARF information found
pselect6(6, [3<TCPv6:[[::1]:42116->[::1]:5432]> 4<TCPv6:[[::1]:42118->[::1]:5432]> 5<TCPv6:[[::1]:42120->[::1]:5432]>], NULL, NULL, NULL, NULL) = 1 (in [3]) <1.000339>
 > /usr/lib64/libc-2.28.so(__select+0x74) [0xcaa5c]
 > /usr/bin/pgbench(threadRun+0x14c3) [0x8693]
 > /usr/bin/pgbench(main+0x16a7) [0x4867]
 > /usr/lib64/libc-2.28.so(__libc_start_main+0xe3) [0x20e63]
 > /usr/bin/pgbench(_start+0x33) [0x5653]
 > /usr/bin/pgbench(_start+0x33) [0x5653]
 > No DWARF information found
recvfrom(3<TCPv6:[[::1]:42116->[::1]:5432]>, "T\0\0\0!\0\1pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0\n\0\1\0\0\0\0C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 16384, 0, NULL, NULL) = 65 <0.000042>
 > /usr/lib64/libpthread-2.28.so(recv+0x34) [0x11864]
 > /usr/lib64/libpq.so.5.13(pqsecure_raw_read+0x3b) [0x1f233]
 > /usr/lib64/libpq.so.5.13(pqReadData+0xab) [0x192fb]
 > /usr/lib64/libpq.so.5.13(PQconsumeInput+0x23) [0x1536b]
 > /usr/bin/pgbench(threadRun+0x853) [0x7a23]
 > /usr/bin/pgbench(main+0x16a7) [0x4867]
 > /usr/lib64/libc-2.28.so(__libc_start_main+0xe3) [0x20e63]
 > /usr/bin/pgbench(_start+0x33) [0x5653]
 > /usr/bin/pgbench(_start+0x33) [0x5653]
 > No DWARF information found

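The libpq C library used by pgbench has an asynchronous API: the call stacks above show PQsendQuery sending the query and PQconsumeInput (which calls the internal pqReadData) reading the response.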

`-j` `--jobs` number of threads (default: 1)

Then why run multiple threads on the client? You probably don't need to, as one thread can handle hundreds of asynchronous calls.

First, the threads cannot share the connections so you cannot have more client threads than server connections:

pgbench -j 2 -c 1 -T 30 -nf /dev/stdin <<< "select pg_sleep(1)"

transaction type: /dev/stdin
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 30
latency average = 1005.422 ms
tps = 0.994607 (including connections establishing)
tps = 0.994731 (excluding connections establishing)

pgbench has silently reduced --jobs to match --client (you can see it in "number of threads: 1"). The connections defined by --client are distributed among the threads defined by --jobs, and it makes no sense to have threads with no connections. However, you can have many connections per thread, as we have seen above. This still stresses the database with concurrent executions thanks to asynchronous calls.
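
To picture how connections end up on threads, here is a minimal sketch. It assumes the connections are dealt as evenly as possible and that the thread count is capped at the connection count, as the run above suggests. The function name and the exact dealing order are my assumptions, not taken from the pgbench source:

```shell
# Hypothetical illustration: spread --client connections over --jobs threads.
# Assumes an even split, with the first threads taking the remainder.
distribute_clients() {
  clients=$1
  jobs=$2
  # A thread without any connection makes no sense: cap jobs at clients
  if [ "$jobs" -gt "$clients" ]; then jobs=$clients; fi
  base=$(( clients / jobs ))
  extra=$(( clients % jobs ))
  i=0
  while [ "$i" -lt "$jobs" ]; do
    if [ "$i" -lt "$extra" ]; then n=$(( base + 1 )); else n=$base; fi
    echo "thread $i: $n connection(s)"
    i=$(( i + 1 ))
  done
}

distribute_clients 1 2     # one thread only, like the run above
distribute_clients 100 1   # one thread multiplexing 100 connections
distribute_clients 10 4    # 3, 3, 2 and 2 connections
```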

So what's the point of --jobs? My example ran a script that takes a long time in the database (1 second) compared to the client work, and that's why one client thread (--jobs=1) can serve many connections (--client=100) without being the bottleneck. However, if you run really short queries over many connections, the work on the client side can be significant. And as the goal of pgbench is to stress the database, you may need more threads. Don't forget that if you want to stress the database CPU you will probably not run pgbench on the database server. But then you need more connections because there's a network component in the latency.

I'm taking an extreme example here where my custom script doesn't even call the database but takes 1 second of client time:

pgbench -j 1 -c 100 -T 30 -nf /dev/stdin <<< "\shell sleep 1"

transaction type: /dev/stdin
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 1
duration: 30 s
number of transactions actually processed: 29
latency average = 446722.970 ms
tps = 0.223852 (including connections establishing)
tps = 0.223857 (excluding connections establishing)

My single thread throttles the throughput: when the client-side processing takes 1 second, one thread can complete at most about 30 transactions in 30 seconds (29 were processed here).
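
The same reasoning gives a rough ceiling for any number of threads. This is a back-of-the-envelope sketch that assumes each thread runs its client-side work serially and that nothing else is limiting. `client_side_ceiling` is a hypothetical helper name:

```shell
# Hypothetical helper: upper bound on the transactions completed in a run
# when each one needs client_work_ms of blocking client-side work.
# Arguments: jobs (threads), client_work_ms, duration_s
client_side_ceiling() {
  awk -v j="$1" -v w="$2" -v d="$3" 'BEGIN { printf "%d\n", j * d * 1000 / w }'
}

client_side_ceiling 1 1000 30   # prints 30: one thread, as in the run above
client_side_ceiling 4 1000 30   # prints 120: four threads working in parallel
```

With -j 4 the same script could, under these assumptions, complete up to four times as many transactions in the same duration, which is the case where --jobs helps.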

key points:

-c --client is what drives the number of sessions on the server
-j --jobs can be used if the coordination from pgbench is a bottleneck

I'll share more about pgbench, because benchmarks mean nothing if we don't understand exactly what is run and how. And pgbench, using libpq, with custom scripts, is great to show the different ways to run SQL efficiently. So I'm flagging this as the first post of a series.
