REST Data Service on YugabyteDB / PostgreSQL with PostgREST

#postgres #yugabytedb #rest #sql

In this series I defined an API Rate Limiter with a simple table + function. I still have the table rate_limiting_token_bucket and function rate_limiting_request created, you can find their definitions in Part 2. To call it, I'd created a small Java program in Part 3 but we don't need it here. I'll call my function from a REST API and this very simple to implement, with no code. I'll use PostgREST, a standalone web server that turns your PostgreSQL database directly into a RESTful API. And what works on PostgreSQL can also run on YugabyteDB. I have a 3 nodes cluster setup and I'll install PostgREST directly on each node, connecting to the local endpoint. YugabyteDB is a distributed SQL database, you can connect to any node and see the whole database because there is a stateless PostgreSQL on top of the distributed storage and transaction layer.

The simplest install ever

The install is easy, just check the latest release and it extracts to a single executable:

curl -qsL https://github.com/PostgREST/postgrest/releases/download/v9.0.0/postgrest-v9.0.0-linux-static-x64.tar.xz | \
tar -xJvf -

This creates a 16MB postgrest binary in the current directory.

The simplest config ever

Better than an online help, the executable can generate a configuration file template ready to customize:

./postgrest -e > my_rest_configuration.conf

You can update this configuration, defining at least the connection url db-uri in the libpq format.

But I'll do even simpler for this demo, using environment variables only:

PGRST_DB_URI="postgres://yugabyte:yugabyte@$(hostname):5433/yugabyte" \
PGRST_DB_SCHEMAS="public" PGRST_DB_ANON_ROLE="yugabyte" \
PGRST_DB_CHANNEL_ENABLED=false ./postgrest &

This starts the PostgREST server with a pool of connections to the local endpoint, database "yugabyte", schema "public". I disabled the "DB Channel" because LISTEN/NOTIFY is not enabled in YugabyteDB (this mecanism is not appropriate for a distributed database).

If you created the procedure after starting the PostgREST deamon, you need to refresh the cache with: pkill -10 postgrest to signal for cache re-load.

I'm keeping it simple for this demo, but of course, it can be configured for performance and security. With all defaults, it listens to port 3000.

REST calls

I can now use this REST proxy to call my rate limiting function.

There is nothing more to define because PostgREST reads the schema. The namespace of the stored procedures and functions is exposed in http://localhost:3000/rpc and the parameters are passed as a JSON document. Here is a call getting a token for "user1" with a refill rate of 1 token per second:

echo $( 
curl -qs \
http://localhost:3000/rpc/rate_limiting_request \
-X POST -H "Content-Type: application/json" -d '
{ "rate_id": "user1", "refill_per_sec": 1 }
')

Here is the result:

I have 60 tokens remaining because my function has a default window of 60 seconds, filled with the rate of 1 tocken per second.

Threads

From my Java program and with this cluster, I was able to get 1000 tokens per second from threads requesting tokens from different users and 120 per second. Here is a python script to do the same though the REST API:

import threading
import time
import requests
import sys

def request_tokens(id,num):
 tokens=0
 start = time.time()
 for i in range(1,num):
  response=requests.post('http://localhost:3000/rpc/rate_limiting_request', json={ "rate_id": id, "refill_per_sec": 1000000 })
  if response.status_code==200:
   tokens=tokens+1
  print("{0} tokens/s".format(tokens/(time.time()-start)))

threads=[]
for i in range(0,int(sys.argv[1])):
 threads.append(threading.Thread(target=request_tokens,args=(i,1000)))
for t in threads:
    t.start()
total=0
for t in threads:
    t.join()

This runs 1000 loops in 50 thread:

python franck.py 50

What is interesting is that the overhead is very small:

On this node, the postgrest deamon uses less CPU than the calling program python and the database itself yb-tserver. You can see 10 postgres backends because PostgREST has a connection pool, which is the best to guarantee predictable performances.