This is a continuation of my series of articles about horizontal scaling of databases.
In the first part, we discussed these topics in theory, including consistent hashing, the pitfalls of traditional hashing, and the challenges that sharding introduces at the application layer. Check it out if you haven’t already before moving forward.
In this section, we will focus on the practical implementation of PostgreSQL sharding with a FastAPI backend application.
As a demonstration, we'll build a link shortener app to avoid distractions from the business logic and focus more on the infrastructure and concepts of distributed database systems.
Here is the repository with the complete code: https://github.com/Artemooon/postgres-shards
Let's start with the infrastructure setup
First, we need to start a local cluster with Postgres instances. For this, we will use Docker.
Let's create a Dockerfile for a single Postgres instance:
FROM postgres
COPY init_tables.sql /docker-entrypoint-initdb.d
Here we use the latest PostgreSQL image and copy our SQL script into the /docker-entrypoint-initdb.d directory.
When the container runs for the first time, Postgres automatically executes any .sql files found in /docker-entrypoint-initdb.d.
So when we start our shards, this script creates the needed table for us.
This is what we have in the init_tables.sql file: we just create a table where we plan to store our URLs.
Later, we will use URL_ID as our key for the hash ring.
CREATE TABLE URL_TABLE
(
    id serial NOT NULL PRIMARY KEY,
    URL text,
    URL_ID varchar(5)
);
After you've added these files, run this command in the directory containing your Dockerfile:
docker build -t my-postgres-shard-image .
Then, to create and run the Docker container, run:
docker run --name pgshard1 -p 5434:5432 -d -e POSTGRES_PASSWORD=postgres my-postgres-shard-image
Note: Repeat this command for as many instances as you want in your cluster. Make sure the container name and the host port (the left side of -p) are unique for each instance.
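For example, with the pgshard1 container above on host port 5434, two more shards (the container names here are just examples) would cover the ports 5432, 5433, and 5434 that we will reference later in db_configs:
docker run --name pgshard2 -p 5433:5432 -d -e POSTGRES_PASSWORD=postgres my-postgres-shard-image
docker run --name pgshard3 -p 5432:5432 -d -e POSTGRES_PASSWORD=postgres my-postgres-shard-image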
Creating FastAPI app
We will use the uv package and project manager to bootstrap our project.
- Install uv
pip install uv
- Create a new project
uv init sharding-app
cd sharding-app
- Add FastAPI and Uvicorn
uv add fastapi uvicorn
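The application will also need asyncpg and uhashring later on, so, assuming the same uv workflow, you can add them right away:
uv add asyncpg uhashring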
This will be the content of our main.py file:
import uvicorn
from fastapi import FastAPI
from routes.url_routes import urls_router
app = FastAPI()
app.include_router(urls_router, prefix="/urls")
if __name__ == "__main__":
    uvicorn.run("main:app", host="localhost", port=5001, reload=True)
Here we define a command to start our application with the Uvicorn server, and we also register the router for our URL endpoints.
Then we need to create a connection to our Postgres cluster. Since we are using an asynchronous Python framework, it is better to use an asynchronous PostgreSQL driver as well; the asyncpg module helps with this.
import asyncpg

db_configs = {
    "5432": dict(user='postgres', password='postgres', database='postgres', host='127.0.0.1', port=5432),
    "5433": dict(user='postgres', password='postgres', database='postgres', host='127.0.0.1', port=5433),
    "5434": dict(user='postgres', password='postgres', database='postgres', host='127.0.0.1', port=5434),
}


def get_db_connector() -> callable:
    connection_cache: dict[str, asyncpg.Connection] = {}

    async def connector(port: str) -> asyncpg.Connection:
        # Create or reuse connection
        conn = connection_cache.get(port)
        if not conn or conn.is_closed():
            conn = await asyncpg.connect(**db_configs[port])
            connection_cache[port] = conn
        return conn

    return connector
Let me explain this code:
This code defines an asynchronous database connection manager for multiple PostgreSQL instances using asyncpg.
- db_configs — a dictionary with connection settings for three PostgreSQL servers running on ports 5432, 5433, and 5434.
- get_db_connector() — returns a function (connector) that can be used to get a connection for a specific port.
Inside connector(port):
- It checks whether a connection for that port already exists and is still open.
- If not, it creates a new async connection using asyncpg.connect() and caches it in connection_cache.
- Then it returns the active connection.
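To make the flow concrete, here is a minimal standalone sketch (not part of the repository) of how the connector could be used outside FastAPI, assuming the shards from db_configs are already running:

import asyncio
from db_connection import get_db_connector

async def main():
    get_connection = get_db_connector()
    # Ask for the shard listening on host port 5432; the connection is cached for reuse
    conn = await get_connection("5432")
    print(await conn.fetchval("SELECT version()"))
    await conn.close()

asyncio.run(main())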
Hash ring
Next, we need a hash ring to decide which database instance should handle a given request.
from uhashring import HashRing
from db_connection import db_configs
db_hr = HashRing(nodes=list(db_configs.keys()))
We use HashRing from the uhashring library to distribute requests consistently across multiple PostgreSQL instances (running on ports 5432, 5433, and 5434). This ensures that each key is always mapped to the same database node, and when nodes are added or removed, only a minimal portion of keys need to be remapped.
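As a quick sanity check, here is a small sketch (not in the repository) of how keys map to nodes with uhashring; which node is returned depends on the ring, so treat the values in the comments as illustrative:

from shard_hashing import db_hr

# get_node() returns just the node name (the port string we registered),
# while get() returns the full node configuration dict, including "hostname".
print(db_hr.get_node("0OGWo"))         # e.g. "5433"
print(db_hr.get("0OGWo")["hostname"])  # the same node every time for the same key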
Check out my previous article on consistent hashing, where I explained how hash rings work and why they’re ideal for distributed systems.
Adding endpoints and business logic
import base64
import hashlib

from fastapi import APIRouter, Depends, HTTPException

from db_connection import get_db_connector
from shard_hashing import db_hr

urls_router = APIRouter()


@urls_router.get("/{url_id}")
async def get_url(url_id: str, get_connection=Depends(get_db_connector)):
    # The hash ring maps the url_id to the shard (port) that owns it
    db_server_port = db_hr.get(url_id)["hostname"]
    conn = await get_connection(db_server_port)
    try:
        url_data = await conn.fetchrow("SELECT URL, URL_ID FROM url_table WHERE URL_ID=$1", url_id)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"DB query failed: {str(e)}")
    finally:
        await conn.close()
    if not url_data:
        raise HTTPException(status_code=404, detail=f"Url with id: {url_id} was not found")
    return {
        "url_id": url_data.get("url_id"),
        "url": url_data.get("url"),
        "server": db_hr.get(url_id)
    }


@urls_router.post("")
async def create_url(full_url: str, get_connection=Depends(get_db_connector)):
    # Derive a short id from the URL: SHA-256 hash, Base64-encoded, first 5 characters
    hash_bytes = hashlib.sha256(full_url.encode('utf-8')).digest()
    base64_hash = base64.b64encode(hash_bytes).decode('utf-8')
    url_id = base64_hash[0:5]
    # The same hash ring decides which shard stores this url_id
    db_server_port = db_hr.get(url_id)["hostname"]
    conn = await get_connection(db_server_port)
    try:
        await conn.execute("INSERT INTO url_table (URL, URL_ID) VALUES($1, $2)", full_url, url_id)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"DB insert failed: {str(e)}")
    return {
        "url_id": url_id,
        "url": full_url,
        "server": db_hr.get(url_id)
    }
In this final part, we define two FastAPI endpoints that use the hash ring and database connector to distribute and retrieve data across multiple PostgreSQL instances.
GET /{url_id}
This endpoint retrieves a stored URL by its unique url_id.
It uses the hash ring (db_hr) to determine which database node (port) should hold the record for that specific url_id.
It then fetches the connection from the async connection manager (get_db_connector).
Then an SQL query retrieves the URL record.
If no record is found, it returns a 404 error; otherwise, it responds with the URL data and the database node info.
POST /
This endpoint creates a new short URL record.
It takes the original full_url, hashes it with SHA-256, Base64-encodes the digest, and uses the first five characters as a short url_id (a standalone sketch of this derivation follows below).
The url_id is then passed through the hash ring to select which PostgreSQL instance will store it.
The URL and its ID are inserted into that database.
Finally, it returns the created record along with the server node info.
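Here is that derivation on its own, wrapped in a hypothetical helper called make_url_id that does not exist in the repository:

import base64
import hashlib

def make_url_id(full_url: str) -> str:
    # SHA-256 of the URL, Base64-encoded, truncated to the first 5 characters
    hash_bytes = hashlib.sha256(full_url.encode("utf-8")).digest()
    return base64.b64encode(hash_bytes).decode("utf-8")[:5]

print(make_url_id("https://www.postgresql.org/"))  # a deterministic 5-character id for this URL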
Combined, these pieces form an application that works with a database cluster and uses consistent hashing to route each request to the right node.
Additional complications
As you saw, sharding adds complexity at the application layer by default. You now need to handle this complexity in your code, such as using consistent hashing, managing multiple database connections, and ensuring proper data distribution across nodes.
Sharding also brings the extra cost of maintaining multiple databases instead of one.
Testing our API
Let's start our app and perform some requests.
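If the server isn't running yet, one way to start it, assuming the uv setup from earlier, is:
uv run main.py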
curl --location --request POST 'http://localhost:5001/urls?full_url=https://www.postgresql.org/'
curl --location --request POST 'http://localhost:5001/urls?full_url=https://www.google.com/'
curl --location --request POST 'http://localhost:5001/urls?full_url=https://www.dev.to/'
This will create new URLs in our database cluster. When creating new records, you may see a different server in the server information object, because we pass the unique url_id to the hash ring, which returns the server responsible for that hash.
When fetching a full URL by url_id, it will always be retrieved from the same server, since the hash ring consistently maps that url_id to the same node.
curl --location --request GET 'http://localhost:5001/urls/0OGWo'
Conclusion
In this tutorial, we built a minimal working FastAPI application that uses database sharding by combining a hash ring for consistent key distribution and a connection manager for async PostgreSQL access. With this application, you now have a basic template and understanding of how horizontal database scaling works.