<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Meg528</title>
    <description>The latest articles on DEV Community by Meg528 (@meg528).</description>
    <link>https://dev.to/meg528</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1096069%2Fdad12650-7cea-4bd0-a782-2c3548aebcfe.jpeg</url>
      <title>DEV Community: Meg528</title>
      <link>https://dev.to/meg528</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/meg528"/>
    <language>en</language>
    <item>
      <title>RLS sounds great until it isn't</title>
      <dc:creator>Meg528</dc:creator>
      <pubDate>Mon, 11 May 2026 16:06:32 +0000</pubDate>
      <link>https://dev.to/planetscale/rls-sounds-great-until-it-isnt-4d5p</link>
      <guid>https://dev.to/planetscale/rls-sounds-great-until-it-isnt-4d5p</guid>
      <description>&lt;p&gt;&lt;em&gt;By Josh Brown&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you leave your house, go to sleep, or go do work in the yard, you lock your door. Maybe you have a gate or fence you lock too. Without these, anyone can waltz into your house and snoop around.&lt;/p&gt;

&lt;p&gt;Row Level Security (RLS) can be attractive to developers for numerous reasons, but the foot-guns and gotchas in RLS often outweigh the benefits. You probably want to keep your doors locked.&lt;/p&gt;

&lt;h2&gt;
  
  
  Friends and family: Managing access
&lt;/h2&gt;

&lt;p&gt;RLS for Postgres lets administrators define security policies in their database, instead of the application layer. Let's imagine your house is your database, and the rows, tables, and data are like the things inside.&lt;/p&gt;

&lt;p&gt;When your friends or family come over, you give them keys to every drawer they are allowed to have access to. Maybe everyone gets access to the silverware, but only the family can access your laundry room.&lt;/p&gt;

&lt;p&gt;This is similar to how policies work in RLS. The rules for who gets which keys are your policies. If a user passes a policy rule (has the key) then they are allowed to access the data. At a very small scale, this can seem like a great idea. Anyone can access your database however they want and your policies ensure they aren't seeing things they shouldn't.&lt;/p&gt;

&lt;p&gt;Testing and scaling these policies as your database grows becomes nearly impossible. For every new feature in your application, you must ensure your RLS policies are protecting the correct rows. Remembering to add these policies can be cumbersome, especially when they need to be manually synced to your codebase.&lt;/p&gt;

&lt;p&gt;RLS fundamentally exists to protect your data. If you mess up even a single policy, however, your data is exposed. Managing access in the same location your code lives is much easier than remembering to write a new policy every time a new table, column, or feature is added to your product.&lt;/p&gt;
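&lt;p&gt;As a concrete sketch of that overhead (assuming a hypothetical &lt;code&gt;invoices&lt;/code&gt; table), every new table needs both of the following statements. Forgetting the first leaves the table unprotected; forgetting the second denies all access to non-owners by default:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- RLS is off by default; each new table must opt in
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Each table also needs its own policy, kept in sync with the application
CREATE POLICY tenant_isolation ON invoices
  FOR ALL USING (tenant_id = current_setting('app.tenant_id')::bigint);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;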

&lt;h2&gt;
  
  
  The party: Managing connections
&lt;/h2&gt;

&lt;p&gt;Postgres uses a process-per-connection architecture. Each new user connecting to your database directly with their role is like a new person coming into your house. At first it's fine, but once you have 100 people it gets crowded pretty quickly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pscale.link/rlC77qY" rel="noopener noreferrer"&gt;PgBouncer&lt;/a&gt; is a connection pooler that reuses a small number of direct connections to your database while letting many clients connect to it. When using PgBouncer with RLS, you lose the upstream identity of the client.&lt;/p&gt;

&lt;p&gt;The traditional way of solving this is using local variables instead of roles to define RLS policies. You define a policy that reads from a session-local variable instead of checking the Postgres role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;user_isolation&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.tenant_id'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then wrap every transaction in your application to set that variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;LOCAL&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'1234'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This requires a lot of extra application code to manage all the different local variables attached to each and every transaction (1). If &lt;code&gt;SET LOCAL&lt;/code&gt; is omitted, &lt;code&gt;current_setting()&lt;/code&gt; returns an empty string or throws an error depending on how your policy is written.&lt;/p&gt;
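&lt;p&gt;One defensive pattern is the two-argument form of &lt;code&gt;current_setting()&lt;/code&gt;, which returns &lt;code&gt;NULL&lt;/code&gt; instead of raising an error when the variable is unset, so the policy fails closed rather than failing the query. A sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- current_setting(name, missing_ok): NULL when the variable is unset,
-- so the comparison is never true and no rows are returned
CREATE POLICY user_isolation ON orders
  FOR ALL USING (
    user_id = NULLIF(current_setting('app.tenant_id', true), '')::bigint
  );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;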

&lt;h2&gt;
  
  
  Annoying neighbor: Attack Surface
&lt;/h2&gt;

&lt;p&gt;You go out to get your mail and you find your neighbor standing over your mailbox trying to open it over and over. You try to tell them that one is yours and to let you in, but they are having none of it. Now you have to sit and wait until they get bored and figure out they don't have the right key.&lt;/p&gt;

&lt;p&gt;RLS acts like an extra &lt;code&gt;WHERE&lt;/code&gt; clause appended to your queries. As long as a user has read permission on a table, their queries still execute even when no rows are returned. On complex joins or queries lacking indexes, this can hurt database performance.&lt;/p&gt;

&lt;p&gt;If a malicious user starts retrying a query over and over, RLS will make sure they don't see any data, but cannot stop them from running the query itself. Relying on RLS to completely protect your tables burns valuable CPU cycles and can potentially starve your other, honest users.&lt;/p&gt;

&lt;p&gt;Any user of your application, particularly in situations where you do not have sufficient rate limiting in place, can DDoS your database simply by hitting an API endpoint. This is preventable by checking authorization at the application layer to see if a user is allowed to run a query, without relying on RLS to manage your security for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  A large keyring: Performance Implications
&lt;/h2&gt;

&lt;p&gt;Every time your friend goes to get a Diet Coke, they need to find the fridge key on their very large key chain. This wastes valuable time sifting through all the different keys and trying each one, so instead they mark the key so it's easier to find next time they go to the fridge.&lt;/p&gt;

&lt;p&gt;RLS policies are generally executed per row (2), meaning any function or complex logic will run for each row scanned. This can be solved by wrapping function calls in subqueries. Setting up a simple benchmark, we can see the difference between plain RLS, RLS with caching, and RLS disabled. If you want to try it yourself, you can use &lt;a href="https://github.com/planetscale/rls-latency-benchmark" rel="noopener noreferrer"&gt;this benchmark repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filbnobtbpyvcmgd7nrzy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filbnobtbpyvcmgd7nrzy.png" alt="PostgreSQL RLS benchmark" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this benchmark, we tested five different setups: two different functions (&lt;code&gt;VOLATILE&lt;/code&gt; and &lt;code&gt;STABLE&lt;/code&gt;), each called from a policy with and without subquery caching, plus one setup with no RLS at all.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;RLS with a &lt;code&gt;VOLATILE&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;RLS with a &lt;code&gt;STABLE&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;RLS with a &lt;code&gt;VOLATILE&lt;/code&gt; function + cache&lt;/li&gt;
&lt;li&gt;RLS with a &lt;code&gt;STABLE&lt;/code&gt; function + cache&lt;/li&gt;
&lt;li&gt;No RLS&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A volatile function is defined with the keyword &lt;code&gt;VOLATILE&lt;/code&gt;, which tells Postgres the function may modify data or return different values on successive calls. This is the default volatility for a new function in Postgres.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;get_current_role&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;
&lt;span class="k"&gt;VOLATILE&lt;/span&gt;
&lt;span class="k"&gt;SECURITY&lt;/span&gt; &lt;span class="k"&gt;DEFINER&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The other option is to use &lt;code&gt;STABLE&lt;/code&gt; in our function definition. Stable functions cannot modify data and are expected to return the same value for successive calls within the same transaction. When evaluating an RLS policy, however, Postgres still calls the function for every row rather than caching its result. To successfully cache the result across policy evaluations, we need to trick Postgres.&lt;/p&gt;

&lt;p&gt;When we wrap the function call in a &lt;code&gt;SELECT&lt;/code&gt;, Postgres creates an &lt;code&gt;InitPlan&lt;/code&gt; query node type. By default, anything after the &lt;code&gt;USING&lt;/code&gt; keyword is executed as a &lt;code&gt;SubPlan&lt;/code&gt; type, where Postgres expects that the outcome can change from row to row. This is usually what we want, since that is exactly what the policy is checking: for every row, should the user be allowed to fetch it?&lt;/p&gt;

&lt;p&gt;An &lt;code&gt;InitPlan&lt;/code&gt; is only run once per execution of the outer plan, and cached for reuse in later rows of the evaluation. Using &lt;code&gt;EXPLAIN&lt;/code&gt;, we can see how the different policy definitions change the estimated cost.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- RLS without subquery: no InitPlan, high cost
CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')::bigint AND get_current_role() = 'admin');
EXPLAIN:
    Aggregate  (cost=34828.68..34828.69 rows=1 width=40)
      -&amp;gt;  Index Scan using orders_tenant_id_idx on orders  (cost=0.43..34826.20 rows=495 width=6)
            Index Cond: (tenant_id = (current_setting('app.tenant_id'::text))::bigint)
            Filter: (get_current_role() = 'admin'::text)

-- RLS with subquery: Initplan caches result, lower cost
CREATE POLICY tenant_isolation ON orders USING  (tenant_id = current_setting('app.tenant_id')::bigint AND (SELECT get_current_role()) = 'admin');
EXPLAIN:
    Aggregate  (cost=10095.69..10095.70 rows=1 width=40)
      InitPlan 1
        -&amp;gt;  Result  (cost=0.00..0.26 rows=1 width=32)
      -&amp;gt;  Index Scan using orders_tenant_id_idx on orders  (cost=0.43..10092.95 rows=495 width=6)
            Index Cond: (tenant_id = (current_setting('app.tenant_id'::text))::bigint)
            Filter: ((InitPlan 1).col1 = 'admin'::text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;cost=&lt;/code&gt; values in the &lt;code&gt;EXPLAIN&lt;/code&gt; output are Postgres' estimates of how expensive a query will be to run, in arbitrary units. The first number is the estimated startup cost: the work, such as sorting and filtering, that must happen before the first row can be returned. The second number is the estimated total cost, including fetching all the rows. The &lt;code&gt;rows=&lt;/code&gt; and &lt;code&gt;width=&lt;/code&gt; values are the expected number of rows the query will return and the width of those rows in bytes, respectively.&lt;/p&gt;

&lt;p&gt;When Postgres doesn't think it can cache the inner query, the estimated cost is over 3x higher than when it can. In reality, the actual latency difference is much larger than 3x, as seen in the chart above.&lt;/p&gt;

&lt;p&gt;When Postgres doesn't cache expensive functions in your policy definitions, RLS becomes expensive overhead. In some scenarios, RLS can be just as fast as not using it at all. The issue is that RLS becomes yet another layer of code that needs to be continuously optimized, where small mistakes can cause large performance hits.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's your house: Permission ownership
&lt;/h2&gt;

&lt;p&gt;It's your house, you obviously have the keys to everything, but what if you weren't supposed to?&lt;/p&gt;

&lt;p&gt;Every Postgres table has an owner. Normally you'd control table and row access on a per-Postgres-role basis; however, when you connect to Postgres as the owning role of a table, none of its RLS policies apply. You must explicitly opt in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;FORCE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even this may not be sufficient if you are connected with the Postgres superuser role. Any roles that contain the &lt;code&gt;SUPERUSER&lt;/code&gt; attribute will always bypass RLS. This is easy to miss and easy to test incorrectly. Your policy tests might pass under a non-owner role while production traffic runs as the owner.&lt;/p&gt;
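&lt;p&gt;One way to audit this is to list every role that bypasses RLS outright, whether through &lt;code&gt;SUPERUSER&lt;/code&gt; or the separate &lt;code&gt;BYPASSRLS&lt;/code&gt; attribute:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Roles with either attribute ignore every RLS policy
SELECT rolname, rolsuper, rolbypassrls
FROM pg_roles
WHERE rolsuper OR rolbypassrls;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;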

&lt;h3&gt;
  
  
  Making a ham sandwich: Stricter patterns
&lt;/h3&gt;

&lt;p&gt;Let's say your friend Andy wants to make a ham sandwich. He has access to the fridge and utensils, but not your grocery list. When he makes his sandwich, he uses up all the mustard, and now you need to go get more. When using RLS, Andy's query can't touch our grocery list. We have to update that separately.&lt;/p&gt;

&lt;p&gt;Without RLS this is easy. When using RLS, doing this type of query can add a lot of complexity. Getting the utensils, making the sandwich, and updating the grocery list might not share the same permissions. While rows in one table may be accessible to a user, updating rows in another may not be. Since we own the grocery list, we don't want anyone touching it except in well defined scenarios.&lt;/p&gt;

&lt;p&gt;One way to solve this is by using multiple roles and multiple transactions, but this becomes overly cumbersome on our application layer. A better solution would be to add a &lt;code&gt;SECURITY DEFINER&lt;/code&gt; function in our database that gives roles access to modify or view data in a well defined way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;use_ingredients&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingredients&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt;
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;
&lt;span class="k"&gt;SECURITY&lt;/span&gt; &lt;span class="k"&gt;DEFINER&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="c1"&gt;-- Runs as the function owner, bypassing Andy's RLS policies&lt;/span&gt;
  &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;grocery_list&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ANY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingredients&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SECURITY DEFINER&lt;/code&gt; causes the function to run as its owner's role, bypassing RLS entirely for that operation. Now you're back to managing security on both RLS and your application layer, ensuring only specific parameters are allowed to pass to this function.&lt;/p&gt;

&lt;p&gt;Keeping database functions in version control also becomes difficult. Some migration tools can track SQL functions and policies, but they become yet another part of your schema migrations that can cause headaches down the road.&lt;/p&gt;

&lt;p&gt;Your application layer also needs to stay in sync with every function it calls in your database. Changing function definitions, names, or return values may require a new database migration, or delicate surgery to ensure a stable update.&lt;/p&gt;

&lt;h2&gt;
  
  
  End of the day
&lt;/h2&gt;

&lt;p&gt;Once we have managed to lock everything inside your house under a different key, and tracked who has which keys, who is allowed in, and who is delegating access to whom, we find our application code has almost as much logic as if it didn't have RLS at all.&lt;/p&gt;

&lt;p&gt;RLS policies themselves are stored in &lt;code&gt;pg_policies&lt;/code&gt; inside your database, not in your source code. Most standard migration tools don't track policy changes alongside schema changes. Policy migrations become a separate, manual process, and they drift. A schema change that adds a column or renames a table can silently break a policy that no one realizes is outdated until something breaks in our application, impacting users.&lt;/p&gt;
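&lt;p&gt;The catalog is the only reliable source of truth for what is actually deployed, which means auditing policies is a query, not a code review:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- What policies exist right now, on which tables, with what predicates?
SELECT schemaname, tablename, policyname, cmd, qual
FROM pg_policies;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;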

&lt;p&gt;Each query to the database will already need some sort of modifier in your application code to add local variables for user identification when using PgBouncer. Misconfigured local variables could be just as damaging as if RLS wasn't there to begin with.&lt;/p&gt;

&lt;p&gt;We still need to check early on if a user has permission to run a query, or else we risk allowing users to degrade our database performance with spam. If we are already checking permissions at the application layer, the benefits of RLS become harder to observe.&lt;/p&gt;

&lt;p&gt;Optimizing queries also becomes much harder. Queries are artificially restricted to what they are allowed to see, and need bespoke functions and permissions to gain access. Between policies, functions, and the mappings between them, source code and database logic become even harder to manage together.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to do it right
&lt;/h2&gt;

&lt;p&gt;At &lt;a href="https://pscale.link/7uQCLEy" rel="noopener noreferrer"&gt;PlanetScale&lt;/a&gt;, we typically recommend against relying on Postgres RLS. There may be occasional scenarios where it is useful, but implementing RLS correctly at scale quickly turns the benefits into drawbacks, adding overhead not only to performance but also to developer experience and complexity.&lt;/p&gt;

&lt;p&gt;Application-layer authorization like middleware, ORM-level scoping, or a dedicated permissions table keeps your logic visible, testable, and co-located with the code that uses it.&lt;/p&gt;

&lt;p&gt;Your database is more like a warehouse. Don't treat it like your house.&lt;/p&gt;

&lt;h2&gt;
  
  
  Footnotes
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Note that PgBouncer &lt;code&gt;pool_mode&lt;/code&gt; must be set to either &lt;code&gt;session&lt;/code&gt; or &lt;code&gt;transaction&lt;/code&gt;; &lt;code&gt;statement&lt;/code&gt; mode won't work with &lt;code&gt;SET LOCAL&lt;/code&gt; at all.&lt;/li&gt;
&lt;li&gt;The Postgres query planner can sometimes determine that a policy is safe to cache across evaluations on its own. Doing this properly can be a tricky process. Even in our benchmark example, functions that are marked as stable still need to be wrapped in a subquery in order for Postgres to properly cache the result. Each policy is different, and determining the proper optimizations for each one is another layer of complexity in your codebase.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>postgres</category>
      <category>planetscale</category>
      <category>webdev</category>
      <category>database</category>
    </item>
    <item>
      <title>High Memory Usage in Postgres is Good, Actually</title>
      <dc:creator>Meg528</dc:creator>
      <pubDate>Mon, 04 May 2026 15:56:50 +0000</pubDate>
      <link>https://dev.to/planetscale/high-memory-usage-in-postgres-is-good-actually-1i49</link>
      <guid>https://dev.to/planetscale/high-memory-usage-in-postgres-is-good-actually-1i49</guid>
      <description>&lt;p&gt;&lt;em&gt;By Simeon Griggs&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Houseplants often die from over-watering, not neglect. It is easy to project human needs onto them: "If I am thirsty, they must be thirsty too." But many indoor plants actually benefit from drying out between waterings.&lt;/p&gt;

&lt;p&gt;Similarly, your empathy can lead to misinterpreting signals from your database. You don't like feeling overwhelmed, so you don't want your database overwhelmed either.&lt;/p&gt;

&lt;p&gt;But not all usage is created equal, and memory in computers can be uniquely complex to understand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkunawlmnt166g5k0mfd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkunawlmnt166g5k0mfd.png" alt="PlanetScale Metal dashboard in dark mode" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A look at your PlanetScale dashboard might show memory usage sitting at 80%. That &lt;em&gt;looks&lt;/em&gt; bad, but it could actually be representative of a healthy system.&lt;/p&gt;

&lt;p&gt;To be clear, consistently high CPU usage is a problem. For as long as CPU stays high, queries wait longer, the slowest queries get slower, and you have less headroom for spikes.&lt;/p&gt;

&lt;p&gt;Memory is different. The percentage shown in the cluster diagram on your PlanetScale dashboard is measuring the entire node your database runs on, not just Postgres. When most RAM is in use, it usually means the system is keeping data close to the CPU so it does not have to read from disk as often. Unlike sustained high CPU, high memory usage by itself does not mean performance is degraded or that you are at immediate risk of "running out" of memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Postgres wants your memory
&lt;/h2&gt;

&lt;p&gt;Reading from disk is slower than reading from RAM, even with &lt;a href="https://planetscale.com/docs/metal" rel="noopener noreferrer"&gt;PlanetScale Metal&lt;/a&gt;'s locally attached NVMe drives. Postgres is designed to take advantage of that gap by caching as much data in memory as it can.&lt;/p&gt;

&lt;p&gt;There are two layers of caching at work, and both consume RAM.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;shared_buffers&lt;/code&gt; is Postgres' own buffer pool. When a query needs data, Postgres first checks this pool for the relevant pages, the fixed-size (8 KB by default) chunks of table and index data it works with, before reading from disk. The more of your working data that fits here, the fewer disk reads Postgres needs to perform.&lt;/p&gt;

&lt;p&gt;This parameter can be configured in the &lt;a href="https://planetscale.com/docs/postgres/cluster-configuration/parameters" rel="noopener noreferrer"&gt;cluster configuration&lt;/a&gt; page of the PlanetScale dashboard. The default value should be sufficient for most workloads, and modifying it should not be your first step in troubleshooting memory usage.&lt;/p&gt;

&lt;p&gt;The OS page cache is the second caching layer. Even when Postgres does go to disk, the operating system keeps a copy of the data it reads in RAM so the next access is faster. This is not a Postgres feature — it is standard Linux behavior. Postgres was designed with this in mind, and its own documentation notes that the operating system's cache is expected to handle data beyond what fits in &lt;code&gt;shared_buffers&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Between these two layers, a healthy Postgres server will use most of the available RAM. That is the goal, not a side effect. For context, reading a page from RAM is roughly 1,000 times faster than reading it from even a fast NVMe drive. A database that keeps frequently accessed data in memory avoids that penalty on every query.&lt;/p&gt;

&lt;p&gt;When caching is working well, the vast majority of page reads are served from memory without touching disk. If that ratio drops — because the working dataset has outgrown available memory, for example — queries slow down as Postgres waits on disk more often.&lt;/p&gt;
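&lt;p&gt;One rough way to sense-check this is the buffer cache hit ratio from &lt;code&gt;pg_stat_database&lt;/code&gt;. Note that &lt;code&gt;blks_hit&lt;/code&gt; only counts hits in &lt;code&gt;shared_buffers&lt;/code&gt;; reads served by the OS page cache still count as &lt;code&gt;blks_read&lt;/code&gt;, so the true share of in-memory reads is even higher than this number suggests:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Percentage of page reads served from shared_buffers since stats reset
SELECT sum(blks_hit) * 100.0
       / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_pct
FROM pg_stat_database;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;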

&lt;blockquote&gt;
&lt;p&gt;See our documentation on &lt;a href="https://planetscale.com/docs/postgres/monitoring/metrics#interpreting-metrics" rel="noopener noreferrer"&gt;"Normal operating ranges"&lt;/a&gt; to sense-check what values you should be seeing in Cluster Metrics for CPU, memory, and more.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Memory usage compared to CPU usage
&lt;/h2&gt;

&lt;p&gt;At a glance, CPU and memory usage numbers look comparable because they share a 0–100% scale, but they describe very different behavior.&lt;/p&gt;

&lt;p&gt;CPU is work. Sustained high CPU means the database is spending time on work it cannot skip. When CPU is saturated, queries arrive faster than they can be processed. They queue, latency climbs, and connection timeouts can cascade into application-level failures. There is no "good" kind of sustained high CPU usage.&lt;/p&gt;

&lt;p&gt;Memory is &lt;em&gt;workspace&lt;/em&gt;. Postgres and the OS use spare RAM to avoid expensive disk reads. Higher use improves performance ... most of the time.&lt;/p&gt;

&lt;p&gt;"Most of the time" because memory usage gets a little complicated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two kinds of memory usage
&lt;/h2&gt;

&lt;p&gt;The single “memory usage” percentage number combines two different behaviors.&lt;/p&gt;

&lt;p&gt;To explore that number in more detail, within the &lt;a href="https://planetscale.com/docs/postgres/monitoring/metrics" rel="noopener noreferrer"&gt;Cluster Metrics&lt;/a&gt; page of the PlanetScale dashboard, memory is shown as a stacked chart over time with four different categories: active cache, inactive cache, RSS, and memory mapped. These four categories can be grouped into two separate but equally important use-cases: cache and process memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tpz95vlpm63oevvo905.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tpz95vlpm63oevvo905.png" alt="PlanetScale Metal metrics memory" width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Cache (active, inactive, and memory mapped)
&lt;/h3&gt;

&lt;p&gt;Much of what looks like “used” memory on a healthy database host is cache: file data the operating system keeps in RAM after reads so the next access is cheap. You may see this referred to as "page cache" in other dashboards.&lt;/p&gt;

&lt;p&gt;Active cache is data the OS recently touched and wants to keep around. Inactive cache hasn't been accessed lately. Memory-mapped pages are cached pages that are backed by real files on disk.&lt;/p&gt;

&lt;p&gt;All three of these cache types are reclaimable by the operating system and can be dropped when something else needs RAM.&lt;/p&gt;

&lt;p&gt;If total memory is high because cache is high, good! Frequently accessed data stays near the CPU for faster access.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Process memory (RSS)
&lt;/h3&gt;

&lt;p&gt;Separately, Postgres holds memory for processes that are actually using it. You will see this referred to as RSS (Resident Set Size) in the PlanetScale dashboard.&lt;/p&gt;

&lt;p&gt;This memory is not reclaimable by the operating system and is what increases &lt;a href="https://planetscale.com/docs/postgres/troubleshooting/out-of-memory" rel="noopener noreferrer"&gt;out of memory (OOM) risk&lt;/a&gt;. High memory usage driven by high RSS can lead to restarts and degraded behavior.&lt;/p&gt;

&lt;p&gt;If total memory is high because RSS is high, that is referred to as memory pressure and is a problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Resident Set Size?
&lt;/h3&gt;

&lt;p&gt;Roughly, RSS is the amount of private memory allocated to a process: stack, heap, catalog and relation caches, and query execution memory such as sorts and hash tables.&lt;/p&gt;

&lt;p&gt;Given Postgres' process-per-connection architecture, each process requires some baseline amount of memory. Not every process will consume the same amount of memory.&lt;/p&gt;

&lt;p&gt;Further, some memory use is shared across processes. So calculating RSS use is not as simple as adding up the memory usage of every process.&lt;/p&gt;

&lt;p&gt;RSS increases for a number of reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Postgres may grant multiple &lt;code&gt;work_mem&lt;/code&gt; allocations within a single query; see below for more details.&lt;/li&gt;
&lt;li&gt;Catalog bloat can spike RSS usage, common in multi-tenant schemas using a table-per-tenant pattern.&lt;/li&gt;
&lt;li&gt;The operating system's memory allocator may not return memory efficiently.&lt;/li&gt;
&lt;li&gt;Misbehaving or misconfigured extensions can increase RSS usage.&lt;/li&gt;
&lt;li&gt;Cached plans and prepared statements accumulate per-session memory that is not released until the session ends or the statement is explicitly deallocated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg9w6yvtctdz7jxf2vuq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg9w6yvtctdz7jxf2vuq.png" alt="work_mem" width="800" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The &lt;code&gt;work_mem&lt;/code&gt; parameter's default value is set relative to the amount of memory in your database cluster. It can be modified in the &lt;a href="https://planetscale.com/docs/postgres/cluster-configuration/parameters" rel="noopener noreferrer"&gt;cluster configuration&lt;/a&gt; page of the PlanetScale dashboard.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Tuning &lt;code&gt;work_mem&lt;/code&gt; might seem like an obvious lever — decrease it to reduce RSS, or increase it to prevent operations from spilling to disk. But the allocation is per-sort/hash-node, per-query, per-backend.&lt;/p&gt;

&lt;p&gt;A single complex query can allocate &lt;code&gt;work_mem&lt;/code&gt; multiple times, and that multiplies across every active connection. Setting it too low forces more disk I/O; setting it too high globally can cause total memory usage to spike unpredictably under load. Neither direction is a safe default change without first understanding your workload's concurrency and query complexity.&lt;/p&gt;
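&lt;p&gt;As a back-of-the-envelope illustration of that multiplication (the figures below are made-up examples, not PlanetScale defaults), a small sketch of the worst case:&lt;/p&gt;

```python
# Rough worst-case memory estimate for work_mem tuning.
# The inputs are illustrative assumptions, not recommended settings.

def worst_case_work_mem_bytes(work_mem_mb, sort_hash_nodes_per_query, active_connections):
    """work_mem can be granted once per sort/hash node, per query, per backend."""
    return work_mem_mb * 1024 * 1024 * sort_hash_nodes_per_query * active_connections

# 4 MB work_mem, 3 sort/hash nodes in a complex query, 200 active connections:
total = worst_case_work_mem_bytes(4, 3, 200)
print(total // (1024 * 1024), "MB")  # 2400 MB
```

&lt;p&gt;Even modest-looking settings compound quickly under concurrency, which is why neither raising nor lowering &lt;code&gt;work_mem&lt;/code&gt; is safe without knowing the workload.&lt;/p&gt;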

&lt;p&gt;Efficient connection pooling can be the best way to reduce RSS usage. Fewer active connections result in fewer copies of all that per-process overhead.&lt;/p&gt;

&lt;p&gt;PgBouncer on PlanetScale runs in transaction mode, where connections are returned to the pool after each transaction completes. See our blog post on &lt;a href="https://planetscale.com/blog/scaling-postgres-connections-with-pgbouncer" rel="noopener noreferrer"&gt;Scaling Postgres connections with PgBouncer&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Investigating memory usage while debugging performance
&lt;/h2&gt;

&lt;p&gt;If you're experiencing degraded performance, the challenge is figuring out what drove the RSS growth.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey48soi01uxygy9qtouz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey48soi01uxygy9qtouz.png" alt="OOM event metrics" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Query Insights helps you investigate query performance through CPU time, I/O, and latency, but it does not show per-query memory. You may see OOM markers and slow-query signals, but not query-specific RSS usage.&lt;/p&gt;

&lt;p&gt;RSS is a per-process metric, not a per-query metric. That means you cannot read “RSS per query” directly from &lt;code&gt;EXPLAIN&lt;/code&gt; or Query Insights. Instead, you may need to gather multiple signals and triangulate:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use Cluster Metrics to identify when RSS rises.&lt;/li&gt;
&lt;li&gt;In Query Insights for that same window, look for expensive patterns (high runtime, CPU, I/O, rows/blocks read) and OOM-adjacent activity.&lt;/li&gt;
&lt;li&gt;Re-run suspect queries with &lt;code&gt;EXPLAIN (ANALYZE, BUFFERS, MEMORY)&lt;/code&gt; to inspect operator-level memory usage.&lt;/li&gt;
&lt;li&gt;Check connection counts in the same window, because many concurrent connection processes can increase RSS even when a single query is moderate.&lt;/li&gt;
&lt;/ol&gt;
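&lt;p&gt;For step 3, the operator-level memory figures appear as &lt;code&gt;Memory:&lt;/code&gt; / &lt;code&gt;Memory Usage:&lt;/code&gt; annotations in the text plan. A minimal sketch of pulling them out for comparison across suspect queries (the sample plan lines are illustrative):&lt;/p&gt;

```python
import re

# Extract per-operator memory figures (in kB) from EXPLAIN (ANALYZE, BUFFERS)
# text output. The sample plan below is abbreviated for illustration.
MEM_RE = re.compile(r"Memory(?: Usage)?: (\d+)kB")

def memory_kb(plan_text):
    """Return all per-operator memory figures reported in a plan, in order."""
    return [int(m) for m in MEM_RE.findall(plan_text)]

plan = """\
Sort  (cost=...)  (actual time=...)
  Sort Method: quicksort  Memory: 4096kB
Hash  (cost=...)
  Buckets: 1024  Batches: 1  Memory Usage: 9kB
"""
print(memory_kb(plan))  # [4096, 9]
```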

&lt;p&gt;The &lt;a href="https://planetscale.com/docs/postgres/troubleshooting/out-of-memory" rel="noopener noreferrer"&gt;out of memory&lt;/a&gt; documentation has more details on the likely causes of, and how to prevent, OOM events.&lt;/p&gt;

&lt;h2&gt;
  
  
  In summary
&lt;/h2&gt;

&lt;p&gt;A lot of cached data in memory is a good thing. Ideally, your "hot dataset" fits in the page cache of your database cluster to maintain fast performance. Too little cached data can lead to increased CPU usage and degraded performance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High memory usage is not automatically bad.&lt;/strong&gt; If your high memory usage is due to cache, you typically have a healthy, performant database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory pressure is bad.&lt;/strong&gt; The signals to act on are RSS rising toward its limit, OOM kills, unexplained restarts, and tail-latency spikes paired with heavy disk I/O when the working set no longer fits in RAM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sustained high CPU is a problem.&lt;/strong&gt; It means you are out of headroom. Tune the workload (see &lt;a href="https://planetscale.com/docs/postgres/monitoring/query-insights" rel="noopener noreferrer"&gt;Query Insights&lt;/a&gt;) or upgrade.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the dashboard shows a high “% memory used,” do not panic. Investigate the types of memory being used and check for OOM events before taking action.&lt;/p&gt;

</description>
      <category>planetscale</category>
      <category>postgres</category>
      <category>database</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Stripe Projects Partnership: Provision PlanetScale Postgres and MySQL Databases From the Stripe CLI</title>
      <dc:creator>Meg528</dc:creator>
      <pubDate>Mon, 27 Apr 2026 17:55:54 +0000</pubDate>
      <link>https://dev.to/planetscale/stripe-projects-partnership-provision-planetscale-postgres-and-mysql-databases-from-the-stripe-cli-2380</link>
      <guid>https://dev.to/planetscale/stripe-projects-partnership-provision-planetscale-postgres-and-mysql-databases-from-the-stripe-cli-2380</guid>
      <description>&lt;p&gt;&lt;em&gt;By Elom Gomez&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Did you hear the news? PlanetScale is participating as a co-design and launch partner for Stripe Projects, a new developer preview from Stripe that centralizes dev tool provisioning and billing in one place.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/qU4lHe-2iRQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Stripe Projects?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.stripe.com/projects" rel="noopener noreferrer"&gt;Stripe Projects&lt;/a&gt; is a new way for developers and coding agents to discover, provision, and pay for developer tools all from the Stripe CLI. Instead of jumping between dashboards, entering payment info, and copying credentials across services, everything lives in one centralized workflow.&lt;/p&gt;

&lt;p&gt;This fragmentation in the developer workflow has always existed, but AI agents have made the gap much more obvious. The ecosystem has been missing a standard way for provisioning and credential handoff to work reliably across providers. We're excited to partner with Stripe to close this gap.&lt;/p&gt;

&lt;p&gt;With PlanetScale as a launch partner, you can now spin up and pay for fully managed MySQL or Postgres databases directly from your terminal in seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out today
&lt;/h2&gt;

&lt;p&gt;Stripe Projects is currently in developer preview. You can request early access &lt;a href="https://projects.dev/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Once you're in, follow these instructions to spin up a PlanetScale Postgres or MySQL database:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the &lt;a href="https://docs.stripe.com/stripe-cli" rel="noopener noreferrer"&gt;Stripe CLI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install the Projects plugin: &lt;code&gt;stripe plugin install projects&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Initialize Stripe Projects in your app: &lt;code&gt;stripe projects init&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add a PlanetScale database: &lt;code&gt;stripe projects add planetscale/postgresql&lt;/code&gt; or &lt;code&gt;stripe projects add planetscale/mysql&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Go through the prompts to create your database: database name, cluster size, region, and number of replicas&lt;/li&gt;
&lt;li&gt;Within seconds, your PlanetScale Postgres or MySQL database is provisioned without you ever leaving the terminal&lt;/li&gt;
&lt;li&gt;Sync your database credentials to your &lt;code&gt;.env&lt;/code&gt; file: &lt;code&gt;stripe projects env --sync&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Resources and feedback
&lt;/h2&gt;

&lt;p&gt;You can start using PlanetScale with Stripe Projects in the &lt;a href="https://marketplace.stripe.com/apps/planetscale" rel="noopener noreferrer"&gt;Stripe Marketplace&lt;/a&gt;. Or, head to the &lt;a href="https://docs.stripe.com/projects" rel="noopener noreferrer"&gt;Stripe Projects documentation&lt;/a&gt; to learn more.&lt;/p&gt;

&lt;p&gt;We'd love to hear how you're using PlanetScale with Stripe Projects. &lt;a href="https://pscale.link/community" rel="noopener noreferrer"&gt;Join our Discord&lt;/a&gt; to let us know or &lt;a href="https://x.com/PlanetScale" rel="noopener noreferrer"&gt;reach out to us on X&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>planetscale</category>
      <category>postgres</category>
      <category>mysql</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Enhanced Tagging in Postgres Query Insights</title>
      <dc:creator>Meg528</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:14:33 +0000</pubDate>
      <link>https://dev.to/planetscale/enhanced-tagging-in-postgres-query-insights-5ae0</link>
      <guid>https://dev.to/planetscale/enhanced-tagging-in-postgres-query-insights-5ae0</guid>
      <description>&lt;p&gt;&lt;em&gt;By Rafer Hazen&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As part of our &lt;a href="https://planetscale.com/blog/introducing-database-traffic-control" rel="noopener noreferrer"&gt;Traffic Control launch&lt;/a&gt;, we made enhancements to the Insights query tagging feature for Postgres databases. Insights has supported query tags for some time, but they were previously only attached as metadata on individual notable query logs. With this release, tags are now present in aggregated query data, which enables powerful new capabilities. It's now possible to view the complete distribution of tags assigned to a query pattern, search queries by tag, and see a per-tag breakdown of database-level statistics. This blog post gives an overview of the feature, and digs into the details of how we implemented it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Tags
&lt;/h2&gt;

&lt;p&gt;Query tags are string key-value pairs that are included in query SQL using specially formatted &lt;a href="https://google.github.io/sqlcommenter/" rel="noopener noreferrer"&gt;SQL comments&lt;/a&gt;. For example, the following query has the &lt;code&gt;controller&lt;/code&gt; and &lt;code&gt;action&lt;/code&gt; tags attached.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; 
  &lt;span class="cm"&gt;/* controller='users',
     action='show' */&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Typically tags are specified at the application level and applied automatically to all queries issued by the database framework you're using. Common examples are &lt;code&gt;controller&lt;/code&gt;, &lt;code&gt;action&lt;/code&gt;, &lt;code&gt;job&lt;/code&gt;, or &lt;code&gt;source_location&lt;/code&gt;.&lt;/p&gt;
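&lt;p&gt;If your framework doesn't do this for you, appending tags by hand is straightforward. A minimal sketch in the spirit of the sqlcommenter convention (sorted keys, URL-encoded values); real client libraries handle more escaping edge cases:&lt;/p&gt;

```python
import urllib.parse

# Serialize key-value tags into a trailing SQL comment, roughly following
# the sqlcommenter convention: sorted keys, URL-encoded values.
def with_tags(sql, tags):
    parts = ",".join(
        f"{k}='{urllib.parse.quote(str(v))}'" for k, v in sorted(tags.items())
    )
    return f"{sql.rstrip(';')} /*{parts}*/;"

print(with_tags("select * from users where id = 1",
                {"controller": "users", "action": "show"}))
# select * from users where id = 1 /*action='show',controller='users'*/;
```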

&lt;p&gt;In addition to tags set by the database client, Insights automatically adds the following tags to all queries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;application_name&lt;/code&gt; - set by the Postgres driver&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;username&lt;/code&gt; - the Postgres user executing the query&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;remote_address&lt;/code&gt; - the remote IP address&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Feature Overview
&lt;/h2&gt;

&lt;p&gt;This feature introduces three new surfaces where tag information can be seen.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Pattern Tags
&lt;/h3&gt;

&lt;p&gt;To see the set of tags associated with a given query pattern, click on a query pattern from the main Insights dashboard. This page lists the tags that have been submitted with a given query pattern over a particular time range, as well as the percentage of queries that included each tag value.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswdsrgvbse5z3spgi7jy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswdsrgvbse5z3spgi7jy.png" alt="Query details" width="800" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Database Tags
&lt;/h3&gt;

&lt;p&gt;To see aggregate statistics for your entire database broken down by tag, go to the Tags section in the Insights sidebar and select the tag or set of tags that you want to view.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2bvhvqcc59vya6fvlw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2bvhvqcc59vya6fvlw9.png" alt="Tags page" width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Filter
&lt;/h3&gt;

&lt;p&gt;To see a list of query patterns that have a given tag value, go to the Insights dashboard and search for a particular tag with &lt;code&gt;tag:MY_TAG:MY_VALUE&lt;/code&gt;. The returned query patterns and statistics are filtered to only queries with the specified tag pair.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4au1eocq0m6wq309pum.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4au1eocq0m6wq309pum.png" alt="Query filter" width="800" height="121"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;To understand how tagging works in Insights, it helps to understand the underlying data sources that power Insights. Query performance data is observed by the Insights Postgres extension, emitted to Kafka and written to ClickHouse. The extension publishes to two separate Kafka topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Individual queries - any query reading more than 10,000 rows, taking longer than 1 second, or resulting in an error. One message is sent per qualifying query. This powers the &lt;a href="https://planetscale.com/docs/postgres/monitoring/query-insights#notable-queries" rel="noopener noreferrer"&gt;Notable queries&lt;/a&gt; feature.&lt;/li&gt;
&lt;li&gt;Aggregate summaries - &lt;a href="https://planetscale.com/docs/postgres/monitoring/query-insights#available-query-statistics" rel="noopener noreferrer"&gt;statistics&lt;/a&gt; like total query count, rows read, and cumulative query time. One message is sent for every &lt;a href="https://planetscale.com/blog/query-performance-analysis-with-insights#query-patterns" rel="noopener noreferrer"&gt;query pattern&lt;/a&gt; every 15 seconds. This powers the majority of Insights including the query table, anomalies, and all query-related graphs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prior to this release, tag data was only attached to the individual query data stream. This adds important information to notable queries, but because the data wasn't present in the aggregate summaries, it wasn't possible to filter or group aggregate data by tag. Insights couldn't answer important questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What queries has this user executed?&lt;/li&gt;
&lt;li&gt;What percentage of my total query run time is coming from this controller?&lt;/li&gt;
&lt;li&gt;Which background jobs are executing this query?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our goal with this release was to associate all query data with the relevant tags to make it possible to answer this class of questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sending Tags
&lt;/h3&gt;

&lt;p&gt;To explore the various approaches for implementing tags, let's use the following query executions as an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="cm"&gt;/*controller='users'*/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="cm"&gt;/*controller='sessions'*/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="cm"&gt;/*controller='sessions'*/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since each of these queries has the same fingerprint (query with all literal values removed), without tags we would only need to send a single summary message. To include tags, we have several options. The first would be to continue sending only a single query summary event with a count of how many times each tag was observed. This would produce a summary message like the following (other stats fields are omitted):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;sql:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"select * from users where id = ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;query_count:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;total_time:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"100ms"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;tags:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"controller=users"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"controller=sessions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This message tells us the given query was executed three times - twice from the sessions controller and once from the users controller - and had a cumulative execution time of 100ms.&lt;/p&gt;

&lt;p&gt;At first glance, including tags in this manner is an attractive option. It's simple to implement - we just accumulate tags along with the other aggregate stats - and it doesn't increase the number of events that need to be emitted and stored. It has a serious shortcoming, however: it's not possible to attribute aggregated stats to any individual tag. For example, it's not possible to know the total time of queries emitted from just the users controller, because we can't tell what portion of the 100ms was associated with &lt;code&gt;controller=users&lt;/code&gt;. The summary data for one tag is permanently combined with the data from all tags.&lt;/p&gt;

&lt;p&gt;To overcome this limitation, we can instead emit a separate aggregate summary message for each set of unique tags. In our example this would mean we emit two separate messages to the insights pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;sql:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"select * from users where id = ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;query_count:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;total_time:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"20ms"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;tags:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"controller"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"users"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;sql:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"select * from users where id = ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;query_count:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;total_time:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"80ms"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;tags:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"controller"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sessions"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach makes it possible to fully disambiguate aggregated statistics based on the attached tags. We can tell that the users controller was responsible for exactly 20ms of total execution time and the sessions controller was responsible for exactly 80ms.&lt;/p&gt;

&lt;p&gt;This comes at a cost though: we have to emit a separate message for each unique tag combination. This can be problematic for high-cardinality tags (tags with a large number of distinct values). Consider a customer that has set a &lt;code&gt;request_id&lt;/code&gt; tag on all of the queries issued from their web tier. Where we previously would be able to collapse 500 user-lookup queries into a single summary message, we now have to send 500 messages because they each have a unique &lt;code&gt;request_id&lt;/code&gt;. In the worst case, this means that the summary data stream must send one summary message &lt;em&gt;per query execution&lt;/em&gt;, and we've lost all of the scalability advantages of aggregating query statistics. For large clusters executing millions of queries per second, this would be prohibitively expensive to process and store, and would consume considerable resources on the database host where telemetry data is emitted.&lt;/p&gt;
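&lt;p&gt;The aggregation itself can be sketched in a few lines: key each summary by the query fingerprint plus its full tag set, so stats remain attributable per tag value. This is a simplified illustration, not the extension's actual implementation:&lt;/p&gt;

```python
from collections import defaultdict

# One summary per unique (fingerprint, tag set) pair, mirroring the
# per-tag-combination messages shown above.
def summarize(executions):
    """executions: list of (fingerprint, tags_dict, duration_ms)."""
    agg = defaultdict(lambda: {"query_count": 0, "total_time_ms": 0})
    for sql, tags, ms in executions:
        key = (sql, tuple(sorted(tags.items())))
        agg[key]["query_count"] += 1
        agg[key]["total_time_ms"] += ms
    return {k: dict(v) for k, v in agg.items()}

runs = [
    ("select * from users where id = ?", {"controller": "users"}, 20),
    ("select * from users where id = ?", {"controller": "sessions"}, 40),
    ("select * from users where id = ?", {"controller": "sessions"}, 40),
]
for (sql, tags), stats in summarize(runs).items():
    print(dict(tags), stats)  # one summary per unique tag combination
```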

&lt;p&gt;To prevent this from overwhelming the pipeline, we implemented several strategies to dynamically reduce the cardinality of tags and therefore decrease the number of messages that must be handled by the Insights pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cardinality Reduction
&lt;/h2&gt;

&lt;p&gt;The core idea is simple: when a tag (or set of tags) would result in sending too much telemetry data, we collapse that tag by replacing specific values (like &lt;code&gt;request_id="a"&lt;/code&gt; and &lt;code&gt;request_id="b"&lt;/code&gt;) with a value that indicates it has been removed: &lt;code&gt;request_id=*&lt;/code&gt;. This lets us more aggressively merge aggregates and reduce the total number of messages sent, while ensuring that we're capturing 100% of the summary data.&lt;/p&gt;

&lt;p&gt;We employed two separate approaches for tag collapsing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-tag Limits
&lt;/h3&gt;

&lt;p&gt;This mechanism tracks the number of unique values seen for each tag key, scoped per query pattern. If that count exceeds a predefined limit (currently 20), we proactively collapse that key for all queries for the next hour. This catches inherently high-cardinality tags like &lt;code&gt;request_id&lt;/code&gt; or &lt;code&gt;user_id&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;An important part of this approach is that cardinality is monitored &lt;em&gt;per query pattern&lt;/em&gt; and not globally. Consider the &lt;code&gt;source_location&lt;/code&gt; tag that contains the file and line number showing where the query was initiated in the client app. Overall this tag is high-cardinality, because each query pattern likely has its own unique value for &lt;code&gt;source_location&lt;/code&gt;, but it is highly correlated with the query pattern so it doesn't actually result in additional messages being sent to the pipeline - we are already sending a separate query summary message for each query pattern. Monitoring cardinality per-pattern allows high-cardinality tags that are highly correlated with query pattern to pass through without being collapsed.&lt;/p&gt;
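&lt;p&gt;A simplified sketch of the per-tag limit (the real implementation collapses proactively for a full hour once the limit is hit; here we just collapse from the moment the count is exceeded):&lt;/p&gt;

```python
from collections import defaultdict

# Once a tag key shows more than LIMIT distinct values for a given query
# pattern, collapse its value to the placeholder "*".
LIMIT = 20

seen = defaultdict(set)  # (pattern, tag_key) -&gt; distinct values observed

def observe(pattern, tags):
    out = {}
    for key, value in tags.items():
        values = seen[(pattern, key)]
        values.add(value)
        out[key] = "*" if len(values) > LIMIT else value
    return out

pattern = "select * from users where id = ?"
for i in range(25):
    tags = observe(pattern, {"request_id": f"req-{i}", "controller": "users"})
print(tags)  # {'request_id': '*', 'controller': 'users'}
```

&lt;p&gt;Because &lt;code&gt;seen&lt;/code&gt; is scoped per query pattern, a tag like &lt;code&gt;source_location&lt;/code&gt; that varies across patterns but not within one stays under the limit and passes through uncollapsed.&lt;/p&gt;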

&lt;h3&gt;
  
  
  Per-interval Limits
&lt;/h3&gt;

&lt;p&gt;Within each 15-second interval, we track all aggregates keyed by their unique set of tag key-value pairs. Because we must emit a message for each unique &lt;em&gt;combination&lt;/em&gt; of tags, even individually low-cardinality tags could produce an unacceptably large number of distinct combinations. For example, if a query pattern has 6 tag keys that each have 10 distinct values, there could be 10^6 individual tag combinations. To prevent an explosion in the number of messages that must be tracked, we perform dynamic cardinality reduction on a per-interval basis for any individual query pattern that has more than a fixed number of tag combinations.&lt;/p&gt;

&lt;p&gt;To reduce the combined cardinality of a given set of aggregates, we find the highest cardinality tag and collapse it (replace all values with a single value). We successively perform this operation until the number of aggregates is beneath the fixed threshold (currently set to 50 in production).&lt;/p&gt;

&lt;p&gt;To illustrate this operation, consider five executions of the same query pattern.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="cm"&gt;/*controller='users',    host='app-1'*/&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="cm"&gt;/*controller='users',    host='app-2'*/&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="cm"&gt;/*controller='sessions', host='app-3'*/&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="cm"&gt;/*controller='sessions', host='app-4'*/&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="cm"&gt;/*controller='sessions', host='app-1'*/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without any limits, this produces five separate aggregate messages. To reduce the aggregate message count, we identify that the &lt;code&gt;host&lt;/code&gt; tag has the highest cardinality (four unique values), replace all of its values with a placeholder, and merge the now-identical entries. This yields only two combinations that must be emitted to the pipeline, one for each of the two unique &lt;code&gt;controller&lt;/code&gt; tag values.&lt;/p&gt;
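&lt;p&gt;The successive-collapsing step can be sketched as follows. The threshold is set to 2 here so the five executions above reduce visibly; the article's production value is around 50. This is an illustration of the idea, not the extension's code:&lt;/p&gt;

```python
# Repeatedly blank out the highest-cardinality tag key until the number of
# unique tag combinations falls under the threshold.
def collapse(tag_sets, threshold):
    tag_sets = [dict(t) for t in tag_sets]
    while len({tuple(sorted(t.items())) for t in tag_sets}) > threshold:
        # find the key with the most distinct values and collapse it
        keys = {k for t in tag_sets for k in t}
        worst = max(keys, key=lambda k: len({t.get(k) for t in tag_sets}))
        for t in tag_sets:
            t[worst] = "*"
    return {tuple(sorted(t.items())) for t in tag_sets}

observed = [
    {"controller": "users", "host": "app-1"},
    {"controller": "users", "host": "app-2"},
    {"controller": "sessions", "host": "app-3"},
    {"controller": "sessions", "host": "app-4"},
    {"controller": "sessions", "host": "app-1"},
]
for combo in sorted(collapse(observed, threshold=2)):
    print(combo)  # host collapsed to "*", two combos remain
```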

&lt;h3&gt;
  
  
  Tracking Tag Collapsing
&lt;/h3&gt;

&lt;p&gt;When a tag must be collapsed due to either of the cardinality limitation mechanisms, we record the fact that the key has been collapsed in the emitted aggregate message. This allows us to detect when collapsing has occurred and display a message noting the percentage of tag values where the value is unknown.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Query tagging is a powerful feature. Being able to slice your Insights data by arbitrary tags gives you a much clearer picture of your database performance. We're excited for you to try it.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>planetscale</category>
      <category>webdev</category>
      <category>database</category>
    </item>
    <item>
      <title>Patterns for Postgres Traffic Control</title>
      <dc:creator>Meg528</dc:creator>
      <pubDate>Mon, 13 Apr 2026 16:44:20 +0000</pubDate>
      <link>https://dev.to/planetscale/patterns-for-postgres-traffic-control-2mlo</link>
      <guid>https://dev.to/planetscale/patterns-for-postgres-traffic-control-2mlo</guid>
      <description>&lt;p&gt;&lt;em&gt;By Josh Brown&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Last month we introduced &lt;a href="https://planetscale.com/blog/introducing-database-traffic-control" rel="noopener noreferrer"&gt;Database Traffic Control™&lt;/a&gt;. Traffic Control lets you attach resource budgets to slices of your Postgres traffic, like keeping your checkout flow running while a runaway analytics query gets shed instead. We have already discussed &lt;a href="https://planetscale.com/docs/postgres/traffic-control/examples-and-recipes" rel="noopener noreferrer"&gt;some scenarios where&lt;/a&gt; you should use Traffic Control, along with &lt;a href="https://planetscale.com/blog/graceful-degradation-in-postgres" rel="noopener noreferrer"&gt;how to define resource limits&lt;/a&gt;, so now let's dig into what Traffic Control looks like in your codebase.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39om0a9pcziy6rlthhgz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39om0a9pcziy6rlthhgz.png" alt="Traffic Control Dashboard" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This post walks through some practical patterns in Go. Each pattern targets a different failure mode, architecture, or foot-gun. Most of them layer on top of one another too, so you can adopt them individually or combine them for extra peace of mind. Keep in mind that the general concepts here apply to whatever language your application is written in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Most of the patterns here rely on custom tags attached to your queries. Traffic Control reads these using the &lt;a href="https://google.github.io/sqlcommenter/" rel="noopener noreferrer"&gt;SQLCommenter&lt;/a&gt; format: a SQL comment appended to each query with URL-encoded key=value pairs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
 &lt;span class="cm"&gt;/*route='checkout',feature='new_order_flow'*/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These tags are then available for new Traffic Control rules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fur22azx0e02vtl6nqx3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fur22azx0e02vtl6nqx3y.png" alt="Throttle queries" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a minimal Go helper that appends tags in this format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"net/url"&lt;/span&gt;
    &lt;span class="s"&gt;"sort"&lt;/span&gt;
    &lt;span class="s"&gt;"strings"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// appendTags appends SQLCommenter-format tags to a SQL query.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;appendTags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s='%s'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryEscape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Strings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// deterministic order&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" /*"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;","&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"*/"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll also want a way to thread tags through your call stack without touching every function signature. A context key works well for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;contextKey&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;

&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;sqlTagsKey&lt;/span&gt; &lt;span class="n"&gt;contextKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sql_tags"&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;tagsFromContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sqlTagsKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// return a copy so callers can't mutate shared state&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;contextWithTags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sqlTagsKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these two helpers in place, the patterns below mostly just set keys and values in context. Tagging happens automatically when the query executes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Per-service isolation via Roles
&lt;/h2&gt;

&lt;p&gt;In a microservice architecture, a single misbehaving service should not be able to degrade every other service sharing the same database. The simplest way to isolate a service is to create a Traffic Control rule based on a unique connection string for the given service, or on its application name.&lt;/p&gt;

&lt;p&gt;A budget on &lt;code&gt;username='pscale_api_123abc'&lt;/code&gt; will isolate all traffic from that role. This also helps in incident response: you can immediately cap a service's resource share without redeploying anything.&lt;/p&gt;

&lt;p&gt;Note that the username is the internal Postgres username of the role, not the dashboard role name. You can also target custom roles created with &lt;code&gt;CREATE ROLE&lt;/code&gt; if your microservices enforce strict table permissions.&lt;/p&gt;

&lt;p&gt;You can also use the &lt;code&gt;application_name&lt;/code&gt; by appending it to your connection strings such as &lt;code&gt;postgresql://other@localhost/otherdb?application_name=myapp&lt;/code&gt;.&lt;/p&gt;
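&lt;p&gt;If you build connection URLs programmatically, a small helper keeps this consistent across services. This is a sketch; opening the connection with your Postgres driver of choice is omitted:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"net/url"
)

// dsnWithAppName appends an application_name parameter to a Postgres
// connection URL so Traffic Control rules can target it.
func dsnWithAppName(dsn, app string) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("application_name", app)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	dsn, _ := dsnWithAppName("postgresql://other@localhost/otherdb", "myapp")
	fmt.Println(dsn)
	// postgresql://other@localhost/otherdb?application_name=myapp
}
```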

&lt;h2&gt;
  
  
  Route-level tagging in an HTTP service
&lt;/h2&gt;

&lt;p&gt;When you're running a monolith or a large API service, the problem usually isn't the whole service; it's specific routes. The &lt;code&gt;/api/export&lt;/code&gt; endpoint that generates CSV reports should not be able to kill the &lt;code&gt;/api/checkout&lt;/code&gt; flow.&lt;/p&gt;

&lt;p&gt;An HTTP middleware can inject the route into context at runtime before any handler runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Any route using SQLTagMiddleware will have the pattern injected into its context&lt;/span&gt;
&lt;span class="c"&gt;// dynamically at runtime&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;SQLTagMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;next&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tagsFromContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReplaceAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReplaceAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"{"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// Removes "{}" characters from route&lt;/span&gt;
        &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"route"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;
        &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"app"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"web"&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;contextWithTags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wrap your database calls to pick up the tags automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// QueryContext for SELECT statements&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;QueryContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;appendTags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tagsFromContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// ExecContext for INSERT/UPDATE statements&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ExecContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ExecContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;appendTags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tagsFromContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every query carries the route it came from. You can create a Traffic Control budget targeting &lt;code&gt;route='/api/export'&lt;/code&gt; and give it a conservative CPU limit.&lt;/p&gt;

&lt;p&gt;This also makes it easy to set up broad budgets during incidents. If you suddenly see a spike and don't know which route is responsible, the violation graph in Traffic Control will show you exactly which route tag is hitting limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Feature flags and new deployments
&lt;/h2&gt;

&lt;p&gt;Shipping a new feature to production always carries risk. Maybe the new query pattern is fine under your test load but becomes expensive at scale. Traffic Control gives you a way to cap the blast radius before it becomes an incident.&lt;/p&gt;

&lt;p&gt;The simplest version sets a tag from an environment variable at startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;deploymentTag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"DEPLOYMENT_TAG"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// e.g. "new_checkout_v2" or git sha "96e350426"&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;tagWithDeployment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;deploymentTag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tagsFromContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"feature"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;deploymentTag&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;contextWithTags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;DEPLOYMENT_TAG=new_checkout_v2&lt;/code&gt; when rolling out new pods and leave it unset on the old pods. Traffic Control can then have a budget on &lt;code&gt;feature='new_checkout_v2'&lt;/code&gt; in Warn mode from day one, so you see exactly how the new code behaves before it causes problems. When you're confident, either remove the budget or switch it to Enforce as a safety net.&lt;/p&gt;

&lt;p&gt;For feature flags controlled at runtime, the same approach works but driven by your flag evaluation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;OrderHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Enabled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"new_order_flow"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tagsFromContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"feature"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"new_order_flow"&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;contextWithTags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;processOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tier-based limits in multi-tenant apps
&lt;/h2&gt;

&lt;p&gt;In a SaaS application, free-tier users should not be able to degrade the experience for enterprise customers. Traffic Control lets you enforce this at the database level rather than just at the application layer.&lt;/p&gt;

&lt;p&gt;Inject the user's subscription tier into the SQL tags early in your request handling — ideally right after you've resolved the authenticated user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Tier&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;

&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;TierFree&lt;/span&gt;       &lt;span class="n"&gt;Tier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"FREE"&lt;/span&gt;
    &lt;span class="n"&gt;TierPro&lt;/span&gt;        &lt;span class="n"&gt;Tier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"PRO"&lt;/span&gt;
    &lt;span class="n"&gt;TierEnterprise&lt;/span&gt; &lt;span class="n"&gt;Tier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ENTERPRISE"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;WithUserTier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tier&lt;/span&gt; &lt;span class="n"&gt;Tier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tagsFromContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"tier"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;contextWithTags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In your authentication middleware:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;AuthMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Authenticate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"unauthorized"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusUnauthorized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;WithUserTier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this in place, create two Traffic Control budgets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tier='free'&lt;/code&gt; — conservative limits on server share and max concurrent queries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tier='pro'&lt;/code&gt; — moderate limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Leave enterprise traffic unbudgeted or give it a high budget as a ceiling. When a free-tier user runs an expensive dashboard or triggers a slow query, the budget sheds that traffic before it touches enterprise workloads.&lt;/p&gt;

&lt;p&gt;You can combine this with the route tag from Pattern 2. A budget matching &lt;code&gt;tier='free' AND route='api-export'&lt;/code&gt; can be stricter than a budget on &lt;code&gt;tier='free'&lt;/code&gt; alone. Enterprise export requests get more headroom than free-tier export requests.&lt;/p&gt;
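&lt;p&gt;To make the combination concrete, here is a minimal sketch of how the two tags might travel together as a leading SQL comment. &lt;code&gt;tagComment&lt;/code&gt; is a hypothetical helper, and the exact comment syntax Traffic Control matches on is an assumption here for illustration, not taken from the patterns above:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tagComment renders request tags as a leading SQL comment such as
// "/*route='api-export',tier='free'*/". The comment format is illustrative;
// use whatever format your tagging middleware actually emits.
func tagComment(tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order keeps budget matching stable
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s='%s'", k, tags[k]))
	}
	return "/*" + strings.Join(parts, ",") + "*/"
}

func main() {
	tags := map[string]string{"tier": "free", "route": "api-export"}
	fmt.Println(tagComment(tags) + " SELECT id, total FROM orders")
}
```

Sorting the keys keeps the comment identical across requests with the same tags, which makes budget matching and Insights groupings deterministic.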

&lt;h2&gt;
  
  
  Background jobs and scripts
&lt;/h2&gt;

&lt;p&gt;Background jobs are a common cause of database incidents. A migration script, a nightly sync, or a one-off data backfill can all accidentally saturate your database if they run faster than expected. Traffic Control is a clean way to give these jobs a resource ceiling without having to tune query-level timeouts throughout your codebase.&lt;/p&gt;

&lt;p&gt;For long-running background workers, use a dedicated connection pool with a distinct &lt;code&gt;application_name&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;newJobDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;jobDSN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// your connection string&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;jobDSN&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c"&gt;// This sets the application name in code instead of in the connection string env variable.&lt;/span&gt;
    &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"application_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"background-jobs"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;jobDSN&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RawQuery&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pgx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobDSN&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="c"&gt;// connects to Postgres&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetMaxOpenConns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// Jobs don't need high concurrency&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;newJobDB&lt;/code&gt; takes your database DSN and sets &lt;code&gt;application_name&lt;/code&gt; to &lt;code&gt;background-jobs&lt;/code&gt; before opening the pool. It then caps the pool at four open connections so the background job can't occupy more database workers than it should, and returns the handle so the calling function can query the database.&lt;/p&gt;

&lt;p&gt;Setting &lt;code&gt;application_name&lt;/code&gt; at the connection-string level in code ensures it is always set for this service, regardless of the query or the connection string provided. You can pair this with SQL comments, as described above, for even finer-grained control and insight into your queries.&lt;/p&gt;

&lt;p&gt;For one-off scripts and migrations we can do something similar. Here we encode the script's identity directly in the connection string so it shows up clearly in Traffic Control and Insights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Returns a database instance with the `application_name` set to `script-[scriptName]`&lt;/span&gt;
&lt;span class="c"&gt;// for use in scripts&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;scriptDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scriptName&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"application_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"script-"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;scriptName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// e.g. "script-backfill-order-totals"&lt;/span&gt;
    &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RawQuery&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pgx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a Traffic Control budget for &lt;code&gt;application_name='background-jobs'&lt;/code&gt; in Warn mode before the next run of your jobs. Observe how much of the database's resources the background work typically consumes, then switch to Enforce to cap it at a level where it can't crowd out interactive traffic, even if a job goes sideways.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling blocked queries
&lt;/h2&gt;

&lt;p&gt;When Traffic Control is in Enforce mode and a query exceeds its budget, Postgres returns SQLSTATE &lt;code&gt;53000&lt;/code&gt; with an error message prefixed with &lt;code&gt;[PGINSIGHTS] Traffic Control:&lt;/code&gt;. Your application needs to handle this without crashing.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;pgx/v5&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"errors"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/jackc/pgx/v5/pgconn"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;sqlstateTrafficControl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"53000"&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;isTrafficControlError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pgErr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pgconn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PgError&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;As&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pgErr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;pgErr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;sqlstateTrafficControl&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The right response depends on the query's role in your application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;OrderService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GetUserOrders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;`SELECT id, total FROM orders WHERE user_id = $1`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isTrafficControlError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c"&gt;// Return a degraded response rather than a 500&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ErrServiceUnavailable&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For non-critical workloads like analytics or reporting, returning a &lt;code&gt;503 Service Unavailable&lt;/code&gt; or a cached result is most likely the right behavior. That's exactly the controlled failure mode Traffic Control is designed to create. For more critical paths, you may want a short retry with backoff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;queryWithBackoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="n"&gt;backoff&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;maxRetries&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;isTrafficControlError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;maxRetries&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;backoff&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"overloaded"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
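&lt;p&gt;For the cached-result branch, the shape is the same error check with a different fallback. Here is a minimal single-goroutine sketch, using a plain map as a stand-in cache and a sentinel error in place of the real SQLSTATE check; &lt;code&gt;loadReport&lt;/code&gt; and the cache are hypothetical, not part of the article's API:&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// errBudgetExceeded stands in for a Traffic Control rejection
// (SQLSTATE 53000); in real code use isTrafficControlError instead.
var errBudgetExceeded = errors.New("traffic control: budget exceeded")

// loadReport runs query and caches its result under key. If Traffic Control
// sheds the query, the last cached copy is served (marked stale) instead of
// failing the request. cache is a plain map: a single-goroutine sketch.
func loadReport(cache map[string]string, key string, query func() (string, error)) (result string, stale bool, err error) {
	fresh, err := query()
	if err == nil {
		cache[key] = fresh
		return fresh, false, nil
	}
	if errors.Is(err, errBudgetExceeded) {
		if v, ok := cache[key]; ok {
			return v, true, nil
		}
	}
	return "", false, err
}

func main() {
	cache := map[string]string{}
	// First run succeeds and warms the cache.
	r, stale, _ := loadReport(cache, "weekly-report", func() (string, error) { return "42 orders", nil })
	fmt.Println(r, stale) // 42 orders false
	// Second run is shed by Traffic Control; the stale copy is served.
	r, stale, _ = loadReport(cache, "weekly-report", func() (string, error) { return "", errBudgetExceeded })
	fmt.Println(r, stale) // 42 orders true
}
```

The &lt;code&gt;stale&lt;/code&gt; flag lets the handler mark the response (for example with a header or banner) so clients know they are seeing slightly old data rather than a failure.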



&lt;h3&gt;
  
  
  Observing warn-mode notices
&lt;/h3&gt;

&lt;p&gt;Before switching a budget to Enforce, you'll run it in Warn mode. In Warn mode, queries succeed but the driver receives a Postgres notice containing &lt;code&gt;[PGINSIGHTS] Traffic Control:&lt;/code&gt;. With &lt;code&gt;pgx/v5&lt;/code&gt; you can log these notices to build an accurate picture of what would be blocked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"github.com/jackc/pgx/v5"&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pgx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OnNotice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pgconn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PgConn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;notice&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pgconn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Notice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;notice&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"[PGINSIGHTS] Traffic Control:"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"traffic control warning: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;notice&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c"&gt;// Increment a metric, write to a structured log, etc.&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Collect these logs for a few hours of representative traffic before switching to Enforce. The pattern of which rules fire and how often tells you whether your limits need adjustment.&lt;/p&gt;
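&lt;p&gt;Turning those logs into the "which rules fire and how often" picture can be as simple as a tally keyed by notice message. A single-goroutine sketch follows; &lt;code&gt;tallyNotice&lt;/code&gt; is a hypothetical helper, the message text is illustrative, and you should guard the map with a mutex if you call it from &lt;code&gt;OnNotice&lt;/code&gt; across concurrent connections:&lt;/p&gt;

```go
package main

import "fmt"

// tallyNotice counts warn-mode Traffic Control notices by message so you can
// see which budgets would fire most often before switching to Enforce.
func tallyNotice(counts map[string]int, msg string) {
	counts[msg]++
}

func main() {
	counts := map[string]int{}
	// Message text here is illustrative, not the exact server wording.
	tallyNotice(counts, "[PGINSIGHTS] Traffic Control: budget 'free-tier' exceeded")
	tallyNotice(counts, "[PGINSIGHTS] Traffic Control: budget 'free-tier' exceeded")
	tallyNotice(counts, "[PGINSIGHTS] Traffic Control: budget 'background-jobs' exceeded")
	for msg, n := range counts {
		fmt.Println(n, msg)
	}
}
```

In production you would increment a labeled metric instead of a map, but the grouping logic is the same: one counter per distinct budget message.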

&lt;h3&gt;
  
  
  Putting it together
&lt;/h3&gt;

&lt;p&gt;These patterns compose. A real application might layer several of them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;setupMiddleware&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mux&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServeMux&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c"&gt;// register routes...&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mux&lt;/span&gt;
    &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SQLTagMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c"&gt;// Pattern 2: route tags&lt;/span&gt;
    &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AuthMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// Pattern 4: tier tags&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// At startup, the job worker uses Pattern 5: Background jobs&lt;/span&gt;
&lt;span class="n"&gt;jobDB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;newJobDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// New features use Pattern 3:&lt;/span&gt;
&lt;span class="c"&gt;// DEPLOYMENT_TAG=new_checkout_v2 set in the deployment manifest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Traffic Control sees all of this as a combination of tags. A budget on &lt;code&gt;tier='free'&lt;/code&gt; covers all free-tier traffic regardless of route. A budget on &lt;code&gt;route='api-export' AND tier='free'&lt;/code&gt; covers a specific combination. Multiple matching budgets all apply simultaneously and queries must satisfy every budget they match. You can build layered policies without complicated rule logic.&lt;/p&gt;

&lt;p&gt;Start in Warn mode, observe which budgets would fire during normal load, tighten the limits until only pathological cases trigger violations, then switch to Enforce. The &lt;a href="https://planetscale.com/docs/postgres/traffic-control/getting-started" rel="noopener noreferrer"&gt;getting started guide&lt;/a&gt; walks through this rollout process in detail.&lt;/p&gt;

&lt;p&gt;The difference between a database outage and a degraded experience often comes down to whether you've decided in advance which traffic to shed. Traffic Control makes that decision explicit and configurable instead of leaving it to whichever query happens to win a resource race.&lt;/p&gt;

</description>
      <category>planetscale</category>
      <category>postgres</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Scaling Postgres Connections With PgBouncer</title>
      <dc:creator>Meg528</dc:creator>
      <pubDate>Mon, 06 Apr 2026 16:59:32 +0000</pubDate>
      <link>https://dev.to/planetscale/scaling-postgres-connections-with-pgbouncer-aff</link>
      <guid>https://dev.to/planetscale/scaling-postgres-connections-with-pgbouncer-aff</guid>
      <description>&lt;p&gt;&lt;em&gt;By Ben Dicken&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Postgres process-per-connection architecture has an elegant simplicity, but hinders performance when tons of clients need to connect simultaneously.&lt;/p&gt;

&lt;p&gt;The near-universal choice for solving this problem is PgBouncer. Though upcoming systems like Neki aim to solve it more robustly, PgBouncer has proven itself an excellent connection pooler for Postgres.&lt;/p&gt;

&lt;p&gt;PlanetScale gives you local PgBouncers by default, and makes it incredibly easy to add dedicated ones when needed. The challenge comes in determining the optimal configuration for your app, which is highly use-case dependent.&lt;/p&gt;

&lt;p&gt;My aim with this article is to make every engineer well-equipped to tune PgBouncer with confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why PgBouncer?
&lt;/h2&gt;

&lt;p&gt;PgBouncer is a lightweight connection pooler that sits between your application and Postgres.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcyjxoxjphppsxofwh21l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcyjxoxjphppsxofwh21l.png" alt="PgBouncer high level" width="800" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PgBouncer is totally transparent, speaking the PostgreSQL wire protocol. From an app's perspective, it's just talking to a Postgres server, with PgBouncer acting as a lightweight middleman. It can multiplex thousands of client connections onto tens of Postgres connections.&lt;/p&gt;

&lt;p&gt;But why not just make 1000s of connections directly to Postgres? Unfortunately, the Postgres process-per-connection architecture doesn't scale well. Every connection forks a dedicated OS process consuming 5+ MB of RAM and adding context-switching overhead. PgBouncer solves this by maintaining a pool of reusable server connections, reducing resource consumption and letting PostgreSQL handle far more concurrent clients than its native &lt;code&gt;max_connections&lt;/code&gt; would otherwise allow.&lt;/p&gt;

&lt;p&gt;It's best practice to keep the count of direct connections to Postgres small: tens of connections for smaller instances, hundreds for larger servers.&lt;/p&gt;

&lt;p&gt;This is too restrictive for the way modern apps are built. We frequently want thousands of simultaneous connections to the database. PgBouncer gives Postgres that capability while keeping the total number of forked processes low.&lt;/p&gt;

&lt;p&gt;At PlanetScale, we recommend using PgBouncer for all application traffic, only resorting to direct connections for administrative tasks and a few other narrow cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to use PgBouncer
&lt;/h2&gt;

&lt;p&gt;PgBouncer maintains a pool of pre-established Postgres server connections. When an app or client needs a database connection, it connects to PgBouncer, and PgBouncer uses one of the pre-existing pooled connections to pass along the message. When the client is done, the connection returns to the pool for reuse. A single pooled Postgres connection can serve hundreds or thousands of client connections over its lifetime.&lt;/p&gt;

&lt;p&gt;When all pool connections are in use, PgBouncer queues the client until one becomes available rather than rejecting it. If the wait exceeds &lt;code&gt;query_wait_timeout&lt;/code&gt; (default: 120 seconds), the client is disconnected with an error.&lt;/p&gt;

&lt;p&gt;Whereas the Postgres default port is &lt;code&gt;5432&lt;/code&gt;, PgBouncer defaults to &lt;code&gt;6432&lt;/code&gt;. Typically, switching from a direct connection to a PgBouncer connection is as simple as switching the port in your client connection string.&lt;/p&gt;
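
&lt;p&gt;For example, with a hypothetical DSN (the hostname and credentials are illustrative), the switch is just the port:&lt;/p&gt;

```python
# Hypothetical connection strings; only the port differs.
direct_dsn = "postgres://app_user:secret@db.example.com:5432/main"

# Route through PgBouncer instead: same host, same credentials, port 6432.
pooled_dsn = direct_dsn.replace(":5432/", ":6432/")
```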

&lt;p&gt;This is true on PlanetScale, with a twist: we give you &lt;em&gt;three options&lt;/em&gt; for using PgBouncer:&lt;/p&gt;

&lt;h3&gt;
  
  
  Local PgBouncer
&lt;/h3&gt;

&lt;p&gt;Every Postgres database includes a local PgBouncer running on the same server as the primary. Connect using the same credentials as usual, just swap the port to &lt;code&gt;6432&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepfenkmh9jel7sx32z7w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepfenkmh9jel7sx32z7w.png" alt="local PgBouncer" width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Dedicated primary PgBouncer
&lt;/h3&gt;

&lt;p&gt;A dedicated primary PgBouncer runs on separate nodes from Postgres, making for better HA characteristics. It connects to the local PgBouncer first, which then connects to Postgres. Client connections persist through resizes, upgrades, and most failovers. Connect by appending &lt;code&gt;|your-pgbouncer-name&lt;/code&gt; to your username on port &lt;code&gt;6432&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F445pbcx5veiyh4n6af3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F445pbcx5veiyh4n6af3i.png" alt="dedicated primary PgBouncer" width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Dedicated Replica PgBouncer
&lt;/h3&gt;

&lt;p&gt;Dedicated replica PgBouncers are similar to dedicated primary ones, but connect to the replicas instead (and don't route through the local bouncer).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h603ypvqazp2qde9avo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h603ypvqazp2qde9avo.png" alt="dedicated replica PgBouncer" width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We recommend this if your applications make heavy use of replicas for read queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three pooling modes
&lt;/h2&gt;

&lt;p&gt;PgBouncer operates in one of three modes.&lt;/p&gt;

&lt;p&gt;Session pooling assigns a server connection for the lifetime of the client connection, releasing it only when the client disconnects. This means there's a 1:1 mapping between client and server connections. It's not incredibly useful, as it does little to reduce the Postgres connection count, though it can help limit thundering herds of connections.&lt;/p&gt;

&lt;p&gt;Statement pooling assigns a server connection for a single SQL statement and releases it immediately after. This means multi-statement transactions are disallowed entirely. Most apps need multi-statement transactions, so statement pooling isn't useful in 99% of cases!&lt;/p&gt;

&lt;p&gt;Transaction pooling is the only sensible option. It assigns a server connection for the duration of a transaction, returning it to the pool the moment a COMMIT or ROLLBACK completes. This is great for most use cases, though there are a few &lt;a href="https://www.pgbouncer.org/features.html" rel="noopener noreferrer"&gt;unsupported features&lt;/a&gt; in this mode.&lt;/p&gt;

&lt;p&gt;PlanetScale only supports transaction pooling, given the clear weaknesses of the other two modes. When you absolutely need one of the few unsupported features, keep that workload on a small number of direct-to-Postgres connections.&lt;/p&gt;

&lt;h2&gt;
  
  
  Knob all the things
&lt;/h2&gt;

&lt;p&gt;PgBouncer's configuration centers on a hierarchy of connection limits. These control how many client connections are accepted, how many server connections are maintained per pool, and how those relate to PostgreSQL's own &lt;code&gt;max_connections&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The connection chain works like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2v3674pyd1cqnl8qgvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2v3674pyd1cqnl8qgvu.png" alt="Postgres PgBouncer connection config options" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;max_client_conn&lt;/code&gt; is the maximum number of application connections PgBouncer will accept. Because connections are lightweight in PgBouncer, this is frequently set in the 1000s.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;default_pool_size&lt;/code&gt; controls the number of server connections per (user, database) pair that PgBouncer will make to Postgres. How to configure this depends quite a bit on your schema and access patterns. In an environment where you have a single server with many logical databases and many Postgres users, this will likely need to be set low, between 1 and 20. When you have a single logical database and a small number of Postgres roles, this can be set much higher.&lt;/p&gt;

&lt;p&gt;The total potential PgBouncer ↔ Postgres connections equals &lt;code&gt;num_pools × default_pool_size&lt;/code&gt;. With 4 users and 2 databases we get &lt;code&gt;4 × 2 = 8 pools&lt;/code&gt;. At a pool size of 20, PgBouncer could open up to 160 connections to PostgreSQL.&lt;/p&gt;
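
&lt;p&gt;That arithmetic is easy to sanity-check in a few lines, using the numbers from the example above:&lt;/p&gt;

```python
users = 4
databases = 2
default_pool_size = 20

num_pools = users * databases              # one pool per (user, database) pair
max_server_conns = num_pools * default_pool_size

print(num_pools, max_server_conns)         # 8 160
```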

&lt;p&gt;&lt;code&gt;max_db_connections&lt;/code&gt; and &lt;code&gt;max_user_connections&lt;/code&gt; are hard caps that span across all PgBouncer pools for a given database or user, respectively. They act as safety valves to prevent pool arithmetic from exceeding PostgreSQL limits. These default to 0 (no limit) but can be set in some scenarios for safety.&lt;/p&gt;

&lt;p&gt;All the above are &lt;em&gt;PgBouncer&lt;/em&gt; settings. The key setting on the Postgres side is &lt;code&gt;max_connections&lt;/code&gt;. The total server connections must stay below this number, and we should always keep a few direct connections reserved for admin tasks and other emergency scenarios. &lt;strong&gt;We NEVER want PgBouncer to use all of the connections!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All of this can be summarized in a nice formula:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fuc700xua6jooowi3bu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fuc700xua6jooowi3bu.png" alt="Connection configuration formula" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Postgres, we can explicitly set &lt;a href="https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-SUPERUSER-RESERVED-CONNECTIONS" rel="noopener noreferrer"&gt;superuser_reserved_connections&lt;/a&gt;, which is handy for ensuring some connections are reserved for the superuser.&lt;/p&gt;
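
&lt;p&gt;The constraint in the formula can also be written as a small helper. This is a sketch; the function and variable names are ours, not official settings:&lt;/p&gt;

```python
def pgbouncer_fits(num_pools, default_pool_size, max_connections, reserved):
    """True when PgBouncer's worst-case server connections stay within
    Postgres max_connections while leaving `reserved` direct slots free."""
    worst_case = num_pools * default_pool_size
    return max_connections - reserved >= worst_case

# 8 pools of 20 (160 connections) fit under max_connections=200
# with 10 reserved slots...
ok = pgbouncer_fits(8, 20, 200, 10)
# ...but not under max_connections=150.
too_tight = pgbouncer_fits(8, 20, 150, 10)
```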

&lt;h2&gt;
  
  
  Tuning examples
&lt;/h2&gt;

&lt;p&gt;Thinking through some practical scenarios makes this easier to reason about.&lt;/p&gt;

&lt;h3&gt;
  
  
  Small server
&lt;/h3&gt;

&lt;p&gt;First, let's think through having a PlanetScale &lt;code&gt;PS-80&lt;/code&gt; (1 vCPU, 8GB RAM per node), a single multi-tenant database, and three distinct Postgres users for clients connecting through PgBouncer: one for the app servers (&lt;code&gt;app&lt;/code&gt;), one for an analytics service (&lt;code&gt;analytics&lt;/code&gt;), and one for a data exporter (&lt;code&gt;export&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;We want to keep direct Postgres connections low, so we set the Postgres &lt;code&gt;max_connections=50&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Though it's a small database, we sometimes have 100s of app servers making simultaneous connections during peak load. We set the PgBouncer &lt;code&gt;max_client_conn=500&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The majority of these connections come from a single Postgres user + database pair (the app-server user connecting to the main logical database). Because of this, we set &lt;code&gt;default_pool_size=30&lt;/code&gt; but then also set &lt;code&gt;max_user_connections=30&lt;/code&gt; and &lt;code&gt;max_db_connections=40&lt;/code&gt;. This prevents connections from the app user from utilizing all of the backend connections, ensuring some are always available for the other two. This also means PgBouncer can never hold more than 40 connections to Postgres in total, ensuring 10 are always available for other services or administrative tasks.&lt;/p&gt;
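
&lt;p&gt;Put together, the PgBouncer side of this small-server example might look like the following &lt;code&gt;pgbouncer.ini&lt;/code&gt; fragment (a sketch; on PlanetScale these knobs are managed through the platform rather than edited by hand):&lt;/p&gt;

```ini
[pgbouncer]
pool_mode = transaction
max_client_conn = 500
default_pool_size = 30
max_user_connections = 30
max_db_connections = 40
```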

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvffxwpierx2mvyp9va96.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvffxwpierx2mvyp9va96.png" alt="Connections small server" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Large server
&lt;/h3&gt;

&lt;p&gt;Now for the same scenario, but with much higher traffic, requiring an &lt;code&gt;M-2650&lt;/code&gt; (32 vCPU, 256GB RAM per node). We'll again have the same 3 distinct Postgres users.&lt;/p&gt;

&lt;p&gt;Having 32× the CPU power doesn't mean we want to increase direct Postgres connections by 32×. It's still wise to keep this on the lower side, so we settle on &lt;code&gt;max_connections=500&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We now sometimes have 1000s of app servers making simultaneous connections during peak load. We set the PgBouncer &lt;code&gt;max_client_conn=10000&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As in the previous example, most of these connections come from the app user, so we set &lt;code&gt;default_pool_size=200&lt;/code&gt; and also set &lt;code&gt;max_user_connections=200&lt;/code&gt; and &lt;code&gt;max_db_connections=450&lt;/code&gt;. No one user can use more than 200 connections.&lt;/p&gt;

&lt;p&gt;This also means PgBouncer can never hold more than 450 connections to Postgres, ensuring 50 remain available for other purposes, or if we add services requiring features of direct connections like session variables.&lt;/p&gt;
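
&lt;p&gt;Note that the hard cap is what actually binds here: with three users and one database, the pool arithmetic alone would exceed it. A quick check with the numbers above:&lt;/p&gt;

```python
users, databases = 3, 1
default_pool_size = 200
max_db_connections = 450

theoretical = users * databases * default_pool_size   # 600 without a cap
effective = min(theoretical, max_db_connections)      # the cap wins

print(theoretical, effective)                         # 600 450
```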

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv2jrk3dwa5c6rwhx9os.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv2jrk3dwa5c6rwhx9os.png" alt="Connection large server" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Single-tenant configuration
&lt;/h3&gt;

&lt;p&gt;Though single-tenant architectures are generally discouraged, some organizations prefer this or have inherited such a structure. In this case, we'll assume there is a unique logical database co-located on the same Postgres server for every customer.&lt;/p&gt;

&lt;p&gt;Say in this case we have a PlanetScale &lt;code&gt;M-1280&lt;/code&gt; (16 vCPUs, 128GB RAM per node), 200 distinct logical databases (for 200 tenants) and a unique Postgres role for each, for the sake of isolating permissions. There is a 1:1 mapping between each logical database and the Postgres user querying it.&lt;/p&gt;

&lt;p&gt;This is a much different connection pattern than the previous example. We have 200 roles connecting to 200 logical databases all on the same host, and want to ensure we can scale to thousands of combined connections without hitting limits.&lt;/p&gt;

&lt;p&gt;We'll center this around &lt;code&gt;max_connections=400&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If any one tenant peaks at 20 client connections, that's up to &lt;code&gt;200 × 20 = 4,000&lt;/code&gt; in total, so we'll set PgBouncer's &lt;code&gt;max_client_conn=5000&lt;/code&gt;, which includes a bit of buffer.&lt;/p&gt;

&lt;p&gt;Recall that &lt;code&gt;default_pool_size&lt;/code&gt; controls connections per (user, database) pool. Since each of the 200 users connects to exactly one database, there are 200 active pools. Even a modest &lt;code&gt;default_pool_size&lt;/code&gt; results in a large number of server connections: for example, a &lt;code&gt;default_pool_size&lt;/code&gt; of 10 would yield a theoretical max of &lt;code&gt;200 × 10 = 2,000&lt;/code&gt; server connections, far exceeding &lt;code&gt;max_connections=400&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We'll set &lt;code&gt;default_pool_size=2&lt;/code&gt; (at most 2 PgBouncer &amp;lt;-&amp;gt; Postgres connections per pool). Since we have a clean user-to-logical-database mapping, we also set &lt;code&gt;max_db_connections=2&lt;/code&gt; and &lt;code&gt;max_user_connections=2&lt;/code&gt; to enforce this per-pool cap. The maximum total PgBouncer server connections is &lt;code&gt;200 × 2 = 400&lt;/code&gt;, matching &lt;code&gt;max_connections=400&lt;/code&gt;.&lt;/p&gt;
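
&lt;p&gt;The same quick check shows why the pool size has to be so small in this layout:&lt;/p&gt;

```python
tenants = 200                  # one (user, database) pool per tenant
max_connections = 400

runaway = tenants * 10         # default_pool_size=10: 2,000 potential connections
fits = tenants * 2             # default_pool_size=2: exactly max_connections

print(runaway, fits, fits == max_connections)   # 2000 400 True
```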

&lt;p&gt;A single tenant can have 10s or even 100s of connections to PgBouncer, but all these will get multiplexed through at most 2 direct Postgres connections.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2u1ofv7pux4y6d0sfv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2u1ofv7pux4y6d0sfv7.png" alt="connections medium server" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  App-side PgBouncers
&lt;/h2&gt;

&lt;p&gt;In some deployments, it also makes sense to layer PgBouncer. You can run one PgBouncer on the app or client side to funnel many worker or process connections into a smaller egress set, then run another PgBouncer near Postgres as the final funnel into a tightly controlled number of direct database connections.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxjpo40cdb0ztk56bew7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxjpo40cdb0ztk56bew7.png" alt="app PgBouncer" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is especially useful when you need connection pooling both close to compute and close to the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multiple PgBouncers
&lt;/h2&gt;

&lt;p&gt;In large-scale deployments, setting up multiple PgBouncers is useful for traffic isolation. When your web app, background workers, and other consumers all share one pool, a spike from one class of traffic can saturate the PgBouncer and delay everything else.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfe1wiktrabkef9juvtb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfe1wiktrabkef9juvtb.png" alt="multiple PgBouncers" width="800" height="605"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Giving each major consumer its own PgBouncer creates independent funnels with their own limits, pool sizing, and failure domains. That makes it easier to protect latency-sensitive app traffic from bursty worker traffic and tune each workload separately.&lt;/p&gt;

&lt;p&gt;For an additional layer of protection, &lt;a href="https://planetscale.com/blog/introducing-database-traffic-control" rel="noopener noreferrer"&gt;Database Traffic Control™&lt;/a&gt; lets you enforce resource budgets on query traffic by pattern, application name, Postgres user, or custom tags — without needing separate infrastructure. The two approaches complement each other well: PgBouncer manages connections, Traffic Control manages resource consumption.&lt;/p&gt;

&lt;h2&gt;
  
  
  The key concepts
&lt;/h2&gt;

&lt;p&gt;PgBouncer solves a fundamental architectural constraint in PostgreSQL: the process-per-connection model that makes every connection expensive. When working with PgBouncer, there are a few fundamental things to keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transaction pooling is the mode that matters. Every transaction, be it a single query or many, gets a dedicated PgBouncer &amp;lt;-&amp;gt; Postgres connection while executing. Afterward, the connection can be reused for another transaction, maybe from the same client and maybe from another.&lt;/li&gt;
&lt;li&gt;Use PgBouncer as much as possible. If you absolutely need features that are incompatible with transaction pooling, like &lt;code&gt;LISTEN&lt;/code&gt;, session-level &lt;code&gt;SET&lt;/code&gt;/&lt;code&gt;RESET&lt;/code&gt;, or SQL-level &lt;code&gt;PREPARE&lt;/code&gt;/&lt;code&gt;DEALLOCATE&lt;/code&gt;, use a direct connection. In all other cases, the small latency penalty of PgBouncer is well worth the scalability and connection safety.&lt;/li&gt;
&lt;li&gt;The key configs to pay attention to are: &lt;code&gt;max_connections&lt;/code&gt; (Postgres), plus &lt;code&gt;max_client_conn&lt;/code&gt;, &lt;code&gt;default_pool_size&lt;/code&gt;, &lt;code&gt;max_db_connections&lt;/code&gt;, and &lt;code&gt;max_user_connections&lt;/code&gt; (PgBouncer).&lt;/li&gt;
&lt;li&gt;Ensure things are configured to allow for direct connections, even when all PgBouncer connections are in use.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>postgres</category>
      <category>planetscale</category>
      <category>database</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Behind the Scenes: How Database Traffic Control Works</title>
      <dc:creator>Meg528</dc:creator>
      <pubDate>Wed, 01 Apr 2026 19:13:49 +0000</pubDate>
      <link>https://dev.to/planetscale/behind-the-scenes-how-database-traffic-control-works-20pe</link>
      <guid>https://dev.to/planetscale/behind-the-scenes-how-database-traffic-control-works-20pe</guid>
      <description>&lt;p&gt;&lt;em&gt;By Patrick Reynolds&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In March, we released Database Traffic Control™, a feature for mitigating and preventing database overload due to unexpectedly expensive SQL queries. For an overview, &lt;a href="https://planetscale.com/blog/introducing-database-traffic-control" rel="noopener noreferrer"&gt;read the blog post introducing the feature&lt;/a&gt;, and to get started using it, read the &lt;a href="https://planetscale.com/docs/postgres/traffic-control/" rel="noopener noreferrer"&gt;reference documentation&lt;/a&gt;. This post is a deep dive into how the feature works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;If you already know how Postgres and Postgres extensions work internally, you can skip this section.&lt;/p&gt;

&lt;p&gt;A single Postgres server is made up of many running processes. Each client connection to Postgres gets its own dedicated worker &lt;a href="https://planetscale.com/blog/processes-and-threads" rel="noopener noreferrer"&gt;process&lt;/a&gt;, and all SQL queries from that client connection run, one at a time, in that worker process. When a client sends a SQL query, the worker process parses it, plans it, executes it, and sends any results back to the client. &lt;a href="https://planetscale.com/blog/what-is-a-query-planner" rel="noopener noreferrer"&gt;Planning&lt;/a&gt; is a key step, in which Postgres takes a parsed query and turns it into a step-by-step execution plan that specifies the indexes to use, the order to load rows from multiple tables, and the operators that will be used to filter, aggregate, and join those rows. Most queries can be run using several different plans, so it's the planner's job to estimate the cost of the possible plans and pick the cheapest one.&lt;/p&gt;

&lt;p&gt;Every part of how Postgres handles queries can be modified by extensions. Extensions can add new functions, new data types, new storage systems, and new authentication methods, among other things. (They can also &lt;a href="https://www.vldb.org/pvldb/vol18/p1962-kim.pdf" rel="noopener noreferrer"&gt;add new failure modes&lt;/a&gt;, but that's a topic for another day.) Extensions can also passively observe and report on traffic, like PlanetScale's own &lt;code&gt;pginsights&lt;/code&gt; extension that powers &lt;a href="https://planetscale.com/docs/postgres/monitoring/query-insights" rel="noopener noreferrer"&gt;Query Insights&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Much of what Postgres extensions can do, they do using hooks. A hook is a function that runs before, after, or instead of existing Postgres functionality. Want to observe or replace the planner? There's a hook for that. Want to examine queries as they execute? There are three hooks for that. As of this writing, there are &lt;a href="https://github.com/search?q=repo%3Apostgres%2Fpostgres%20%2F%5E%5CS.*%5Cw_hook%20%3D%20NULL%2F&amp;amp;type=code" rel="noopener noreferrer"&gt;55 hooks&lt;/a&gt; available to anyone writing Postgres extensions.&lt;/p&gt;

&lt;p&gt;PlanetScale's &lt;code&gt;pginsights&lt;/code&gt; extension installs hooks for the &lt;code&gt;ExecutorRun&lt;/code&gt; and &lt;code&gt;ProcessUtility&lt;/code&gt; functions, among others, to run timers and measure resource consumption while SQL statements execute. Since each hook wraps the original Postgres functionality, that means &lt;code&gt;pginsights&lt;/code&gt; sees each query just before it executes and again just after it completes. Any time that has elapsed and any resources the worker process has consumed are directly attributable to that query. The extension does some aggregation, sends aggregate data periodically to a data pipeline, and returns control to Postgres to accept the next query.&lt;/p&gt;

&lt;h2&gt;
  
  
  Insights, hooks, and blocking queries
&lt;/h2&gt;

&lt;p&gt;When we first started planning for Traffic Control, we knew we would use a Postgres extension with a hook on &lt;code&gt;ExecutorRun&lt;/code&gt; to decide whether or not each statement would be allowed to run. Initially, we wrote a new extension for this. We soon realized that there are two ways to choose which queries to block: based on static analysis of the individual query, or based on cumulative measurements of resource usage over time. We split the extension along those lines. Blocking based on static analysis got merged into the project that became &lt;code&gt;pg_strict&lt;/code&gt;. Blocking based on cumulative resource usage became Traffic Control.&lt;/p&gt;

&lt;p&gt;It turns out Traffic Control needed the same hook points and much of the same information that &lt;code&gt;pginsights&lt;/code&gt; already had. So rather than duplicate all that code and impose the extra runtime overhead of another extension, we taught &lt;code&gt;pginsights&lt;/code&gt; how to block queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F662vpyh9ewa8kywo9dr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F662vpyh9ewa8kywo9dr5.png" alt="Traffic Control checks" width="800" height="774"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If there are any Traffic Control rules configured, then at the beginning of each query execution, the extension does four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It identifies all of the rules that match the &lt;a href="https://planetscale.com/docs/postgres/traffic-control/concepts#rules" rel="noopener noreferrer"&gt;tags and other metadata&lt;/a&gt; of the query. Each rule identifies a budget; multiple rules can map to the same budget.&lt;/li&gt;
&lt;li&gt;It checks to see if any of the applicable budgets has reached its concurrency limit.&lt;/li&gt;
&lt;li&gt;It checks if the query's estimated cost is higher than any applicable budget's per-query limit.&lt;/li&gt;
&lt;li&gt;It checks to see if every applicable budget has enough available capacity for the query to begin execution. In the &lt;a href="https://planetscale.com/docs/postgres/traffic-control/concepts#resource-budget-limits" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;, these parameters are described as the burst limit and the server share. As we'll see &lt;a href="https://planetscale.com/blog/behind-the-scenes-how-traffic-control-works#leaky-buckets" rel="noopener noreferrer"&gt;below&lt;/a&gt;, those parameters combine over time to describe the behavior of a leaky-bucket rate limiter.&lt;/li&gt;
&lt;/ol&gt;
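
&lt;p&gt;The checks above can be sketched as a single admission function. This is illustrative Python, not the extension's actual C implementation, and the field names are invented:&lt;/p&gt;

```python
def admit(query, matching_budgets):
    """Return True if the query may begin execution; every budget must pass."""
    for b in matching_budgets:
        if b["running"] >= b["concurrency_limit"]:
            return False                       # concurrency limit reached
        if query["est_cost"] > b["per_query_limit"]:
            return False                       # single query too expensive
        if query["est_cost"] > b["available_capacity"]:
            return False                       # leaky bucket has drained
    return True
```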

&lt;p&gt;If any budget fails any of these checks, then the query is warned or blocked, based on how the budget is configured.&lt;/p&gt;

&lt;p&gt;Blocking a query just before it begins execution means the server spends no resources on the query, beyond the cost of the planner and the decision to block it. That's an improvement over schedulers like &lt;a href="https://www.man7.org/linux/man-pages/man7/cgroups.7.html" rel="noopener noreferrer"&gt;Linux cgroups&lt;/a&gt;, which let every task begin and simply starve it of resources if higher-priority tasks exist in the system. It's also an improvement over the &lt;a href="https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-STATEMENT-TIMEOUT" rel="noopener noreferrer"&gt;Postgres&lt;/a&gt; &lt;code&gt;statement_timeout&lt;/code&gt; setting, which allows any overly expensive query to consume resources until it times out. Traffic Control blocks expensive, low-priority queries before they begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost prediction
&lt;/h2&gt;

&lt;p&gt;I glossed over something important in the last section: cost. The concurrency check is easy, because it just counts worker processes already assigned to the queries associated with a Traffic Control budget. But the other two checks — per-query cost and cumulative cost — require us to know what resources the query will consume before it even begins execution. How do we do that? We trust, but also don't trust, the planner.&lt;/p&gt;

&lt;p&gt;A SQL query planner takes a parsed SQL statement and selects what it hopes is the most efficient series of steps to execute that query. To evaluate all the possible plans, the planner has to estimate the cost of each one. When you run &lt;code&gt;EXPLAIN&lt;/code&gt; on a SQL statement, Postgres's planner shows the cost of each step in the chosen plan, as well as the overall total cost. The cost is &lt;a href="https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS" rel="noopener noreferrer"&gt;measured in dimensionless units and is based on configurable weights&lt;/a&gt; assigned to each step the plan will take. There are a lot of variables that go into the plan cost, most of which you can ignore for the purposes of understanding Traffic Control. Just remember these two things: plan costs are roughly linear (a plan with double the cost should take something like double the time and resources to execute), and the relationship between plan costs and real-world resources is heavily dependent on what query you're running, what server you run it on, and what else is happening on that server at the moment.&lt;/p&gt;

&lt;p&gt;Traffic Control compensates for those dependencies. We assume that there is an unknown constant k that we can multiply the plan cost by, to get the actual wall-clock time it will take to execute that query. But that constant is different for each &lt;a href="https://planetscale.com/blog/query-performance-analysis-with-insights" rel="noopener noreferrer"&gt;query pattern&lt;/a&gt; and for each host. The constant may also change over time as the workload mix on the server changes and as tables grow and change. So it's not exactly a constant!&lt;/p&gt;

&lt;p&gt;Traffic Control implements a hash table on each host, mapping query patterns to two averages: CPU time and planner cost estimates. Both are exponential moving averages, heavily weighting recent queries. Every time a query completes, we update both of those averages. The magical not-quite-constant k is the ratio of the two.&lt;/p&gt;

&lt;p&gt;Each time a query comes in, Traffic Control multiplies the planner's estimated cost by k to guess how much CPU and/or wall-clock time the query will take. Based on that estimate, Traffic Control decides whether the query can be allowed to begin. If the query is allowed to run, then at the end of its execution, Traffic Control updates the two averages for that query pattern so the k value will be fresher and more precise for the next query that arrives.&lt;/p&gt;
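&lt;p&gt;A minimal sketch of that per-pattern bookkeeping, assuming an illustrative &lt;code&gt;CostModel&lt;/code&gt; class and smoothing factor (neither is PlanetScale's actual implementation):&lt;/p&gt;

```python
class CostModel:
    """Per-query-pattern moving averages, as described in the post.

    Tracks exponential moving averages of observed CPU time and of the
    planner's cost estimate; their ratio is the scaling factor k.
    The alpha value is an arbitrary illustrative choice.
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha          # closer to 1 weights recent queries more
        self.avg_cpu_seconds = None
        self.avg_plan_cost = None

    def record(self, plan_cost, cpu_seconds):
        # Called when a query completes: update both averages.
        if self.avg_plan_cost is None:
            self.avg_plan_cost = plan_cost
            self.avg_cpu_seconds = cpu_seconds
        else:
            a = self.alpha
            self.avg_plan_cost = a * plan_cost + (1 - a) * self.avg_plan_cost
            self.avg_cpu_seconds = a * cpu_seconds + (1 - a) * self.avg_cpu_seconds

    def k(self):
        # The not-quite-constant k: CPU seconds per unit of plan cost.
        return self.avg_cpu_seconds / self.avg_plan_cost

    def predict_cpu_seconds(self, plan_cost):
        # Called before a query starts: planner cost scaled by k.
        return plan_cost * self.k()


# One model per query pattern, keyed by a normalized pattern string.
models = {}
models["SELECT * FROM orders WHERE id = $1"] = m = CostModel()
m.record(plan_cost=100.0, cpu_seconds=0.02)
estimate = m.predict_cpu_seconds(250.0)   # roughly 0.05 CPU seconds
```

&lt;p&gt;However the table itself is stored, the admission math stays this cheap: one lookup and one multiplication per incoming query.&lt;/p&gt;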

&lt;h2&gt;
  
  
  Leaky buckets
&lt;/h2&gt;

&lt;p&gt;Two of the checks that Traffic Control performs for each query are easy: if the query's estimated cost is too high, block it. If too many queries in the same budget are already running, block it. But the final check — is there enough capacity in the budget to proceed? — is harder. It's important, though! Many executions of a moderately expensive query can be even more damaging than a single very expensive query, and managing a budget over time is the best way to block queries that are only expensive in aggregate. Traffic Control considers the cumulative cost of queries in each configured budget.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsxvzcgzczzcia7y1yrea.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsxvzcgzczzcia7y1yrea.png" alt="Traffic Control leaky bucket" width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each budget is modeled as a reverse leaky bucket. Here's how that works. Each query that executes accumulates debt in the bucket. Any query that would cause the bucket to overflow with debt is blocked. Debt drains out over time, until the bucket is empty. The bucket has &lt;a href="https://planetscale.com/docs/postgres/traffic-control/concepts#resource-budget-limits" rel="noopener noreferrer"&gt;two important parameters&lt;/a&gt;: its size and its drain rate. The size dictates the burst limit, or what total resources queries under a given budget can use in a short amount of time. The drain rate dictates the server share, or what fraction of overall resources queries under a given budget can use in the long term.&lt;/p&gt;

&lt;p&gt;Traditionally, leaky buckets work the other way: they start out full of credits, refill (but never overflow) at a configured rate, traffic consumes credits, and if a bucket is ever empty, traffic gets blocked. We inverted the model for a simple reason: an empty bucket doesn't need to be stored. Over time, we may need to store many buckets for changing rules and changing query metadata. We can drop buckets with a zero debt level, meaning that we only need to store recently active buckets, instead of every possible bucket. We store as many buckets as will fit in a configurable amount of shared memory, and we evict them implicitly when their debt falls to zero.&lt;/p&gt;

&lt;p&gt;There is no periodic task that drains debt from all buckets. Instead, each bucket is updated only when read. There is also no periodic task to evict buckets with a debt level of zero. Instead, adding a new bucket to the table evicts any that have already emptied, or whichever bucket is expected to become empty soonest.&lt;/p&gt;
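&lt;p&gt;A minimal sketch of the inverted bucket with lazy draining, based on the description above (class and field names are invented; the real implementation lives in shared memory, not a Python object):&lt;/p&gt;

```python
import time


class ReverseLeakyBucket:
    """A debt-accumulating ('reverse') leaky bucket, per the post.

    size       -> burst limit: total cost allowed in a short window
    drain_rate -> server share: cost units forgiven per second
    Debt is drained lazily, only when the bucket is read.
    """

    def __init__(self, size, drain_rate, now=time.monotonic):
        self.size = size
        self.drain_rate = drain_rate
        self.debt = 0.0
        self.now = now
        self.last_read = now()

    def _drain(self):
        # No background task: forgive whatever elapsed since the last read.
        t = self.now()
        elapsed = t - self.last_read
        self.last_read = t
        self.debt = max(0.0, self.debt - elapsed * self.drain_rate)

    def try_admit(self, cost):
        # Admit the query only if its cost would not overflow the bucket.
        self._drain()
        if max(0.0, self.debt + cost - self.size) == 0.0:
            self.debt += cost
            return True
        return False

    def is_empty(self):
        # An empty bucket carries no state and can be evicted.
        self._drain()
        return self.debt == 0.0
```

&lt;p&gt;Lazy draining keeps the hot path to a timestamp subtraction, at the price of storing a last-read time per bucket.&lt;/p&gt;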

&lt;h2&gt;
  
  
  Rule sets
&lt;/h2&gt;

&lt;p&gt;One important goal for Traffic Control is that it can efficiently decide when not to block a query. After all, Traffic Control has to make that decision before each query is even allowed to begin execution. So the budget here is measured in microseconds. But we also want developers and database administrators to be able to configure as many rules as it takes to manage traffic to their application. So it's crucial that we can evaluate many rules quickly. Enter rule sets: a data structure that allows evaluating &lt;code&gt;n&lt;/code&gt; rules in &lt;code&gt;O(1)&lt;/code&gt; time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49yhtf64wyt54gfi0xjx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49yhtf64wyt54gfi0xjx.png" alt="Traffic Control rule set" width="800" height="151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each rule has the form &lt;code&gt;&amp;lt;key, value&amp;gt;&lt;/code&gt;, and it matches any query that has that same value for that same key. It's complicated a bit by the fact that the value can be an IP address with a CIDR mask.&lt;/p&gt;

&lt;p&gt;A rule set maps each &lt;code&gt;&amp;lt;key, value&amp;gt;&lt;/code&gt; pair to a rule. Now, when a query comes in with metadata like &lt;code&gt;username=postgres, app=commerce, controller=api&lt;/code&gt;, the rule set can quickly identify the rule for each of those pairs. Hence, for this query, there are just three lookups in the rule set, regardless of how many rules are configured.&lt;/p&gt;

&lt;p&gt;Note that a rule set only &lt;em&gt;identifies rules to consider&lt;/em&gt;. Each rule's budget is only checked if all its conditions match the query. A rule set is all about checking as few rules as possible. So, the sequence is: the rule set identifies a list of rules, that list is narrowed down to just the rules that actually match, and then the budgets for all the matching rules get checked to see if the query can proceed.&lt;/p&gt;
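&lt;p&gt;That sequence can be sketched with a plain hash map keyed on &lt;code&gt;&amp;lt;key, value&amp;gt;&lt;/code&gt; pairs. This is a simplification that ignores CIDR-masked values, and the &lt;code&gt;Rule&lt;/code&gt; shape is illustrative, not PlanetScale's:&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    conditions: dict   # all pairs must match for the rule to apply
    budget: str        # name of the budget this rule charges


@dataclass
class RuleSet:
    # (key, value) pair -> list of candidate rules indexed under that pair
    index: dict = field(default_factory=dict)

    def add(self, rule):
        # Index the rule under each of its pairs.
        for pair in rule.conditions.items():
            self.index.setdefault(pair, []).append(rule)

    def matching_rules(self, metadata):
        # One lookup per metadata pair, regardless of total rule count,
        # then narrow candidates to rules whose conditions all match.
        seen = set()
        matched = []
        for pair in metadata.items():
            for r in self.index.get(pair, []):
                if id(r) in seen:
                    continue
                seen.add(id(r))
                if all(metadata.get(k) == v for k, v in r.conditions.items()):
                    matched.append(r)
        return matched


rs = RuleSet()
rs.add(Rule({"app": "commerce", "controller": "api"}, budget="api-budget"))
rs.add(Rule({"username": "postgres"}, budget="admin-budget"))

query = {"username": "postgres", "app": "commerce", "controller": "api"}
print([r.budget for r in rs.matching_rules(query)])
# → ['admin-budget', 'api-budget']
```

&lt;p&gt;Because each rule is indexed under every one of its pairs, a conjunction rule surfaces as a candidate whenever any single pair matches, which is why the narrowing step exists.&lt;/p&gt;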

&lt;p&gt;There are three exceptions to the &lt;code&gt;O(1)&lt;/code&gt; target for identifying rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rules for the &lt;code&gt;remote_address&lt;/code&gt; key check for a match for each mask length. So if you have rules for ten different mask lengths, the rule set has to do as many as ten lookups to find the rule with the longest matching prefix.&lt;/li&gt;
&lt;li&gt;Any conjunction rule — that is, a rule with multiple &lt;code&gt;&amp;lt;key, value&amp;gt;&lt;/code&gt; pairs ANDed together — may be identified as a candidate for queries that match any one of the &lt;code&gt;&amp;lt;key, value&amp;gt;&lt;/code&gt; pairs in the rule. So if you have conjunction rules with overlapping &lt;code&gt;&amp;lt;key, value&amp;gt;&lt;/code&gt; pairs, the rule set may identify several or all of them as candidates for each query.&lt;/li&gt;
&lt;li&gt;It is possible to add multiple rules for the exact same &lt;code&gt;&amp;lt;key, value&amp;gt;&lt;/code&gt; pair. If you do that, any query with that exact &lt;code&gt;&amp;lt;key, value&amp;gt;&lt;/code&gt; pair will get checked against all of those rules.&lt;/li&gt;
&lt;/ol&gt;
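&lt;p&gt;The first exception, the per-mask-length lookup for &lt;code&gt;remote_address&lt;/code&gt; rules, might look like this sketch (the rule names and storage layout are invented for illustration):&lt;/p&gt;

```python
import ipaddress

# One hash map per configured mask length: network -> rule name.
rules_by_masklen = {
    32: {ipaddress.ip_network("10.0.0.5/32"): "block-bad-host"},
    24: {ipaddress.ip_network("10.0.0.0/24"): "limit-subnet"},
    8:  {ipaddress.ip_network("10.0.0.0/8"):  "default-internal"},
}


def longest_prefix_rule(addr_str):
    """One lookup per configured mask length, longest prefix first."""
    for masklen in sorted(rules_by_masklen, reverse=True):
        # Mask the address down to this prefix length and look it up.
        network = ipaddress.ip_network((addr_str, masklen), strict=False)
        rule = rules_by_masklen[masklen].get(network)
        if rule is not None:
            return rule
    return None


print(longest_prefix_rule("10.0.0.5"))    # → block-bad-host
print(longest_prefix_rule("10.0.0.77"))   # → limit-subnet
print(longest_prefix_rule("10.9.9.9"))    # → default-internal
```

&lt;p&gt;The lookup count grows with the number of distinct mask lengths in use, not with the number of rules, which keeps it close to the &lt;code&gt;O(1)&lt;/code&gt; target in practice.&lt;/p&gt;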

&lt;h2&gt;
  
  
  Applying new rules
&lt;/h2&gt;

&lt;p&gt;Traffic Control is meant to be used both proactively and during incident response. For incident response, it's important that rules take effect quickly. And they do! Rules created or modified in the UI generally take effect at all database replicas in just 1-2 seconds. How?&lt;/p&gt;

&lt;p&gt;Rules and budgets are stored as objects in the PlanetScale app. Any change to Traffic Control rules made in the UI or the API gets stored as rows in the &lt;code&gt;planetscale&lt;/code&gt; database. Then it's serialized as JSON in the &lt;code&gt;traffic_control.rules&lt;/code&gt; and &lt;code&gt;traffic_control.budgets&lt;/code&gt; parameters for Postgres. Some Postgres parameters require restarting the server, but those two don't. So they cut the line and get sent immediately to postgresql.conf files on each database replica. Postgres reads the new config, and each worker process parses it into a rule set as soon as it completes whatever query it's executing. The rule set is in place before the next query begins.&lt;/p&gt;

&lt;p&gt;One big advantage of using Postgres configuration files, rather than sending configuration over SQL connections, is robustness on a busy server. You may want new Traffic Control rules most urgently when Postgres is using 100% of its available CPU, 100% of its worker processes, or both. Changing config files is possible even when opening a new SQL connection and issuing statements wouldn't be.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;Traffic Control uses the hooks and the performance measurements that Query Insights already implemented, then bolts on a system for sorting query traffic into budgets and warning or blocking queries that exceed those budgets. Each query can be warned or blocked if it's individually too expensive, if too many other queries are already running under the same budget, or if recent and concurrent queries under the same budget have consumed too many resources in the aggregate. Traffic Control implements a dynamic model per query pattern that leverages the existing Postgres planner to estimate the real-world cost of a query before it begins to execute. Leaky buckets impose limits on both traffic bursts and the long-term average fraction of server resources assigned to any individual budget.&lt;/p&gt;

&lt;p&gt;Taken as a whole, these elements implement Traffic Control, which gives developers and database administrators powerful new tools to identify, prioritize, and limit SQL traffic.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>planetscale</category>
      <category>database</category>
      <category>sql</category>
    </item>
    <item>
      <title>AI Fatigue Has Entered the Chat: How to Innovate Without Alienating Your Brand</title>
      <dc:creator>Meg528</dc:creator>
      <pubDate>Sun, 03 Nov 2024 17:24:04 +0000</pubDate>
      <link>https://dev.to/meg528/ai-fatigue-has-entered-the-chathow-to-innovate-without-alienating-your-brand-5pm</link>
      <guid>https://dev.to/meg528/ai-fatigue-has-entered-the-chathow-to-innovate-without-alienating-your-brand-5pm</guid>
      <description>&lt;p&gt;It wasn’t &lt;em&gt;that&lt;/em&gt; long ago that AI was something sensationalized mostly by high-budget movies like &lt;em&gt;The Matrix&lt;/em&gt;. In 2024, however, we’re not living in a mind-bending alternate reality, dodging bullets and Agent Smith. Instead, we’re using artificial intelligence to optimize our blogs for search engines, create lifelike videos without any human actors, and write code within seconds to power the next app to hit the marketplace.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75pszkmc9os9o5lkfzee.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75pszkmc9os9o5lkfzee.jpg" alt="computer code" width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my own day-to-day work, I’m exploring how to use AI to get more done, better — a less-than-frictionless transition since my background is in writing. (When AI blew up, writers feared that they’d be the first on the chopping block, and in some cases, they were.)&lt;/p&gt;

&lt;p&gt;Brands raced to jump on the AI bandwagon — some, a little recklessly. A couple of years later, many are starting to feel the blowback: customers who want to talk to a human being, not an AI chatbot; people who want to read human-written words, not AI-generated; search engines penalizing websites for page after page of low-quality content; users who are struggling with the inescapably prolific amount of AI-created content in both Google and social media news feeds.&lt;/p&gt;

&lt;p&gt;AI fatigue is here. What is it, what are the implications, and what can we do about it?&lt;/p&gt;

&lt;h2&gt;
  
  
  What is AI Fatigue?
&lt;/h2&gt;

&lt;p&gt;The term “AI fatigue” refers to a general hesitation toward, lack of excitement for, or even suspicion or skepticism around using AI-driven technologies.&lt;/p&gt;

&lt;p&gt;I experienced this myself very recently, while using a healthcare provider’s AI chatbot. I had the option to answer a series of questions and potentially get the information I needed, or I could speak with a representative directly. I wasn’t in the most patient mood and immediately opted for the human route. Why? Simple. The last few times I engaged with AI chatbots were a complete bust. (To be clear, I believe that AI chatbots can work stupendously and have happily utilized them in other moments.)&lt;/p&gt;

&lt;p&gt;So, while organizations are racing to adopt AI solutions, customers might be whistling a different tune.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Caused AI Fatigue?
&lt;/h2&gt;

&lt;p&gt;The speed and intensity with which technology is progressing make it hard for some of us to keep up.&lt;/p&gt;

&lt;p&gt;Think of older generations trying to figure out Facebook. Now, add AI on top of that. And technology is only gaining momentum. Some estimates [1] say that computers’ speed and power have typically doubled every 1.5 to two years since the 1960s.&lt;/p&gt;

&lt;p&gt;The proof is in the pudding: About 90% of the world’s data [2] was generated within the last few years alone.&lt;/p&gt;

&lt;p&gt;AI technology shows no signs of slowing down, which means we have to hustle to keep up. And, put simply, many people are tired of trying to do that. We’re always working so hard to try to understand the next big thing that we barely have an opportunity to slow down and just… be.&lt;/p&gt;

&lt;p&gt;Interestingly, the Gartner Hype Cycle [3] for artificial intelligence reports that the hype around AI has far outweighed what the technology has actually delivered.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Gone Astray: When Technology Backfires
&lt;/h2&gt;

&lt;p&gt;It feels like the widespread response from countless companies has been, “More AI!” And to be fair, in many cases, artificial intelligence has completely revolutionized the way some businesses are run and the experience they provide for their users.&lt;/p&gt;

&lt;p&gt;But not always.&lt;/p&gt;

&lt;p&gt;You might remember when CNET [4] was found to be publishing AI-generated articles in a less-than-transparent manner. The byline of these articles read “CNET Money Staff.” If you clicked on that byline, a popup appeared disclosing that the content was written by AI. To make matters worse, we then learned that more than half of these AI-generated articles contained significant errors and plagiarism.&lt;/p&gt;

&lt;p&gt;When CNET’s parent company, Red Ventures, went to sell it, the blemish on their reputation was a hurdle — although they did eventually sell it [5] for over $100 million. (Some sources say it was closer to $250 million [6].) This was after paying $500 million for it four years earlier.&lt;/p&gt;

&lt;p&gt;This is just one example of what can happen when we get greedy with AI. The ripple effect can be ghastly for both your bottom line and the people working to keep the lights on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Do We Go From Here?
&lt;/h2&gt;

&lt;p&gt;So, you now know what AI fatigue is. You’ve read some of the horror stories. Should you abandon AI completely? Absolutely not. For every AI failure, we can talk about many successes.&lt;/p&gt;

&lt;p&gt;Plus, this technology isn’t going anywhere. We have two choices: Embrace it, or get left behind.&lt;/p&gt;

&lt;p&gt;But here’s the key: Using it &lt;em&gt;intentionally&lt;/em&gt; is critical.&lt;/p&gt;

&lt;p&gt;What does this look like? Let’s go through some tips and examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Recognize That There’s a Time and a Place
&lt;/h3&gt;

&lt;p&gt;The solution to all our woes is not to replace everything with AI. The approach should be much more purposeful.&lt;/p&gt;

&lt;p&gt;I spoke with Apoorva Joshi [7], Senior AI Developer Advocate at MongoDB, who said, “The future isn’t about AI replacing humans; it’s about humans and AI working together. The path to success lies in collaboration, where human creativity and intelligence are enhanced by AI’s ability to drive innovation and help solve complex problems.”&lt;/p&gt;

&lt;p&gt;As one example, when it comes to content production, AI can be an excellent complement, rather than a replacement. MongoDB’s Developer Center [8] is a valuable resource for developers around the globe. While these authors may use AI to formulate ideas, the content is written, reviewed, and fact-checked by humans. Why? Well, put simply, at the end of the day, these authors are responsible for the content. If something goes awry, “AI did it!” is no excuse.&lt;/p&gt;

&lt;p&gt;Plus, humans do it better.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Prioritize Quality Over Quantity
&lt;/h3&gt;

&lt;p&gt;AI coding assistants have completely changed the way we build applications, speeding up the development time and taking away a lot of the heavy lifting. The same can be said for the use of AI in content production.&lt;/p&gt;

&lt;p&gt;Because the barrier to entry is now much lower, what we’ve seen is a &lt;em&gt;huge&lt;/em&gt; surge in the number of apps hitting the market, blogs on search engine results pages, and videos going live on YouTube. This would be a positive change if more of these apps, blogs, and videos were of a better quality. Instead, many of us find ourselves struggling to swim through a flood of junk.&lt;/p&gt;

&lt;p&gt;Take note: Producing more of something that’s low-quality doesn’t make it higher-quality. If you stop caring about creating amazing things and simply focus on creating more things, users will notice, and they will go somewhere else to find something better.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Keep the End Result in Mind
&lt;/h3&gt;

&lt;p&gt;If you use AI in any capacity to build something, it doesn’t change what your ultimate goal should be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Entertain your users.&lt;/li&gt;
&lt;li&gt;Educate your users.&lt;/li&gt;
&lt;li&gt;Solve a problem.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you can’t answer how your product/service does one or more of these things, your job isn’t done.&lt;/p&gt;

&lt;p&gt;AI can be used to create personalized songs that you can then gift to people you care about. We call that entertainment!&lt;/p&gt;

&lt;p&gt;Reliable AI chatbots empower users by delivering relevant help docs so that they don’t have to wait in long queues — an excellent way to educate users and help them help themselves.&lt;/p&gt;

&lt;p&gt;Vector search [9] allows users to find search results based not just on how well their keywords match but on the &lt;em&gt;meaning&lt;/em&gt; behind them. Better search results, faster? Problem solved.&lt;/p&gt;

&lt;h2&gt;
  
  
  An AI Reset: Moving Forward With Renewed Energy
&lt;/h2&gt;

&lt;p&gt;AI fatigue doesn’t have to be permanent, but we do need to shift our approach.&lt;/p&gt;

&lt;p&gt;By this point, we’ve got at least a basic understanding of just how powerful AI is and what it’s capable of. We’ve tested it and applied it in countless ways. Some have been miraculous, and others have been disastrous.&lt;/p&gt;

&lt;p&gt;Next, we iterate!&lt;/p&gt;

&lt;p&gt;Using AI at the right time, under the right circumstances, and always for the betterment of users — consider this your north star, and you’ll never go wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.zippia.com/answers/is-technology-growing-exponentially/" rel="noopener noreferrer"&gt;https://www.zippia.com/answers/is-technology-growing-exponentially/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://leftronic.com/blog/how-fast-is-technology-growing-statistics" rel="noopener noreferrer"&gt;https://leftronic.com/blog/how-fast-is-technology-growing-statistics&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.gartner.com/en/documents/5505695" rel="noopener noreferrer"&gt;https://www.gartner.com/en/documents/5505695&lt;/a&gt;&lt;br&gt;
&lt;a href="https://futurism.com/cnet-for-sale-ai" rel="noopener noreferrer"&gt;https://futurism.com/cnet-for-sale-ai&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.charlotteobserver.com/news/business/article290791529.html" rel="noopener noreferrer"&gt;https://www.charlotteobserver.com/news/business/article290791529.html&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.axios.com/2024/08/06/cnet-ziff-davis-red-ventures#" rel="noopener noreferrer"&gt;https://www.axios.com/2024/08/06/cnet-ziff-davis-red-ventures#&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/apoorvajoshi95/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/apoorvajoshi95/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://mdb.link/towards-ai-dc" rel="noopener noreferrer"&gt;https://mdb.link/towards-ai-dc&lt;/a&gt;&lt;br&gt;
&lt;a href="https://mdb.link/vector-search-towards-ai" rel="noopener noreferrer"&gt;https://mdb.link/vector-search-towards-ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>womenintech</category>
      <category>contentwriting</category>
    </item>
  </channel>
</rss>
