A million tables in PostgreSQL
Recently I have been designing and developing a web application backed by PostgreSQL. The architecture creates many tables in PostgreSQL, possibly a million.
How many tables can PostgreSQL handle?
There seems to be a hard limit of roughly 4 billion tables per database (the maximum value of an unsigned 32-bit integer). So is everything fine up to 4 billion tables? Apparently not: some articles report that practical limits appear well before that hard limit.
https://dev.to/kspeakman/breaking-postgres-with-too-many-tables-4pg0
https://www.reddit.com/r/PostgreSQL/comments/voz3uh/will_number_of_tables_limit_performance/
https://stackoverflow.com/questions/22395883/postgresql-what-is-the-maximum-number-of-tables-can-store-in-postgresql-databas
https://gis.stackexchange.com/questions/294101/healthy-number-of-tables-in-postgresql-schema
Therefore, I ran an endurance test to see whether Aurora PostgreSQL can handle a million tables with acceptable performance.
Test conditions
To judge whether Aurora works well under such conditions, I tried to:
- Create 1 million tables, each with 10,000 rows including a geometry column (PostGIS)
- Keep each table unrelated to any other table
- Run queries:
  - to select rows from a table
  - to count how many tables exist
- Make a snapshot
- Restore from the snapshot
The tests ran on Aurora PostgreSQL 16 with a Serverless v2 instance.
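As a rough illustration, the table generation could be scripted like this. The features_table_%08d naming matches the query plans shown below; the column layout (an id plus a PostGIS geometry column) and the dummy point data are my assumptions, not the exact schema used in the test, and the statements would be executed through psycopg or a similar driver.

```python
# Sketch: generate the DDL and seed data for the million test tables.
# Assumptions: column names (id, geom), SRID 4326, and dummy points.

def create_table_sql(i: int) -> str:
    """Return the CREATE TABLE statement for the i-th test table."""
    name = f"features_table_{i:08d}"
    return (
        f"CREATE TABLE public.{name} ("
        f"id serial PRIMARY KEY, "
        f"geom geometry(Point, 4326))"
    )

def insert_rows_sql(i: int, rows: int = 10000) -> str:
    """Return one INSERT that fills the i-th table with dummy points."""
    name = f"features_table_{i:08d}"
    return (
        f"INSERT INTO public.{name} (geom) "
        f"SELECT ST_MakePoint(n % 360 - 180, n % 180 - 90) "
        f"FROM generate_series(1, {rows}) AS n"
    )

if __name__ == "__main__":
    # One table at a time is slow at this scale, so batching many
    # statements per transaction is advisable.
    for i in range(3):  # range(1_000_000) for the full test
        print(create_table_sql(i))
        print(insert_rows_sql(i))
```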
Results
create tables
It worked, but it took about a week.
queries
select rows from a table
I ran a full scan on one table.
EXPLAIN ANALYZE SELECT * FROM public.features_table_00502897;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on features_table_00502897 (cost=0.00..264.00 rows=10000 width=100) (actual time=0.013..0.877 rows=10000 loops=1)
Planning Time: 0.675 ms
Execution Time: 1.346 ms
No problems occurred.
Counting how many tables exist
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=314857.84..314857.85 rows=1 width=8) (actual time=2401.424..2406.264 rows=1 loops=1)
Buffers: shared hit=3135772
-> Gather (cost=314857.63..314857.84 rows=2 width=8) (actual time=2397.817..2406.254 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=3135772
-> Partial Aggregate (cost=313857.63..313857.64 rows=1 width=8) (actual time=2392.081..2392.085 rows=1 loops=3)
Buffers: shared hit=3135772
-> Nested Loop Left Join (cost=1.63..313677.60 rows=72010 width=0) (actual time=0.272..2359.793 rows=333334 loops=3)
Buffers: shared hit=3135772
-> Hash Join (cost=1.07..248114.22 rows=72010 width=4) (actual time=0.262..941.881 rows=333334 loops=3)
Hash Cond: (c.relnamespace = nc.oid)
Buffers: shared hit=135766
-> Parallel Seq Scan on pg_class c (cost=0.00..246555.93 rows=288040 width=8) (actual time=0.030..837.999 rows=333400 loops=3)
Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
Rows Removed by Filter: 1666738
Buffers: shared hit=135665
-> Hash (cost=1.06..1.06 rows=1 width=4) (actual time=0.117..0.118 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=29
-> Seq Scan on pg_namespace nc (cost=0.00..1.06 rows=1 width=4) (actual time=0.094..0.100 rows=1 loops=3)
Filter: ((NOT pg_is_other_temp_schema(oid)) AND (nspname = 'public'::name))
Rows Removed by Filter: 3
Buffers: shared hit=29
-> Nested Loop (cost=0.56..0.90 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1000002)
Buffers: shared hit=3000006
-> Index Scan using pg_type_oid_index on pg_type t (cost=0.43..0.75 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=1000002)
Index Cond: (oid = c.reloftype)
Buffers: shared hit=3000006
-> Index Only Scan using pg_namespace_oid_index on pg_namespace nt (cost=0.13..0.15 rows=1 width=4) (never executed)
Index Cond: (oid = t.typnamespace)
Heap Fetches: 0
Planning:
Buffers: shared hit=385
Planning Time: 3.087 ms
Execution Time: 2406.424 ms
It took more than two seconds. It seems that too many tables degrade the performance of queries against the system catalogs; the plan above spends most of its time scanning pg_class and evaluating per-row privilege checks.
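One possible mitigation, which I did not run as part of the original test: the filter on pg_class in the plan calls pg_has_role, has_table_privilege, and has_any_column_privilege for every row, because information_schema.tables only shows tables you have privileges on. Counting from pg_class directly still has to scan the million-row catalog, but it skips those per-row function calls.

```python
# Hedged sketch (not benchmarked here): count ordinary tables in a
# schema by querying pg_class directly, avoiding the per-row privilege
# checks that information_schema.tables applies in the plan above.
COUNT_TABLES_SQL = """
SELECT count(*)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'  -- 'r' = ordinary table
"""

if __name__ == "__main__":
    print(COUNT_TABLES_SQL)
```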
Make a snapshot and restore
I tried both from the AWS Management Console, and no problems occurred.
Conclusion
There is no critical problem even with a million tables in one database on Aurora PostgreSQL. Aurora PostgreSQL has a different storage architecture from vanilla PostgreSQL, which might help in a many-tables situation like this; I should note that the same test could yield different results on "normal" PostgreSQL.
On the other hand, queries against the system catalogs perform poorly with this many tables, as the articles above mentioned. It may be better to take another approach if you can; otherwise, you have to accept this trade-off.
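As one example of such an "other approach" (my sketch, not something this test covered): if the schema is under your control, all features can live in a single table with a dataset discriminator column, so pg_class holds one entry no matter how many logical datasets exist.

```python
# Hypothetical single-table alternative: one table keyed by dataset_id
# instead of a million per-dataset tables, keeping the catalogs small.
# Table and column names here are assumptions for illustration.
SINGLE_TABLE_DDL = """
CREATE TABLE public.features (
    dataset_id integer NOT NULL,
    id bigserial,
    geom geometry(Point, 4326),
    PRIMARY KEY (dataset_id, id)
);
CREATE INDEX features_dataset_idx ON public.features (dataset_id);
"""

if __name__ == "__main__":
    print(SINGLE_TABLE_DDL)
```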