A million tables in PostgreSQL
Recently I have been designing and developing a web application backed by PostgreSQL. The architecture creates many tables in PostgreSQL, possibly a million.
How many tables can PostgreSQL handle?
There seems to be a hard limit of roughly 4 billion tables per database (the maximum value of an unsigned 32-bit integer). So is everything fine up to 4 billion tables? Apparently not: some articles report that practical limits appear well before that hard limit.
https://dev.to/kspeakman/breaking-postgres-with-too-many-tables-4pg0
https://www.reddit.com/r/PostgreSQL/comments/voz3uh/will_number_of_tables_limit_performance/
https://stackoverflow.com/questions/22395883/postgresql-what-is-the-maximum-number-of-tables-can-store-in-postgresql-databas
https://gis.stackexchange.com/questions/294101/healthy-number-of-tables-in-postgresql-schema
Therefore, I ran an endurance test to see whether Aurora PostgreSQL can handle a million tables with acceptable performance.
Test conditions
To judge whether Aurora works well under such conditions, I tried to:
- Create 1 million tables, each with 10,000 rows including a geometry column (PostGIS)
- Keep each table unrelated to any other table
- Run queries:
  - to select rows from a table
  - to count how many tables exist
- Make a snapshot
- Restore from the snapshot
The tests ran on Aurora PostgreSQL 16 with a Serverless v2 instance.
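As a rough illustration, the table generation could be scripted like this. The features_table_%08d naming matches the query plans shown below; the column layout (an id plus a PostGIS geometry column) and the dummy point data are my assumptions, not the exact schema used in the test, and the statements would be executed through psycopg or a similar driver.

```python
# Sketch: generate the DDL and seed data for the million test tables.
# Assumptions: column names (id, geom), SRID 4326, and dummy points.

def create_table_sql(i: int) -> str:
    """Return the CREATE TABLE statement for the i-th test table."""
    name = f"features_table_{i:08d}"
    return (
        f"CREATE TABLE public.{name} ("
        f"id serial PRIMARY KEY, "
        f"geom geometry(Point, 4326))"
    )

def insert_rows_sql(i: int, rows: int = 10000) -> str:
    """Return one INSERT that fills the i-th table with dummy points."""
    name = f"features_table_{i:08d}"
    return (
        f"INSERT INTO public.{name} (geom) "
        f"SELECT ST_MakePoint(n % 360 - 180, n % 180 - 90) "
        f"FROM generate_series(1, {rows}) AS n"
    )

if __name__ == "__main__":
    # One table at a time is slow at this scale, so batching many
    # statements per transaction is advisable.
    for i in range(3):  # range(1_000_000) for the full test
        print(create_table_sql(i))
        print(insert_rows_sql(i))
```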
Results
create tables
It worked, but it took about a week.
queries
select rows from a table
I ran a full scan on one table.
EXPLAIN ANALYZE SELECT * FROM public.features_table_00502897;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on features_table_00502897 (cost=0.00..264.00 rows=10000 width=100) (actual time=0.013..0.877 rows=10000 loops=1)
Planning Time: 0.675 ms
Execution Time: 1.346 ms
No problems occurred.
Counting how many tables exist
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=314857.84..314857.85 rows=1 width=8) (actual time=2401.424..2406.264 rows=1 loops=1)
Buffers: shared hit=3135772
-> Gather (cost=314857.63..314857.84 rows=2 width=8) (actual time=2397.817..2406.254 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=3135772
-> Partial Aggregate (cost=313857.63..313857.64 rows=1 width=8) (actual time=2392.081..2392.085 rows=1 loops=3)
Buffers: shared hit=3135772
-> Nested Loop Left Join (cost=1.63..313677.60 rows=72010 width=0) (actual time=0.272..2359.793 rows=333334 loops=3)
Buffers: shared hit=3135772
-> Hash Join (cost=1.07..248114.22 rows=72010 width=4) (actual time=0.262..941.881 rows=333334 loops=3)
Hash Cond: (c.relnamespace = nc.oid)
Buffers: shared hit=135766
-> Parallel Seq Scan on pg_class c (cost=0.00..246555.93 rows=288040 width=8) (actual time=0.030..837.999 rows=333400 loops=3)
Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
Rows Removed by Filter: 1666738
Buffers: shared hit=135665
-> Hash (cost=1.06..1.06 rows=1 width=4) (actual time=0.117..0.118 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=29
-> Seq Scan on pg_namespace nc (cost=0.00..1.06 rows=1 width=4) (actual time=0.094..0.100 rows=1 loops=3)
Filter: ((NOT pg_is_other_temp_schema(oid)) AND (nspname = 'public'::name))
Rows Removed by Filter: 3
Buffers: shared hit=29
-> Nested Loop (cost=0.56..0.90 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1000002)
Buffers: shared hit=3000006
-> Index Scan using pg_type_oid_index on pg_type t (cost=0.43..0.75 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=1000002)
Index Cond: (oid = c.reloftype)
Buffers: shared hit=3000006
-> Index Only Scan using pg_namespace_oid_index on pg_namespace nt (cost=0.13..0.15 rows=1 width=4) (never executed)
Index Cond: (oid = t.typnamespace)
Heap Fetches: 0
Planning:
Buffers: shared hit=385
Planning Time: 3.087 ms
Execution Time: 2406.424 ms
It took more than two seconds. It seems that too many tables degrade the performance of queries against the system catalogs; the plan above spends most of its time scanning pg_class and evaluating per-row privilege checks.
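One possible mitigation, which I did not run as part of the original test: the filter on pg_class in the plan calls pg_has_role, has_table_privilege, and has_any_column_privilege for every row, because information_schema.tables only shows tables you have privileges on. Counting from pg_class directly still has to scan the million-row catalog, but it skips those per-row function calls.

```python
# Hedged sketch (not benchmarked here): count ordinary tables in a
# schema by querying pg_class directly, avoiding the per-row privilege
# checks that information_schema.tables applies in the plan above.
COUNT_TABLES_SQL = """
SELECT count(*)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'  -- 'r' = ordinary table
"""

if __name__ == "__main__":
    print(COUNT_TABLES_SQL)
```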
Make a snapshot and restore
I tried both from the AWS Management Console, and no problems occurred.
Conclusion
There is no critical problem even with a million tables in one database on Aurora PostgreSQL. Aurora PostgreSQL has a different storage architecture from vanilla PostgreSQL, which might help in a many-tables situation like this; I should note that the same test could yield different results on "normal" PostgreSQL.
On the other hand, queries against the system catalogs perform poorly with this many tables, as the articles above mentioned. It may be better to take another approach if you can; otherwise, you have to accept this trade-off.
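As one example of such an "other approach" (my sketch, not something this test covered): if the schema is under your control, all features can live in a single table with a dataset discriminator column, so pg_class holds one entry no matter how many logical datasets exist.

```python
# Hypothetical single-table alternative: one table keyed by dataset_id
# instead of a million per-dataset tables, keeping the catalogs small.
# Table and column names here are assumptions for illustration.
SINGLE_TABLE_DDL = """
CREATE TABLE public.features (
    dataset_id integer NOT NULL,
    id bigserial,
    geom geometry(Point, 4326),
    PRIMARY KEY (dataset_id, id)
);
CREATE INDEX features_dataset_idx ON public.features (dataset_id);
"""

if __name__ == "__main__":
    print(SINGLE_TABLE_DDL)
```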