Philip McClarence

Posted on Apr 9

PostgreSQL Foreign Data Wrappers: Cross-Database Queries Explained

#database #postgres #dataengineering #sql

PostgreSQL Foreign Data Wrappers: Cross-Database Queries Explained

PostgreSQL foreign data wrappers (FDWs) let you query remote databases, files, and external services as if they were local tables. Using CREATE FOREIGN TABLE, you can JOIN data across PostgreSQL instances, Oracle databases, CSV files, and dozens of other sources directly from SQL.

No ETL pipeline. No data duplication. Just standard SQL against remote data.

When You Need Cross-Database Queries

Applications often need data from multiple databases. A reporting system joins user data from one PostgreSQL instance with order data from another. A migration project reads from Oracle while writing to PostgreSQL. A data pipeline queries a remote analytics database without copying everything locally.

The traditional approach is ETL: extract, transform, load. It works but introduces latency (the local copy is always stale), complexity (pipeline to maintain), and storage costs (data duplication). For many use cases, querying the remote data directly is simpler and more current.

Setting Up postgres_fdw

The postgres_fdw extension is the most commonly used FDW. Here's the full setup:

-- Install the extension
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Create a foreign server
CREATE SERVER remote_analytics
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (
        host 'analytics-db.example.com',
        port '5432',
        dbname 'analytics',
        fetch_size '10000'  -- critical: default 100 is far too low
    );

-- Create user mapping (credentials for the remote connection)
CREATE USER MAPPING FOR current_user
    SERVER remote_analytics
    OPTIONS (user 'readonly_user', password 'secure_password');

That fetch_size option deserves emphasis. The default of 100 rows per batch means a query returning 1 million rows makes 10,000 round trips to the remote server. Setting it to 10,000 reduces this to 100 round trips. For analytical workloads, 50,000 is even better.

Creating Foreign Tables

You can define them individually or import entire schemas:

-- Individual foreign table
CREATE FOREIGN TABLE remote_orders (
    order_id BIGINT,
    customer_id BIGINT,
    customer_name TEXT,
    order_date DATE,
    order_total_amount NUMERIC(12, 2),
    status TEXT
)
SERVER remote_analytics
OPTIONS (schema_name 'public', table_name 'orders');

-- Or import multiple tables at once (PG9.5+)
IMPORT FOREIGN SCHEMA public
    LIMIT TO (customers, products, categories)
    FROM SERVER remote_analytics
    INTO foreign_data;

IMPORT FOREIGN SCHEMA is a huge time saver. It automatically creates local foreign table definitions matching the remote schema.

The Performance Trap: Pushdown Optimization

Here's where most FDW performance problems hide. PostgreSQL tries to push WHERE clauses, JOINs, sorts, and aggregations to the remote server. When pushdown works, the remote server does the heavy lifting and only sends the filtered results. When it doesn't, PostgreSQL fetches the entire remote table and filters locally.

-- Pushdown works: simple operators
SELECT * FROM remote_orders WHERE order_date >= '2025-01-01';

-- Pushdown FAILS: local function in WHERE
SELECT * FROM remote_orders WHERE my_custom_function(status) = true;
-- PostgreSQL fetches ALL rows, then filters locally

-- Pushdown works: JOINs between foreign tables on the SAME server
SELECT o.order_id, c.customer_name
FROM remote_orders o
JOIN remote_customers c ON o.customer_id = c.customer_id;

-- Pushdown FAILS: JOIN between foreign and local tables
SELECT o.order_id, l.label
FROM remote_orders o
JOIN local_labels l ON o.status = l.status_code;
-- Entire foreign table fetched, then joined locally

Always verify with EXPLAIN VERBOSE:

EXPLAIN VERBOSE
SELECT order_id, customer_name, order_total_amount
FROM remote_orders
WHERE order_date >= '2025-01-01'
  AND status = 'completed'
ORDER BY order_total_amount DESC
LIMIT 100;

Look for Remote SQL in the Foreign Scan node. If your WHERE clause appears in the remote SQL, pushdown is working. If the remote SQL is just SELECT * FROM orders, you have a problem.

The Statistics Gap

This one catches a lot of teams. Foreign tables are not analyzed by autovacuum. Without statistics, the planner assumes a default row estimate (typically 1,000 rows). If the remote table actually has millions of rows, the planner makes terrible decisions -- choosing nested loop joins when hash joins would be 100x faster.

-- Run ANALYZE on foreign tables manually
ANALYZE remote_orders;

-- Verify statistics were collected
SELECT
    relname,
    reltuples AS estimated_rows,
    relpages AS estimated_pages
FROM pg_class
WHERE relname = 'remote_orders';

Schedule this regularly. It's the most overlooked FDW maintenance task.

Checking Your FDW Setup

-- List all foreign servers
SELECT
    s.srvname AS server_name,
    f.fdwname AS wrapper_name,
    s.srvoptions AS server_options
FROM pg_foreign_server s
JOIN pg_foreign_data_wrapper f ON s.srvfdw = f.oid;

-- List all foreign tables
SELECT
    ft.foreign_table_schema AS local_schema,
    ft.foreign_table_name AS local_table,
    ft.foreign_server_name AS server_name
FROM information_schema.foreign_tables ft
ORDER BY ft.foreign_table_name;

-- Find foreign tables that haven't been analyzed
SELECT
    relname AS foreign_table_name,
    last_analyze,
    n_live_tup AS estimated_rows
FROM pg_stat_user_tables
WHERE relname IN (
    SELECT foreign_table_name
    FROM information_schema.foreign_tables
)
ORDER BY last_analyze NULLS FIRST;

Beyond PostgreSQL: Other FDW Options

The FDW ecosystem extends well beyond PostgreSQL-to-PostgreSQL:

-- Oracle FDW
CREATE SERVER oracle_prod
    FOREIGN DATA WRAPPER oracle_fdw
    OPTIONS (dbserver '//oracle-host:1521/ORCL');

-- File FDW for CSV files
CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE import_data (
    id INTEGER,
    name TEXT,
    value NUMERIC
) SERVER csv_files
OPTIONS (filename '/data/import.csv', format 'csv', header 'true');

Same SQL interface regardless of the data source.

Performance Optimization Tips

Materialized views for frequently accessed data

If you query the same remote data repeatedly and can tolerate some staleness:

CREATE MATERIALIZED VIEW mv_recent_orders AS
SELECT order_id, customer_name, order_total_amount, order_date
FROM remote_orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';

-- Refresh on schedule
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_recent_orders;

Local-speed queries with periodic refreshes, while the foreign table stays available for ad-hoc full access.

Tune fetch_size per table

-- Server-level default
ALTER SERVER remote_analytics OPTIONS (SET fetch_size '10000');

-- Override for a large fact table
ALTER FOREIGN TABLE remote_events OPTIONS (SET fetch_size '50000');

Security considerations

User mappings store passwords in plaintext in pg_user_mappings (visible to superusers). Use a dedicated read-only role on the remote server with minimal privileges. Rotate credentials periodically.

The Decision: FDW vs ETL

Use FDWs when:

You need real-time access to remote data
The remote dataset is small enough that query latency is acceptable
You want to avoid maintaining a sync pipeline

Use ETL when:

You query the same remote data thousands of times per day
Query latency must be sub-millisecond
The remote data needs transformation before use
The remote server cannot handle the additional query load

FDWs and ETL aren't mutually exclusive. Use a foreign table for development and ad-hoc queries, and a materialized view or ETL pipeline for production workloads that need guaranteed performance.

DEV Community

PostgreSQL Foreign Data Wrappers: Cross-Database Queries Explained

PostgreSQL Foreign Data Wrappers: Cross-Database Queries Explained

When You Need Cross-Database Queries

Setting Up postgres_fdw

Creating Foreign Tables

The Performance Trap: Pushdown Optimization

The Statistics Gap

Checking Your FDW Setup

Beyond PostgreSQL: Other FDW Options

Performance Optimization Tips

Materialized views for frequently accessed data

Tune fetch_size per table

Security considerations

The Decision: FDW vs ETL

Top comments (0)