PostgreSQL Error 22P04: bad copy file format
PostgreSQL error code 22P04 (bad_copy_file_format) is thrown when the COPY command encounters a file whose format does not match the format specified in the command options. This typically happens during bulk data loads, ETL pipelines, or database migrations when the source file's actual structure — encoding, delimiter, or binary layout — differs from what PostgreSQL expects. Since COPY is an all-or-nothing operation by default, this error will halt your entire data load immediately.
Top 3 Causes
1. Format Mismatch (CSV / TEXT / BINARY)
The most common cause is specifying the wrong FORMAT option. PostgreSQL supports TEXT, CSV, and BINARY formats, and mixing them up — such as trying to read a binary dump file as CSV — will trigger this error immediately.
-- WRONG: Reading a binary file as CSV
COPY employees FROM '/data/employees.bin' WITH (FORMAT CSV);
-- CORRECT: Match the format to the actual file
COPY employees FROM '/data/employees.bin' WITH (FORMAT BINARY);
-- CORRECT: Standard CSV import with header
COPY employees FROM '/data/employees.csv' WITH (FORMAT CSV, HEADER true);
-- CORRECT: Tab-separated text file
COPY employees FROM '/data/employees.tsv' WITH (FORMAT TEXT, DELIMITER E'\t');
2. Wrong Delimiter or Quote Character
Even when the format is set to CSV, the delimiter might not be a comma. Files exported from European systems often use semicolons (;), while data warehouse exports may use pipes (|). If the delimiter setting is wrong, PostgreSQL cannot correctly parse rows and columns, resulting in format errors.
-- Semicolon-delimited file
COPY employees FROM '/data/employees.csv'
WITH (
FORMAT CSV,
DELIMITER ';',
HEADER true
);
-- Pipe-delimited file
COPY employees FROM '/data/employees.csv'
WITH (
FORMAT CSV,
DELIMITER '|',
HEADER true
);
-- Custom quote character (single quote instead of double quote)
COPY employees FROM '/data/employees.csv'
WITH (
FORMAT CSV,
DELIMITER ',',
QUOTE '''',
HEADER true
);
3. Encoding Mismatch or BOM Characters
Files generated on Windows or by tools like Microsoft Excel often include a UTF-8 BOM (\xEF\xBB\xBF) at the start of the file, or are saved in non-UTF-8 encodings (e.g., Latin-1, CP1252). PostgreSQL will fail to parse the first line correctly, throwing a format error.
-- Specify encoding explicitly to avoid mismatch
COPY employees FROM '/data/employees_latin.csv'
WITH (
FORMAT CSV,
HEADER true,
ENCODING 'LATIN1'
);
-- For UTF-8 BOM files, strip the BOM before loading (Linux shell):
-- sed -i '1s/^\xEF\xBB\xBF//' /data/employees.csv
-- Then load normally
COPY employees FROM '/data/employees.csv'
WITH (
FORMAT CSV,
HEADER true,
ENCODING 'UTF8'
);
Quick Fix Solutions
Always be explicit with every COPY option — never rely on defaults in production:
-- Production-safe COPY template
COPY target_table FROM '/path/to/file.csv'
WITH (
FORMAT CSV,
DELIMITER ',',
ENCODING 'UTF8',
HEADER true,
NULL ''
);
-- Use a staging table to safely validate before inserting into production
CREATE TABLE employees_staging (LIKE employees);
COPY employees_staging FROM '/data/employees.csv'
WITH (FORMAT CSV, HEADER true, ENCODING 'UTF8');
-- Validate
SELECT COUNT(*) FROM employees_staging;
SELECT * FROM employees_staging WHERE id IS NULL;
-- Promote to production after validation
INSERT INTO employees SELECT * FROM employees_staging;
TRUNCATE employees_staging;
Prevention Tips
Always explicitly declare FORMAT, DELIMITER, ENCODING, and HEADER in every
COPYstatement. Never assume defaults. Add a pre-load file validation step to your ETL pipeline using tools like Python'schardetfor encoding detection or the Linuxfilecommand to verify file types before PostgreSQL ever sees them.Use a staging table pattern for all production data loads. Load data into a temporary staging table first, run data quality checks (null checks, row counts, type validation), and only then move clean data into the production table. This approach isolates format errors from your live data and makes rollback trivial.
Related Errors
-
22P05untranslatable_character — Triggered when a character in the file cannot be converted to the database encoding. Often appears alongside22P04in encoding-related issues. -
22007invalid_datetime_format — Occurs when date/time columns in the file don't match PostgreSQL's expected format (DateStyle). Frequently seen together with format errors in CSV imports. -
23502not_null_violation — Raised when a NULL value in the file violates a NOT NULL constraint. Fixable by setting theNULL ''option to correctly identify empty strings as NULLs.
📖 Want a more detailed guide?
Check out the full in-depth version (Korean) on oraerror.com — includes detailed analysis, additional SQL examples, and prevention tips.
Top comments (0)