PostgreSQL Error 22021: character not in repertoire
PostgreSQL error code 22021 (character_not_in_repertoire) occurs when you attempt to insert, convert, or process a character that does not exist within the allowed character repertoire of the current database or target encoding. This typically happens when multibyte characters (such as Korean, Japanese, emoji, or special Unicode symbols) are used in a database configured with a restrictive encoding like SQL_ASCII or LATIN1.
Top 3 Causes
1. Database and Client Encoding Mismatch
When your database encoding is SQL_ASCII or LATIN1 but your client sends UTF-8 encoded multibyte characters, PostgreSQL cannot map those bytes to the database's character repertoire.
-- Check current encodings
SELECT datname, pg_encoding_to_char(encoding) AS encoding
FROM pg_database
WHERE datname = current_database();
SHOW client_encoding;
SHOW server_encoding;
-- Fix: explicitly set client encoding
SET client_encoding TO 'UTF8';
-- Permanent fix per role
ALTER ROLE myapp_user SET client_encoding TO 'UTF8';
2. Using String Functions on Multibyte Data in SQL_ASCII Database
SQL_ASCII databases allow raw bytes to be stored but fail as soon as character-aware string functions like length(), upper(), or lower() are called on multibyte data.
-- This will fail in a SQL_ASCII database with multibyte characters
-- SELECT length('한글'); -- ERROR: 22021
-- Workaround: use octet_length (byte-level, encoding-agnostic)
SELECT octet_length('한글') AS byte_length;
-- Filter out non-ASCII characters before processing
SELECT length(regexp_replace(my_column, '[^\x00-\x7F]', '', 'g'))
FROM my_table;
-- Retrieve only ASCII-safe rows
SELECT *
FROM my_table
WHERE my_column ~ '^[\x00-\x7F]*$';
3. Failed Encoding Conversion via convert() / convert_to()
When converting strings between encodings, any character that has no mapping in the target encoding triggers this error. A common scenario is trying to convert a UTF-8 string containing emoji or Korean characters into LATIN1.
-- This will fail: Korean characters cannot be represented in LATIN1
-- SELECT convert('안녕하세요'::bytea, 'UTF8', 'LATIN1'); -- ERROR: 22021
-- Safe conversion function with exception handling
CREATE OR REPLACE FUNCTION safe_convert(p_text TEXT, p_from TEXT, p_to TEXT)
RETURNS BYTEA AS $$
BEGIN
RETURN convert(p_text::bytea, p_from, p_to);
EXCEPTION
WHEN character_not_in_repertoire THEN
RAISE WARNING 'Unconvertible character in: %', p_text;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT safe_convert('Hello', 'UTF8', 'LATIN1'); -- returns converted bytes
SELECT safe_convert('안녕하세요', 'UTF8', 'LATIN1'); -- returns NULL with warning
Quick Fix Solutions
If you cannot immediately recreate the database, apply these tactical fixes:
-- 1. Re-create database with UTF-8 encoding (recommended long-term fix)
CREATE DATABASE mydb_utf8
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
TEMPLATE = template0;
-- 2. Dump and restore with correct encoding
-- (run in shell)
-- pg_dump --encoding=UTF8 -d olddb -f dump.sql
-- psql -d mydb_utf8 -f dump.sql
-- 3. Strip problematic characters before insert
INSERT INTO my_table (my_column)
VALUES (regexp_replace('Hello 😊 World', '[^\x00-\x7F]', '', 'g'));
Prevention Tips
1. Always create databases with UTF-8 encoding from the start.
Set UTF8 as your standard and enforce it at the role level so every connection defaults to the correct encoding. This eliminates the most common source of 22021 errors entirely.
-- Set default encoding for all connections by a role
ALTER ROLE myapp_user SET client_encoding TO 'UTF8';
2. Validate input data before it reaches the database.
Add CHECK constraints or application-level validation to catch illegal characters early. Catching the problem at the boundary is far cheaper than debugging encoding errors in production.
-- Example: enforce ASCII-only input on a specific column
ALTER TABLE my_table
ADD CONSTRAINT chk_ascii_only
CHECK (my_column ~ '^[\x00-\x7F]*$');
Related Errors
-
22P05(untranslatable_character): Closely related; occurs when a character cannot be translated from client encoding to server encoding during transmission. -
22000(data_exception): The parent category error code for all data-related exceptions including22021.
📖 Want a more detailed guide?
Check out the full in-depth version (Korean) on oraerror.com — includes detailed analysis, additional SQL examples, and prevention tips.
Top comments (0)