Igor Nosatov
The Day I Chose VARCHAR(255) and Regretted It: A PostgreSQL String Story



The $14,000 Migration

"Just use VARCHAR(255) for everything. It's a safe default."

That advice seemed reasonable when I started building our user profile system. Names? VARCHAR(255). Bios? VARCHAR(255). Even zip codesβ€”you guessed itβ€”VARCHAR(255).

Fast forward 18 months: 12 million user records, and I needed to let users write longer bios. The migration took 47 hours of downtime and cost us $14,000 in lost revenue.

All because I didn't understand how PostgreSQL actually handles strings.

The brutal truth? In PostgreSQL, VARCHAR(255) isn't the universal answer. In fact, it's often the wrong answer. Let me show you why, using a metaphor that finally made it click for me: your wardrobe.


Your Database is a Wardrobe (Stay With Me)

Think about how you organize clothes:

  1. Fixed-size drawers (dress shirts, folded t-shirts) = CHAR(n)
  2. Expandable shelves (sweaters that vary in bulk) = VARCHAR(n)
  3. The closet rod (coats of wildly different lengths) = TEXT or VARCHAR (no limit)

Each storage method has trade-offs. Use a drawer for a winter coat? Waste of space. Hang a tie on the closet rod? Works, but inefficient.

PostgreSQL string types work the same way. Let me prove it.


The Three Closets: CHAR, VARCHAR, and TEXT

CHAR(n): The Rigid Drawer

CREATE TABLE user_codes (
    user_id BIGSERIAL PRIMARY KEY,
    country_code CHAR(2),        -- Always exactly 2 characters
    postal_code CHAR(5),         -- US zip codes, always 5 digits
    access_level CHAR(1)         -- 'A', 'B', 'C' only
);

-- What happens under the hood?
INSERT INTO user_codes (country_code, postal_code, access_level) 
VALUES ('US', '90210', 'A');

-- PostgreSQL pads with spaces!
SELECT 
    country_code,
    LENGTH(country_code) as len,
    '|' || country_code || '|' as visual
FROM user_codes;

/*
 country_code | len | visual
--------------+-----+--------
 US           |  2  | |US|
*/

-- But watch this gotcha:
INSERT INTO user_codes (country_code) VALUES ('U');
-- Stored as 'U ' (with trailing space!)

When to use CHAR:

  • Fixed-length data (country codes, US state abbreviations, gender codes)
  • Small, known sizes (MD5 hashes are always 32 chars)
  • Performance-critical lookups where every byte counts

The wardrobe rule: Use drawers only when every item is the exact same size.
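One more drawer quirk worth knowing: trailing spaces are insignificant when two CHAR values are compared, but stay significant for VARCHAR and TEXT. A quick sketch you can run in psql:

```sql
-- CHAR comparison ignores trailing spaces:
SELECT 'US '::char(3) = 'US'::char(3);   -- true
-- VARCHAR keeps them significant:
SELECT 'US '::varchar = 'US'::varchar;   -- false
-- And casting CHAR to TEXT silently strips the padding:
SELECT LENGTH('U'::char(2)::text);       -- 1
```

This is why mixing CHAR with VARCHAR columns in joins or comparisons can produce surprising matches.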

VARCHAR(n): The Adjustable Shelf

CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    username VARCHAR(30),        -- Twitter-style limit
    email VARCHAR(254),          -- Max valid email length (RFC 5321)
    display_name VARCHAR(100)
);

-- No padding, only stores actual characters
INSERT INTO users (username, email, display_name)
VALUES ('alice', 'alice@example.com', 'Alice Anderson');

SELECT 
    username,
    LENGTH(username) as len,
    pg_column_size(username) as bytes
FROM users;

/*
 username | len | bytes
----------+-----+-------
 alice    |  5  |   6    (5 UTF-8 bytes + 1-byte varlena header)
*/

-- The classic fear:
ALTER TABLE users ALTER COLUMN username TYPE VARCHAR(50);
-- Since PostgreSQL 9.2, *increasing* the limit like this is metadata-only.
-- But decreasing it, or changing the type, still rewrites the table,
-- and on 10M rows that can take hours with the table locked!

Here's the plot twist that blew my mind:

-- Test: Does VARCHAR length limit affect performance?
CREATE TABLE test_varchar (
    short_limit VARCHAR(10),
    medium_limit VARCHAR(255),
    long_limit VARCHAR(10000)
);

-- Insert same data into all columns
INSERT INTO test_varchar 
SELECT 'hello', 'hello', 'hello' 
FROM generate_series(1, 1000000);

-- Check storage size
SELECT 
    pg_column_size(short_limit) as short_bytes,
    pg_column_size(medium_limit) as medium_bytes,
    pg_column_size(long_limit) as long_bytes
FROM test_varchar LIMIT 1;

/*
 short_bytes | medium_bytes | long_bytes
-------------+--------------+------------
      6      |       6      |      6
*/

MIND. BLOWN.

VARCHAR(10) and VARCHAR(10000) take the same storage space for the same data. The limit is just a constraint, not a storage directive!
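The only thing the limit buys you is a hard error when it's exceeded. Using the users table from above:

```sql
-- Exceeding the declared limit raises an error rather than truncating:
INSERT INTO users (username) VALUES (repeat('x', 31));
-- ERROR:  value too long for type character varying(30)
```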

TEXT: The Unlimited Closet Rod

CREATE TABLE posts (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,                    -- No arbitrary limit
    author_bio TEXT,
    comments TEXT[]                  -- Array of text (PostgreSQL magic!)
);

-- TEXT and VARCHAR (no limit) are identical in PostgreSQL
-- Seriously. Check the source code. Same type internally.

-- This is valid:
INSERT INTO posts (content) 
VALUES (repeat('a', 1000000));  -- 1 million characters!

-- Storage is the same as VARCHAR
SELECT pg_column_size(content) FROM posts;

The shocking truth: In PostgreSQL, TEXT and VARCHAR (without length) are literally the same type. The only difference is that VARCHAR lets you add a length constraint.

-- These are functionally identical:
name TEXT
name VARCHAR

-- This adds a constraint:
name VARCHAR(50)

-- Which is equivalent to:
name TEXT CHECK (LENGTH(name) <= 50)

The Performance Mythology: Debunked

Myth #1: "VARCHAR(255) is faster than TEXT"

-- Performance test
CREATE TABLE test_performance (
    id SERIAL,
    text_col TEXT,
    varchar_col VARCHAR(255),
    varchar_unlimited VARCHAR
);

-- Insert 1 million rows with random string lengths
INSERT INTO test_performance 
SELECT 
    i,
    repeat('x', (random() * 200)::int),
    repeat('x', (random() * 200)::int),
    repeat('x', (random() * 200)::int)
FROM generate_series(1, 1000000) i;

-- Create identical indexes
CREATE INDEX idx_text ON test_performance(text_col);
CREATE INDEX idx_varchar ON test_performance(varchar_col);
CREATE INDEX idx_varchar_unl ON test_performance(varchar_unlimited);

-- Query performance test
EXPLAIN ANALYZE
SELECT * FROM test_performance WHERE text_col = 'test';

EXPLAIN ANALYZE
SELECT * FROM test_performance WHERE varchar_col = 'test';

/*
Results: Nearly IDENTICAL performance!
The difference is in nanoseconds, not milliseconds.
*/

Myth #2: "Always set a length limit to save space"

Nope. PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) for long strings automatically.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  ROW STORAGE (How PostgreSQL Actually Works)    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                 β”‚
β”‚  Short strings (< ~2KB):                        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚
β”‚  β”‚ Stored inline with row           β”‚           β”‚
β”‚  β”‚ [id][username][email][bio]       β”‚           β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚
β”‚                                                 β”‚
β”‚  Long strings (> ~2KB):                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ [id][username]   │─────>β”‚  TOAST table    β”‚  β”‚
β”‚  β”‚ [email][pointer] β”‚      β”‚  [actual bio]   β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                 β”‚
β”‚  VARCHAR(255) vs TEXT: Same storage strategy!   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
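You can peek at TOAST yourself: every table with potentially-large columns gets a companion TOAST table, visible through pg_class. A quick check (assuming the posts table from earlier):

```sql
-- Which TOAST table backs our posts table?
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'posts';

-- How big has it grown?
SELECT pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'posts';
```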

The Real-World Decision Tree

Here's how I choose string types now:

-- 1. FIXED LENGTH DATA β†’ CHAR
CREATE TABLE standard_codes (
    iso_country CHAR(2),           -- Always 2: 'US', 'GB', 'FR'
    iso_language CHAR(2),          -- Always 2: 'en', 'es', 'fr'
    state_code CHAR(2),            -- Always 2: 'CA', 'NY', 'TX'
    md5_hash CHAR(32),             -- Always 32: MD5 output
    uuid CHAR(36)                  -- 36 with hyphens (prefer the native UUID type)
);

-- 2. CONSTRAINED LENGTH (Business Rule) β†’ VARCHAR(n)
CREATE TABLE user_profiles (
    username VARCHAR(30),          -- Twitter: max 15, we allow 30
    email VARCHAR(254),            -- RFC 5321 max email length
    twitter_handle VARCHAR(16),    -- Twitter's 15-char limit (+1 for '@')
    phone VARCHAR(20),             -- E.164 format + extras
    slug VARCHAR(100)              -- URL-safe identifier
);

-- 3. UNBOUNDED BUT SEARCHABLE β†’ TEXT with indexes
CREATE TABLE content (
    title TEXT,                    -- Could be long, need to search
    body TEXT,                     -- Articles, comments, etc.
    notes TEXT,

    -- Guard against absurdly long titles
    CONSTRAINT title_reasonable CHECK (LENGTH(title) <= 500)
);

CREATE INDEX idx_content_title ON content USING GIN (to_tsvector('english', title));

-- 4. TRULY UNLIMITED β†’ TEXT
CREATE TABLE documents (
    content TEXT,                  -- Could be books
    markdown TEXT,                 -- Code blocks, documentation
    json_data TEXT                 -- Before PostgreSQL 9.2 (use JSONB now!)
);

The Migration Horror Story (And How to Avoid It)

Remember that $14,000 mistake? Here's what happened:

-- Original (naive) design
CREATE TABLE user_profiles (
    id BIGSERIAL PRIMARY KEY,
    bio VARCHAR(255)
);

-- 18 months later, 12 million rows
SELECT COUNT(*) FROM user_profiles;  -- 12,458,392

-- The destructive migration
ALTER TABLE user_profiles 
ALTER COLUMN bio TYPE VARCHAR(1000);

/*
The server had to:
1. Lock the ENTIRE table (no reads/writes)
2. Rewrite EVERY row (even unchanged ones)
3. Rebuild ALL indexes on the rewritten table

(Note: since PostgreSQL 9.2, merely *increasing* a VARCHAR limit is a
metadata-only change. Decreasing it, converting the type, or running an
older release still triggers the full rewrite.)

Time: 47 hours
Downtime: Unacceptable
Cost: $14K in lost revenue
*/

The smart way:

-- If I'd used TEXT from the start:
CREATE TABLE user_profiles (
    id BIGSERIAL PRIMARY KEY,
    bio TEXT,
    CONSTRAINT bio_reasonable CHECK (LENGTH(bio) <= 255)
);

-- To extend later (NO table rewrite!):
ALTER TABLE user_profiles DROP CONSTRAINT bio_reasonable;
ALTER TABLE user_profiles ADD CONSTRAINT bio_reasonable
    CHECK (LENGTH(bio) <= 1000) NOT VALID;  -- skip the validation scan under lock
ALTER TABLE user_profiles VALIDATE CONSTRAINT bio_reasonable;  -- light lock only

-- Time: seconds, not hours
-- Locks: Minimal
-- Cost: $0

The golden rule: Start with TEXT + CHECK constraint, not VARCHAR(n).


Edge Cases That Will Bite You

1. The Emoji Explosion

-- User stores emoji in username
INSERT INTO users (username) VALUES ('Alice πŸ‘©β€πŸ’»');

-- How long is it?
SELECT LENGTH('Alice πŸ‘©β€πŸ’»');        -- 9 characters (the emoji is a 3-code-point ZWJ sequence)
SELECT octet_length('Alice πŸ‘©β€πŸ’»');  -- 17 bytes!

-- With VARCHAR(10), this fits: LENGTH limits count characters, not bytes
-- But byte-wise, it's bigger than you think

-- The safer approach when a downstream system cares about bytes:
ALTER TABLE users ADD CONSTRAINT username_bytes 
    CHECK (octet_length(username) <= 50);

2. The Collation Trap

-- Case-insensitive searches
CREATE TABLE tags (
    name VARCHAR(50) COLLATE "en_US"  -- Case-sensitive by default
);

INSERT INTO tags VALUES ('PostgreSQL'), ('postgresql'), ('POSTGRESQL');

-- Returns only 1 row: the values are stored as three distinct strings,
-- and equality only matches the exact case
SELECT * FROM tags WHERE name = 'postgresql';

-- Better approach (requires the citext extension):
CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE tags (
    name CITEXT  -- Case-insensitive text type
);

-- Or use functional index:
CREATE INDEX idx_tags_lower ON tags (LOWER(name));
SELECT * FROM tags WHERE LOWER(name) = LOWER('postgresql');

3. The NULL vs Empty String Confusion

-- These are DIFFERENT in PostgreSQL (unlike MySQL)
SELECT '' = NULL;        -- NULL (not true!)
SELECT '' IS NULL;       -- false
SELECT NULL IS NULL;     -- true

-- Implications for constraints:
CREATE TABLE profiles (
    bio TEXT NOT NULL DEFAULT ''  -- Empty string allowed
);

INSERT INTO profiles (bio) VALUES ('');     -- OK
INSERT INTO profiles (bio) VALUES (NULL);   -- ERROR!

-- Better: disallow both
CREATE TABLE profiles (
    bio TEXT,
    CONSTRAINT bio_not_empty CHECK (bio IS NOT NULL AND LENGTH(TRIM(bio)) > 0)
);

The Performance Optimization Playbook

1. Prefix Indexes for Long Strings

-- Don't index an entire long TEXT column
CREATE INDEX idx_url_full ON pages(url);  -- Bloats the index, and inserts fail
                                          -- once a value exceeds the btree
                                          -- entry limit (~2.7 KB)

-- Index only the first N characters
CREATE INDEX idx_url_prefix ON pages((LEFT(url, 100)));

-- Use in queries:
SELECT * FROM pages WHERE LEFT(url, 100) = 'https://example.com/very/long/path...';

2. Full-Text Search Instead of LIKE

-- Slow: Sequential scan
SELECT * FROM articles WHERE content LIKE '%postgresql%';

-- Fast: GIN index with full-text search
CREATE INDEX idx_articles_fts ON articles 
    USING GIN (to_tsvector('english', content));

SELECT * FROM articles 
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql');

-- With ranking!
SELECT 
    title,
    ts_rank(to_tsvector('english', content), query) AS rank
FROM articles, to_tsquery('postgresql & performance') query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC;

3. Compression for Archival Data

-- PostgreSQL 14+ supports LZ4 compression
CREATE TABLE archives (
    id BIGSERIAL PRIMARY KEY,
    document TEXT COMPRESSION lz4
);

-- Old approach (manual):
CREATE TABLE archives (
    id BIGSERIAL PRIMARY KEY,
    document_compressed BYTEA  -- Store compressed externally
);

-- Compress in application layer
INSERT INTO archives (document_compressed) 
VALUES (compress_function('very long text...'));
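On PostgreSQL 14+ you can also verify which method actually compressed a stored value (it returns NULL when the value was small enough to be stored uncompressed):

```sql
-- Inspect the compression method used for a stored datum (PostgreSQL 14+):
SELECT pg_column_compression(document) FROM archives LIMIT 1;

-- And the server-wide default applied to new columns:
SHOW default_toast_compression;
```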

The Ultimate Cheat Sheet

/*
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Use Case       β”‚ Type         β”‚ Example                           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Country code   β”‚ CHAR(2)      β”‚ 'US', 'GB', 'FR'                  β”‚
β”‚ State code     β”‚ CHAR(2)      β”‚ 'CA', 'NY', 'TX'                  β”‚
β”‚ MD5 hash       β”‚ CHAR(32)     β”‚ 'd41d8cd98f00b204e9800998ecf8427e'β”‚
β”‚ UUID           β”‚ UUID         β”‚ Use UUID type, not CHAR!          β”‚
β”‚                β”‚              β”‚                                   β”‚
β”‚ Username       β”‚ VARCHAR(30)  β”‚ Business constraint               β”‚
β”‚ Email          β”‚ VARCHAR(254) β”‚ RFC max length                    β”‚
β”‚ Slug           β”‚ VARCHAR(100) β”‚ URL identifier                    β”‚
β”‚ Phone          β”‚ VARCHAR(20)  β”‚ E.164 + formatting                β”‚
β”‚                β”‚              β”‚                                   β”‚
β”‚ Name           β”‚ TEXT         β”‚ No arbitrary limit                β”‚
β”‚ Bio            β”‚ TEXT         β”‚ + CHECK constraint                β”‚
β”‚ Comment        β”‚ TEXT         β”‚ Unknown length                    β”‚
β”‚ Article body   β”‚ TEXT         β”‚ + full-text index                 β”‚
β”‚ Description    β”‚ TEXT         β”‚ Flexibility                       β”‚
β”‚                β”‚              β”‚                                   β”‚
β”‚ Tag            β”‚ CITEXT       β”‚ Case-insensitive                  β”‚
β”‚ Enum-like      β”‚ ENUM         β”‚ Or CHECK (status IN (...))        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
*/

-- My default template:
CREATE TABLE your_table (
    id BIGSERIAL PRIMARY KEY,

    -- Fixed-length codes
    country_code CHAR(2),

    -- Constrained business fields
    username VARCHAR(30) NOT NULL UNIQUE,
    email VARCHAR(254) NOT NULL UNIQUE,

    -- Flexible text fields
    display_name TEXT NOT NULL,
    bio TEXT,

    -- Add reasonable constraints
    CONSTRAINT bio_length CHECK (LENGTH(bio) <= 5000),
    CONSTRAINT display_name_not_empty CHECK (LENGTH(TRIM(display_name)) > 0)
);

The Lessons I Learned (The Hard Way)

  1. TEXT is not "lazy"β€”it's often the smartest choice
  2. VARCHAR(255) is cargo cult programming from MySQL days
  3. Storage space is rarely your bottleneckβ€”query performance is
  4. Constraints protect youβ€”use CHECK liberally
  5. Migration pain is realβ€”design for change from day one

The New Rules I Follow

-- βœ… DO: Start flexible, add constraints
CREATE TABLE users (
    bio TEXT,
    CONSTRAINT bio_length CHECK (LENGTH(bio) <= 500)
);

-- ❌ DON'T: Start rigid, migrate painfully
CREATE TABLE users (
    bio VARCHAR(255)  -- An arbitrary limit you'll have to renegotiate later
);

-- βœ… DO: Use domain types for reusability
CREATE DOMAIN email AS TEXT
    CHECK (VALUE ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');

CREATE TABLE users (
    email email NOT NULL UNIQUE
);

-- βœ… DO: Index smartly
CREATE INDEX idx_user_search ON users 
    USING GIN (to_tsvector('english', bio));

-- ❌ DON'T: Index entire large text columns
CREATE INDEX idx_bio ON users(bio);  -- Probably too big

Your Turn: The Challenge

Look at your current database schema. I bet you have VARCHAR(255) fields that should be TEXT. Here's a safe migration strategy:
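Before migrating anything, take an inventory. This information_schema query (assuming your tables live in the public schema) lists every VARCHAR(255) column:

```sql
-- Find all VARCHAR(255) columns in the public schema:
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'public'
  AND data_type = 'character varying'
  AND character_maximum_length = 255
ORDER BY table_name, column_name;
```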

-- 1. Add the new column
ALTER TABLE your_table ADD COLUMN bio_new TEXT;

-- 2. Copy data (in batches for large tables)
UPDATE your_table SET bio_new = bio WHERE id BETWEEN 1 AND 100000;
UPDATE your_table SET bio_new = bio WHERE id BETWEEN 100001 AND 200000;
-- ... continue in batches

-- 3. Add constraint
ALTER TABLE your_table 
    ADD CONSTRAINT bio_new_length CHECK (LENGTH(bio_new) <= 1000);

-- 4. Swap in a transaction
-- (if the table takes writes during the backfill, keep bio_new in sync
--  with a trigger first, or you'll lose the latest edits)
BEGIN;
ALTER TABLE your_table DROP COLUMN bio;
ALTER TABLE your_table RENAME COLUMN bio_new TO bio;
COMMIT;

-- 5. Celebrate! πŸŽ‰

The Bottom Line

PostgreSQL string types aren't about memorizing syntaxβ€”they're about understanding intent:

  • CHAR: "This will ALWAYS be exactly N characters"
  • VARCHAR(n): "This MUST NOT exceed N characters" (business rule)
  • TEXT: "This can be any length, and I trust PostgreSQL to handle it"

Choose the one that matches your actual constraint, not your assumptions about performance.

And for the love of databases, stop defaulting to VARCHAR(255).



What's your VARCHAR(255) horror story? Drop it in the commentsβ€”I want to hear about your $14,000 migration moment! πŸ’Έ
