William Andrews

Posted on May 27 • Originally published at devcrate.net

Base64 explained — what it is, when to use it, and the gotchas that bite developers

#base64 #javascript #security #beginners

You see a long string of letters and numbers ending in == and wonder what it is. You paste a JWT into a tool and the middle section is mostly readable. You embed an image in an HTML email and the src attribute is a wall of characters. You upload a PDF to an API and the docs tell you to "send it as Base64." They're all the same encoding, and most developers use it without ever really understanding what it does.

This guide covers what Base64 actually is, when you should reach for it, the common mistakes (including the biggest one: assuming it's encryption), URL-safe variants, padding rules, and how to encode and decode it in JavaScript, Python, SQL, and on the command line.

What Base64 actually is

Base64 is an encoding that converts binary data into ASCII text using 64 specific characters: A-Z, a-z, 0-9, plus + and /. The = character is used for padding at the end. Every 3 bytes of input become exactly 4 Base64 characters of output — meaning Base64 increases data size by roughly 33%.

Input bytes:   "Hi"                  (2 bytes: 0x48 0x69)
Binary:        01001000 01101001
Group in 6s:   010010 000110 1001(00)   ← last group padded with zeros
Base64 chars:  S      G      k    =     ← '=' = padding

Result:        "SGk="

The math: 64 characters means each character represents 6 bits. The lowest common multiple of 6 bits (one Base64 char) and 8 bits (one byte) is 24 bits — which is 3 bytes or 4 Base64 chars. That's why Base64 always works in groups of 4 output characters, and why padding exists at all.
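
To make the 3-bytes-to-4-characters grouping concrete, here is a minimal hand-rolled encoder. It is a sketch for illustration only; `encodeBase64` and `ALPHABET` are hypothetical names, and real code should use the built-in functions covered later.

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: encodes a Uint8Array (or array of byte values) by hand.
function encodeBase64(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i;          // 1, 2, or >= 3 bytes left
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0; // missing bytes count as zero bits
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const group = (b0 << 16) | (b1 << 8) | b2;   // one 24-bit group

    // Slice the 24 bits into four 6-bit indexes into the alphabet.
    out += ALPHABET[(group >> 18) & 63];
    out += ALPHABET[(group >> 12) & 63];
    out += remaining > 1 ? ALPHABET[(group >> 6) & 63] : '=';
    out += remaining > 2 ? ALPHABET[group & 63] : '=';
  }
  return out;
}

const hi = new TextEncoder().encode('Hi');
console.log(encodeBase64(hi)); // "SGk="
```

The two `remaining` checks are where padding comes from: a group that is short of input bytes still produces four output characters, with `=` standing in for the ones that carry no data.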

The biggest misconception: Base64 is not encryption

This catches developers and non-developers alike. Base64 looks like gibberish, so it feels like a secret. It isn't. Anyone can decode Base64 instantly — there's no key, no password, no algorithm to crack. It's a transparent transformation, like writing in a different alphabet.

// "Encrypted" password?
"cGFzc3dvcmQxMjM="

// Decoded in one line
atob("cGFzc3dvcmQxMjM=")
// → "password123"

Base64 is encoding, not encryption. Use it to transport data safely through text-only channels — never to hide data. If you need confidentiality, use real encryption: AES, NaCl/libsodium, or TLS for data in transit.

When to use Base64

Base64 exists to move binary data through systems that expect text. The most common cases:

Embedding binary in JSON or XML — neither format supports raw bytes. APIs that accept file uploads as part of a JSON payload use Base64 to represent the file.

Data URLs in HTML/CSS — data:image/png;base64,iVBORw0KGgo... lets you embed an image inline without a separate HTTP request. Useful for small icons and email signatures.

HTTP Basic Auth — the Authorization header sends credentials as Basic <base64-of-username:password>. This is also a perfect example of why Base64 isn't encryption — Basic Auth is only secure when paired with HTTPS.

JWTs — JSON Web Tokens consist of three Base64URL-encoded sections separated by dots. The header and payload are readable JSON; only the signature is opaque.

Email attachments — SMTP is technically a 7-bit text protocol, so attachments have been Base64-encoded by default since the MIME standard.

Cryptographic keys and certificates — PEM files (the -----BEGIN CERTIFICATE----- blocks) wrap Base64-encoded binary keys.
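
Two of the cases above, Basic Auth and data URLs, are short enough to sketch directly. The helper names `basicAuthHeader` and `toDataUrl` are made up for this example, and the credentials are the sample pair commonly used in HTTP documentation. The sketch assumes ASCII-only credentials.

```javascript
// Hypothetical helper: builds the value of an HTTP Basic Auth header.
// Assumes ASCII-only credentials; btoa() rejects characters above U+00FF.
function basicAuthHeader(username, password) {
  return 'Basic ' + btoa(`${username}:${password}`);
}

// Hypothetical helper: wraps already-encoded Base64 in a data URL.
function toDataUrl(mimeType, base64) {
  return `data:${mimeType};base64,${base64}`;
}

const header = basicAuthHeader('Aladdin', 'open sesame');
console.log(header); // "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="

// Anyone who sees the header can read the credentials straight back:
console.log(atob(header.slice('Basic '.length))); // "Aladdin:open sesame"

const svg = '<svg xmlns="http://www.w3.org/2000/svg"/>';
console.log(toDataUrl('image/svg+xml', btoa(svg)));
```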

Padding — why some Base64 ends in = and some doesn't

The = at the end of Base64 strings is padding. It exists because Base64 works in groups of 3 input bytes, and not every input is a multiple of 3 bytes long. When the input is short by 1 or 2 bytes, the encoder pads the output with = characters so the result is always a multiple of 4 characters.

Input length (mod 3)   Padding   Example
─────────────────────────────────────────
0 (multiple of 3)      none      "Man"  → "TWFu"
1                      ==        "M"    → "TQ=="
2                      =         "Ma"   → "TWE="
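
The arithmetic behind the table can be checked with two one-liners. `paddingCount` and `encodedLength` are hypothetical helper names, and the sketch assumes the ASCII inputs from the table so that `btoa()` can encode them directly.

```javascript
// Hypothetical helpers for the arithmetic in the table above.
const paddingCount = (byteLength) => (3 - (byteLength % 3)) % 3;
const encodedLength = (byteLength) => 4 * Math.ceil(byteLength / 3);

for (const text of ['Man', 'M', 'Ma']) {
  const encoded = btoa(text);
  console.log(text, encoded, paddingCount(text.length), encodedLength(text.length));
}
// Man TWFu 0 4
// M TQ== 2 4
// Ma TWE= 1 4

// The same formula gives the size overhead for larger inputs:
console.log(encodedLength(10_000_000)); // 13333336
```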

Some encoders and protocols strip the padding to save bytes. JWTs do this — the Base64URL encoding inside a JWT has no padding at all. If you're manually decoding Base64 from a JWT, you may need to add the padding back before some decoders will accept it.

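// JavaScript: add padding back to an unpadded Base64 string
function pad(b64) {
  const remainder = b64.length % 4;
  // A remainder of 1 can never occur in valid Base64, so fail loudly
  if (remainder === 1) throw new Error('Invalid Base64 length');
  return remainder ? b64 + '='.repeat(4 - remainder) : b64;
}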

Base64 vs Base64URL — the variant that matters for the web

Standard Base64 uses + and / as its 62nd and 63rd characters. Both have special meaning in URLs: + means "space" in query strings, and / is a path separator. Putting standard Base64 in a URL without further encoding breaks things.

Base64URL (defined in RFC 4648) solves this by swapping those characters: + becomes -, and / becomes _. It also typically omits padding. The result is safe to drop directly into URLs, filenames, and HTTP headers.

Standard Base64:   "abc/d+8="
Base64URL:         "abc_d-8"

// Convert one to the other
const toUrlSafe = (b64) => b64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
const fromUrlSafe = (b64u) => {
  const b64 = b64u.replace(/-/g, '+').replace(/_/g, '/');
  return pad(b64); // add padding back
};

JWTs use Base64URL. So do most modern token formats, OAuth state parameters, and anything else that travels in a URL.
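
As an illustration, here is a rough sketch of reading the header and payload of a JWT. The token is built inside the example and its signature is a placeholder; `toBase64Url` and `decodeJwtSegment` are hypothetical helper names. The sketch only decodes and does not verify anything, so the result must never be trusted for authorization.

```javascript
// Hypothetical helper: Base64URL-encodes a UTF-8 string without padding.
function toBase64Url(text) {
  const bytes = new TextEncoder().encode(text);
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

// Hypothetical helper: decodes one JWT segment (header or payload) to an object.
// It does NOT verify the signature.
function decodeJwtSegment(segment) {
  const b64 = segment.replace(/-/g, '+').replace(/_/g, '/');
  const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Build a made-up, unsigned token purely for demonstration.
const token = [
  toBase64Url(JSON.stringify({ alg: 'HS256', typ: 'JWT' })),
  toBase64Url(JSON.stringify({ sub: '1234567890', name: 'Zoë' })),
  'not-a-real-signature',
].join('.');

const [headerPart, payloadPart] = token.split('.');
console.log(decodeJwtSegment(headerPart));  // { alg: 'HS256', typ: 'JWT' }
console.log(decodeJwtSegment(payloadPart)); // { sub: '1234567890', name: 'Zoë' }
```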

Encoding and decoding in JavaScript

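JavaScript has two built-in functions: btoa() (binary-to-ASCII, encode) and atob() (ASCII-to-binary, decode). The names are confusing, and both work on "binary strings" in which each character stands for one byte (code points 0 to 255), so they don't behave the way you'd expect for Unicode text or arbitrary binary data.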

// Simple ASCII strings — these work
btoa("Hello, world!");
// → "SGVsbG8sIHdvcmxkIQ=="

atob("SGVsbG8sIHdvcmxkIQ==");
// → "Hello, world!"

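// Characters above U+00FF: this BREAKS
btoa("日本語");
// → InvalidCharacterError (a DOMException)

// Subtler: Latin-1 characters don't throw, but are NOT encoded as UTF-8
btoa("héllo");
// → "aOlsbG8=" (é as the single byte 0xE9, not the UTF-8 bytes 0xC3 0xA9)

// Legacy way for Unicode: encode to UTF-8 first (escape/unescape are deprecated)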
function utf8ToBase64(str) {
  return btoa(unescape(encodeURIComponent(str)));
}
function base64ToUtf8(b64) {
  return decodeURIComponent(escape(atob(b64)));
}

utf8ToBase64("héllo");   // → "aMOpbGxv"
base64ToUtf8("aMOpbGxv"); // → "héllo"

// Modern alternative (Node 16+, modern browsers)
const bytes = new TextEncoder().encode("héllo");
const b64 = btoa(String.fromCharCode(...bytes));
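
The modern approach above only shows the encode direction, and spreading a very large array into String.fromCharCode can exceed an engine's argument limit. A possible round-trip version that works in chunks is sketched below; the helper names and the chunk size are illustrative choices, not a standard API.

```javascript
// Hypothetical helpers; the names are illustrative, not a standard API.
function bytesToBase64(bytes) {
  const CHUNK = 0x8000; // keep each call well below argument-count limits
  let binary = '';
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

function base64ToBytes(b64) {
  // atob() returns one character per byte, so map each back to its code.
  return Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
}

const encodeText = (text) => bytesToBase64(new TextEncoder().encode(text));
const decodeText = (b64) => new TextDecoder().decode(base64ToBytes(b64));

console.log(encodeText('héllo'));    // "aMOpbGxv"
console.log(decodeText('aMOpbGxv')); // "héllo"
console.log(decodeText(encodeText('日本語 ✓'))); // "日本語 ✓"
```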

For binary data like file uploads, work with ArrayBuffer or Uint8Array:

// Convert a file to Base64 (browser)
async function fileToBase64(file) {
  const buffer = await file.arrayBuffer();
  const bytes = new Uint8Array(buffer);
  let binary = '';
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// FileReader alternative — gives you a data URL
function fileToDataUrl(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error); // surface read failures
    reader.readAsDataURL(file);
  });
}
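
On the server side, Node.js offers another route through its Buffer class, which sidesteps the binary-string quirks of btoa() and atob(). The snippet below is a small sketch that runs only in Node (Buffer is not a browser API), and it assumes a Node release recent enough to recognize the 'base64url' encoding name.

```javascript
// Node.js only: Buffer is a Node global, not a browser API.
const text = 'héllo';

const encoded = Buffer.from(text, 'utf8').toString('base64');
console.log(encoded); // "aMOpbGxv"

const decoded = Buffer.from(encoded, 'base64').toString('utf8');
console.log(decoded); // "héllo"

// Three bytes chosen so the output contains both '+' and '/'.
// 'base64url' assumes a Node version that supports that encoding name.
const raw = Buffer.from([0xfb, 0xff, 0xfe]);
console.log(raw.toString('base64'));    // "+//+"
console.log(raw.toString('base64url')); // "-__-"
```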

Encoding and decoding in Python

import base64

# Encode a string
encoded = base64.b64encode(b"Hello, world!")
# → b"SGVsbG8sIHdvcmxkIQ=="

# Decode
decoded = base64.b64decode(b"SGVsbG8sIHdvcmxkIQ==")
# → b"Hello, world!"

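# URL-safe variant (for JWTs, URLs, filenames): changes the OUTPUT alphabet
base64.b64encode(b"\xfb\xff\xfe")
# → b"+//+"
url_safe = base64.urlsafe_b64encode(b"\xfb\xff\xfe")
# → b"-__-"
# Padding is kept; strip it yourself where needed: url_safe.rstrip(b"=")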

# Encode a file
with open("photo.jpg", "rb") as f:
    encoded_file = base64.b64encode(f.read()).decode("ascii")

Encoding and decoding on the command line

# macOS and Linux — encode
echo -n "Hello, world!" | base64
# → SGVsbG8sIHdvcmxkIQ==

# Decode (older macOS releases use -D; --decode works on both GNU and macOS)
echo "SGVsbG8sIHdvcmxkIQ==" | base64 -d
# → Hello, world!

# Encode a file
base64 photo.jpg > photo.txt

# Decode a file
base64 -d photo.txt > photo.jpg

# Watch out: GNU base64 (Linux) wraps output at 76 characters by default
# Use -w 0 to disable wrapping on GNU; BSD/macOS base64 does not wrap by default
echo -n "long content..." | base64 -w 0

Encoding and decoding in SQL

-- PostgreSQL
SELECT encode('Hello, world!'::bytea, 'base64');
-- → SGVsbG8sIHdvcmxkIQ==

SELECT convert_from(decode('SGVsbG8sIHdvcmxkIQ==', 'base64'), 'UTF8');
-- → Hello, world!

-- MySQL 5.6+
SELECT TO_BASE64('Hello, world!');
-- → SGVsbG8sIHdvcmxkIQ==

SELECT FROM_BASE64('SGVsbG8sIHdvcmxkIQ==');
-- → Hello, world!

Common bugs and how to avoid them

The line-wrap trap. Some implementations (notably MIME and OpenSSL) wrap Base64 output at 64 or 76 characters with newlines. Other implementations reject input that contains newlines. If you're seeing "invalid character" errors decoding what looks like valid Base64, strip whitespace first.

The padding mismatch. JWTs and URL-safe Base64 typically omit padding. Many decoders require it. If decoding fails, calculate how many = characters to add: (4 - (length % 4)) % 4 of them.

The UTF-8 assumption. Base64 encodes bytes, not characters. Encoding a string assumes you know what character encoding it's in. Always make the encoding explicit (UTF-8 is almost always the right answer): convert the string to bytes before encoding, and after decoding, turn the bytes back into a string using that same encoding.

The size surprise. Base64 increases payload size by 33%. For small assets it doesn't matter. For a 10 MB file embedded in JSON, it does — you're sending 13.3 MB over the wire. For larger files, prefer multipart uploads.

Treating it as a secret. Worth saying twice: Base64 is not encryption. Don't store passwords, API keys, or other sensitive data as Base64 expecting it to be hidden. If you can see the encoded string, you can see the original.
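
The line-wrap trap and the padding mismatch can both be absorbed by normalizing input before decoding. The function below is one possible sketch of such a tolerant decoder; `decodeBase64Lenient` is a hypothetical name, and whether silently accepting both alphabets is appropriate depends on your application.

```javascript
// Hypothetical helper: accepts standard or URL-safe Base64, with or without
// padding or line breaks, and returns the decoded bytes.
function decodeBase64Lenient(input) {
  const cleaned = input
    .replace(/\s+/g, '')  // line-wrap trap: drop newlines and spaces
    .replace(/-/g, '+')   // URL-safe alphabet back to the standard one
    .replace(/_/g, '/')
    .replace(/=+$/, '');  // strip any existing padding, then re-add below

  if (cleaned.length % 4 === 1) {
    throw new Error('Invalid Base64: impossible length');
  }
  const padded = cleaned + '='.repeat((4 - (cleaned.length % 4)) % 4);
  return Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
}

// Wrapped across lines and missing its "==" padding:
const wrapped = 'SGVsbG8s\nIHdvcmxk\nIQ';
console.log(new TextDecoder().decode(decodeBase64Lenient(wrapped)));
// "Hello, world!"
```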

I'm William, the developer behind DevCrate. The Base64 tool exists because I got tired of dropping pseudo-random strings into shady online converters to see what was inside. It encodes and decodes both standard and URL-safe variants, handles files, and never sends a single byte off your machine.

If there's a Base64 case this guide didn't cover, drop it in the comments — I read every one.
