ToolDeck for ToolDeck

Posted on Mar 25 • Originally published at tooldeck.top

Base64 Decode in Python — Complete Guide (b64decode, urlsafe, padding fix)

#python #base64 #tutorial #programming

When an API returns a content field that looks like eyJob3N0IjogImRiLXByb2Qi…, or your secrets manager hands you an encoded credential, or you need to extract a JWT payload — Python base64 decode is your first stop. The built-in base64 module handles all of it, but the small details around bytes vs strings, URL-safe alphabets, and missing padding catch almost every developer at least once.

This guide covers base64.b64decode(), urlsafe_b64decode(), automatic padding repair, decoding from files and HTTP responses, CLI tools, input validation, and four common mistakes with before/after fixes, all as runnable examples for Python 3.10+ (several snippets use the X | Y type-hint syntax introduced in 3.10). If you just need a quick one-off decode without writing code, ToolDeck's Base64 Decoder handles both standard and URL-safe Base64 instantly in your browser.

Key takeaways:

base64.b64decode(s) is built into Python stdlib — no install required; always returns bytes, not str
Chain .decode("utf-8") after b64decode() to convert bytes to a Python string
For URL-safe Base64 (uses - and _), use base64.urlsafe_b64decode() — standard in JWTs, OAuth tokens, Google API credentials
Fix "Incorrect padding" with: padded = s + "=" * (-len(s) % 4)
Set validate=True on external input to raise binascii.Error on non-Base64 characters

What is Base64 Decoding?

Base64 is an encoding scheme that represents arbitrary binary data as text drawn from an alphabet of 64 printable ASCII characters: A–Z, a–z, 0–9, +, and /, with = used as padding. Every 4 Base64 characters encode exactly 3 original bytes, so the encoded form is roughly 33% larger than the source. Decoding reverses the process, transforming the ASCII representation back into the original bytes.

Base64 does not encrypt data. It is purely a binary-to-text encoding:

# Before — Base64 encoded
eyJob3N0IjogImRiLXByb2QubXljb21wYW55LmludGVybmFsIiwgInBvcnQiOiA1NDMyLCAidXNlciI6ICJhcHBfc3ZjIn0=

# After — decoded
{"host": "db-prod.mycompany.internal", "port": 5432, "user": "app_svc"}
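To make the 3-bytes-to-4-characters rule concrete, here is a small round-trip check using only the standard library. The helper name encoded_length is hypothetical (it is not part of the base64 module); it computes the padded length 4 × ceil(n / 3).

```python
import base64

def encoded_length(n_bytes: int) -> int:
    """Padded Base64 length for n_bytes of input (hypothetical helper)."""
    # Each 3-byte group, including a final partial group, becomes 4 characters
    return 4 * ((n_bytes + 2) // 3)

for raw in (b"app", b"app_", b"app_svc"):
    encoded = base64.b64encode(raw)
    assert len(encoded) == encoded_length(len(raw))
    assert base64.b64decode(encoded) == raw  # decoding restores the exact bytes
    print(len(raw), "bytes ->", encoded)
```

No key is involved at any step, which is why anyone holding the encoded string can recover the original.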

base64.b64decode() — Standard Library Decoding

Python's base64 module ships with the standard library — zero installation, always available. The primary function is base64.b64decode(s, altchars=None, validate=False). It accepts a str, bytes, or bytearray, and always returns bytes.

Minimal working example

import base64
import json

# Encoded database config received from a secrets manager
encoded_config = (
    "eyJob3N0IjogImRiLXByb2QubXljb21wYW55LmludGVybmFsIiwgInBvcnQiOiA1NDMyLCAid"
    "XNlciI6ICJhcHBfc3ZjIiwgInBhc3N3b3JkIjogInM0ZmVQYXNzITIwMjYifQ=="
)

# Step 1: decode Base64 bytes
raw_bytes = base64.b64decode(encoded_config)
# b'{"host": "db-prod.mycompany.internal", "port": 5432, ...}'

# Step 2: convert bytes → str
config_str = raw_bytes.decode("utf-8")

# Step 3: parse into a dict
config = json.loads(config_str)
print(config["host"])    # db-prod.mycompany.internal
print(config["port"])    # 5432

Note: b64decode() always returns bytes — never a string. If the original data was text, chain .decode("utf-8"). If it was binary (image, PDF, gzip), keep the bytes as-is.

Extended example: strict validation

import base64
import binascii

encoded_event = (
    "eyJldmVudCI6ICJvcmRlci5zaGlwcGVkIiwgIm9yZGVyX2lkIjogIk9SRC04ODQ3MiIsICJ"
    "0aW1lc3RhbXAiOiAiMjAyNi0wMy0xM1QxNDozMDowMFoiLCAicmVnaW9uIjogImV1LXdlc3QtMSJ9"
)

try:
    # validate=True raises binascii.Error on any non-Base64 character
    raw = base64.b64decode(encoded_event, validate=True)
    event = raw.decode("utf-8")
    print(event)
    # {"event": "order.shipped", "order_id": "ORD-88472", ...}

except binascii.Error as exc:
    print(f"Invalid Base64: {exc}")
except UnicodeDecodeError as exc:
    print(f"Not UTF-8 text: {exc}")

Decoding URL-safe Base64 (base64url)

Standard Base64 uses + and /, which are reserved characters in URLs. The URL-safe variant (RFC 4648 §5, also called "base64url") replaces them with - and _. This is the encoding used in JWT tokens, OAuth 2.0 PKCE challenges, Google Cloud credentials, and most modern web authentication flows.

Use base64.urlsafe_b64decode() — it handles - → + and _ → / substitution automatically.
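One way to see the substitution is to pick bytes whose encoding lands on the two alphabet positions that differ. The three bytes below are chosen purely for illustration: they encode to +/+/ in the standard alphabet and -_-_ in the URL-safe one.

```python
import base64

raw = b"\xfb\xff\xbf"  # 6-bit groups 62, 63, 62, 63: the two symbols that differ

standard = base64.b64encode(raw)          # b'+/+/'
url_safe = base64.urlsafe_b64encode(raw)  # b'-_-_'

# urlsafe_b64decode() maps '-' -> '+' and '_' -> '/' before decoding
assert base64.urlsafe_b64decode(url_safe) == raw

# The same translation done by hand, followed by a standard decode
manual = url_safe.translate(bytes.maketrans(b"-_", b"+/"))
assert manual == standard
assert base64.b64decode(manual) == raw
```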

import base64
import json

# JWT payload segment (the middle part between the two dots)
jwt_payload_b64 = (
    "eyJ1c2VyX2lkIjogMjg5MywgInJvbGUiOiAiYWRtaW4iLCAiaXNzIjogImF1dGgubXljb21w"
    "YW55LmNvbSIsICJleHAiOiAxNzQwOTAwMDAwLCAianRpIjogImFiYzEyMzQ1LXh5ei05ODc2In0"
)

# Restore padding before decoding (JWT deliberately omits '=')
padded = jwt_payload_b64 + "=" * (-len(jwt_payload_b64) % 4)

payload_bytes = base64.urlsafe_b64decode(padded)
payload = json.loads(payload_bytes.decode("utf-8"))

print(payload["role"])    # admin
print(payload["iss"])     # auth.mycompany.com
print(payload["user_id"]) # 2893

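The expression "=" * (-len(s) % 4) appends the number of = characters needed to reach a multiple of 4: 0, 1, or 2 for any valid unpadded input, and nothing when the string is already correctly padded. A length that leaves a remainder of 1 (which would ask for 3) cannot come from valid Base64, so the decoder still raises binascii.Error in that case. It is the idiomatic Python fix for JWT and OAuth padding issues.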
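The possible cases are few enough to check directly. This sketch pads four inputs whose lengths leave remainders 0, 3, 2, and 1 modulo 4; pad_b64 is a hypothetical helper name.

```python
import base64
import binascii

def pad_b64(s: str) -> str:
    """Append '=' until the length is a multiple of 4 (hypothetical helper)."""
    return s + "=" * (-len(s) % 4)

for unpadded in ("YWJj", "YWI", "YQ", "Y"):
    padded = pad_b64(unpadded)
    try:
        decoded = base64.urlsafe_b64decode(padded)
        print(f"{unpadded!r} -> {padded!r} -> {decoded!r}")
    except binascii.Error as exc:
        # len % 4 == 1 asks for 3 '=' characters, which no valid input needs
        print(f"{unpadded!r} -> {padded!r} -> binascii.Error: {exc}")
```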

base64.b64decode() Parameters Reference

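Parameter	Type	Default	Description
`s`	`bytes`, `str`, or `bytearray`	(required)	The Base64 data to decode; a `str` must contain only ASCII characters
`altchars`	`bytes` or `None`	`None`	A 2-character alternative alphabet that replaces `+` and `/` before decoding
`validate`	`bool`	`False`	When `True`, raises `binascii.Error` on any character outside the Base64 alphabet; when `False`, such characters (including whitespace and newlines) are silently discarded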

The validate=False default makes the decoder tolerant of line-wrapped input such as PEM bodies and MIME data, but it also means stray non-Base64 characters are dropped without warning. For API payloads or any untrusted input, pass validate=True.
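The difference between the two validate modes shows up as soon as the input contains anything outside the alphabet. The sample string here is made up for illustration, and altchars is shown with the URL-safe pair, which is the same substitution urlsafe_b64decode() performs.

```python
import base64
import binascii

noisy = "SGVs bG8h\n"  # b"Hello!" with a stray space and a trailing newline

# validate=False (default): non-alphabet characters are discarded
assert base64.b64decode(noisy) == b"Hello!"

# validate=True: the same input is rejected
try:
    base64.b64decode(noisy, validate=True)
except binascii.Error as exc:
    print("rejected:", exc)

# altchars: treat '-' and '_' as the 62nd and 63rd alphabet symbols
assert base64.b64decode("Pz8_", altchars="-_") == b"???"
```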

Python Base64 Decode Padding Error — How to Fix It

The most frequent error when decoding Base64 in Python:

import base64
base64.b64decode("eyJ0eXBlIjogInJlZnJlc2gifQ")
# binascii.Error: Incorrect padding

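Correctly padded Base64 always has a length that is a multiple of 4. JWTs, and many tokens embedded in URLs, omit the trailing = padding to save space, so the decoder finds an incomplete final block and raises this error.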

Option 1: Restore padding inline (recommended)

import base64, json

def b64decode_unpadded(data: str | bytes) -> bytes:
    """Decode Base64 with automatic padding correction."""
    if isinstance(data, str):
        data = data.encode("ascii")
    data += b"=" * (-len(data) % 4)
    return base64.b64decode(data)

token_a = "eyJ0eXBlIjogImFjY2VzcyJ9"       # 24 chars: no padding needed
token_b = "eyJ0eXBlIjogInJlZnJlc2gifQ"     # 26 chars: '==' was stripped
token_c = "eyJ0eXBlIjogImFwaV9rZXkifQ=="   # already padded

for token in (token_a, token_b, token_c):
    result = json.loads(b64decode_unpadded(token).decode("utf-8"))
    print(result["type"])
# access
# refresh
# api_key

Option 2: URL-safe decode for JWT / OAuth

import base64, json

def decode_jwt_segment(segment: str) -> dict:
    """Decode a single JWT segment (header or payload)."""
    padded = segment + "=" * (-len(segment) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return json.loads(raw.decode("utf-8"))

id_token_payload = (
    "eyJzdWIiOiAiMTEwNTY5NDkxMjM0NTY3ODkwMTIiLCAiZW1haWwiOiAic2FyYS5jaGVuQGV4"
    "YW1wbGUuY29tIiwgImhkIjogImV4YW1wbGUuY29tIiwgImlhdCI6IDE3NDA5MDAwMDB9"
)

claims = decode_jwt_segment(id_token_payload)
print(claims["email"])   # sara.chen@example.com
print(claims["hd"])      # example.com
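decode_jwt_segment() handles one segment at a time. A compact JWT is three base64url segments joined by dots, so inspecting a whole token is a split plus two decodes. This sketch builds a throwaway token with a fake signature so it runs offline; b64url_json, decode_segment, and inspect_jwt are hypothetical names, and nothing here verifies the signature.

```python
import base64
import json

def b64url_json(obj: dict) -> str:
    """Demo helper: JSON -> unpadded base64url, the way JWT segments are built."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_segment(segment: str) -> dict:
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def inspect_jwt(token: str) -> tuple[dict, dict]:
    """Return (header, claims). Does NOT verify the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return decode_segment(header_b64), decode_segment(payload_b64)

token = ".".join([
    b64url_json({"alg": "HS256", "typ": "JWT"}),
    b64url_json({"sub": "2893", "role": "admin"}),
    "fake-signature",  # placeholder; a real token carries a base64url signature here
])

header, claims = inspect_jwt(token)
print(header["alg"], claims["role"])  # HS256 admin
```

Because anyone can decode these segments, treat the claims as untrusted until a JWT library has checked the signature.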

Decode Base64 from a File and API Response

Reading and decoding a Base64 file

import base64, binascii, json
from pathlib import Path

def decode_attachment(envelope_path: str, output_path: str) -> None:
    """
    Read a JSON envelope with a Base64-encoded attachment,
    decode it, and write the binary output to disk.
    """
    try:
        envelope = json.loads(Path(envelope_path).read_text(encoding="utf-8"))
        encoded_data = envelope["attachment"]["data"]
        file_bytes = base64.b64decode(encoded_data, validate=True)
        Path(output_path).write_bytes(file_bytes)
        print(f"Saved {len(file_bytes):,} bytes → {output_path}")
    except FileNotFoundError:
        print(f"Envelope file not found: {envelope_path}")
    except json.JSONDecodeError as exc:
        print(f"Envelope is not valid JSON: {exc}")
    except (KeyError, TypeError):
        print("Unexpected envelope structure: 'attachment.data' missing")
    except binascii.Error as exc:
        print(f"Invalid Base64 content: {exc}")

# {"attachment": {"filename": "invoice_2026_03.pdf", "data": "JVBERi0xLjQK..."}}
decode_attachment("order_ORD-88472.json", "invoice_2026_03.pdf")

Decoding Base64 from an HTTP API response

import base64, binascii, json
import urllib.error, urllib.request

def fetch_and_decode_secret(vault_url: str, secret_name: str) -> str:
    url = f"{vault_url}/v1/secrets/{secret_name}"
    req = urllib.request.Request(url, headers={"X-Vault-Token": "s.internal"})

    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            # Response shape assumed for this example (check your secrets API):
            # {"data": {"value": "<base64>", "encoding": "base64"}}
            encoded = body["data"]["value"]
            return base64.b64decode(encoded, validate=True).decode("utf-8")

    except urllib.error.URLError as exc:
        raise RuntimeError(f"Vault unreachable: {exc}") from exc
    except (KeyError, json.JSONDecodeError, UnicodeDecodeError, binascii.Error) as exc:
        raise ValueError(f"Unexpected secret format: {exc}") from exc

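If you use requests instead, replace the Request/urlopen pair with resp = requests.get(url, headers={"X-Vault-Token": "s.internal"}, timeout=5), call resp.raise_for_status(), then body = resp.json(), and catch requests.RequestException in place of urllib.error.URLError. The Base64 decoding logic is identical.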

Command-Line Base64 Decoding

# Decode a Base64 string (Linux / macOS)
echo "eyJob3N0IjogImRiLXByb2QubXljb21wYW55LmludGVybmFsIn0=" | base64 --decode
# {"host": "db-prod.mycompany.internal"}

# Decode a file, save output
base64 --decode encoded_payload.txt > decoded_output.json

# Python's built-in CLI decoder (cross-platform; on Windows the command is usually py or python rather than python3)
python3 -m base64 -d encoded_payload.txt

# Decode a JWT payload segment inline
echo "eyJ1c2VyX2lkIjogMjg5MywgInJvbGUiOiAiYWRtaW4ifQ" | python3 -c "
import sys, base64, json
s = sys.stdin.read().strip()
padded = s + '=' * (-len(s) % 4)
print(json.dumps(json.loads(base64.urlsafe_b64decode(padded)), indent=2))
"

For exploratory work where writing a shell pipeline feels like overkill, paste the string into ToolDeck's Base64 Decoder — it auto-detects URL-safe input and fixes padding on the fly.

Validating Base64 Input Before Decoding

import base64, binascii, re

# ── Option A: try/except (recommended) ──────────────────────────────────────

def safe_b64decode(data: str) -> bytes | None:
    """Return decoded bytes, or None if the input is not valid Base64."""
    try:
        padded = data + "=" * (-len(data) % 4)
        return base64.b64decode(padded, validate=True)
    except (binascii.Error, ValueError):
        return None

print(safe_b64decode("not-base64!!"))                     # None
print(safe_b64decode("eyJ0eXBlIjogInJlZnJlc2gifQ"))      # b'{"type": "refresh"}'


# ── Option B: regex pre-validation ──────────────────────────────────────────

_STANDARD_RE = re.compile(r"[A-Za-z0-9+/]*={0,2}")

def is_valid_base64(s: str) -> bool:
    """Alphabet and length check; accepts input with or without '=' padding."""
    padded = s + "=" * (-len(s) % 4)
    return _STANDARD_RE.fullmatch(padded) is not None

print(is_valid_base64("SGVsbG8gV29ybGQ="))   # True
print(is_valid_base64("SGVsbG8gV29ybGQ!"))   # False

High-Performance Alternative: pybase64

For high-throughput pipelines processing thousands of payloads per second, pybase64 is a C-extension wrapper around libbase64 that is typically 2–5× faster than stdlib on large inputs.

pip install pybase64

import pybase64

# Same call signature as base64.b64decode()
encoded_image = "iVBORw0KGgo="  # stand-in data: the 8-byte PNG file signature
image_bytes = pybase64.b64decode(encoded_image, validate=False)

# URL-safe variant
token_bytes = pybase64.urlsafe_b64decode("eyJpZCI6IDQ3MX0=")
print(token_bytes)  # b'{"id": 471}'

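pybase64 mirrors the call signatures of the stdlib's b64* functions (b64decode(), urlsafe_b64decode(), and related calls), so switching those calls is mostly a change of import; it focuses on Base64, so keep using the stdlib for Base32 or Base16. Use it only when profiling confirms Base64 is actually a bottleneck.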
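If a script or library should use pybase64 when it is installed but still run without it, a common pattern is an optional import with a stdlib fallback. This is a sketch: the alias b64 is an arbitrary choice, and the code only calls a function that both modules provide.

```python
try:
    import pybase64 as b64  # optional accelerated backend (pip install pybase64)
except ImportError:
    import base64 as b64    # stdlib fallback with the same call signature

token_bytes = b64.urlsafe_b64decode("eyJpZCI6IDQ3MX0=")
print(token_bytes)  # b'{"id": 471}'
```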

Common Mistakes

Mistake 1: Forgetting to call .decode() on the result

import base64, json

# ❌ b64decode() returns bytes, so this crashes downstream
raw = base64.b64decode("eyJ1c2VyX2lkIjogNDcxLCAicm9sZSI6ICJhZG1pbiJ9")
user_id = raw["user_id"]  # TypeError: byte indices must be integers or slices, not str

# ✅ decode bytes → str, then parse
raw = base64.b64decode("eyJ1c2VyX2lkIjogNDcxLCAicm9sZSI6ICJhZG1pbiJ9")
payload = json.loads(raw.decode("utf-8"))
print(payload["user_id"])  # 471

Mistake 2: Using b64decode() on URL-safe Base64 input

import base64, json

jwt_segment = "eyJhIjoiPz8_In0"  # base64url, unpadded; standard Base64 would have '/' where this has '_'
padded = jwt_segment + "=" * (-len(jwt_segment) % 4)

# ❌ b64decode() does not know '-' and '_' (with validate=False it drops them)
base64.b64decode(padded)  # binascii.Error or silently wrong bytes

# ✅ use urlsafe_b64decode() for any token with '-' or '_'
data = base64.urlsafe_b64decode(padded)
print(json.loads(data.decode("utf-8")))  # {'a': '???'}

Mistake 3: Not fixing padding on stripped tokens

import base64, json

# ❌ JWTs strip '=', so this crashes
segment = "eyJ0eXBlIjogImFjY2VzcyIsICJqdGkiOiAiMzgxIn0"
base64.urlsafe_b64decode(segment)  # binascii.Error: Incorrect padding

# ✅ always add padding before urlsafe_b64decode()
padded = segment + "=" * (-len(segment) % 4)
result = json.loads(base64.urlsafe_b64decode(padded).decode("utf-8"))
print(result["type"])  # access

Mistake 4: Calling .decode("utf-8") on binary data

import base64
from pathlib import Path

# ❌ PDFs, PNGs, ZIPs are not UTF-8, so this crashes
pdf_b64 = "JVBERi0xLjQKJeLjz9MKNyAwIG9iago8PC9U..."
pdf_text = base64.b64decode(pdf_b64).decode("utf-8")  # UnicodeDecodeError

# ✅ write binary directly to a file — no .decode() needed
pdf_bytes = base64.b64decode(pdf_b64)
Path("report_q1_2026.pdf").write_bytes(pdf_bytes)

Decoding Large Base64 Files

For files larger than ~50–100 MB, use a chunked approach to avoid loading everything into memory at once:

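import base64

def decode_large_b64_file(input_path: str, output_path: str, chunk_size: int = 65536) -> None:
    """Stream-decode a Base64 file without loading it all into memory.

    Whitespace and line breaks are removed from each chunk, and only whole
    4-character blocks are decoded; leftover characters carry over to the
    next chunk, so line-wrapped input keeps its block boundaries aligned.
    """
    carry = b""
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            data = carry + b"".join(chunk.split())   # drop spaces and newlines
            cut = len(data) - len(data) % 4           # largest multiple of 4
            dst.write(base64.b64decode(data[:cut]))
            carry = data[cut:]
        if carry:
            # Final partial block: restore any stripped '=' padding
            dst.write(base64.b64decode(carry + b"=" * (-len(carry) % 4)))

decode_large_b64_file("snapshot_2026_03.b64", "snapshot_2026_03.sql.gz")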

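For PEM certificates and MIME attachments with line wrapping, base64.decodebytes() (bytes input only) skips newlines and other non-alphabet characters, just as b64decode() does with the default validate=False. For PEM, strip the -----BEGIN and -----END armor lines first: their letters are valid Base64 characters and would otherwise be decoded as data.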
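A sketch of decoding PEM-style text with decodebytes(): the BEGIN/END armor lines are removed first, because their letters are valid Base64 characters. The DEMO label and its contents are made up so the example checks itself. The legacy base64.decode(input, output) function does the same job between binary file objects; it decodes line by line, so each line should hold a whole number of 4-character blocks, as standard 64- or 76-column wrapping does.

```python
import base64
import io

pem = """-----BEGIN DEMO-----
SGVsbG8sIFBF
TSBib2R5IQ==
-----END DEMO-----
"""

# Keep only the wrapped Base64 body; the BEGIN/END lines must not be decoded
body = "\n".join(
    line for line in pem.splitlines() if not line.startswith("-----")
).encode("ascii")

decoded = base64.decodebytes(body)  # the newline between lines is skipped
print(decoded)  # b'Hello, PEM body!'

# Streaming equivalent between binary file objects
out = io.BytesIO()
base64.decode(io.BytesIO(body), out)
assert out.getvalue() == decoded
```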

Python Base64 Decoding Methods — Quick Comparison

Method	Alphabet	Padding	Best For
`base64.b64decode()`	Standard (+/)	Required	General-purpose, email, PEM
`base64.decodebytes()`	Standard (+/)	Required (whitespace ignored)	PEM certs, MIME, multiline
`base64.urlsafe_b64decode()`	URL-safe (-_)	Required	JWT, OAuth, Google Cloud
`base64.b32decode()`	32-char (A–Z, 2–7)	Required	TOTP secrets, DNS-safe IDs
`base64.b16decode()`	Hex (0–9, A–F)	Not used	Hex checksums (uppercase only unless casefold=True)
`pybase64.b64decode()`	Standard (+/)	Required	High-throughput pipelines

Use b64decode() as your default. Switch to urlsafe_b64decode() the moment you see - or _ in the input — those characters are the unmistakable sign of URL-safe Base64. For one-off checks during development, this online Base64 decoder handles both alphabets and auto-repairs padding — no Python environment needed.
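Two rows in the comparison table have quirks worth a quick sketch. b32decode() insists on padding to a multiple of 8 characters and on uppercase input unless casefold=True, while TOTP-style secrets are often shared lowercase, grouped, and unpadded; b16decode() likewise rejects lowercase hex by default. b32decode_unpadded is a hypothetical helper, and JBSWY3DPEHPK3PXP is a widely used demo secret, not a real credential.

```python
import base64

def b32decode_unpadded(secret: str) -> bytes:
    """Decode Base32 that may lack '=' padding (hypothetical helper)."""
    secret = secret.replace(" ", "")                # secrets are often shown in groups
    padded = secret + "=" * (-len(secret) % 8)      # Base32 blocks are 8 characters
    return base64.b32decode(padded, casefold=True)  # accept lowercase letters

print(b32decode_unpadded("jbsw y3dp ehpk 3pxp"))  # b'Hello!\xde\xad\xbe\xef'
print(b32decode_unpadded("jbuq"))                 # b'Hi'

# Hex: lowercase input needs casefold=True
print(base64.b16decode("deadbeef", casefold=True))  # b'\xde\xad\xbe\xef'
```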

Frequently Asked Questions

How do I decode a Base64 string to a regular string in Python?

Call base64.b64decode(encoded) to get bytes, then call .decode("utf-8") on the result. The two steps are always separate because b64decode() only reverses the Base64 alphabet — it does not know whether the original content was UTF-8, Latin-1, or binary.
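A small illustration: the same decoded bytes can be valid in one text encoding and invalid in another. The sample text is encoded as Latin-1 here purely to simulate a producer that did not use UTF-8.

```python
import base64

encoded = base64.b64encode("Café".encode("latin-1"))  # simulate a non-UTF-8 producer
raw = base64.b64decode(encoded)                        # b'Caf\xe9'

try:
    print(raw.decode("utf-8"))
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)

print(raw.decode("latin-1"))  # Café
```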

Why do I get "Incorrect padding" when decoding Base64 in Python?

Correctly padded Base64 strings are a multiple of 4 characters long, but JWTs and many URL-embedded tokens omit the trailing = padding. Fix it with padded = s + "=" * (-len(s) % 4), which adds 0, 1, or 2 characters for any valid unpadded input.

What is the difference between b64decode() and urlsafe_b64decode()?

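Both decode the same Base64 scheme but with different alphabets: b64decode() uses + and /; urlsafe_b64decode() uses - and _. Feeding URL-safe input to b64decode() causes either a binascii.Error or silently corrupt output. The reverse is more forgiving in CPython, because urlsafe_b64decode() only translates - and _ into + and / before decoding, so any + or / already present pass through unchanged.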

How do I decode a Base64-encoded image in Python?

Decode to bytes with base64.b64decode(encoded), then write those bytes directly to a file — do not call .decode("utf-8") on image data. If the input is a data URL (data:image/png;base64,...), strip the prefix first with _, encoded = data_url.split(",", 1).
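A minimal sketch of that data-URL recipe, assuming the ;base64 form (percent-encoded data URLs without ;base64 are not handled). decode_data_url is a hypothetical name, and the tiny payload is just the 8-byte PNG file signature.

```python
import base64

def decode_data_url(data_url: str) -> tuple[str, bytes]:
    """Split 'data:<media type>;base64,<data>' into (media type, raw bytes)."""
    header, sep, encoded = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(encoded, validate=True)

media_type, image_bytes = decode_data_url("data:image/png;base64,iVBORw0KGgo=")
print(media_type)       # image/png
print(image_bytes[:4])  # b'\x89PNG'
# To save it: Path("image.png").write_bytes(image_bytes), with no .decode() step
```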

Can I decode Base64 in Python without importing any module?

You could hand-roll a decoder, but there is no reason to. The base64 module is part of Python's standard library: always available, no third-party dependencies, and its core work is done by the C-implemented binascii module.

DEV Community