ToolDeck for ToolDeck

Posted on Apr 2 • Originally published at tooldeck.top

Base64 Encode in Python — Complete Guide (b64encode, urlsafe, JWT, files)

#python #base64 #tutorial #programming

When you build Python services that pass credentials in HTTP Basic Auth headers, embed binary assets in API responses, or store TLS certificates in environment variables, you end up writing Base64 encoding code on a regular basis. Python ships the base64 module in the standard library, so no pip install is needed, but the bytes-vs-string distinction and the differences between b64encode, urlsafe_b64encode, and encodebytes trip up developers more often than you might expect. For a quick encode without writing any code, ToolDeck's Base64 Encoder handles it instantly in the browser. This guide covers the full stdlib API, URL-safe encoding for JWTs, file and API response scenarios, CLI shortcuts, a high-performance alternative, and the four mistakes that appear most often during code review.

Key takeaways:

base64.b64encode() expects bytes, not str — always call .encode("utf-8") on the input string before passing it in
The return value is also bytes — call .decode("utf-8") or .decode("ascii") to get a plain str you can embed in JSON or HTTP headers
base64.urlsafe_b64encode() replaces + → - and / → _ but keeps = padding; for JWT segments, strip it with .rstrip(b"=") on the bytes result (or .rstrip("=") after decoding to str)
base64.encodebytes() inserts a \n every 76 characters (MIME format) — never use it for data URIs, JSON fields, or environment variables
pybase64 (C extension, drop-in API) encodes 2–10× faster than stdlib; worth it for high-throughput services processing large payloads

What is Base64 Encoding?

Base64 converts arbitrary binary data into a string built from 64 printable ASCII characters: A–Z, a–z, 0–9, +, and /. Every 3 input bytes map to exactly 4 Base64 characters. If the input length is not a multiple of 3, one or two = padding characters are appended. The encoded output is always about 33% larger than the original.
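The 3-to-4 mapping and the padding rule can be checked directly. The sketch below is illustrative only; encoded_length is a hypothetical helper (not part of the base64 module) that applies the formula 4 × ceil(n / 3).

```python
import base64

def encoded_length(n_bytes: int) -> int:
    """Padded Base64 length for n_bytes of input: 4 * ceil(n_bytes / 3).

    Hypothetical helper for illustration; not part of the stdlib.
    """
    return 4 * ((n_bytes + 2) // 3)

for raw in (b"a", b"ab", b"abc", b"abcd"):
    encoded = base64.b64encode(raw)
    # The formula and the real encoder should agree on the output length
    assert len(encoded) == encoded_length(len(raw))
    print(len(raw), "byte(s) ->", encoded.decode("ascii"))

# 1 byte(s) -> YQ==
# 2 byte(s) -> YWI=
# 3 byte(s) -> YWJj
# 4 byte(s) -> YWJjZA==
```

Inputs of 1 or 2 bytes (modulo 3) end in == or = respectively, while exact multiples of 3 carry no padding at all.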

Base64 is not encryption — it provides no confidentiality whatsoever. Its purpose is transport safety: many protocols and storage systems were designed for 7-bit ASCII text and cannot safely carry arbitrary binary bytes. Base64 bridges that gap. Common Python use cases include HTTP Basic Auth headers, data URIs for inlining images in HTML or CSS, JWT token segments, email MIME attachments, and passing binary blobs through environment variables or JSON APIs.

# Before — raw credential pair
deploy-svc:sk-prod-9f2a1c3e8b4d

# After — Base64 encoded
ZGVwbG95LXN2Yzpzay1wcm9kLTlmMmExYzNlOGI0ZA==

base64.b64encode() — Standard Encoding Guide with Examples

base64.b64encode(s, altchars=None) is the primary encoding function in Python's stdlib. It lives in the base64 module, which ships with every Python installation. The function accepts a bytes object and returns a bytes object containing the ASCII Base64 representation. Python 3.6+ is assumed throughout this guide.

Minimal working example

import base64

# Encoding an API credential pair for an HTTP Basic Auth header
service_id = "deploy-svc"
api_key    = "sk-prod-9f2a1c3e8b4d"

credential_bytes   = f"{service_id}:{api_key}".encode("utf-8")
encoded_bytes      = base64.b64encode(credential_bytes)
encoded_str        = encoded_bytes.decode("ascii")  # bytes → str

print(encoded_str)
# ZGVwbG95LXN2Yzpzay1wcm9kLTlmMmExYzNlOGI0ZA==

import urllib.request

req = urllib.request.Request("https://api.internal/v1/deployments")
req.add_header("Authorization", f"Basic {encoded_str}")
# Header value: Basic ZGVwbG95LXN2Yzpzay1wcm9kLTlmMmExYzNlOGI0ZA==

Extended example — sort_keys, nested objects, round-trip decode

import base64
import json

# Encoding a structured server configuration for an env variable
server_config = {
    "host":           "db-primary.eu-west-1.internal",
    "port":           5432,
    "database":       "analytics_prod",
    "max_connections": 150,
    "ssl": {
        "mode":          "verify-full",
        "cert_path":     "/etc/ssl/certs/db-client.crt",
        "reject_self_signed": True,
    },
}

config_json    = json.dumps(server_config, sort_keys=True)
encoded_bytes  = base64.b64encode(config_json.encode("utf-8"))
encoded_str    = encoded_bytes.decode("ascii")

print(encoded_str[:60] + "...")
# eyJkYXRhYmFzZSI6ICJhbmFseXRpY3NfcHJvZCIsICJob3N0IjogImRiLXBy...

# Decode and round-trip
decoded_json   = base64.b64decode(encoded_str).decode("utf-8")
restored       = json.loads(decoded_json)

print(restored["host"])            # db-primary.eu-west-1.internal
print(restored["ssl"]["mode"])     # verify-full

Note: b64decode() is lenient by default — it silently ignores invalid characters including whitespace and newlines. Pass validate=True to raise a binascii.Error on any non-Base64 character. Use this when decoding untrusted input from external systems.
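The difference between the lenient default and validate=True is easiest to see side by side. This is a minimal sketch; strict_b64decode is a hypothetical wrapper name, and returning None on failure is just one possible design (re-raising may suit your service better).

```python
import base64
import binascii
from typing import Optional

def strict_b64decode(text: str) -> Optional[bytes]:
    """Return decoded bytes, or None if text contains non-Base64 characters.

    Hypothetical helper for illustration only.
    """
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        return None

clean = "ZGVwbG95LXN2Yw=="      # Base64 of b"deploy-svc"
dirty = "ZGVwbG95\nLXN2Yw=="    # same value with a stray newline in the middle

print(base64.b64decode(dirty))  # b'deploy-svc' (lenient default drops the newline)
print(strict_b64decode(clean))  # b'deploy-svc'
print(strict_b64decode(dirty))  # None (validate=True rejects the newline)
```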

Encoding Non-ASCII and Unicode Strings in Python

Python 3 strings are Unicode by default. The base64 module operates on bytes, not on str — so you must encode the string to bytes before passing it in. The choice of encoding matters: UTF-8 handles every Unicode code point and is the right default for almost all use cases.

import base64

# Encoding multilingual content — user display names from an international platform
user_names = [
    "Мария Соколова",      # Cyrillic — U+041C and above
    "田中太郎",              # CJK ideographs — 3 bytes each in UTF-8
    "Sarah Chen",           # ASCII — 1 byte per character
    "José Rodríguez",       # Latin extended — é is 2 bytes in UTF-8
]

for name in user_names:
    encoded = base64.b64encode(name.encode("utf-8")).decode("ascii")
    decoded = base64.b64decode(encoded).decode("utf-8")

    print(f"Original : {name}")
    print(f"Encoded  : {encoded}")
    print(f"Roundtrip: {decoded}")
    print(f"Match    : {name == decoded}")
    print()

# Original : Мария Соколова
# Encoded  : 0JzQsNGA0LjRjyDQodC+0LrQvtC70L7QstCw
# Roundtrip: Мария Соколова
# Match    : True
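To illustrate why the choice of text encoding matters, the short sketch below encodes the same string with two different codecs. The Latin-1 variant is shown only for contrast; it is an assumption about what a legacy system might use, not a recommendation.

```python
import base64

name = "Jos\u00e9"  # "José"; the escape keeps the literal unambiguous

utf8_encoded   = base64.b64encode(name.encode("utf-8")).decode("ascii")
latin1_encoded = base64.b64encode(name.encode("latin-1")).decode("ascii")

print(utf8_encoded)    # Sm9zw6k=   (é is two bytes in UTF-8)
print(latin1_encoded)  # Sm9z6Q==   (é is one byte in Latin-1)

# The decoder must use the same codec the encoder used
assert base64.b64decode(utf8_encoded).decode("utf-8") == name
```

The Base64 layer is identical in both cases; only the bytes fed into it differ, which is why both sides of an integration need to agree on UTF-8.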

base64 Module — Functions Reference

Function	Input	Returns	Description
`b64encode(s, altchars=None)`	bytes	bytes	Standard Base64 (RFC 4648 §4). altchars replaces the + and / characters with two custom bytes.
`b64decode(s, altchars=None, validate=False)`	bytes | str	bytes	Decodes standard Base64. With validate=True, any non-alphabet character raises binascii.Error.
`urlsafe_b64encode(s)`	bytes	bytes	URL-safe Base64 (RFC 4648 §5). Uses - and _ instead of + and /. Keeps = padding.
`urlsafe_b64decode(s)`	bytes | str	bytes	Decodes URL-safe Base64 (RFC 4648 §5), mapping - and _ back to + and /.
`encodebytes(s)`	bytes	bytes	MIME Base64: inserts \n every 76 characters and appends a trailing \n. For email/MIME only.
`decodebytes(s)`	bytes	bytes	Decodes MIME Base64. Ignores whitespace and embedded newlines.
`b16encode(s)`	bytes	bytes	Hex encoding (Base16). Each byte becomes two uppercase hex characters. No padding.
`b32encode(s)`	bytes	bytes	Base32 encoding. Uses A–Z and 2–7. Larger output than Base64; used in TOTP secrets.

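The altchars parameter in b64encode accepts a bytes object of length 2 that substitutes the + and / characters. Passing altchars=b'-_' produces output identical to urlsafe_b64encode, including the = padding; neither function has a parameter for controlling padding, so strip it separately if you need unpadded output.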
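A quick way to see altchars in action is to encode bytes whose Base64 form consists only of + and /. The three input bytes below were picked for that purpose and have no other significance.

```python
import base64

raw = bytes([0xFB, 0xFF, 0xFE])  # chosen so every output character is + or /

standard = base64.b64encode(raw)
url_safe = base64.urlsafe_b64encode(raw)
custom   = base64.b64encode(raw, altchars=b"-_")

print(standard)  # b'+//+'
print(url_safe)  # b'-__-'

# altchars=b"-_" yields the same bytes as urlsafe_b64encode()
assert custom == url_safe
```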

URL-Safe Base64 — urlsafe_b64encode() for JWTs and Query Parameters

Standard Base64 uses + and /, both of which are reserved characters in URLs. A + in a query string is decoded as a space, and / is a path separator. When the encoded value appears in a URL, a filename, or a cookie, you need the URL-safe variant: urlsafe_b64encode() substitutes - for + and _ for /.

JWTs use URL-safe Base64 without padding for all three segments (header, payload, signature). The padding must be stripped manually — Python's stdlib keeps it.

Encoding a JWT payload segment

import base64
import json

def encode_jwt_segment(data: dict) -> str:
    """Encode a dict as a URL-safe Base64 string without padding (JWT format)."""
    json_bytes = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(json_bytes).rstrip(b"=").decode("ascii")

def decode_jwt_segment(segment: str) -> dict:
    """Decode a URL-safe Base64 JWT segment (handles missing padding)."""
    # Add back padding: Base64 requires length to be a multiple of 4
    padding  = 4 - len(segment) % 4
    padded   = segment + ("=" * (padding % 4))
    raw      = base64.urlsafe_b64decode(padded)
    return json.loads(raw)

# Build a JWT header and payload
header  = {"alg": "HS256", "typ": "JWT"}
payload = {
    "sub":       "usr_7c3a9f1b2d",
    "workspace": "ws_eu-west-1-prod",
    "role":      "data-engineer",
    "iat":       1741824000,
    "exp":       1741910400,
}

header_segment  = encode_jwt_segment(header)
payload_segment = encode_jwt_segment(payload)

print(header_segment)
# eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9

print(payload_segment)
# eyJzdWIiOiJ1c3JfN2MzYTlmMWIyZCIsIndvcmtzcGFjZSI6IndzX2...

# Verify round-trip
restored = decode_jwt_segment(payload_segment)
print(restored["role"])  # data-engineer
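The third JWT segment is the signature, which uses the same unpadded URL-safe encoding. The sketch below shows one way an HS256 signature could be produced with the stdlib hmac module; b64url, sign_hs256, and the demo secret are hypothetical names and values for illustration, and for real tokens a maintained JWT library is usually the safer choice.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """URL-safe Base64 without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    """Build header.payload.signature using HMAC-SHA256 (illustrative sketch)."""
    segments = [
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(segments)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"

demo_secret = b"demo-signing-key-not-for-production"  # hypothetical value
token = sign_hs256(
    {"alg": "HS256", "typ": "JWT"},
    {"sub": "usr_7c3a9f1b2d", "role": "data-engineer"},
    demo_secret,
)

print(token.count("."))  # 2 (three dot-separated segments)
print("=" in token)      # False (padding stripped from every segment)
```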

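Note: urlsafe_b64decode() expects correctly padded input. An unpadded segment whose length is not a multiple of 4 raises binascii.Error (Incorrect padding), which is why decode_jwt_segment() restores the = characters before decoding. Keep the alphabets matched as well: decode URL-safe strings (- and _) with urlsafe_b64decode() and standard strings (+ and /) with b64decode(), because mixing the two can yield wrong bytes or a binascii.Error.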

Encoding Files and API Responses in Python

In production code, Base64 encoding most commonly appears around files being transmitted and around responses from external APIs that deliver binary content. Both scenarios require careful handling of the bytes boundary.

Reading a file from disk and encoding it

import base64
import json
from pathlib import Path

def encode_file_to_base64(file_path: str) -> str:
    """Read a binary file and return its Base64-encoded representation."""
    try:
        raw_bytes = Path(file_path).read_bytes()
        return base64.b64encode(raw_bytes).decode("ascii")
    except FileNotFoundError:
        raise FileNotFoundError(f"File not found: {file_path}")
    except PermissionError:
        raise PermissionError(f"Permission denied reading: {file_path}")

# Attach a TLS certificate to a deployment manifest
cert_b64 = encode_file_to_base64("./ssl/service-client.crt")

deployment_manifest = {
    "service":     "payment-processor",
    "environment": "production",
    "region":      "eu-west-1",
    "tls": {
        "client_cert":     cert_b64,
        "cert_format":     "base64-pem",
    },
}

# Write the manifest — cert is safely embedded as a string
with open("./dist/deployment-manifest.json", "w") as f:
    json.dump(deployment_manifest, f, indent=2)

print(f"Certificate encoded: {len(cert_b64)} characters")

Encoding an HTTP API response for debugging

import base64
from typing import Optional  # "dict | None" annotations need Python 3.10+

import requests  # pip install requests

def fetch_and_encode_binary(url: str, headers: Optional[dict] = None) -> str:
    """Fetch a binary resource from an API and return it as Base64."""
    response = requests.get(url, headers=headers or {}, timeout=10)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx

    content_type = response.headers.get("Content-Type", "unknown")
    encoded      = base64.b64encode(response.content).decode("ascii")

    print(f"Content-Type : {content_type}")
    print(f"Raw size     : {len(response.content):,} bytes")
    print(f"Encoded size : {len(encoded):,} characters")
    return encoded

# Example: download a signed PDF invoice from an internal billing API
invoice_b64 = fetch_and_encode_binary(
    "https://billing.internal/api/v2/invoices/INV-2026-0042/pdf",
    headers={"Authorization": "Bearer eyJhbGc..."},
)

# Attach to a notification payload
notification = {
    "recipient_id":  "team-finance",
    "invoice_id":    "INV-2026-0042",
    "attachment": {
        "filename":     "invoice-2026-0042.pdf",
        "content":      invoice_b64,
        "content_type": "application/pdf",
        "encoding":     "base64",
    },
}
print(f"Payload ready: {len(str(notification)):,} characters")

How to Base64 Encode an Image File in Python

Encoding an image to Base64 and embedding it as a data URI is the standard approach for HTML email templates, PDF generation, and self-contained HTML snapshots. The browser interprets the encoded string directly — no separate image request is needed. The same pattern works for any binary file type: PNG, JPEG, SVG, WebP, or PDF.

import base64
import mimetypes
from pathlib import Path

def image_to_data_uri(image_path: str) -> str:
    """Convert an image file to a Base64 data URI for inline HTML embedding."""
    path      = Path(image_path)
    mime_type = mimetypes.guess_type(image_path)[0] or "application/octet-stream"
    raw_bytes = path.read_bytes()
    encoded   = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Embed product images inline in an HTML email template
hero_uri      = image_to_data_uri("./assets/product-hero-768px.png")
thumbnail_uri = image_to_data_uri("./assets/product-thumb-128px.webp")

html_fragment = f"""
<img src="{hero_uri}"
     alt="Product hero"
     width="768" height="432"
     style="display:block;max-width:100%" />
"""

print(f"PNG data URI starts with: {hero_uri[:60]}...")
# data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAwAAAAA...

Note: For SVG files, a data URI with URL-encoding (data:image/svg+xml,{encoded}) is often smaller than Base64 because SVG is text-based and Base64 inflates size by ~33%. Use Base64 for raster formats (PNG, JPEG, WebP) and URL-encoding for SVG.

Working with Large Files — Chunked Base64 Encoding

Loading an entire file into memory with Path.read_bytes() is fine for files up to ~50 MB. Above that threshold, peak memory usage becomes significant — a 200 MB file requires ~200 MB for the raw bytes plus ~267 MB for the Base64 output, totalling ~467 MB in a single process. For large files, read and encode in chunks instead.

The critical constraint: chunk size must be a multiple of 3 bytes. Base64 encodes 3 input bytes into exactly 4 output characters. If a chunk boundary falls on a non-multiple of 3, the encoder appends = padding mid-stream, making the concatenated output invalid.
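The effect of the chunk boundary can be reproduced on a tiny input. In this sketch, encode_in_chunks is a hypothetical helper that encodes each slice independently and concatenates the results, which is what a streaming encoder effectively does.

```python
import base64

def encode_in_chunks(data: bytes, chunk_size: int) -> bytes:
    """Encode data slice by slice and concatenate the results (illustrative)."""
    return b"".join(
        base64.b64encode(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    )

data  = bytes(range(10))
whole = base64.b64encode(data)

print(encode_in_chunks(data, 3) == whole)  # True: padding appears only at the end
print(encode_in_chunks(data, 4) == whole)  # False: '=' lands in the middle
print(encode_in_chunks(data, 4))           # b'AAECAw==BAUGBw==CAk='
```

With a chunk size of 4, every slice is padded on its own, so the concatenated string is no longer the Base64 encoding of the original 10 bytes.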

Streaming encode to a file (no full-file memory load)

import base64
from pathlib import Path

CHUNK_SIZE = 3 * 1024 * 256  # 786,432 bytes — multiple of 3, ~768 KB per chunk

def encode_large_file(input_path: str, output_path: str) -> int:
    """
    Encode a large binary file to Base64 without loading it fully into memory.
    Returns the number of Base64 characters written.
    """
    total_chars = 0
    with open(input_path, "rb") as src, open(output_path, "w") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            encoded_chunk = base64.b64encode(chunk).decode("ascii")
            dst.write(encoded_chunk)
            total_chars += len(encoded_chunk)
    return total_chars

# Encode a 300 MB product video for an asset delivery manifest
chars_written = encode_large_file(
    "./uploads/product-demo-4k.mp4",
    "./dist/product-demo-4k.b64",
)
print(f"Encoded: {chars_written:,} Base64 characters")
# Encoded: 407,374,184 Base64 characters

Switch from read_bytes() to chunked reading when the input file exceeds ~50–100 MB, or when your service processes many files concurrently and memory pressure becomes a concern. For files under 50 MB, the simpler b64encode(path.read_bytes()).decode() one-liner is faster and easier to reason about.
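The reverse direction follows the same idea with the roles swapped: read a multiple of 4 Base64 characters at a time, since every 4 characters decode to 3 bytes. The sketch below assumes the input file holds one unbroken Base64 string with no newlines, as produced by the chunked encoder above; decode_large_file and its chunk_size parameter are hypothetical names.

```python
import base64

DECODE_CHUNK_SIZE = 4 * 1024 * 256  # 1,048,576 characters, a multiple of 4

def decode_large_file(input_path: str, output_path: str,
                      chunk_size: int = DECODE_CHUNK_SIZE) -> int:
    """Decode an unbroken Base64 file to binary without loading it fully.

    Returns the number of raw bytes written. Assumes no line breaks in the
    input; validate=True makes any stray character fail loudly.
    """
    if chunk_size % 4 != 0:
        raise ValueError("chunk_size must be a multiple of 4")
    total_bytes = 0
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            raw = base64.b64decode(chunk, validate=True)
            dst.write(raw)
            total_bytes += len(raw)
    return total_bytes
```

If the file was produced by a tool that wraps lines (MIME style), this sketch would reject it; stripping newlines first or using a different chunking strategy would be needed in that case.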

Command-Line Base64 Encoding with Python

Python ships a CLI interface for the base64 module — no additional tools needed. It works cross-platform, making it useful in CI pipelines and Windows environments where the system base64 command may not be available.

# ── python -m base64 ───────────────────────────────────────────────────
# Encode a string (pipe stdin)
echo -n "deploy-svc:sk-prod-9f2a1c3e8b4d" | python3 -m base64
# ZGVwbG95LXN2Yzpzay1wcm9kLTlmMmExYzNlOGI0ZA==

# Encode a file
python3 -m base64 ./ssl/service-client.crt

# Decode a Base64 string
echo "ZGVwbG95LXN2Yzpzay1wcm9kLTlmMmExYzNlOGI0ZA==" | python3 -m base64 -d

# Decode a Base64 file back to binary
python3 -m base64 -d ./dist/service-client.b64 > ./restored.crt

# ── Python one-liner — cross-platform, works on Windows ────────────────
# Encode a string
python3 -c "import base64,sys; print(base64.b64encode(sys.argv[1].encode()).decode())" "my-secret"
# bXktc2VjcmV0

# URL-safe encode (no padding)
python3 -c "import base64,sys; print(base64.urlsafe_b64encode(sys.argv[1].encode()).rstrip(b'=').decode())" "my-secret"
# bXktc2VjcmV0

# Encode a file inline (result on stdout)
python3 -c "import base64,sys; print(base64.b64encode(open(sys.argv[1],'rb').read()).decode())" ./config.json

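Note that python -m base64 encodes with the module's MIME-style encode() function, so any output longer than 76 characters is wrapped onto multiple lines, the same default as the GNU coreutils base64 command. Short values such as the credential above still fit on one line. For an unbroken single-line result suitable for environment variables, JSON fields, and HTTP headers, use the b64encode() one-liners shown above.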

High-Performance Alternative: pybase64

Python's stdlib base64 module is a thin Python wrapper around the C binascii module in CPython. For services that encode large payloads at high throughput (image processing pipelines, bulk export jobs, real-time telemetry ingestion), pybase64 is a drop-in replacement backed by libbase64, a SIMD-accelerated C library. Benchmarks show 2–10× throughput improvements depending on payload size and CPU architecture.

pip install pybase64

import pybase64

# pybase64 is a drop-in replacement — same function signatures as stdlib
sample_payload = b"x" * (1024 * 1024)  # 1 MB of binary data

# Standard encoding — identical output to base64.b64encode()
encoded = pybase64.b64encode(sample_payload)
decoded = pybase64.b64decode(encoded)
assert decoded == sample_payload

# URL-safe encoding — identical output to base64.urlsafe_b64encode()
url_safe = pybase64.urlsafe_b64encode(sample_payload)

# b64encode_as_string() returns str directly — no .decode() call needed
telemetry_event = b'{"event":"page_view","session_id":"sess_3a7f91c2","ts":1741824000}'
encoded_str: str = pybase64.b64encode_as_string(telemetry_event)

print(encoded_str[:48] + "...")
# eyJldmVudCI6InBhZ2VfdmlldyIsInNlc3Npb25faWQiOiJz...

# Throughput comparison (approximate, varies by hardware)
# stdlib  base64.b64encode(1 MB):   ~80 MB/s
# pybase64.b64encode(1 MB):         ~800 MB/s (SIMD path on AVX2 CPU)

Switch to pybase64 when profiling shows Base64 encoding as a bottleneck, or when you encode payloads above ~100 KB repeatedly. For one-off encoding of small strings (credentials, tokens), the stdlib is fast enough and has no install dependency.
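Because the two modules share function names, one common pattern is to prefer pybase64 when it is installed and fall back to the stdlib otherwise. This is a sketch of that pattern, not a requirement of either library; encode_payload is a hypothetical wrapper name.

```python
# Prefer the accelerated implementation when available, else use the stdlib
try:
    import pybase64 as b64
except ImportError:
    import base64 as b64

def encode_payload(payload: bytes) -> str:
    """Return the standard Base64 encoding of payload as a str."""
    return b64.b64encode(payload).decode("ascii")

print(encode_payload(b"my-secret"))  # bXktc2VjcmV0
```

Keeping the import behind a single alias means the rest of the codebase does not need to know which implementation is active.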

Common Mistakes

Mistake 1 — Passing a str instead of bytes to b64encode()

Problem: b64encode() expects a bytes object. Passing a str immediately raises TypeError: a bytes-like object is required, not 'str'. Fix: always call .encode("utf-8") on the string before encoding.

import base64

# ❌ TypeError: a bytes-like object is required, not 'str'
webhook_secret = "wh-secret-a3f91c2b4d"
encoded = base64.b64encode(webhook_secret)  # crashes

# ✅ Encode the str to bytes first
webhook_secret = "wh-secret-a3f91c2b4d"
encoded = base64.b64encode(webhook_secret.encode("utf-8"))
# b'd2gtc2VjcmV0LWEzZjkxYzJiNGQ='

Mistake 2 — Forgetting to .decode() the bytes result

Problem: b64encode() returns bytes, not str. Embedding it directly in an f-string produces b'...' in the output, which is an invalid HTTP header value and breaks JSON serialisation. Fix: always call .decode("ascii") on the encoded result.

import base64

# ❌ Authorization header contains "b'c3ZjLW1vbml0b3I6c2stN2YzYTFi'"
credential = base64.b64encode(b"svc-monitor:sk-7f3a1b")
headers = {"Authorization": f"Basic {credential}"}

# ✅ Authorization: Basic c3ZjLW1vbml0b3I6c2stN2YzYTFi
credential = base64.b64encode(b"svc-monitor:sk-7f3a1b").decode("ascii")
headers = {"Authorization": f"Basic {credential}"}

Mistake 3 — Using encodebytes() where b64encode() is needed

Problem: encodebytes() inserts a \n every 76 characters (MIME line-wrapping) and appends a trailing newline. Storing this in a JSON field, an environment variable, or a data URI embeds literal newline characters that corrupt the value downstream. Fix: use b64encode() everywhere except MIME email composition.

import base64, json
from pathlib import Path

# ❌ encodebytes() adds \n every 76 chars — breaks JSON and env vars
cert_bytes = open("./ssl/root-ca.crt", "rb").read()
cert_b64 = base64.encodebytes(cert_bytes).decode()
config   = json.dumps({"ca_cert": cert_b64})  # newlines inside string value

# ✅ b64encode() produces a single unbroken string
cert_bytes = Path("./ssl/root-ca.crt").read_bytes()
cert_b64 = base64.b64encode(cert_bytes).decode("ascii")
config   = json.dumps({"ca_cert": cert_b64})  # clean single-line value

Mistake 4 — Decoding URL-safe Base64 with the standard decoder

Problem: URL-safe Base64 uses - and _ instead of + and /. By default b64decode() discards characters outside the standard alphabet, so a URL-safe string containing - or _ either decodes to the wrong bytes without any exception or fails with a misleading "Incorrect padding" error. Fix: use urlsafe_b64decode() for URL-safe input, or pass validate=True to detect the mismatch early.

import base64

# ❌ JWT payload segment uses URL-safe Base64 (- and _)
# b64decode() silently produces wrong bytes for those characters
jwt_segment = "eyJzdWIiOiJ1c3JfN2MzYTlmMWIyZCIsInJvbGUiOiJhZG1pbiJ9"
wrong = base64.b64decode(jwt_segment)  # silently wrong if - or _ present

# ✅ Use urlsafe_b64decode() for JWT and URL-safe input
jwt_segment = "eyJzdWIiOiJ1c3JfN2MzYTlmMWIyZCIsInJvbGUiOiJhZG1pbiJ9"
padding     = 4 - len(jwt_segment) % 4
raw         = base64.urlsafe_b64decode(jwt_segment + "=" * (padding % 4))
# b'{"sub":"usr_7c3a9f1b2d","role":"admin"}'

Python Base64 Encoding Methods — Quick Comparison

Method	URL-safe chars	Padding	Line breaks	Returns	Requires install
`b64encode()`	❌ + and /	✅ = padding	❌ none	bytes	No
`urlsafe_b64encode()`	✅ - and _	✅ = padding	❌ none	bytes	No
`b64encode(altchars=b"-_")`	✅ custom 2 chars	✅ = padding	❌ none	bytes	No
`encodebytes()`	❌ + and /	✅ = padding	✅ \n every 76 chars	bytes	No
`pybase64.b64encode()`	❌ + and /	✅ = padding	❌ none	bytes	pip install
`pybase64.b64encode_as_string()`	❌ + and /	✅ = padding	❌ none	str	pip install

Choose b64encode() for the vast majority of use cases: HTTP headers, JSON fields, environment variables, and data URIs. Switch to urlsafe_b64encode() whenever the output will appear in a URL, a filename, a cookie, or a JWT segment. Use encodebytes() only when composing MIME email attachments. Reach for pybase64 when encoding payloads above ~100 KB in a hot path.

Frequently Asked Questions

Why does base64.b64encode() return bytes instead of a string?

Python 3 strictly separates text (str) and binary data (bytes). base64.b64encode() operates on binary data and returns binary data — even though the output characters happen to be printable ASCII. This design is intentional: it forces you to be explicit about encoding boundaries. To get a str, call .decode("ascii") or .decode("utf-8") on the result. Since valid Base64 output contains only ASCII characters, both encodings produce identical results.

What is the difference between b64encode() and encodebytes() in Python?

b64encode() produces a single unbroken Base64 string, which is the correct choice for HTTP headers, JSON fields, data URIs, environment variables, and JWT segments. encodebytes() (formerly encodestring() in Python 2) inserts a newline after every 76 characters of output and appends a trailing newline. This is the MIME line-wrapping format required for email attachments per RFC 2045. Using encodebytes() outside of email composition will embed literal newlines in your output, corrupting headers, JSON strings, and URL values.
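The difference is easy to confirm on any input longer than 57 bytes, which is the amount of data that fills one 76-character line. The 100-byte input below is arbitrary test data.

```python
import base64

raw = bytes(range(100))  # 100 bytes -> 136 Base64 characters

wrapped = base64.encodebytes(raw)
single  = base64.b64encode(raw)

print(wrapped.count(b"\n"))  # 2: one after the 76-character line, one trailing
print(b"\n" in single)       # False

# Removing the newlines recovers exactly the b64encode() output
assert wrapped.replace(b"\n", b"") == single
```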

How do I base64 encode a string with non-ASCII characters in Python?

Call .encode("utf-8") on the string to convert it to bytes, then pass those bytes to base64.b64encode(). To decode, reverse the steps: base64.b64decode(encoded), then .decode("utf-8") on the result. UTF-8 is the right choice for nearly all text — it handles every Unicode code point, including Cyrillic, CJK ideographs, Arabic, and emoji. Using .encode("ascii") on non-ASCII text will raise a UnicodeEncodeError, which is usually the correct behavior since it surfaces the encoding mismatch early.

How do I base64 encode a file in Python?

Read the file in binary mode, then call base64.b64encode() on the bytes. The simplest one-liner is: encoded = base64.b64encode(Path("file.bin").read_bytes()).decode("ascii"). For large files (above ~50–100 MB), avoid loading the entire file into memory. Instead, read in chunks of a size that is a multiple of 3 bytes (e.g., 3 × 1024 × 256 = 786,432 bytes) and encode each chunk separately — processing chunk sizes that are multiples of 3 avoids spurious = padding characters appearing in the middle of the output.

Why does Python's urlsafe_b64encode() still include = padding? JWT doesn't use it.

The stdlib follows RFC 4648 §5, which keeps = padding by default. JWT (RFC 7519) inherits its Base64url rules from JWS (RFC 7515), which uses the same alphabet but omits all trailing = characters, as RFC 4648 §3.2 permits. The padding carries no extra information, since the original byte length can be recovered from the length of the unpadded string, so JWT drops it to shorten tokens. To match JWT format, call .rstrip(b"=") on the encoded output before calling .decode("ascii"). When decoding, add back the correct padding: padding = 4 - len(segment) % 4; padded = segment + "=" * (padding % 4).
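The strip-and-restore cycle can be sketched as a round trip. restore_padding is a hypothetical helper; its expression -len(segment) % 4 is an equivalent, more compact form of the two-step padding formula given in the answer.

```python
import base64
import binascii

def restore_padding(segment: str) -> str:
    """Append the = characters that JWT-style encoding stripped (illustrative)."""
    return segment + "=" * (-len(segment) % 4)

raw      = b"deploy-svc"
unpadded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
print(unpadded)  # ZGVwbG95LXN2Yw

# Decoding the unpadded form directly is expected to fail
try:
    base64.urlsafe_b64decode(unpadded)
except binascii.Error as exc:
    print(f"unpadded input rejected: {exc}")

# With padding restored, the round trip succeeds
assert base64.urlsafe_b64decode(restore_padding(unpadded)) == raw
```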

Is there a way to validate that a string is valid Base64 before decoding it?

Pass validate=True to base64.b64decode(). With this flag, any character outside the standard Base64 alphabet (A–Z, a–z, 0–9, +, /, =) raises a binascii.Error. Without validate=True, b64decode() silently ignores invalid characters, which can mask corrupted input. For URL-safe Base64, there is no validate parameter in urlsafe_b64decode() — you can validate manually with a regex: import re; bool(re.fullmatch(r"[A-Za-z0-9_-]+=*", segment)). Always validate input from untrusted external sources before decoding.
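Both checks can be wrapped in small predicates. This is a sketch under the assumptions stated in the answer: the standard check relies on validate=True, and the URL-safe check uses the same regular expression shown above; the function names are hypothetical.

```python
import base64
import re

URLSAFE_B64_RE = re.compile(r"[A-Za-z0-9_-]+=*")

def is_valid_standard_b64(text: str) -> bool:
    """True if text decodes as strict standard Base64 (illustrative helper)."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except ValueError:
        # binascii.Error is a ValueError subclass; non-ASCII str input
        # also raises ValueError
        return False

def is_valid_urlsafe_b64(segment: str) -> bool:
    """True if segment uses only the URL-safe alphabet plus optional padding."""
    return URLSAFE_B64_RE.fullmatch(segment) is not None

print(is_valid_standard_b64("bXktc2VjcmV0"))  # True
print(is_valid_standard_b64("bXkt c2Vj"))     # False (contains a space)
print(is_valid_urlsafe_b64("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"))  # True
print(is_valid_urlsafe_b64("a+b/"))           # False (standard-alphabet characters)
```

The regex only checks the alphabet, not the length or padding count, so a successful match does not guarantee that decoding will succeed.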

If you need a one-click encode or decode without writing any Python, paste your string or file directly into ToolDeck's Base64 Encoder — it handles standard and URL-safe modes instantly in your browser, with no setup required.

DEV Community