My Python script runs fine on macOS and Linux. Perfect UTF-8 support. But the moment it runs on a Windows machine, it crashes with:
UnicodeEncodeError: 'gbk' codec can't encode character '\u2764' in position 42: illegal multibyte sequence
The script outputs emoji and Japanese characters. Nothing exotic, just a simple print() call. But Python on that Windows machine was encoding its output as GBK (the Simplified Chinese code page), not UTF-8.
The fix was a single line added at startup:
import sys
sys.stdout.reconfigure(encoding='utf-8')
But this opened a larger puzzle: Why does Windows default to GBK for console output? How do you debug encoding issues? And what's the right way to handle this in production?
The Root Cause: Windows Console Encoding Hell
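Windows defaults console output encoding to the system locale, not UTF-8. If your Windows machine is set to Chinese (Simplified), output is encoded as GBK (code page 936, a variable-width encoding that uses one or two bytes per character). If it's set to Japanese, it uses Shift-JIS (code page 932). If it's US English, it's cp1252 (Windows Latin-1). Since Python 3.6, text written directly to an interactive console window goes out as UTF-8, so the locale code page mostly bites when output is redirected to a pipe or file, which is the usual situation on CI agents, in IDE terminals, and under logging wrappers.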
Python inherits this:
import sys
print(sys.stdout.encoding) # Windows output: 'gbk' or 'cp1252' or 'shift_jis'
When you try to print a character outside that encoding's range (like emoji on a GBK system, or Japanese text on a cp1252 system), Python throws UnicodeEncodeError.
The problem:
- macOS defaults to UTF-8
- Linux defaults to UTF-8
- Windows defaults to the system locale
Your code works everywhere except Windows, because Windows doesn't have a sane default.
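The coverage gap can be checked on any platform, because Python ships the Windows codecs everywhere. The sketch below uses a hypothetical helper, can_encode (not part of the original script), to test a few sample strings against the code pages mentioned above. The sample strings are illustrative assumptions.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in `text` has a mapping in `encoding`."""
    try:
        text.encode(encoding)  # strict error handling is the default
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    samples = ["plain ASCII", "\u2764", "日本語"]
    for encoding in ("gbk", "cp1252", "cp932", "utf-8"):
        for sample in samples:
            # ascii() keeps this demo printable even on a non-UTF-8 stdout
            print(f"{encoding:>7} {ascii(sample)}: {can_encode(sample, encoding)}")
```

Running it on macOS or Linux gives the same answers as on Windows, which makes it a cheap way to predict which log messages would crash under a given locale.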
The Symptom: Crashes on Specific Characters
This happens to me when automating app deployment scripts. The script logs to console using emoji and Unicode:
def log_deployment(app_name: str, status: str) -> None:
    emoji = "✅" if status == "success" else "❌"
    print(f"{emoji} {app_name}: {status}")
    if app_name == "AutoChoice":
        print("Japanese locale: 自動選択 機能")  # Japanese text

# Works on Mac/Linux
# Crashes on Windows with GBK encoding:
# UnicodeEncodeError: 'gbk' codec can't encode character '\u2705'
log_deployment("AutoChoice", "success")
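With GBK active, the very first print() fails, because the code page has no mapping for ✅. In general, the error appears as soon as the output contains a character that the active encoding cannot represent. Text that stays within the code page's repertoire prints without complaint, which is why the bug can stay hidden until one particular message is logged.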
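The failure can be reproduced without a Windows machine by writing to an in-memory stream that uses the same encoding. This is a minimal sketch: write_with_encoding is a hypothetical helper, and the io.TextIOWrapper around a BytesIO stands in for a GBK-configured sys.stdout.

```python
import io
from typing import Optional


def write_with_encoding(text: str, encoding: str) -> Optional[UnicodeEncodeError]:
    """Write `text` to an in-memory text stream that uses `encoding`.

    Returns the UnicodeEncodeError if the write fails, otherwise None.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return None


if __name__ == "__main__":
    for encoding in ("gbk", "utf-8"):
        error = write_with_encoding("\u2705 AutoChoice: success", encoding)
        print(encoding, "->", "ok" if error is None else f"failed: {error}")
```

The returned exception carries the offending position in its start attribute, which helps locate the character in a long log line.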
The Fix: Force UTF-8 at Startup
Add this at the top of your script, right after imports:
import sys
import os

# Force UTF-8 encoding for console output on Windows
if sys.platform == "win32":
    # Windows needs explicit UTF-8 reconfiguration (reconfigure() needs Python 3.7+)
    sys.stdout.reconfigure(encoding='utf-8')
    sys.stderr.reconfigure(encoding='utf-8')
else:
    # macOS/Linux normally use UTF-8 already. PYTHONIOENCODING is only read at
    # interpreter startup, so this affects child Python processes, not this one.
    os.environ.setdefault('PYTHONIOENCODING', 'utf-8')

# Now this works on all platforms
print("✅ Success")
print("日本語: テスト")  # Japanese text
print("中文: 测试")  # Chinese text
Key points:
- sys.stdout.reconfigure(encoding='utf-8') changes the encoding for the current Python process
- It doesn't change the Windows system locale (that is a separate, system-wide setting)
- It works immediately: subsequent print() calls use UTF-8
- It's safe to call on macOS/Linux (it just confirms UTF-8, which is already the default)
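To see the "works immediately" behaviour in isolation, the sketch below applies the same reconfigure() call to an in-memory io.TextIOWrapper (the class sys.stdout normally is) and inspects the bytes it produced. The in-memory stream is an assumption made for illustration; on a real run the bytes go to the console or pipe.

```python
import io

raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="gbk")  # stand-in for a GBK stdout

stream.write("before ")                # encoded with the original encoding
stream.reconfigure(encoding="utf-8")   # same call as on sys.stdout (Python 3.7+)
stream.write("\u2705 after")           # encoded as UTF-8 from here on
stream.flush()

data = raw.getvalue()
print(data)
```

Only text written after the call is affected; anything already written keeps the bytes it was encoded with.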
Production-Ready Pattern: Robust Encoding Setup
For scripts that run across platforms and locales, use this boilerplate:
import sys
import os
from typing import Tuple


def setup_unicode_output() -> Tuple[str, str]:
    """
    Configure UTF-8 encoding for console output (stdout and stderr).
    Works on Windows, macOS, and Linux.

    Returns:
        (stdout_encoding, stderr_encoding) for debugging
    """
    # Save original encodings (useful for debugging)
    original_stdout = sys.stdout.encoding
    original_stderr = sys.stderr.encoding
    try:
        # Windows: force UTF-8 reconfiguration
        if sys.platform == "win32":
            # Some Windows versions may not support UTF-8 well in legacy mode
            # Use 'utf-8-sig' to include BOM if needed, but usually 'utf-8' is fine
            sys.stdout.reconfigure(encoding='utf-8', errors='replace')
            sys.stderr.reconfigure(encoding='utf-8', errors='replace')
        else:
            # Unix-like systems: set environment variable for child processes
            os.environ['PYTHONIOENCODING'] = 'utf-8'
            # Reconfigure anyway (harmless)
            sys.stdout.reconfigure(encoding='utf-8', errors='replace')
            sys.stderr.reconfigure(encoding='utf-8', errors='replace')
    except Exception as e:
        # Fallback if reconfigure fails, e.g. the stream was replaced by an
        # object that has no reconfigure() method (very rare)
        print(f"Warning: couldn't reconfigure encoding: {e}", file=sys.stderr)
        return (original_stdout, original_stderr)
    return (sys.stdout.encoding, sys.stderr.encoding)


# Call at the very start of your script
if __name__ == "__main__":
    stdout_enc, stderr_enc = setup_unicode_output()
    print(f"Configured output: {stdout_enc} / {stderr_enc}")
    # Now safe to use any Unicode
    print("✅ Emoji works")
    print("日本語: OK")
    print("中文: OK")
    print("العربية: OK")
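The errors= argument used above decides what happens to characters the target encoding cannot represent. The sketch below compares two handlers on a hypothetical log line, using cp1252 as the target so that there is something to replace. With UTF-8 as the target, every ordinary character is encodable, so the handler should only matter for unusual input such as lone surrogates.

```python
from typing import Dict


def encode_variants(text: str, encoding: str) -> Dict[str, bytes]:
    """Encode `text` with two lenient error handlers and return both results."""
    return {
        # Unencodable characters become '?'
        "replace": text.encode(encoding, errors="replace"),
        # Unencodable characters become Python-style escapes
        "backslashreplace": text.encode(encoding, errors="backslashreplace"),
    }


if __name__ == "__main__":
    line = "\u2705 done"
    for handler, data in encode_variants(line, "cp1252").items():
        print(f"{handler:>16}: {data!r}")
    print(f"{'utf-8':>16}: {line.encode('utf-8', errors='replace')!r}")
```

If losing the original character matters for later debugging, backslashreplace keeps the code point visible in the output where replace discards it.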
Debugging: Check Your Current Encoding
If you're seeing UnicodeEncodeError and can't figure out why, check the active encoding:
import locale
import os
import sys


def diagnose_encoding() -> dict:
    """
    Print and return all encoding-related information for debugging.
    """
    diag = {
        "platform": sys.platform,
        "stdout_encoding": sys.stdout.encoding,
        "stderr_encoding": sys.stderr.encoding,
        "file_encoding": sys.getfilesystemencoding(),
        "default_encoding": sys.getdefaultencoding(),
        "locale_preferred_encoding": locale.getpreferredencoding(False),
        "environment_PYTHONIOENCODING": os.environ.get('PYTHONIOENCODING', 'not set'),
    }
    for key, value in diag.items():
        print(f"{key}: {value}")
    return diag


# Run this to see what Python thinks the encoding is
diagnose_encoding()
On Windows with GBK locale:
platform: win32
stdout_encoding: gbk
stderr_encoding: gbk
...
After calling sys.stdout.reconfigure(encoding='utf-8'):
platform: win32
stdout_encoding: utf-8
stderr_encoding: gbk # stderr needs separate reconfigure
...
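Because each stream has to be reconfigured separately, a small loop avoids forgetting one. The sketch below is a hypothetical helper, force_utf8, that also skips objects without a reconfigure() method, since test runners and some IDEs can swap sys.stdout for their own stream class. Both the helper name and that defensive check are assumptions, not part of the original script.

```python
import sys
from typing import Iterable, List, Optional


def force_utf8(streams: Iterable[object]) -> List[Optional[str]]:
    """Switch each stream to UTF-8 where possible; return the resulting encodings."""
    encodings = []
    for stream in streams:
        reconfigure = getattr(stream, "reconfigure", None)
        if callable(reconfigure):
            reconfigure(encoding="utf-8")
        # Streams without reconfigure() are left untouched and reported as-is
        encodings.append(getattr(stream, "encoding", None))
    return encodings


if __name__ == "__main__":
    print(force_utf8([sys.stdout, sys.stderr]))
```

Any entry in the returned list that is not 'utf-8' points at a stream that could not be switched and deserves a closer look.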
When This Happens in Automation Scripts
This is especially painful in CI/CD and deployment automation. Example: your app deployment script runs on Windows CI agents:
# app_deployer.py
import sys
sys.stdout.reconfigure(encoding='utf-8')

from app_store_api import submit_app


def deploy(app_name: str, version: str):
    print(f"🚀 Deploying {app_name} v{version}")  # ← emoji
    print("構築ターゲット: iOS 17.0+")  # ← Japanese (if app is JP-localized)
    result = submit_app(app_name, version)
    print(f"✅ Submission {result['id']} created")


if __name__ == "__main__":
    deploy("AutoChoice", "1.0.0")
Without the reconfigure() line, this crashes on Windows with:
UnicodeEncodeError: 'gbk' codec can't encode character '\U0001f680' in position 0: illegal multibyte sequence
With the line, it works everywhere.
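The before-and-after can be checked on any machine by simulating the CI agent: launch a child interpreter with piped output and force its stdout to GBK through PYTHONIOENCODING. This is a rough sketch under that assumption; run_with_gbk_stdout and the two code snippets are hypothetical stand-ins for the deployer script.

```python
import os
import subprocess
import sys

# Child programs are passed as ASCII source; the child expands the escape itself.
UNFIXED = "print('\\U0001f680 Deploying')"
FIXED = "import sys; sys.stdout.reconfigure(encoding='utf-8'); " + UNFIXED


def run_with_gbk_stdout(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter whose stdout encoding starts out as GBK."""
    env = dict(os.environ, PYTHONIOENCODING="gbk")
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,  # piped output, as on many CI agents
    )


if __name__ == "__main__":
    for label, code in (("unfixed", UNFIXED), ("fixed", FIXED)):
        result = run_with_gbk_stdout(code)
        print(label, "exit code:", result.returncode)
```

The unfixed child exits with a non-zero status and a UnicodeEncodeError traceback on stderr, while the fixed child overrides the inherited setting and emits UTF-8 bytes.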
Alternative: Environment Variable (Less Reliable)
You can also set PYTHONIOENCODING before running Python:
# Windows CMD
set PYTHONIOENCODING=utf-8
python app_deployer.py
# Windows PowerShell
$env:PYTHONIOENCODING="utf-8"
python app_deployer.py
# macOS/Linux
export PYTHONIOENCODING=utf-8
python app_deployer.py
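But this is less reliable because:
- It requires the user to set the variable before running
- It doesn't work if Python is invoked via a framework or IDE that doesn't inherit env vars
- It is only read at interpreter startup, so setting it from inside a running script has no effect on that process (it does cover stdin, stdout, and stderr, although stderr ignores any error-handler suffix and always uses backslashreplace)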
Always use sys.stdout.reconfigure() in your code for reliability.
Key Takeaways
- Windows defaults console output to the system locale (GBK, cp1252, Shift-JIS, etc.), not UTF-8
- macOS/Linux default to UTF-8, so the bug only appears on Windows
- sys.stdout.reconfigure(encoding='utf-8') fixes it: one line, call at startup
- Also reconfigure sys.stderr if you're logging errors
- Use errors='replace' to gracefully handle any remaining encoding issues
- Test on Windows if you're using emoji or non-ASCII characters in output
If you're shipping scripts that need to work cross-platform (especially in CI/CD or for developers in non-English locales), add the UTF-8 reconfiguration boilerplate. It's a cheap fix that prevents mysterious crashes.