# Running modern Python TTS toolchains on non-AVX2 CPUs
Notes from getting F5-TTS, StyleTTS2, kokoro/Misaki, and whisper.cpp to work
on an AMD Phenom II X6 1090T (2010 K10/Family-10h architecture).
The CPU has SSE/SSE2/SSE3/SSE4a, plus CX16/POPCNT/LAHF — but no SSE4.1, no
SSE4.2, no AVX, no AVX2, no FMA, no F16C. That puts it below the modern
x86-64-v2 baseline. A growing share of binary Python wheels in the AI
ecosystem assume v2 or v3, so they SIGILL or SIGFPE at import. This is a
ground-truth list of what we hit and what worked.
## Quick triage
If your CPU is below x86-64-v2 (in particular, missing SSE4.1), expect:

- `pyarrow`: static-init `pinsrq` SIGILL on import
- numpy 2.x: wheel SIGILL on import (numpy 1.26.4 still has a fallback path)
- torch 2.10+: wheel SIGFPE in `torch._dynamo` on import
- `pandas`: modern wheels SIGILL on tokenisation
- `monotonic_align` and other Cython extensions: build-from-source SIGILL
- DataLoader subprocess workers: SIGFPE re-importing torch
If your CPU is x86-64-v2 (Nehalem ~2008 or newer Intel; Bulldozer ~2011 or
newer AMD) but missing AVX/AVX2, you'll still hit some of these but fewer.
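To check where a given box falls, you can read the flag list straight out of `/proc/cpuinfo`. A small sketch of mine (Linux-only; flag names are as the kernel reports them, not from the original post):

```python
# Report the SIMD features discussed above from /proc/cpuinfo (Linux only).
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

for feat in ("sse4_1", "sse4_2", "avx", "avx2", "fma", "f16c"):
    print(f"{feat:8s}", "present" if feat in flags else "MISSING")
```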
## Working pin-set
These are versions empirically verified to import and run on this CPU:
| package | version | why |
|---|---|---|
| numpy | 1.26.4 | last with a non-AVX2 fallback path; from-source builds OK |
| torch | 2.7.0 | last with a usable `_dynamo` init that doesn't SIGFPE |
| torchaudio | 2.7.0 | last with the soundfile backend (2.10+ requires torchcodec) |
| transformers | 4.57.3 | 5.x triggers `torch._dynamo` at import time via `torch.compiler.disable` |
| numba / scipy / librosa | latest binary wheels | OK |
| pyarrow / pandas / datasets / torchcodec | uninstalled | wheels assume SSE4.1+; not actually needed for inference |
For a fresh install, layer the pins after the project install:
```bash
pip install --prefer-binary <project>   # whatever you actually want
pip install --prefer-binary --force-reinstall --no-deps \
    "torch==2.7.0" "torchaudio==2.7.0" \
    "transformers==4.57.3" "numpy<2"
pip uninstall -y datasets pyarrow pyarrow-hotfix pandas torchcodec
```
## Patches required
### Patch 1: torch._dynamo SIGFPE on int division by zero
Even after pinning to torch 2.7.0, the very first dynamo init still SIGFPEs
on this CPU. Cause: `torch._dynamo.variables.torch_function.populate_builtin_to_tensor_fn_map()`
probes Python operators on dummy tensors, including `tensor // 0` (integer
floor-divide by zero). Newer Intel CPUs trap this into a Python
ZeroDivisionError via a signal handler; AMD Phenom II just SIGFPEs.
The function's output isn't actually needed for inference. Stub it:
```bash
F=$(python -c "import torch._dynamo.variables.torch_function as m; print(m.__file__)")
cp "$F" "$F.orig"
sed -i "0,/    global BUILTIN_TO_TENSOR_FN_MAP/s//    return  # patched: SIGFPE on Phenom II\n    global BUILTIN_TO_TENSOR_FN_MAP/" "$F"
```
This is non-invasive — only affects code that uses torch.compile() /
dynamo paths, which most fine-tuning trainers don't.
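If you'd rather not edit the installed file, a runtime stub may also work. This is a sketch of my own, not from the original post, and it assumes the map is populated lazily on first dynamo use rather than when the module is imported; if the import itself crashes on your machine, stay with the sed patch.

```python
# Sketch: no-op the probe before anything triggers dynamo init.
# Assumption: importing this module does not itself run the probe
# (i.e. populate_builtin_to_tensor_fn_map is only called lazily).
import torch._dynamo.variables.torch_function as tf

tf.populate_builtin_to_tensor_fn_map = lambda: None
```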
### Patch 2: GPU-only mel-spectrogram computation
On this CPU, `torch.matmul` SIGFPEs, so anything that calls torchaudio's
`MelSpectrogram` on the CPU dies. For training pipelines that compute mels
in the data loader, this is fatal.
Two ways to fix it:

a) Move the mel module to the GPU (cheap audio→mel transfer per sample):

```python
import torch
import torchaudio

to_mel = torchaudio.transforms.MelSpectrogram(...).to("cuda")

def preprocess(wave):
    wave = torch.from_numpy(wave).to("cuda")
    mel = to_mel(wave)
    return mel.cpu()  # back to CPU for the DataLoader collator
```
b) Pre-compute all mels once on GPU, save to disk, load at training time
(example script).
(b) is faster overall — no per-sample audio→GPU transfer, just torch.load.
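In the absence of the linked script, here is a minimal sketch of (b); the directory, sample rate, and mel parameters are placeholders of mine, and resampling/normalisation are omitted:

```python
# Sketch: one-off mel pre-computation on the GPU. Paths, sample_rate, and
# n_mels are placeholders; adjust to your corpus and model.
import pathlib

import torch
import torchaudio

to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=24000, n_mels=80).to("cuda")

for wav_path in pathlib.Path("wavs").glob("*.wav"):
    wave, sr = torchaudio.load(str(wav_path))   # file I/O stays on the CPU
    mel = to_mel(wave.to("cuda")).cpu()         # the matmul happens on the GPU
    torch.save(mel, wav_path.with_suffix(".mel.pt"))
```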
### Patch 3: num_workers=0 everywhere
DataLoader spawns subprocess workers that re-import torch and re-run the
`_dynamo` init. Even with patch 1, the patched source isn't always picked up
in the subprocess. Set `num_workers=0` to keep all loading in the main process.
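For example (the `dataset` variable is a placeholder for whatever Dataset your trainer builds):

```python
import torch

loader = torch.utils.data.DataLoader(
    dataset,        # placeholder: your torch.utils.data.Dataset instance
    batch_size=8,
    shuffle=True,
    num_workers=0,  # keep loading in the main process; workers re-import torch
)
```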
### Patch 4: weights_only=False for older checkpoint formats
PyTorch 2.6+ flipped the default. If you load checkpoints saved before 2.6
that contain pickled Python objects, you need `torch.load(path, weights_only=False)`.
Affected: many published TTS pretrained models (StyleTTS2's ASR/JDC/PLBERT
modules, F5-TTS in some cases).
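A fallback pattern I'd use here (my sketch, not from the post), and only for checkpoints from a source you trust, since `weights_only=False` executes arbitrary pickle code:

```python
import torch

path = "checkpoint.pth"  # placeholder path

# Try the safe 2.6+ default first; fall back for older pickled checkpoints.
try:
    ckpt = torch.load(path)
except Exception:
    ckpt = torch.load(path, weights_only=False)  # trusted checkpoints only
```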
### Patch 5: Stub datasets for transformers' lazy loader
`transformers.utils.import_utils._is_package_available("datasets")` calls
`importlib.util.find_spec("datasets")`, which raises ValueError if
`__spec__` is None. If you provide a stub `datasets` module via
`sys.modules` (to avoid pulling pyarrow), it must have a real ModuleSpec:
```python
import importlib.machinery, types, sys

_stub = types.ModuleType("datasets")
_stub.__spec__ = importlib.machinery.ModuleSpec("datasets", loader=None)
_stub.Dataset = type("Dataset", (), {})
_stub.load_from_disk = lambda *a, **kw: None
sys.modules["datasets"] = _stub
```
### Patch 6: --no-build-isolation for Cython extensions
`monotonic_align` (used by StyleTTS2) and similar packages are built in an
ephemeral environment created by pip's build isolation. That environment
re-installs numpy and cython and may pull AVX2 wheels. Use:

```bash
pip install --no-build-isolation --no-deps <package>
```

This forces the build to use your already-installed (pinned) numpy+cython.
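As a smoke test (my addition, not from the post): the import is where a miscompiled build typically SIGILLs, so a clean import is the pass condition.

```python
# A SIGILL here means the build still picked up unsupported instructions.
import monotonic_align

print("monotonic_align imported OK")
```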
## Per-project status
### F5-TTS
- Inference and training both work after patches 1–5.
- See the companion gist for a minimal trainer that bypasses datasets/accelerate.
- Issue filed: SWivid/F5-TTS#1292 (EMA-only checkpoint structure).
### StyleTTS2
- Inference and fine-tune both work after patches 1, 2, 3, 4, 6.
- PRs filed: yl4579/StyleTTS2#361 (weights_only=False), #362 (drop pandas).
### kokoro
- Inference works (via the `kokoro-onnx` ONNX runtime path; the PyTorch path is blocked by upstream dep pinning, not by the CPU).
- Issue filed: hexgrad/kokoro#321 (broken `misaki>=0.7.16` PyPI pin).
### whisper.cpp
- Works out of the box: pure C++, no Python wheels involved; inference runs on the GPU via CUDA.
## What does not work
- `pyarrow` source build: succeeds eventually, but the resulting library still uses SSE4.1 in places (Apache Arrow's CMake `ARROW_SIMD_LEVEL=NONE` doesn't cover everything). Not worth the multi-hour build.
- numpy 2.x: even a from-source build pulls in AVX-needing code via its bundled OpenBLAS. Stick with 1.26.4.
- Anything using `bitsandbytes` int8/int4 quantisation: those kernels hard-require AVX2.
## Worth trying if you have AVX (no AVX2)
A 2011-era Sandy Bridge or 2012-era Ivy Bridge Intel CPU has AVX but not
AVX2 (AVX2 arrived with Haswell in 2013). Most of the patches above still
apply, but you may not need patch 1 (the dynamo SIGFPE), and
pyarrow/datasets/pandas may install (they just won't use the AVX2-specific
code paths). Try without the uninstalls first.
## Summary
If you want to do TTS fine-tuning on hardware below x86-64-v2:
- Do inference work on the GPU. Keep CPU-side code to file I/O and JSON.
- Pin numpy 1.26 + torch 2.7 + transformers 4.57.
- Stub or uninstall datasets/pyarrow/pandas/torchcodec.
- Patch `torch._dynamo` once per torch install.
- Pre-compute mel-spectrograms offline.
- Train at `num_workers=0`.
The rig produces useful output. It's not a fast-iteration machine — every
upstream upgrade re-breaks something — but for fine-tuning (which doesn't
need a fast-iteration machine) it's economical: an RTX 3060 12 GB on a
2010-era CPU running real-world TTS workloads.
Originally posted at netlinux-ai.github.io/2026/05/09/non-avx2-cpu-tts-compat/.