Transformer Hit 96% on CWRU Bearing Faults — But Used 4x the Memory
I trained both architectures on the same CWRU bearing dataset (48k samples, 10 fault classes, 12 kHz sampling rate) and the Transformer won on accuracy but lost badly on resource usage. If you're deploying to edge devices or dealing with real-time constraints, that 4% accuracy bump might not be worth the trade-off.
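The article doesn't show its data loading, but as a hedged sketch: CWRU-style pipelines typically segment the raw 12 kHz vibration recordings into overlapping fixed-length windows before feeding either model. The window and stride values below are illustrative assumptions, not the ones used in the experiment.

```python
import numpy as np

def segment_signal(signal: np.ndarray, window: int = 2048, stride: int = 1024) -> np.ndarray:
    """Split a 1-D vibration signal into overlapping fixed-length windows."""
    n = (len(signal) - window) // stride + 1
    return np.stack([signal[i * stride : i * stride + window] for i in range(n)])

# Hypothetical: one second of 12 kHz vibration data
sig = np.random.randn(12_000)
windows = segment_signal(sig)
print(windows.shape)  # → (10, 2048) with these window/stride values
```

Overlapping strides multiply the number of training samples from a limited set of recordings, which matters for data-hungry Transformers in particular.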
The Transformer used 340MB of RAM during inference versus 85MB for CNN-LSTM. Inference latency was 23ms vs 8ms on my test machine (i7-11700K, no GPU). For a cloud-based PHM (prognostics and health management) dashboard, that's fine. For an embedded controller running on a Raspberry Pi or industrial PLC? That's a dealbreaker.
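Latency numbers like these are easy to reproduce. A minimal sketch of the measurement approach (warmup runs, then a median over repeated timings, which is more robust to OS jitter than a mean) — the dummy workload here is a placeholder for your actual model's inference call:

```python
import time
import statistics

def benchmark(fn, warmup: int = 5, runs: int = 50) -> float:
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):   # warm caches / JIT before timing
        fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1_000)
    return statistics.median(times)

# Hypothetical stand-in for model inference; swap in your real forward pass
latency_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{latency_ms:.2f} ms")
```

For the memory side, a process-level snapshot (e.g. RSS via `psutil`) taken during inference gives the deployment-relevant number, since framework allocator overhead won't show up in per-tensor accounting.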
This isn't a theoretical comparison. I'll show you the exact architectures I used, the preprocessing pipeline that actually worked (spoiler: raw FFT input killed both models), and where each one failed on specific fault types.
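The full pipeline isn't shown in this excerpt, so the following is only an assumption about what "raw FFT killed both models" usually points to: a single full-signal FFT discards time localization and has huge dynamic range, whereas a windowed, log-scaled spectrogram keeps both architectures happy. The `n_fft`/`hop` values are illustrative, not the article's.

```python
import numpy as np

def log_spectrogram(signal: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Windowed FFT magnitudes on a log scale, instead of one raw full-signal FFT."""
    win = np.hanning(n_fft)
    frames = [signal[i : i + n_fft] * win
              for i in range(0, len(signal) - n_fft + 1, hop)]
    mag = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log1p(mag)  # compress dynamic range; raw magnitudes span decades

spec = log_spectrogram(np.random.randn(2048))
print(spec.shape)  # → (frames, n_fft // 2 + 1) = (15, 129) for this input
```

The log compression is the part that most often makes or breaks training here: bearing fault harmonics can be orders of magnitude weaker than shaft-frequency components.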
Why CNN-LSTM Was the Default Choice Until Recently