The 8GB GPU Problem Nobody Talks About
I threw NASA's CMAPSS turbofan dataset at three remaining-useful-life (RUL) architectures, an LSTM, a GRU, and a vanilla Transformer encoder, expecting speed differences. What I didn't expect was the Transformer eating 4.2GB of VRAM at a batch size the LSTM handled with 1.1GB.
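If you want to sanity-check numbers like these on your own card, here's a minimal sketch of the kind of measurement harness involved (PyTorch and a CUDA device assumed; the two-layer depth, 128-unit width, batch size of 256, and 30-cycle window are illustrative stand-ins, not my exact training configs):

```python
import torch
import torch.nn as nn

# Illustrative shapes: CMAPSS FD001 has ~14 informative sensor channels;
# a 30-cycle window is a common choice. Batch size and widths are
# assumptions for the sketch, not the configs from the runs above.
BATCH, SEQ_LEN, N_FEATURES, HIDDEN = 256, 30, 14, 128

class RNNRul(nn.Module):
    """RUL regressor: recurrent encoder -> last hidden state -> scalar."""
    def __init__(self, cell):
        super().__init__()
        self.rnn = cell(N_FEATURES, HIDDEN, num_layers=2, batch_first=True)
        self.head = nn.Linear(HIDDEN, 1)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.head(out[:, -1])  # predict RUL from the final time step

class TransformerRul(nn.Module):
    """RUL regressor: vanilla Transformer encoder -> mean pool -> scalar."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(N_FEATURES, HIDDEN)
        layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=8,
                                           batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(HIDDEN, 1)

    def forward(self, x):
        return self.head(self.enc(self.proj(x)).mean(dim=1))

def peak_vram_mb(model):
    """One forward+backward pass; report peak allocated VRAM in MB."""
    model = model.cuda()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(BATCH, SEQ_LEN, N_FEATURES, device="cuda")
    model(x).sum().backward()
    return torch.cuda.max_memory_allocated() / 2**20

for name, ctor in [("LSTM", lambda: RNNRul(nn.LSTM)),
                   ("GRU", lambda: RNNRul(nn.GRU)),
                   ("Transformer", TransformerRul)]:
    model = ctor()
    print(f"{name}: {peak_vram_mb(model):.0f} MB peak")
    del model                      # free params/grads before the next run
    torch.cuda.empty_cache()
```

Measuring one model at a time (and freeing it afterward) matters: otherwise the previous model's parameters inflate the next model's peak reading.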
This isn't a theoretical comparison. If you're building a predictive maintenance model on a tight budget (or stuck with whatever GPU your company has in the server room), memory constraints hit before training time does. That RTX 3060 with 12GB? Suddenly it matters whether you pick 128-unit GRU cells or 8-head self-attention.
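Why does attention blow up faster? Each self-attention head materializes an L×L score matrix per example, so activation memory grows quadratically with window length, while a GRU unrolled for backprop stores one hidden vector per step, which is linear. A rough back-of-envelope (8 heads and 128 units come from the paragraph above; batch size 256 and fp32 are my assumptions, and this ignores QKV projections, feed-forward activations, and optimizer state):

```python
# Purely illustrative activation-memory estimate, fp32 = 4 bytes.
def attn_scores_bytes(batch, heads, seq_len):
    # Each head holds a seq_len x seq_len score matrix per example.
    return batch * heads * seq_len * seq_len * 4

def gru_states_bytes(batch, hidden, seq_len):
    # Backprop through time keeps one hidden vector per step.
    return batch * seq_len * hidden * 4

for L in (30, 100, 500):
    a = attn_scores_bytes(256, 8, L) / 2**20
    g = gru_states_bytes(256, 128, L) / 2**20
    print(f"L={L:>3}: attention scores ~{a:7.1f} MB vs GRU states ~{g:5.1f} MB")
```

At L=30 the gap is small; by L=500 the score matrices alone approach 2GB while the GRU states sit near 60MB. The quadratic term, not parameter count, is what eats the VRAM.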
NASA CMAPSS: The Benchmark Everyone Uses
Continue reading the full article on TildAlice
