Siddhartha Reddy

Training Classic ML Models Using a GPU on Windows

Machine learning on a GPU can be orders of magnitude faster than CPU training — and yes, you can do it properly on Windows.

The most stable and officially supported way is:

Windows → WSL2 → Linux ML stack → NVIDIA GPU

Why GPU ML on Windows Uses WSL

Most high-performance ML libraries (CUDA, cuML, cuDF, PyTorch GPU) are Linux-first.

Instead of fighting native Windows builds, Microsoft and NVIDIA recommend:

  • Windows 11
  • WSL2 (Windows Subsystem for Linux)
  • NVIDIA GPU passthrough
  • Linux ML libraries running inside WSL

From your perspective:

  • You still use your Windows GPU
  • No virtual machines
  • No dual boot
  • Near-native performance

What You Need

  • Windows 11
  • NVIDIA GPU (RTX / Quadro / A-series)
  • Latest NVIDIA Windows driver (WSL compatible)
  • WSL2 enabled

Verify GPU access inside WSL:
nvidia-smi

If you see your GPU, you’re good to go.

Step 1: Create a GPU Machine Learning Environment

Create a clean Conda environment dedicated to GPU ML.

conda create -n rapids-24 python=3.10 -y
conda activate rapids-24

Install RAPIDS (GPU ML libraries):

conda install -c rapidsai -c nvidia -c conda-forge rapids=24.02 -y

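Optional sanity check: from the same environment, run a short Python snippet to confirm the RAPIDS stack imports cleanly (a minimal sketch; assumes the install above succeeded):

import cudf
import cuml

print("cuDF version:", cudf.__version__)
print("cuML version:", cuml.__version__)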

Install Jupyter support:

conda install -c conda-forge jupyterlab ipykernel -y
python -m ipykernel install --user --name rapids-24 --display-name "Python (GPU)"

Step 2: Start Jupyter Using Your GPU

Always start Jupyter from the GPU environment:

conda activate rapids-24
jupyter lab --no-browser --ip=0.0.0.0 --port=8888

Open in your Windows browser:
http://localhost:8888

In JupyterLab:
Kernel → Change Kernel → Python (GPU)
This step ensures the notebook uses your GPU.
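
To double-check that the selected kernel can actually see the GPU, run a quick check in the first cell (a minimal sketch using CuPy, which is installed as part of RAPIDS):

import cupy as cp

# Should report at least 1 device if GPU passthrough into WSL is working
print("CUDA devices visible to this kernel:", cp.cuda.runtime.getDeviceCount())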

Step 3: Mandatory GPU Initialization (Very Important)

Put this in the first cell of every GPU notebook:

import rmm
rmm.reinitialize(pool_allocator=False)
print("GPU memory initialized")

Why this matters:

  • Prevents GPU memory fragmentation
  • Avoids silent kernel crashes
  • Makes Jupyter + GPU stable

This single initialization step solves most GPU-related Jupyter issues on Windows.
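
If you want to confirm the GPU is in a healthy state after initialization, an optional follow-up check (a sketch using CuPy) prints the free and total device memory:

import cupy as cp

# memGetInfo returns (free_bytes, total_bytes) for the current device
free_bytes, total_bytes = cp.cuda.runtime.memGetInfo()
print(f"GPU memory free: {free_bytes / 1e9:.2f} GB of {total_bytes / 1e9:.2f} GB")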

Step 4: Move Your Data to the GPU

GPU models don’t train on pandas DataFrames directly.
Convert your data:

import cudf

X_train_gpu = cudf.DataFrame.from_pandas(X_train).astype("float32")
X_test_gpu  = cudf.DataFrame.from_pandas(X_test).astype("float32")

y_train_gpu = cudf.Series(y_train).astype("float32")
y_test_gpu  = cudf.Series(y_test).astype("float32")


Casting to float32 is strongly recommended: it halves GPU memory use compared to float64 and is the best-supported dtype for cuML performance.
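
If you don’t have a dataset at hand, here is a self-contained sketch that fabricates a small regression problem with scikit-learn (the X_train / y_train names mirror the snippet above; the data itself is synthetic and purely illustrative):

import cudf
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data, just to exercise the GPU pipeline end to end
X, y = make_regression(n_samples=100_000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Wrap the features in pandas so the conversion pattern above applies unchanged
X_train = pd.DataFrame(X_train)
X_test = pd.DataFrame(X_test)

X_train_gpu = cudf.DataFrame.from_pandas(X_train).astype("float32")
X_test_gpu = cudf.DataFrame.from_pandas(X_test).astype("float32")
y_train_gpu = cudf.Series(y_train).astype("float32")
y_test_gpu = cudf.Series(y_test).astype("float32")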

Step 5: Train a GPU Machine Learning Model (Generic Pattern)

Here is a model-agnostic template that works for any cuML estimator (Random Forest, Linear Regression, KNN, and so on).

# Import the GPU-based model you want to use
# (RandomForestRegressor is used here as a concrete example;
#  swap in any other cuML estimator, e.g. cuml.linear_model.LinearRegression)
from cuml.ensemble import RandomForestRegressor as GPUModel

# Initialize the GPU model with appropriate hyperparameters
gpu_model = GPUModel(
    # other model-specific hyperparameters go here
    random_state=42,  # reproducible results
    n_streams=1       # recommended for stability & reproducibility
)

# Train the model on GPU-resident data
gpu_model.fit(X_train_gpu, y_train_gpu)


Once .fit() is called, training is executed on your Windows GPU.
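
To illustrate the model-agnostic claim, here is the same pattern with cuML's LinearRegression instead of the random forest (a sketch; each estimator has its own hyperparameters, so random_state and n_streams are dropped here):

from cuml.linear_model import LinearRegression

# Same fit pattern, different estimator and hyperparameters
gpu_lr = LinearRegression(fit_intercept=True)
gpu_lr.fit(X_train_gpu, y_train_gpu)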

Step 6: Make Predictions and Evaluate
Convert predictions back to CPU for evaluation:

import cupy as cp
import numpy as np

def to_numpy(x):
    # cuDF objects (the usual cuML output) expose .to_numpy()
    if hasattr(x, "to_numpy"):
        return x.to_numpy()
    # CuPy arrays need an explicit device-to-host copy
    if isinstance(x, cp.ndarray):
        return cp.asnumpy(x)
    return np.asarray(x)

y_pred = to_numpy(gpu_model.predict(X_test_gpu))


Evaluate normally:

from sklearn.metrics import r2_score
print("R²:", r2_score(y_test, y_pred))

Step 7: Save GPU Models Correctly

Do not use mlflow.sklearn.log_model for GPU models.
Instead:

import joblib
joblib.dump(gpu_model, "gpu_model.pkl")


With MLflow:

import mlflow
mlflow.log_artifact("gpu_model.pkl")
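
A slightly fuller MLflow sketch (the parameter and metric names are illustrative) that records the hyperparameters, the evaluation score from Step 6, and the saved model file in a single run:

import mlflow
from sklearn.metrics import r2_score

with mlflow.start_run():
    mlflow.log_param("model_type", "cuML RandomForestRegressor")
    mlflow.log_param("n_streams", 1)
    mlflow.log_metric("r2", r2_score(y_test, y_pred))
    mlflow.log_artifact("gpu_model.pkl")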

Step 8: Use the Trained GPU Model (Inference & Evaluation)
Once a GPU model is trained, you can use it exactly like a scikit-learn model.
The only difference is that predictions are generated on the GPU.

8.1 Run Inference on the GPU

# Run predictions on GPU-resident data
y_pred_gpu = gpu_model.predict(X_test_gpu)


At this stage:

  • Computation happens on the GPU
  • Output lives in GPU memory (cudf.Series or cupy.ndarray)

8.2 Convert Predictions Back to CPU (Generic Helper)

Most evaluation libraries (sklearn, pandas, MLflow) expect NumPy arrays.
Use this universal conversion helper:

import cupy as cp
import numpy as np

def to_numpy(x):
    # cuDF objects (the usual cuML output) expose .to_numpy()
    if hasattr(x, "to_numpy"):
        return x.to_numpy()
    # CuPy arrays need an explicit device-to-host copy
    if isinstance(x, cp.ndarray):
        return cp.asnumpy(x)
    return np.asarray(x)

y_pred = to_numpy(y_pred_gpu)

This works for any cuML model.

8.3 Evaluate Model Performance (CPU)

Now evaluate normally using familiar tools:

from sklearn.metrics import r2_score, mean_squared_error

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("R²:", r2)
print("MSE:", mse)


Only inference runs on GPU — evaluation stays on CPU, which is standard practice.

8.4 Reuse the Model for New Data

To use the trained model on new data:

import cudf

# Convert new data to GPU format
X_new_gpu = cudf.DataFrame.from_pandas(X_new).astype("float32")

# Predict on GPU
y_new_gpu = gpu_model.predict(X_new_gpu)

# Convert back to CPU if needed
y_new = to_numpy(y_new_gpu)

This pattern works for batch inference and real-world pipelines.

8.5 Save the Trained GPU Model (Reusable)
GPU models should be saved as raw artifacts.

import joblib
joblib.dump(gpu_model, "gpu_model.pkl")

To load later:

gpu_model = joblib.load("gpu_model.pkl")


This allows reuse across sessions as long as the GPU environment is available.
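
Putting the pieces together, here is a small (hypothetical) batch-inference helper built from the steps above: it loads the saved model, moves a pandas batch to the GPU, predicts, and returns a NumPy array.

import cudf
import joblib

def predict_batch(model_path, X_new_pandas):
    """Load a saved cuML model and run GPU inference on a pandas DataFrame."""
    model = joblib.load(model_path)
    X_gpu = cudf.DataFrame.from_pandas(X_new_pandas).astype("float32")
    y_gpu = model.predict(X_gpu)
    return to_numpy(y_gpu)  # conversion helper from Step 6 / 8.2

# Example usage:
# predictions = predict_batch("gpu_model.pkl", X_new)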
