Machine learning on a GPU can be orders of magnitude faster than CPU training — and yes, you can do it properly on Windows.
The most stable and officially supported way is:
Windows → WSL2 → Linux ML stack → NVIDIA GPU
Why GPU ML on Windows Uses WSL
Most high-performance GPU tooling for ML (CUDA, cuDF, cuML, PyTorch with GPU support) is Linux-first.
Instead of fighting native Windows builds, Microsoft and NVIDIA recommend:
- Windows 11
- WSL2 (Windows Subsystem for Linux)
- NVIDIA GPU passthrough
- Linux ML libraries running inside WSL
From your perspective:
- You still use your Windows GPU
- No separate virtual machine to set up or manage
- No dual boot
- Near-native performance
What You Need
- Windows 11
- NVIDIA GPU (RTX / Quadro / A-series)
- Latest NVIDIA Windows driver (WSL compatible)
- WSL2 enabled
Verify GPU access inside WSL:
nvidia-smi
If you see your GPU, you’re good to go.
Step 1: Create a GPU Machine Learning Environment
Create a clean Conda environment dedicated to GPU ML.
conda create -n rapids-24 python=3.10 -y
conda activate rapids-24
Install RAPIDS (the GPU DataFrame and ML libraries). Depending on your driver, the RAPIDS install selector may also recommend pinning a CUDA version (for example, adding cuda-version=12.0 to the command):
conda install -c rapidsai -c nvidia -c conda-forge rapids=24.02 -y
Install Jupyter support:
conda install -c conda-forge jupyterlab ipykernel -y
python -m ipykernel install --user --name rapids-24 --display-name "Python (GPU)"
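Before moving on, it is worth confirming that the new environment can actually see the GPU. A minimal sanity check, run inside the rapids-24 environment (assuming the install above succeeded):
# Quick sanity check that the GPU stack imports and sees a device
import cudf
import cuml
import cupy as cp

print("cuDF:", cudf.__version__)
print("cuML:", cuml.__version__)
print("GPUs visible to CuPy:", cp.cuda.runtime.getDeviceCount())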
Step 2: Start Jupyter Using Your GPU
Always start Jupyter from the GPU environment:
conda activate rapids-24
jupyter lab --no-browser --ip=0.0.0.0 --port=8888
Open in your Windows browser:
http://localhost:8888
In JupyterLab:
Kernel → Change Kernel → Python (GPU)
This step ensures the notebook uses your GPU.
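To double-check that the notebook really is running on the GPU environment rather than the base kernel, a quick sketch like this in a cell helps; sys.executable is just one convenient indicator:
# Confirm the active kernel is the rapids-24 environment and a GPU is visible
import sys
import cupy as cp

print("Python executable:", sys.executable)  # should point into the rapids-24 environment
print("GPU device count:", cp.cuda.runtime.getDeviceCount())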
Step 3: Mandatory GPU Initialization (Very Important)
Put this in the first cell of every GPU notebook:
import rmm
rmm.reinitialize(pool_allocator=False)
print("GPU memory initialized")
Why this matters:
- Prevents GPU memory fragmentation
- Avoids silent kernel crashes
- Makes Jupyter + GPU stable
This single initialization call resolves many of the most common GPU-related Jupyter issues on Windows.
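If you would rather trade some up-front memory for faster repeated allocations, RMM can also be initialized with a memory pool. An alternative sketch, where the pool size is only an illustrative value to adapt to your GPU:
# Alternative: pre-allocate an RMM memory pool (pool size here is illustrative)
import rmm
rmm.reinitialize(pool_allocator=True, initial_pool_size=2 * 1024**3)  # 2 GiB pool
print("GPU memory pool initialized")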
Step 4: Move Your Data to the GPU
GPU models don’t train on pandas DataFrames directly; the data must first live in GPU memory.
Convert it with cuDF:
import cudf
X_train_gpu = cudf.DataFrame.from_pandas(X_train).astype("float32")
X_test_gpu = cudf.DataFrame.from_pandas(X_test).astype("float32")
y_train_gpu = cudf.Series(y_train).astype("float32")
y_test_gpu = cudf.Series(y_test).astype("float32")
Casting to float32 is essential: it halves GPU memory use compared to float64 and is the dtype cuML estimators are optimized for.
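If you want a self-contained way to try the conversion, here is a minimal sketch that uses scikit-learn's California housing dataset purely as stand-in data; substitute your own X_train, X_test, y_train, and y_test in practice:
# Stand-in data to exercise the pandas -> cuDF conversion end to end
import cudf
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

data = fetch_california_housing(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

X_train_gpu = cudf.DataFrame.from_pandas(X_train).astype("float32")
X_test_gpu = cudf.DataFrame.from_pandas(X_test).astype("float32")
y_train_gpu = cudf.Series(y_train).astype("float32")
y_test_gpu = cudf.Series(y_test).astype("float32")

print(X_train_gpu.shape, X_test_gpu.shape)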
Step 5: Train a GPU Machine Learning Model (Generic Pattern)
A model-agnostic template that works for any cuML estimator (Random Forest, Linear Regression, KNN, etc.); GPU-accelerated XGBoost follows the same fit/predict pattern.
# Example: a cuML Random Forest regressor (swap in any other cuML estimator here)
from cuml.ensemble import RandomForestRegressor

# Initialize the GPU model with appropriate hyperparameters
gpu_model = RandomForestRegressor(
    n_estimators=100,  # model-specific hyperparameter (illustrative value)
    random_state=42,
    n_streams=1        # recommended for stability & reproducibility
)

# Train the model on GPU-resident data
gpu_model.fit(X_train_gpu, y_train_gpu)
Once .fit() is called, training is executed on your Windows GPU.
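To illustrate that the pattern really is model-agnostic, the same steps work with a different estimator, for example cuML's LinearRegression (the hyperparameter shown is illustrative):
# Same pattern with a different cuML estimator
from cuml.linear_model import LinearRegression

lin_model = LinearRegression(fit_intercept=True)
lin_model.fit(X_train_gpu, y_train_gpu)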
Step 6: Make Predictions and Evaluate
Convert predictions back to CPU for evaluation:
import cupy as cp
import numpy as np
def to_numpy(x):
    # cuDF objects expose .to_numpy(); use it when available
    if hasattr(x, "to_numpy"):
        return x.to_numpy()
    # CuPy arrays need an explicit device-to-host copy
    if isinstance(x, cp.ndarray):
        return cp.asnumpy(x)
    return np.asarray(x)

y_pred = to_numpy(gpu_model.predict(X_test_gpu))
Evaluate normally:
from sklearn.metrics import r2_score
print("R²:", r2_score(y_test, y_pred))
Step 7: Save GPU Models Correctly
Do not use mlflow.sklearn.log_model for GPU models.
Instead:
import joblib
joblib.dump(gpu_model, "gpu_model.pkl")
With MLflow:
import mlflow
mlflow.log_artifact("gpu_model.pkl")
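Putting the two together, here is a minimal sketch of logging the saved model inside an MLflow run; the logged parameter name is just an example:
# Save the cuML model and attach it to an MLflow run as a raw artifact
import joblib
import mlflow

joblib.dump(gpu_model, "gpu_model.pkl")

with mlflow.start_run():
    mlflow.log_param("model_type", type(gpu_model).__name__)
    mlflow.log_artifact("gpu_model.pkl")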
Step 8: Use the Trained GPU Model (Inference & Evaluation)
Once a GPU model is trained, you can use it exactly like a scikit-learn model.
The only difference is that predictions are generated on the GPU.
8.1 Run Inference on the GPU
# Run predictions on GPU-resident data
y_pred_gpu = gpu_model.predict(X_test_gpu)
At this stage:
- Computation happens on the GPU
- Output lives in GPU memory (cudf.Series or cupy.ndarray), as the quick check below shows
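A quick optional check to see this for yourself:
# Inspect where the predictions live; the exact type depends on the estimator
print(type(y_pred_gpu))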
8.2 Convert Predictions Back to CPU (Generic Helper)
Most evaluation libraries (sklearn, pandas, MLflow) expect NumPy arrays.
Use this universal conversion helper:
import cupy as cp
import numpy as np
def to_numpy(x):
    # cuDF objects expose .to_numpy(); use it when available
    if hasattr(x, "to_numpy"):
        return x.to_numpy()
    # CuPy arrays need an explicit device-to-host copy
    if isinstance(x, cp.ndarray):
        return cp.asnumpy(x)
    return np.asarray(x)

y_pred = to_numpy(y_pred_gpu)
This works for any cuML model.
8.3 Evaluate Model Performance (CPU)
Now evaluate normally using familiar tools:
from sklearn.metrics import r2_score, mean_squared_error
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("R²:", r2)
print("MSE:", mse)
Only inference runs on GPU — evaluation stays on CPU, which is standard practice.
8.4 Reuse the Model for New Data
To use the trained model on new data:
import cudf
# Convert new data to GPU format
X_new_gpu = cudf.DataFrame.from_pandas(X_new).astype("float32")
# Predict on GPU
y_new_gpu = gpu_model.predict(X_new_gpu)
# Convert back to CPU if needed
y_new = to_numpy(y_new_gpu)
This pattern works for batch inference and real-world pipelines.
8.5 Save the Trained GPU Model (Reusable)
GPU models should be saved as raw artifacts.
import joblib
joblib.dump(gpu_model, "gpu_model.pkl")
To load later:
gpu_model = joblib.load("gpu_model.pkl")
This allows reuse across sessions as long as the GPU environment is available.