Nilofer 🚀

Posted on Jun 6

Carbon-Aware Model Training: Scheduling GPU Workloads Around Electricity Carbon Intensity

#pytorch #machinelearning #python #opensource

Training ML models has an environmental cost that most practitioners do not measure. A model trained during peak grid hours, when coal and gas plants are meeting high demand, can emit significantly more CO2 than the same model trained during off-peak hours when renewables dominate the grid. The carbon intensity of electricity can vary by 2–5x over the course of a day, but most training pipelines ignore this entirely.

Carbon-Aware Model Training Pipeline is a PyTorch-based training pipeline that monitors real-time electricity carbon intensity, delays training until a low-carbon window is available, reduces GPU memory footprint through gradient accumulation, and tracks CO2 emissions throughout training using CodeCarbon. A comparison report then quantifies the carbon savings against a baseline run.

Features

Carbon-Aware Scheduling - real-time carbon intensity monitoring with smart training delays until low-carbon windows are detected.
Gradient Accumulation - reduces GPU memory footprint while maintaining effective batch size.
Emissions Tracking - real-time CO2 monitoring via CodeCarbon with comprehensive JSON reports.
Modular Design - YAML-based configuration with separate scheduler, tracker, and trainer modules.
GPU Optimized - automatic CUDA detection with mixed precision training (FP16).
Comparative Analysis - automated reporting quantifying carbon savings against a baseline run.

How It Works

The pipeline runs in four stages:

Stage 1 - Carbon-Aware Scheduling
Real-time monitoring checks electricity carbon intensity via APIs. Smart delays wait for low-carbon windows before starting training. Fallback mechanisms use realistic mock data when APIs are unavailable - with diurnal patterns simulating peak intensity at 18:00 and trough at 03:00. Configurable thresholds allow customization for different regions.
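The mock fallback described above can be sketched roughly as follows. This is an illustration, not the code in scheduler.py: the function names are hypothetical, the 200 and 450 gCO2/kWh endpoints are borrowed from the mock-data figures quoted in the Performance section, and the cosine easing between the 03:00 trough and the 18:00 peak is just one plausible curve shape.

```python
import math

TROUGH_HOUR, PEAK_HOUR = 3.0, 18.0   # hours of day, from the description above
TROUGH_G, PEAK_G = 200.0, 450.0      # gCO2/kWh, assumed endpoints


def mock_carbon_intensity(hour: float) -> float:
    """Return a simulated carbon intensity (gCO2/kWh) for an hour of the day."""
    rise_hours = (PEAK_HOUR - TROUGH_HOUR) % 24.0    # 15 h climb to the peak
    fall_hours = 24.0 - rise_hours                   # 9 h descent to the trough
    since_trough = (hour - TROUGH_HOUR) % 24.0
    if since_trough <= rise_hours:
        phase = since_trough / rise_hours            # 0 at trough, 1 at peak
    else:
        phase = 1.0 - (since_trough - rise_hours) / fall_hours
    weight = (1.0 - math.cos(math.pi * phase)) / 2.0  # smooth easing in [0, 1]
    return TROUGH_G + (PEAK_G - TROUGH_G) * weight


def is_low_carbon(hour: float, threshold: float = 300.0) -> bool:
    """True when the simulated intensity is below the configured threshold."""
    return mock_carbon_intensity(hour) < threshold
```

With a 300 gCO2/kWh threshold, this curve lets training start around the early-morning trough and holds it back around the evening peak.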

Stage 2 - Gradient Accumulation
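Training processes smaller micro-batches and accumulates their gradients before each optimizer step, so the effective batch size is maintained with less GPU memory. The number of accumulation steps is configurable (2, 4, 8, 16) to fit hardware constraints. Because each weight update still reflects the full effective batch, convergence and model quality are largely preserved.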
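Why accumulation can preserve convergence is easiest to see in plain arithmetic. The sketch below is not the pipeline's PyTorch loop. It uses a hypothetical one-parameter model in pure Python to show that summing micro-batch gradients, each scaled by 1/steps, reproduces the full-batch gradient when the micro-batches are equal in size. In PyTorch the same effect is usually obtained by dividing the loss by the number of accumulation steps before calling backward() and stepping the optimizer only after the last micro-batch.

```python
def mse_gradient(w, xs, ys):
    """Gradient with respect to w of mean((w*x - y)^2) over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_gradient(w, xs, ys, micro_batch_size):
    """Accumulate scaled micro-batch gradients, as an accumulation loop would."""
    if len(xs) % micro_batch_size != 0:
        raise ValueError("batch must split evenly into micro-batches")
    steps = len(xs) // micro_batch_size
    total = 0.0
    for step in range(steps):
        start = step * micro_batch_size
        stop = start + micro_batch_size
        # Scale each micro-batch gradient by 1/steps before adding it.
        total += mse_gradient(w, xs[start:stop], ys[start:stop]) / steps
    return total


xs = [i / 10.0 for i in range(64)]   # one "effective" batch of 64 samples
ys = [3.0 * x + 1.0 for x in xs]
full = mse_gradient(0.5, xs, ys)
accumulated = accumulated_gradient(0.5, xs, ys, micro_batch_size=16)  # 4 steps
```

Here full and accumulated agree up to floating-point rounding. Layers that depend on per-batch statistics, such as batch normalization, see only the micro-batch, so the equivalence is not exact for every model.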

Stage 3 - Emissions Tracking
CodeCarbon integration monitors CO2 emissions in real-time. Energy metrics track power consumption in Watts and energy in kWh. Comprehensive reports generate JSON summaries with all metrics. Comparative analysis quantifies carbon savings versus the baseline.
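The relationship between the three tracked quantities is simple arithmetic: emissions are energy multiplied by the grid's carbon intensity, and energy is average power multiplied by time. The helper names below are hypothetical and only illustrate the unit conversions. CodeCarbon performs its own measurement and lookup internally.

```python
def energy_kwh(avg_power_watts: float, seconds: float) -> float:
    """Energy in kWh from average power in Watts and duration in seconds."""
    return avg_power_watts * seconds / 3_600_000.0   # W*s -> kWh


def emissions_kg(energy_consumed_kwh: float, intensity_g_per_kwh: float) -> float:
    """CO2 in kg = energy (kWh) x carbon intensity (gCO2/kWh) / 1000."""
    return energy_consumed_kwh * intensity_g_per_kwh / 1000.0
```

For example, 0.15 kWh consumed while the grid is at 285.3 gCO2/kWh works out to roughly 0.043 kg of CO2.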

Stage 4 - GPU Optimization
Mixed precision training (FP16) reduces memory use and increases speed. Automatic CUDA detection uses the GPU when one is available. Pinned memory enables faster host-to-GPU data transfers. The pipeline falls back gracefully to CPU when no GPU is available.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     Training Configuration                      │
│                       (YAML Config File)                        │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
         ┌────────────────────────────────────┐
         │   Carbon Intensity Scheduler       │
         │   - API/Mock data fetch            │
         │   - Threshold comparison           │
         │   - Wait for low-carbon window     │
         └────────────────┬───────────────────┘
                          │
                          ▼
              ┌───────────────────────┐
              │   Start Training?     │
              │   Intensity < 300?    │
              └─────┬─────────────┬───┘
                    │ NO          │ YES
                    ▼             ▼
            ┌───────────┐   ┌──────────────┐
            │   Wait    │   │ Start Tracker│
            │ & Recheck │   │ (CodeCarbon) │
            └───────────┘   └──────┬───────┘
                                   │
                                   ▼
                  ┌────────────────────────────────┐
                  │   PyTorch Training Loop        │
                  │   - Gradient Accumulation      │
                  │   - Mixed Precision (FP16)     │
                  │   - Checkpointing              │
                  └────────────────┬───────────────┘
                                   │
                                   ▼
                  ┌────────────────────────────────┐
                  │   Emissions Tracking           │
                  │   - CO2 (kg)                   │
                  │   - Energy (kWh)               │
                  │   - Power (Watts)              │
                  └────────────────┬───────────────┘
                                   │
                                   ▼
              ┌───────────────────────────────────┐
              │   Save Results                    │
              │   - Model checkpoint              │
              │   - Training summary (JSON)       │
              │   - Emissions log (CSV)           │
              └───────────────────────────────────┘
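
The "Wait & Recheck" branch in the diagram could look something like the sketch below. It is an assumption about the control flow, not the code in scheduler.py: wait_for_low_carbon and poll_seconds are hypothetical names, and the sleep function is injected so the loop can be exercised without actually waiting. The max_wait_seconds cap mirrors the config option mentioned under Troubleshooting.

```python
import time
from typing import Callable, Optional


def wait_for_low_carbon(
    get_intensity: Callable[[], float],
    threshold: float = 300.0,
    poll_seconds: float = 600.0,
    max_wait_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Block until intensity < threshold or the wait budget is spent.

    Returns the number of seconds spent waiting.
    """
    waited = 0.0
    while get_intensity() >= threshold:
        if max_wait_seconds is not None and waited >= max_wait_seconds:
            break  # budget exhausted: start training anyway
        sleep(poll_seconds)
        waited += poll_seconds
    return waited
```

Passing a no-op sleep, for instance sleep=lambda s: None, makes the loop return immediately in tests while still reporting how long a real run would have waited.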

Installation

Prerequisites

Python 3.8+
PyTorch 2.0+
CUDA (optional, for GPU acceleration)

git clone https://github.com/dakshjain-1616/CarbonAwareModelTraining---by-NEO.git
cd CarbonAwareModelTraining---by-NEO

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

pip install -r requirements.txt

Required packages: torch>=2.0.0, torchvision>=0.15.0, codecarbon>=2.3.0, pyyaml>=6.0, numpy.

Quick Start

source venv/bin/activate

# Make the modules in src/ importable
export PYTHONPATH="$PWD/src:$PYTHONPATH"

# Run baseline training (no optimization)
python src/train.py configs/baseline.yaml

# Run optimized training (carbon-aware + gradient accumulation)
python src/train.py configs/optimized.yaml

# Generate comparison report
python generate_comparison.py

This runs three steps: baseline training without carbon awareness, optimized training with carbon-aware scheduling and gradient accumulation, and a comparison report that quantifies carbon savings and performance metrics.

Demo

Configure carbon-aware training in configs/optimized.yaml:

scheduler:
  enabled: true
  carbon_threshold: 300           # gCO2/kWh
  wait_for_low_carbon: true

training:
  batch_size: 16
  gradient_accumulation_steps: 4  # Effective batch = 64
  epochs: 3

Run optimized training:

python src/train.py configs/optimized.yaml

Output:

============================================================
CARBON-AWARE TRAINING STARTED
============================================================

Carbon Intensity Check:
  Current Intensity: 420.5 gCO2/kWh
  Threshold: 300 gCO2/kWh
  Status: ⏳ Waiting for low-carbon window...

[10 minutes later]
  Current Intensity: 285.3 gCO2/kWh
  Status: ✅ Starting training now!

Training Progress:
  Epoch 1/3 - Loss: 0.324 - Accuracy: 91.2%
  CO2 Emissions: 0.042 kg
  Energy Consumed: 0.15 kWh

============================================================
CARBON SAVINGS vs BASELINE
============================================================

CO2 Reduction: 43.2% (0.032 kg saved)
GPU Memory Reduction: 50.0%
Accuracy: 93.1% (baseline: 93.4%)

Usage Examples

Carbon-Aware Scheduling Only

Disable gradient accumulation, enable scheduling:

scheduler:
  enabled: true
  carbon_threshold: 250

training:
  gradient_accumulation_steps: 1  # No accumulation

Gradient Accumulation Only

Disable scheduling, enable memory optimization:

scheduler:
  enabled: false

training:
  batch_size: 8
  gradient_accumulation_steps: 8  # Effective batch = 64

Real Carbon Intensity API

Configure for production with a real API:

scheduler:
  enabled: true
  use_mock_data: false
  api_endpoint: "https://api.carbonintensity.org.uk/intensity"
  region: "GB"
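
A minimal sketch of fetching and parsing a reading from such an endpoint, using only the standard library. It assumes the response has the shape {"data": [{"intensity": {"forecast": ..., "actual": ...}}]}, which should be checked against the provider's documentation. The function names are hypothetical and are not taken from scheduler.py.

```python
import json
import urllib.request
from typing import Optional


def parse_intensity(payload: dict) -> Optional[float]:
    """Extract gCO2/kWh from a response of the assumed shape, else None."""
    try:
        reading = payload["data"][0]["intensity"]
    except (KeyError, IndexError, TypeError):
        return None
    if not isinstance(reading, dict):
        return None
    value = reading.get("actual")
    if value is None:                 # fall back to the forecast value
        value = reading.get("forecast")
    return float(value) if value is not None else None


def fetch_intensity(url: str, timeout: float = 10.0) -> Optional[float]:
    """Fetch and parse the current intensity; None on network or parse errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_intensity(json.load(response))
    except (OSError, ValueError):     # URLError, timeouts, malformed JSON
        return None
```

Returning None rather than raising lets the caller decide when to switch to mock data.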

Custom Model Integration

Replace SimpleCNN in src/train.py:

import torch

from my_models import MyCustomModel

def prepare_model(config):
    # Use the GPU when available, otherwise fall back to CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Build the model from config values and move it to the chosen device
    model = MyCustomModel(
        input_channels=config['training']['input_channels'],
        num_classes=config['training']['num_classes']
    ).to(device)
    return model, device

Output Format

Training summary JSON saved to output/summary_optimized.json:

{
  "run_name": "optimized",
  "training_metrics": {
    "final_accuracy": 93.1,
    "final_loss": 0.124,
    "epochs": 3,
    "total_time_seconds": 245
  },
  "carbon_metrics": {
    "total_emissions_kg": 0.042,
    "energy_consumed_kwh": 0.15,
    "avg_power_watts": 145.2
  },
  "scheduler_metrics": {
    "wait_time_seconds": 600,
    "initial_intensity": 420.5,
    "training_intensity": 285.3
  },
  "gpu_metrics": {
    "peak_memory_mb": 2048,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 64
  }
}

Comparison report saved to output/comparison_report.json:

{
  "carbon_savings": {
    "baseline_emissions_kg": 0.074,
    "optimized_emissions_kg": 0.042,
    "reduction_kg": 0.032,
    "reduction_percentage": 43.2
  },
  "accuracy_impact": {
    "baseline_accuracy": 93.4,
    "optimized_accuracy": 93.1,
    "degradation_percentage": 0.3
  },
  "memory_savings": {
    "baseline_memory_mb": 4096,
    "optimized_memory_mb": 2048,
    "reduction_percentage": 50.0
  }
}
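
The report fields can be derived from the two run summaries with a few lines of arithmetic. The sketch below is an assumption about how generate_comparison.py might compute them, not its actual code. In particular it treats degradation_percentage as the difference in accuracy percentage points, which matches the sample numbers above.

```python
def percent_reduction(baseline: float, optimized: float) -> float:
    """Percentage drop from baseline to optimized, rounded to one decimal."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return round((baseline - optimized) / baseline * 100.0, 1)


def build_comparison(baseline: dict, optimized: dict) -> dict:
    """Combine two run summaries into a comparison report."""
    b_kg = baseline["carbon_metrics"]["total_emissions_kg"]
    o_kg = optimized["carbon_metrics"]["total_emissions_kg"]
    b_mb = baseline["gpu_metrics"]["peak_memory_mb"]
    o_mb = optimized["gpu_metrics"]["peak_memory_mb"]
    b_acc = baseline["training_metrics"]["final_accuracy"]
    o_acc = optimized["training_metrics"]["final_accuracy"]
    return {
        "carbon_savings": {
            "baseline_emissions_kg": b_kg,
            "optimized_emissions_kg": o_kg,
            "reduction_kg": round(b_kg - o_kg, 3),
            "reduction_percentage": percent_reduction(b_kg, o_kg),
        },
        "accuracy_impact": {
            "baseline_accuracy": b_acc,
            "optimized_accuracy": o_acc,
            "degradation_percentage": round(b_acc - o_acc, 1),
        },
        "memory_savings": {
            "baseline_memory_mb": b_mb,
            "optimized_memory_mb": o_mb,
            "reduction_percentage": percent_reduction(b_mb, o_mb),
        },
    }
```

Feeding in the sample values (0.074 kg vs 0.042 kg, 4096 MB vs 2048 MB, 93.4% vs 93.1%) reproduces the 43.2%, 50.0%, and 0.3 figures shown in the report.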

Performance

Evaluated on MNIST training - 3 epochs, RTX 3090 GPU:

Carbon Intensity Patterns (Mock Data):

Peak hours 18:00–22:00: ~450 gCO2/kWh
Off-peak hours 02:00–06:00: ~200 gCO2/kWh
Average reduction: 35–45% CO2 by scheduling during low-carbon windows

GPU Memory Savings:

Gradient accumulation 2x: ~30% memory reduction
Gradient accumulation 4x: ~50% memory reduction
Gradient accumulation 8x: ~60% memory reduction

Convergence Validation:

Accuracy degradation under 1% across all tested configurations
Loss convergence matches baseline within 2% tolerance
No divergence observed

Project Structure

CarbonAwareModelTraining---by-NEO/
├── src/
│   ├── scheduler.py                # Carbon intensity API & scheduling
│   ├── tracker.py                  # CodeCarbon emissions tracking
│   ├── train.py                    # Main training pipeline
│   └── utils.py                    # Config loading & logging
├── configs/
│   ├── baseline.yaml               # Baseline training config
│   └── optimized.yaml              # Carbon-aware optimized config
├── output/
│   ├── summary_baseline.json       # Baseline training summary
│   ├── summary_optimized.json      # Optimized training summary
│   ├── comparison_report.json      # Comparative analysis
│   ├── emissions.csv               # CodeCarbon emissions log
│   └── training_*.log              # Detailed training logs
├── models/
│   ├── model_baseline.pt           # Baseline model checkpoint
│   └── model_optimized.pt          # Optimized model checkpoint
├── data/                            # MNIST dataset (auto-downloaded)
├── requirements.txt                 # Python dependencies
├── generate_comparison.py          # Comparison report generator
└── README.md

Key Design Decisions

Why Carbon-Aware Scheduling?

Carbon intensity varies 2–5x throughout the day. Scheduling training during low-carbon windows reduces emissions without affecting model quality. Low-carbon periods also often correlate with cheaper electricity.

Why Gradient Accumulation?

Gradient accumulation enables training larger models on limited hardware by processing smaller micro-batches and updating weights less frequently. The same technique is used when training BERT, GPT, and other large-scale models.

Why CodeCarbon?

CodeCarbon estimates emissions from the energy drawn by the hardware and the carbon intensity of the local grid, supports CPU, GPU, and multi-device setups, and is open source, so its calculations are transparent and community-reviewed. It tracks energy, power, and emissions in a single library.

Why YAML Configuration?

YAML configs are version-controlled, human-readable, and separate code from experiment parameters - enabling reproducible A/B comparisons between baseline and optimized runs.

Testing

Validate installation:

python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "import codecarbon; print('CodeCarbon: OK')"
python -c "import yaml; print('PyYAML: OK')"
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"

Run a quick 5-minute test:

python src/train.py configs/test.yaml

Validate carbon savings:

python src/train.py configs/baseline.yaml
python src/train.py configs/optimized.yaml
python generate_comparison.py
cat output/comparison_report.json

Troubleshooting

CUDA Out of Memory

Reduce batch_size and increase gradient_accumulation_steps in the config.
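
The adjustment keeps the effective batch size constant as long as the product of the two settings stays the same. A tiny helper makes that explicit. The function name is hypothetical and not part of the repository.

```python
from typing import Tuple


def shrink_micro_batch(batch_size: int, accumulation_steps: int) -> Tuple[int, int]:
    """Halve the micro-batch and double accumulation, keeping their product."""
    if batch_size < 2 or batch_size % 2 != 0:
        raise ValueError("batch_size must be an even number >= 2")
    return batch_size // 2, accumulation_steps * 2
```

Starting from batch_size: 16 and gradient_accumulation_steps: 4, one call gives 8 and 8, so the effective batch size is still 64.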

Carbon Intensity API Timeout

No action needed - the pipeline automatically falls back to mock data and training proceeds.

Module Import Errors

export PYTHONPATH="$PWD/src:$PYTHONPATH"

CodeCarbon Tracking Fails

pip install --upgrade codecarbon

Training continues without emissions tracking if CodeCarbon fails.
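
One way to get this behaviour is to wrap the tracker so that any failure is logged and swallowed. The sketch below is an assumption about the approach, not the code in tracker.py. SafeTracker is a hypothetical name, and the factory argument stands in for a tracker class that exposes start() and stop(), such as codecarbon.EmissionsTracker.

```python
from typing import Any, Callable, Optional


class SafeTracker:
    """Wrap an emissions tracker so its failures never stop training."""

    def __init__(self, factory: Callable[[], Any]):
        self._factory = factory           # e.g. codecarbon.EmissionsTracker
        self._tracker: Optional[Any] = None

    def start(self) -> bool:
        """Try to start tracking; return False if it is unavailable."""
        try:
            self._tracker = self._factory()
            self._tracker.start()
            return True
        except Exception as exc:          # tracking is best-effort by design
            print(f"Emissions tracking disabled: {exc}")
            self._tracker = None
            return False

    def stop(self) -> Optional[float]:
        """Return emissions in kg, or None if tracking was unavailable."""
        if self._tracker is None:
            return None
        try:
            return self._tracker.stop()
        except Exception as exc:
            print(f"Could not read emissions: {exc}")
            return None
```

The training loop calls start() before the first epoch and stop() after the last, and treats a None result as "no emissions data" when writing the summary.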

Scheduler Waits Too Long

Increase max_wait_seconds, raise carbon_threshold, or set wait_for_low_carbon: false in the config.

How I Built This Using NEO

This project was built using NEO. NEO is a fully autonomous AI engineering agent that can write code and build solutions for AI/ML tasks, including AI model evals, prompt optimization, and end-to-end AI pipeline development.

The requirement was a PyTorch training pipeline that schedules GPU workloads based on real-time carbon intensity, reduces memory footprint through gradient accumulation, and tracks emissions with CodeCarbon - producing a side-by-side comparison report. NEO built the full implementation: the carbon intensity scheduler in scheduler.py with API integration and mock fallback, the CodeCarbon emissions tracker in tracker.py, the main training pipeline in train.py with gradient accumulation and mixed precision FP16, the config loader and logging utilities in utils.py, the YAML configs for baseline and optimized runs, the comparison report generator in generate_comparison.py, and the full output structure covering JSON summaries, emissions CSV, and model checkpoints.

How You Can Use and Extend This With NEO

Use it to measure the carbon cost of your existing training runs.
Run python src/train.py configs/baseline.yaml on your own model and data by replacing SimpleCNN in src/train.py with your model. The CodeCarbon tracker produces a JSON summary with CO2 in kg, energy in kWh, and average power in Watts, giving you a baseline measurement before any optimization.

Use the comparison report to justify scheduling infrastructure.
Run both the baseline and optimized configs on the same dataset. The comparison_report.json gives you a concrete before and after (percentage reduction in emissions, energy, and memory, alongside accuracy degradation) that makes the case for carbon-aware scheduling with real numbers from your own hardware.

Use mock data for development and real API for production.
Set use_mock_data: true during development so training always proceeds without waiting. Switch to use_mock_data: false with a real api_endpoint for production runs where actual carbon savings matter.

Extend the scheduler with additional carbon intensity sources.
The scheduler in scheduler.py fetches from a configurable api_endpoint. Adding support for additional regional carbon intensity APIs (Electricity Maps, WattTime, or a custom internal source) means updating the fetch logic in scheduler.py without touching the training loop, tracker, or reporting pipeline.
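
One possible shape for such an extension is a small registry of fetch functions that the scheduler tries in order. Everything in the sketch below is hypothetical (SOURCES, register_source, first_available, and the placeholder "internal" source) and only illustrates the idea. Real providers would need their own authentication and response parsing.

```python
from typing import Callable, Dict, Iterable, Optional

IntensitySource = Callable[[], Optional[float]]   # returns gCO2/kWh or None

SOURCES: Dict[str, IntensitySource] = {}


def register_source(name: str):
    """Decorator that adds a fetch function to the source registry."""
    def decorator(func: IntensitySource) -> IntensitySource:
        SOURCES[name] = func
        return func
    return decorator


def first_available(names: Iterable[str], fallback: float) -> float:
    """Try sources in order; use the fallback if all fail or return None."""
    for name in names:
        source = SOURCES.get(name)
        if source is None:
            continue                      # unknown source name: skip it
        try:
            value = source()
        except Exception:
            value = None                  # treat provider errors as "no data"
        if value is not None:
            return value
    return fallback


@register_source("internal")
def internal_source() -> Optional[float]:
    # Placeholder value; a real implementation would call your provider here.
    return 285.3
```

The training loop only ever sees a single number, so new providers can be added or reordered without changes elsewhere.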

Final Notes

Carbon intensity varies throughout the day, and most training pipelines ignore it. A 43% reduction in CO2 emissions with less than 1% accuracy degradation, achieved by scheduling when the grid is cleaner and accumulating gradients to reduce memory, shows that sustainable ML is a practical engineering choice, not just an aspiration.

The code is at https://github.com/dakshjain-1616/CarbonAwareModelTraining
You can also build with NEO in your IDE using the VS Code extension or Cursor.
You can use NEO MCP with Claude Code: https://heyneo.com/claude-code

DEV Community