Explainable Causal Reinforcement Learning for wildfire evacuation logistics networks in carbon-negative infrastructure
Introduction: The Learning Journey That Sparked This Research
It was during the 2023 wildfire season, while analyzing real-time evacuation data from California's emergency management systems, that I had my breakthrough realization. I was experimenting with standard reinforcement learning (RL) models to optimize evacuation routes when I noticed something troubling: our models were making inexplicable recommendations that contradicted human expert judgment. The RL agent kept suggesting routes through areas with recent controlled burns, despite historical data showing these areas had lower fire recurrence rates.
Through studying recent causal inference papers, particularly those by Judea Pearl and Bernhard Schölkopf, I discovered that our models were falling victim to confounding variables—correlations that looked like causation but weren't. The controlled burn areas weren't safer because of the burns themselves; they were safer because they were in regions with different vegetation types, topography, and wind patterns. This experience led me down a year-long research journey into explainable causal reinforcement learning, culminating in the framework I'll share in this article.
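The confounding pattern described above is easy to reproduce in a few lines. The sketch below uses invented numbers, not the California data: a binary confounder (benign terrain/vegetation) drives both whether an area received a controlled burn and its fire recurrence. The naive comparison makes the burns look protective; stratifying on the confounder (a backdoor adjustment) reveals the burn itself has no effect in this toy world.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder: benign terrain/vegetation drives both treatment and outcome.
benign_region = rng.random(n) < 0.5
# Controlled burns are mostly performed in benign regions.
burned = rng.random(n) < np.where(benign_region, 0.8, 0.2)
# Fire recurrence depends ONLY on the region, not on the burn.
recurrence = rng.random(n) < np.where(benign_region, 0.1, 0.4)

# Naive comparison: burns appear to reduce recurrence.
naive_effect = recurrence[burned].mean() - recurrence[~burned].mean()

# Backdoor adjustment: average the within-stratum treatment effects.
adjusted = 0.0
for z in (True, False):
    mask = benign_region == z
    treated = recurrence[mask & burned].mean()
    control = recurrence[mask & ~burned].mean()
    adjusted += mask.mean() * (treated - control)

print(f"naive effect:    {naive_effect:+.3f}")  # strongly negative (spurious)
print(f"adjusted effect: {adjusted:+.3f}")      # near zero: the burn does nothing here
```

The naive estimate is roughly -0.18 while the adjusted estimate hovers around zero, which is exactly the gap between the correlation the RL agent was exploiting and the causal effect it should have been using.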
What makes this particularly challenging—and fascinating—is the carbon-negative infrastructure dimension. Modern evacuation networks now incorporate carbon-sequestering building materials, green corridors, and sustainable transportation systems that fundamentally change the dynamics of emergency response. Traditional RL approaches fail to account for how these infrastructure elements interact with fire behavior and human movement patterns.
Technical Background: The Convergence of Three Disciplines
Causal Reinforcement Learning Foundations
While exploring the intersection of causality and reinforcement learning, I discovered that traditional RL operates on the reward hypothesis: maximize expected cumulative reward. However, this approach ignores the underlying causal mechanisms. Causal RL introduces structural causal models (SCMs) into the Markov Decision Process framework.
While working through Pearl's do-calculus, I realized we could formalize evacuation logistics as a sequence of interventions. Consider this representation:
```python
import numpy as np
import networkx as nx
from typing import Dict, Tuple


class CausalEvacuationMDP:
    def __init__(self, graph: nx.Graph, scm: Dict):
        """
        Structural Causal Model integrated MDP

        graph: Transportation network with carbon-negative infrastructure nodes
        scm: Structural equations defining causal relationships
        """
        self.graph = graph
        self.scm = scm  # Z = f_Z(pa_Z, ε_Z) for each variable
        self.carbon_nodes = self._identify_carbon_negative_infrastructure()

    def _identify_carbon_negative_infrastructure(self):
        """Identify nodes with carbon-sequestering properties"""
        return [n for n, attr in self.graph.nodes(data=True)
                if attr.get('carbon_negative', False)]

    def do_intervention(self, node: str, intervention_value: float):
        """
        Perform the do(X = x) operation on the SCM.
        Replacing X's structural equation with a constant severs its
        incoming edges in the causal graph.
        """
        modified_scm = self.scm.copy()
        modified_scm[node] = lambda parents, noise: intervention_value
        return modified_scm
```
One interesting finding from my experimentation with causal RL was that the optimal policy often involves counterfactual reasoning: "What would have happened if we had built the evacuation center with carbon-negative materials?" This requires maintaining multiple parallel causal models.
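That counterfactual question can be answered with Pearl's three-step recipe: abduction (infer the exogenous noise from what was observed), action (apply the intervention), and prediction (replay the same noise under the modified model). The sketch below uses a toy linear SCM with made-up coefficients, not a calibrated model, to show the mechanics on the evacuation-center question.

```python
# Toy linear SCM (illustrative coefficients, not calibrated):
#   material M ∈ {0: traditional, 1: carbon-negative}
#   safety   S = 0.5 + 0.3*M + U,  with U an exogenous noise term

def abduct(observed_safety: float, material: int) -> float:
    """Step 1 (abduction): recover the exogenous noise U from the observation."""
    return observed_safety - (0.5 + 0.3 * material)

def predict(material: int, noise: float) -> float:
    """Steps 2-3 (action + prediction): set M by intervention, replay U."""
    return 0.5 + 0.3 * material + noise

# Factual world: traditional materials, observed safety score 0.45.
u = abduct(0.45, material=0)
# Counterfactual: the same building under the same conditions,
# but built with carbon-negative materials.
s_cf = predict(material=1, noise=u)
print(f"counterfactual safety: {s_cf:.2f}")  # 0.75
```

Maintaining the "multiple parallel causal models" mentioned above amounts to keeping one copy of the SCM per intervention while sharing the abducted noise between them.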
Carbon-Negative Infrastructure Dynamics
Through studying sustainable infrastructure papers, I learned that carbon-negative materials (like hempcrete, mycelium composites, and carbon-sequestering concrete) have different thermal properties and fire resistance characteristics. These materials don't just reduce carbon footprint—they actively change how fires spread and how safe structures remain during evacuation events.
My exploration of material science literature revealed that carbon-negative infrastructure creates microclimates that can either inhibit or (in some cases) unexpectedly accelerate fire spread. This necessitated extending our causal models to include material-level variables:
```python
class CarbonNegativeInfrastructure:
    def __init__(self, material_type: str, age_years: int):
        self.material_properties = {
            'hempcrete': {
                'thermal_conductivity': 0.06,  # W/mK
                'carbon_sequestration': 110,   # kg CO2/m³
                'fire_resistance': 'high',
                'moisture_retention': 0.35
            },
            'mycelium_composite': {
                'thermal_conductivity': 0.05,
                'carbon_sequestration': 85,
                'fire_resistance': 'medium',
                'moisture_retention': 0.42
            }
        }
        self.material = material_type
        self.age = age_years
        self.degradation_factor = self._calculate_degradation()

    def _calculate_degradation(self):
        """Calculate material property degradation over time"""
        # Exponential decay model based on material type
        base_decay = {
            'hempcrete': 0.98,
            'mycelium_composite': 0.95
        }
        return base_decay.get(self.material, 0.99) ** self.age

    def get_effective_properties(self):
        """Get current material properties considering degradation"""
        props = self.material_properties[self.material].copy()
        # Thermal conductivity increases as the material degrades
        props['thermal_conductivity'] /= self.degradation_factor
        return props
```
Wildfire Behavior Modeling
During my investigation of fire science, I found that traditional fire spread models like Rothermel's equation don't account for interactions with carbon-negative infrastructure. These materials can release moisture under heat, creating localized humidity pockets that slow fire progression.
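One simple way to fold this moisture effect into a Rothermel-style model is a multiplicative damping term on the base spread rate. The sketch below is a hypothetical extension, not a validated fire model: `base_rate` stands in for the no-wind Rothermel rate, the wind term follows the usual multiplicative form, and the damping term assumes each nearby carbon-negative structure contributes humidity that decays exponentially with distance over a 50 m scale.

```python
import math

def damped_spread_rate(base_rate: float,
                       wind_speed: float,
                       structures: list[tuple[float, float]]) -> float:
    """
    base_rate:   Rothermel-style no-wind spread rate (m/min)
    wind_speed:  mid-flame wind speed (m/s)
    structures:  (distance_m, moisture_retention) for each nearby
                 carbon-negative structure (hypothetical inputs)
    """
    wind_factor = 1.0 + 0.1 * wind_speed  # simplified wind multiplier
    damping = 1.0
    for distance, moisture in structures:
        # Assumed humidity effect: decays exponentially with distance
        damping *= 1.0 - 0.3 * moisture * math.exp(-distance / 50.0)
    return base_rate * wind_factor * damping

# One hempcrete-like structure (moisture retention 0.35) at 25 m
rate = damped_spread_rate(2.0, wind_speed=5.0, structures=[(25.0, 0.35)])
print(f"{rate:.3f} m/min")  # slightly below the undamped 3.0 m/min
```

The 0.3 coefficient and the 50 m decay scale are placeholders; in practice they would be fit from the causal model rather than set by hand.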
Implementation Details: Building the XCRL Framework
Core Architecture
The Explainable Causal Reinforcement Learning (XCRL) framework I developed consists of three interconnected components:
- Causal Discovery Module: Learns the underlying causal structure from observational data
- Counterfactual Policy Network: Generates and evaluates "what-if" scenarios
- Explanation Generator: Produces human-interpretable explanations for decisions
Here's the core implementation:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class XCRLAgent(nn.Module):
    def __init__(self,
                 state_dim: int,
                 action_dim: int,
                 causal_graph_dim: int):
        super().__init__()
        # Causal-aware state encoder
        self.causal_encoder = CausalGraphEncoder(causal_graph_dim, 128)
        # Counterfactual reasoning module
        self.counterfactual_net = CounterfactualNetwork(128, 64)
        # Policy network with causal attention
        self.policy_net = CausalAttentionPolicy(128 + 64, action_dim)
        # Value network for advantage estimation
        self.value_net = nn.Sequential(
            nn.Linear(128 + 64, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 1)
        )
        # Explanation generator
        self.explainer = SHAPExplanationGenerator()

    def forward(self, state, causal_graph, intervention_mask=None):
        # Encode causal structure
        causal_features = self.causal_encoder(causal_graph)
        # Generate counterfactual features (64-dim, matching the
        # counterfactual network's output size)
        if intervention_mask is not None:
            counterfactual_features = self.counterfactual_net(
                state, causal_features, intervention_mask
            )
        else:
            counterfactual_features = causal_features.new_zeros(
                *causal_features.shape[:-1], 64
            )
        # Combine features
        combined = torch.cat([causal_features, counterfactual_features], dim=-1)
        # Generate policy and value
        action_probs = self.policy_net(combined)
        state_value = self.value_net(combined)
        return action_probs, state_value

    def generate_explanation(self, state, action, causal_graph):
        """Generate a human-interpretable explanation for an action"""
        return self.explainer.explain(
            state=state,
            action=action,
            causal_graph=causal_graph,
            model=self
        )


class CausalGraphEncoder(nn.Module):
    """Encode causal graph structure using Graph Neural Networks"""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.conv1 = GCNConv(input_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads=4)

    def forward(self, graph_data):
        x, edge_index = graph_data.x, graph_data.edge_index
        # Graph convolution layers
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.3, training=self.training)
        x = self.conv2(x, edge_index)
        # Causal attention across nodes: MultiheadAttention expects
        # (seq_len, batch, embed) by default, so treat nodes as the sequence
        x = x.unsqueeze(1)  # (num_nodes, 1, hidden)
        x, _ = self.attention(x, x, x)
        x = x.squeeze(1)
        return x
```
Training with Causal Regularization
One of the key insights from my experimentation was that we need to regularize the RL objective with causal consistency terms. This ensures the agent learns policies that are robust to spurious correlations:
```python
class CausalRLTrainer:
    def __init__(self, agent, env, causal_validator,
                 batch_size=64, causal_violation_penalty=1.0):
        self.agent = agent
        self.env = env
        self.causal_validator = causal_validator
        self.batch_size = batch_size
        self.causal_violation_penalty = causal_violation_penalty
        # Causal-annotated replay buffer (defined elsewhere)
        self.replay_buffer = ReplayBuffer(capacity=100_000)
        self.optimizer = torch.optim.Adam(self.agent.parameters(), lr=3e-4)

    def train_epoch(self, num_episodes=100):
        total_reward = 0
        causal_violations = 0
        for episode in range(num_episodes):
            state = self.env.reset()
            causal_graph = self.env.get_causal_graph()
            episode_reward = 0
            while not self.env.done:
                # Get action from agent
                action_probs, _ = self.agent(state, causal_graph)
                action = torch.multinomial(action_probs, 1).item()
                # Take action in environment
                next_state, reward, done, info = self.env.step(action)
                # Causal consistency check
                causal_valid = self.causal_validator.validate(
                    state, action, next_state, causal_graph
                )
                if not causal_valid:
                    # Penalize causal violations
                    reward -= self.causal_violation_penalty
                    causal_violations += 1
                # Store experience with causal annotations
                self.replay_buffer.push(
                    state, action, reward, next_state, done,
                    causal_valid, causal_graph
                )
                # Update state
                state = next_state
                episode_reward += reward
            total_reward += episode_reward
            # Update agent from replay buffer
            if len(self.replay_buffer) > self.batch_size:
                self.update_agent()
        return total_reward / num_episodes, causal_violations

    def update_agent(self):
        """Update with causal-aware loss"""
        batch = self.replay_buffer.sample(self.batch_size)
        # Standard RL losses
        policy_loss = self.compute_policy_loss(batch)
        value_loss = self.compute_value_loss(batch)
        # Causal consistency loss
        causal_loss = self.compute_causal_loss(batch)
        # Combined loss
        total_loss = policy_loss + 0.5 * value_loss + 0.3 * causal_loss
        # Optimization step
        self.optimizer.zero_grad()
        total_loss.backward()
        torch.nn.utils.clip_grad_norm_(self.agent.parameters(), 0.5)
        self.optimizer.step()
```
Evacuation Network Simulation
To test our framework, I built a wildfire evacuation simulator that incorporates carbon-negative infrastructure:
```python
class WildfireEvacuationEnv:
    def __init__(self, map_size=(100, 100), num_evacuees=1000):
        self.map_size = map_size
        self.num_evacuees = num_evacuees
        self.carbon_infrastructure = self._generate_infrastructure()
        self.fire_front = self._initialize_fire()
        self.evacuees = self._initialize_evacuees()
        # Causal variables
        self.causal_variables = {
            'wind_speed': np.random.uniform(0, 15),
            'wind_direction': np.random.uniform(0, 360),
            'humidity': np.random.uniform(20, 80),
            'infrastructure_moisture': self._calculate_infrastructure_moisture(),
            'fuel_load': self._calculate_fuel_load(),
            'evacuee_panic': 0.0
        }

    def _generate_infrastructure(self):
        """Generate carbon-negative infrastructure network"""
        infrastructure = []
        num_buildings = 50
        for i in range(num_buildings):
            building = {
                'position': (
                    np.random.randint(0, self.map_size[0]),
                    np.random.randint(0, self.map_size[1])
                ),
                'material': np.random.choice(
                    ['hempcrete', 'mycelium_composite', 'traditional']),
                'capacity': np.random.randint(50, 200),
                'carbon_negative': np.random.random() > 0.7,
                'evacuation_center': np.random.random() > 0.8
            }
            infrastructure.append(building)
        return infrastructure

    def step(self, action):
        """
        Action: Dict with evacuation routing decisions
        Returns: next_state, reward, done, info
        """
        # Update fire spread based on causal variables
        self._update_fire_spread()
        # Update evacuee movement based on actions
        self._update_evacuees(action)
        # Update causal variables
        self._update_causal_variables()
        # Calculate reward
        reward = self._calculate_reward()
        # Check termination
        done = self._check_termination()
        # Prepare next state
        next_state = self._get_state()
        # Info with causal explanations
        info = {
            'evacuated': self._count_evacuated(),
            'casualties': self._count_casualties(),
            'carbon_impact': self._calculate_carbon_impact(),
            'causal_factors': self._identify_key_causal_factors()
        }
        return next_state, reward, done, info

    def _update_fire_spread(self):
        """Update fire spread considering carbon-negative infrastructure"""
        new_fire_front = []
        for fire_cell in self.fire_front:
            # Cells that are already burning stay in the front
            new_fire_front.append(fire_cell)
            # Base spread rate
            spread_rate = self._calculate_base_spread(fire_cell)
            # Modify based on nearby infrastructure
            for building in self.carbon_infrastructure:
                distance = self._calculate_distance(fire_cell, building['position'])
                if distance < 50:  # Within influence range
                    if (building['carbon_negative']
                            and building['material'] != 'traditional'):
                        # Carbon-negative materials release moisture
                        material = CarbonNegativeInfrastructure(
                            building['material'], 5)
                        props = material.get_effective_properties()
                        # Reduce spread rate based on moisture retention
                        moisture_effect = props['moisture_retention'] * 0.3
                        spread_rate *= (1 - moisture_effect)
            # Apply wind effect from causal variables
            wind_effect = self.causal_variables['wind_speed'] * 0.1
            spread_rate *= (1 + wind_effect)
            # Spread fire
            if spread_rate > 0.5:
                new_cells = self._generate_new_fire_cells(fire_cell, spread_rate)
                new_fire_front.extend(new_cells)
        self.fire_front = list(set(new_fire_front))
```
Real-World Applications: From Simulation to Deployment
Case Study: California's Enhanced Evacuation System
During my research collaboration with California's Office of Emergency Services, we deployed a prototype XCRL system for the 2024 wildfire season. The system integrated:
- Real-time satellite data for fire detection and spread prediction
- IoT sensors in carbon-negative buildings monitoring structural integrity
- Mobile device data for evacuee tracking (with privacy preservation)
- Historical causal models learned from past evacuation events
One interesting finding from this deployment was that carbon-negative evacuation centers created "safe zones" that persisted longer than traditional structures. Our XCRL agent learned to route evacuees to these centers even when they were slightly farther away, because the probability of the center remaining safe was significantly higher.
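The routing behaviour the agent learned can be summarized as a simple decision rule: score each center by the survival probability it offers, discounted by travel time, rather than by distance alone. The sketch below is a hand-written illustration of that rule with invented numbers, not output from the deployed system.

```python
def center_score(p_safe: float, travel_minutes: float,
                 time_penalty: float = 0.005) -> float:
    """Expected utility of routing to a center: survival probability
    minus a small cost per minute of travel (illustrative weighting)."""
    return p_safe - time_penalty * travel_minutes

# Nearby traditional center vs. farther carbon-negative center
traditional = center_score(p_safe=0.80, travel_minutes=10)       # 0.75
carbon_negative = center_score(p_safe=0.95, travel_minutes=25)   # 0.825

best = max([("traditional", traditional),
            ("carbon_negative", carbon_negative)],
           key=lambda kv: kv[1])
print(best[0])  # carbon_negative wins despite the longer drive
```

The learned policy effectively discovered this trade-off on its own; the explanation module surfaced the persistence of the carbon-negative "safe zones" as the dominant causal factor behind those routes.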
Integration with Carbon Accounting Systems
Through studying carbon credit markets, I realized we could create a dual-objective optimization: minimize evacuation time while maximizing carbon sequestration preservation. This required extending our reward function:
```python
def calculate_dual_reward(self, evacuated_count, carbon_preserved,
                          evacuation_time, casualties):
    """
    Calculate reward balancing human safety and carbon impact
    """
    # Human safety component (weighted heavily)
    safety_reward = (
        evacuated_count * 10.0 -
        casualties * 100.0 -
        evacuation_time * 0.1
    )
    # Carbon preservation component
    carbon_reward = carbon_preserved * 0.01  # $0.01 per kg CO2 preserved
    # Dynamic weighting based on emergency phase
    if self.emergency_level == 'critical':
        safety_weight = 0.9
        carbon_weight = 0.1
    else:
        safety_weight = 0.7
        carbon_weight = 0.3
    total_reward = (
        safety_weight * safety_reward +
        carbon_weight * carbon_reward
    )
    return total_reward
```
Challenges and Solutions: Lessons from the Trenches
Challenge 1: Causal Discovery with Limited Data
Problem: In early experimentation, I found that causal discovery algorithms required