Temporal Logic Control with Partial Observability in a Dynamic Power Grid
We present a reinforcement learning challenge at the intersection of control theory and artificial intelligence. The objective is to develop an autonomous controller that operates a dynamic power grid while maintaining stability and complying with safety protocols.
Environment Description:
The power grid is modeled as a complex network of interconnected buses, generators, and loads. Time-varying parameters, such as demand and generation, introduce uncertainty and non-stationarity in the system dynamics. A subset of critical nodes (e.g., those with high demand or critical infrastructure) is designated as "sensitive" nodes that require immediate attention to maintain grid stability.
Temporal Logic Constraints:
The controller must adhere to strict temporal logic specifications, which ensure that certain safety conditions are met at specific times. For instance:
- At all times, the total power injection at sensitive nodes must be within 5% of the average demand.
- If a critical failure occurs, the controller must restore power to sensitive nodes within 10 minutes.
- When demand exceeds 150% of normal capacity, the controller must activate emergency power reduction protocols for 30 minutes.
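The three example specifications above can be checked offline against a recorded trace. The following is a minimal sketch, assuming a fixed one-minute time step and traces stored as plain Python sequences; the function names, the trace layout, and the handling of windows cut off by the end of the trace are illustrative assumptions, not part of the challenge definition.

```python
from typing import Sequence


def always_within_band(injection: Sequence[float], avg_demand: float,
                       tol: float = 0.05) -> bool:
    """'Always' check: injection stays within tol * avg_demand of avg_demand."""
    return all(abs(p - avg_demand) <= tol * avg_demand for p in injection)


def restored_within(failure: Sequence[bool], powered: Sequence[bool],
                    deadline_steps: int = 10) -> bool:
    """Whenever failure[t] holds, powered must hold at some step in
    [t, t + deadline_steps]. A window cut short by the end of the trace
    counts as a violation if no restoration was seen (strict choice)."""
    for t, failed in enumerate(failure):
        if failed and not any(powered[t:t + deadline_steps + 1]):
            return False
    return True


def held_for(trigger: Sequence[bool], active: Sequence[bool],
             hold_steps: int = 30) -> bool:
    """Whenever trigger[t] holds, active must hold at every step in
    [t, t + hold_steps). A window cut short by the end of the trace is
    only checked over the steps that exist (lenient choice)."""
    for t, fired in enumerate(trigger):
        if fired and not all(active[t:t + hold_steps]):
            return False
    return True
```

With one-minute steps, `deadline_steps=10` corresponds to the 10-minute restoration deadline and `hold_steps=30` to the 30-minute emergency window. A boolean verdict like this is convenient for evaluation; a learning signal would more likely need a graded (robustness-style) score.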
Partial Observability:
The controller has only partial access to the grid's state information. Observations are collected from a subset of sensors, and these measurements are subject to noise and latency. The controller must make decisions based on these limited observations.
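One way to picture this observation model is a wrapper that exposes only some state entries, adds noise, and delays delivery. The sketch below assumes Gaussian noise and a fixed delay measured in time steps; the class name `DelayedNoisySensors` and all of its parameters are hypothetical and only illustrate the idea.

```python
import random
from collections import deque
from typing import List, Sequence


class DelayedNoisySensors:
    """Expose a noisy, delayed view of a subset of the full grid state."""

    def __init__(self, sensor_idx: Sequence[int], noise_std: float,
                 delay_steps: int, seed: int = 0) -> None:
        self.sensor_idx = list(sensor_idx)   # which state entries are measured
        self.noise_std = noise_std           # std. dev. of additive Gaussian noise
        self.rng = random.Random(seed)       # seeded for reproducible runs
        # Keep only the last delay_steps + 1 readings; the oldest one is served.
        self.buffer: deque = deque(maxlen=delay_steps + 1)

    def observe(self, state: Sequence[float]) -> List[float]:
        """Record a fresh reading and return the one from delay_steps ago.

        Until the buffer fills, the earliest available reading is returned.
        """
        reading = [state[i] + self.rng.gauss(0.0, self.noise_std)
                   for i in self.sensor_idx]
        self.buffer.append(reading)
        return self.buffer[0]
```

Because the controller never sees the current true state, approaches that carry memory (for example a recurrent policy or an explicit state estimator) are a natural fit here.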
Actions and Rewards:
The controller can take three types of actions: (1) adjust power injection at a bus, (2) activate emergency protocols, or (3) perform a manual intervention (e.g., dispatch emergency crews). The reward function is multi-objective, balancing grid stability, power consumption efficiency, and compliance with safety protocols.
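A common way to handle a multi-objective reward is to scalarize it as a weighted sum of penalties. The sketch below does exactly that; the weights, the units, and the choice of a weighted sum are assumptions made for illustration, since the challenge text does not specify how the objectives are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the challenge does not prescribe values.
    stability: float = 1.0     # per Hz of absolute frequency deviation
    efficiency: float = 0.1    # per MWh of wasted energy
    compliance: float = 10.0   # per violated safety specification


def step_reward(freq_dev_hz: float, wasted_energy_mwh: float,
                violations: int,
                w: RewardWeights = RewardWeights()) -> float:
    """Return a scalar reward as the negative weighted sum of penalties."""
    penalty = (w.stability * abs(freq_dev_hz)
               + w.efficiency * wasted_energy_mwh
               + w.compliance * violations)
    return -penalty
```

A large compliance weight makes violations dominate the other terms, but it does not guarantee they never happen; hard guarantees would need a separate mechanism such as action masking or a safety shield.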
Evaluation Metrics:
Performance will be evaluated based on the following metrics:
- Grid stability (e.g., maximum and average frequency deviations)
- Power consumption efficiency (e.g., energy wasted due to oscillations)
- Compliance with temporal logic constraints
- Average response time to critical failures
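Two of these metrics can be computed directly from logged traces. The sketch below assumes a nominal frequency of 50 Hz (60 Hz grids would change the default), a one-minute time step, and boolean traces marking failure and power status; all names are illustrative.

```python
from typing import Optional, Sequence, Tuple


def frequency_deviation_stats(freq_hz: Sequence[float],
                              nominal_hz: float = 50.0) -> Tuple[float, float]:
    """Return (maximum, average) absolute deviation from nominal frequency."""
    devs = [abs(f - nominal_hz) for f in freq_hz]
    return max(devs), sum(devs) / len(devs)


def mean_response_time(failure: Sequence[bool], powered: Sequence[bool],
                       step_minutes: float = 1.0) -> Optional[float]:
    """Average time from each failure onset to the next step with power.

    An onset is a step where failure turns on. Failures that are never
    resolved within the trace are skipped, which flatters the average;
    a stricter metric would count them at the full remaining duration.
    """
    times = []
    for t, failed in enumerate(failure):
        onset = failed and (t == 0 or not failure[t - 1])
        if not onset:
            continue
        for k in range(t, len(powered)):
            if powered[k]:
                times.append((k - t) * step_minutes)
                break
    return sum(times) / len(times) if times else None
```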
Challenge Specifications:
- Train the controller in a realistic simulation environment, with each training run covering at least 24 hours of simulated grid operation.
- Evaluate the controller's performance in 10 randomly generated scenarios, each lasting 24 hours.
- Use a discrete action space, with 20 possible actions per time step.
- Implement the controller using a Python-based deep learning framework (e.g., TensorFlow or PyTorch).
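To connect the 20-action discrete space with the three kinds of action described earlier, one option is to enumerate them in a flat index. The split below (9 buses with 2 adjustment directions each, plus one emergency and one manual-intervention action) is purely hypothetical; the challenge only fixes the total of 20.

```python
from typing import NamedTuple, Optional

# Hypothetical layout: 9 buses x 2 adjustment directions = 18 actions,
# plus 1 emergency-protocol action and 1 manual-intervention action.
N_BUSES = 9
DELTAS_MW = (-5.0, 5.0)
N_ACTIONS = N_BUSES * len(DELTAS_MW) + 2  # 20


class Action(NamedTuple):
    kind: str
    bus: Optional[int] = None
    delta_mw: Optional[float] = None


def decode_action(index: int) -> Action:
    """Map a flat action index in [0, N_ACTIONS) to a structured action."""
    if not 0 <= index < N_ACTIONS:
        raise ValueError(f"action index {index} out of range")
    n_adjust = N_BUSES * len(DELTAS_MW)
    if index < n_adjust:
        bus, d = divmod(index, len(DELTAS_MW))
        return Action("adjust_injection", bus, DELTAS_MW[d])
    if index == n_adjust:
        return Action("emergency_protocol")
    return Action("manual_intervention")
```

A policy network in TensorFlow or PyTorch would then output 20 logits, and the sampled index would be passed through `decode_action` before being applied to the simulator.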
By tackling this challenge, participants will push the frontiers of temporal logic control, partial observability, and multi-objective reinforcement learning. The winning solution will be published in a renowned academic journal, and the winner will receive recognition and a prize.