DEV Community

Dr. Carlos Ruiz Viquez

Temporal Logic Control with Partial Observability in a Dynamic Power Grid

We present a reinforcement learning challenge at the intersection of control theory and artificial intelligence. The objective is to develop an autonomous controller that operates a dynamic power grid while maintaining stability and complying with safety protocols.

Environment Description:

The power grid is modeled as a complex network of interconnected buses, generators, and loads. Time-varying parameters, such as demand and generation, introduce uncertainty and non-stationarity in the system dynamics. A subset of critical nodes (e.g., those with high demand or critical infrastructure) is designated as "sensitive" nodes that require immediate attention to maintain grid stability.
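To make the environment description concrete, here is a minimal sketch of a grid with time-varying demand and flagged sensitive nodes. The class names (`Bus`, `ToyGrid`), the sinusoidal daily profile, and the noise level are all assumptions chosen for illustration; the challenge does not specify a data model.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Bus:
    """One grid node. All fields are illustrative, not part of the challenge spec."""
    name: str
    base_demand_mw: float
    sensitive: bool = False


@dataclass
class ToyGrid:
    buses: list
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def demand(self, bus, minute):
        """Demand in MW: an assumed daily sinusoid plus small Gaussian noise."""
        daily = 1.0 + 0.3 * math.sin(2 * math.pi * minute / 1440)
        noise = self.rng.gauss(0.0, 0.02)
        return max(0.0, bus.base_demand_mw * (daily + noise))

    def sensitive_buses(self):
        """Return the buses flagged as sensitive."""
        return [b for b in self.buses if b.sensitive]


grid = ToyGrid([Bus("hospital", 12.0, sensitive=True), Bus("suburb", 40.0)])
# Sample the hospital's demand once per hour over one simulated day.
profile = [grid.demand(grid.buses[0], m) for m in range(0, 1440, 60)]
```

A realistic simulator would replace the sinusoid with recorded load curves and add generators and network flow, but the same split between static topology and time-indexed quantities would likely remain.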

Temporal Logic Constraints:

The controller must adhere to strict temporal logic specifications, which require that safety conditions hold at all times or within stated time windows. For instance:

  • At all times, the total power injection at sensitive nodes must be within 5% of the average demand.
  • If a critical failure occurs, the controller must restore power to sensitive nodes within 10 minutes.
  • When demand exceeds 150% of normal capacity, the controller must activate emergency power reduction protocols for 30 minutes.
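The three example constraints above can be checked against a recorded trace with simple runtime monitors. The sketch below assumes one trace entry per minute and a hypothetical `Step` record; the field names and the bounded-window reading of each rule are illustrative interpretations, not a specification from the challenge.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One-minute snapshot of the grid. Field names are hypothetical."""
    injection: float     # total power injection at sensitive nodes
    avg_demand: float    # average demand at sensitive nodes
    failure: bool        # a critical failure is reported at this step
    powered: bool        # sensitive nodes have power at this step
    demand_ratio: float  # current demand divided by normal capacity
    emergency: bool      # emergency power reduction protocol is active


def within_band(trace, tol=0.05):
    """Always: |injection - avg_demand| <= tol * avg_demand."""
    return all(abs(s.injection - s.avg_demand) <= tol * s.avg_demand for s in trace)


def restored_in_time(trace, deadline=10):
    """Always: a failure is followed by a powered step within `deadline` minutes.

    A failure too close to the end of the trace to be resolved counts as a violation.
    """
    for t, s in enumerate(trace):
        if s.failure and not any(x.powered for x in trace[t:t + deadline + 1]):
            return False
    return True


def emergency_held(trace, hold=30):
    """Always: demand above 150% implies the protocol stays active for `hold` minutes.

    Windows cut short by the end of the trace are judged on the steps available.
    """
    for t, s in enumerate(trace):
        if s.demand_ratio > 1.5 and not all(x.emergency for x in trace[t:t + hold]):
            return False
    return True
```

These monitors return a single pass or fail per trace. A training setup might instead accumulate a per-step penalty so that the learning signal is not delayed until the end of an episode.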

Partial Observability:

The controller has only partial access to the grid's state information. Observations are collected from a subset of sensors, and these measurements are subject to noise and latency. The controller must make decisions based on these limited observations.
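One possible way to model this observation channel is a wrapper that exposes only selected state entries, adds Gaussian noise, and delivers them a fixed number of steps late. The class name `DelayedNoisySensors`, the dictionary-based state, and the fixed delay are assumptions for illustration only.

```python
import random
from collections import deque


class DelayedNoisySensors:
    """Expose a subset of state entries with Gaussian noise and a fixed delay."""

    def __init__(self, observed_keys, delay_steps=2, noise_std=0.01, seed=0):
        self.observed_keys = list(observed_keys)
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        # Keeps the last delay_steps + 1 states; the oldest one is delivered now.
        self.buffer = deque(maxlen=delay_steps + 1)

    def observe(self, state):
        """Record the true state and return the delayed, noisy partial view.

        During the first delay_steps calls the buffer is not yet full, so the
        earliest recorded state is returned.
        """
        self.buffer.append(dict(state))
        delayed = self.buffer[0]
        return {k: delayed[k] + self.rng.gauss(0.0, self.noise_std)
                for k in self.observed_keys}


sensors = DelayedNoisySensors(["freq_hz"], delay_steps=2, noise_std=0.0)
readings = [sensors.observe({"freq_hz": 50.0 + t, "hidden_load": 7.0})["freq_hz"]
            for t in range(4)]
```

Because a single observation no longer determines the state, controllers in this setting commonly condition on a history of observations, for example through a recurrent network or an explicit state estimator.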

Actions and Rewards:

The controller can take three types of action: (1) adjust power injection at a bus, (2) activate emergency protocols, or (3) request a manual intervention (e.g., dispatch emergency crews). The reward function is multi-objective, balancing stability, power consumption efficiency, and compliance with safety protocols.
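A common way to train against a multi-objective reward is to collapse it into one scalar with a weighted sum. The sketch below does that; the function name, the three input quantities, and the weight values are assumptions, and the challenge does not state how the objectives are combined.

```python
def scalar_reward(freq_dev_hz, wasted_mwh, violations,
                  w_stability=1.0, w_efficiency=0.1, w_safety=10.0):
    """Weighted-sum scalarization of three costs. Weights are placeholders to tune.

    freq_dev_hz: frequency deviation at this step (stability cost)
    wasted_mwh:  energy wasted at this step (efficiency cost)
    violations:  number of safety constraints violated at this step
    """
    cost = (w_stability * abs(freq_dev_hz)
            + w_efficiency * wasted_mwh
            + w_safety * violations)
    return -cost
```

A weighted sum is simple but fixes the trade-off in advance. Alternatives such as constrained formulations treat safety compliance as a hard requirement rather than one more term to trade away.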

Evaluation Metrics:

Performance will be evaluated based on the following metrics:

  • Grid stability (e.g., maximum and average frequency deviations)
  • Power consumption efficiency (e.g., energy wasted due to oscillations)
  • Compliance with temporal logic constraints
  • Average response time to critical failures
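As an illustration of how the listed metrics might be computed from an episode log, the sketch below summarizes per-step records into one dictionary. The function name `summarize` and its input layout are hypothetical; the challenge does not define a logging format.

```python
from statistics import mean


def summarize(freq_dev_hz, wasted_mwh, constraint_ok, response_minutes):
    """Aggregate one episode's logs into the four evaluation metrics.

    freq_dev_hz:      per-step frequency deviation from nominal
    wasted_mwh:       per-step energy wasted (for example, due to oscillations)
    constraint_ok:    per-step booleans, True when all constraints held
    response_minutes: one entry per critical failure, time until restoration
    """
    abs_dev = [abs(d) for d in freq_dev_hz]
    return {
        "max_freq_dev_hz": max(abs_dev),
        "mean_freq_dev_hz": mean(abs_dev),
        "total_wasted_mwh": sum(wasted_mwh),
        "compliance_rate": sum(constraint_ok) / len(constraint_ok),
        # No failures in the episode means there is no response time to report.
        "mean_response_min": mean(response_minutes) if response_minutes else None,
    }
```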

Challenge Specifications:

  • Train the controller in a realistic simulation environment, with training covering at least 24 hours of simulated grid operation.
  • Evaluate the controller's performance in 10 randomly generated scenarios, each lasting 24 hours.
  • Use a discrete action space, with 20 possible actions per time step.
  • Implement the controller using a Python-based deep learning framework (e.g., TensorFlow or PyTorch).
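One way to reconcile the three action types with a 20-way discrete action space is to enumerate a fixed table of concrete actions. The layout below (three adjustable buses, six adjustment sizes, one emergency action, one manual action) is purely an assumption that happens to total 20; bus names and step sizes are hypothetical. The helper for scenario seeds shows one way to make the 10 evaluation scenarios reproducible.

```python
import random

ADJUST_LEVELS_MW = [-5.0, -2.0, -1.0, 1.0, 2.0, 5.0]  # assumed step sizes
ADJUSTABLE_BUSES = ["bus_a", "bus_b", "bus_c"]        # hypothetical bus names

# 3 buses x 6 levels = 18 adjustment actions, plus 2 special actions = 20.
ACTIONS = [("adjust", bus, delta)
           for bus in ADJUSTABLE_BUSES
           for delta in ADJUST_LEVELS_MW]
ACTIONS.append(("emergency", None, None))
ACTIONS.append(("manual", None, None))


def decode(index):
    """Map a policy output in range(20) to a (kind, bus, delta_mw) tuple."""
    return ACTIONS[index]


def scenario_seeds(n_scenarios=10, master_seed=0):
    """Derive one reproducible seed per evaluation scenario."""
    rng = random.Random(master_seed)
    return [rng.randrange(2**32) for _ in range(n_scenarios)]
```

With this table, a policy network implemented in TensorFlow or PyTorch would only need a 20-way output layer, and `decode` translates the chosen index back into a grid command.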

By tackling this challenge, participants will push the frontiers of temporal logic control, partial observability, and multi-objective reinforcement learning. The winning solution will be published in a renowned academic journal, and the winner will receive recognition and a prize.


Published automatically
