Tasks¶
PowerZoo ships ready-made benchmark tasks that combine a grid case, data split, agent design, and evaluation protocol into a single object. Use make_task_env to instantiate any task by name:
from powerzoo.tasks import make_task_env, list_public_tasks
print(list_public_tasks())
env = make_task_env("marl_opf", split="train")
# Use PettingZoo Parallel API (no RLlib required)
env = make_task_env("marl_opf", split="train", framework="pettingzoo")
Multi-agent tasks always use specialised task adapters by default (TaskOPFMultiAgentEnv, TaskUCMultiAgentEnv, TaskEVMultiAgentEnv, etc.). These adapters work with or without RLlib; when ray[rllib] is installed, the returned object also satisfies the RLlib MultiAgentEnv interface. To get a task-aware PettingZoo Parallel API wrapper around the same adapter semantics instead, pass framework='pettingzoo'.
The explicit public benchmark set is available via powerzoo.tasks.public or the re-exported helpers:
from powerzoo.tasks import PUBLIC_TASKS, list_public_tasks, get_public_task_catalog
print(PUBLIC_TASKS)
print(list_public_tasks())
print(get_public_task_catalog()[0]["task_id"])
print(get_public_task_catalog()[0]["default_episode_horizon_steps"])
Only tasks that satisfy the benchmark contract stay in PUBLIC_TASKS: documented, registered, instantiable, and smoke-tested. Registered-but-incomplete tasks remain accessible through list_tasks() / make_task_env(...), but are not part of the public benchmark set.
The framework parameter controls which multi-agent interface is used:
| Value | Description |
|---|---|
'auto' (default) |
Specialized task adapter (RLlib-compatible when ray is installed) |
'pettingzoo' |
Task-aware PettingZoo Parallel API via powerzoo.tasks.interfaces.TaskPettingZooWrapper (lightweight, no RLlib needed) |
'rllib' |
Same as 'auto' but raises if ray[rllib] is missing |
Registered task routing¶
| Task name | make_task_env() / create_env() returns |
|---|---|
battery_arbitrage |
FlattenWrapper around PowerEnv (single-agent Gymnasium) |
marl_opf |
TaskOPFMultiAgentEnv |
marl_der_arbitrage, marl_ders_benchmark |
TaskResourceMultiAgentEnv |
marl_ev_v2g |
TaskEVMultiAgentEnv |
dc_scheduling |
FlattenWrapper around PowerEnv (single-agent Gymnasium) |
dc_microgrid, dc_microgrid_safe |
DCMicrogridEnv (single-agent Gymnasium, self-contained) |
gencos_bidding |
GenCosMARLEnv (PettingZoo Parallel API; competitive 5-agent market) |
marl_uc |
TaskUCMultiAgentEnv |
opf_118 / opf_118_7d |
TaskOPFMultiAgentEnv |
joint_trans_dist / joint_trans_dist_7d |
Experimental only — still registered, but not part of PUBLIC_TASKS; instantiation currently fails until the joint adapter/reward path ships |
Public Benchmark Task Cards¶
get_public_task_catalog() returns stable task-card metadata for the current public benchmark set. Use it as the source of truth when building experiment menus, benchmark summaries, or docs that need to stay in sync with the real public tasks.
from powerzoo.tasks import get_public_task_catalog
for card in get_public_task_catalog():
print(card["task_id"], card["grid_case"], card["default_episode_horizon_steps"])
| Task | Grid | Agent mode | Default observation | Reward / cost contract | Horizon | Frameworks |
|---|---|---|---|---|---|---|
battery_arbitrage |
distribution / Case33bw |
single | flattened |
objective-only peak / off-peak arbitrage profit with SOC target shaping; SOC violations in info['cost'] |
48 | gymnasium |
marl_opf |
transmission / Case5 |
multi | global |
shared economic dispatch reward; physical violations in info['cost'] |
48 | auto, rllib, pettingzoo |
marl_der_arbitrage |
distribution / Case33bw |
multi | local_plus_forecast |
shared battery arbitrage reward; voltage / SOC violations in info['cost'] |
48 | auto, rllib, pettingzoo |
marl_ev_v2g |
distribution / Case33bw |
multi | local_plus_forecast |
shared EV arbitrage and departure-readiness reward; grid / EV violations in info['cost'] |
168 | auto, rllib, pettingzoo |
dc_scheduling |
distribution / Case33bw |
single | flattened |
objective-only single-agent energy-SLA-PUE reward; grid and datacenter thermal violations in info['cost_sum'] |
48 | gymnasium |
dc_microgrid |
self-contained DC microgrid | single | flattened |
scalarised r_energy + w_cost·r_cost + w_carbon·r_carbon; vector in info['reward_vector']; SLA / overtemp / power-deficit in info['cost'] |
288 | gymnasium |
dc_microgrid_safe |
self-contained DC microgrid | single | flattened |
same as dc_microgrid with CMDP cost_threshold = 0.5 |
288 | gymnasium |
marl_uc |
transmission / Case5 |
multi | global |
shared UC economic reward; physical violations in info['cost'] |
48 | auto, rllib, pettingzoo |
opf_118 |
transmission / Case118 |
multi | global |
shared large-scale economic dispatch reward; physical violations in info['cost'] |
48 | auto, rllib, pettingzoo |
opf_118_7d |
transmission / Case118 |
multi | global |
shared large-scale economic dispatch reward; physical violations in info['cost'] |
336 | auto, rllib, pettingzoo |
Internal atomic validation presets live under powerzoo.tasks.atomic, but they are intentionally not part of the public benchmark set.
Simple Tasks¶
powerzoo.tasks.simple.MARLOPFTask(case='Case5', split='train', start_date=None, end_date=None, delta_t_minutes=30, max_load_ratio=None, max_steps=48, action_mode='score', observation_mode='global', forecast_horizon_steps=4, constraint_tightness='standard', **kwargs)
¶
Bases: MultiAgentTask
Multi-Agent OPF Control Task
Multi-agent economic dispatch on the IEEE 5-bus system. Each generator learns to coordinate its output as an independent agent.
Initialize the MARL OPF task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
case
|
str
|
Grid case ('Case5', 'Case118', etc.) |
'Case5'
|
split
|
Optional[str]
|
Data split — 'train', 'val', or 'test'. Sets the
date range for episode sampling. Ignored when
|
'train'
|
start_date
|
Optional[str]
|
Explicit start date (overrides |
None
|
end_date
|
Optional[str]
|
Explicit end date (overrides |
None
|
delta_t_minutes
|
int
|
Time step in minutes. |
30
|
max_load_ratio
|
float
|
Maximum load ratio. |
None
|
max_steps
|
int
|
Max steps per episode (default 48 = 1 day @ 30 min). |
48
|
action_mode
|
str
|
'score' for softmax allocation, 'direct' for MW. |
'score'
|
observation_mode
|
str
|
One of 'global', 'local', 'local_plus_forecast'. |
'global'
|
forecast_horizon_steps
|
int
|
Forecast horizon used in local_plus_forecast mode. |
4
|
**kwargs
|
Other override parameters passed to Task base. |
{}
|
Key parameters
| Parameter | Default | Description |
|---|---|---|
case |
'Case5' |
Grid case name |
split |
'train' |
Data split: 'train', 'val', or 'test' |
action_mode |
'score' |
'score' (softmax allocation) or 'direct' (MW output) |
max_load_ratio |
0.9 |
Maximum load as fraction of total generation capacity |
max_steps |
48 |
Steps per episode (48 = 1 day at 30-min resolution) |
Agent design
- Action: score ∈ [0, 1] — softmax allocation of net load across generators
- Observation: global features (total load, line flows, time) + local features (unit index, p_min, p_max, cost coefficients)
- Reward: −(generation cost) / 1000 (shared, cooperative)
Data splits (non-overlapping, fixed for benchmark reproducibility)
| Split | Date range |
|---|---|
| train | 2023-07-05 – 2024-12-31 |
| val | 2025-01-01 – 2025-06-30 |
| test | 2025-07-01 – 2025-12-15 |
Middle Tasks¶
powerzoo.tasks.middle.MARLUCTask(case='Case5', split='train', start_date=None, end_date=None, delta_t_minutes=30, max_load_ratio=None, max_steps=48, observation_mode='global', forecast_horizon_steps=4, constraint_tightness='standard', **kwargs)
¶
Bases: MultiAgentTask
Multi-Agent Unit Commitment Task on the IEEE 5-bus system.
Initialise the MARL UC task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
case
|
str
|
Grid case. Default 'Case5'. |
'Case5'
|
split
|
Optional[str]
|
Data split ('train', 'val', 'test'). |
'train'
|
start_date
|
Optional[str]
|
Explicit start date (overrides split). |
None
|
end_date
|
Optional[str]
|
Explicit end date (overrides split). |
None
|
delta_t_minutes
|
int
|
Time step in minutes. Default 30. |
30
|
max_load_ratio
|
float
|
Max load as fraction of capacity. Defaults to tightness preset (0.9 for standard). |
None
|
max_steps
|
int
|
Steps per episode. Default 48. |
48
|
observation_mode
|
str
|
One of 'global', 'local', 'local_plus_forecast'. |
'global'
|
forecast_horizon_steps
|
int
|
Forecast horizon used in local_plus_forecast mode. |
4
|
constraint_tightness
|
str
|
One of 'loose', 'standard', 'strict'. |
'standard'
|
**kwargs
|
Passed to Task base. |
{}
|
Extends MARLOPFTask with unit commitment decisions. Each generator agent must decide both how much to generate and whether to be online.
Key parameters
| Parameter | Default | Description |
|---|---|---|
case |
'Case5' |
Grid case name |
split |
'train' |
Data split |
max_load_ratio |
0.9 |
Maximum load ratio |
max_steps |
48 |
Steps per episode |
UC defaults (used when not present in case.units columns)
| Column | Default | Unit |
|---|---|---|
startup_cost |
500 | $/start |
shutdown_cost |
200 | $/stop |
ramp_rate |
999 | MW/step |
min_up_time |
1 | steps |
min_down_time |
1 | steps |
Agent design
- Action:
[score, on_off]— 2-element vector;on_off ≥ 0.5commits the unit - Observation: global + local + commitment vector (current on/off status of all units)
- Reward: −(generation cost + startup cost + shutdown cost) / 1000 (economic only)
- Cost signal: physical constraint violations →
info['cost'](CMDP separation)
Complex Tasks¶
powerzoo.tasks.complex.OPF118Task(split='train', start_date=None, end_date=None, delta_t_minutes=30, max_load_ratio=None, max_steps=48, action_mode='score', observation_mode='global', forecast_horizon_steps=4, constraint_tightness='standard', **kwargs)
¶
Bases: MultiAgentTask
Multi-Agent OPF Control on the IEEE 118-bus system.
A complex-difficulty benchmark for large-scale cooperative dispatch.
Initialize the 118-bus OPF task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
split
|
Optional[str]
|
Data split — 'train', 'val', or 'test'. |
'train'
|
start_date
|
Optional[str]
|
Explicit start date (overrides |
None
|
end_date
|
Optional[str]
|
Explicit end date (overrides |
None
|
delta_t_minutes
|
int
|
Time step in minutes. Default 30. |
30
|
max_load_ratio
|
float
|
Maximum load as fraction of total capacity. Defaults to tightness preset (0.85 for standard). |
None
|
max_steps
|
int
|
Max steps per episode. Default 48 (1 day). |
48
|
action_mode
|
str
|
'score' (softmax allocation) or 'direct' (MW). |
'score'
|
constraint_tightness
|
str
|
One of 'loose', 'standard', 'strict'. |
'standard'
|
**kwargs
|
Passed to Task base. |
{}
|
Large-scale cooperative OPF on the IEEE 118-bus system with 54 generators and 186 transmission lines.
Key parameters
| Parameter | Default | Description |
|---|---|---|
split |
'train' |
Data split |
max_load_ratio |
0.85 |
Maximum load ratio (lower than 5-bus due to larger system) |
max_steps |
48 |
Steps per episode |
action_mode |
'score' |
'score' or 'direct' |
Agent design
- 54 cooperative agents (one per generator)
- Same score-based action / OPF observation protocol as
MARLOPFTask - Shared cooperative reward
OPF118Task7Days¶
7-day (336-step) variant of OPF118Task. All parameters are inherited; max_steps defaults to 336.