Benchmarks overview¶
PowerZoo's public benchmark set is organised around five agent-centric task suites. Each suite targets a different RL research question and uses a different underlying environment, agent structure, action space and constraint regime; any two suites therefore differ on at least four of these dimensions.
The five suites¶
| Suite | Pillar | Underlying env | Agents | Steps × Δt | RL paradigm |
|---|---|---|---|---|---|
| TSO | Security dispatch | Transmission Case5 / Case118 | 1 (UC) / 54 (opf_118) | 48 × 30 min (336 × 30 min for 7d) | Safe RL with mixed discrete-continuous action |
| DSO | DER control | Distribution Case33bw + 6× FlexLoad + Ausgrid | 1 | 48 × 30 min | Non-stationary RL (loss minimisation) |
| DERs | DER control | Distribution Case33bw / Case118zh + heterogeneous DERs | 3 / 5 / 12 | 48 × 30 min (168 for EV) | Scalable safe MARL |
| DC microgrid | DER control | Self-contained DC microgrid | 1 | 288 × 5 min | Multi-objective robust RL |
| GenCos | Market lite | Transmission Case5 + market clearing | 5 | 48 × 30 min | Competitive MARL |
Each suite targets a different RL paradigm, but all five share the same make_task_env interface and the same reward / cost split. No single algorithm performs best on all five suites; this is by design.
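A minimal sketch of that shared loop is shown below. It assumes make_task_env is exposed from powerzoo.tasks, a Gymnasium-style step signature, and an info field named "cost" for the constraint signal; all three are assumptions for illustration, not documented guarantees.

```python
# Sketch only: task construction plus one episode, reading reward and cost separately.
# Assumptions: make_task_env lives in powerzoo.tasks, steps return the Gymnasium
# 5-tuple, and the constraint cost is reported under info["cost"].
from powerzoo.tasks import make_task_env

env = make_task_env("dc_microgrid")              # any task id from the tables below
obs, info = env.reset(seed=0)
terminated = truncated = False
episode_reward, episode_cost = 0.0, 0.0
while not (terminated or truncated):
    action = env.action_space.sample()           # placeholder policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_reward += reward
    episode_cost += info.get("cost", 0.0)        # kept separate from reward (CMDP framing)
print(episode_reward, episode_cost)
```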
```mermaid
flowchart LR
    SD["Security Dispatch"] --> TSO[TSO]
    DERC["Intertemporal DER Control"] --> DSO[DSO]
    DERC --> DERS[DERs]
    DERC --> DCM[DC microgrid]
    MKT["Market Lite"] --> GC[GenCos]
```
Public PowerZoo task names per suite¶
| Suite | PowerZoo task(s) |
|---|---|
| TSO | marl_uc (UC, Case5), opf_118, opf_118_7d (large-scale ED, Case118) |
| DSO | make_dso_env(...) factory (assembled outside the standard registry) |
| DERs | marl_der_arbitrage (Case33bw, 3 batteries), marl_ders_benchmark (Case118zh, 12 heterogeneous DERs), marl_ev_v2g (5 EVs) |
| DC microgrid | dc_microgrid, dc_microgrid_safe |
| GenCos | gencos_bidding |
Smaller starter tasks sit outside the main set but remain in the public API: battery_arbitrage, marl_opf (5-bus MARL ED), dc_scheduling. They are the recommended first targets for unit testing and quick iteration; see Examples.
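A pytest-style smoke test over the single-agent starter tasks could look like the sketch below. It assumes make_task_env is importable from powerzoo.tasks and that these tasks follow the Gymnasium reset/step convention; the multi-agent marl_opf task would need a multi-agent rollout loop instead.

```python
# Sketch of a smoke test for quick iteration on the starter tasks.
# Assumptions: make_task_env is importable from powerzoo.tasks and the
# single-agent starter tasks expose a Gymnasium-style API.
import pytest
from powerzoo.tasks import make_task_env

@pytest.mark.parametrize("task_name", ["battery_arbitrage", "dc_scheduling"])
def test_starter_task_smoke(task_name):
    env = make_task_env(task_name)
    obs, info = env.reset(seed=0)
    for _ in range(10):                          # a handful of steps is enough for CI
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
    env.close()
```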
How to read a benchmark page¶
Every per-suite page follows the same template:
- Design intent and research question — what RL difficulty this suite is built for.
- Physical setup — case, resources, constraints.
- Agent design — observation, action, reward, cost.
- Variants — the in-suite difficulty ladder.
- Splits — train / val / test (or train / iid / OOD for DSO).
- Baselines — random, rule-based, and any oracle reference.
- Metrics — what to report.
- Code recipes — make_task_env(name) + a minimal training stub; a sketch of such a stub follows this list.
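The stub below is shown with stable-baselines3 purely for illustration and is not the PowerZoo trainer API (PowerZoo's own trainers are covered under Training · Trainers). It assumes the chosen task is single-agent and Gymnasium-compatible.

```python
# Illustrative training stub only; not the PowerZoo trainer API.
# Assumes a single-agent, Gymnasium-compatible task.
from stable_baselines3 import PPO
from powerzoo.tasks import make_task_env

env = make_task_env("battery_arbitrage")     # a starter task keeps iteration fast
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)
```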
Difficulty axes across the five suites¶
The suites together span five orthogonal difficulty axes. You can cite this matrix in a paper to argue that the benchmark is diverse:
| Axis | Easiest | → | Hardest |
|---|---|---|---|
| Information structure | TSO (centralised full info) | DSO (single-agent non-stationary) → DERs (Dec-POMDP) | GenCos (private-info competition) |
| Action complexity | DSO (continuous) | DC (multi-dim continuous) → DERs (per-agent joint) → TSO (mixed) | GenCos (multi-agent strategic) |
| Safety regime | GenCos (profit-driven) | DSO (operational quality) → DC (SLA + thermal + power balance) | DERs (hard voltage) → TSO (full safety) |
| Exogenous uncertainty | TSO (intra-day variation) | DSO (drift) → DERs (renewable) | DC (workload + solar + thermal) → GenCos (+ adaptive opponents) |
| Objective complexity | GenCos (profit) | DSO (loss + serve) → TSO (cost + safety) | DERs (safety + fairness) → DC (3-vector + constraints) |
Each axis has a monotone ordering across the five suites, so picking different suites gives different RL difficulty rather than the same problem under different names.
What the public-API helpers report¶
```python
from powerzoo.tasks import PUBLIC_TASKS, list_public_tasks, get_public_task_catalog

print(PUBLIC_TASKS)
print(list_public_tasks())

catalog = get_public_task_catalog()
print(catalog[0]['task_id'], catalog[0]['default_episode_horizon_steps'])
```
PUBLIC_TASKS includes the five main suites plus the smaller starter tasks. The DSO task is the only main task absent from PUBLIC_TASKS: it is assembled through its own make_dso_env(...) factory rather than the standard registry.
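For example, the catalog can be enumerated directly; the sketch below relies only on the two fields shown in the snippet above.

```python
# Print every registry-backed task with its default episode horizon.
from powerzoo.tasks import get_public_task_catalog

for entry in get_public_task_catalog():
    print(entry['task_id'], entry['default_episode_horizon_steps'])
```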
See also¶
- Concepts · Overview — the three pillars view.
- Python contract — env API every benchmark obeys.
- Reward and cost split — CMDP framing common to all benchmarks.
- Training · Trainers — how to train an agent on any of these tasks in one line.