Examples

PowerZoo ships script examples in examples/*.py, mirrored as the four short pages in this section. They are deliberately low-level: they show the underlying grid + resource API and how the pieces fit together before task wrappers are applied.

How to Read These Examples

flowchart LR
    A["01 — Load a case\n(static topology)"] --> B["02 — Run power flow\n(physics step)"]
    B --> C["03 — Register resources\n(grid + battery / EV / ...)"]
    C --> D["RL — Battery control\n(custom Gym wrapper + PPO)"]
    D --> E["Training and evaluation\nmake_task_env / powerzoo.rl"]

In short:

  • Examples 01–03 rebuild the stack from the ground up. They are useful when designing a new task or debugging the physics; none of them use make_task_env.
  • RL — Battery control is the bridge: the same low-level wiring, trained with a real RL algorithm (see Training · Trainers).
  • For benchmark experiments, skip the low-level wiring and use make_task_env(...) (e.g. make_task_env('battery_arbitrage')) or powerzoo.rl.make_env(...) together with the training pages; a minimal sketch follows this list.
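
As a quick orientation, here is one reset/step round-trip through a task wrapper. This is a hedged sketch: it assumes make_task_env is importable from powerzoo.tasks (the same module as the catalog helpers shown further down) and that single-agent tasks expose a Gymnasium-style API; the MARL tasks use a different stepping convention.

from powerzoo.tasks import make_task_env  # import path assumed; see API · Tasks

# Task wrapper: observation, reward, and episode horizon are already defined.
env = make_task_env("battery_arbitrage")
obs, info = env.reset()

# Random action, just to show one transition (Gymnasium-style API assumed).
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())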

Online RL benchmark — the basics

PowerZoo is an online RL benchmark for power-system environments. Agents learn from live interaction with the simulator rather than from a pre-collected dataset. The normalized score linearly maps mean episode return between a random-policy baseline (0) and an oracle baseline (1):

normalized_score = (policy_score − random_score) / (oracle_score − random_score)

0 = random, 1 = oracle; values > 1 are possible if a policy beats the oracle heuristic.
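
A worked instance of the formula (the return values here are made up purely for illustration):

def normalized_score(policy_score, random_score, oracle_score):
    # Linear map: random baseline -> 0, oracle baseline -> 1.
    return (policy_score - random_score) / (oracle_score - random_score)

# Hypothetical mean episode returns (negative, e.g. costs):
print(normalized_score(policy_score=-120.0, random_score=-300.0, oracle_score=-100.0))
# -> 0.9: the policy recovers 90% of the random-to-oracle gap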

| Feature | CartPole / MuJoCo | Atari | Offline RL datasets | PowerZoo |
|---|---|---|---|---|
| Domain | Robotics / physics | Games | Robotics / locomotion | Power systems |
| RL type | Online | Online | Offline (static dataset) | Online |
| Agent structure | Single | Single | Single | MARL-first[^pz-agents] |
| Action space | Continuous / Discrete | Discrete | Continuous | Continuous |
| Physical constraints | Soft (joint limits) | None | Soft | Hard (grid physics) |
| Real-world data | No | No | Yes (logged) | Yes (bundled real grid traces) |
| Normalized score | No | Yes (human = 1) | Yes (expert = 1) | Yes (oracle OPF = 1) |

[^pz-agents]: PowerZoo's public benchmark set includes both single-agent (battery_arbitrage, dc_scheduling, dc_microgrid*) and multi-agent tasks. Use list_public_tasks() for the authoritative list.

Core starter tasks

The four cards below follow the recommended learning path: single-agent distribution → transmission MARL → distribution MARL (batteries) → distribution MARL (EVs).

  • Single-battery arbitrage
    • Grid: IEEE 33-bus distribution (Case33bw)
    • Agent: 1 — continuous charge / discharge
    • Goal: peak / off-peak arbitrage with SOC kept in band
    • Episode: 48 steps × 30 min = 1 day
    • Difficulty: simple
    env = make_task_env("battery_arbitrage")

  • MARL economic dispatch (OPF)
    • Grid: IEEE 5-bus transmission
    • Agents: 5 generators, score-based action
    • Goal: minimise generation cost under line limits
    • Episode: 48 steps × 30 min = 1 day
    • Difficulty: simple
    env = make_task_env("marl_opf")

  • MARL battery arbitrage (DER)
    • Grid: IEEE 33-bus distribution
    • Agents: 3 batteries (buses 6 / 12 / 18)
    • Goal: peak / valley arbitrage with voltage and SOC limits
    • Episode: 48 steps × 30 min = 1 day
    • Difficulty: simple
    env = make_task_env("marl_der_arbitrage")

  • EV fleet V2G
    • Grid: IEEE 33-bus distribution
    • Agents: 5 EVs with V2G / G2V
    • Goal: arbitrage profit + departure SOC ≥ 80%
    • Episode: 168 steps × 60 min = 1 week
    • Difficulty: simple
    env = make_task_env("marl_ev_v2g")

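The three MARL cards above share a per-agent stepping convention. The sketch below is an assumption-heavy illustration: it supposes a PettingZoo-style parallel API (actions and rewards keyed by agent id); check the task pages for the exact signatures before relying on it.

from powerzoo.tasks import make_task_env  # import path assumed; see API · Tasks

env = make_task_env("marl_opf")
obs, infos = env.reset()

# One joint step: one action per agent id (PettingZoo-style API assumed).
actions = {agent: env.action_space(agent).sample() for agent in env.agents}
obs, rewards, terminations, truncations, infos = env.step(actions)
print(sum(rewards.values()))  # team reward for this step
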
The full public task list — including marl_uc, opf_118, opf_118_7d, dc_scheduling, dc_microgrid, dc_microgrid_safe, marl_ders_benchmark and gencos_bidding — lives in API · Tasks. Run list_public_tasks() for the authoritative list at any time:

from powerzoo.tasks import list_public_tasks, get_public_task_catalog

# Task ids only:
print(list_public_tasks())

# Full catalog: one metadata card per task.
for card in get_public_task_catalog():
    print(card['task_id'], card['grid_case'], card['default_episode_horizon_steps'])
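
To pull out just the four starter tasks above, filter the catalog by task_id (field names as printed in the loop above):

from powerzoo.tasks import get_public_task_catalog

starter_ids = {"battery_arbitrage", "marl_opf", "marl_der_arbitrage", "marl_ev_v2g"}
starter_cards = [c for c in get_public_task_catalog() if c["task_id"] in starter_ids]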

Fixed Data Splits

These tasks share the same non-overlapping date splits, driven by the bundled GB demand trace (GB_Forecast_Actual_Demand_2023_2025_30min):

| Split | Date range | Purpose |
|---|---|---|
| train | 2023-07-05 – 2024-12-31 | Algorithm training |
| val | 2025-01-01 – 2025-06-30 | Hyperparameter tuning |
| test | 2025-07-01 – 2025-12-15 | Official benchmark evaluation |

The DSO benchmark (make_dso_env(...)) uses Ausgrid splits instead — see Benchmarks · DSO.

train_env = make_task_env("marl_opf", split="train")
test_env  = make_task_env("marl_opf", split="test")
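
The intended protocol: train on train, tune hyperparameters on val, and touch test only for the final reported score. Below is a minimal evaluation helper for plugging mean returns into the normalized-score formula; it assumes a Gymnasium-style single-agent loop (the MARL tasks return per-agent dicts instead) and the same make_task_env import path as above.

from powerzoo.tasks import make_task_env  # import path assumed; see API · Tasks

def mean_return(env, policy, episodes=10):
    # Average undiscounted episode return over a fixed number of episodes.
    totals = []
    for _ in range(episodes):
        obs, info = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        totals.append(total)
    return sum(totals) / len(totals)

# Random policy on the test split, e.g. as the random_score baseline:
test_env = make_task_env("battery_arbitrage", split="test")
print(mean_return(test_env, policy=lambda obs: test_env.action_space.sample()))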