Python API Contract
PowerZoo follows a small set of stable conventions on top of Gymnasium / PettingZoo / RLlib. This page is the authoritative description of those conventions; other pages assume them.
The contract has four layers:
- The base env interface (BaseEnv).
- Single-agent task envs (Gymnasium 5-tuple).
- Multi-agent task envs (PettingZoo Parallel or RLlib MultiAgentEnv).
- The framework / observation / cost contract that every task obeys.
The reward / cost split is central enough to have its own page: Reward and cost split.
1. BaseEnv — the abstract parent
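BaseEnv (powerzoo/envs/base.py) inherits from gymnasium.Env and adds two PowerZoo-specific attributes, time_step and delta_t_minutes. The table lists them alongside the standard spaces, methods and hooks: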
| Attribute / method | Purpose |
|---|---|
| time_step | Step counter inside the current episode. |
| delta_t_minutes | Step length in minutes (must divide 1440). Default 30. |
| action_space / observation_space | Filled in by subclasses. |
| reset(seed, options) | Resets time_step and returns (state, info) (subclass-specific). |
| step(action) | Subclass-specific. Returns Gymnasium-style 5-tuples at the task layer. |
| obs() / reward() / cost() | Hooks; cost() defaults to 0 (CMDP-friendly). |
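The requirement that delta_t_minutes divides 1440 means a day always contains a whole number of steps. A minimal sketch of that check follows; `steps_per_day` is a hypothetical helper written for illustration and is not part of PowerZoo.

```python
MINUTES_PER_DAY = 1440


def steps_per_day(delta_t_minutes: int = 30) -> int:
    """Number of steps in one day (hypothetical helper, not a PowerZoo API)."""
    if delta_t_minutes <= 0 or MINUTES_PER_DAY % delta_t_minutes != 0:
        raise ValueError(
            f"delta_t_minutes={delta_t_minutes} must be a positive divisor of 1440"
        )
    return MINUTES_PER_DAY // delta_t_minutes


print(steps_per_day())    # 48 steps with the default 30-minute step
print(steps_per_day(15))  # 96
```

A step length such as 7 minutes is rejected because 1440 is not a multiple of 7.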
Subclasses do not store mutable state in arbitrary attributes. GridEnv and ResourceEnv keep their state in well-defined fields (case data, current_p_mw, soc, …) so that resets are reproducible.
BaseEnv is used directly by GridEnv, ResourceEnv, PowerEnv, MarketEnv and DCMicrogridEnv. The first three are described in Architecture · Environment stack; the last two have dedicated pages under Physics and Physics · Microgrid.
2. Single-agent task envs (Gymnasium 5-tuple)
Single-agent tasks (battery_arbitrage, dc_scheduling, dc_microgrid, dc_microgrid_safe) return a standard gymnasium.Env. Use the usual loop:
```python
from powerzoo.tasks import make_task_env

env = make_task_env("battery_arbitrage", split="train")
obs, info = env.reset(seed=0)

terminated = truncated = False
while not (terminated or truncated):
    # Random policy; replace with your agent's action.
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```
The flat observation is produced by stacking FlattenWrapper on top of PowerEnv (which itself binds a GridEnv and one or more ResourceEnv instances). info carries the full physical breakdown described in §4.
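Because reward and cost travel separately, an evaluation loop usually accumulates them separately as well. The sketch below shows one way to do that against any Gymnasium-style 5-tuple env. `StubEnv` and `run_episode` are hypothetical names introduced here so the example runs without PowerZoo installed, and reading the cost from `info["cost_sum"]` assumes the field described in §4 is present.

```python
class StubEnv:
    """Stand-in for a Gymnasium-style env (hypothetical, not a PowerZoo class)."""

    def __init__(self, horizon=4):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return [0.0], {}

    def step(self, action):
        self.t += 1
        reward = -abs(action)
        info = {"cost_sum": max(0.0, action - 0.5)}  # toy violation signal
        truncated = self.t >= self.horizon
        return [float(self.t)], reward, False, truncated, info


def run_episode(env, policy, seed=0):
    """Roll out one episode; return (total reward, total cost) as separate sums."""
    obs, info = env.reset(seed=seed)
    total_reward, total_cost = 0.0, 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        total_cost += info.get("cost_sum", 0.0)
    return total_reward, total_cost


print(run_episode(StubEnv(), policy=lambda obs: 1.0))  # (-4.0, 2.0)
```

With a real task, the env returned by make_task_env would take the place of `StubEnv()`.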
3. Multi-agent task envs
Multi-agent tasks have two compatible interfaces. Both share identical reward, cost and observation semantics — they only differ in what step() returns and how done is signalled.
| framework= | Returned env | When to use |
|---|---|---|
| 'auto' (default) | Specialised task adapter (RLlib-compatible when ray is installed) | Default; works without RLlib too. The episode ends when terminateds.get("__all__") or truncateds.get("__all__") is true. |
| 'pettingzoo' | Task-aware PettingZoo Parallel API wrapper around the same adapter | Use the while env.agents: idiom; the wrapper clears env.agents when the episode ends. |
| 'rllib' | Same as 'auto', but raises if ray[rllib] is missing | Make the dependency explicit in production runs. |
PettingZoo path:
```python
from powerzoo.tasks import make_task_env

env = make_task_env("marl_opf", framework="pettingzoo")
obs, info = env.reset(seed=42)
while env.agents:
    # One random action per live agent.
    actions = {a: env.action_space(a).sample() for a in env.agents}
    obs, rewards, terms, truncs, info = env.step(actions)
```
Auto / RLlib path:
```python
from powerzoo.tasks import make_task_env

env = make_task_env("marl_opf", framework="auto")
obs, infos = env.reset(seed=42)

terminated = truncated = False
while not (terminated or truncated):
    actions = {a: env.action_space[a].sample() for a in env.possible_agents}
    obs, rewards, terms, truncs, infos = env.step(actions)
    # The episode-level flags live under the "__all__" key.
    terminated = bool(terms.get("__all__", False))
    truncated = bool(truncs.get("__all__", False))
```
The PettingZoo bridge that powers framework='pettingzoo' lives in powerzoo/tasks/interfaces/pettingzoo.py and is re-exported as powerzoo.wrappers.TaskPettingZooWrapper for backward compatibility. It must preserve the underlying adapter's reward, cost and observation semantics; only the shape of what step() returns and the way episode end is signalled may differ.
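To make the difference between the two interfaces concrete, the sketch below wraps a stub adapter that signals episode end through `"__all__"` and exposes the `while env.agents:` idiom instead. `StubAdapter` and `ParallelBridge` are hypothetical classes written for this page; they are not the actual TaskPettingZooWrapper and only illustrate the termination-signalling translation.

```python
class StubAdapter:
    """Stand-in for an 'auto'-style multi-agent adapter (hypothetical)."""

    possible_agents = ["gen_0", "gen_1"]

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        obs = {a: [0.0] for a in self.possible_agents}
        return obs, {a: {} for a in self.possible_agents}

    def step(self, actions):
        self.t += 1
        done = self.t >= self.horizon
        obs = {a: [float(self.t)] for a in actions}
        rewards = {a: -abs(x) for a, x in actions.items()}
        terms = {a: False for a in actions}
        truncs = {a: done for a in actions}
        terms["__all__"], truncs["__all__"] = False, done
        return obs, rewards, terms, truncs, {a: {} for a in actions}


class ParallelBridge:
    """Illustrative bridge: same data, PettingZoo-style end-of-episode signal."""

    def __init__(self, adapter):
        self.adapter = adapter
        self.possible_agents = list(adapter.possible_agents)
        self.agents = []

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        return self.adapter.reset(seed=seed, options=options)

    def step(self, actions):
        obs, rewards, terms, truncs, infos = self.adapter.step(actions)
        # Drop the episode-level key and translate it into an empty agent list.
        all_term = terms.pop("__all__", False)
        all_trunc = truncs.pop("__all__", False)
        if all_term or all_trunc:
            self.agents = []
        return obs, rewards, terms, truncs, infos


env = ParallelBridge(StubAdapter())
obs, infos = env.reset(seed=42)
steps = 0
while env.agents:
    obs, rewards, terms, truncs, infos = env.step({a: 0.0 for a in env.agents})
    steps += 1
print(steps)  # 3
```

Rewards, observations and per-agent flags pass through unchanged; only the episode-level signal is re-expressed.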
4. The info dict
Every grid env populates info with at least the following fields. Tasks and PowerEnv may add fields, but must never remove any.
| Key | Type | Meaning |
|---|---|---|
| is_safe | bool | All physical limits satisfied this step. |
| pf_converged | bool | Power-flow solver converged. |
| cost_exception | bool | An exception was raised inside the PF solve. |
| cost_thermal_overload | float (MW) | Sum of line-flow over-limits. |
| cost_voltage_violation | float (pu) | Sum of bus-voltage out-of-band magnitudes. |
| cost_sum | float | Total physical violation cost (sum of cost_* from grid + resources). |
| p_slack_MW / q_slack_MVAr | float | Slack-bus active / reactive injection (distribution: feeder-head exchange). |
| is_diverged | bool | BFS hit max_iter before reaching tolerance. |
| voltage_collapse | bool | Severe unclamped low-voltage detected (treated as PF failure). |
PowerEnv then aggregates resource cost contributions into the same dict — the full data flow is in Reward and cost split.
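A contract like this is easy to check mechanically. The sketch below validates an info dict against the keys and types in the table. `REQUIRED_INFO_FIELDS` and `check_info` are hypothetical names, not PowerZoo APIs, and the strict built-in type checks are an assumption: an env that returns NumPy scalars (for example `numpy.bool_`) would need a looser test.

```python
# Keys and types copied from the table above.
REQUIRED_INFO_FIELDS = {
    "is_safe": bool,
    "pf_converged": bool,
    "cost_exception": bool,
    "cost_thermal_overload": float,
    "cost_voltage_violation": float,
    "cost_sum": float,
    "p_slack_MW": float,
    "q_slack_MVAr": float,
    "is_diverged": bool,
    "voltage_collapse": bool,
}


def check_info(info):
    """Return a list of contract problems; an empty list means the dict conforms."""
    problems = []
    for key, expected in REQUIRED_INFO_FIELDS.items():
        if key not in info:
            problems.append(f"missing key: {key}")
            continue
        value = info[key]
        if expected is bool:
            ok = isinstance(value, bool)
        else:
            # Accept ints as numeric values, but never a bool in a float slot.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not ok:
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems


sample = {
    "is_safe": True, "pf_converged": True, "cost_exception": False,
    "cost_thermal_overload": 0.0, "cost_voltage_violation": 0.0, "cost_sum": 0.0,
    "p_slack_MW": 12.5, "q_slack_MVAr": 3.1,
    "is_diverged": False, "voltage_collapse": False,
    "extra_task_field": 1,  # extra keys are allowed by the contract
}
print(check_info(sample))  # []
```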
5. Observation modes
PowerZoo defines five canonical observation modes. The list lives in code at powerzoo.tasks.observation:
```python
from powerzoo.tasks.observation import OBSERVATION_MODES

print(OBSERVATION_MODES)
# ('global', 'local', 'local_plus_forecast', 'local_plus_voltage', 'ders_local')
```
Each mode is a tuple of feature names that the task adapter materialises into a per-agent Box observation. Tasks declare which modes they support via make_observation_config(...); you can inspect the actual field order via get_observation_fields() on the adapter.
| Mode | What the agent sees | Used as |
|---|---|---|
| global | Shared grid summary (total load, normalized line flows, time features) plus the agent's immutable parameters. | Easiest setting; closest to centralised training (CTDE). |
| local | Only the agent's own state and adjacent grid signals. No system-wide summary. | Hardest default setting; typically only solvable with learned communication. |
| local_plus_forecast | local plus the task's declared forecast window (load and / or price and / or availability). | Medium setting; lets you study how forecast quality affects performance. |
| local_plus_voltage | local plus a per-feeder or per-zone voltage summary. | Distribution-side tasks where voltage is the binding safety signal but you do not want to give away the global state. |
| ders_local | Compact local layout for heterogeneous DER fleets, in a uniform per-agent vector typed by resource role. | marl_ders_benchmark and similar mixed-resource tasks. |
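Since a mode is a tuple of feature names, turning it into an observation amounts to reading the named values in a fixed order. The sketch below illustrates the idea under stated assumptions: the feature names in `MODE_FIELDS` are invented for the example (the real field order comes from get_observation_fields() on the adapter), and `materialise` is a hypothetical helper that returns a plain list rather than a Box-shaped array.

```python
# Hypothetical feature names, chosen only to illustrate the layout.
MODE_FIELDS = {
    "local": ("own_p_mw", "own_soc", "bus_voltage_pu"),
    "local_plus_forecast": (
        "own_p_mw", "own_soc", "bus_voltage_pu",
        "load_forecast_t1", "load_forecast_t2",
    ),
}


def materialise(mode, features):
    """Flatten named features into a vector in the mode's declared field order."""
    fields = MODE_FIELDS[mode]
    missing = [name for name in fields if name not in features]
    if missing:
        raise KeyError(f"features missing for mode {mode!r}: {missing}")
    return [float(features[name]) for name in fields]


features = {
    "own_p_mw": 1.5, "own_soc": 0.4, "bus_voltage_pu": 0.98,
    "load_forecast_t1": 20.0, "load_forecast_t2": 22.0,
}
print(materialise("local", features))                # [1.5, 0.4, 0.98]
print(len(materialise("local_plus_forecast", features)))  # 5
```

In this toy layout the forecast mode extends the local vector rather than reordering it, so a policy trained on `local` sees the same leading entries.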
The default mode for each public task lives on the task class itself; get_public_task_info(name)['default_observation_mode'] reports the current value.
| Task | Default mode | Other modes typically supported |
|---|---|---|
| marl_opf, marl_uc, opf_118, opf_118_7d | global | local, local_plus_forecast |
| marl_der_arbitrage | local_plus_forecast | local, local_plus_voltage |
| marl_ders_benchmark | ders_local | local, local_plus_voltage |
| marl_ev_v2g | local_plus_forecast | local |
| battery_arbitrage, dc_scheduling, dc_microgrid* | flattened (single-agent) | n/a |
You can convert one task into a difficulty ladder simply by switching modes:
```python
from powerzoo.tasks import make_task_env

easy = make_task_env("marl_opf", split="train", obs_mode="global")
medium = make_task_env("marl_opf", split="train", obs_mode="local_plus_forecast")
hard = make_task_env("marl_opf", split="train", obs_mode="local")
```
6. The envs / tasks / wrappers boundary
PowerZoo keeps three packages strictly separated, each owning a single concern:
```mermaid
flowchart LR
    subgraph envs ["envs/ — physical simulation"]
        E1[GridEnv / ResourceEnv / PowerEnv]
        E2[time progression, PF, cost_*]
    end
    subgraph tasks ["tasks/ — benchmark presets + adapters"]
        T1[Task definitions]
        T2[Adapters\n(OPF / UC / Resource / EV)]
        T3[Public benchmark set + registry]
    end
    subgraph wrappers ["wrappers/ — generic API adaptation"]
        W1[Gymnasium / Flatten / Normalize]
        W2[SafeRL / Forecast / MARL]
    end
    envs --> tasks
    tasks --> wrappers
```
The arrows are one-directional. envs/ must work standalone (no tasks/ import). wrappers/ must not patch task-specific bugs — those go in tasks/ or envs/. Adapters in powerzoo/tasks/adapters/ parse task-level actions, call the underlying PowerEnv, package per-agent info, cost and costs, and expose get_observation_fields(). They do not re-implement physics.
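The rule that envs/ must not import tasks/ can be checked statically. The sketch below scans Python source for absolute imports of a forbidden package using the standard `ast` module. `forbidden_imports` is a hypothetical helper, the sample source (including the name `get_task`) is made up for the demonstration, and relative imports are deliberately skipped, which a real check would need to resolve.

```python
import ast


def forbidden_imports(source, forbidden_prefixes):
    """Return absolutely imported module names that fall under a forbidden prefix."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import (not resolved here)
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in forbidden_prefixes):
                hits.append(name)
    return hits


# Made-up module text standing in for a file under powerzoo/envs/.
sample = """
import numpy as np
from powerzoo.envs.base import BaseEnv
from powerzoo.tasks.registry import get_task
"""
print(forbidden_imports(sample, ("powerzoo.tasks",)))
```

Run over every file under envs/, a non-empty result would flag a violation of the one-directional arrows.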
7. Public benchmark set
The explicit, stable public benchmark set is powerzoo.tasks.public.PUBLIC_TASKS:
```python
from powerzoo.tasks import PUBLIC_TASKS, list_public_tasks, get_public_task_catalog

print(PUBLIC_TASKS)
print(list_public_tasks())

catalog = get_public_task_catalog()
print(catalog[0]['task_id'], catalog[0]['default_episode_horizon_steps'])
```
To stay in PUBLIC_TASKS, a task must:
- Be registered in powerzoo.tasks.registry.
- Be documented and instantiable via make_task_env(name, split=...).
- Be smoke-tested on at least one episode for each of train / val / test.
- Be consistent with the contract on this page.
Registered-but-incomplete tasks (joint_trans_dist*, atomic validation presets, …) remain accessible through list_tasks() and make_task_env(...) but are not part of the public benchmark set.