Environment stack¶
PowerZoo is built as a stack of layers that can each be used in isolation. The lower layers know nothing about RL; the upper layers add the conventions that RL agents and benchmarks rely on. This page focuses on the runtime layers — see Repository map for the source-tree view.
flowchart TD
subgraph TaskL ["Task layer (powerzoo/tasks)"]
T["Task definitions (simple / middle / complex)"]
A["Task adapters (OPF / UC / Resource / EV / DSO)"]
end
subgraph OrchL ["Orchestration (powerzoo/envs/power_env.py)"]
PE["PowerEnv: grid + resources + reward + clock"]
end
subgraph PhyL ["Physical envs (powerzoo/envs/grid · resource · market · microgrid)"]
G["GridEnv: TransGridEnv · DistGridEnv · DistGrid3PhaseEnv"]
R["ResourceEnv: BatteryEnv · VehicleEnv · SolarEnv · WindEnv · FlexLoad · DataCenterEnv"]
M["MarketEnv: CostBasedMarketEnv · BidBasedMarketEnv · GenCosMARLEnv"]
MG["DCMicrogridEnv (self-contained)"]
end
subgraph DataL ["Data + cases (powerzoo/case · powerzoo/data)"]
C["ClearCase (Case5 / Case33bw / ...)"]
D["DataLoader (semantic signals + parquet)"]
end
subgraph WrapL ["Wrappers (powerzoo/wrappers)"]
W["Gymnasium · Normalization · Forecast · SafeRL · Flatten · MARL"]
end
T --> A --> PE
PE --> G
PE --> R
G --> R
G --> C
G --> D
M --> G
A --> W
The rest of this page describes each runtime layer in 1–2 short paragraphs and links to the page that documents the details.
1. BaseEnv — common interface¶
BaseEnv (powerzoo/envs/base.py) is the abstract parent of every PowerZoo environment. It inherits from gymnasium.Env and adds two PowerZoo-specific attributes:
| Attribute / method | Purpose |
|---|---|
time_step |
Step counter inside the current episode. |
delta_t_minutes |
Step length in minutes (must divide 1440). Default 30. |
action_space / observation_space |
Filled in by subclasses. |
reset(seed, options) |
Resets time_step and returns (state, info) (subclass-specific). |
step(action) |
Subclass-specific. Returns Gymnasium-style 5-tuples at the task layer. |
obs() / reward() / cost() |
Hooks; cost() defaults to 0 (CMDP-friendly). |
Subclasses do not store mutable state in arbitrary attributes — GridEnv and ResourceEnv keep their state in well-defined fields (case data, current_p_mw, soc, …) so that resets are reproducible.
2. GridEnv — physical grids¶
GridEnv (powerzoo/envs/grid/base.py) is the abstract parent for transmission and distribution grids. Its job is to:
- Hold a
ClearCase(topology + per-asset parameters). - Maintain a registry of attached resources (
sub_resources,nodes_resources_map). - Pull the next slice of demand / renewable time series from the bundled data.
- Run a power-flow solve and produce a
statedict plus a structuredinfodict.
The reset → step data flow is the same across all grid types:
flowchart LR
R["env.reset(day_id)"] --> S0["state_0\n(loads, profile sliced)"]
S0 --> Step["env.step(action)"]
Step --> Res["resource.step()\n→ current_p_mw"]
Res --> Agg["nodes_resources_map @ current_p_mw\n(aggregate per bus)"]
Agg --> PF["_run_power_flow → bool"]
PF --> Chk["safety_check + reward"]
Chk --> Out["state_t+1, reward, term, trunc, info"]
Three concrete subclasses ship today:
| Class | Solver | Default case | Notes |
|---|---|---|---|
TransGridEnv |
DC / AC × OPF / PF (4 modes) | Case5 |
physics ∈ {dc, ac} × solver_mode ∈ {opf, pf}. The OPF LP backend is selected by solver_type ∈ {auto, gurobi, scipy, cvxpy}. |
DistGridEnv |
Single-phase BFS (DistFlow) | Case33bw |
Resources modelled as PQ injections; non-radial inputs auto-pruned to spanning tree. |
DistGrid3PhaseEnv |
Three-phase BIBC/BCBV | Case123 |
Per-phase voltages, VUF and phase-aware thermal limits. |
The full per-class parameter table is in Physics · Transmission and Physics · Distribution. The info schema and CMDP cost rules live in Reward and cost split.
3. ResourceEnv — controllable assets¶
ResourceEnv (powerzoo/envs/resource/base.py) is the parent of every controllable asset. Resources are physical sub-components, not standalone RL envs: a resource's step() updates internal state (SOC, temperature, queue, …) but does not produce a reward or termination signal on its own. The Gymnasium step tuple plus CMDP cost vector in info is assembled by PowerEnv plus a Task.
| Class | Action dim | Notes |
|---|---|---|
BatteryEnv |
1D (or 2D with enable_q_control) |
SOC integrator with charge/discharge efficiency; one-way eta_charge=eta_discharge=0.95 by default. |
VehicleEnv |
1D | Battery + commute schedule + departure SOC requirement (G2V / V2G). |
SolarEnv / WindEnv |
1D (curtailment) | Profile-driven, optional reactive control. |
FlexLoad |
2D | Curtailment + demand shifting with a deferred-demand buffer (DR resource). |
DataCenterEnv |
3D | GPU train / finetune ratios + cooling setpoint, with first-order thermal dynamics. |
Cost convention. Any key in resource.status() whose name starts with cost_ is treated as a non-negative CMDP cost component (in physical units). PowerEnv collects them and folds them into info['cost_resource'] and the fixed-order constraint_costs vector; info['cost_resource_violation'] is a legacy alias — see Reward and cost split. This is the only registration step needed when adding a new resource cost.
Detailed parameter tables live in Physics · Resources and API · Resources.
4. PowerEnv — orchestration¶
PowerEnv (powerzoo/envs/power_env.py) binds a grid and a list of resources into a single Gymnasium-style env. It owns the unified episode clock, builds dict observations (grid + per-resource observation + time features), accepts dict actions (unit_power_mw + per-resource), and produces a fully-augmented info dict that aggregates the cost contributions from both layers.
sequenceDiagram
participant Agent
participant Task
participant PowerEnv
participant Grid
participant Resource
Agent->>Task: action (per agent)
Task->>PowerEnv: scenario action dict
PowerEnv->>Resource: resource.step(action_i)
Resource-->>PowerEnv: status (incl. cost_*)
PowerEnv->>Grid: _run_power_flow(action)
Grid-->>PowerEnv: state, info (cost_thermal, cost_voltage, ...)
PowerEnv-->>Task: obs, reward, terminated, truncated, info + constraint_costs
Task-->>Agent: per-agent obs/reward/info + selected_constraint_costs
PowerEnv.from_yaml(path) builds the same object from a YAML scenario config. Most users never instantiate PowerEnv directly; they use make_task_env(...) or powerzoo.rl.make_env(...), which build the appropriate PowerEnv and adapter automatically.
5. Tasks and adapters¶
A task (powerzoo/tasks/) is a benchmark preset: a scenario config (grid + resources), an agent design (per-agent obs / action), a reward / cost contract, an evaluation protocol and (when applicable) SPLIT_DATES for fixed train / val / test windows. Tasks are organised by difficulty:
simple/—battery_arbitrage,marl_opf,marl_der_arbitrage,marl_ders_benchmark,marl_ev_v2g,dc_scheduling,dc_microgrid,dc_microgrid_safe,gencos_bidding.middle/—marl_uc,comparison_tso_centralized.complex/—opf_118,opf_118_7d,joint_trans_dist*(experimental).
A task adapter (powerzoo/tasks/adapters/) translates the task into a concrete RL-facing env (e.g. TaskOPFMultiAgentEnv, TaskUCMultiAgentEnv, TaskResourceMultiAgentEnv, TaskEVMultiAgentEnv).
The envs / tasks / wrappers boundary, per-task adapter routing, and the public benchmark set (stable vs experimental) are all documented in Python contract §6–§7. Per-task hyperparameter tables live in API · Tasks.
6. Wrappers¶
powerzoo/wrappers/ provides generic env-API adapters. They assume the task contract is already satisfied and do not own task semantics. The full set — GymnasiumWrapper, NormalizationWrapper, ForecastWrapper, SafeRLWrapper, GymnasiumSafeWrapper, MARLWrapper, TaskPettingZooWrapper, FlattenWrapper — is documented in Training · Wrappers; per-class signatures are in API · Wrappers.
7. Optional layers¶
The layers above describe the default benchmark stack. Several optional layers extend it:
- Markets —
CostBasedMarketEnv,BidBasedMarketEnvandGenCosMARLEnvadd LMP-driven settlement on top ofTransGridEnv. See Physics · Markets and Benchmarks · GenCos. - DC microgrid —
DCMicrogridEnvis a self-contained behind-the-meter benchmark with no external grid; it composesDataCenterEnv+BatteryEnv+ inline PV + diesel. See Physics · Microgrid and Benchmarks · DC microgrid. - DSO benchmark —
make_dso_env(...)is a separate factory that assemblesCase33bw+ 6×FlexLoad+ Ausgrid time series for distribution-system-operator research. See Benchmarks · DSO. - RL trainer —
powerzoo.rlprovidesmake_env,RLConfig,Trainer,RewardWrapperanddescribe/infofor a one-line training workflow. See Training · Trainers.
See also¶
- Repository map — source-tree view of the same layers.
- Data pipeline — how
case/anddata/actually feed an env. - Training pipeline — env → wrappers → trainer flow.
- Python contract, Reward and cost split.