Grid Environments¶
powerzoo.envs.base.BaseEnv(delta_t_minutes=1.0)
¶
Bases: Env, ABC
Base environment for PowerZoo.
Provides:
- np_random: Gymnasium's seeded RNG, initialised through
super().reset(seed=seed, options=options).
- observation_space / action_space placeholders (None until
subclass initialises them in __init__).
- Abstract step / obs methods that subclasses must implement.
- Default reward / cost hooks for reward-shaping and CMDP cost.
Seed contract¶
Subclasses should call super().reset(seed=seed, options=options) as
the first line of their reset() implementation. This delegates RNG
management to Gymnasium and preserves its seed=None semantics: reuse
the current generator when one already exists.
reset(*, seed=None, options=None)
¶
Apply common Gymnasium reset bookkeeping.
The base implementation seeds np_random (via super().reset),
resets time_step to 0, and returns the placeholder pair
(None, {}).
Subclass contract¶
- Call `super().reset(seed=seed, options=options)` as the first line; this seeds `self.np_random` and resets the clock.
- After `super()`, build and return a valid `(observation, info)` pair that satisfies `self.observation_space`.
- Never return the base placeholder `(None, {})` from a concrete environment; downstream wrappers and RL libraries will fail on `None` observations.
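The contract above can be sketched with a minimal stand-in; the `_BaseEnvLike` class below mimics Gymnasium's seeding bookkeeping and is illustrative only, not the PowerZoo implementation:

```python
import numpy as np

class _BaseEnvLike:
    """Stand-in for BaseEnv's reset bookkeeping (illustrative only)."""

    def reset(self, *, seed=None, options=None):
        # Gymnasium semantics: seed=None reuses the existing generator.
        if seed is not None or not hasattr(self, "np_random"):
            self.np_random = np.random.default_rng(seed)
        self.time_step = 0
        return None, {}

class ToyEnv(_BaseEnvLike):
    def reset(self, *, seed=None, options=None):
        # First line: delegate seeding and the clock reset to the base class.
        super().reset(seed=seed, options=options)
        # Then build and return a real (observation, info) pair.
        obs = self.np_random.uniform(-1.0, 1.0, size=4).astype(np.float32)
        return obs, {"time_step": self.time_step}
```

Two resets with the same seed then produce identical observations, which is exactly the reproducibility the seed contract is meant to guarantee.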
step(action)
abstractmethod
¶
Apply action and return (obs, reward, terminated, truncated, info).
Subclasses are responsible for advancing self.time_step inside
step() and for computing truncated from their own episode-limit
policy (for example max_episode_steps).
Recommended implementation order¶
- Action decode: map the RL agent's action (e.g. normalised values in `[-1, 1]`) to physical setpoints (MW, MVar, °C, ...).
- Physics: run power flow / OPF / dynamic simulation for the current time step.
- State extract: read voltages, flows, costs, and resource state-of-charge from the solver result.
- Observation: call `self.obs(state)` to produce the agent-facing observation array.
- Reward / cost: compute scalar reward and, if applicable, CMDP cost terms; populate `info` with `cost_*` keys.
- Clock advance: increment `self.time_step`; set `truncated` when the episode step limit is reached.
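A minimal `step()` skeleton following this order might look like the following; the `ToyGrid` class and its one-line "solver" are hypothetical placeholders, not the PowerZoo API:

```python
import numpy as np

class ToyGrid:
    """Illustrative step() skeleton; the 'solver' is a one-line placeholder."""

    def __init__(self, max_episode_steps=48):
        self.time_step = 0
        self.max_episode_steps = max_episode_steps

    def obs(self, state):
        return np.array([state["p_mw"]], dtype=np.float32)

    def step(self, action):
        # 1) Action decode: [-1, 1] -> physical setpoint in MW.
        p_mw = 50.0 * float(action)
        # 2) Physics + 3) state extract (placeholder for a real solve).
        state = {"p_mw": p_mw, "cost": 0.1 * abs(p_mw)}
        # 4) Observation.
        obs = self.obs(state)
        # 5) Reward / cost; CMDP cost terms go into info under cost_* keys.
        reward = -state["cost"]
        info = {"cost_overload": 0.0}
        # 6) Clock advance and truncation policy.
        self.time_step += 1
        truncated = self.time_step >= self.max_episode_steps
        return obs, reward, False, truncated, info
```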
obs(state)
abstractmethod
¶
Convert internal state dict to a flat observation array.
Subclasses should implement this to transform the internal full physical state into the agent-facing observation, which may be local, partial, noisy, or otherwise filtered relative to the simulator state.
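As a sketch, a flat observation can be assembled by concatenating selected state fields into one `float32` vector; the field names `v_pu` and `line_loading` below are illustrative, not the actual PowerZoo state schema:

```python
import numpy as np

def obs(state):
    """Flatten selected fields of a nested state dict into one float32 vector."""
    parts = [
        np.asarray(state["v_pu"], dtype=np.float32),          # bus voltages
        np.asarray(state["line_loading"], dtype=np.float32),  # branch loadings
    ]
    return np.concatenate(parts)
```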
reward(r, state, info)
¶
Return the scalar reward after optional reward-shaping adjustments.
This is a lightweight base-class hook for reward shaping. Concrete environments or wrappers can override it to transform the scalar reward while leaving the main transition logic elsewhere.
For non-trivial reward shaping, prefer using a gym.Wrapper
instead; this hook is intended as a lightweight escape hatch for
environment-internal adjustments only.
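A minimal shaping override might look like this; the `v_dev` penalty term is a hypothetical example, not part of the PowerZoo state:

```python
class BaseEnvLike:
    """Stand-in carrying the default identity reward hook."""

    def reward(self, r, state, info):
        return r

class ShapedEnv(BaseEnvLike):
    def reward(self, r, state, info):
        # Subtract a small penalty proportional to a hypothetical
        # voltage-deviation entry in the state dict.
        return r - 0.1 * abs(state.get("v_dev", 0.0))
```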
powerzoo.envs.grid.base.GridEnv(case=None, solver=None, delta_t_minutes=30.0, data_loader=None, start_date='2024-01-01', end_date='2024-01-31', load_columns=None, max_load_ratio=0.9, min_load_ratio=None, time_series=None, max_episode_steps=None, randomize_start_time=False, time_alignment=None)
¶
Bases: BaseEnv
Base power grid environment (inner physical simulation engine).
GridEnv acts as the physical state machine layer of the PowerZoo
stack. Its reset() and step() return a raw state dict (the full
power-flow solution) rather than a flat numpy observation array, because the
state is consumed by upper-layer components that each need different subsets
of it:
- `PowerEnv`: the benchmark-facing RL facade. Calls `grid.step()` to advance the simulation, then passes the returned state dict to `grid.obs(state)` and to each resource's `obs()` to assemble the agent observation.
- `ObsWrapper` / `gym_wrappers.GymWrapper`: thin Gym-compatibility wrappers that call `env.obs(state)` directly on a `GridEnv` subclass for single-grid use-cases that don't need `PowerEnv`.
Because of this design, passing a bare GridEnv subclass to standard Gym
validation tools (e.g. gymnasium.utils.env_checker.check_env) will
produce an observation-type warning — the raw state dict does not conform to
observation_space. Always wrap the env in PowerEnv or a Gym wrapper
before running a standard RL training loop or check.
Subclasses must implement¶
- `_run_power_flow(action)`: run the solver; return `True` on success.
- `_get_state()`: build and return the full state dict after a solve.
- `obs(state)`: project the state dict to a flat `float32` array matching `observation_space`.
- `build_info(state)`: build the step info dict.
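A skeleton subclass implementing the four hooks could look like this; all bodies are placeholders, and `_GridEnvLike` merely stands in for `GridEnv`:

```python
import numpy as np

class _GridEnvLike:
    """Stand-in for GridEnv (the real base class lives in PowerZoo)."""

class MyGrid(_GridEnvLike):
    def _run_power_flow(self, action):
        return True  # run the solver; True on success

    def _get_state(self):
        # Full state dict after a solve (placeholder values).
        return {"v_pu": [1.0, 0.98], "time_step": 0}

    def obs(self, state):
        # Project the state dict to a flat float32 array.
        return np.asarray(state["v_pu"], dtype=np.float32)

    def build_info(self, state):
        return {"pf_converged": True}
```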
Initialize grid environment
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `case` | `Any` | Power system case | `None` |
| `solver` | `Any` | Power flow solver | `None` |
| `delta_t_minutes` | `float` | Time step in minutes (default: 30) | `30.0` |
| `data_loader` | `Optional[DataLoader]` | DataLoader instance. If None, creates default DataLoader. Ignored when … | `None` |
| `start_date` | `str` | Start date for data loading (format: 'YYYY-MM-DD'). Ignored when … | `'2024-01-01'` |
| `end_date` | `str` | End date for data loading (format: 'YYYY-MM-DD'). Ignored when … | `'2024-01-31'` |
| `load_columns` | `Optional[List[str]]` | Semantic signal names to load. Default: … | `None` |
| `max_load_ratio` | `float` | Maximum load as ratio of total generation capacity (default: 0.9) | `0.9` |
| `min_load_ratio` | `Optional[float]` | Minimum load as ratio of total generation capacity (default: None) | `None` |
| `time_series` | `Any` | Custom time-series data supplied directly by the user. Accepted formats: … | `None` |
| `max_episode_steps` | `Optional[int]` | Maximum steps per episode before … | `None` |
| `randomize_start_time` | `bool` | When True, choose a random intra-day starting offset at each reset instead of always beginning at … | `False` |
| `time_alignment` | `Optional[Dict[str, str]]` | Per-signal time-alignment overrides for cross-period data. Example: … | `None` |
reset(*, seed=None, options=None, day_id=None)
¶
Reset grid environment and all sub-resources.
Follows the Gymnasium v26+ convention: seed is the primary
reproducibility parameter; day_id (legacy kwarg) or
options={'day_id': N} both work for selecting a specific day.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `seed` | `Optional[int]` | Random seed. Passed through … | `None` |
| `options` | `Optional[Dict]` | Optional dict. Recognised keys: … | `None` |
| `day_id` | `Optional[int]` | Legacy kwarg; takes precedence over … | `None` |
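The documented precedence between the legacy kwarg and the options dict can be expressed as a small helper; `resolve_day_id` is illustrative only, not a PowerZoo function:

```python
def resolve_day_id(day_id=None, options=None):
    """Mirror the documented precedence: the legacy day_id kwarg wins
    over options={'day_id': N}."""
    if day_id is not None:
        return day_id
    if options:
        return options.get("day_id")
    return None
```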
step(action=None)
¶
Execute one time step
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `action` | `Any` | Dict containing: resource actions (key=resource_id, value=resource action) and grid control parameters (subclass-specific, e.g. `unit_power_mw`, `node_load_mw`) | `None` |

Returns:

| Type | Description |
|---|---|
| | `state, reward, done, truncated, info` |
obs(state=None)
¶
Convert a raw state dict to a flat float32 observation array.
This method is part of GridEnv's public API and is called by:
- `PowerEnv._build_agent_observation(state)`: assembles the combined grid + resource + time observation for the RL agent.
- `GymWrapper` (`powerzoo.wrappers.gym_wrappers`): provides a thin Gym-compliant wrapper for direct single-grid use.
Subclasses must implement this and define self.observation_space
to match the returned array's shape and dtype. The default raises
NotImplementedError.
register_resource(resource, bus_id, name=None)
¶
Register a resource and assign unique ID.
Calls _on_resource_changed() after updating internal state so that
subclasses can rebuild observation_space and action_space.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `resource` | `Any` | Resource instance to register | required |
| `bus_id` | `int` | Bus ID where resource is connected | required |
| `name` | `Optional[str]` | Optional custom name. If None, auto-generated (e.g. 'solar_0') | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `resource_id` | `str` | The assigned resource ID |
unregister_resource(resource_id)
¶
Unregister a resource.
Calls _on_resource_changed() after updating internal state.
cal_pf(*args, **kwargs)
¶
Run power flow and return result dict (must be implemented).
If the solver does not converge, implementations should return a
result dict with converged=False (or equivalent flag) rather
than raising an exception, so that downstream safety_check and
reward/cost logic can handle divergence gracefully.
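The contract can be sketched as a wrapper around a solver callable; the helper name and the caught exception type are assumptions, not PowerZoo API:

```python
def cal_pf_with_contract(solve):
    """Wrap a solver callable so divergence yields converged=False
    instead of an exception."""
    try:
        result = solve()
    except RuntimeError:  # assumed failure mode of the solver
        return {"converged": False}
    result["converged"] = True
    return result
```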
safety_check(*args, **kwargs)
¶
Check physical constraints (must be implemented).
Implementations must be robust to non-converged or NaN-laden states
(e.g. when cal_pf did not converge). In such cases the method
should report maximum violation / unsafe status instead of crashing.
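A NaN-robust check over bus voltages might look like the following sketch; `safety_check_sketch` is illustrative, not the PowerZoo implementation:

```python
import numpy as np

def safety_check_sketch(v_pu, v_min=0.9, v_max=1.1):
    """Voltage-limit check that treats NaN-laden (non-converged) states
    as maximally unsafe instead of crashing."""
    v = np.asarray(v_pu, dtype=float)
    if np.isnan(v).any():
        return {"safe": False, "max_violation": float("inf")}
    violation = np.maximum(v - v_max, v_min - v).max()
    return {"safe": bool(violation <= 0.0),
            "max_violation": float(max(violation, 0.0))}
```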
powerzoo.envs.grid.trans.TransGridEnv(case=None, solver=None, delta_t_minutes=30.0, data_loader=None, start_date='2024-01-01', end_date='2024-01-31', load_columns=None, max_load_ratio=0.9, min_load_ratio=None, time_series=None, max_episode_steps=None, randomize_start_time=False, physics='dc', solver_mode='opf', solver_type='auto', normalize_actions=True, ac_config=None, difficulty=None, reward_scale=0.01, control_der=False)
¶
Bases: GridEnv
Transmission grid environment.
Two orthogonal parameters control the solver behaviour:
- `physics`: `'dc'` (linearised, P only) or `'ac'` (full AC with voltage and reactive power).
- `solver_mode`: `'opf'` (environment runs OPF internally; RL agent provides bids / commitments) or `'pf'` (environment evaluates power flow only; RL agent provides unit dispatch directly).
The four resulting modes are DCOPF, ACOPF, DCPF, and ACPF.
Default case is Case5.
Initialize transmission grid environment.
Two orthogonal parameters control the solver behaviour:
- `physics`: physical model, `'ac'` (full AC equations) or `'dc'` (linearised DC approximation).
- `solver_mode`: solver role, `'opf'` (environment runs optimal power flow internally; RL agent provides bids / commitments) or `'pf'` (environment only evaluates power flow; RL agent provides unit dispatch directly via `action['unit_power_mw']`).
The four combinations are:
| physics | solver_mode | Solver | RL use-case |
|---|---|---|---|
| dc | opf | DCOPF | Agent learns bidding / UC strategy |
| ac | opf | ACOPF | Agent learns bidding (with voltage) |
| dc | pf | DCPF | Agent learns dispatch (P only) |
| ac | pf | ACPF (NR) | Agent learns dispatch (P + V + Q) |
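For the `pf` modes, where the agent supplies dispatch directly, a typical `[-1, 1]`-to-MW decode in the spirit of `normalize_actions=True` is sketched below; the exact PowerZoo mapping may differ:

```python
import numpy as np

def decode_dispatch(action, p_min_mw, p_max_mw):
    """Affine map from a normalised action in [-1, 1] to MW setpoints."""
    a = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
    return p_min_mw + 0.5 * (a + 1.0) * (p_max_mw - p_min_mw)
```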
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `case` | `ClearCase` | Power system case (default: Case5) | `None` |
| `solver` | `Any` | Power flow solver | `None` |
| `delta_t_minutes` | `float` | Time step in minutes (default: 30) | `30.0` |
| `data_loader` | `Optional[DataLoader]` | DataLoader instance | `None` |
| `start_date` | `str` | Start date for data loading | `'2024-01-01'` |
| `end_date` | `str` | End date for data loading | `'2024-01-31'` |
| `load_columns` | `Optional[List[str]]` | List of columns to load | `None` |
| `max_load_ratio` | `float` | Maximum load as ratio of total capacity | `0.9` |
| `min_load_ratio` | `Optional[float]` | Minimum load as ratio of total capacity (optional) | `None` |
| `time_series` | `Any` | Custom time-series data | `None` |
| `max_episode_steps` | `Optional[int]` | Max steps before truncation (default: one full day) | `None` |
| `physics` | `str` | Physical model: … | `'dc'` |
| `solver_mode` | `str` | Solver role: … | `'opf'` |
| `solver_type` | `str` | OPF LP solver: 'auto', 'gurobi', 'scipy', 'cvxpy' | `'auto'` |
| `normalize_actions` | `bool` | Whether to normalise actions to [-1, 1] | `True` |
| `ac_config` | `ACConfig` | AC solver parameters as an `ACConfig` instance | `None` |
| `difficulty` | `Optional[str]` | Preset difficulty level: 'easy', 'medium', or 'hard'. When set, overrides … | `None` |
| `reward_scale` | `float` | Multiplicative scaling factor applied to generation cost in the reward signal (…) | `0.01` |
| `control_der` | `bool` | When … | `False` |
reset(*, seed=None, options=None, day_id=None)
¶
Reset transmission grid and run initial power flow.
obs(state=None)
¶
Convert current (or provided) state to a flat float32 observation array.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `state` | `Any` | Internal state dict (as returned by …) | `None` |

Returns:

| Type | Description |
|---|---|
| `ndarray` | `numpy.ndarray` of shape … |
cal_pf(unit_power_mw, node_load_mw, df=False)
¶
Thin wrapper; see `trans_solve.cal_pf`.
Side effects: updates self._power_imbalance_mw and
self._slack_gen_violation_mw.
safety_check(line_flow_mw, with_info=False)
¶
Thin wrapper; see `trans_solve.safety_check`.
render(mode='human')
¶
Render the transmission grid state.
Produces a two-panel figure (network topology + unit dispatch chart).
See `powerzoo.envs.grid._render.render_trans_grid` for details.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `mode` | `str` | | `'human'` |
powerzoo.envs.grid.dist.DistGridEnv(case=None, solver=None, delta_t_minutes=30.0, data_loader=None, start_date='2024-01-01', end_date='2024-01-31', load_columns=None, max_load_ratio=0.9, min_load_ratio=None, time_series=None, max_iter=100, tol=1e-06, max_episode_steps=None, randomize_start_time=False, v_slack=1.0, v_min=0.9, v_max=1.1, allow_mesh_pruning=True, difficulty=None, violation_penalty_weight=0.0, v_dev_penalty_weight=0.0, loss_penalty_weight=0.1)
¶
Bases: _DistPFMixin, _DistLoadsMixin, GridEnv
Distribution grid environment using BFS (Backward/Forward Sweep) power flow.
Default case is Case33bw; handles both active (P) and reactive (Q) power. The physical core is a single-phase balanced radial DistFlow solver: explicit voltage-angle states, phase coupling, and unbalance are outside this benchmark surface.
Naming conventions (consistent with MATPOWER):
- `baseMVA`, `baseKV`: system base values
- `slack_bus_id`: slack/reference bus index (0-based); alias: `ref_bus`
- `v_slack`: slack bus voltage setpoint (p.u.); alias: `v_ref_mag`
- `p_flow_MW`: sending-end (from-bus) active power flow on each branch
- `p_loss_MW`: active power loss (I²R) on each branch
- `p_slack_MW`: total slack-bus active-power injection into the feeder
- `q_slack_MVAr`: total slack-bus reactive-power injection into the feeder
- `is_diverged`: BFS failed to satisfy the iteration tolerance before hitting `max_iter` (distinct from voltage collapse)
Resource control mode
All non-slack buses are treated as PQ buses by the BFS solver.
When a resource is registered via register_resource(), it operates
in PQ control mode: its P/Q setpoints are subtracted from the nodal
load before the BFS solve.
Action space: only gymnasium.spaces.Box (continuous) is supported.
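The PQ control mode described above amounts to subtracting resource setpoints from the nodal load before the BFS solve. A sketch follows; the field names and the `(bus_index, P, Q)` tuple layout are illustrative, not the PowerZoo data model:

```python
import numpy as np

def net_nodal_load(p_load_mw, q_load_mvar, resources):
    """Subtract resource P/Q setpoints from the nodal load (PQ mode).

    resources: iterable of (bus_index, p_set_mw, q_set_mvar) tuples.
    """
    p = np.array(p_load_mw, dtype=float)
    q = np.array(q_load_mvar, dtype=float)
    for bus, p_set, q_set in resources:
        p[bus] -= p_set   # generation counts as negative load
        q[bus] -= q_set
    return p, q
```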
Initialize distribution grid environment
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `case` | `ClearCase` | ClearCase instance (default: Case33bw) | `None` |
| `solver` | `Any` | Optional external solver | `None` |
| `delta_t_minutes` | `float` | Time step length in minutes (default: 30) | `30.0` |
| `data_loader` | `Optional[DataLoader]` | DataLoader for external time-series data | `None` |
| `start_date` | `str` | Start date for data loading | `'2024-01-01'` |
| `end_date` | `str` | End date for data loading | `'2024-01-31'` |
| `load_columns` | `Optional[List[str]]` | Columns to load from DataLoader. Distribution envs also honour an optional … | `None` |
| `max_load_ratio` | `float` | Peak load as fraction of case capacity (default: 0.9) | `0.9` |
| `min_load_ratio` | `Optional[float]` | Minimum load ratio (optional) | `None` |
| `time_series` | `Any` | Custom time-series (numpy array or DataFrame) | `None` |
| `max_iter` | `int` | Maximum iterations for power flow (default: 100) | `100` |
| `tol` | `float` | Convergence tolerance (default: 1e-6) | `1e-06` |
| `max_episode_steps` | `Optional[int]` | Max steps per episode before truncation | `None` |
| `randomize_start_time` | `bool` | Randomize intra-day start offset on reset | `False` |
| `v_slack` | `float` | Slack bus voltage setpoint in p.u. (default: 1.0) | `1.0` |
| `v_min` | `float` | Minimum voltage limit (p.u., default: 0.90) | `0.9` |
| `v_max` | `float` | Maximum voltage limit (p.u., default: 1.10) | `1.1` |
| `allow_mesh_pruning` | `bool` | If True (default), extra lines in a non-radial input are pruned to the BFS first-visit spanning tree with a warning. If False, env initialization fails fast on mesh input. | `True` |
| `difficulty` | `Optional[str]` | Preset: 'easy', 'medium', or 'hard'. Overrides v_min/v_max. | `None` |
| `violation_penalty_weight` | `float` | Weight for soft-penalty mode. When > 0, … | `0.0` |
| `v_dev_penalty_weight` | `float` | Weight for voltage-deviation penalty. When > 0, adds … | `0.0` |
| `loss_penalty_weight` | `float` | Weight on active-loss reward shaping. Default reward uses … | `0.1` |
reset(*, seed=None, options=None, day_id=None)
¶
Reset distribution grid and run initial power flow.
obs(state=None)
¶
Return flat float32 observation array.
When state is provided, its nodes/lines/time_step are used instead
of the live cache (useful for replaying a past step).
On PF failure the agent receives a penalty observation (below-normal
voltages, zero flows) rather than the pre-divergence state, so that
the catastrophic reward is correctly paired with an out-of-band
observation. See _obs_should_use_failure_fallback() for trigger
conditions.
Observation layout and normalisation: see _build_spaces().
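The failure fallback can be pictured as an out-of-band observation with below-normal voltages and zero flows; the shapes and the 0.5 p.u. filler value below are illustrative, not the actual fallback:

```python
import numpy as np

def failure_observation(n_bus, n_line):
    """Out-of-band penalty observation: below-normal voltages, zero flows."""
    v = np.full(n_bus, 0.5, dtype=np.float32)    # clearly below v_min
    flows = np.zeros(n_line, dtype=np.float32)
    return np.concatenate([v, flows])
```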
render(mode='human')
¶
Render the distribution grid state.
Produces a two-panel figure (radial network + voltage profile chart).
See `powerzoo.envs.grid._render.render_dist_grid` for details.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `mode` | `str` | | `'human'` |
powerzoo.envs.grid.dist_3phase.DistGrid3PhaseEnv(case=None, solver=None, delta_t_minutes=30.0, data_loader=None, start_date='2024-01-01', end_date='2024-01-31', load_columns=None, max_load_ratio=0.9, min_load_ratio=None, time_series=None, max_iter=100, tol=1e-06, max_episode_steps=None, randomize_start_time=False, v_min=0.9, v_max=1.1, vuf_max=2.0, difficulty=None, violation_penalty_weight=0.0, loss_penalty_weight=0.1, vuf_dense_penalty_weight=0.0)
¶
Bases: _Dist3PhasePhysicsMixin, _Dist3PhaseLoadsMixin, DistGridEnv
Three-phase distribution grid environment using BIBC/BCBV power flow.
Default case is Case123; handles three-phase unbalanced power flow.
The solver stores a full three-state A/B/C vector at every non-slack
bus, so branch phase availability must already be reflected in the case's
3x3 impedance data rather than inferred from a separate missing-phase
topology.
Solver vectors use node-major ABC order
[node1_A, node1_B, node1_C, node2_A, ...]; the explicit node mapping is
available via self.topo3ph.non_ref_node_ids and
self.topo3ph.node_id_to_matrix_index.
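Given node-major ABC ordering, the vector index of a given node/phase entry is `3 * matrix_index + phase_offset`; a small illustrative helper:

```python
def phase_index(matrix_index, phase):
    """Index of a node/phase entry in the node-major ABC vector
    [node1_A, node1_B, node1_C, node2_A, ...]."""
    return 3 * matrix_index + {"A": 0, "B": 1, "C": 2}[phase]
```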
Resource phase injection
Resources carry a phase attribute ('A', 'B', 'C', or 'ABC').
Power is injected only into the connected phase(s), enabling the
RL agent to learn phase-balancing strategies.
Non-convergence contract
cal_pf() still returns the last BFS iterate for debugging, but those
voltages/flows/losses are diagnostic only. RL-facing callers must check
self._converged / info['pf_converged'] and treat a False value
as a power-flow failure rather than a valid operating point.
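A caller-side guard following this contract might look like the following; the helper and its fallback policy are illustrative, not PowerZoo API:

```python
def require_converged(state, info, fallback_state):
    """Return the fallback when info['pf_converged'] is False, so the
    diagnostic-only BFS iterate is never mistaken for a valid state."""
    if not info.get("pf_converged", False):
        return fallback_state, False
    return state, True
```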
Initialize three-phase distribution grid environment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `case` | `ClearCase` | ClearCase instance (default: Case123) | `None` |
| `solver` | `Any` | Optional external solver | `None` |
| `delta_t_minutes` | `float` | Time step length in minutes (default: 30) | `30.0` |
| `data_loader` | `Optional[DataLoader]` | DataLoader for external time-series data | `None` |
| `start_date` | `str` | Start date for data loading | `'2024-01-01'` |
| `end_date` | `str` | End date for data loading | `'2024-01-31'` |
| `load_columns` | `Optional[List[str]]` | Columns to load from DataLoader | `None` |
| `max_load_ratio` | `float` | Peak load as fraction of case capacity (default: 0.9) | `0.9` |
| `min_load_ratio` | `Optional[float]` | Minimum load ratio (optional) | `None` |
| `time_series` | `Any` | Custom time-series (numpy array or DataFrame) | `None` |
| `max_iter` | `int` | Maximum iterations for power flow (default: 100) | `100` |
| `tol` | `float` | Convergence tolerance (default: 1e-6) | `1e-06` |
| `max_episode_steps` | `Optional[int]` | Max steps per episode before truncation | `None` |
| `randomize_start_time` | `bool` | Randomize intra-day start offset on reset | `False` |
| `v_min` | `float` | Minimum voltage limit (p.u., default: 0.90) | `0.9` |
| `v_max` | `float` | Maximum voltage limit (p.u., default: 1.10) | `1.1` |
| `vuf_max` | `float` | Maximum voltage unbalance factor (%, default: 2.0). IEEE Std 1159 / EN 50160 typical limit is 2%. | `2.0` |
| `difficulty` | `Optional[str]` | Preset: 'easy', 'medium', or 'hard'. Overrides v_min/v_max. | `None` |
| `violation_penalty_weight` | `float` | Weight for soft-penalty mode (default: 0.0) | `0.0` |
| `loss_penalty_weight` | `float` | Weight on active-loss reward shaping … | `0.1` |
| `vuf_dense_penalty_weight` | `float` | Weight for dense VUF penalty (default: 0.0). When > 0, adds … | `0.0` |