
Grid Environments

powerzoo.envs.base.BaseEnv(delta_t_minutes=1.0)

Bases: Env, ABC

Base environment for PowerZoo.

Provides:

  • np_random: Gymnasium's seeded RNG, initialised through super().reset(seed=seed, options=options).
  • observation_space / action_space placeholders (None until the subclass initialises them in __init__).
  • Abstract step / obs methods that subclasses must implement.
  • Default reward / cost hooks for reward shaping and CMDP cost.

Seed contract

Subclasses should call super().reset(seed=seed, options=options) as the first line of their reset() implementation. This delegates RNG management to Gymnasium and preserves its seed=None semantics: reuse the current generator when one already exists.

reset(*, seed=None, options=None)

Apply common Gymnasium reset bookkeeping.

The base implementation seeds np_random (via super().reset), resets time_step to 0, and returns the placeholder pair (None, {}).

Subclass contract
  • Call super().reset(seed=seed, options=options) as the first line — this seeds self.np_random and resets the clock.
  • After super(), build and return a valid (observation, info) pair that satisfies self.observation_space.
  • Never return the base placeholder (None, {}) from a concrete environment — downstream wrappers and RL libraries will fail on None observations.
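
The contract above can be sketched without PowerZoo installed. StubBaseEnv below is a hypothetical stand-in that only imitates the documented behaviour of BaseEnv.reset (seed the RNG, reset time_step, return (None, {})). It uses Python's random.Random in place of Gymnasium's np_random, and ToyEnv with its three-voltage observation is an illustrative assumption, not part of the library.

```python
import random


class StubBaseEnv:
    """Hypothetical stand-in for BaseEnv: seeds an RNG and resets the clock."""

    def __init__(self):
        self.rng = None
        self.time_step = 0

    def reset(self, *, seed=None, options=None):
        # Imitate the seed=None semantics: keep the current generator if one exists.
        if seed is not None or self.rng is None:
            self.rng = random.Random(seed)
        self.time_step = 0
        return None, {}  # placeholder pair, never returned by a concrete env


class ToyEnv(StubBaseEnv):
    def reset(self, *, seed=None, options=None):
        # First line of reset(): delegate seeding and clock reset to the base.
        super().reset(seed=seed, options=options)
        # Then build a real (observation, info) pair instead of (None, {}).
        observation = [self.rng.uniform(0.95, 1.05) for _ in range(3)]
        return observation, {"time_step": self.time_step}
```

Calling reset(seed=0) twice on the same ToyEnv should give identical observations, while reset() without a seed keeps drawing from the existing generator.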

step(action) abstractmethod

Apply action and return (obs, reward, terminated, truncated, info).

Subclasses are responsible for advancing self.time_step inside step() and for computing truncated from their own episode-limit policy (for example max_episode_steps).

  1. Action decode — map the RL agent's action (e.g. normalised values in [-1, 1]) to physical setpoints (MW, MVar, °C …).
  2. Physics — run power flow / OPF / dynamic simulation for the current time step.
  3. State extract — read voltages, flows, costs, and resource state-of-charge from the solver result.
  4. Observation — call self.obs(state) to produce the agent-facing observation array.
  5. Reward / cost — compute scalar reward and, if applicable, CMDP cost terms; populate info with cost_* keys.
  6. Clock advance — increment self.time_step; set truncated when the episode step limit is reached.
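
Step 1 of the list above (action decode) is typically an affine rescaling. The sketch below assumes a linear map from [-1, 1] onto [low, high] with clipping of out-of-range actions. The function name decode_action, the clipping, and the unit limits are illustrative assumptions rather than PowerZoo's actual implementation.

```python
def decode_action(a, low, high):
    """Map a normalised action a in [-1, 1] to a physical setpoint in [low, high]."""
    a = max(-1.0, min(1.0, float(a)))  # assumption: clip out-of-range actions
    return low + 0.5 * (a + 1.0) * (high - low)


# Hypothetical (min, max) limits in MW for a three-unit system.
limits_mw = [(0.0, 100.0), (10.0, 50.0), (0.0, 200.0)]
action = [-1.0, 0.0, 1.0]
setpoints_mw = [decode_action(a, lo, hi) for a, (lo, hi) in zip(action, limits_mw)]
```

With this mapping, -1 lands on the lower limit, +1 on the upper limit, and 0 on the midpoint of each range.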

obs(state) abstractmethod

Convert internal state dict to a flat observation array.

Subclasses should implement this to transform the internal full physical state into the agent-facing observation, which may be local, partial, noisy, or otherwise filtered relative to the simulator state.

reward(r, state, info)

Return the scalar reward after optional reward-shaping adjustments.

This is a lightweight base-class hook for reward shaping. Concrete environments or wrappers can override it to transform the scalar reward while leaving the main transition logic elsewhere.

For non-trivial reward shaping, prefer using a gym.Wrapper instead; this hook is intended as a lightweight escape hatch for environment-internal adjustments only.


powerzoo.envs.grid.base.GridEnv(case=None, solver=None, delta_t_minutes=30.0, data_loader=None, start_date='2024-01-01', end_date='2024-01-31', load_columns=None, max_load_ratio=0.9, min_load_ratio=None, time_series=None, max_episode_steps=None, randomize_start_time=False, time_alignment=None)

Bases: BaseEnv

Base power grid environment (inner physical simulation engine).

GridEnv acts as the physical state machine layer of the PowerZoo stack. Its reset() and step() return a raw state dict (the full power-flow solution) rather than a flat numpy observation array, because the state is consumed by upper-layer components that each need different subsets of it:

  • PowerEnv — the benchmark-facing RL facade. Calls grid.step() to advance the simulation, then passes the returned state dict to grid.obs(state) and to each resource's obs() to assemble the agent observation.
  • ObsWrapper / gym_wrappers.GymWrapper — thin Gym-compatibility wrappers that call env.obs(state) directly on a GridEnv subclass for single-grid use-cases that don't need PowerEnv.

Because of this design, passing a bare GridEnv subclass to standard Gym validation tools (e.g. gymnasium.utils.env_checker.check_env) will produce an observation-type warning — the raw state dict does not conform to observation_space. Always wrap the env in PowerEnv or a Gym wrapper before running a standard RL training loop or check.
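
To illustrate why a wrapper is needed, here is a minimal sketch of the pattern: the inner environment returns a raw state dict, and a thin wrapper replaces it with env.obs(state). StubGrid and FlatObsWrapper are hypothetical names written for this example. They are not the library's PowerEnv, ObsWrapper, or GymWrapper classes, whose exact behaviour is not shown here.

```python
class StubGrid:
    """Hypothetical stand-in for a GridEnv subclass: returns raw state dicts."""

    def reset(self):
        self.t = 0
        return {"v_pu": [1.0, 0.98], "time_step": self.t}, {}

    def step(self, action):
        self.t += 1
        state = {"v_pu": [1.0, 0.98 - 0.01 * self.t], "time_step": self.t}
        truncated = self.t >= 3  # toy episode limit
        return state, -0.1, False, truncated, {}

    def obs(self, state):
        # Project the state dict to a flat list of floats.
        return [float(v) for v in state["v_pu"]] + [float(state["time_step"])]


class FlatObsWrapper:
    """Swap the raw state dict for env.obs(state) in reset() and step()."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        state, info = self.env.reset()
        return self.env.obs(state), info

    def step(self, action):
        state, reward, terminated, truncated, info = self.env.step(action)
        return self.env.obs(state), reward, terminated, truncated, info
```

An RL loop written against FlatObsWrapper only ever sees flat observations, which is the property the standard checkers expect.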

Subclasses must implement

  • _run_power_flow(action) — run the solver; return True on success.
  • _get_state() — build and return the full state dict after a solve.
  • obs(state) — project the state dict to a flat float32 array matching observation_space.
  • build_info(state) — build the step info dict.

Initialize grid environment

Parameters:

Name Type Description Default
case Any

Power system case

None
solver Any

Power flow solver

None
delta_t_minutes float

Time step in minutes (default: 30)

30.0
data_loader Optional[DataLoader]

DataLoader instance. If None, creates default DataLoader. Ignored when time_series is provided.

None
start_date str

Start date for data loading (format: 'YYYY-MM-DD'). Ignored when time_series is provided.

'2024-01-01'
end_date str

End date for data loading (format: 'YYYY-MM-DD'). Ignored when time_series is provided.

'2024-01-31'
load_columns Optional[List[str]]

Semantic signal names to load. Default: [signals.LOAD_ACTUAL_MW, signals.SOLAR_AVAILABLE_MW, signals.WIND_AVAILABLE_MW]. Distribution envs may also request signals.LOAD_REACTIVE_MVAR to provide an explicit feeder reactive-demand time series. Legacy raw column names ('ActualDemand', 'Wind', …) are auto-mapped with a deprecation warning. Ignored when time_series is provided.

None
max_load_ratio float

Maximum load as ratio of total generation capacity (default: 0.9)

0.9
min_load_ratio Optional[float]

Minimum load as ratio of total generation capacity (default: None)

None
time_series Any

Custom time-series data supplied directly by the user. Accepted formats:

  • pandas.DataFrame: must have a DatetimeIndex (or a 'datetime' column) and at least a signals.LOAD_ACTUAL_MW (or legacy 'ActualDemand') column. An optional signals.LOAD_REACTIVE_MVAR (or legacy 'ReactiveDemand') column is used by distribution envs as an explicit Q-demand series.
  • numpy.ndarray of shape (T,) or (T, 1): treated as a single-column load.actual_mw time series. A synthetic DatetimeIndex starting at start_date is created automatically.

When provided, data_loader / start_date / end_date / load_columns are NOT used for loading data (max_load_ratio and min_load_ratio still apply for scaling).

None
max_episode_steps Optional[int]

Maximum steps per episode before truncated=True. Defaults to one full day (steps_per_day).

None
randomize_start_time bool

When True, choose a random intra-day starting offset at each reset instead of always beginning at time_step=0. Increases initial-state diversity from O(n_days) to O(n_timesteps). Default False to preserve backward compatibility (F5 fix).

False
time_alignment Optional[Dict[str, str]]

Per-signal time-alignment overrides for cross-period data. Example: {"solar.available_mw": "2024-01-01"} maps the solar source's 2024 data onto the simulation's start_date. Profile-mode datasets (e.g. data-center traces) are tiled automatically and do not need an override.

None
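
The max_episode_steps default ("one full day") depends on delta_t_minutes. The sketch below assumes steps_per_day is simply 1440 / delta_t_minutes. The helper names and the ValueError for step lengths that do not divide a day evenly are assumptions made for this example.

```python
def steps_per_day(delta_t_minutes):
    """Number of simulation steps in 24 hours for a given step length."""
    steps = 24 * 60 / delta_t_minutes
    if steps != int(steps):
        # Assumption: reject step lengths that do not divide 1440 minutes.
        raise ValueError("delta_t_minutes should divide 1440 evenly")
    return int(steps)


def resolve_max_episode_steps(max_episode_steps, delta_t_minutes):
    """Fall back to one full day when max_episode_steps is None."""
    if max_episode_steps is None:
        return steps_per_day(delta_t_minutes)
    return max_episode_steps
```

Under this assumption, the default delta_t_minutes=30.0 gives 48 steps per episode.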

reset(*, seed=None, options=None, day_id=None)

Reset grid environment and all sub-resources.

Follows the Gymnasium reset convention (Gym v0.26+): seed is the primary reproducibility parameter. A specific day can be selected with either the legacy day_id kwarg or options={'day_id': N}.

Parameters:

Name Type Description Default
seed Optional[int]

Random seed. Passed through super().reset(seed=seed, options=options) so Gymnasium manages self.np_random.

None
options Optional[Dict]

Optional dict. Recognised keys: day_id (int) – override which day to simulate.

None
day_id Optional[int]

Legacy kwarg; takes precedence over options['day_id'].

None
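
The precedence between the two ways of selecting a day can be sketched as a small helper. resolve_day_id is a hypothetical name; the only behaviour taken from the documentation is that the legacy day_id kwarg wins over options['day_id'].

```python
def resolve_day_id(options=None, day_id=None):
    """Return the day to simulate, or None to let the environment choose."""
    if day_id is not None:
        # Legacy kwarg takes precedence over options['day_id'].
        return day_id
    if options and "day_id" in options:
        return int(options["day_id"])
    return None
```

For example, resolve_day_id({"day_id": 3}, day_id=7) picks day 7, while resolve_day_id({"day_id": 3}) picks day 3.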

step(action=None)

Execute one time step

Parameters:

Name Type Description Default
action Any

Dict containing:

  • resource actions (key=resource_id, value=resource action)
  • grid control parameters (subclass-specific, e.g., unit_power_mw, node_load_mw)

None

Returns:

Type Description

(state, reward, done, truncated, info), where state is the raw state dict rather than a flat observation array.

obs(state=None)

Convert a raw state dict to a flat float32 observation array.

This method is part of GridEnv's public API and is called by:

  • PowerEnv._build_agent_observation(state) — assembles the combined grid + resource + time observation for the RL agent.
  • GymWrapper (powerzoo.wrappers.gym_wrappers) — provides a thin Gym-compliant wrapper for direct single-grid use.

Subclasses must implement this and define self.observation_space to match the returned array's shape and dtype. The default raises NotImplementedError.

register_resource(resource, bus_id, name=None)

Register a resource and assign unique ID.

Calls _on_resource_changed() after updating internal state so that subclasses can rebuild observation_space and action_space.

Parameters:

Name Type Description Default
resource Any

Resource instance to register

required
bus_id int

Bus ID where resource is connected

required
name Optional[str]

Optional custom name. If None, auto-generated (e.g., 'solar_0')

None

Returns:

Name Type Description
resource_id str

The assigned resource ID

unregister_resource(resource_id)

Unregister a resource.

Calls _on_resource_changed() after updating internal state.

cal_pf(*args, **kwargs)

Run power flow and return result dict (must be implemented).

If the solver does not converge, implementations should return a result dict with converged=False (or equivalent flag) rather than raising an exception, so that downstream safety_check and reward/cost logic can handle divergence gracefully.

safety_check(*args, **kwargs)

Check physical constraints (must be implemented).

Implementations must be robust to non-converged or NaN-laden states (e.g. when cal_pf did not converge). In such cases the method should report maximum violation / unsafe status instead of crashing.
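
A safety check that tolerates divergence might look like the sketch below. It is restricted to voltage limits, borrows v_min / v_max defaults from DistGridEnv, and uses math.inf as the "maximum violation" value. All three choices, and the function name, are assumptions for illustration.

```python
import math


def voltage_safety_check(v_pu, converged, v_min=0.9, v_max=1.1):
    """Return (is_safe, max_violation_pu) without raising on bad inputs."""
    bad_values = any(math.isnan(v) or math.isinf(v) for v in v_pu)
    if not converged or bad_values:
        # Report the worst case instead of propagating NaN downstream.
        return False, math.inf
    # Per-bus violation is the distance outside [v_min, v_max], or 0 inside.
    violation = max((max(v_min - v, v - v_max, 0.0) for v in v_pu), default=0.0)
    return violation == 0.0, violation
```

The caller can then feed the returned violation into reward or cost logic without special-casing a failed solve.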


powerzoo.envs.grid.trans.TransGridEnv(case=None, solver=None, delta_t_minutes=30.0, data_loader=None, start_date='2024-01-01', end_date='2024-01-31', load_columns=None, max_load_ratio=0.9, min_load_ratio=None, time_series=None, max_episode_steps=None, randomize_start_time=False, physics='dc', solver_mode='opf', solver_type='auto', normalize_actions=True, ac_config=None, difficulty=None, reward_scale=0.01, control_der=False)

Bases: GridEnv

Transmission grid environment.

Two orthogonal parameters control the solver behaviour:

  • physics: 'dc' (linearised, P only) or 'ac' (full AC with voltage and reactive power).
  • solver_mode: 'opf' (environment runs OPF internally; RL agent provides bids / commitments) or 'pf' (environment evaluates power flow only; RL agent provides unit dispatch directly).

The four resulting modes are DCOPF, ACOPF, DCPF, and ACPF.

Default case is Case5.

Initialize transmission grid environment.

Two orthogonal parameters control the solver behaviour:

  • physics — physical model: 'ac' (full AC equations) or 'dc' (linearised DC approximation).
  • solver_mode — solver role: 'opf' (environment runs optimal power flow internally; RL agent provides bids / commitments) or 'pf' (environment only evaluates power flow; RL agent provides unit dispatch directly via action['unit_power_mw']).

The four combinations are:

=======  ===========  =========  ===================================
physics  solver_mode  Solver     RL use-case
=======  ===========  =========  ===================================
dc       opf          DCOPF      Agent learns bidding / UC strategy
ac       opf          ACOPF      Agent learns bidding (with voltage)
dc       pf           DCPF       Agent learns dispatch (P only)
ac       pf           ACPF (NR)  Agent learns dispatch (P + V + Q)
=======  ===========  =========  ===================================
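
The four combinations above can be expressed as a lookup keyed on (physics, solver_mode). This is a sketch for orientation only: SOLVER_NAMES and solver_name are hypothetical, and the case-insensitive matching is an assumption not stated in the documentation.

```python
SOLVER_NAMES = {
    ("dc", "opf"): "DCOPF",
    ("ac", "opf"): "ACOPF",
    ("dc", "pf"): "DCPF",
    ("ac", "pf"): "ACPF",
}


def solver_name(physics="dc", solver_mode="opf"):
    """Look up the solver label for a (physics, solver_mode) pair."""
    key = (physics.lower(), solver_mode.lower())  # assumption: case-insensitive
    if key not in SOLVER_NAMES:
        raise ValueError(f"unsupported combination: {key}")
    return SOLVER_NAMES[key]
```

The defaults mirror the constructor defaults physics='dc' and solver_mode='opf', so solver_name() corresponds to DCOPF.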

Parameters:

Name Type Description Default
case ClearCase

Power system case (default: Case5)

None
solver Any

Power flow solver

None
delta_t_minutes float

Time step in minutes (default: 30)

30.0
data_loader Optional[DataLoader]

DataLoader instance

None
start_date str

Start date for data loading

'2024-01-01'
end_date str

End date for data loading

'2024-01-31'
load_columns Optional[List[str]]

List of columns to load

None
max_load_ratio float

Maximum load as ratio of total capacity

0.9
min_load_ratio Optional[float]

Minimum load as ratio of total capacity (optional)

None
time_series Any

Custom time-series data

None
max_episode_steps Optional[int]

Max steps before truncation (default: one full day)

None
physics str

Physical model — 'dc' (default) or 'ac'.

'dc'
solver_mode str

Solver role — 'opf' (default, environment optimises) or 'pf' (agent provides dispatch, environment evaluates).

'opf'
solver_type str

OPF LP solver - 'auto', 'gurobi', 'scipy', 'cvxpy'.

'auto'
normalize_actions bool

Whether to normalise actions to [-1, 1].

True
ac_config ACConfig

AC solver parameters as an ACConfig object. Defaults to ACConfig() (standard voltage limits, built-in solver). See ACConfig for all configurable fields.

None
difficulty Optional[str]

Preset difficulty level - 'easy', 'medium', or 'hard'. When set, overrides delta_t_minutes and max_load_ratio.

None
reward_scale float

Multiplicative scaling factor applied to generation cost in the reward signal (economic_cost = -reward_scale * gen_cost). Default 0.01 is calibrated for Case5 (typical cost ~100-1000 $/step). For larger systems (e.g. Case118) where cost can reach 10⁵-10⁶, pass a smaller value such as 0.0001 to keep rewards in a well-conditioned range for policy gradient algorithms.

0.01
control_der bool

When True, flatten DER control dimensions into the environment's action_space. In PF modes the flat action becomes [unit_power_mw (n_units), der_actions (n_resources)]. In OPF modes with registered DER, the flat action contains only der_actions so ndarray agents cannot silently bypass the OPF by injecting unit_power_mw. Agents can then control DER through a single flat Box. DER states are always visible in observation_space regardless of this flag (when resources are registered). Default False (DER run as non-dispatchable injections).

False
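
The effect of reward_scale follows directly from the documented formula economic_cost = -reward_scale * gen_cost. The sketch below only evaluates that formula; the function name and the two sample costs are illustrative, chosen from the ranges quoted in the parameter description.

```python
def economic_cost(gen_cost, reward_scale=0.01):
    """Reward term from generation cost: -reward_scale * gen_cost."""
    return -reward_scale * gen_cost


# A cost inside the documented Case5 range, with the default scale.
case5_term = economic_cost(500.0)
# A cost in the 10^5 to 10^6 range, with the smaller suggested scale.
large_term = economic_cost(5e5, reward_scale=0.0001)
```

Both terms end up within roughly two orders of magnitude of each other, which is the point of lowering reward_scale for larger systems.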

reset(*, seed=None, options=None, day_id=None)

Reset transmission grid and run initial power flow.

obs(state=None)

Convert current (or provided) state to a flat float32 observation array.

Parameters:

Name Type Description Default
state Any

Internal state dict (as returned by _get_state). If None, the most recently cached state is used.

None

Returns:

Type Description
ndarray

numpy.ndarray of shape observation_space.shape and dtype float32.

cal_pf(unit_power_mw, node_load_mw, df=False)

Thin wrapper around trans_solve.cal_pf.

Side effects: updates self._power_imbalance_mw and self._slack_gen_violation_mw.

safety_check(line_flow_mw, with_info=False)

Thin wrapper around trans_solve.safety_check.

render(mode='human')

Render the transmission grid state.

Produces a two-panel figure (network topology + unit dispatch chart). See powerzoo.envs.grid._render.render_trans_grid for details.

Parameters:

Name Type Description Default
mode str

'human' (interactive) or 'rgb_array' (returns a (H, W, 3) uint8 ndarray without opening a window).

'human'

powerzoo.envs.grid.dist.DistGridEnv(case=None, solver=None, delta_t_minutes=30.0, data_loader=None, start_date='2024-01-01', end_date='2024-01-31', load_columns=None, max_load_ratio=0.9, min_load_ratio=None, time_series=None, max_iter=100, tol=1e-06, max_episode_steps=None, randomize_start_time=False, v_slack=1.0, v_min=0.9, v_max=1.1, allow_mesh_pruning=True, difficulty=None, violation_penalty_weight=0.0, v_dev_penalty_weight=0.0, loss_penalty_weight=0.1)

Bases: _DistPFMixin, _DistLoadsMixin, GridEnv

Distribution grid environment using BFS (Backward/Forward Sweep) power flow.

Default case is Case33bw; handles both active (P) and reactive (Q) power. The physical core is a single-phase balanced radial DistFlow solver: explicit voltage-angle states, phase coupling, and unbalance are outside this benchmark surface.

Naming conventions (consistent with MATPOWER):

  • baseMVA, baseKV: system base values
  • slack_bus_id: slack/reference bus index (0-based); alias: ref_bus
  • v_slack: slack bus voltage setpoint (p.u.); alias: v_ref_mag
  • p_flow_MW: sending-end (from-bus) active power flow on each branch
  • p_loss_MW: active power loss (I²R) on each branch
  • p_slack_MW: total slack-bus active-power injection into the feeder
  • q_slack_MVAr: total slack-bus reactive-power injection into the feeder
  • is_diverged: BFS failed to satisfy the iteration tolerance before hitting max_iter (distinct from voltage collapse)

Resource control mode

All non-slack buses are treated as PQ buses by the BFS solver. When a resource is registered via register_resource(), it operates in PQ control mode: its P/Q setpoints are subtracted from the nodal load before the BFS solve.
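
PQ control mode amounts to netting resource setpoints against nodal load before the solve. The sketch below assumes positive setpoints mean injection into the grid and represents each resource as a (bus_index, p_mw, q_mvar) tuple. Both the sign convention and the data layout are assumptions for this example.

```python
def net_nodal_load(load_p_mw, load_q_mvar, injections):
    """Subtract resource P/Q setpoints from nodal loads.

    injections: iterable of (bus_index, p_mw, q_mvar); positive = injection.
    """
    p = list(load_p_mw)   # copy so the original load profile is untouched
    q = list(load_q_mvar)
    for bus, p_inj, q_inj in injections:
        p[bus] -= p_inj
        q[bus] -= q_inj
    return p, q
```

A bus whose injection exceeds its load ends up with a negative net load, which a radial sweep solver would see as reverse power flow.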

Action space: only gymnasium.spaces.Box (continuous) is supported.

Initialize distribution grid environment

Parameters:

Name Type Description Default
case ClearCase

ClearCase instance (default: Case33bw)

None
solver Any

Optional external solver

None
delta_t_minutes float

Time step length in minutes (default: 30).

30.0
data_loader Optional[DataLoader]

DataLoader for external time-series data.

None
start_date str

Start date for data loading.

'2024-01-01'
end_date str

End date for data loading.

'2024-01-31'
load_columns Optional[List[str]]

Columns to load from DataLoader. Distribution envs also honour an optional load.reactive_mvar signal when provided.

None
max_load_ratio float

Peak load as fraction of case capacity (default: 0.9).

0.9
min_load_ratio Optional[float]

Minimum load ratio (optional).

None
time_series Any

Custom time-series (numpy array or DataFrame).

None
max_iter int

Maximum iterations for power flow (default: 100)

100
tol float

Convergence tolerance (default: 1e-6)

1e-06
max_episode_steps Optional[int]

Max steps per episode before truncation.

None
randomize_start_time bool

Randomize intra-day start offset on reset.

False
v_slack float

Slack bus voltage setpoint in p.u. (default: 1.0).

1.0
v_min float

Minimum voltage limit (p.u., default: 0.90).

0.9
v_max float

Maximum voltage limit (p.u., default: 1.10).

1.1
allow_mesh_pruning bool

If True (default), extra lines in a non-radial input are pruned to the BFS first-visit spanning tree with a warning. If False, env initialization fails fast on mesh input.

True
difficulty Optional[str]

Preset - 'easy', 'medium', or 'hard'. Overrides v_min/v_max.

None
violation_penalty_weight float

Weight for soft-penalty mode. When > 0, cost_voltage_violation and cost_thermal_overload are added as negative reward terms (standard RL mode). When 0 (default), violations are exposed only via info cost fields (CMDP mode).

0.0
v_dev_penalty_weight float

Weight for voltage-deviation penalty. When > 0, adds -v_dev_penalty_weight * mean((v - 1.0)²) to reward. MSE penalises large deviations quadratically with a natural soft deadband. Useful for Volt-VAR / voltage regulation tasks. When 0 (default), no voltage-deviation term is added.

0.0
loss_penalty_weight float

Weight on active-loss reward shaping. Default reward uses -0.1 * p_loss_MW.

0.1
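
The three reward weights can be read as coefficients of separate penalty terms. The sketch below assumes the terms are simply summed; the function name shaped_reward and that additive combination are assumptions, while the individual formulas follow the parameter descriptions above.

```python
def shaped_reward(v_pu, p_loss_mw, cost_voltage_violation=0.0,
                  cost_thermal_overload=0.0, loss_penalty_weight=0.1,
                  v_dev_penalty_weight=0.0, violation_penalty_weight=0.0):
    """Combine loss, voltage-deviation, and violation penalties into one scalar."""
    reward = -loss_penalty_weight * p_loss_mw
    if v_dev_penalty_weight > 0:
        # Mean squared deviation from 1.0 p.u. across all buses.
        mse = sum((v - 1.0) ** 2 for v in v_pu) / len(v_pu)
        reward -= v_dev_penalty_weight * mse
    if violation_penalty_weight > 0:
        # Soft-penalty mode; with weight 0 the costs stay in info only (CMDP mode).
        reward -= violation_penalty_weight * (
            cost_voltage_violation + cost_thermal_overload)
    return reward
```

With all defaults, only the loss term is active, matching the documented default of -0.1 * p_loss_MW.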

reset(*, seed=None, options=None, day_id=None)

Reset distribution grid and run initial power flow.

obs(state=None)

Return flat float32 observation array.

When state is provided, its nodes/lines/time_step are used instead of the live cache (useful for replaying a past step).

On PF failure the agent receives a penalty observation (below-normal voltages, zero flows) rather than the pre-divergence state, so that the catastrophic reward is correctly paired with an out-of-band observation. See _obs_should_use_failure_fallback() for trigger conditions.

Observation layout and normalisation: see _build_spaces().

render(mode='human')

Render the distribution grid state.

Produces a two-panel figure (radial network + voltage profile chart). See powerzoo.envs.grid._render.render_dist_grid for details.

Parameters:

Name Type Description Default
mode str

'human' (interactive) or 'rgb_array' (returns a (H, W, 3) uint8 ndarray without opening a window).

'human'

powerzoo.envs.grid.dist_3phase.DistGrid3PhaseEnv(case=None, solver=None, delta_t_minutes=30.0, data_loader=None, start_date='2024-01-01', end_date='2024-01-31', load_columns=None, max_load_ratio=0.9, min_load_ratio=None, time_series=None, max_iter=100, tol=1e-06, max_episode_steps=None, randomize_start_time=False, v_min=0.9, v_max=1.1, vuf_max=2.0, difficulty=None, violation_penalty_weight=0.0, loss_penalty_weight=0.1, vuf_dense_penalty_weight=0.0)

Bases: _Dist3PhasePhysicsMixin, _Dist3PhaseLoadsMixin, DistGridEnv

Three-phase distribution grid environment using BIBC/BCBV power flow.

Default case is Case123; handles three-phase unbalanced power flow. The solver stores a full three-state A/B/C vector at every non-slack bus, so branch phase availability must already be reflected in the case's 3x3 impedance data rather than inferred from a separate missing-phase topology. Solver vectors use node-major ABC order [node1_A, node1_B, node1_C, node2_A, ...]; the explicit node mapping is available via self.topo3ph.non_ref_node_ids and self.topo3ph.node_id_to_matrix_index.

Resource phase injection

Resources carry a phase attribute ('A', 'B', 'C', or 'ABC'). Power is injected only into the connected phase(s), enabling the RL agent to learn phase-balancing strategies.
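
Phase-selective injection can be sketched as a mapping from the phase attribute to a per-phase power dict. The equal split for 'ABC' resources is an assumption (the documentation only says power goes into the connected phases), and inject_by_phase is a hypothetical helper.

```python
PHASES = ("A", "B", "C")


def inject_by_phase(p_mw, phase):
    """Return per-phase injection for a resource connected to the given phase(s)."""
    connected = [ph for ph in PHASES if ph in phase.upper()]
    if not connected:
        raise ValueError(f"unknown phase: {phase!r}")
    share = p_mw / len(connected)  # assumption: equal split across phases
    return {ph: (share if ph in connected else 0.0) for ph in PHASES}
```

A single-phase resource on 'B' leaves phases A and C untouched, which is what lets an agent shift load between phases to reduce unbalance.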

Non-convergence contract

cal_pf() still returns the last BFS iterate for debugging, but those voltages/flows/losses are diagnostic only. RL-facing callers must check self._converged / info['pf_converged'] and treat a False value as a power-flow failure rather than a valid operating point.

Initialize three-phase distribution grid environment.

Parameters:

Name Type Description Default
case ClearCase

ClearCase instance (default: Case123).

None
solver Any

Optional external solver.

None
delta_t_minutes float

Time step length in minutes (default: 30).

30.0
data_loader Optional[DataLoader]

DataLoader for external time-series data.

None
start_date str

Start date for data loading.

'2024-01-01'
end_date str

End date for data loading.

'2024-01-31'
load_columns Optional[List[str]]

Columns to load from DataLoader.

None
max_load_ratio float

Peak load as fraction of case capacity (default: 0.9).

0.9
min_load_ratio Optional[float]

Minimum load ratio (optional).

None
time_series Any

Custom time-series (numpy array or DataFrame).

None
max_iter int

Maximum iterations for power flow (default: 100).

100
tol float

Convergence tolerance (default: 1e-6).

1e-06
max_episode_steps Optional[int]

Max steps per episode before truncation.

None
randomize_start_time bool

Randomize intra-day start offset on reset.

False
v_min float

Minimum voltage limit (p.u., default: 0.90).

0.9
v_max float

Maximum voltage limit (p.u., default: 1.10).

1.1
vuf_max float

Maximum voltage unbalance factor (%, default: 2.0). IEEE Std 1159 / EN 50160 typical limit is 2%.

2.0
difficulty Optional[str]

Preset - 'easy', 'medium', or 'hard'. Overrides v_min/v_max.

None
violation_penalty_weight float

Weight for soft-penalty mode (default: 0.0).

0.0
loss_penalty_weight float

Weight on active-loss reward shaping -loss_penalty_weight * p_loss_MW (default: 0.1).

0.1
vuf_dense_penalty_weight float

Weight for dense VUF penalty (default: 0.0). When > 0, adds -vuf_dense_penalty_weight * max(max_vuf_percent - 0.75 * vuf_max, 0) / 100 to the reward at every step. This keeps a deadband over the benign low-VUF region (default: 1.5 % when vuf_max=2.0), so the dense shaping only activates near the safety boundary. Operates independently of violation_penalty_weight.

0.0
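
The dense VUF penalty formula from the last parameter can be evaluated directly. In the sketch below the 0.75 factor is exposed as a deadband argument purely for readability; that argument and the function name are assumptions, and the documented formula fixes the factor at 0.75.

```python
def vuf_dense_penalty(max_vuf_percent, vuf_max=2.0, weight=0.0, deadband=0.75):
    """-weight * max(max_vuf_percent - deadband * vuf_max, 0) / 100."""
    excess = max(max_vuf_percent - deadband * vuf_max, 0.0)
    return -weight * excess / 100.0
```

With vuf_max=2.0 the threshold is 1.5 %, so a step with max_vuf_percent=1.0 contributes nothing, while 1.8 % with weight=10.0 contributes about -0.03.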