DSO — Distribution operations¶

The DSO benchmark assembles Case33bw + 6× FlexLoad + Ausgrid zone-substation load shapes into one ready-to-train environment. The unifying RL question is non-stationary single-agent RL: episodes are driven by real Ausgrid feeder shapes that drift across seasons and across substations, and the agent must use flexible loads to keep voltages in band while minimising network loss.

Vocabulary check. A DSO (Distribution System Operator) runs the medium- and low-voltage feeders that connect end customers; its main RL question is "use flexible loads to keep voltages in band while upstream demand follows real exogenous shapes". FlexLoad is PowerZoo's demand-response resource: each device has a curtailment quota and a deferred-demand buffer. Ausgrid is an Australian distribution utility that publishes 30-minute zone-substation load shapes used as exogenous drivers here.

The benchmark is built as an independent factory (powerzoo.tasks.dso_task.make_dso_env) instead of a register_task entry, because it has its own data pipeline (Ausgrid feeder shapes) and its own task-level CMDP selection (voltage-only from the full distribution-grid cost vector).

flowchart LR
    A["Ausgrid feeder shapes\n(load_dso_feeder_shapes)"] --> B["make_dso_load_matrices\n(case33bw mapping)"]
    B --> C["PowerEnv\n(Case33bw + 6× FlexLoad)"]
    C --> D[FlattenWrapper]
    D --> E["TaskCMDPWrapper\nselected_constraint_costs = voltage_violation"]

Why this suite¶

The only suite driven by real distribution-level zone-substation loads, not by transmission-system aggregates.
Built-in time-driven distribution shift: train, IID-test, summer-OOD-test and zone-holdout splits cover season and substation OOD.
Reward is operational quality (network loss + curtailment) rather than safety or cost — a different objective surface from the other four suites.

Physical setup¶

Aspect	Default value
Underlying env	`Case33bw` (33-bus single-phase BFS distribution).
Resources	6× `FlexLoad` at buses `[6, 14, 18, 22, 28, 33]`.
Voltage limits	`v_min = 0.94 pu`, `v_max = 1.06 pu`.
Episode	48 steps × 30 min = 1 day.

Three feeder segments on Case33bw are driven by distinct Ausgrid feeder shapes (so the load shape varies along the feeder, not just in magnitude):

Feeder	Bus range
`feeder_A`	2 – 18
`feeder_B`	19 – 22
`feeder_C`	23 – 33

This matches the JAX implementation's feeder segmentation exactly.

Agent design¶

Item	Value
Action	`Box(2*N)` = `Box(12,)` — per device `[curtail_fraction, shift_out_fraction]`.
Observation	Flat (via `FlattenWrapper`) — bus voltages, branch loadings, time, per-device FlexLoad state.
Reward	`-loss_penalty_weight * p_loss_MW` (network loss only; default `loss_penalty_weight = 0.1`).
Core constraints	`constraint_names = ("voltage_violation", "thermal_overload", "resource")`; `info["constraint_costs"]` follows this order.
Task constraints	`selected_constraint_names = ("voltage_violation",)` with threshold `(5.0,)` and fallback weight `(1.0,)`.
Scalar compatibility	`info["cost"]` is produced only by safe-RL compatibility wrappers; it is not the core env contract.

A 1-device variant exists for fast iteration:

from powerzoo.tasks.dso_task import make_dso_1flex_env
env = make_dso_1flex_env(bus_id=18)   # action_space = Box(2,)

Splits¶

The Ausgrid pool is split into four roles. train / iid / summer_ood are time splits; zone_holdout swaps in held-out substations on the same dates as train.

Split	Date range	Feeder pool
`train`	2024-05-01 – 2024-11-30	3 substations per feeder (A / B / C).
`iid`	2024-12-01 – 2025-02-28	Same substations as `train`.
`summer_ood`	2025-03-01 – 2025-04-30	Same substations as `train`.
`zone_holdout`	2024-05-01 – 2024-11-30	2 different substations per feeder.

This four-split design lets you measure pure time OOD (summer_ood), pure substation OOD (zone_holdout) and the in-distribution baseline (iid) on the same task.

Code recipe¶

from powerzoo.tasks.dso_task import make_dso_env, make_dso_1flex_env
from powerzoo.data import DataLoader

env = make_dso_env(split="train", data_loader=DataLoader())
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

Baselines¶

dso_task.py ships three reference baselines and a metrics helper, so a baseline table needs no extra code:

from powerzoo.tasks.dso_task import (
    make_dso_env,
    dso_no_control_rollout,
    dso_tou_heuristic_rollout,
    dso_droop_heuristic_rollout,
    compute_dso_metrics,
    rollout_dso,
)

env = make_dso_env(split="iid")
no_ctrl = dso_no_control_rollout(env)
tou     = dso_tou_heuristic_rollout(env)
droop   = dso_droop_heuristic_rollout(env)
metrics = compute_dso_metrics(no_ctrl, tou, droop)

rollout_dso(env, policy_fn, n_steps=...) runs a single episode with any callable policy and returns per-step diagnostics.

Metrics to report¶

network_loss_reduction_pct — (loss_no_control - loss_rl) / loss_no_control.
served_flexible_demand_ratio — fraction of the FlexLoad buffer that is actually served (rather than overflowed).
peak_shaving_effectiveness — (peak_no_control - peak_rl) / peak_no_control.
voltage_violation_rate — derived from selected_constraint_costs, and should be near zero.
drift_tracking_gap — NormScore(iid) - NormScore(summer_ood); the primary robustness metric.
NormScore — standard, against the no-control baseline.