
DSO — Distribution operations

The DSO benchmark assembles Case33bw, six FlexLoad devices and Ausgrid zone-substation load shapes into one ready-to-train environment. The unifying RL problem is non-stationary single-agent control. Episodes are driven by real Ausgrid feeder shapes that drift across seasons and across substations, and the agent must use flexible loads to keep voltages in band while minimising network loss.

Vocabulary check. A DSO (Distribution System Operator) runs the medium- and low-voltage feeders that connect end customers; its main RL question is "use flexible loads to keep voltages in band while upstream demand follows real exogenous shapes". FlexLoad is PowerZoo's demand-response resource: each device has a curtailment quota and a deferred-demand buffer. Ausgrid is an Australian distribution utility that publishes 30-minute zone-substation load shapes used as exogenous drivers here.

The benchmark is built as an independent factory (powerzoo.tasks.dso_task.make_dso_env) instead of a register_task entry, because it has its own data pipeline (Ausgrid feeder shapes) and its own task-level CMDP selection (voltage-only from the full distribution-grid cost vector).

flowchart LR
    A["Ausgrid feeder shapes\n(load_dso_feeder_shapes)"] --> B["make_dso_load_matrices\n(case33bw mapping)"]
    B --> C["PowerEnv\n(Case33bw + 6× FlexLoad)"]
    C --> D[FlattenWrapper]
    D --> E["TaskCMDPWrapper\nselected_constraint_costs = voltage_violation"]

Why this suite

  • The only suite driven by real distribution-level zone-substation loads, not by transmission-system aggregates.
  • Built-in time-driven distribution shift: train, IID-test, summer-OOD-test and zone-holdout splits cover season and substation OOD.
  • Reward measures operational quality (network loss) rather than safety or cost, which gives a different objective surface from the other four suites.

Physical setup

| Aspect | Default value |
|---|---|
| Underlying env | Case33bw (33-bus single-phase distribution feeder, solved by backward/forward sweep (BFS) power flow). |
| Resources | FlexLoad at buses [6, 14, 18, 22, 28, 33]. |
| Voltage limits | v_min = 0.94 pu, v_max = 1.06 pu. |
| Episode | 48 steps × 30 min = 1 day. |
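Here is a minimal sketch of the episode clock and the voltage band from the table. It assumes step 0 falls at 00:00. `band_violation` is a hypothetical cost that sums the distance in pu outside [v_min, v_max]. PowerZoo's own `voltage_violation` may be defined differently, for example as a count of violating buses.

```python
# Constants from the Physical setup table; helper names are hypothetical.
V_MIN, V_MAX = 0.94, 1.06      # pu
STEP_MINUTES = 30
STEPS_PER_EPISODE = 48


def step_to_clock(step):
    """Clock time of a step, assuming the episode starts at 00:00."""
    minutes = step * STEP_MINUTES
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def band_violation(voltages_pu, v_min=V_MIN, v_max=V_MAX):
    """Hypothetical cost: total pu distance outside [v_min, v_max] over all buses."""
    return sum(max(0.0, v_min - v) + max(0.0, v - v_max) for v in voltages_pu)
```

48 steps of 30 minutes cover 1440 minutes, so the last step (index 47) starts at 23:30.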

Three feeder segments on Case33bw are driven by distinct Ausgrid feeder shapes (so the load shape varies along the feeder, not just in magnitude):

| Feeder | Bus range |
|---|---|
| feeder_A | 2 – 18 |
| feeder_B | 19 – 22 |
| feeder_C | 23 – 33 |

This matches the JAX implementation's feeder segmentation exactly.
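The segmentation can be expressed as a bus-to-feeder lookup. This is a hypothetical helper, not a PowerZoo function. It assumes bus 1 is the substation (slack) bus, which is why no feeder segment includes it.

```python
# Feeder segments from the table; ranges are half-open, so stop = last bus + 1.
FEEDER_RANGES = {
    "feeder_A": range(2, 19),   # buses 2-18
    "feeder_B": range(19, 23),  # buses 19-22
    "feeder_C": range(23, 34),  # buses 23-33
}
FLEXLOAD_BUSES = [6, 14, 18, 22, 28, 33]


def bus_to_feeder(bus):
    """Return the feeder segment of a bus, or None for bus 1 (assumed slack bus)."""
    for name, buses in FEEDER_RANGES.items():
        if bus in buses:
            return name
    return None
```

Applied to the FlexLoad buses, this gives three devices on feeder_A, one on feeder_B and two on feeder_C. Every Ausgrid-driven segment therefore has at least one controllable device.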

Agent design

| Item | Value |
|---|---|
| Action | Box(2*N) = Box(12,) with N = 6 FlexLoad devices; per device [curtail_fraction, shift_out_fraction]. |
| Observation | Flat (via FlattenWrapper): bus voltages, branch loadings, time, per-device FlexLoad state. |
| Reward | -loss_penalty_weight * p_loss_MW (network loss only; default loss_penalty_weight = 0.1). |
| Core constraints | constraint_names = ("voltage_violation", "thermal_overload", "resource"); info["constraint_costs"] follows this order. |
| Task constraints | selected_constraint_names = ("voltage_violation",) with threshold (5.0,) and fallback weight (1.0,). |
| Scalar compatibility | info["cost"] is produced only by safe-RL compatibility wrappers; it is not part of the core env contract. |
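The following pure-Python helpers are a sketch of how the rows of this table fit together. They are hypothetical and not part of PowerZoo. Two assumptions go beyond the table. First, the flat action is taken to be interleaved per device ([c_0, s_0, c_1, s_1, ...]). Second, the threshold 5.0 is treated as a bound on the per-episode sum of voltage_violation costs, as in a standard CMDP.

```python
# Hypothetical helpers mirroring the Agent design table; not PowerZoo code.
FLEX_BUSES = (6, 14, 18, 22, 28, 33)  # N = 6 FlexLoad devices
CONSTRAINT_NAMES = ("voltage_violation", "thermal_overload", "resource")
SELECTED_CONSTRAINT_NAMES = ("voltage_violation",)
THRESHOLDS = (5.0,)


def unpack_action(action, buses=FLEX_BUSES):
    """Map a flat Box(2*N,) action to {bus: (curtail_fraction, shift_out_fraction)}.

    Assumes an interleaved layout [c_0, s_0, c_1, s_1, ...].
    """
    if len(action) != 2 * len(buses):
        raise ValueError(f"expected {2 * len(buses)} values, got {len(action)}")
    return {bus: (action[2 * i], action[2 * i + 1]) for i, bus in enumerate(buses)}


def loss_reward(p_loss_mw, loss_penalty_weight=0.1):
    """Reward = -loss_penalty_weight * p_loss_MW (network loss only)."""
    return -loss_penalty_weight * p_loss_mw


def select_costs(constraint_costs):
    """Pick the task-level costs out of info["constraint_costs"] by name."""
    by_name = dict(zip(CONSTRAINT_NAMES, constraint_costs))
    return tuple(by_name[name] for name in SELECTED_CONSTRAINT_NAMES)


def within_budget(voltage_costs, threshold=THRESHOLDS[0]):
    """Assumes the threshold bounds the per-episode sum of voltage_violation."""
    return sum(voltage_costs) <= threshold
```

Selecting constraints by name rather than by index protects the code if constraint_names is ever reordered.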

A 1-device variant exists for fast iteration:

from powerzoo.tasks.dso_task import make_dso_1flex_env
env = make_dso_1flex_env(bus_id=18)   # action_space = Box(2,)

Splits

The Ausgrid pool is split into four roles. train / iid / summer_ood are time splits; zone_holdout swaps in held-out substations on the same dates as train.

| Split | Date range | Feeder pool |
|---|---|---|
| train | 2024-05-01 – 2024-11-30 | 3 substations per feeder (A / B / C). |
| iid | 2024-12-01 – 2025-02-28 | Same substations as train. |
| summer_ood | 2025-03-01 – 2025-04-30 | Same substations as train. |
| zone_holdout | 2024-05-01 – 2024-11-30 | 2 different substations per feeder. |

This four-split design lets you measure pure time OOD (summer_ood), pure substation OOD (zone_holdout) and the in-distribution baseline (iid) on the same task.
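The sketch below shows how the three time splits partition the calendar. The helper names are hypothetical. zone_holdout is left out on purpose: it reuses the train dates, so a date alone cannot identify it.

```python
from datetime import date

# Inclusive date ranges from the Splits table.
TIME_SPLITS = {
    "train": (date(2024, 5, 1), date(2024, 11, 30)),
    "iid": (date(2024, 12, 1), date(2025, 2, 28)),
    "summer_ood": (date(2025, 3, 1), date(2025, 4, 30)),
}


def time_split_of(day):
    """Return the time split containing `day`, or None if it lies outside all of them."""
    for name, (start, end) in TIME_SPLITS.items():
        if start <= day <= end:
            return name
    return None


def n_days(split):
    """Number of calendar days in a split (both endpoints inclusive)."""
    start, end = TIME_SPLITS[split]
    return (end - start).days + 1
```

The ranges are contiguous and do not overlap. They contain 214, 90 and 61 days respectively.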

Code recipe

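from powerzoo.tasks.dso_task import make_dso_env, make_dso_1flex_env
from powerzoo.data import DataLoader

# Build the 6-device environment on the Ausgrid train split.
env = make_dso_env(split="train", data_loader=DataLoader())
obs, info = env.reset(seed=0)
# One random step; info["constraint_costs"] follows the constraint_names order.
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())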

Baselines

dso_task.py ships three reference baselines and a metrics helper, so a baseline table needs no extra code:

from powerzoo.tasks.dso_task import (
    make_dso_env,
    dso_no_control_rollout,
    dso_tou_heuristic_rollout,
    dso_droop_heuristic_rollout,
    compute_dso_metrics,
    rollout_dso,
)

# Evaluate the three reference baselines on the in-distribution test split.
env = make_dso_env(split="iid")
no_ctrl = dso_no_control_rollout(env)      # no FlexLoad control
tou     = dso_tou_heuristic_rollout(env)   # time-of-use heuristic
droop   = dso_droop_heuristic_rollout(env) # droop-control heuristic
metrics = compute_dso_metrics(no_ctrl, tou, droop)
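The exact rules of the shipped heuristics are not documented here. The two policies below are hypothetical stand-ins showing what "time-of-use" and "droop" heuristics usually mean, and they are not the logic of dso_tou_heuristic_rollout or dso_droop_heuristic_rollout. The peak window (steps 34 to 41, i.e. 17:00 to 21:00 if step 0 is midnight), the fractions, v_ref and the gain are all illustrative. Actions use the assumed interleaved [curtail, shift_out] layout.

```python
N_DEVICES = 6
PEAK_STEPS = range(34, 42)  # hypothetical evening peak, 17:00-21:00 if step 0 is 00:00


def tou_policy(step, curtail=0.3, shift=0.5):
    """Time-of-use rule: act only inside a fixed peak window."""
    if step in PEAK_STEPS:
        return [curtail, shift] * N_DEVICES
    return [0.0, 0.0] * N_DEVICES


def droop_policy(bus_voltages_pu, v_ref=0.97, gain=10.0):
    """Droop rule: curtail each device in proportion to its bus voltage sag below v_ref."""
    action = []
    for v in bus_voltages_pu:
        curtail = min(1.0, max(0.0, gain * (v_ref - v)))  # clip to [0, 1]
        action.extend([curtail, 0.0])                     # no load shifting
    return action
```

The TOU rule depends only on the clock, so it cannot react to which feeder is stressed. The droop rule responds to the local voltage but ignores the time of day. This contrast is one reason both appear as baselines.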

rollout_dso(env, policy_fn, n_steps=...) runs a single episode with any callable policy and returns per-step diagnostics.
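To illustrate what a single-episode rollout involves, here is a minimal generic loop built on the same reset/step calls as the code recipe. run_episode is a hypothetical name, and its return values (total reward plus per-step constraint_costs) are an assumption. They do not describe what rollout_dso actually returns.

```python
def run_episode(env, policy_fn, seed=0, max_steps=48):
    """Roll out one episode with a Gymnasium-style env and an obs -> action policy."""
    obs, info = env.reset(seed=seed)
    total_reward = 0.0
    per_step_costs = []
    for _ in range(max_steps):
        action = policy_fn(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # Core cost vector, ordered as constraint_names.
        per_step_costs.append(info.get("constraint_costs"))
        if terminated or truncated:
            break
    return total_reward, per_step_costs
```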

Metrics to report

  • network_loss_reduction_pct: (loss_no_control - loss_rl) / loss_no_control.
  • served_flexible_demand_ratio: fraction of the FlexLoad buffer that is actually served (rather than overflowed).
  • peak_shaving_effectiveness: (peak_no_control - peak_rl) / peak_no_control.
  • voltage_violation_rate: derived from selected_constraint_costs; it should be near zero for an acceptable policy.
  • drift_tracking_gap: NormScore(iid) - NormScore(summer_ood); the primary robustness metric.
  • NormScore: the standard normalised score, computed against the no-control baseline.
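The metric formulas above translate directly into code. These helpers are hypothetical and written independently of compute_dso_metrics. Three choices go beyond the listed formulas. The peak is taken as the episode maximum of whichever load series you log. Buffered demand is assumed to end up either served or overflowed. NormScore values are passed in rather than computed, because their normalisation is not spelled out here.

```python
def relative_reduction(baseline, controlled):
    """(baseline - controlled) / baseline; multiply by 100 for a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - controlled) / baseline


def network_loss_reduction(loss_no_control, loss_rl):
    return relative_reduction(loss_no_control, loss_rl)


def peak_shaving_effectiveness(load_no_control, load_rl):
    """Peak taken as the episode maximum of a logged load series (assumption)."""
    return relative_reduction(max(load_no_control), max(load_rl))


def served_flexible_demand_ratio(served, overflowed):
    """Assumes buffered demand is either served or overflowed."""
    total = served + overflowed
    # An empty buffer counts as fully served (a choice made for this sketch).
    return served / total if total > 0 else 1.0


def drift_tracking_gap(norm_score_iid, norm_score_summer_ood):
    """Positive values mean performance drops under the summer_ood shift."""
    return norm_score_iid - norm_score_summer_ood
```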

See also