
DC microgrid

dc_microgrid (and dc_microgrid_safe) is the multi-objective robust RL benchmark in PowerZoo. Both tasks share the same env, DCMicrogridEnv: a self-contained behind-the-meter microgrid (no external grid) composed of DataCenterEnv + BatteryEnv + inline solar PV + diesel generator. The safe task freezes the same vector CMDP contract and exposes a scalar compatibility budget for libraries that still expect one cost.

The physical model — power balance, action / observation / reward vector, cost components — is documented in Physics · Microgrid. This page focuses on the agent-facing benchmark setup: variants, splits, baselines and metrics.

Vocabulary check. PUE (Power Usage Effectiveness) — total facility power / IT equipment power, lower is better. SLA (Service Level Agreement) — workload completion guarantee; missing a deadline is an SLA violation. Pareto front — set of non-dominated (reward1, reward2, …) tuples; multi-objective RL approximates it.

Why this suite

  • Only suite with no grid backstop: power balance is a hard internal constraint, and an over-committed IT load creates a real cost_power_deficit.
  • Only suite with a vector reward exposed in info["reward_vector"] = [r_energy, r_cost, r_carbon]; multi-objective and Pareto methods can use it directly.
  • Only suite with 5-min resolution and a 24-hour horizon (288 steps), exercising long-horizon credit assignment under thermal inertia.
  • Only suite that combines AI workload modelling (training + finetuning + inference queues with EDF scheduling) with energy management.

The two registered tasks

| Task | CMDP thresholds | When to use |
| --- | --- | --- |
| dc_microgrid | none | Standard reward-only training; use the reward vector for multi-objective work. |
| dc_microgrid_safe | (0.2, 0.15, 0.15) for ("sla", "overtemp", "power_deficit"); scalar alias 0.5 | Safe-RL / fallback benchmarking against explicit per-constraint budgets. |

Both are single-agent Gymnasium envs.
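Under the CMDP contract above, an episode is feasible when each selected constraint cost stays within its budget. A minimal sketch of that check (pure Python; the helper names and the scalar-alias fallback function are illustrative, not part of the PowerZoo API):

```python
SAFE_THRESHOLDS = (0.2, 0.15, 0.15)  # budgets for ("sla", "overtemp", "power_deficit")
SCALAR_ALIAS = 0.5                   # compatibility budget for single-cost libraries

def is_feasible(constraint_costs, thresholds=SAFE_THRESHOLDS):
    """Vector CMDP check: every episode constraint cost must stay under its budget."""
    return all(c <= t for c, t in zip(constraint_costs, thresholds))

def is_feasible_scalar(cost_sum, budget=SCALAR_ALIAS):
    """Fallback check for libraries that only consume one scalar cost."""
    return cost_sum <= budget
```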

Physical setup

A summary (full details in Physics · Microgrid):

  • Underlying env. DCMicrogridEnv — self-contained, no GridEnv underneath.
  • Resources. DataCenterEnv (1000 GPUs by default), BatteryEnv (2 MWh / 0.5 MW), inline PV (0.4 MW), inline diesel generator (0.6 MW).
  • Episode. 288 steps × 5 min = 24 hours.

Agent design

| Item | Value |
| --- | --- |
| Action | Box(5): [r_train, r_finetune, T_cool_norm, P_batt, P_dg] |
| Observation | Box(18): utilisation, queues, thermal state, generation availability, last action, time |
| Reward | scalarised r_energy + w_cost · r_cost + w_carbon · r_carbon; vector also in info["reward_vector"] |
| Core constraints | constraint_names = ("sla", "overtemp", "power_deficit"); info["constraint_costs"] follows this order. |
| Task constraints | selected_constraint_costs uses the full vector with thresholds (0.2, 0.15, 0.15) and fallback weights (1.0, 1.0, 1.0). |
| Scalar compatibility | info["cost"] is emitted by compatibility wrappers only; cost_sum remains an aggregate diagnostic alias. |
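The scalarisation above can be sketched directly. The weight names mirror the env constructor arguments (w_cost, w_carbon); the function itself and the default weights are illustrative, not the library implementation:

```python
def scalarise(reward_vector, w_cost=1.0, w_carbon=1.0):
    """Collapse [r_energy, r_cost, r_carbon] into the scalar training reward."""
    r_energy, r_cost, r_carbon = reward_vector
    return r_energy + w_cost * r_cost + w_carbon * r_carbon
```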

Variants

dc_microgrid exposes a single env config; the main variants are:

  • Reward weighting. Tune w_cost and w_carbon in the env constructor to walk along the Pareto front.
  • Workload OOD. Inject Azure or Alibaba DC traces in place of Google's via set_profiles(...) to test workload distribution shift.
  • Solar drought. Multiply the solar profile by a small constant for several days to test the high-carbon regime.
  • Cooling stress. Add several °C to the outdoor temperature to force the agent into the low-COP regime, where cooling is least efficient.
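The solar-drought variant amounts to a profile transform. A sketch with NumPy, assuming the solar profile is a plain per-step array at 5-min resolution (the 0.1 factor, three-day window, and helper name are illustrative):

```python
import numpy as np

STEPS_PER_DAY = 288  # 5-min resolution, 24-hour horizon

def solar_drought(solar_profile, start_day, n_days=3, factor=0.1):
    """Scale down the solar profile over a contiguous multi-day window."""
    out = np.asarray(solar_profile, dtype=float).copy()
    lo = start_day * STEPS_PER_DAY
    hi = lo + n_days * STEPS_PER_DAY
    out[lo:hi] *= factor
    return out
```

The cooling-stress variant is the same idea with an additive offset on the outdoor-temperature profile.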

Splits

All three Google DC traces are pooled as the data source, with rolling windows. The convention is:

| Split | Trace window | Purpose |
| --- | --- | --- |
| train | rotating 30-day windows from Google DC 2019-05+ | training |
| val | distinct 7-day windows | hyperparameter tuning |
| test | distinct 7-day windows | reporting |

OOD splits (workload swap to Azure / Alibaba, solar drought, cooling stress) are configured per experiment via set_profiles(...) and the data pipeline's OOD transforms in powerzoo.data.dc_microgrid_profiles.
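A sketch of the rolling-window convention, assuming the trace is a flat per-step array at 5-min resolution (window lengths from the table above; the helper itself is illustrative, not the data pipeline's API):

```python
STEPS_PER_DAY = 288  # 5-min steps, 24 hours per day

def rolling_windows(n_steps, window_days, stride_days=1):
    """Yield (start, stop) step-index pairs for rotating fixed-length windows."""
    window = window_days * STEPS_PER_DAY
    stride = stride_days * STEPS_PER_DAY
    for start in range(0, n_steps - window + 1, stride):
        yield start, start + window
```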

Code recipe

from powerzoo.tasks import make_task_env

env = make_task_env("dc_microgrid_safe", split="train")
obs, info = env.reset(seed=0)

terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    print(info["reward_vector"], info["selected_constraint_costs"])

For a scalar Safe-RL-library projection, keep the Gym 5-tuple and let the wrapper write compatibility info["cost"]:

from powerzoo.rl import Trainer

t = Trainer("dc_microgrid_safe", algorithm="SAC", total_timesteps=2_000_000,
            safe_rl=True, cost_threshold=0.5)
t.train()
results = t.evaluate(split="test")

For multi-objective training, use a custom loop that reads info["reward_vector"] (see Training · Custom loops).
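Such a loop only needs the step/info contract shown in the recipe above; a minimal sketch that accumulates the vector return over one episode (the helper and its discounting argument are illustrative):

```python
import numpy as np

def rollout_vector_return(env, policy, gamma=1.0):
    """Run one episode and accumulate the 3-dim reward vector from info."""
    obs, info = env.reset(seed=0)
    vector_return = np.zeros(3)
    discount = 1.0
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        vector_return += discount * np.asarray(info["reward_vector"])
        discount *= gamma
    return vector_return
```

Sweeping w_cost / w_carbon (or a weight-conditioned policy) and collecting these returns traces out an approximate Pareto front.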

Baselines

  • RandomPolicy — usually infeasible (large cost_power_deficit).
  • EDF + fixed cooling — earliest-deadline-first scheduler with a fixed thermostat. The "rule-based" baseline.
  • Solar-aware heuristic — schedule training when solar is high, conservative otherwise.
  • Perfect-foresight LP / MILP planner — offline upper bound on the scalar reward.

Metrics to report

Multi-objective tasks need the full vector. Primary metrics:

  • total_energy_mwh — facility energy.
  • total_operating_cost_$ — fuel + battery wear.
  • total_carbon_kg — diesel emissions.
  • Per-constraint episode costs for sla, overtemp, and power_deficit.
  • sla_violation_rate, thermal_safety_rate, power_balance_deficit_mwh — the three operational constraint axes.
  • renewable_utilization_pct, battery_cycling_count, diesel_runtime_hours — diagnostics.
  • pareto_hypervolume — for multi-objective methods.
  • robustness_gap — perf(IID) − perf(workload_swap).
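For pareto_hypervolume, a minimal two-objective sketch under maximisation with an explicit reference point (both helpers are illustrative, not the PowerZoo implementation, which scores the full 3-dim reward vector):

```python
def pareto_front(points):
    """Keep non-dominated (r1, r2) pairs, maximising both objectives."""
    front = []
    for p in points:
        if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points):
            front.append(p)
    return sorted(set(front))

def hypervolume_2d(points, ref):
    """Area dominated by the front relative to a reference point (maximisation)."""
    pts = pareto_front(points)        # ascending r1 implies descending r2 on the front
    hv, prev_r1 = 0.0, ref[0]
    for r1, r2 in pts:
        hv += (r1 - prev_r1) * (r2 - ref[1])
        prev_r1 = r1
    return hv
```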

See also