Skip to content

Microgrid

DCMicrogridEnv (powerzoo/envs/microgrid/dc_microgrid_env.py) is a self-contained behind-the-meter microgrid environment. Unlike every other PowerZoo env, it has no external grid connection. Power balance is enforced internally; a power deficit is a real CMDP cost, not a setpoint that the bulk grid absorbs.

The env composes existing PowerZoo resources (DataCenterEnv, BatteryEnv) with inline solar PV and a diesel generator:

flowchart LR
    PV[Solar PV\nprofile-driven] --> BUS((DC bus))
    DG[Diesel generator\ndispatchable] --> BUS
    BAT[BatteryEnv\nSOC dynamics] <--> BUS
    BUS --> DC[DataCenterEnv\nIT + cooling + thermal]
    BUS --> RES["residual = p_pv + p_dg + p_batt − p_load"]
    RES -->|"residual < 0"| DEF["cost_power_deficit"]
    RES -->|"residual > 0"| SP["info: power_spill"]

Power balance

At every step, the DC bus enforces:

residual      = p_pv + p_dg + p_batt − p_load
power_deficit = max(−residual, 0)   → cost_power_deficit
power_spill   = max(+residual, 0)   → info only

There is no slack node. A naive policy that overcommits IT load while solar is low and the battery is empty pays the deficit cost directly.

Action and observation spaces

Dim Action Range Meaning
0 train_sched_rate [0, 1] Fraction of training jobs scheduled this step.
1 ft_sched_rate [0, 1] Fraction of finetuning jobs scheduled.
2 cooling_setpoint_norm [0, 1] Normalised cooling-setpoint thermostat.
3 battery_power_norm [-1, 1] Battery power; positive = discharge.
4 dg_power_norm [0, 1] Diesel generator output (off when 0).

The 18-D observation packs IT (CPU / memory utilisation), workload queues (training / finetuning fill, urgency), thermal state (zone temperature, outdoor temperature, COP ratio), generation (solar capacity factor, SOC, diesel headroom), the previous action vector and time encoding (sin / cos):

[cpu_util, mem_util,
 q_train_fill, q_ft_fill, queue_urgency,
 zone_temp_norm, outdoor_temp_norm, cop_ratio,
 solar_cf, soc, dg_margin_norm,
 last_action_norm[5],
 sin(t), cos(t)]

Reward and cost

The scalar reward is a scalarised three-term objective:

\[ r_t \;=\; r_{\text{energy}} \;+\; w_{\text{cost}} \cdot r_{\text{cost}} \;+\; w_{\text{carbon}} \cdot r_{\text{carbon}} \]

with the per-component vector also exposed in info["reward_vector"] = [r_energy, r_cost, r_carbon]. The components are:

  • r_energy = -(p_dc_mw * dt_h) — total IT energy in MWh (negative).
  • r_cost = -(fuel_cost + |p_batt| * dt_h * battery_deg_cost_per_mwh) — fuel + battery wear (negative).
  • r_carbon = -carbon_kg — diesel CO₂ emissions in kg (negative).

The CMDP cost channel uses three separated components:

Key Unit Meaning
info["cost_sla"] count Number of SLA violations this step.
info["cost_overtemp"] °C max(t_zone − t_critical, 0).
info["cost_power_deficit"] max(p_load − p_supply, 0) / max(p_load, 1e-6) (normalised).
info["cost"] Sum of the three components. info["cost_sum"] is a backwards-compat alias.

A typical episode is 288 steps × 5 min = 24 h.

Exogenous profiles

Profiles can be injected via set_profiles(cpu, solar, temp) or at construction time. All are 1-D NumPy float32 arrays cyclically indexed at each step. None falls back to a synthetic diurnal profile.

powerzoo.data.dc_microgrid_profiles provides the canonical profile loader, including OOD transforms used by the benchmark's evaluation splits — see Benchmarks · DC microgrid.

Why this is a distinct benchmark

  • No grid backstop. Power deficit is a hard cost, not a setpoint. Naive policies that overcommit IT load cannot fall back on the grid for the missing power.
  • Multi-objective by construction. info["reward_vector"] is a real three-vector — energy, monetary cost and carbon are not the same thing and cannot be tuned away by a single weight.
  • Heterogeneous action vector. One 5-D action mixes scheduling, thermostat and generation set-points, giving a small-scale benchmark for hybrid control architectures.

See also