Presets

This page collects ready-to-use RLConfig YAML templates, one per benchmark suite. Drop a template into your experiment folder, edit the few fields that matter, and run:

from powerzoo.rl import RLConfig, Trainer

cfg = RLConfig.from_yaml("battery_sac.yaml")
Trainer(cfg).train().evaluate(split="test")

All YAMLs share the same schema (task / wrappers / reward / trainer / framework / seed); see Trainers for the field semantics.

Single-agent — battery_arbitrage

task:
  name: battery_arbitrage
  split: train
wrappers:
  normalize: true
  forecast_horizon: 6
  safe_rl: false
trainer:
  algorithm: SAC
  total_timesteps: 200000
  hyperparams:
    learning_rate: 0.0003
    buffer_size: 200000
    batch_size: 256
    tau: 0.01
  save_path: ./results/battery_sac/
framework: auto
seed: 42

Single-agent — dc_scheduling

task:
  name: dc_scheduling
  split: train
wrappers:
  normalize: true
  safe_rl: true
  cost_threshold: 5.0
trainer:
  algorithm: PPO
  total_timesteps: 1000000
  hyperparams:
    learning_rate: 0.0003
    n_steps: 2048
    n_epochs: 10
  save_path: ./results/dc_scheduling_ppo/
framework: auto
seed: 0

DC microgrid — dc_microgrid_safe (Safe RL)

task:
  name: dc_microgrid_safe
  split: train
wrappers:
  normalize: true
  safe_rl: true
  cost_threshold: 0.5
trainer:
  algorithm: SAC
  total_timesteps: 2000000
  hyperparams:
    learning_rate: 0.0003
    gamma: 0.999
    buffer_size: 500000
  save_path: ./results/dc_microgrid_safe_sac/
framework: auto
seed: 0

TSO — marl_uc (independent learners)

task:
  name: marl_uc
  split: train
trainer:
  algorithm: PPO
  total_timesteps: 3000000
  hyperparams:
    learning_rate: 0.0003
    n_steps: 1024
  save_path: ./results/marl_uc_ippo/
framework: pettingzoo
seed: 0

Train with:

Trainer(cfg).train_il()

TSO — opf_118 (large-scale ED, IL)

task:
  name: opf_118
  split: train
trainer:
  algorithm: PPO
  total_timesteps: 10000000
  hyperparams:
    learning_rate: 0.0003
    n_steps: 2048
    batch_size: 256
  save_path: ./results/opf_118_ippo/
framework: pettingzoo
seed: 0
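
Train with (independent learners, as for marl_uc):

Trainer(cfg).train_il()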

DERs — marl_der_arbitrage (simultaneous SAC)

task:
  name: marl_der_arbitrage
  split: train
wrappers:
  safe_rl: true
  cost_threshold: 0.5
trainer:
  algorithm: SAC
  total_timesteps: 1500000
  hyperparams:
    learning_rate: 0.0003
    buffer_size: 500000
  save_path: ./results/marl_der_sac/
framework: pettingzoo
seed: 0

Train with:

Trainer(cfg).train_marl_simultaneous()

DERs — marl_ev_v2g (long horizon)

task:
  name: marl_ev_v2g
  split: train
trainer:
  algorithm: SAC
  total_timesteps: 3000000
  hyperparams:
    gamma: 0.999
    buffer_size: 500000
  save_path: ./results/marl_ev_sac/
framework: pettingzoo
seed: 0
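
This page does not name a training entry point for this preset; as another DER suite it presumably uses the same simultaneous call as marl_der_arbitrage:

Trainer(cfg).train_marl_simultaneous()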

GenCos — gencos_bidding (independent PPO)

task:
  name: gencos_bidding
  split: train
trainer:
  algorithm: PPO
  total_timesteps: 5000000
  hyperparams:
    learning_rate: 0.0003
    n_steps: 1024
  save_path: ./results/gencos_ippo/
framework: pettingzoo
seed: 0
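
Train with (independent PPO, same entry point as marl_uc):

Trainer(cfg).train_il()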

DSO — make_dso_env(...) (factory, no YAML loader)

The DSO benchmark uses a direct factory rather than RLConfig. The equivalent of a "preset" is a single Python function call:

from powerzoo.tasks.dso_task import make_dso_env
from powerzoo.data import DataLoader
from stable_baselines3 import PPO

env = make_dso_env(split="train", data_loader=DataLoader())
model = PPO(
    "MlpPolicy", env,
    learning_rate=3e-4, n_steps=2048, batch_size=64, verbose=1,
)
model.learn(total_timesteps=3_000_000)
model.save("results/dso_ppo")

For a Recurrent PPO baseline, which often handles the non-stationarity of this task better, write your own loop around RecurrentPPO from sb3-contrib.
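
A minimal sketch of that recurrent baseline, assuming the same make_dso_env factory and hyperparameters as the PPO snippet above (RecurrentPPO and its MlpLstmPolicy ship with sb3-contrib; everything else mirrors the preceding code):

from powerzoo.tasks.dso_task import make_dso_env
from powerzoo.data import DataLoader
from sb3_contrib import RecurrentPPO

env = make_dso_env(split="train", data_loader=DataLoader())
# MlpLstmPolicy adds an LSTM on top of the MLP feature extractor
model = RecurrentPPO(
    "MlpLstmPolicy", env,
    learning_rate=3e-4, n_steps=2048, batch_size=64, verbose=1,
)
model.learn(total_timesteps=3_000_000)
model.save("results/dso_recurrent_ppo")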

Common modifications

  • Longer training runs. Tune total_timesteps: single-agent tasks typically converge in 200k–2M steps, while large MARL suites need 5M–20M.
  • Different algorithm. Set trainer.algorithm to SAC, PPO, or TD3. PPO is the safe default for MARL; SAC is usually best for single-agent continuous control.
  • Forecast study. Set wrappers.forecast_horizon to 0, 6, or 24 to compare no-forecast, short-horizon, and full-day forecast policies on the same task (see the sweep sketch after this list).
  • Safe-RL study. Set wrappers.safe_rl: true and pick a cost_threshold consistent with the task's typical violation magnitude, read off a random rollout (see the second sketch below).
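
A minimal sketch of the forecast sweep, assuming RLConfig fields are plain mutable attributes and that evaluate() returns a metrics object you can collect (neither detail is stated on this page):

from powerzoo.rl import RLConfig, Trainer

results = {}
for horizon in (0, 6, 24):
    cfg = RLConfig.from_yaml("battery_sac.yaml")
    cfg.wrappers.forecast_horizon = horizon                 # the field under study
    cfg.trainer.save_path = f"./results/battery_sac_h{horizon}/"
    results[horizon] = Trainer(cfg).train().evaluate(split="test")
print(results)

And a sketch for picking cost_threshold from a random rollout. The make_env import path and the per-step info["cost"] key are assumptions here (the key follows the common safe-RL convention; see the Wrappers page for the actual one):

from powerzoo.rl import make_env  # import path assumed; see the Trainers page

env = make_env("dc_scheduling", split="train")
obs, info = env.reset(seed=0)
costs = []
for _ in range(1000):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    costs.append(info.get("cost", 0.0))   # "cost" key is an assumption
    if terminated or truncated:
        obs, info = env.reset()
print("mean per-step cost:", sum(costs) / len(costs))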

See also

  • Trainers — Trainer and make_env reference.
  • Wrappers — what each wrapper field does.
  • Custom loops — when a YAML preset is not enough.
  • Benchmarks — per-suite hyperparameter recommendations.