Presets¶
This page collects ready-to-use `RLConfig` YAML templates, one per benchmark suite. Drop a template into your experiment folder, edit the few fields that matter, and run:

```python
from powerzoo.rl import RLConfig, Trainer

cfg = RLConfig.from_yaml("battery_sac.yaml")
Trainer(cfg).train().evaluate(split="test")
```
All YAMLs share the same schema (`task` / `wrappers` / `reward` / `trainer` / `framework` / `seed`); see Trainers for the field semantics.
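For orientation, the shared skeleton looks like this (none of the presets below set the optional `reward` block, so each task's default reward presumably applies):

```yaml
task:               # benchmark name and data split
  name: <task_name>
  split: train
wrappers:           # optional observation / safe-RL wrappers
  normalize: true
trainer:            # algorithm, step budget, hyperparameters, output path
  algorithm: PPO
  total_timesteps: 1000000
  hyperparams: {}
  save_path: ./results/<run_name>/
framework: auto     # or pettingzoo for the multi-agent suites
seed: 0
```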
Single-agent — battery_arbitrage¶
```yaml
task:
  name: battery_arbitrage
  split: train
wrappers:
  normalize: true
  forecast_horizon: 6
  safe_rl: false
trainer:
  algorithm: SAC
  total_timesteps: 200000
  hyperparams:
    learning_rate: 0.0003
    buffer_size: 200000
    batch_size: 256
    tau: 0.01
  save_path: ./results/battery_sac/
framework: auto
seed: 42
```
Single-agent — dc_scheduling¶
```yaml
task:
  name: dc_scheduling
  split: train
wrappers:
  normalize: true
  safe_rl: true
  cost_threshold: 5.0
trainer:
  algorithm: PPO
  total_timesteps: 1000000
  hyperparams:
    learning_rate: 0.0003
    n_steps: 2048
    n_epochs: 10
  save_path: ./results/dc_scheduling_ppo/
framework: auto
seed: 0
```
DC microgrid — dc_microgrid_safe (Safe RL)¶
```yaml
task:
  name: dc_microgrid_safe
  split: train
wrappers:
  normalize: true
  safe_rl: true
  cost_threshold: 0.5
trainer:
  algorithm: SAC
  total_timesteps: 2000000
  hyperparams:
    learning_rate: 0.0003
    gamma: 0.999
    buffer_size: 500000
  save_path: ./results/dc_microgrid_safe_sac/
framework: auto
seed: 0
```
TSO — marl_uc (independent learners)¶
```yaml
task:
  name: marl_uc
  split: train
trainer:
  algorithm: PPO
  total_timesteps: 3000000
  hyperparams:
    learning_rate: 0.0003
    n_steps: 1024
  save_path: ./results/marl_uc_ippo/
framework: pettingzoo
seed: 0
```
Train with the standard loader (assuming the template above is saved as `marl_uc.yaml`):
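```python
from powerzoo.rl import RLConfig, Trainer

cfg = RLConfig.from_yaml("marl_uc.yaml")
Trainer(cfg).train().evaluate(split="test")
```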
TSO — opf_118 (large-scale ED, independent learners)¶
```yaml
task:
  name: opf_118
  split: train
trainer:
  algorithm: PPO
  total_timesteps: 10000000
  hyperparams:
    learning_rate: 0.0003
    n_steps: 2048
    batch_size: 256
  save_path: ./results/opf_118_ippo/
framework: pettingzoo
seed: 0
```
DERs — marl_der_arbitrage (simultaneous SAC)¶
```yaml
task:
  name: marl_der_arbitrage
  split: train
wrappers:
  safe_rl: true
  cost_threshold: 0.5
trainer:
  algorithm: SAC
  total_timesteps: 1500000
  hyperparams:
    learning_rate: 0.0003
    buffer_size: 500000
  save_path: ./results/marl_der_sac/
framework: pettingzoo
seed: 0
```
Train with the standard loader (assuming the template above is saved as `marl_der_arbitrage.yaml`):
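```python
from powerzoo.rl import RLConfig, Trainer

cfg = RLConfig.from_yaml("marl_der_arbitrage.yaml")
Trainer(cfg).train().evaluate(split="test")
```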
DERs — marl_ev_v2g (long horizon)¶
```yaml
task:
  name: marl_ev_v2g
  split: train
trainer:
  algorithm: SAC
  total_timesteps: 3000000
  hyperparams:
    gamma: 0.999
    buffer_size: 500000
  save_path: ./results/marl_ev_sac/
framework: pettingzoo
seed: 0
```
GenCos — gencos_bidding (independent PPO)¶
```yaml
task:
  name: gencos_bidding
  split: train
trainer:
  algorithm: PPO
  total_timesteps: 5000000
  hyperparams:
    learning_rate: 0.0003
    n_steps: 1024
  save_path: ./results/gencos_ippo/
framework: pettingzoo
seed: 0
```
DSO — make_dso_env(...) (factory, no YAML loader)¶
The DSO benchmark uses a direct factory rather than `RLConfig`. The equivalent of a "preset" is a single Python function call:
```python
from powerzoo.tasks.dso_task import make_dso_env
from powerzoo.data import DataLoader
from stable_baselines3 import PPO

# Build the environment directly; no YAML or RLConfig involved.
env = make_dso_env(split="train", data_loader=DataLoader())

model = PPO(
    "MlpPolicy", env,
    learning_rate=3e-4, n_steps=2048, batch_size=64, verbose=1,
)
model.learn(total_timesteps=3_000_000)
model.save("results/dso_ppo")
```
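To score the trained policy on the held-out split, rebuild the env with `split="test"` and roll it out manually. A minimal sketch, assuming `make_dso_env` returns a standard Gymnasium-style env:

```python
env = make_dso_env(split="test", data_loader=DataLoader())
model = PPO.load("results/dso_ppo")

obs, _ = env.reset()
terminated = truncated = False
ep_return = 0.0
while not (terminated or truncated):
    # Deterministic actions for evaluation.
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    ep_return += reward
print(f"test episode return: {ep_return:.2f}")
```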
For a recurrent PPO baseline (which tends to handle non-stationarity better), wrap your own loop around `RecurrentPPO` from sb3-contrib, as sketched below.
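A minimal sketch, with the PPO hyperparameters above carried over untuned (`MlpLstmPolicy` is sb3-contrib's recurrent policy class):

```python
from sb3_contrib import RecurrentPPO

# Same factory env as above; only the algorithm and policy class change.
env = make_dso_env(split="train", data_loader=DataLoader())
model = RecurrentPPO(
    "MlpLstmPolicy", env,
    learning_rate=3e-4, n_steps=2048, batch_size=64, verbose=1,
)
model.learn(total_timesteps=3_000_000)
model.save("results/dso_rppo")
```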
Common modifications¶
- More episodes per training run. Tune `total_timesteps`. Single-agent tasks converge in 200k–2M steps; large MARL tasks need 5M–20M.
- Different algorithm. Set `trainer.algorithm` to `SAC`, `PPO`, or `TD3`. PPO is the safe default for MARL; SAC is usually best for single-agent continuous control.
- Forecast study. Set `wrappers.forecast_horizon` to 0, 6, or 24 to compare no-forecast, short-horizon, and full-day forecast policies on the same task (see the sketch after this list).
- Safe-RL study. Set `wrappers.safe_rl: true` and a `cost_threshold` consistent with the task's typical violation magnitude (read it off a random rollout).
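For the forecast study, one convenient pattern is to generate the YAML variants programmatically instead of hand-editing files. A sketch using PyYAML and the documented `RLConfig.from_yaml` loader (the file names here are arbitrary):

```python
import yaml  # PyYAML

from powerzoo.rl import RLConfig, Trainer

# Start from the battery preset and sweep the forecast horizon.
with open("battery_sac.yaml") as f:
    base = yaml.safe_load(f)

for horizon in (0, 6, 24):
    base["wrappers"]["forecast_horizon"] = horizon
    base["trainer"]["save_path"] = f"./results/battery_sac_h{horizon}/"
    variant = f"battery_sac_h{horizon}.yaml"
    with open(variant, "w") as f:
        yaml.safe_dump(base, f)
    Trainer(RLConfig.from_yaml(variant)).train().evaluate(split="test")
```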
See also¶
- Trainers — `Trainer` and `make_env` reference.
- Wrappers — what each wrapper field does.
- Custom loops — when a YAML preset is not enough.
- Benchmarks — per-suite hyperparameter recommendations.