Transmission physics¶

TransGridEnv (powerzoo/envs/grid/trans.py) is the transmission-grid environment. It supports four solver configurations, built from two orthogonal switches:

physics ∈ {'dc', 'ac'} — linearised vs full nonlinear AC equations.
solver_mode ∈ {'opf', 'pf'} — does the env optimise dispatch (opf) or only evaluate a given dispatch (pf)?

The four combinations cover the standard transmission tasks:

| `physics` | `solver_mode` | Resulting solver | Typical RL use |
| --- | --- | --- | --- |
| `'dc'` | `'opf'` | DCOPF: linear program (Gurobi / SciPy / CVXPY) | Agent learns bidding or commitment; environment owns dispatch. |
| `'ac'` | `'opf'` | ACOPF: nonlinear program (cyipopt / SLSQP) | Agent learns bidding with reactive power and voltage. |
| `'dc'` | `'pf'` | DCPF: `line_flow = PTDF · injection` (matrix-vector multiply) | Agent learns dispatch directly; environment evaluates feasibility. |
| `'ac'` | `'pf'` | ACPF: Newton-Raphson | Agent learns dispatch with full AC physics. |
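
As a compact restatement of the table, the sketch below maps the two switches to the solver they select. `SOLVERS` and `resolve_solver` are hypothetical names used for illustration only and are not part of PowerZoo's API; only the switch values and solver names come from the table.

```python
# Illustrative only: `SOLVERS` and `resolve_solver` are hypothetical names,
# not PowerZoo API. The keys and values mirror the table above.
SOLVERS = {
    ("dc", "opf"): "DCOPF",  # linear program, environment owns dispatch
    ("ac", "opf"): "ACOPF",  # nonlinear program, environment owns dispatch
    ("dc", "pf"): "DCPF",    # PTDF matrix-vector product, agent sets dispatch
    ("ac", "pf"): "ACPF",    # Newton-Raphson, agent sets dispatch
}


def resolve_solver(physics, solver_mode):
    """Return the solver name selected by a (physics, solver_mode) pair."""
    try:
        return SOLVERS[(physics, solver_mode)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: physics={physics!r}, solver_mode={solver_mode!r}"
        ) from None


for physics in ("dc", "ac"):
    for solver_mode in ("opf", "pf"):
        print(physics, solver_mode, "->", resolve_solver(physics, solver_mode))
```

Rejecting unknown pairs early, as this sketch does, keeps a typo such as `'DC'` from silently falling through to a default.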

The LP backend for OPF is selected by solver_type ∈ {'auto', 'gurobi', 'scipy', 'cvxpy'}. This setting does not change physics or solver_mode; it only chooses which LP library is used in OPF mode.

Vocabulary check.
PF (Power Flow): solve voltages and line flows given fixed injections.
OPF (Optimal Power Flow): also dispatch generators to minimise cost.
DC: in this context means linearised (constant 1 pu voltage, no reactive power, no losses), not direct current.
PTDF (Power Transfer Distribution Factor): sensitivity matrix linking nodal injections to line flows.
LMP (Locational Marginal Price): dual variable of the nodal power balance.

DC power flow¶

DC PF assumes all voltage magnitudes are 1 pu, ignores reactive power, and neglects losses, so line flows are linear in the nodal injections:

\[ P_{\text{line}} \;=\; \text{PTDF} \cdot P_{\text{injection}} \]

The PTDF matrix is precomputed at case.init() from the line reactances. A DC PF call is a single matrix-vector multiply: fast, differentiable, and identical to what the DCOPF LP uses for its line-flow constraints.
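
To make the "PTDF from reactances, then one matrix-vector multiply" idea concrete, here is a minimal dependency-free sketch for a 3-bus triangle with bus 0 as the slack. It is not PowerZoo code: the function names, the network, and the injections are assumptions chosen for the example, and a real implementation would normally use NumPy arrays.

```python
# Illustrative sketch, not PowerZoo code: a 3-bus triangle with bus 0 as slack.
# All names and numbers here are assumptions chosen for the example.

def ptdf_three_bus(x01, x02, x12):
    """PTDF (3 lines x 3 buses) for lines 0-1, 0-2, 1-2 given their reactances."""
    b01, b02, b12 = 1.0 / x01, 1.0 / x02, 1.0 / x12  # line susceptances
    # Reduced nodal susceptance matrix over the non-slack buses 1 and 2.
    d1, off, d2 = b01 + b12, -b12, b02 + b12
    det = d1 * d2 - off * off
    # Its inverse maps injections at buses 1 and 2 to their voltage angles.
    inv = [[d2 / det, -off / det], [-off / det, d1 / det]]
    lines = [(0, 1, b01), (0, 2, b02), (1, 2, b12)]
    ptdf = []
    for frm, to, b in lines:
        row = [0.0]  # slack column: an injection at the slack moves no power
        for k in (0, 1):  # unit injection at bus k + 1, withdrawn at the slack
            theta = [0.0, inv[0][k], inv[1][k]]  # angles at buses 0, 1, 2
            row.append(b * (theta[frm] - theta[to]))
        ptdf.append(row)
    return ptdf


def dc_power_flow(ptdf, injection):
    """line_flow = PTDF . injection, written as an explicit matrix-vector product."""
    return [sum(p * inj for p, inj in zip(row, injection)) for row in ptdf]


ptdf = ptdf_three_bus(0.1, 0.1, 0.1)
injection = [90.0, -30.0, -60.0]  # MW: generation at bus 0, load at buses 1 and 2
flows = dc_power_flow(ptdf, injection)
print([round(f, 6) for f in flows])  # flows on lines 0-1, 0-2, 1-2
```

Once the matrix is built, every later power-flow evaluation costs only the product in `dc_power_flow`, which is why the precomputation pays off inside a step loop.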

AC power flow¶

AC PF solves the nonlinear nodal power balance at every bus:

\[ P_i = V_i \sum_j V_j (G_{ij} \cos\theta_{ij} + B_{ij} \sin\theta_{ij}) \]

\[ Q_i = V_i \sum_j V_j (G_{ij} \sin\theta_{ij} - B_{ij} \cos\theta_{ij}) \]

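where $V_i$ is the voltage magnitude at bus $i$, $\theta_{ij} = \theta_i - \theta_j$ is the voltage-angle difference between buses $i$ and $j$, and $G_{ij}$, $B_{ij}$ are the real and imaginary parts of the corresponding entry of the bus admittance matrix. PowerZoo uses Newton-Raphson by default.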

AC PF can fail to converge when the dispatch is infeasible. info['pf_converged'] reports whether the solve actually converged; a failed PF is counted as a real cost (see Reward and cost split).
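
The sketch below evaluates the two power-balance equations above on a 2-bus system and runs a small Newton-Raphson loop that returns a convergence flag, in the spirit of `pf_converged`. It is not PowerZoo's solver: the function names, the network data, and the finite-difference Jacobian are assumptions made to keep the example short.

```python
import math

# Illustrative sketch, not PowerZoo code: bus 0 is the slack (V = 1 pu, angle 0),
# bus 1 is a load bus. Network data and function names are assumptions.

def bus_injections(v, theta, y_bus):
    """Evaluate P_i and Q_i at every bus from the polar power-balance equations."""
    n = len(v)
    p, q = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            g, b = y_bus[i][j].real, y_bus[i][j].imag
            d = theta[i] - theta[j]
            p[i] += v[i] * v[j] * (g * math.cos(d) + b * math.sin(d))
            q[i] += v[i] * v[j] * (g * math.sin(d) - b * math.cos(d))
    return p, q


def solve_two_bus(p_load, q_load, z=complex(0.0, 0.1), tol=1e-8, max_iter=20):
    """Newton-Raphson for the load-bus voltage; returns (v, theta, converged)."""
    y = 1 / z
    y_bus = [[y, -y], [-y, y]]

    def mismatch(v1, th1):
        p, q = bus_injections([1.0, v1], [0.0, th1], y_bus)
        return p[1] + p_load, q[1] + q_load  # a load is a negative injection

    v1, th1, h = 1.0, 0.0, 1e-6  # flat start and finite-difference step
    for _ in range(max_iter):
        fp, fq = mismatch(v1, th1)
        if max(abs(fp), abs(fq)) < tol:
            return v1, th1, True
        # Jacobian by forward differences (an analytic Jacobian is usual in practice).
        fp_t, fq_t = mismatch(v1, th1 + h)
        fp_v, fq_v = mismatch(v1 + h, th1)
        a, b = (fp_t - fp) / h, (fp_v - fp) / h
        c, d = (fq_t - fq) / h, (fq_v - fq) / h
        det = a * d - b * c
        if not math.isfinite(det) or abs(det) < 1e-12:
            return v1, th1, False  # singular Jacobian: report failure
        th1 -= (d * fp - b * fq) / det
        v1 -= (a * fq - c * fp) / det
        if not (math.isfinite(v1) and math.isfinite(th1)):
            return v1, th1, False
    fp, fq = mismatch(v1, th1)
    return v1, th1, max(abs(fp), abs(fq)) < tol


v, th, ok = solve_two_bus(p_load=0.5, q_load=0.2)
print(f"moderate load: converged={ok}, V={v:.4f} pu, angle={math.degrees(th):.2f} deg")
_, _, ok_heavy = solve_two_bus(p_load=10.0, q_load=0.0)
print(f"load beyond the line's transfer capability: converged={ok_heavy}")
```

The second call requests more power than this line can deliver, so no solution of the equations exists and the loop reports non-convergence instead of returning a misleading voltage.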

DCOPF¶

DCOPF solves a linear program:

\[ \min_{P_g} \; \sum_i C_i(P_{g,i}) \quad \text{s.t.} \quad \mathbf{1}^\top P_g = \mathbf{1}^\top D, \quad |\text{PTDF} \cdot (P_g - D)| \leq \overline{S}, \quad P_g^{\min} \leq P_g \leq P_g^{\max} \]

where each generator's cost $C_i$ is the integral of its marginal-cost curve $mc\_a_i P^2 + mc\_b_i P + mc\_c_i$. For cases with mc_a = mc_b = 0 (Case5 and most IEEE cases), the marginal cost is flat and the cost is linear, $C_i(P) = mc\_c_i \cdot P$, so the problem is a pure LP. LMPs are recovered from the duals of the nodal balance constraint.
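
One standard way to see how nodal prices come out of the PTDF formulation above is to differentiate its Lagrangian with respect to the demand at bus $n$. The multipliers $\lambda$ (system balance) and $\mu^+, \mu^- \geq 0$ (upper and lower line limits), and the sign convention, are assumptions made for this sketch rather than PowerZoo's documented convention; the generator-bound terms are omitted because they do not depend on $D$.

\[ \mathcal{L} = \sum_i C_i(P_{g,i}) - \lambda \, (\mathbf{1}^\top P_g - \mathbf{1}^\top D) + (\mu^+)^\top \big( \text{PTDF} \cdot (P_g - D) - \overline{S} \big) + (\mu^-)^\top \big( -\text{PTDF} \cdot (P_g - D) - \overline{S} \big) \]

\[ \text{LMP}_n = \frac{\partial \mathcal{L}}{\partial D_n} = \lambda - \sum_{\ell} \text{PTDF}_{\ell n} \, (\mu^+_\ell - \mu^-_\ell) \]

When no line limit binds, every $\mu_\ell$ is zero and all buses share the single energy price $\lambda$; a binding line adds a congestion component that differs by bus through its PTDF column.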

ACOPF¶

ACOPF minimises the same objective subject to the full AC power-flow equations and voltage limits. PowerZoo wraps cyipopt (preferred) or SLSQP. Because ACOPF is non-convex, the solver may return a local rather than a global optimum.

`physics` × `solver_mode` decision tree¶

flowchart TD
    Q1{Does the agent decide\nunit dispatch (P)?}
    Q1 -->|yes| Q2{Need voltage and Q?}
    Q1 -->|no, env optimises| Q3{Need voltage and Q?}
    Q2 -->|yes| ACPF["physics='ac' + solver_mode='pf'\n→ ACPF"]
    Q2 -->|no| DCPF["physics='dc' + solver_mode='pf'\n→ DCPF"]
    Q3 -->|yes| ACOPF["physics='ac' + solver_mode='opf'\n→ ACOPF"]
    Q3 -->|no| DCOPF["physics='dc' + solver_mode='opf'\n→ DCOPF"]
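
The same decision tree can be written as a two-line rule: the voltage question picks `physics`, and the dispatch question picks `solver_mode`. `choose_switches` is a hypothetical helper for illustration, not a PowerZoo function.

```python
# Illustrative only: `choose_switches` is a hypothetical helper, not PowerZoo API.

def choose_switches(agent_sets_dispatch, needs_voltage_and_q):
    """Walk the decision tree and return values for the two switches."""
    physics = "ac" if needs_voltage_and_q else "dc"
    solver_mode = "pf" if agent_sets_dispatch else "opf"
    return {"physics": physics, "solver_mode": solver_mode}


# An agent that sets dispatch itself and does not need voltages or reactive power.
print(choose_switches(agent_sets_dispatch=True, needs_voltage_and_q=False))
```

Because the two questions are independent, the order in which they are asked does not change the result.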

Case sizes available¶

| Case | Buses | Lines | Generators | Default for |
| --- | --- | --- | --- | --- |
| `Case5` | 5 | 6 | 5 | `marl_opf`, `marl_uc`, `gencos_bidding` |
| `Case14` | 14 | 20 | 5 | sandbox / OOD scale variant |
| `Case118` | 118 | 186 | 54 | `opf_118`, `opf_118_7d` |
| `Case300`, `Case1354pegase`, `Case2383wp` | 300+ | 411+ | 80+ | scalability stress tests |
| `Case29GB` | 29 | 99 | 66 | GB reduced transmission (MATPOWER) |
| `Case552GB` | 552 | 673 | 2385 | GB large-scale transmission |

Case5 and Case118 are the two main public benchmark cases. Larger cases are available for scalability research but are not part of the standard task set.

What goes into `info`¶

Every TransGridEnv.step() populates the standard keys (see Python contract §4) plus, when applicable:

lmp (np.ndarray, $/MWh) — nodal LMPs from the OPF dual.
lmp_quality — 'gurobi_dual' / 'scipy_recovered' / 'cvxpy' — how the LMP was computed.
solver_backend — actual LP solver used (e.g. 'highs').
opf_cost ($/h) — total generation cost from the OPF objective.
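
Since these keys are only present when applicable, downstream code may want to read them defensively. The sketch below assumes that an inapplicable key is simply absent from `info`; `summarise_info` is a hypothetical helper, and the sample dictionary is made-up data shaped like the keys above, not output from a real run.

```python
# Illustrative only: `summarise_info` is a hypothetical helper, and `sample`
# is made-up data shaped like the keys listed above (not real solver output).

def summarise_info(info):
    """Build a one-line summary, tolerating keys that are absent."""
    parts = []
    if "opf_cost" in info:
        parts.append(f"cost={info['opf_cost']:.1f} $/h")
    if "lmp" in info:
        lmp = list(info["lmp"])  # works for a list or an np.ndarray
        parts.append(f"lmp=[{min(lmp):.2f}, {max(lmp):.2f}] $/MWh")
        parts.append(f"lmp_quality={info.get('lmp_quality', 'unknown')}")
    if "solver_backend" in info:
        parts.append(f"backend={info['solver_backend']}")
    if "pf_converged" in info:
        parts.append(f"pf_converged={info['pf_converged']}")
    return ", ".join(parts) if parts else "no solver diagnostics"


sample = {
    "lmp": [10.0, 14.0, 30.0],
    "lmp_quality": "scipy_recovered",
    "solver_backend": "highs",
    "opf_cost": 1250.0,
}
print(summarise_info(sample))
print(summarise_info({"pf_converged": False}))
```

Logging `lmp_quality` next to the prices makes it easier to tell later whether two runs computed their LMPs the same way.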

Difficulty presets¶

TransGridEnv accepts difficulty='easy' / 'medium' / 'hard', which sets the load ratio and the time-step length:

| Preset | Load ratio | `delta_t_minutes` | Effect |
| --- | --- | --- | --- |
| `easy` | 0.7 | 60 | Loose constraints, fewer steps. |
| `medium` | 0.85 | 30 | Standard benchmark setting. |
| `hard` | 0.95 | 30 | Many lines near their limits. |

The presets are convenient for sanity testing; for benchmark experiments, prefer the explicit task names (marl_opf etc.), which fix every hyperparameter via SPLIT_DATES.
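
To make the preset table concrete, the sketch below restates it as a dictionary and works out how many steps one simulated day takes at each time-step length. `PRESETS` and `steps_per_day` are hypothetical names for illustration, not PowerZoo API, and the per-day framing is an assumption made only for the arithmetic.

```python
# Illustrative only: the dictionary mirrors the preset table above; the names
# `PRESETS` and `steps_per_day` are hypothetical, not PowerZoo API.
PRESETS = {
    "easy": {"load_ratio": 0.7, "delta_t_minutes": 60},
    "medium": {"load_ratio": 0.85, "delta_t_minutes": 30},
    "hard": {"load_ratio": 0.95, "delta_t_minutes": 30},
}


def steps_per_day(difficulty):
    """Number of environment steps needed to cover 24 hours."""
    return (24 * 60) // PRESETS[difficulty]["delta_t_minutes"]


for name in PRESETS:
    print(name, PRESETS[name]["load_ratio"], steps_per_day(name))
```

The hourly step of `easy` halves the number of steps per day relative to the 30-minute presets, which is what "fewer steps" refers to in the table.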