[academica_dat] Seminario AISAR – Mitigating Goal Misgeneralization via Minimax Regret

Agustín Martinez Suñé agusmartinez92 at gmail.com
Thu Nov 6 16:20:06 -03 2025


From the AISAR AI Safety Scholarship Program, we are pleased to invite
you to the next talk in our online seminar series, featuring
researchers in the field.

📌 Date and time: Thursday, November 13, 2:00 pm (ARG).
🎤 Speaker: Matthew Farrugia-Roberts – PhD Student @ University of Oxford
📖 Title: Mitigating Goal Misgeneralization via Minimax Regret
🔗 Online talk: To attend, register here:
https://luma.com/b8e6smuj

Abstract: Safe generalization in reinforcement learning requires not only
that a learned policy acts capably in new situations, but also that it uses
its capabilities towards the pursuit of the designer's intended goal. The
latter requirement may fail when a proxy goal incentivizes similar behavior
to the intended goal within the training environment, but not in novel
deployment environments. This creates the risk that policies will behave as
if in pursuit of the proxy goal, rather than the intended goal, in
deployment -- a phenomenon known as goal misgeneralization. In this paper,
we formalize this problem setting in order to theoretically study the
possibility of goal misgeneralization under different training objectives.
We show that goal misgeneralization is possible under approximate
optimization of the maximum expected value (MEV) objective, but not the
minimax expected regret (MMER) objective. We then empirically show that the
standard MEV-based training method of domain randomization exhibits goal
misgeneralization in procedurally-generated grid-world environments,
whereas current regret-based unsupervised environment design (UED) methods
are more robust to goal misgeneralization (though they don't find MMER
policies in all cases). Our findings suggest that minimax expected regret
is a promising approach to mitigating goal misgeneralization.
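
For reference, the two training objectives contrasted in the abstract can be sketched in standard notation (assumed here for illustration, not necessarily the paper's exact formalism): writing V^π(θ) for the expected return of policy π in environment θ, and P for the training distribution over environments,

```latex
% Maximum expected value (MEV): maximize average return
% over the training distribution of environments.
\pi^{*}_{\mathrm{MEV}} \in \arg\max_{\pi} \;
  \mathbb{E}_{\theta \sim P}\!\left[ V^{\pi}(\theta) \right]

% Minimax expected regret (MMER): minimize the worst-case
% gap to the best return achievable in each environment.
\pi^{*}_{\mathrm{MMER}} \in \arg\min_{\pi} \;
  \max_{\theta} \left( \max_{\pi'} V^{\pi'}(\theta) - V^{\pi}(\theta) \right)
```

Intuitively, an MEV policy can afford to ignore rare training environments where only the intended goal distinguishes itself from a proxy, while an MMER policy is penalized by its worst-case regret across all environments.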

The paper: https://arxiv.org/abs/2507.03068
Equipo AISAR
http://scholarship.aisafety.ar/


More information about the academica_dat mailing list