[academica_dat] (Reminder) AISAR Seminar – Mitigating Goal Misgeneralization via Minimax Regret

Agustín Martinez Suñé agusmartinez92 at gmail.com
Thu Nov 13 10:08:02 -03 2025


Reminder: talk today!

On Tue, Nov 11, 2025 at 17:18, Agustín Martinez Suñé (<
agusmartinez92 at gmail.com>) wrote:

> The AISAR Scholarship Program in AI Safety is pleased to invite you to
> the next talk in our online seminar series, featuring researchers in
> the field.
>
> 📌 Date and time: Thursday, November 13, 14:00 (ARG).
> 🎤 Speaker: Matthew Farrugia-Roberts – PhD Student @ University of Oxford
> 📖 Title: Mitigating Goal Misgeneralization via Minimax Regret
> 🔗 Online talk: To attend, register here:
> https://luma.com/b8e6smuj
>
> Abstract: Safe generalization in reinforcement learning requires not only
> that a learned policy acts capably in new situations, but also that it uses
> its capabilities towards the pursuit of the designer's intended goal. The
> latter requirement may fail when a proxy goal incentivizes similar behavior
> to the intended goal within the training environment, but not in novel
> deployment environments. This creates the risk that policies will behave as
> if in pursuit of the proxy goal, rather than the intended goal, in
> deployment -- a phenomenon known as goal misgeneralization. In this paper,
> we formalize this problem setting in order to theoretically study the
> possibility of goal misgeneralization under different training objectives.
> We show that goal misgeneralization is possible under approximate
> optimization of the maximum expected value (MEV) objective, but not the
> minimax expected regret (MMER) objective. We then empirically show that the
> standard MEV-based training method of domain randomization exhibits goal
> misgeneralization in procedurally-generated grid-world environments,
> whereas current regret-based unsupervised environment design (UED) methods
> are more robust to goal misgeneralization (though they don't find MMER
> policies in all cases). Our findings suggest that minimax expected regret
> is a promising approach to mitigating goal misgeneralization.
>
> The paper: https://arxiv.org/abs/2507.03068
> Equipo AISAR
> http://scholarship.aisafety.ar/
>
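For readers skimming the abstract, the two training objectives it contrasts can be sketched as follows. The notation here is an illustrative reconstruction, not taken from the paper: let pi be a policy, P a distribution over training environments theta, and V_theta(pi) the expected return of pi in environment theta.

```latex
% Maximum expected value (MEV): maximize average return
% over the training distribution of environments.
\pi^{\mathrm{MEV}} \in \arg\max_{\pi} \;
  \mathbb{E}_{\theta \sim P}\!\left[ V_{\theta}(\pi) \right]

% Minimax expected regret (MMER): minimize the worst-case
% gap to each environment's optimal return.
\pi^{\mathrm{MMER}} \in \arg\min_{\pi} \;
  \max_{\theta} \Big( \max_{\pi'} V_{\theta}(\pi') - V_{\theta}(\pi) \Big)
```

Informally, a policy pursuing a proxy goal can still score well under MEV when the proxy correlates with the intended goal across most training environments, but it incurs high regret in any environment where the two goals diverge, which is the intuition behind regret-based objectives resisting goal misgeneralization.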


More information about the academica_dat mailing list