[general_dat] AISAR Seminar – AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability
Agustín Martinez Suñé
agusmartinez92 at gmail.com
Wed Oct 15 10:33:09 -03 2025
On behalf of the AISAR Scholarship Program in AI Safety, we are pleased to
invite you to the next talk in our online seminar series, which features
researchers working in the field.
📌 Date and time: Wednesday, October 22, 12:00 (ARG).
🎤 Speaker: Fernando Rosas – Lecturer @ University of Sussex
📖 Title: AI in a vat: Fundamental limits of efficient world modelling for
agent sandboxing and interpretability
🔗 Online talk: To attend, register here:
https://luma.com/dywugtbl
Abstract: Recent work proposes using world models to generate controlled
virtual environments in which AI agents can be tested before deployment to
ensure their reliability and safety. However, accurate world models often
have high computational demands that can severely restrict the scope and
depth of such assessments. Inspired by the classic 'brain in a vat' thought
experiment, here we investigate ways of simplifying world models that
remain agnostic to the AI agent under evaluation. By following principles
from computational mechanics, our approach reveals a fundamental trade-off
in world model construction between efficiency and interpretability,
demonstrating that no single world model can optimise all desirable
characteristics. Building on this trade-off, we identify procedures to
build world models that either minimise memory requirements, delineate the
boundaries of what is learnable, or allow tracking causes of undesirable
outcomes. In doing so, this work establishes fundamental limits in world
modelling, leading to actionable guidelines that inform core design choices
related to effective agent evaluation.
The paper: https://arxiv.org/abs/2504.04608
The AISAR Team
http://scholarship.aisafety.ar/