[general_dat] Reminder: AISAR Seminar – AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability

Agustín Martinez Suñé agusmartinez92 at gmail.com
Wed Oct 22 10:13:18 -03 2025


Reminder: this talk is today!

On Tue, 21 Oct 2025 at 16:15, Agustín Martinez Suñé (<
agusmartinez92 at gmail.com>) wrote:

> Reminder: this AISAR online talk will take place tomorrow.
>
> Best regards,
> Agus.
>
>> The AISAR Scholarship Program in AI Safety is pleased to invite you
>> to the next talk in our online seminar series, featuring researchers
>> working in the field.
>>
>> 📌 Date and time: Wednesday, October 22, 12:00 (ART).
>> 🎤 Speaker: Fernando Rosas – Lecturer @ University of Sussex
>> 📖 Title: AI in a vat: Fundamental limits of efficient world modelling
>> for agent sandboxing and interpretability
>>
>> 🔗 Online talk: To attend the talk, register here:
>> https://luma.com/dywugtbl
>>
>> Abstract: Recent work proposes using world models to generate controlled
>> virtual environments in which AI agents can be tested before deployment to
>> ensure their reliability and safety. However, accurate world models often
>> have high computational demands that can severely restrict the scope and
>> depth of such assessments. Inspired by the classic 'brain in a vat' thought
>> experiment, here we investigate ways of simplifying world models that
>> remain agnostic to the AI agent under evaluation. By following principles
>> from computational mechanics, our approach reveals a fundamental trade-off
>> in world model construction between efficiency and interpretability,
>> demonstrating that no single world model can optimise all desirable
>> characteristics. Building on this trade-off, we identify procedures to
>> build world models that either minimise memory requirements, delineate the
>> boundaries of what is learnable, or allow tracking causes of undesirable
>> outcomes. In doing so, this work establishes fundamental limits in world
>> modelling, leading to actionable guidelines that inform core design choices
>> related to effective agent evaluation.
>>
>> The paper: https://arxiv.org/abs/2504.04608
>> AISAR Team
>> http://scholarship.aisafety.ar/
>>
>


More information about the general_dat mailing list