[general_comp] Reminder: TODAY, AISAR Seminar – Reverse-engineering a neural network that plans: a mesa-optimizer model organism

Agustín Martinez Suñé agusmartinez92 at gmail.com
Wed Sep 24 08:54:05 -03 2025


On behalf of the AISAR Scholarship Program in AI Safety, we are pleased to
invite you to the next talk in our online seminar series, featuring
researchers in the field.

📌 Date and time: Wednesday, September 24, 13:00 (Argentina time).
🎤 Speaker: Adrià Garriga-Alonso – Research Scientist, FAR AI
<https://www.far.ai/>
📖 Title: Reverse-engineering a neural network that plans: a mesa-optimizer
model organism

🌐 👉 *Online talk.* Registration: To attend the talk, please enter your
name in the form below. You do not need to fill it in again if you already
selected "Quiero que me avisen por correo electrónico cuando haya nuevas
charlas de AISAR" ("Notify me by email when there are new AISAR talks") in
a previous form:
https://forms.gle/XNDf9uskcRoZ6koW6

Abstract: We partially reverse-engineer a convolutional recurrent neural
network (RNN) trained to play the puzzle game Sokoban with model-free
reinforcement learning. Prior work found that this network solves more
levels with more test-time compute. Our analysis reveals several mechanisms
analogous to components of classic bidirectional search. For each square,
the RNN represents its plan in the activations of channels associated with
specific directions. These state-action activations are analogous to a
value function: their magnitudes determine when to backtrack and which
plan branch survives pruning. Specialized kernels extend these activations
(containing plan and value) forward and backward to create paths, forming a
transition model. The algorithm is also unlike classical search in some
ways. State representation is not unified; instead, the network considers
each box separately. Each layer has its own plan representation and value
function, increasing search depth. Far from being inscrutable, the
mechanisms leveraging test-time compute learned in this network by
model-free training can be understood in familiar terms.

Find the paper here: https://arxiv.org/abs/2506.10138

AISAR Team
http://scholarship.aisafety.ar/