[general_comp] AISAR Seminar – Early Signs of Steganographic Capabilities in Frontier LLMs

Agustín Martinez Suñé agusmartinez92 at gmail.com
Tue Sep 9 06:55:18 -03 2025


A reminder that this seminar is tomorrow!

Regards

> On behalf of the AISAR Scholarship Program in AI Safety, we are pleased
> to invite you to the next talk in our online seminar series, featuring
> researchers in the field.
>
> 📌 Date and time: Wednesday, September 10, 10:00 (Argentina time).
> 🎤 Speaker: Kei Nishimura-Gasparian – AI Safety Researcher, ML Alignment &
> Theory Scholars (MATS)
> 📖 Title: Early Signs of Steganographic Capabilities in Frontier LLMs
>
> 👉 Registration: To attend the talk, please enter your name in the
> following form: https://forms.gle/Y1Ryu16hyLpTAMNw5
>
> Abstract:
> Monitoring Large Language Model (LLM) outputs is crucial for mitigating
> risks from misuse and misalignment. However, LLMs could evade monitoring
> through steganography: Encoding hidden information within seemingly benign
> generations. In this paper, we evaluate the steganography capabilities in
> frontier LLMs to better understand the risk they pose. We focus on two
> types of steganography: passing encoded messages and performing encoded
> reasoning. We find that current models are unable to encode short messages
> in their outputs without a monitor noticing under standard affordances.
> They can succeed, however, if given additional affordances such as using an
> unmonitored scratchpad and coordinating on what encoding scheme to use. We
> additionally find early signs that models can perform basic encoded
> reasoning in a simple state-tracking problem. This includes some ability to
> reason with their own and pre-defined schemes, including encoding schemes
> such as Hexadecimal. Despite this, they can rarely hide reasoning subtly
> within a cover task to fool a monitor. Overall, our results indicate that
> current LLMs exhibit nascent steganographic capabilities. While these
> capabilities are likely insufficient to bypass well-designed monitors at
> present, this could change in the future.
>
> Find the paper here: https://arxiv.org/abs/2507.02737
>
> The AISAR Team,
> http://scholarship.aisafety.ar/
>

