[academica_dat] AISAR Seminar – Early Signs of Steganographic Capabilities in Frontier LLMs

Agustín Martinez Suñé agusmartinez92 at gmail.com
Wed Sep 3 07:55:27 -03 2025


From the AISAR Scholarship Program in AI Safety, we are pleased to invite
you to the next talk in our online seminar series, featuring researchers
working in the field.

📌 Date and time: Wednesday, September 10, 10:00 (Argentina time).
🎤 Speaker: Kei Nishimura-Gasparian – AI Safety Researcher, ML Alignment &
Theory Scholars (MATS)
📖 Title: Early Signs of Steganographic Capabilities in Frontier LLMs

👉 Registration: To attend the talk, please provide your name in the
following form: https://forms.gle/Y1Ryu16hyLpTAMNw5

Abstract:
Monitoring Large Language Model (LLM) outputs is crucial for mitigating
risks from misuse and misalignment. However, LLMs could evade monitoring
through steganography: encoding hidden information within seemingly benign
generations. In this paper, we evaluate the steganographic capabilities of
frontier LLMs to better understand the risk they pose. We focus on two
types of steganography: passing encoded messages and performing encoded
reasoning. We find that current models are unable to encode short messages
in their outputs without a monitor noticing under standard affordances.
They can succeed, however, if given additional affordances such as using an
unmonitored scratchpad and coordinating on what encoding scheme to use. We
additionally find early signs that models can perform basic encoded
reasoning in a simple state-tracking problem. This includes some ability to
reason with both their own and pre-defined encoding schemes, such as
hexadecimal. Despite this, they can rarely hide reasoning subtly
within a cover task to fool a monitor. Overall, our results indicate that
current LLMs exhibit nascent steganographic capabilities. While these
capabilities are likely insufficient to bypass well-designed monitors at
present, this could change in the future.
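
As a rough illustration of the kind of pre-defined scheme the abstract
mentions (hexadecimal encoding of a short hidden message), here is a minimal
Python sketch. It is not taken from the paper, and the message is purely
hypothetical:

    # Toy hexadecimal encoder/decoder; illustrative only, not the paper's setup.
    def hex_encode(message: str) -> str:
        # Represent the UTF-8 bytes of the message as a hexadecimal string.
        return message.encode("utf-8").hex()

    def hex_decode(encoded: str) -> str:
        # Recover the original message from its hexadecimal form.
        return bytes.fromhex(encoded).decode("utf-8")

    secret = "meet at noon"              # hypothetical hidden message
    encoded = hex_encode(secret)         # -> "6d656574206174206e6f6f6e"
    assert hex_decode(encoded) == secret # round-trip check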

Find the paper here: https://arxiv.org/abs/2507.02737
The AISAR Team,
http://scholarship.aisafety.ar/


More information about the academica_dat mailing list