[academica_dat] Reminder: AISAR Seminar – Thought Anchors: Which LLM Reasoning Steps Matter?

Agustín Martinez Suñé agusmartinez92 at gmail.com
Thu Sep 18 13:58:00 -03 2025


The AISAR Scholarship Program in AI Safety is pleased to invite you to the
next talk in our online seminar series, featuring researchers working in
the field.

📌 Date and time: Friday, September 19, 09:00 (Argentina time).
🎤 Speakers:
Paul C. Bogdan – Postdoctoral Researcher, Duke University
Uzay Macar – AI Safety Researcher, ML Alignment & Theory Scholars (MATS)
📖 Title: Thought Anchors: Which LLM Reasoning Steps Matter?

👉 Registration: To attend the talk, please enter your name in the
following form (you do not need to complete this form if you already
selected "I want to be notified by email when there are new AISAR talks"
in a previous form):
https://forms.gle/wAyCczqeAH7WmwjXA

Abstract:
Reasoning large language models have recently achieved state-of-the-art
performance in many fields. However, their long-form chain-of-thought
reasoning creates interpretability challenges as each generated token
depends on all previous ones, making the computation harder to decompose.
We argue that analyzing reasoning traces at the sentence level is a
promising approach to understanding reasoning processes. We present three
complementary attribution methods: (1) a black-box method measuring each
sentence's counterfactual importance by comparing final answers across 100
rollouts conditioned on the model generating that sentence or one with a
different meaning; (2) a white-box method of aggregating attention patterns
between pairs of sentences, which identified "broadcasting" sentences that
receive disproportionate attention from all future sentences via "receiver"
attention heads; (3) a causal attribution method measuring logical
connections between sentences by suppressing attention toward one sentence
and measuring the effect on each future sentence's tokens. Each method
provides evidence for the existence of thought anchors, reasoning steps
that have outsized importance and that disproportionately influence the
subsequent reasoning process. These thought anchors are typically planning
or backtracking sentences. We provide an open-source tool
for visualizing the outputs of our methods, and present a case study
showing converging patterns across methods that map how a model performs
multi-step reasoning. The consistency across methods demonstrates the
potential of sentence-level analysis for a deeper understanding of
reasoning models.
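To make the black-box method (1) above concrete, here is a minimal Python
sketch of the counterfactual-importance comparison it describes. This is an
illustrative assumption, not the authors' implementation: sample_final_answer
is a placeholder for sampling a full rollout from the model and reading off
its final answer, and the total-variation summary is one reasonable way to
compare the two answer distributions.

# Hypothetical sketch of sentence-level counterfactual importance:
# compare final answers across rollouts that continue from a sentence
# versus a semantically different paraphrase of it.
import random
from collections import Counter

def sample_final_answer(prefix_sentences, next_sentence):
    """Placeholder: condition the model on the trace prefix plus
    `next_sentence`, sample the rest of the chain of thought, and return
    the final answer string. Here it just returns a dummy answer."""
    return random.choice(["A", "B", "C"])

def counterfactual_importance(prefix_sentences, sentence, paraphrase,
                              n_rollouts=100):
    """Estimate how much `sentence` matters by comparing final-answer
    distributions across n_rollouts rollouts conditioned on the model
    generating `sentence` versus `paraphrase`."""
    with_sentence = Counter(sample_final_answer(prefix_sentences, sentence)
                            for _ in range(n_rollouts))
    with_paraphrase = Counter(sample_final_answer(prefix_sentences, paraphrase)
                              for _ in range(n_rollouts))
    answers = set(with_sentence) | set(with_paraphrase)
    # Total variation distance between the two answer distributions (0 to 1).
    return 0.5 * sum(abs(with_sentence[a] - with_paraphrase[a]) / n_rollouts
                     for a in answers)

# Example usage with dummy inputs:
# counterfactual_importance(["Let x be the unknown."],
#                           "So x must be even.",
#                           "So x could be anything.")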

Find the paper here: https://arxiv.org/abs/2506.19143
The AISAR Team
http://scholarship.aisafety.ar/



More information about the academica_dat mailing list