[general_comp] Seminario AISAR – Real-Time Detection of Hallucinated Entities in Long-Form Generation
Agustín Martinez Suñé
agusmartinez92 at gmail.com
Thu Sep 25 07:02:41 -03 2025
From the AISAR AI Safety Scholarship Program, we are pleased to invite you
to the next talk in our online seminar series, featuring researchers from
the field.
📌 Date and time: Monday, September 29, 9:00 (ART).
🎤 Speaker: Oscar Balcells Obeso – Ph.D. student, ETH Zurich
📖 Title: Real-Time Detection of Hallucinated Entities in Long-Form
Generation
🌐 👉 *Online talk.* Registration: To attend the talk, please enter your
name in the following form (you do not need to fill out this form if you
already selected "I want to be notified by email when there are new AISAR
talks" on a previous form):
https://forms.gle/f127kJPZYDbhujaL8
Abstract: Large language models are now routinely used in high-stakes
applications where hallucinations can cause serious harm, such as medical
consultations or legal advice. Existing hallucination detection methods,
however, are impractical for real-world use, as they are either limited to
short factual queries or require costly external verification. We present a
cheap, scalable method for real-time identification of hallucinated tokens
in long-form generations, and scale it effectively to 70B parameter models.
Our approach targets *entity-level hallucinations* (e.g., fabricated
names, dates, citations) rather than claim-level ones, thereby naturally
mapping to token-level labels and enabling streaming detection. We develop
an annotation methodology that leverages web search to annotate model
responses with grounded labels indicating which tokens correspond to
fabricated entities. This dataset enables us to train effective
hallucination classifiers with simple and efficient methods such as linear
probes. Evaluating across four model families, our classifiers consistently
outperform baselines on long-form responses, including more expensive
methods such as semantic entropy (e.g., AUC 0.90 vs 0.71 for
Llama-3.3-70B), and also improve over baselines in short-form
question-answering settings. Moreover, despite being trained only with
entity-level labels, our probes effectively detect incorrect answers in
mathematical reasoning tasks, indicating generalization beyond entities.
While our annotation methodology is expensive, we find that annotated
responses from one model can be used to train effective classifiers on
other models; accordingly, we publicly release our datasets to facilitate
reuse. Overall, our work suggests a promising new approach for scalable,
real-world hallucination detection.
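Purely as an illustration (not the speaker's code or data): a "linear probe" over model hidden states amounts to a logistic-regression classifier applied to each token's activation vector. The sketch below trains such a probe on synthetic 64-dimensional "hidden states" with made-up fabricated-entity labels and reports a token-level AUC; every dimension and variable here is a placeholder.

# Illustrative sketch only: a token-level linear probe for hallucination
# detection, trained on synthetic data (the real method uses annotated LLM
# responses and actual transformer hidden states).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Pretend each token has a hidden-state vector (d=64) and a binary label:
# 1 = token belongs to a fabricated entity, 0 = grounded token.
d, n_train, n_test = 64, 5000, 1000
direction = rng.normal(size=d)  # hypothetical "hallucination direction"

def make_tokens(n):
    labels = rng.integers(0, 2, size=n)
    states = rng.normal(size=(n, d)) + 0.8 * labels[:, None] * direction
    return states, labels

X_train, y_train = make_tokens(n_train)
X_test, y_test = make_tokens(n_test)

# A linear probe is just a logistic regression on the hidden states.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score every token; in a streaming setup this would run as tokens are generated.
scores = probe.predict_proba(X_test)[:, 1]
print(f"Token-level AUC on synthetic data: {roc_auc_score(y_test, scores):.3f}")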
Find the paper here: https://arxiv.org/abs/2509.03531
The AISAR Team
http://scholarship.aisafety.ar/