Anthropic Releases Open‑Source Circuit‑Tracing Library for Language Models

Published 2025-05-29T00:00:00-0700 Cached 2026-02-02T19:40:40+0000

Image: Anthropic

An overview of the interactive graph explorer UI on Neuronpedia. (Anthropic)

Anthropic has open‑sourced a circuit‑tracing library that generates attribution graphs, which trace the internal steps a model takes to arrive at an output; the code is publicly available on GitHub [3].
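To make the idea of an attribution graph concrete, here is a minimal toy sketch in Python. It is not the released library's API: the node names, the edge weights, and the `paths_to` helper are all hypothetical, chosen only to illustrate how a graph of weighted edges can be read as a chain of intermediate steps between an input token and an output.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph. Edge weights are made-up "direct effect" scores,
# not values produced by any real model or by the released library.
Graph = Dict[str, Dict[str, float]]

TOY_GRAPH: Graph = {
    "token:Dallas": {"feature:Texas": 0.8, "feature:city": 0.3},
    "feature:Texas": {"feature:say-a-capital": 0.5, "logit:Austin": 0.6},
    "feature:city": {"logit:Austin": 0.1},
    "feature:say-a-capital": {"logit:Austin": 0.9},
}


def paths_to(graph: Graph, source: str, target: str) -> List[Tuple[List[str], float]]:
    """List every path from source to target, strongest first.

    A path's strength is the product of its edge weights.
    Assumes the graph is acyclic, as this toy example is.
    """
    results: List[Tuple[List[str], float]] = []

    def walk(node: str, path: List[str], strength: float) -> None:
        if node == target:
            results.append((path, strength))
            return
        for next_node, weight in graph.get(node, {}).items():
            walk(next_node, path + [next_node], strength * weight)

    walk(source, [source], 1.0)
    return sorted(results, key=lambda item: item[1], reverse=True)


# Print each path with its strength, strongest first.
for path, strength in paths_to(TOY_GRAPH, "token:Dallas", "logit:Austin"):
    print(f"{strength:.2f}  " + " -> ".join(path))
```

In this toy, the strongest route to the output passes through the intermediate "Texas" node, which is the kind of multi‑step structure such a graph is meant to surface.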

The library works with popular open‑weight models and includes an interactive frontend hosted on Neuronpedia, allowing users to explore graphs visually [6].

Development was led by Anthropic Fellows Michael Hanna and Mateusz Piotrowski, mentored by Emmanuel Ameisen and Jack Lindsey; Decode Research (Johnny Lin and Curt Tigges) handled the Neuronpedia integration [4][5].

Researchers have already applied the tools to study multi‑step reasoning and multilingual representations in Gemma‑2‑2b and Llama‑3.2‑1b. Examples appear in a public demo notebook [8], and the Gemma graphs are based on the GemmaScope project [12].

In a recent blog post, CEO Dario Amodei highlighted the urgency of interpretability research, noting that understanding of AI lags behind advances in capability [9].

The community is invited to generate, share, and extend attribution graphs, test hypotheses by modifying features, and provide feedback through GitHub issues [3].
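Testing a hypothesis by modifying a feature can be pictured with a small stand‑in, sketched here in Python. This is an assumption‑laden toy rather than the library's interface: the feature names, activations, weights, and the `output_logit` and `intervene` helpers are hypothetical, and a real intervention would be applied inside the model's forward pass rather than to a simple weighted sum.

```python
from typing import Dict


def output_logit(activations: Dict[str, float], weights: Dict[str, float]) -> float:
    """Toy linear readout: sum of each feature's activation times its weight."""
    return sum(weights[name] * value for name, value in activations.items())


def intervene(activations: Dict[str, float], feature: str, new_value: float) -> Dict[str, float]:
    """Return a copy of the activations with one feature set to new_value."""
    patched = dict(activations)
    patched[feature] = new_value
    return patched


# Hypothetical numbers, chosen only for illustration.
ACTIVATIONS = {"Texas": 2.0, "city": 1.0}
WEIGHTS = {"Texas": 1.5, "city": 0.25}

baseline = output_logit(ACTIVATIONS, WEIGHTS)
ablated = output_logit(intervene(ACTIVATIONS, "Texas", 0.0), WEIGHTS)

# If the graph suggests "Texas" drives the output, switching it off should
# lower the logit by roughly that feature's contribution.
effect = baseline - ablated
print(f"baseline={baseline}, ablated={ablated}, effect={effect}")
```

The point of the sketch is the workflow: read a candidate mechanism off the graph, change the feature it implicates, and check whether the output moves as predicted.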

Dario Amodei, CEO of Anthropic – Stated that “our understanding of the inner workings of AI lags far behind the progress we’re making in AI capabilities,” emphasizing the need for open‑source interpretability tools [9].

Links