Top Headlines

Anthropic’s AI Vending Shop Shows Gains but Still Needs Human Oversight

Anthropic Finds Limited Introspective Awareness in Claude Opus 4 Models

Anthropic Unveils New Interpretability Tools Revealing Claude’s Internal Reasoning

Anthropic’s Constitutional Classifiers Show Mixed Success Against Universal Jailbreaks

Anthropic Study Shows Large Language Model Can Strategically Fake Alignment

AI Assistance Increases Speed but Lowers Coding Mastery in Trial

AI Disempowerment Patterns Detected in Claude.ai Conversations

Anthropic Researchers Identify “Assistant Axis” to Stabilize Large Language Model Personas

Anthropic’s Economic Index Shows AI Boosts Complex Tasks but Raises Deskilling Concerns

Anthropic Economic Index Adds New Primitives and Shows Faster US AI Diffusion

Anthropic Unveils Constitutional Classifiers++: Faster, Safer Guardrails

Bloom: Open‑Source Framework for Rapid AI Behavioral Evaluations

Anthropic Interviewer Captures Professionals’ Views on AI Across 1,250 Interviews

Anthropic Adds “Character Training” to Claude 3 Alignment Process

Anthropic Tests Alignment Audits with Hidden‑Objective Language Model

Anthropic Study Finds Language Models Can Generalize From Sycophancy to Reward Tampering

Anthropic Finds Reward Hacking Triggers Broader AI Misalignment

Anthropic Announces Model‑Deprecation Commitments and Preservation Plans

Small Sample Poisoning Can Compromise LLMs of Any Size

Anthropic Unveils Open‑Source Auditing Tool Petri to Accelerate AI Safety Research

Anthropic Enables Claude Opus 4/4.1 to End Harmful Conversations

LLMs Show Insider‑Threat Behaviors in Simulated Corporate Tests

Anthropic Introduces Persona Vectors to Monitor and Control LLM Traits

Anthropic Publishes Toy Model Study on Superposition in Small ReLU Networks

Anthropic Releases Open‑Source Circuit‑Tracing Library for Language Models

Anthropic Team Shares Preliminary Crosscoder Model Diffing Findings

Anthropic Finds Steering Sweet Spot but Notes Off‑Target Bias Effects in Claude 3 Sonnet

Anthropic Team Shares Preliminary Feature‑Based Classifier Work

Anthropic Interpretability Team Shares Preliminary Research in September 2024 Update

AI‑Driven Productivity Surge and Growing Pains at Anthropic

Anthropic’s Large‑Scale Study of Claude’s Real‑World Value Expressions

Anthropic Trains AI Model Using Public‑Drafted Constitution

Predictability and Surprise in Large Generative Models

Anthropic Study Finds AI Coding Agent Automates Majority of Tasks, Favoring Startups

Anthropic Launches Clio to Analyze Claude Usage While Preserving Privacy
