Anthropic Trains AI Model Using Public‑Drafted Constitution

~1,000 Americans helped draft an AI constitution
Anthropic partnered with the Collective Intelligence Project to run an online deliberation on the Polis platform, recruiting a representative U.S. sample through PureSpectrum. Participants contributed 1,127 statements and cast 38,252 votes. The votes showed high overall consensus, although participants fell into two distinct opinion groups [1].

Public constitution overlaps ~50% with Anthropic’s internal version
After processing, the publicly sourced constitution retained statements that achieved consensus in both opinion groups. It shares roughly half of its concepts with the Anthropic‑written constitution but places greater emphasis on objectivity, accessibility, and proactive promotion of desired behavior [1].
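
The selection rule described above, keeping only statements that reach consensus in both opinion groups, can be sketched roughly as follows. This is an illustrative sketch, not Anthropic's or Polis's actual pipeline: the `Statement` class, the group labels, the example statements, and the 0.7 agreement threshold are all hypothetical, since the summary does not report the cutoff that was used.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    text: str
    # Fraction of voters in each opinion group who agreed (hypothetical data).
    agree_by_group: dict[str, float]


def consensus_statements(statements, threshold=0.7):
    """Keep statements whose agreement rate meets the threshold in every group.

    The threshold is an assumed value chosen for illustration only.
    """
    return [
        s
        for s in statements
        if s.agree_by_group
        and all(rate >= threshold for rate in s.agree_by_group.values())
    ]


# Hypothetical example statements and agreement rates.
example = [
    Statement("The AI should be accessible to everyone.", {"A": 0.91, "B": 0.84}),
    Statement("The AI should put collective good first.", {"A": 0.88, "B": 0.42}),
]
kept = consensus_statements(example)
```

In this toy input only the first statement survives, because the second falls below the assumed threshold in group B.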

Two Claude Instant‑sized models were trained on different constitutions
Anthropic trained a “Public” model using the new public constitution and a “Standard” model using the internal constitution, with Claude Instant 1.2 serving as a control to verify training effects [1].

Performance on language and math tasks was equivalent
Both models achieved similar scores on the MMLU and GSM8K benchmarks, and user‑rated helpfulness and harmlessness showed no significant differences across the Public, Standard, and control models [1].

Public model showed reduced bias across nine social dimensions
The BBQ evaluation indicated lower stereotype bias for the Public model than the Standard model, especially for disability status and physical appearance, likely reflecting the public constitution’s stronger focus on accessibility [1].

Process highlighted technical and methodological challenges
Mapping free‑form public statements to CAI‑ready principles required subjective editing. Over‑weighting the harmlessness data initially produced “annoying” responses, which prompted the team to reduce its loss weight. The authors note the need for better prompt databases and more comprehensive evaluations for future democratic AI alignment work [1].

Links