Responsible Scaling Policy V3
Captured source
source ↗Responsible Scaling Policy Version 3.0 \ Anthropic Policy Announcements Anthropic’s Responsible Scaling Policy: Version 3.0 Feb 24, 2026 Read the Responsible Scaling Policy
We’re releasing the third version of our Responsible Scaling Policy (RSP), the voluntary framework we use to mitigate catastrophic risks from AI systems. Anthropic has now had an RSP for more than two years, and we’ve learned a great deal about its benefits and its shortcomings. We’re therefore updating the policy to reinforce what has worked well to date, improve the policy where necessary, and implement new measures to increase the transparency and accountability of our decision-making. You can read the new RSP in full here . In this post, we’ll discuss some of the thinking behind the changes. The original RSP and our theory of change The RSP is our attempt to solve the problem of how to address AI risks that are not present at the time the policy is written, but which could emerge rapidly as a result of an exponentially advancing technology. When we wrote the original RSP in September 2023, large language models were essentially chat interfaces. Today they can browse the web, write and run code, use computers, and take autonomous, multi-step actions. As each of these new capabilities have emerged, so have new risks. We expect this pattern to continue. We focused the RSP on the principle of conditional , or if-then , commitments. If a model exceeded certain capability levels (for example, biological science capabilities that could assist in the creation of dangerous weapons), then the policy stated that we should introduce a new and stricter set of safeguards (for example, against model misuse and the theft of model weights). Each set of safeguards corresponded to an “AI Safety Level” (ASL): for example, ASL-2 referred to one set of required safeguards, whereas ASL-3 referred to a more stringent set of safeguards needed for more capable AI models. Early ASLs (ASL-2 and ASL-3) were defined in significant detail, but it was more difficult to specify the correct safeguards for models that were still several generations away. We therefore intentionally left the later ASLs (ASL-4 and beyond) largely undefined, and hoped to develop them in more detail once we had a better picture of what higher AI capability levels would entail. The following is a rough description of our “theory of change”—that is, the mechanisms whereby we hoped to affect the ecosystem with the RSP: An internal forcing function. Within Anthropic, we hoped the RSP would compel us to treat important safeguards as requirements for launching (and training) new models. This made the importance of these safeguards clear to the large and growing organization, spurring us on to make faster progress. A race to the top. We hoped that announcing our RSP would encourage other AI companies to introduce similar policies. This is the idea of a “race to the top” (the converse of a “race to the bottom”), in which different industry players are incentivized to improve, rather than weaken, their models’ safeguards and their overall safety posture. Over time, we hoped RSPs, or similar policies, would become voluntary industry standards or go on to inform AI laws aimed at encouraging safety and transparency in AI model development. Creating more consensus about risks. We viewed the capability thresholds as potentially important moments for the industry. If we reached an important capability threshold (such as the ability of AI models to support the end-to-end production of bioweapons), we would institute the appropriate safeguards ourselves and use the evidence we’d obtained about AI capabilities to advocate to other companies and governments that they take action as well. In other words, we believed that the capability thresholds might be good points at which to go beyond unilateral action (Anthropic requiring safeguards for its own models) and encourage multilateral action (other AI companies, and/or governments also requiring such safeguards). Looking to the future . We recognized that, at some of the later capability thresholds, the intensity of countermeasures we were envisioning (for example, achieving high robustness against misuse of AI models by state-level actors) would likely be difficult or impossible for Anthropic to accomplish unilaterally. We hoped that by the time we reached these higher capabilities, the world would clearly see the dangers, and that we’d be able to coordinate with governments worldwide in implementing safeguards that are difficult for one company to achieve alone.
Assessing our theory of change Two and a half years later, our honest assessment is that some parts of this theory of change have played out as we hoped, but others have not. The following are the areas in which the RSP has been successful: Our RSP did incentivize us to develop stronger safeguards. For example, in order to comply with our ASL-3 deployment standard (which is primarily about risks from chemical and biological weapons from threat actors with relatively modest resources and expertise), we developed increasingly sophisticated and accurate methods (specifically, input and output classifiers ) to block content of concern. More broadly, the overall implementation of the ASL-3 standard did prove feasible. We activated ASL-3 safeguards for relevant models in May 2025 and have been working to improve them ever since. Our RSP did encourage other AI companies to adopt somewhat similar standards: within a few months of announcing our RSP, both OpenAI and Google DeepMind adopted broadly similar frameworks. Some companies have also implemented bioweapon-related classifiers in a similar vein to our ASL-3 defenses. The principles behind these voluntary standards, including those in the RSP, have helped to inform the development of early AI policy. We’ve seen governments around the world (for example in California with SB 53 , in New York with the RAISE Act , and with the EU AI Act’s Codes of Practice ) start to require frontier AI developers to create and publish frameworks for assessing and managing catastrophic risks—requirements Anthropic addresses through public documentation including its Frontier Compliance Framework . Encouraging these kinds of rigorous transparency frameworks for the industry was exactly what our RSP had set out to do.
Nevertheless, other parts of our theory of change have not panned out as we’d hoped: The idea of using the RSP thresholds to…
Excerpt shown — open the source for the full document.
Notability
Scored, but no written rationale attached yet.
Anthropic has a writing signal matching infrastructure, safety and policy.