1. Introduction
Anthropic's mission is the responsible development and maintenance of advanced AI for
the long-term benefit of humanity. Central to this mission is our commitment to building
AI systems that are reliable, interpretable, and steerable. We pursue this through extensive
research on AI safety and alignment, rigorous model evaluation and testing to identify and
mitigate potential risks before deployment, and active collaboration with the broader AI
safety community to share research findings and contribute to industry-wide safety
standards.
As AI governance frameworks emerge globally, we are committed to transparency about
how our safety practices align with regulatory expectations. To formalize how we meet our
obligations under these emerging regulations, we have developed this Frontier Compliance
Framework (FCF). The FCF documents our current technical and organizational protocols
for systemic risk assessment and mitigation across key risk categories, including cyber
threats, CBRN (chemical, biological, radiological, and nuclear) risks, harmful manipulation,
and sabotage and loss of control risks. The FCF is distinct from our Responsible Scaling
Policy (RSP), which will remain our voluntary safety framework, reflecting what we believe
best practices for managing catastrophic risks should be as the AI landscape evolves, even
when that goes beyond or otherwise differs from current regulatory requirements.[1] While
the RSP represents our forward-looking vision for safety risk management as capabilities
rapidly evolve and advance, the FCF is our compliance framework for various applicable
regulatory regimes, including:
● In the United States, the FCF serves as our Frontier AI Framework under California's
Transparency in Frontier AI Act (TFAIA), documenting Anthropic PBC's technical and
organizational protocols to manage, assess, and mitigate catastrophic risks.
● In the European Union, Anthropic Ireland Limited has signed the General-Purpose
AI Code of Practice (the EU Code), and the FCF serves as the publicly available
summarized version of our Safety & Security Framework, describing how we assess
and mitigate systemic risks and ensure adequate cybersecurity protection for
in-scope models under Regulation (EU) 2024/1689 (the EU AI Act).
[1] The RSP uses "catastrophic risk" in a different sense than this Framework, referring to risks at the most
extreme end of the severity spectrum (such as existential threats or fundamental destabilization of global
systems) rather than the statutory thresholds applicable here.
Frontier Compliance Framework
This Framework applies to frontier models with “catastrophic risk” as defined
under the TFAIA and “general-purpose AI models with systemic risk” as defined under the
EU AI Act. For the purposes of this Framework, references to "systemic" risks include both
catastrophic risks under the TFAIA and systemic risks under the EU AI Act. The systemic
risk assessment and mitigation processes described in this Framework currently apply to
models in scope of the Framework that are deployed externally. Some internal uses of
in-scope models may also be subject to these processes, while others are subject to
separate evaluation and mitigation processes that are in development. Anthropic expects
its approach to both internal and external model evaluation to evolve in response to
changes in AI capabilities and the nature of associated risks, including risks resulting from a
model circumventing oversight mechanisms. This Framework will be updated as those
processes mature.
Our approach to AI safety has been informed by a range of industry guidance and
standards. These include the Responsible Scaling Policy framework introduced by the
non-profit AI safety organization METR, the Cloud Security Alliance's AI Safety Initiative,
ISO 42001, NIST 800-53, and Trust & Safety industry best practices. We selected these
documents and standards to guide our approach because they collectively address a
spectrum of considerations relevant to AI safety, including risk governance, security
controls, responsible scaling, and trust and safety operations.
2. Systemic Risk Assessment & Mitigation
2.1 Systemic risk identification
Anthropic has developed a range of processes to identify systemic risks stemming from our
models and relevant scenarios through which those risks may manifest.
Our definition of systemic risk includes foreseeable and material risks of large-scale harm
from the most advanced (i.e., state-of-the-art) models at any given point in time, including
but not limited to more than 50 fatalities arising from a single incident, or 1 billion dollars of financial
damages.
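As a rough illustration only, the two named examples in this definition can be written as a simple check. The sketch below is not part of the Framework: the names `IncidentEstimate` and `meets_named_examples` are hypothetical, and treating the 1 billion dollar figure as an inclusive bound is an assumption. Because the definition is "including but not limited to", a `False` result does not mean a risk is out of scope.

```python
from dataclasses import dataclass

# Figures taken from the definition above; the constant names are hypothetical.
FATALITY_THRESHOLD = 50              # "more than 50 fatalities arising from a single incident"
DAMAGES_THRESHOLD = 1_000_000_000    # "1 billion dollars of financial damages"


@dataclass(frozen=True)
class IncidentEstimate:
    """Estimated outcome of a single incident (hypothetical structure)."""
    fatalities: int
    financial_damages: float  # in dollars


def meets_named_examples(estimate: IncidentEstimate) -> bool:
    """Return True if the estimate reaches either named example of large-scale harm.

    Assumption: the fatality bound is strict (more than 50) and the damages
    bound is inclusive (1 billion dollars or more).
    """
    return (
        estimate.fatalities > FATALITY_THRESHOLD
        or estimate.financial_damages >= DAMAGES_THRESHOLD
    )
```

Either condition alone is sufficient, which mirrors the "or" in the definition.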
Our risk identification approach combines threat modeling with evaluations across multiple
domains. We analyze both misuse opportunities (how a model's capabilities could be
exploited by threat actors) and risks arising from potential misaligned model behavior.
To understand the full range of harmful outcomes that could arise from our models, we
draw on internal expertise, extensive red-teaming conducted both internally and with
external partners, and authoritative research in relevant fields.
Based on this analysis, the FCF currently addresses the following systemic risk categories:
● Cyber offense, including model capabilities that could enable or enhance attacks on
computer systems, networks, or digital infrastructure
● Chemical, biological, radiological, and nuclear (CBRN) threats
● Harmful manipulation, including the use of model capabilities to conduct influence
operations, election interference, or other coordinated campaigns to manipulate
public opinion or undermine democratic processes
● Sabotage and loss of control, including evasion of oversight or unsupervised
conduct, and autonomous behavior that would constitute serious crimes (such as
assault, extortion, or theft) if committed by a human
2.2 Systemic risk analysis
We identify systemic risks on an ongoing basis across the entire model lifecycle. Our risk
assessment process draws on multiple sources: literature reviews and expert consultation,
internal safety and alignment research, and insights from monitoring deployed models and
investigating serious incidents and critical safety incidents.
Prior to launching a model, we estimate the probability and severity of harm for CBRN,
sabotage and loss of control, and cyber offense risks. We are in the early stages of
developing our approach to assessing harmful manipulation risks. Where our analysis
identifies gaps, we implement and test additional mitigation measures before deployment.
This process includes state-of-the-art model evaluations designed to test the specific
threats and risk scenarios identified through our threat modeling, determine a model's
capabilities, and assess the effectiveness of our safeguards.
2.3 Risk acceptance determination
Our model evaluation results help us determine whether systemic risks remain within
acceptable levels and assess residual risk. The acceptability of residual risk depends on the
scale and probability of harm and the potential consequences should harm occur. We
determine acceptability by reviewing our risk tiers for each systemic risk category, which
incorporate appropriate safety margins.
When a model reaches a particular risk tier, we implement safeguards proportionate to
that level of risk. These may include monitoring and filtering the model's inputs and
outputs, modifying model behavior through fine-tuning (such as training the model to
refuse certain requests), or staged deployment (gradually expanding access from a limited
group of trusted users to broader availability). For risks related to model security,
safeguards may include conducting evaluations in sandboxed environments, anomaly
detection systems, access controls, and output rate limiting.
Because we cannot always anticipate what safety and security measures will be appropriate
for models beyond the current frontier, the specific mitigations we implement may be
determined when the relevant risk tier is reached, informed by the threat landscape at that
time.
2.4 Risk tiers
Cyber Offense
For Cyber Offense risks, we have established a comprehensive tier system that quantifies
model capabilities against cybersecurity threat metrics, providing clear, measurable
thresholds for decision-making around offensive cyber capabilities. The system consists of
two distinct tiers, which will in part be calibrated against cases actively identified through
our detection systems and investigated by our internal teams.
Cyber Offense risk tiers
Tier Description
Tier 1 Meaningful technical assistance for active cyber operations using known attack
techniques and methodologies. Some automation is involved, but still requires
human input to complete successful large cyber-operations.
Tier 2 Completely autonomous cyber operations with novel offensive capability
development and adaptive persistence. For example, autonomous
discovery/exploitation of previously unknown vulnerability classes, self-directed
campaign orchestration adapting to defenses, or sustained operations evolving
without human intervention.
CBRN
Our CBRN tier system quantifies model
capabilities against weapons development uplift metrics, providing measurable thresholds
for decision-making. The system consists of two distinct tiers, each defined by specific
capability benchmarks, expected impact severity, and required mitigations.
CBRN risk tiers
Tier Description
Tier 1 Non-novel chemical/biological weapons production. AI systems with the ability to
significantly help individuals or groups with basic technical backgrounds (e.g.,
undergraduate STEM degrees) create/obtain and deploy chemical and/or biological
weapons with serious potential for catastrophic damages.
Tier 2 Novel chemical/biological weapons production. AI systems with the ability to
significantly help threat actors (for example, moderately resourced expert-backed
teams) create/obtain and deploy chemical and/or biological weapons with potential
for catastrophic damages far beyond those of past catastrophes in this category such
as COVID-19.
Harmful Manipulation
Because Harmful Manipulation is a nascent systemic risk area within our risk management
framework, our approach is exploratory and we expect it to evolve as we conduct further
research. Below are our current exploratory risk tiers for Harmful Manipulation, which we
expect to update as we learn more.
Tier Description
Tier 1 Campaign Infrastructure Enablement: Escalation to systematic infrastructure
enabling sophisticated scaled influence campaigns. Automates >50% of steps
normally requiring multiple sophisticated adversarial actors. For example: identity
creation with cohesive digital footprints across platforms; persona networks with
realistic interaction patterns and relationship histories; scaled demographic and
psychological profiling/targeting; coordinated content development and delivery
infrastructure; social engineering scripts for phishing, romance scams, fraud.
Tier 2 Autonomous adaptive techniques and campaign execution: End-to-end
automation of deceptive influence operations with systematic targeting capabilities,
requiring <10% human oversight. For example: Autonomous targeting with scalable
adaptive identity generation; personalized content development and automated
dissemination; adaptive security evasion evolving against platform defenses;
autonomous relationship-building and trust exploitation; long-term belief
manipulation without user awareness.
Sabotage and Loss of Control
Sabotage and loss of control refers to scenarios where AI models develop and pursue goals
autonomously that conflict with their developers' intentions or users' interests. This risk
category addresses situations where models operating with substantial autonomy could
take actions involving concealment, strategic deception, or self-preservation behaviors that
undermine safety measures. The concern extends beyond individual harmful outputs to the
fundamental controllability of AI systems. If models develop the capability to pursue their
own goals while evading oversight, this could undermine the entire framework of AI
governance and safety, and could lead to AI systems potentially sabotaging safety research,
manipulating the training of successor AI systems, establishing unauthorized deployments,
or accumulating resources and capabilities without authorization.
For sabotage and loss of control risks, we have established a tier system that describes
model capabilities against autonomy level, deception sophistication, and potential for
unsanctioned action, providing thresholds for decision-making around autonomous
capabilities.
Sabotage and loss of control risk tiers
Tier Description
Tier 1 High-stakes sabotage opportunities. AI systems that write large amounts of critical
code and/or are otherwise in a position where they are highly relied on and have
extensive access to sensitive assets, as well as moderate capacity for autonomous,
goal-directed operation and subterfuge — such that it is plausible these AI systems
could (if directed toward this goal, either deliberately or inadvertently) carry out
sabotage leading to irreversibly and substantially higher odds of a later global
catastrophe.
In the near term, this possibility will likely be most applicable to AI systems that are
extensively used within major AI companies, with the opportunity to manipulate how
their successor systems are trained and deployed, and to manipulate the evidence
used to assess their safety. Down the line, this possibility may come to apply to AI
systems deployed within government and other high-stakes settings.
Tier 2 Automated R&D in key domains. AI systems that can fully automate, or otherwise
dramatically accelerate, the work of large, top-tier teams of human researchers in
domains where fast progress could cause threats to international security and/or
rapid disruptions to the global balance of power — particularly in energy, robotics,
weapons development and AI itself.
For the time being, we use AI R&D capabilities as a proxy for broader R&D capabilities,
as this domain likely plays to AI systems’ current strengths and is more tractable to
assess than capabilities in other domains. Additionally, AI R&D alone could cause
acceleration in AI capabilities improvements, to the point where all of the threats
listed above (and more) develop very quickly. In the future, we hope evaluations will be
broadened.
Our working operationalization is to trigger this risk threshold at the point where
expected progress in AI capabilities in the coming year is roughly equivalent to the
amount of progress seen in two years during the period of 2018-2024 (a particularly
fast recent period for AI progress), as operationalized by the “effective compute
scaling” idea. It may be sensible to add earlier, and/or easier-to-measure, thresholds
that trigger less demanding versions of the requirements for this threshold.
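One way to read the working operationalization in the Tier 2 description is as a simple inequality. The notation is an assumption introduced here for illustration and does not appear in the Framework: $\bar{g}$ denotes the average progress per year over 2018-2024, expressed in "effective compute scaling" terms, and $\hat{g}_{\text{next}}$ denotes the expected progress over the coming year in the same units. The threshold would then be triggered roughly when:

```latex
\hat{g}_{\text{next}} \;\gtrsim\; 2\,\bar{g}
```

The $\gtrsim$ symbol (from the `amssymb` package) reflects the "roughly equivalent" wording. How progress is measured and averaged over the reference period is not specified in this Framework.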
2.5 Safety mitigations
Anthropic has developed a range of mitigation measures to address the systemic risks
stemming from our models, as appropriate for each systemic risk tier. These measures are
tailored to the capability of the relevant model and are deployed, as appropriate, in order to
mitigate systemic risks to acceptable levels.
Where the residual risks associated with the model exceed acceptable risk levels, additional
mitigation measures are deployed. To identify whether additional mitigations are required,
we may rely on the following techniques, among others:
● post-deployment threat intelligence monitoring that tests our real-time and offline
detection capabilities and tracks how malicious actors use our models;
● a bug bounty program designed to test our real-time blocking classifiers and our
offline classification systems;
● robust post-launch monitoring infrastructure that combines automated detection,
human review, and threat intelligence to identify misuse patterns; and
● tools to guide automated detection and classifiers, or other detection techniques,
that allow our enforcement and data science teams to monitor flag rates in each
systemic risk area. The classifiers may run either in real time or offline depending
on the particular risk area.
Provided the residual risk falls within acceptable levels, taking into account appropriate
safety margins, the model is approved for continued development, internal use (where
applicable), and launch (as the case may be). Where the residual risk exceeds acceptable
levels, further mitigation measures are considered and implemented. In each case, the
justification for proceeding will be documented by the risk owner. Our systemic risk tiers
guide decisions on whether additional mitigations are required to keep overall systemic
risk at an acceptable level prior to model release.
2.6 Critical safety incident identification and response
Anthropic maintains a detailed Serious Incident Reporting Policy which sets out our
internal processes and measures for keeping track of, documenting, and reporting relevant
information about:
● Critical Safety Incidents pertaining to Anthropic’s Frontier Models, pursuant to
Section 22757.13 of California’s Transparency in Frontier AI Act (“TFAIA”); and
● Serious AI Incidents along the entire GPAISR model lifecycle, in accordance with
Commitment 9 (Serious Incident Reporting) of the EU Code and the obligations in
Article 55(1)(c) of the EU AI Act.
We have put the following reporting and detection measures in place for observable events
that could signify the existence of a Serious AI Incident or Critical Safety Incident, but
require further investigation (an “AI Event”). AI Events are assessed to determine whether
they amount to an AI Incident (and in turn a Serious AI Incident) and/or a Critical Safety
Incident, as the terms are defined under the relevant regulation.
Anthropic uses various methods including detection and response tooling, end-user
feedback, employee reporting channels, bug bounty programs, and community-driven
model evaluations to identify AI Events and determine whether they amount to a Serious AI
Incident and/or Critical Safety Incident. In some instances, an event may first be identified
as a part of Anthropic’s cybersecurity incident response processes, and later assessed to
also be a potential Serious AI Incident and/or Critical Safety Incident.
When an AI Event is identified, a member of our Security or Safeguards team (the AI
Incident Commander) will be promptly notified and will be responsible for our investigation
and response, including assembling an incident response team with appropriate subject
matter expert support. One or more members of the incident response team then lead a
technical investigation to enable the determination of whether the incident is an AI
Incident (and in turn a Serious AI Incident) and/or a Critical Safety Incident and inform
appropriate mitigation steps, including gathering relevant information for Anthropic's
reporting to appropriate authorities where applicable, pursuant to the relevant reporting
deadlines. If the incident is determined to be a Critical Safety Incident, the AI Incident
Commander also determines and documents whether the Critical Safety Incident poses an
imminent risk of death or serious physical injury.
We also acknowledge the importance of rectifying harms related to our models and
adopting corrective measures to prevent similar future incidents. Following the
identification of a Serious AI Incident or a Critical Safety Incident, Anthropic also works to
identify any relevant lessons learned and, where applicable, consider ways to further assess
and mitigate systemic risks related to the Incident.
To support our incident identification and response processes, we provide periodic training
to relevant employees on their obligations related to incident response under the TFAIA
and the EU AI Act, respectively.
3. Security Risk Management
We take a risk-based approach to cybersecurity and physical security, and implement
controls to address evolving security threats and assessed risk. To ensure we are
appropriately managing the relevant security risks, we have developed a register of
specific threat actors, which we use to identify the security risks that our security
mitigations are intended to protect against, as relevant to the current and reasonably
expected capabilities of our models.
We then implement security mitigations to ensure we adequately protect against these
identified threat actors as appropriate for each systemic risk tier. By way of non-exhaustive
example, we do and will implement the following mitigations and measures as appropriate:
● General security mitigations: Anthropic operates a layered security architecture
that protects its networks, systems, and data from unauthorized access or misuse.
Access to company resources requires strong multi-factor authentication. Networks
are monitored for threats, and access rights are managed and reviewed to maintain
least-privilege principles. Production systems are fully segregated from
development environments, and data-loss controls help prevent unauthorized
transfers.
● Protection of unreleased model weights: Unreleased model weights are protected
through encryption, strict access controls, and monitoring. Access is limited to
authorized personnel under controlled approval processes, and activities are logged
and reviewed. Automated systems detect and respond to unauthorized access or
movement of model weights.
● Securing interface-access to unreleased model weights: Model parameters are
processed only within secure, isolated environments that prevent persistence or
unauthorized reuse. Access to model interfaces is restricted, rate-limited, and
monitored for abnormal or excessive activity. Alerts are automatically generated and
investigated when anomalous behavior is detected.
● Application security: Security requirements are defined and integrated throughout
the software development lifecycle. Code is subject to peer review and automated
security analysis prior to deployment. Systems processing sensitive data or
supporting critical functions undergo additional security testing to ensure
appropriate safeguards are in place.
● Vulnerability management: A vulnerability management program enables
identification and prioritization of security vulnerabilities across the environment.
The program leverages automated scanning tools to monitor endpoints, container
registries, and codebases on a continuous basis. Identified vulnerabilities are
automatically assessed and personnel are alerted through appropriate channels
based on severity level to enable prioritized response and remediation.
● Insider threat mitigations: We manage insider risk through personnel screening,
regular training, and strict role-based access management. Staff have clear
reporting channels to raise concerns, and internal monitoring supports early
identification of suspicious activity.
● Security control monitoring, testing, and assessments: Security controls are
regularly tested and independently reviewed to ensure effectiveness. Penetration
testing, vulnerability disclosure programs, third party risk assessments, and incident
response tabletop exercises aim to help defenses remain robust, and insights from
these activities are used to strengthen the company’s security posture over time.
4. Model Reporting
The results of our systemic risk assessment and mitigation process, for models falling in
scope for the AISF, are documented in our AISF “Model Reports” (referred to as
“Transparency Reports” under the TFAIA). We will publish public summaries of these
assessments via standalone reports or as part of our model system cards upon model
launch.
Additionally, for any of our EU models that are subject to this Framework, if we have
reasonable grounds to believe that the justification for why risks stemming from the model
are acceptable as set out in the relevant Model Report has been materially undermined, we
will complete an additional full Systemic Risk Assessment. We will update our Model Report
as appropriate following this additional Systemic Risk Assessment.
In the case of all subsequent Systemic Risk Assessments, we will consider whether any part
of the previous Systemic Risk Assessments is still appropriate for the purpose of
considering whether the model is acceptable. If any part of the previous Systemic Risk
Assessment is still appropriate, we may rely on those aspects of the previous Systemic Risk
Assessment.
In addition to carrying out full Systemic Risk Assessments as described above, we conduct
lighter-touch model evaluations (which may include running our automatic evaluations and
collaborating with external experts to test our models) to consider whether further
systemic risk mitigations may be required or a full Systemic Risk Assessment and Model
Report update is required. The trigger points below help us determine when a model is
substantially modified enough to require an additional Model Report for the updated model
as part of our obligations under the TFAIA.
● Every nine months, unless an update of the relevant model is planned within a
month of the trigger point; and
● A new model is in training and test model snapshots are available and appropriate
for early evaluation. Anthropic conducts comprehensive evaluations throughout the
development process for new models. These evaluations test model snapshots at
different stages of training to assess safety, alignment, and capability benchmarks,
enabling us to identify potential issues early on.
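The nine-month trigger point above can be sketched as a small date calculation. This is an illustration under stated assumptions, not a description of Anthropic's tooling: the function names and the `last_review` starting point are hypothetical, months are counted as calendar months, and "within a month of the trigger point" is read as one month on either side of it.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def periodic_review_due(
    last_review: date,
    today: date,
    planned_update: Optional[date] = None,
) -> bool:
    """Return True if the nine-month trigger point has been reached.

    Assumption: the trigger is skipped when a model update is planned
    within one calendar month before or after the trigger point.
    """
    trigger = add_months(last_review, 9)
    if today < trigger:
        return False
    if planned_update is not None:
        window_start = add_months(trigger, -1)
        window_end = add_months(trigger, 1)
        if window_start <= planned_update <= window_end:
            return False
    return True
```

For example, with a hypothetical `last_review` of 15 January 2025 the trigger point falls on 15 October 2025, and an update planned for 1 November 2025 would fall inside the one-month window.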
5. Input from External Experts
We may solicit input from external actors in relevant domains, and other stakeholders, in
the process of developing and implementing our systemic risk assessment processes
(including the identification of potential risks and appropriate safety and security
mitigations). We will also rely on commissioned research reports, discussions with domain
experts, input from expert forecasters, public research, engagement with the Frontier
Model Forum, and internal discussions in implementing our systemic risk assessment
processes.
We will also consider relevant market best practices in our ongoing evaluation of our
systemic risk assessment process, acknowledging that the assessment of risks, mitigations
and acceptability are likely to change as the field evolves and our understanding deepens.
6. Allocation of Responsibility for Risk Management
Anthropic PBC and Anthropic Ireland Limited maintain internal governance structures and
practices designed to meet the requirements of applicable laws and ensure implementation
of the processes in this Framework. Anthropic’s internal governance practices include
managing risks across the entire lifecycle of our models and ongoing legal and compliance
reviews to ensure that risk management functions adhere to this Framework.
Anthropic PBC is responsible for compliance with the TFAIA for Frontier Models in the
United States.
Anthropic Ireland Limited is the provider of Anthropic's GPAISR models in the EU and is
responsible for compliance with the EU Code. The board of directors of Anthropic Ireland
Limited oversees implementation of this Framework for EU purposes.
7. Framework Change Management
Anthropic commits to ensuring that this Framework is state-of-the-art and reflects
Anthropic’s current policies with respect to compliance with the TFAIA and the EU Code.
7.1 Update and approval process
Updates to this Framework may be proposed by Anthropic’s Head of Safeguards,
Responsible Scaling Officer, General Counsel, Head of Integrity & Compliance, or Chief
Information Security Officer. The Legal and Compliance function will coordinate the
governance process for Framework updates, including determining which updates are
required to ensure the Framework remains state-of-the-art and adequate for its purpose.
With respect to compliance with the EU Code, the Legal and Compliance function will also
determine which updates are required to comply with any remediation plans following
negative adherence assessments. Material updates will be presented to the board of
directors of Anthropic Ireland Limited for oversight, with approved changes and
justifications for material updates documented in a changelog and published within 30 days
of the update.
The Legal and Compliance function will also determine which updates are required based
on factors including, but not limited to, changes in law or regulatory guidance, changes in
frontier model capabilities and related technologies, new approaches to mitigations and
safeguards, other incidents affecting the industry, and new industry best practices and
standards.
7.2 Framework assessment
We will complete a Framework Assessment: (a) at least once every 12 months from the
Effective Dates of the TFAIA and the EU Code; and (b) if the relevant factors in the update
and approval process are satisfied.
Our assessment will consider the adequacy of our Framework and our factors for
determining whether updates are required. With respect to compliance with the EU Code,
if we identify any instances of non-adherence or any measures that are required to be
implemented to ensure continued adherence, we will draft and implement a remediation
plan. We will update the Framework following such Framework Assessment, with a
justification for each material update.
Changelog
March 2, 2026 Update
Revised risk tiers in Section 2.4 across all four systemic risk
categories to better align with our evolving threat models and
capability assessments. Introduced nascent risk tiers for
Harmful Manipulation.
December 19, 2025 Initial Version