An open standard proposal for quantifying Agentic AI risk. Adapted from ISO 26262 to move beyond "Safety" into Business Integrity.
GenAI chatbots produce probabilistic outcomes, and companies manage that uncertainty through direct supervision. But once an AI agent acts on its own, oversight disappears and organizations are suddenly exposed to unforeseen financial, legal, and reputational risks. Therefore, we propose a Systematic Agent Risk Assessment (SARA) to quantify those risks before deployment.
Step 1: Review your AI application against the following three variables:
Severity (S): What's the worst-case impact of an agent failure?
Certainty (C): How grounded is the agent's output?
Autonomy (A): What checks and balances affect the agent?
Step 2: Use your choices to identify the required Agentic Business Integrity Level (ABIL)
| Severity Level (S) | Certainty Level (C) | Autonomy A0: Human Loop | Autonomy A1: Slow Loop | Autonomy A2: Autonomous |
|---|---|---|---|---|
| Minor | Deterministic | QM | QM | QM |
| Minor | Bounded | QM | QM | QM |
| Minor | Unbounded | QM | QM | ABIL A |
| Major | Deterministic | ABIL A | ABIL B | ABIL B |
| Major | Bounded | ABIL A | ABIL B | ABIL C |
| Major | Unbounded | ABIL A | ABIL B | ABIL C |
| Critical | Deterministic | ABIL B | ABIL C | ABIL D |
| Critical | Bounded | ABIL C | ABIL C | ABIL D |
| Critical | Unbounded | ABIL C | ABIL C | ABIL D |
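The matrix lookup in Step 2 can be sketched as a small Python table. This is only an illustration, not part of the proposal: the enum names and the `required_abil` function are hypothetical, and the numeric codes (S1 to S3, C1 to C3) assume that the levels are numbered in the order the matrix lists them, which is consistent with the worked examples on this page.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed numbering: S1 = Minor, S2 = Major, S3 = Critical.
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


class Certainty(IntEnum):
    # Assumed numbering: C1 = Deterministic, C2 = Bounded, C3 = Unbounded.
    DETERMINISTIC = 1
    BOUNDED = 2
    UNBOUNDED = 3


class Autonomy(IntEnum):
    HUMAN_LOOP = 0   # A0
    SLOW_LOOP = 1    # A1
    AUTONOMOUS = 2   # A2


# Rows copied from the matrix; each tuple holds the (A0, A1, A2) cells.
ABIL_MATRIX = {
    (Severity.MINOR, Certainty.DETERMINISTIC): ("QM", "QM", "QM"),
    (Severity.MINOR, Certainty.BOUNDED): ("QM", "QM", "QM"),
    (Severity.MINOR, Certainty.UNBOUNDED): ("QM", "QM", "ABIL A"),
    (Severity.MAJOR, Certainty.DETERMINISTIC): ("ABIL A", "ABIL B", "ABIL B"),
    (Severity.MAJOR, Certainty.BOUNDED): ("ABIL A", "ABIL B", "ABIL C"),
    (Severity.MAJOR, Certainty.UNBOUNDED): ("ABIL A", "ABIL B", "ABIL C"),
    (Severity.CRITICAL, Certainty.DETERMINISTIC): ("ABIL B", "ABIL C", "ABIL D"),
    (Severity.CRITICAL, Certainty.BOUNDED): ("ABIL C", "ABIL C", "ABIL D"),
    (Severity.CRITICAL, Certainty.UNBOUNDED): ("ABIL C", "ABIL C", "ABIL D"),
}


def required_abil(severity: Severity, certainty: Certainty, autonomy: Autonomy) -> str:
    """Return the matrix cell for one (S, C, A) combination."""
    return ABIL_MATRIX[(severity, certainty)][autonomy]


if __name__ == "__main__":
    # A Major / Bounded / Autonomous application, looked up in the table.
    print(required_abil(Severity.MAJOR, Certainty.BOUNDED, Autonomy.AUTONOMOUS))
```

Keeping the matrix as plain data rather than nested conditionals makes it easy to compare the code against the table cell by cell.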
A marketing agent generating draft social media posts for human review (S2 Severity, C3 Certainty, A0 Autonomy) results in ABIL-A requirements. One possible mitigation is constitutional prompting (guiding principles) to help keep the content appropriate before the human clicks "approve."
A procurement bot drafting purchase orders with a 2-hour delay before execution (S2 Severity, C2 Certainty, A1 Autonomy) results in ABIL-B requirements. One possible mitigation is dual verification, where a secondary "Critic" model reviews each order for hallucinations during the holding period.
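The holding-period pattern from the procurement example could look roughly like the sketch below. Everything named here is an assumption made for illustration: `PendingOrder`, `review`, `may_execute`, and the `critic` callable (a stand-in for whatever secondary model or rule set performs the check). An order is released only once the 2-hour hold has elapsed and the critic has explicitly approved it.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HOLD_SECONDS = 2 * 60 * 60  # the 2-hour delay from the example


@dataclass
class PendingOrder:
    order_id: str
    payload: dict
    created_at: float                      # timestamp in seconds
    critic_approved: Optional[bool] = None  # None means "not reviewed yet"


def review(order: PendingOrder, critic: Callable[[dict], bool]) -> None:
    """Record the critic's verdict; the critic is a hypothetical second checker."""
    order.critic_approved = bool(critic(order.payload))


def may_execute(order: PendingOrder, now: float) -> bool:
    """Allow execution only after the hold has elapsed AND the critic approved."""
    hold_elapsed = (now - order.created_at) >= HOLD_SECONDS
    return hold_elapsed and order.critic_approved is True
```

Treating a missing review as a refusal (rather than as implicit approval) means that a critic outage delays orders instead of silently releasing them.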
A customer service bot empowered to instantly apply credits to a live billing system (S2 Severity, C2 Certainty, A2 Autonomy) results in ABIL-C requirements. One possible mitigation is to use hard-coded Python logic gates (e.g., if credit > creditLimit: abort) rather than relying on the model's own judgment.
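A minimal sketch of such a logic gate follows. The names `CREDIT_LIMIT`, `CreditRejected`, and `apply_credit_gate` are hypothetical, as is the limit value; the point is only that the check is deterministic code that runs outside the model, so the agent's proposed amount is never trusted as-is.

```python
from decimal import Decimal

# Hypothetical policy ceiling; a real system would load this from configuration.
CREDIT_LIMIT = Decimal("50.00")


class CreditRejected(Exception):
    """Raised when a proposed credit fails the hard-coded gate."""


def apply_credit_gate(requested: Decimal, credit_limit: Decimal = CREDIT_LIMIT) -> Decimal:
    """Return the amount unchanged if it passes the gate, otherwise abort."""
    # Reject NaN/Infinity and non-positive amounts before comparing to the limit.
    if not requested.is_finite() or requested <= 0:
        raise CreditRejected(f"invalid credit amount: {requested}")
    if requested > credit_limit:
        raise CreditRejected(f"credit {requested} exceeds limit {credit_limit}")
    return requested
```

The billing call would then be made only with the value returned by `apply_credit_gate`, so a rejected amount can never reach the live system.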
An autonomous agent managing load balancing for a regional power grid (S3 Severity, C2 Certainty, A2 Autonomy) results in ABIL-D requirements. Possible mitigations include operating the agent in a fully isolated sandbox where every command is mathematically verified against process constraints before it touches critical infrastructure.
Had the following chatbot failures been vetted against OpenSARA, the risks would have been flagged before deployment. The risk of damages will only get worse as companies adopt Agentic AI.