Claude Trust & safety

T3 governs & tests your Claude Code configuration, for every firm, every sector.

Live threat · 2026

With Mythos, Anthropic demonstrated that AI can autonomously identify and exploit zero-day vulnerabilities across complex systems. Yours included. AI is no longer static - it continuously creates new risks and discovers new failure modes faster than organisations can respond.

Trust & Controls for Agentic AI

Human accountability, engineered.

Regulators, courts and boards will not accept “the supervising AI approved it” as a defence. T3 designs, documents and defends the human accountability architecture that makes agentic AI legally deployable - on any model, in any regulated context.

04 · Layer assessment
05 · Control domains
20+ · Peer-reviewed sources
Model-agnostic · Any frontier LLM
§ 01  /  Risk Surface
What agentic AI does, and where it fails

An agentic system decomposes into four layers - each with its own failure modes, each with its own owner.

“The AI got it wrong” is almost never a useful diagnosis. The questions that matter are which layer, and who owns the control. T3 starts every engagement by mapping the system to the four layers below.

Fig. 02 - A single unaudited decision is enough to cause a breach.

4 AREAS WHERE AI SYSTEMS FAIL

04 layers  ·  02 views per layer (primary failure modes, typical audit findings)
L · 01

The Model

The underlying LLM: what it knows, how it reasons, how it behaves under adversarial pressure.

Primary failure modes

Hallucination under uncertainty without a calibrated signal. Capability gains shifting the attack surface. Silent version changes breaking last quarter’s evaluation.

Typical audit findings
  • No calibrated confidence signal surfaced to the user
  • Adversarial robustness untested at the post-deployment model version
  • Training-data drift and bias reappearing in regulated outputs
L · 02

The Harness

The instructions, policies and guardrails wrapped around the model - how business rules are encoded.

Primary failure modes

Prompt injection from untrusted content overriding intent. Regulatory language translated imprecisely into system prompts. Policy drift without evidence of approval.

Typical audit findings
  • “Be fair” encoded as a control; no machine-checkable objective
  • Heuristic confidence thresholds (“ask if < 80%”) with no statistical basis
  • Indirect prompt injection via emails, documents and web pages
L · 03

The Tools

APIs, data sources and systems the agent can read from or act on - the blast radius of any decision.

Primary failure modes

Over-broad permissions. Tool misuse with no circuit-breaker. Sub-agent delegation chains without identity, authentication or audit trail.

Typical audit findings
  • Write access granted where read-only was sufficient
  • Reward hacking: the agent games a tool rather than solving the task
  • Sub-agent handoffs with no authentication between them
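As an illustration of the tool-layer findings above, the sketch below shows one way a read-only default and a circuit-breaker could be combined. It is a minimal sketch, not a T3 artefact: the names `ToolGate`, `Access`, the tool names and the `max_denials` threshold are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class ToolGate:
    """Authorises tool calls against explicit grants; trips after repeated denials."""

    grants: dict[str, Access]      # tool name -> widest access granted (hypothetical names)
    max_denials: int = 3           # hypothetical circuit-breaker threshold
    denials: int = 0
    tripped: bool = False

    def authorise(self, tool: str, wanted: Access) -> bool:
        if self.tripped:
            return False           # breaker open: nothing runs until a human resets it
        granted = self.grants.get(tool)
        # Unknown tools are denied; a WRITE grant also covers READ.
        allowed = granted is not None and (
            wanted is Access.READ or granted is Access.WRITE
        )
        if not allowed:
            self.denials += 1
            self.tripped = self.denials >= self.max_denials
        return allowed


# Read-only is granted where read-only is sufficient; write access is explicit.
gate = ToolGate(grants={"crm": Access.READ, "case_notes": Access.WRITE})
```

In this sketch a denied call is counted rather than silently dropped, so repeated attempts to exceed a grant stop the agent and leave a signal for review.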
L · 04

The Environment

Where the agent runs, which users and data it can reach, and the controls around it.

Primary failure modes

Regulated data reachable by the agent context. Logging too shallow to reconstruct a decision for a regulator. Human reviewers miscalibrated, under-trained, or rubber-stamping.

Typical audit findings
  • PII or MNPI accessible to the agent context by default
  • No named accountable owner for the deployed system
  • “Human in the loop” present in diagram, absent in practice
§ 02  /  The T3 Framework
Five Control Domains

A methodology for designing and assuring Human-on-the-Loop (HOTL) oversight.

Trust & Controls is T3’s methodology for conditional human involvement across agentic AI deployments - escalation, override, approval, fallback and post-hoc auditability - regardless of the underlying model.

D · 01

Human Oversight Architecture

Escalation points, override rights, approval gates and fallback paths - every checkpoint mapped to a named, accountable role under EU AI Act Art. 14, FCA SMCR and sector-specific rules.

Agentic specifics
  • Irreversibility classification for every action
  • Plan-level approval for multi-step tasks
  • Checkpoints calibrated to blast radius, not volume
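The agentic specifics above could be encoded along the following lines. This is a hedged sketch only: the three reversibility classes, the checkpoint names, the `blast_radius` measure and the `radius_limit` of 100 are illustrative assumptions, not part of the T3 methodology as published here.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"      # can be undone automatically
    RECOVERABLE = "recoverable"    # can be undone with manual effort
    IRREVERSIBLE = "irreversible"  # cannot be undone


class Checkpoint(Enum):
    AUTO = "auto"
    POST_HOC_REVIEW = "post_hoc_review"
    NAMED_APPROVAL = "named_approval"


@dataclass(frozen=True)
class Action:
    name: str
    reversibility: Reversibility
    blast_radius: int  # hypothetical measure, e.g. number of clients affected


def required_checkpoint(action: Action, radius_limit: int = 100) -> Checkpoint:
    """Pick the checkpoint from impact, not from how often the action occurs."""
    if (action.reversibility is Reversibility.IRREVERSIBLE
            or action.blast_radius > radius_limit):
        return Checkpoint.NAMED_APPROVAL
    if action.reversibility is Reversibility.RECOVERABLE:
        return Checkpoint.POST_HOC_REVIEW
    return Checkpoint.AUTO
```

For example, under these assumptions a reversible draft affecting one client runs automatically, while a reversible bulk action affecting 5,000 clients still requires a named approver.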
D · 02

Objective Specification

Translate shifting regulatory intent - fair lending, appropriate advice, compliant disclosure - into machine-checkable objectives. Monitor for drift as case law and rules evolve.

Agentic specifics
  • Timestamped knowledge validation
  • Goal fidelity across sub-task decomposition
  • Reward hacking and specification gaming monitors
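To make "machine-checkable objective" concrete, the sketch below turns one possible reading of a fairness requirement into a test that either passes or fails. It is an assumption-laden example: it uses an impact ratio (each group's favourable-outcome rate divided by the highest group's rate) with a 0.8 floor borrowed from the commonly cited four-fifths rule of thumb. The right metric and threshold depend on the regulatory context, and the function and group names are hypothetical.

```python
def impact_ratios(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each group to its favourable-outcome rate relative to the best-served group.

    `outcomes` maps group -> (favourable decisions, total decisions).
    """
    rates = {}
    for group, (favourable, total) in outcomes.items():
        if total <= 0:
            raise ValueError(f"group {group!r} has no decisions to measure")
        rates[group] = favourable / total
    best = max(rates.values())
    if best == 0:
        return {group: 1.0 for group in rates}  # no group received a favourable outcome
    return {group: rate / best for group, rate in rates.items()}


def meets_objective(outcomes: dict[str, tuple[int, int]], min_ratio: float = 0.8) -> bool:
    """True when every group's impact ratio is at or above the (assumed) floor."""
    return all(ratio >= min_ratio for ratio in impact_ratios(outcomes).values())
```

A check of this kind can run on every release and on live decisions, which is what allows drift against the objective to be monitored rather than asserted.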
D · 03

Adversarial Resilience

Trajectory-based red-teaming, prompt-injection defence, tool-misuse testing and permission-boundary probes. Validated attack methods as a continuous control.

Agentic specifics
  • Multi-turn attacks exploiting agent memory
  • Tool-chain attacks pivoting across tools
  • Indirect prompt injection via read content
D · 04

Agent Evaluation

Delegation-chain audits, sub-agent coordination review, reward-hacking tests and calibrated deferral. Does the system know when to stop, ask, or escalate? Statistically grounded thresholds, not heuristics.

Agentic specifics
  • Multi-agent trust hierarchies
  • Conformal prediction bounds for deferral
  • Sub-agent identity and least-privilege
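One way to ground a deferral threshold statistically is split conformal prediction, sketched below for a classifier-style decision. This is a minimal illustration under stated assumptions: the nonconformity score is taken to be 1 minus the probability the model assigned to the correct label on a held-out calibration set, the function names are hypothetical, and the coverage guarantee (the prediction set contains the true label with probability at least 1 - alpha) holds only if calibration and live cases are exchangeable.

```python
import math


def conformal_threshold(cal_scores: list[float], alpha: float) -> float:
    """Return the conformal quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank with finite-sample correction
    if rank > n:
        return float("inf")  # too few calibration points: every label is kept
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs: dict[str, float], qhat: float) -> set[str]:
    """Keep every label whose nonconformity score (1 - probability) is within qhat."""
    return {label for label, p in probs.items() if 1 - p <= qhat}


def should_defer(probs: dict[str, float], qhat: float) -> bool:
    """Defer to a human unless exactly one label survives."""
    return len(prediction_set(probs, qhat)) != 1


# Seven hypothetical calibration scores and a 25% miscoverage budget.
qhat = conformal_threshold([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], alpha=0.25)
```

Unlike a fixed "ask if < 80%" rule, the threshold here is derived from observed calibration errors, so the deferral rate can be tied to a stated error budget.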
D · 05

Evidence & Auditability

Tamper-resistant logs, timestamped knowledge-update trails, reviewer calibration analytics and control-effectiveness measurement - the record that satisfies regulators, auditors and courts.

Agentic specifics
  • Full trajectory logging with reasoning trace
  • Agent/reviewer disagreement analytics
  • Decision trails mapped to named humans
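A common building block for tamper-evident decision trails is a hash chain, in which each log entry commits to the one before it. The sketch below is illustrative only: the record fields and function names are hypothetical, and a production design would also need to anchor the latest hash somewhere the agent and its operators cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], record: dict) -> None:
    """Add a record that commits to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited, removed or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


trail: list[dict] = []
append(trail, {"step": "plan", "agent": "primary", "approver": None})
append(trail, {"step": "settle_claim", "agent": "primary", "approver": "named.adviser"})
```

Changing any earlier record, for instance swapping the approver after the fact, causes `verify` to return False for the whole trail.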
Σ · Outcome

A defensible assurance record.

Each domain produces signed artefacts and measurable evidence. Together they form the oversight architecture you can put in front of the board, the auditor, and the regulator.

Deliverables include
  • Oversight architecture document
  • Control-effectiveness dashboard
  • Regulator-ready evidence pack
§ 03  /  Engagement Process
Five phases, one continuous loop

A T3 engagement runs through five sequential phases. Each has defined activities and named, signed-off outputs.

The evidence produced in each phase is the evidence that satisfies the auditor, the regulator, and the board. Phase 05 feeds continuously back into Phase 02 as the system and attacker capabilities evolve.

Phase 01 · Understand · Requirements, scope, risk appetite.
Phase 02 · Map · Inventory the four layers.
Phase 03 · Design · HOTL architecture and controls.
Phase 04 · Test & Measure · Trajectory-based adversarial testing.
Phase 05 · Manage & Assure · Continuous assurance, regulator-ready.

Continuous assurance loop · Phase 05 → Phase 02
Phase · 01

Understand Requirements

Regulatory scope, use case, risk appetite and accountability lines.

Activities
  • Stakeholder interviews: legal, risk, product, compliance
  • Use-case scoping and boundary definition
  • Applicable-regulation mapping
  • Risk-appetite articulation with accountable exec
Outputs
  • Scoping & context memo
  • Regulatory obligations register
  • RACI for the deployed system
Phase · 02

Map the System

Inventory model, harness, tools and environment.

Activities
  • Architecture review of the four layers
  • Data flow and permission-boundary mapping
  • Delegation and sub-agent chain inventory
  • Existing-controls gap analysis
Outputs
  • Four-layer system map
  • Risk & control matrix
  • Gap-to-target assessment
Phase · 03

Design the Controls

HOTL architecture, escalation triggers, objective specification.

Activities
  • Escalation, override, approval and fallback design
  • Statistical (conformal) thresholds for deferral
  • Objective specification and drift-monitoring plan
  • Reviewer calibration and workload sizing
Outputs
  • Oversight architecture document
  • Signed control specifications
  • Reviewer playbooks
Phase · 04

Test & Measure

Trajectory-based adversarial testing and control-effectiveness measurement.

Activities
  • Multi-turn adversarial scenarios
  • Permission-boundary and delegation probes
  • Deferral-reliability and escalation testing
  • Meta-audit: was a validated attack suite used?
Outputs
  • Adversarial test report
  • Control-effectiveness metrics
  • Remediation plan
Phase · 05

Manage & Assure

Continuous monitoring and regulator-ready evidence, embedded into business as usual (BAU).

Activities
  • Live monitoring of drift and escalation rates
  • Periodic re-testing on refreshed attacker capabilities
  • Incident, near-miss and override analytics
  • Regulator-ready evidence pack maintenance
Outputs
  • Continuous-assurance dashboard
  • Periodic attestation report
  • Board & regulator briefing pack
§ 04  /  Approach
Methodical, evidence-based, standards-anchored

Anchored to NIST AI RMF 1.0 - cross-walked to the EU AI Act, FCA and ISO/IEC 42001.

T3’s approach is anchored to the NIST AI Risk Management Framework, the most widely adopted voluntary standard for trustworthy AI, cross-walked to binding obligations under the EU AI Act, UK FCA rules and US sector regulation - supplemented by the 2025-2026 peer-reviewed research that sets the current standard of care.

Fig. 03 - Binding regulation, cross-walked to voluntary standards.
- Aligned to NIST AI RMF 1.0 Core Functions -
Cross-cutting function

Govern

Runs across every phase

Named-role RACI, risk-appetite statement, board reporting cadence, policy and standards set - the function that makes the other three defensible.

Function 01

Map

Four-layer system map, regulatory obligations register, risk and control matrix, go/no-go input.

Phases 01 & 02 - Understand & Map
Function 02

Measure

Trajectory-based adversarial tests, control-effectiveness metrics, meta-audit of the testing itself.

Phases 03 & 04 - Design & Test
Function 03

Manage

Continuous monitoring, override and appeal mechanisms, change management, decommissioning.

Phase 05 - Manage & Assure
P · 01

Standards-anchored

Every control traces to NIST AI RMF, ISO/IEC 42001, EU AI Act, GDPR, FCA, US state AI laws (Colorado SB24-205, NYC Local Law 144, California AB 2013 / SB 942) or a peer-reviewed method. No bespoke frameworks invented for the engagement.

P · 02

Continuous, not point-in-time

Attacker capability scales with model capability; objectives shift with case law. Controls are built for ongoing assurance, not a one-off audit.

P · 03

Evidence over opinion

Statistical thresholds replace heuristic confidence. Conformal prediction, validated benchmarks and tested attack suites produce numbers that stand up.

P · 04

Independent by design

T3 is not a vendor and does not resell models. Our assurance is separable from the supply chain - the independent layer boards increasingly require.

NIST AI RMF 1.0 is a voluntary framework. T3 uses it as the structural backbone for alignment with binding obligations under the EU AI Act, UK FCA rules, US sector regulation, and ISO/IEC 42001.

§ 05  /  Case Studies
Trust & Controls in practice

Illustrative engagements drawn from regulated-industry advisory work.

Names and specifics are generalised; the framework, controls and evidence are representative of what a T3 engagement delivers. Named client references available under NDA.

CS · 01  ·  Financial Services

Agentic advisor in a wealth-management workflow

The challenge

A wealth manager was deploying an LLM-based agent to prepare client suitability memos: reading CRM records, calling a risk-scoring API, then drafting each memo and routing it to an adviser. Regulator concern: FCA Consumer Duty and SMCR sign-off, evidenced.

What T3 did
  • Four-layer map; irreversibility classification of every agent action
  • HOTL checkpoints calibrated to client impact, with statistical deferral thresholds
  • Timestamped knowledge validation for FCA updates
  • Multi-turn adversarial tests covering prompt injection from client content
Outcome

Deployable control design with every irreversible action requiring named adviser approval. Regulator-ready evidence pack mapping each control to EU AI Act Art. 14 and Consumer Duty obligations.

CS · 02  ·  Insurance

Multi-agent claims triage with delegated sub-agents

The challenge

A general insurer was piloting an agentic claims workflow, with a primary agent routing work to specialist sub-agents for image analysis, policy interpretation and fraud-signal review. Regulator concern: auditability and consumer fairness under EU AI Act Annex III.

What T3 did
  • Multi-agent trust hierarchy with explicit delegation boundaries and authentication
  • Full trajectory logging: every plan revision and sub-agent call reconstructable
  • Reward-hacking tests and disagreement analytics between agents and reviewers
  • Irreversibility classification: no auto-settlement without named sign-off
Outcome

Defensible oversight architecture for a multi-agent deployment, with disagreement analytics surfacing edge cases for continuous reviewer calibration. Evidence pack cross-walked to NIST AI RMF Measure and Manage.

§ 06  /  Evidence Base
Peer-reviewed research behind the framework

2025-2026 AI research, grouped by the control domain each paper most directly informs.

This register is a research scan, not legal or regulatory advice. Before relying on any methodology for client work, T3 validates it against the specific regulatory context and performs client-specific testing.

Make your AI legally deployable

Not just technically impressive - defensible.

It’s not just compliance. It’s protecting the business, satisfying the customer, and having a defensible answer when the regulator asks who signed off. T3 designs the controls, documents the evidence, and stands behind the assurance.

Speak directly with a senior T3 advisor.
No sales funnel · 30-min slots · Calendly
T3 Consultants Ltd
London · San Francisco
© 2024-2026 T3 Consultants Ltd · Registered in England & Wales 13034838

Make your Claude Code deployment genuinely governed

Request the evaluation pack. Sample attestation, before/after demo, CI/CD reference config, under NDA, no commitment required. Every organisation using Claude Code should see what governed looks like.
