T3 governs & tests your Claude Code configuration, for every firm, every sector.
With Mythos, Anthropic demonstrated that AI can autonomously identify and exploit zero-day vulnerabilities across complex systems. Yours included. AI is no longer static - it continuously creates new risks and discovers new failure modes faster than organisations can respond.
Human accountability, engineered.
Regulators, courts and boards will not accept “the supervising AI approved it” as a defence. T3 designs, documents and defends the human accountability architecture that makes agentic AI legally deployable - on any model, in any regulated context.
An agentic system decomposes into four layers - each with its own failure modes, each with its own owner.
“The AI got it wrong” is almost never a useful diagnosis. The questions that matter are which layer, and who owns the control. T3 starts every engagement by mapping the system to the four layers below.
4 AREAS WHERE AI SYSTEMS FAIL
The Model
The underlying LLM: what it knows, how it reasons, how it behaves under adversarial pressure.
Hallucination under uncertainty without a calibrated signal. Capability gains shifting the attack surface. Silent version changes breaking last quarter’s evaluation.
- No calibrated confidence signal surfaced to the user
- Adversarial robustness untested at the post-deployment model version
- Training-data drift and bias reappearing in regulated outputs
The Harness
The instructions, policies and guardrails wrapped around the model - how business rules are encoded.
Prompt injection from untrusted content overriding intent. Regulatory language translated imprecisely into system prompts. Policy drift without evidence of approval.
- “Be fair” encoded as a control; no machine-checkable objective
- Heuristic confidence thresholds (“ask if < 80%”) with no statistical basis
- Indirect prompt injection via emails, documents and web pages
The Tools
APIs, data sources and systems the agent can read from or act on - the blast radius of any decision.
Over-broad permissions. Tool misuse with no circuit-breaker. Sub-agent delegation chains without identity, authentication or audit trail.
- Write access granted where read-only was sufficient
- Reward hacking: the agent games a tool rather than solving the task
- Sub-agent handoffs with no authentication between them
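As a rough illustration of the tool-layer controls above, the sketch below shows a deny-by-default permission gate with a simple circuit-breaker. It is a minimal example under stated assumptions, not T3's implementation: the `ToolGate` class, the grant table and the failure limit of 3 are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


class ToolAccessDenied(Exception):
    """Raised when a call is outside the granted scope or the breaker is open."""


@dataclass
class ToolGate:
    # Hypothetical grant table: tool name -> allowed modes ("read", "write").
    grants: dict[str, set[str]]
    max_failures: int = 3  # illustrative circuit-breaker limit
    failures: dict[str, int] = field(default_factory=dict)

    def check(self, tool: str, mode: str) -> None:
        """Deny by default: a call passes only if the mode was explicitly granted."""
        if self.failures.get(tool, 0) >= self.max_failures:
            raise ToolAccessDenied(f"circuit open for {tool}")
        if mode not in self.grants.get(tool, set()):
            raise ToolAccessDenied(f"{mode} not granted on {tool}")

    def record_failure(self, tool: str) -> None:
        """Count a failed or anomalous call; the breaker opens at max_failures."""
        self.failures[tool] = self.failures.get(tool, 0) + 1


# Read-only grant on a hypothetical "crm" tool: reads pass, writes are refused.
gate = ToolGate(grants={"crm": {"read"}})
gate.check("crm", "read")
```

Granting only `read` here means a write attempt raises `ToolAccessDenied` rather than succeeding silently, which is the behaviour the "write access granted where read-only was sufficient" finding is meant to prevent.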
The Environment
Where the agent runs, which users and data it can reach, and the controls around it.
Regulated data reachable by the agent context. Logging too shallow to reconstruct a decision for a regulator. Human reviewers miscalibrated, under-trained, or rubber-stamping.
- PII or MNPI accessible to the agent context by default
- No named accountable owner for the deployed system
- “Human in the loop” present in diagram, absent in practice
A methodology for designing and assuring Human-on-the-Loop (HOTL) oversight.
Trust & Controls is T3’s methodology for conditional human involvement across agentic AI deployments - escalation, override, approval, fallback and post-hoc auditability - regardless of the underlying model.
Human Oversight Architecture
Escalation points, override rights, approval gates and fallback paths - every checkpoint mapped to a named, accountable role under EU AI Act Art. 14, FCA SMCR and sector-specific rules.
- Irreversibility classification for every action
- Plan-level approval for multi-step tasks
- Checkpoints calibrated to blast radius, not volume
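One way to picture irreversibility classification and impact-calibrated checkpoints is the small sketch below. It is illustrative only: the three reversibility classes, the `blast_radius` measure, the limit of 100 and the checkpoint labels are assumptions made for the example, not a prescribed taxonomy.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = 1     # can be undone automatically, e.g. a draft
    COMPENSABLE = 2    # can be corrected after the fact, at some cost
    IRREVERSIBLE = 3   # cannot be undone, e.g. a payment or a client send


@dataclass(frozen=True)
class Action:
    name: str
    reversibility: Reversibility
    blast_radius: int  # hypothetical measure: clients or records affected


def required_checkpoint(action: Action, radius_limit: int = 100) -> str:
    """Map an action to a checkpoint by impact, not by how often it occurs."""
    if action.reversibility is Reversibility.IRREVERSIBLE:
        return "named_approval"
    if (action.reversibility is Reversibility.COMPENSABLE
            or action.blast_radius > radius_limit):
        return "escalate"
    return "auto"


draft = Action("draft_memo", Reversibility.REVERSIBLE, 1)
settle = Action("settle_claim", Reversibility.IRREVERSIBLE, 1)
print(required_checkpoint(draft), required_checkpoint(settle))
```

In this sketch a single irreversible action always requires named approval, while a high-volume but reversible action runs unattended unless its blast radius exceeds the limit.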
Objective Specification
Translate shifting regulatory intent - fair lending, appropriate advice, compliant disclosure - into machine-checkable objectives. Monitor for drift as case law and rules evolve.
- Timestamped knowledge validation
- Goal fidelity across sub-task decomposition
- Reward hacking and specification gaming monitors
Adversarial Resilience
Trajectory-based red-teaming, prompt-injection defence, tool-misuse testing and permission-boundary probes. Validated attack methods as a continuous control.
- Multi-turn attacks exploiting agent memory
- Tool-chain attacks pivoting across tools
- Indirect prompt injection via read content
Agent Evaluation
Delegation-chain audits, sub-agent coordination review, reward-hacking tests and calibrated deferral. Does the system know when to stop, ask, or escalate? Statistically grounded thresholds, not heuristics.
- Multi-agent trust hierarchies
- Conformal prediction bounds for deferral
- Sub-agent identity and least-privilege
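To make "statistically grounded thresholds" concrete, here is a minimal sketch of split conformal prediction used for deferral, in the spirit of the KnowNo paper listed in the research register. It assumes the agent exposes a probability per candidate option and that a held-out calibration set of nonconformity scores (1 minus the probability assigned to the correct option) is available; the function names and the example numbers are hypothetical.

```python
import math


def conformal_quantile(cal_scores: list[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return float("inf")  # too few calibration points: keep every option
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs: dict[str, float], qhat: float) -> set[str]:
    """Keep every option whose nonconformity score (1 - p) is within qhat."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def should_defer(probs: dict[str, float], qhat: float) -> bool:
    """Defer to a human unless exactly one option survives."""
    return len(prediction_set(probs, qhat)) != 1


# Illustrative calibration scores: 1 - p(correct option) on held-out cases.
cal = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.90]
qhat = conformal_quantile(cal, alpha=0.2)

print(should_defer({"approve": 0.9, "refer": 0.1}, qhat))    # one option survives
print(should_defer({"approve": 0.55, "refer": 0.45}, qhat))  # two options survive
```

Under the usual exchangeability assumption, the prediction set contains the correct option with probability at least 1 - alpha, so the deferral rule rests on a coverage guarantee rather than a hand-picked cut-off such as "ask if < 80%".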
Evidence & Auditability
Tamper-resistant logs, timestamped knowledge-update trails, reviewer calibration analytics and control-effectiveness measurement - the record that satisfies regulators, auditors and courts.
- Full trajectory logging with reasoning trace
- Agent/reviewer disagreement analytics
- Decision trails mapped to named humans
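A common building block for tamper-resistant logging is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below is a minimal, assumption-laden illustration using only the Python standard library; the entry layout and the `append_entry` and `verify_chain` helpers are hypothetical, and a production system would also need signing, secure storage and an externally anchored head hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash and a canonical JSON form of the event."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; an edited, reordered or mid-chain-deleted entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, {"actor": "agent", "action": "draft_memo"})
append_entry(log, {"actor": "adviser", "action": "approve"})
print(verify_chain(log))
```

Note that a chain like this is tamper-evident rather than tamper-proof: it detects edits to recorded entries, but detecting truncation of the most recent entries requires storing the latest hash somewhere the log writer cannot alter.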
A defensible assurance record.
Each domain produces signed artefacts and measurable evidence. Together they form the oversight architecture you can put in front of the board, the auditor, and the regulator.
- Oversight architecture document
- Control-effectiveness dashboard
- Regulator-ready evidence pack
A T3 engagement runs through five sequential phases. Each has defined activities and named, signed-off outputs.
The evidence produced in each phase is the evidence that satisfies the auditor, the regulator, and the board. Phase 05 feeds continuously back into Phase 02 as the system and the attacker capability evolve.
Understand
Requirements, scope, risk appetite.
Map
Inventory the four layers.
Design
HOTL architecture and controls.
Test & Measure
Trajectory-based adversarial testing.
Manage & Assure
Continuous assurance, regulator-ready.
Understand Requirements
Regulatory scope, use case, risk appetite and accountability lines.
Activities
- Stakeholder interviews: legal, risk, product, compliance
- Use-case scoping and boundary definition
- Applicable-regulation mapping
- Risk-appetite articulation with accountable exec
Outputs
- Scoping & context memo
- Regulatory obligations register
- RACI for the deployed system
Map the System
Inventory model, harness, tools and environment.
Activities
- Architecture review of the four layers
- Data flow and permission-boundary mapping
- Delegation and sub-agent chain inventory
- Existing-controls gap analysis
Outputs
- Four-layer system map
- Risk & control matrix
- Gap-to-target assessment
Design the Controls
HOTL architecture, escalation triggers, objective specification.
Activities
- Escalation, override, approval and fallback design
- Statistical (conformal) thresholds for deferral
- Objective specification and drift-monitoring plan
- Reviewer calibration and workload sizing
Outputs
- Oversight architecture document
- Signed control specifications
- Reviewer playbooks
Test & Measure
Trajectory-based adversarial testing and control-effectiveness measurement.
Activities
- Multi-turn adversarial scenarios
- Permission-boundary and delegation probes
- Deferral-reliability and escalation testing
- Meta-audit: was a validated attack suite used?
Outputs
- Adversarial test report
- Control-effectiveness metrics
- Remediation plan
Manage & Assure
Continuous monitoring and regulator-ready evidence into BAU.
Activities
- Live monitoring of drift and escalation rates
- Periodic re-testing on refreshed attacker capabilities
- Incident, near-miss and override analytics
- Regulator-ready evidence pack maintenance
Outputs
- Continuous-assurance dashboard
- Periodic attestation report
- Board & regulator briefing pack
Anchored to NIST AI RMF 1.0 - cross-walked to the EU AI Act, FCA and ISO/IEC 42001.
T3’s approach is anchored to the NIST AI Risk Management Framework, the most widely adopted voluntary standard for trustworthy AI. It is cross-walked to binding obligations under the EU AI Act, UK FCA rules and US sector regulation, and supplemented by the 2025-2026 peer-reviewed research that sets the current standard of care.
Govern
Runs across every phase. Named-role RACI, risk-appetite statement, board reporting cadence, policy and standards set - the function that makes the other three defensible.
Map
Four-layer system map, regulatory obligations register, risk and control matrix, go/no-go input.
Measure
Trajectory-based adversarial tests, control-effectiveness metrics, meta-audit of the testing itself.
Manage
Continuous monitoring, override and appeal mechanisms, change management, decommissioning.
Standards-anchored
Every control traces to NIST AI RMF, ISO/IEC 42001, EU AI Act, GDPR, FCA, US state AI laws (Colorado SB24-205, NYC Local Law 144, California AB 2013 / SB 942) or a peer-reviewed method. No bespoke frameworks invented for the engagement.
Continuous, not point-in-time
Attacker capability scales with model capability; objectives shift with case law. Controls are built for ongoing assurance, not a one-off audit.
Evidence over opinion
Statistical thresholds replace heuristic confidence. Conformal prediction, validated benchmarks and tested attack suites produce numbers that stand up.
Independent by design
T3 is not a vendor and does not resell models. Our assurance is separable from the supply chain - the independent layer boards increasingly require.
NIST AI RMF 1.0 is a voluntary framework. T3 uses it as the structural backbone for alignment with binding obligations under the EU AI Act, UK FCA rules, US sector regulation, and ISO/IEC 42001.
Illustrative engagements drawn from regulated-industry advisory work.
Names and specifics are generalised; the framework, controls and evidence are representative of what a T3 engagement delivers. Named client references available under NDA.
Agentic advisor in a wealth-management workflow

A wealth manager deploying an LLM-based agent to prepare client suitability memos - reading CRM records, calling a risk-scoring API, drafting and routing to adviser. Regulator concern: FCA Consumer Duty and SMCR sign-off, evidenced.
- Four-layer map; irreversibility classification of every agent action
- HOTL checkpoints calibrated to client impact, with statistical deferral thresholds
- Timestamped knowledge validation for FCA updates
- Multi-turn adversarial tests covering prompt injection from client content
Deployable control design with every irreversible action requiring named adviser approval. Regulator-ready evidence pack mapping each control to EU AI Act Art. 14 and Consumer Duty obligations.
Multi-agent claims triage with delegated sub-agents

A general insurer piloting an agentic claims workflow - primary agent routing work to specialist sub-agents for image analysis, policy interpretation and fraud signal review. Regulator concern: auditability and consumer fairness under EU AI Act Annex III.
- Multi-agent trust hierarchy with explicit delegation boundaries and authentication
- Full trajectory logging: every plan revision and sub-agent call reconstructable
- Reward-hacking tests and disagreement analytics between agents and reviewers
- Irreversibility classification: no auto-settlement without named sign-off
Defensible oversight architecture for a multi-agent deployment, with disagreement analytics surfacing edge cases for continuous reviewer calibration. Evidence pack cross-walked to NIST AI RMF Measure and Manage.
2025–2026 AI Research - grouped by the control domain each paper most directly informs.
A. Human oversight: from HITL to HOTL
Human-in-the-Loop AI: A Systematic Review
Lazaros, Vrahatis et al. 3D taxonomy (loop placement, granularity, temporal) used by T3 as reference for classifying client HITL architectures during audits.
Humans in the Loop, Lives on the Line
Balamurugan et al. HITL design principles across healthcare, finance and fraud; maps to FCA Consumer Duty, SR 11-7 and EU AI Act high-risk categories.
HITL Testing for Air Traffic Control (NATS)
Pepper, Thomas et al. How regulator rubrics for human operators translate into AI evaluation criteria; transferable logic from CAA to FCA.
Effective HITL Assistive AI Agents
Bellos, Li et al. Empirical framework for measuring HITL collaboration effectiveness; adaptable to T3 client KPIs.
HITL Software Development Agents (HULA)
Takerngsaksiri et al. Production two-stage HITL pattern; analogy for FCA/SMCR-style accountability with named sign-off at each checkpoint.
ARIA: Self-Improving Agents at Test Time
He et al. Timestamped, auditable knowledge-update loop; addresses EU AI Act Art. 12 (records) and Art. 14 (oversight).
B. Objective specification & uncertainty
KnowNo: Robots That Ask For Help
Ren et al. (DeepMind / Princeton). Conformal prediction gives statistical guarantees on when an agent should escalate.
Real-World HITL Deep RL
Arabneydi et al. Design framework for HITL-DRL in production; reference for trading, dynamic pricing, process optimisation.
HIAT: HITL RL with Auxiliary Task
Niu, Luo, Cui et al. Sample-efficiency for human feedback in RL; reduces reviewer workload in regulated HITL deployments.
C. Adversarial testing, evolution of practice
Red Teaming Roadmap to System-Level Safety
Single-turn prompt attacks are necessary but insufficient; trajectory-based is the emerging standard-of-care.
Automatic LLM Red Teaming
Belaire, Sinha, Varakantham. One AI strategically attacks another across multi-turn conversations; reference for continuous automated red-teaming.
Scaling Trends for LLM Red-Teaming
600+ attacker-target combinations. Attacker success scales approximately linearly with capability; evidence for continuous, not one-off, red-teaming.
Evaluating the Evaluators
Cinà, Pintor et al. AttackBench: verify that red-teaming uses validated attacks. Foundational for T3’s meta-audit / third-line assurance offering.
Adversarial ML Harder to Solve & Evaluate
Distinguishes security-assurance engagements from academic-robustness work; framing document for T3 methodology training.
Adversarial Robustness in Financial ML
Reproducible pipeline with ~10.6% AUC drop from plausibility-bounded perturbations; SHAP-stability degrades before AUC (early-warning signal).
D. Regulatory, governance & direction
Adversarial Robustness in Multimodal LLMs
Vision-language, audio-language and cross-modal attack surfaces; reference for KYC, document review and ID verification deployments.
AI in Securities Markets (IMF TN 2025/016)
Robo-advisory, asset management, trade execution and systemic-herding risk; authoritative source for capital-markets client briefings.
Infrastructure for AI Agents
Agents need an ecosystem layer (identity, certification, authentication); strategic framing for T3’s agent-assurance service line.
This register is a research scan, not legal or regulatory advice. Before relying on any methodology for client work, T3 validates it against the specific regulatory context and performs client-specific testing.
Not just technically impressive - defensible.
It’s not just compliance. It’s protecting the business, satisfying the customer, and having a defensible answer when the regulator asks who signed off. T3 designs the controls, documents the evidence, and stands behind the assurance.
Make your Claude Code deployment genuinely governed
Request the evaluation pack. Sample attestation, before/after demo, CI/CD reference config, under NDA, no commitment required. Every organisation using Claude Code should see what governed looks like.
Book a free AI Adoption Consultation
STOP INVENTING
START IMPROVING