Unit 03

How we could actually do this

This is the engineering unit. It is organised by mechanism family rather than by author or year, so you can move from the policy motivation to the practical question: what would actually let a third party verify claims about AI development?

Hardware-rooted approaches

Hardware-rooted approaches try to build tamper evidence and governance hooks into chips, packaging, memory, or the cluster surrounding them. They are physically grounded, but their deployment cycles are slow and their threat models have to be unusually explicit.

READ 3.1

Secure, Governable Chips

Aarne, Fist, Withers (CNAS, 2024)

A policy-forward case for embedding governance mechanisms in chips. Read for the design constraints this would place on chip vendors and deployers.

Available via cnas.org.

READ 3.2

Hardware-Enabled Governance Mechanisms

Kulp et al. (RAND, 2024)

A concrete catalogue of possible on-chip governance mechanisms and deployment paths. Useful for understanding vendor incentives and engineering trade-offs.

Available via rand.org.

READ 3.3

Flexible Hardware-Enabled Guarantees (flexHEG)

Petrie, Hodgkins et al. (2025)

A detailed proposal for a tamper-resistant compute-governance device. Read for the architecture and for the threat-model assumptions it depends on.

→ arXiv:2506.15093

READ 3.4

Guaranteeable Memory: An HBM-Based Chiplet for Verifiable AI Workloads

Petrie (2025)

A chiplet-level proposal for making memory participate in verifiable AI workloads. Read as a lower-level counterpart to flexHEG-style system designs.

Available via the AI Security Forum.

READ 3.5

Guardain: Protecting Emerging Generative AI Workloads on Heterogeneous NPU

Dhar et al. (IEEE S&P, 2025)

A hardware security paper focused on protecting generative workloads on NPUs. Read for the accelerator-engineering view of where protection can sit.

Available via IEEE Xplore.

Inference verification

Inference verification asks whether a deployed model is the model that was safety-tested, without revealing weights or disrupting production workloads. The approaches range from practical statistical checks to heavier cryptographic proofs.
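To make the statistical end of that range concrete, here is a minimal sketch of a token-level spot check. It is not the method of any paper listed below. It assumes a verifier who holds a trusted copy of the model and can re-score a sample of the tokens an API claims to have generated. `reference_logits` is a toy stand-in for that trusted model, and `spot_check`, `min_prob`, and `max_flag_rate` are hypothetical names and values chosen for illustration. The allowed flag rate stands in for tolerance to benign nondeterminism.

```python
import math
import random

VOCAB = 8  # toy vocabulary size (assumption for illustration)

def reference_logits(context, model_id=0):
    """Toy stand-in for a trusted model: deterministic logits per context."""
    rng = random.Random(f"{model_id}:{list(context)}")
    return [rng.gauss(0.0, 2.0) for _ in range(VOCAB)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_generate(prompt, n_tokens, model_id=0):
    """Generate tokens by always taking the most likely next token."""
    tokens = list(prompt)
    for _ in range(n_tokens):
        logits = reference_logits(tokens, model_id)
        tokens.append(max(range(VOCAB), key=lambda t: logits[t]))
    return tokens[len(prompt):]

def spot_check(prompt, claimed, min_prob=0.05, max_flag_rate=0.1):
    """Return True if the claimed tokens are plausible under the trusted model.

    A token is flagged when the trusted model assigns it less than min_prob.
    The response passes if the share of flagged tokens stays at or below
    max_flag_rate.
    """
    context = list(prompt)
    flagged = 0
    for tok in claimed:
        probs = softmax(reference_logits(context))
        if probs[tok] < min_prob:
            flagged += 1
        context.append(tok)
    return flagged / max(len(claimed), 1) <= max_flag_rate

prompt = [1, 2, 3]
honest = greedy_generate(prompt, 20)             # output of the trusted model
swapped = greedy_generate(prompt, 20, model_id=1)  # output of a different model
print(spot_check(prompt, honest))   # True: greedy tokens always clear the floor
print(spot_check(prompt, swapped))
```

The check never needs the provider to reveal weights, only outputs, but it does assume the verifier can run the reference model, which is itself a strong assumption in practice.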

READ 3.6

Verifying LLM Inference to Detect Model Weight Exfiltration

Rinberg, Karvonen, Hoover, Reuter, Warr (2025)

The practical statistical framing of inference verification. Read for FSSL, Token-DiFR, and the limits of detecting model-weight exfiltration from outputs.

→ arXiv:2511.02620

READ 3.7

DiFR: Inference Verification Despite Nondeterminism

Karvonen et al. (2025)

The companion paper on verification despite nondeterminism. Read alongside Rinberg et al. for the current practical picture of API inference checks.

→ arXiv:2511.20621

READ 3.8

zkLLM: Zero Knowledge Proofs for Large Language Models

Sun et al. (CCS '24)

The zero-knowledge route for LLM inference. Read for what ZK proofs can guarantee, and the overheads that still make the approach difficult.

Available via the ACM Digital Library.

READ 3.9

Verifiable evaluations of machine learning models using zkSNARKs

South et al. (2024)

A lighter-weight use of zkSNARKs for verifiable model evaluation. Useful for thinking about audits where the model or data cannot be revealed.

→ arXiv:2402.02675

READ 3.10

ZKML: An Optimizing System for ML Inference in Zero-Knowledge Proofs

Chen et al. (EuroSys '24)

The systems side of ZK inference: compilers, runtime optimisations, and the engineering work needed to make proofs less expensive.

Available via the ACM Digital Library.

READ 3.11

Trustless Audits without Revealing Data or Models

Waiwitlikhit et al. (ICML, 2024)

A privacy-preserving audit scheme that lets claims about a model and dataset be checked without revealing either. Read for the threat model.

Available via the ICML proceedings.

Telemetry & detection

Telemetry approaches infer what a system is doing from network traffic, timing, memory access, power, or other observable signals. They matter when verification must work at cluster scale or without chip-vendor cooperation.
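As a toy illustration of inferring workload type from an observable signal, the sketch below scores how periodic a traffic trace is using autocorrelation. It assumes, for illustration only, that synchronous distributed training produces regular communication bursts while other workloads look less regular. The function names, the synthetic traces, and the 0.5 threshold are hypothetical and are not drawn from any reading in this section.

```python
import random

def autocorrelation(series, lag):
    """Normalised autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a flat trace carries no periodic structure
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

def periodicity_score(series, max_lag):
    """Highest autocorrelation over lags 2..max_lag (lag 1 is skipped
    so that slow drift alone does not count as periodicity)."""
    return max(autocorrelation(series, lag) for lag in range(2, max_lag + 1))

def looks_periodic(series, max_lag=20, threshold=0.5):
    return periodicity_score(series, max_lag) >= threshold

# Synthetic trace: a burst of traffic every fifth sample.
bursty = [10 if i % 5 == 0 else 1 for i in range(100)]

# Synthetic trace: irregular traffic with no built-in period.
rng = random.Random(0)
irregular = [rng.randint(1, 10) for _ in range(100)]

print(periodicity_score(bursty, 20))
print(periodicity_score(irregular, 20))
```

A real classifier would need far richer features and an adversary model, since an operator who knows the detector could pad or reshape traffic to change a score like this one.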

READ 3.12

The Fundamentals and Feasibility of Secure Network Taps for Verifying AI Datacenter Use

Cankaya et al. (2026)

A network-layer treatment of workload classification in AI datacentres. Read for what secure taps can and cannot reveal at scale.

Forthcoming.

READ 3.13

Timing and Memory Telemetry on GPUs for AI Governance

Monfared et al. (2026)

The chip-level analogue to network telemetry. Read for how timing and memory signals might support governance claims about GPU workloads.

Forthcoming.

READ 3.14

What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training

Shavit (2023)

A training-compute verification proposal. Read for the mechanics of detecting whether rules on large-scale training are being followed.

→ arXiv:2303.11341
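For a sense of the scale involved, a widely used rule of thumb estimates dense-transformer training compute as roughly 6 FLOP per parameter per training token. The sketch below applies it to Chinchilla-scale figures (about 70 billion parameters and 1.4 trillion tokens, as publicly reported). The function names, the cluster figures, and the example thresholds are hypothetical and are not taken from the paper.

```python
def training_flop(params, tokens):
    """Rule-of-thumb training compute: about 6 FLOP per parameter per token."""
    return 6.0 * params * tokens

def exceeds_threshold(params, tokens, threshold_flop):
    """Would this run cross a (hypothetical) compute threshold?"""
    return training_flop(params, tokens) >= threshold_flop

def cluster_days(total_flop, n_chips, peak_flop_per_s, utilization):
    """Days a cluster would need, given per-chip peak rate and utilization."""
    effective_rate = n_chips * peak_flop_per_s * utilization
    return total_flop / effective_rate / 86_400

chinchilla = training_flop(70e9, 1.4e12)      # about 5.9e23 FLOP
print(f"{chinchilla:.2e} FLOP")
print(exceeds_threshold(70e9, 1.4e12, 1e26))  # False for this example threshold

# Hypothetical cluster: 1,000 chips at 1e15 FLOP/s peak, 40% utilization.
print(round(cluster_days(chinchilla, 1_000, 1e15, 0.4), 1), "days")
```

The arithmetic shows why chip counts and run duration are natural things to verify: together they bound how much training a cluster could have done.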

Attestation & audit

Attestation and audit approaches create institutional or technical routes for a third party to trust specific claims: what ran, where it ran, who certified it, and which intermediary can be held responsible.
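A minimal sketch of the technical half of that idea: an attester binds a set of claims (what ran, where, and under whose certification) to a keyed tag, and a verifier who trusts the key checks the tag. HMAC with a shared secret stands in here for the hardware-backed asymmetric signatures that real attestation schemes rely on. The field names, the key, and the functions are all hypothetical.

```python
import hashlib
import hmac
import json

def canonical(claims):
    """Serialise claims deterministically so both sides hash the same bytes."""
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8")

def attest(claims, key):
    """Attester side: bind the claims to a keyed tag."""
    tag = hmac.new(key, canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}

def verify(record, key):
    """Verifier side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, canonical(record["claims"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])

key = b"demo-key-not-for-real-use"  # assumption: shared out of band
claims = {
    "workload_hash": hashlib.sha256(b"eval-suite-v1").hexdigest(),  # what ran
    "location": "cluster-a",                                        # where it ran
    "certifier": "example-auditor",                                 # who certified it
}
record = attest(claims, key)
print(verify(record, key))  # True

# Changing any claim after the fact invalidates the tag.
tampered = {"claims": dict(claims, location="cluster-b"), "tag": record["tag"]}
print(verify(tampered, key))  # False
```

The tag only proves that someone holding the key vouched for the claims. Whether the key holder deserves trust is the institutional question the readings below take up.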

READ 3.15

Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments

Schabl et al. (2025)

A TEE-based approach to verifiable AI safety benchmarks. Read for the promise and fragility of trusting secure execution environments.

→ arXiv:2506.23706

READ 3.16

International Governance of Civilian AI: A Jurisdictional Certification Approach

Trager (2023)

An alternative trust model based on jurisdictional certification. Useful when hardware alone cannot carry the verification burden.

→ arXiv:2308.15514

READ 3.17

Governing Through the Cloud: The Intermediary Role of Compute Providers in AI Regulation

Heim et al. (Oxford Martin, 2024)

The cloud-provider chokepoint argument. Read for why compute providers may be central institutional actors in any realistic verification regime.

Available via Oxford Martin.

READ 3.18

Verification Methods for International AI Agreements

Wasil et al. (2024)

A comparative survey across hardware, cryptographic, inspection, and institutional mechanisms. Read to decide which mechanism fits which agreement.

→ arXiv:2408.16074

Cross-cutting components

Some work contributes a component rather than a whole verification regime. These pieces are useful for project ideas that combine hardware, software, cryptography, and audit design.

READ 3.19

Tools for verifying neural models' training data

Choi et al. (NeurIPS '23)

A training-data verification problem: how can a verifier check what data a model was trained on? Read as a counterpoint to inference-focused mechanisms.

Available via the NeurIPS proceedings.

READ 3.20

Software-Based Memory Erasure with relaxed isolation requirements

Bursuc et al. (2024)

A memory-erasure protocol relevant to audits that require proof that sensitive data, weights, or checkpoints have actually been destroyed.

→ arXiv:2401.06626

READ 3.21

Off-Chip Compute Verification

Baker (2026)

The cluster-level version of compute verification. Useful for regimes that cannot rely on modified chips or vendor cooperation.

Forthcoming.

READ 3.22

Verification Mechanisms

Peigne et al. (2026)

A broad survey of verification mechanisms. Read alongside the other taxonomies to compare how researchers are carving up the space.

Forthcoming.

READ 3.23

AI Security Forum — Requests for Discussion

AI Security Forum (2024-2025)

Short problem-shaped requests for discussion. Treat these as project seeds: each one points at a concrete verification gap.

Available via aisecurityforum.org.