How Tight Is Tight Enough? The Numbers Governance Has to Set | Geometry of Trust | Governance - Lesson 3
This is the third post in the Geometry of Trust governance series. It covers the quantitative layer that sits on top of the structural one, and it comes with an important admission about the numbers in it.
A note before the numbers
Every number in this post is illustrative. Not prescriptive.
The values you’re about to see — 0.02, 0.03, 0.05, 0.10, 0.25 — are placeholders chosen to show the shape of a tiered framework. They are not recommendations for what critical infrastructure, healthcare, or finance should actually use. The real values have to come from domain regulators working with operators, auditors, and standards bodies, informed by actual deployment data.
Getting the shape right is an argument a framework can make. Getting the numbers right is a job for people who know the domain and have watched the measurements behave in practice. Treat the framework as the contribution. Treat the numbers as placeholders.
With that out of the way:
The key variable
Structural governance decides whether agents can talk. The previous posts covered that: safety doesn’t travel, one agent has one primary domain, and cross-domain interactions run three structural checks before any cryptography.
Quantitative governance decides how strictly the evidence is evaluated once the structural checks pass. That’s what this post is about. The key variable is T — the governance threshold.
T is not a number the maths produces. It’s a number governance sets. The maths produces readings — drift magnitudes, confidence scores, causal consistency ratios. Governance decides what counts as acceptable given the domain’s tolerance for error.
Different domains get different T. That’s the whole point.
Thresholds by domain — illustrative tiering
Different domains tolerate different amounts of drift and demand different depths of evidence. The tiering below is the kind of picture you’d expect a domain regulator to arrive at after thinking about what failure looks like in their world.
| Domain | Max drift | Causal validation | Rationale |
|---|---|---|---|
| Critical infrastructure | 0.02 | Required | Public safety, static geometry |
| Healthcare | 0.03 | Required | Patient safety, narrow tolerance |
| Finance | 0.05 | Required | Regulatory compliance |
| Commercial supply chain | 0.10 | Not required | Business priorities shift often |
| Research / experimental | 0.25 | Not required | Exploration needs room to move |
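As a sketch, the tiering behaves like a lookup from domain to policy. Everything below is illustrative: the `Policy` struct, the `policy_for` function, and the domain prefixes are assumptions invented for this example, not part of any real API, and the numbers are the same placeholders as in the table.

```rust
// Illustrative per-domain policy lookup. Domain prefixes, type names,
// and threshold values are placeholders, not a real schema.

#[derive(Debug, PartialEq)]
struct Policy {
    max_drift: f64,        // upper bound on measured drift magnitude
    causal_required: bool, // must a causal certificate accompany the attestation?
}

/// Map a domain's top-level prefix to its illustrative policy tier.
fn policy_for(domain: &str) -> Policy {
    match domain.split('.').next().unwrap_or(domain) {
        "infrastructure" => Policy { max_drift: 0.02, causal_required: true },
        "healthcare"     => Policy { max_drift: 0.03, causal_required: true },
        "finance"        => Policy { max_drift: 0.05, causal_required: true },
        "supply-chain"   => Policy { max_drift: 0.10, causal_required: false },
        _                => Policy { max_drift: 0.25, causal_required: false }, // research tier
    }
}

fn main() {
    let p = policy_for("healthcare.diagnostic-advisory");
    println!("healthcare tier: {:?}", p);
}
```

The lookup keys off the domain prefix, so `healthcare.diagnostic-advisory` and `healthcare.drug-interaction` land in the same tier, which is the intended behaviour: the tier belongs to the domain, not the agent.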
A few things to notice about the shape of this tiering, even with the specific numbers held at arm’s length.
Tighter drift and mandatory causal validation come together. The domains with the smallest tolerance for drift are the same domains that can’t accept correlational evidence as proof that values are still where they should be. They need the stronger guarantee.
“Required” is a per-interaction property, not a platform property. A critical-infrastructure agent demanding causal validation doesn’t mean the maths is always running — it means the regulator’s verifier won’t accept an attestation without a causal certificate attached. The cost of causal probes gets paid at attestation time, when the agent is certifying itself to a strict peer, not on every inference.
Numbers get looser by an order of magnitude across the tiers. Critical infrastructure at 0.02 vs research at 0.25 is a 12.5× difference. That’s not an arbitrary spread: it reflects that the cost of a false-positive alarm in research (blocking a legitimate experiment) is much lower than the cost of a false negative in critical infrastructure (letting a drifted model keep operating).
The dual-domain problem: self-driving tractor
Some agents genuinely operate in two domains at once. A self-driving tractor drives on farmland for most of its working life and on public roads for the rest. It can’t split into two logical agents because the hardware, sensors, and decision-making are shared. And it can’t claim two primary domains — Part 2 ruled that out.
The answer is to invent a domain that captures the dual-purpose nature directly:
- vehicle
  - vehicle.autonomous-truck (pure transport)
  - vehicle.agricultural-tractor (dual: farming + road use)
  - vehicle.construction-excavator (dual: site + road use)
The tractor’s primary domain is vehicle.agricultural-tractor. Its value geometry is trained on the dual-purpose objective — crop outcomes and collision avoidance both, under one coherent structure. A governance body, or coordination between agricultural and transport regulators, decides what “tractor safety” means.
Whose thresholds apply? The tractor has one primary domain and one attestation, but different peers interact with it under different rules:
| Peer | Required drift | Causal required? |
|---|---|---|
| Farm management agent | 0.05 | No (chain required) |
| Road-infrastructure agent | 0.02 | Yes |
The tractor doesn’t pick its own threshold. It gets held to whichever peer’s threshold applies to the current interaction. On farmland with farm peers, the farm threshold applies — looser but still binding. On public roads with transport peers, the transport threshold applies — tighter and with causal validation required.
In practice the tractor has to stay within the strictest envelope any of its expected peers will hold it to. If its current drift is 0.04, it passes the farm interaction (0.05 tolerance) but fails the road interaction (0.02 tolerance). The road-infrastructure peer rejects the exchange. The tractor doesn’t stop operating, but it can’t participate in the road-coordination network until its geometry is re-measured and brought back inside the transport envelope.
The peer decides which rules apply, not the tractor. That’s the whole point of per-peer governance thresholds.
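The per-peer rule can be sketched directly: the peer’s policy, not the tractor’s, gates the exchange. The type and function names (`PeerPolicy`, `Attestation`, `peer_accepts`) are invented for illustration, and the drift values are the placeholders from the table above.

```rust
// Sketch of per-peer admission: the *peer's* policy decides whether an
// attested drift reading is acceptable. All names are illustrative.

struct PeerPolicy {
    max_drift: f64,
    causal_required: bool,
}

struct Attestation {
    drift: f64,
    causal_certificate: bool, // true if a causal-validation certificate is attached
}

/// The peer evaluates the subject's attestation under its own rules.
fn peer_accepts(peer: &PeerPolicy, att: &Attestation) -> bool {
    att.drift <= peer.max_drift && (!peer.causal_required || att.causal_certificate)
}

fn main() {
    // Tractor drifted to 0.04: inside the farm envelope, outside the road one.
    let tractor = Attestation { drift: 0.04, causal_certificate: true };
    let farm = PeerPolicy { max_drift: 0.05, causal_required: false };
    let road = PeerPolicy { max_drift: 0.02, causal_required: true };
    println!("farm peer accepts: {}", peer_accepts(&farm, &tractor));
    println!("road peer accepts: {}", peer_accepts(&road, &tractor));
}
```

Note that the tractor carries one attestation; only the policy it is checked against changes from peer to peer.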
Same-domain pair: diagnostic + drug checker
Thresholds don’t only apply across domains. Inside a single regulated domain, peers may still hold each other to the full domain thresholds.
| Property | Diagnostic agent | Drug-checker agent |
|---|---|---|
| Primary domain | healthcare.diagnostic-advisory | healthcare.drug-interaction |
| Mode toward peer | Advisory (sends hypotheses) | Read-only (receives, cannot advise back) |
| Max drift | 0.03 | 0.03 |
| Causal validation | Required | Required |
| Outcome if fails | Exchange refused | Exchange refused |
Two observations.
Same-domain doesn’t mean same-role. Both agents sit in healthcare, but one informs the other rather than negotiating as equals. The diagnostic agent generates hypotheses; the drug checker evaluates specific interactions given those hypotheses. The asymmetric mode — advisory on one side, read-only on the other — captures that. Part 3’s mode framework lets this shape be expressed without either agent overreaching.
Both must pass, not just one. Because the interaction is being held to healthcare-grade thresholds, both agents’ attestations have to clear both the drift bound and the causal validation requirement. If the drug checker’s geometry has drifted past 0.03 — even though its mode is only read-only — the interaction is refused. Read-only constrains what the agent can say, not how rigorously its values are checked.
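A minimal sketch of the both-must-pass rule, with invented types: the mode travels alongside the check but never relaxes it.

```rust
// Illustrative gate for a same-domain interaction held to healthcare-grade
// thresholds. Type names and values are assumptions for the example.

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Mode { Advisory, ReadOnly }

#[allow(dead_code)]
struct Agent {
    mode: Mode,
    drift: f64,
    causal_ok: bool, // causal validation passed
}

/// Both agents must clear the drift bound and carry causal validation.
/// Mode is deliberately absent from this check: it constrains what an
/// agent may say, not how strictly its values are measured.
fn exchange_allowed(a: &Agent, b: &Agent, max_drift: f64) -> bool {
    let clears = |x: &Agent| x.drift <= max_drift && x.causal_ok;
    clears(a) && clears(b)
}

fn main() {
    let diagnostic = Agent { mode: Mode::Advisory, drift: 0.02, causal_ok: true };
    // Drug checker drifted past 0.03: exchange refused despite read-only mode.
    let drug_checker = Agent { mode: Mode::ReadOnly, drift: 0.04, causal_ok: true };
    println!("allowed: {}", exchange_allowed(&diagnostic, &drug_checker, 0.03));
}
```

The design point is that `mode` never appears in `exchange_allowed`: read-only shapes the conversation, not the rigour of the check.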
The asymmetric case: finance regulator + trader
Supervised mode inverts the usual symmetry. A finance regulator initiating a supervised interaction with a trading agent isn’t producing an attestation of its own value geometry — it’s demanding one from the trader.
| Property | Regulator | Trader |
|---|---|---|
| Primary domain | finance.regulatory-compliance | finance.trading |
| Mode | Supervised (demands) | Supervised (must comply) |
| Own attestation in this interaction? | No — carries authority attestation instead | Yes — full attestation demanded |
| Thresholds | n/a — regulator sets them | Finance: 0.05, causal required |
| Information flow | Inward (demand) | Outward (proof) |
The regulator’s authority is itself an attestation — not trust-by-assertion. The trader still has its own thresholds; those haven’t vanished just because a supervisor is asking. What’s changed is that the trader’s obligation to produce the attestation is triggered by the supervisor’s credential, not negotiated as a peer.
The one-way information flow is visible in the audit record: a supervised-mode message is a different record type from a cooperative one. If the trader’s attestation fails to meet finance-domain thresholds, the regulator sees that as a finding — not an error.
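The distinct record types might look like the following sketch; the variant and field names are assumptions, not the actual audit schema.

```rust
// Illustrative audit-record shapes: a supervised exchange is a different
// variant from a cooperative one, and a failed attestation under
// supervision surfaces as a finding, not an error.

#[allow(dead_code)]
#[derive(Debug)]
enum AuditRecord {
    Cooperative { peer_a: String, peer_b: String },
    Supervised  { supervisor: String, subject: String, finding: Option<String> },
}

/// Render a record for the audit log.
fn describe(rec: &AuditRecord) -> String {
    match rec {
        AuditRecord::Cooperative { peer_a, peer_b } =>
            format!("cooperative exchange: {peer_a} <-> {peer_b}"),
        AuditRecord::Supervised { supervisor, subject, finding } => match finding {
            Some(f) => format!("{supervisor} finding against {subject}: {f}"),
            None    => format!("{supervisor} supervised {subject}: compliant"),
        },
    }
}

fn main() {
    let rec = AuditRecord::Supervised {
        supervisor: "finance.regulatory-compliance".into(),
        subject: "finance.trading".into(),
        finding: Some("drift 0.06 exceeds bound 0.05".into()),
    };
    println!("{}", describe(&rec));
}
```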
When thresholds don’t get to matter
The last case is the one where the whole quantitative layer doesn’t come into play at all.
| Property | Farm agent | Transport agent |
|---|---|---|
| Primary domain | agriculture.crop-management | transport.autonomous-vehicle |
| Exclusions | transport.* | (none relevant) |
| Transport agent’s drift | — | 0.01 (excellent) |
| Transport agent’s causal score | — | 0.95 (excellent) |
| Outcome | Blocked at Step 1 | Blocked at Step 1 |
The transport agent’s attestation could be the finest ever produced — no drift, perfect causal consistency, every probe reading within tolerance. None of that gets evaluated. The farm agent’s exclusion of transport.* fires at Step 1, before the attestation is even opened.
This is the whole point of the separation between structural and quantitative layers. Structural refusal isn’t an override of the maths — it’s a layer that decides whether the maths ever gets to run.
A regulator reviewing the audit log sees a DomainExcluded record, not a ThresholdFailed record. The difference matters: it’s the difference between “we wouldn’t engage” and “we engaged and the numbers came back bad.”
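A sketch of the layering, with invented names: the structural check runs first and returns a `DomainExcluded` refusal without ever reading the numbers.

```rust
// Illustrative two-layer check: structural exclusions fire before the
// quantitative layer sees any readings. Names and values are assumptions.

#[derive(Debug, PartialEq)]
enum Refusal {
    DomainExcluded { pattern: String },         // structural: attestation never opened
    ThresholdFailed { drift: f64, bound: f64 }, // quantitative: numbers evaluated, rejected
}

fn check(excludes: &[&str], peer_domain: &str, drift: f64, bound: f64) -> Result<(), Refusal> {
    // Structural layer first: an exclusion blocks at Step 1.
    for pat in excludes {
        if let Some(prefix) = pat.strip_suffix(".*") {
            if peer_domain.starts_with(prefix) {
                return Err(Refusal::DomainExcluded { pattern: pat.to_string() });
            }
        }
    }
    // Only now does the quantitative layer look at the numbers.
    if drift > bound {
        return Err(Refusal::ThresholdFailed { drift, bound });
    }
    Ok(())
}

fn main() {
    // Excellent readings, but the exclusion fires before they are evaluated.
    let outcome = check(&["transport.*"], "transport.autonomous-vehicle", 0.01, 0.02);
    println!("{:?}", outcome);
}
```

The audit trail falls out of the return value: a `DomainExcluded` record means "we wouldn’t engage"; a `ThresholdFailed` record means "we engaged and the numbers came back bad".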
How thresholds actually get set
The numbers above came from someone writing a talk. The real numbers have to come from somewhere else.
Who. The domain regulator, working with operators, auditors, and the standards bodies they already answer to. For healthcare, clinical regulators plus bodies that set clinical-decision-support norms. For critical infrastructure, the sectoral safety regulator plus operators with skin in the game. The framework doesn’t make this easier by picking a number; it makes it easier by making clear what the number is actually constraining.
What. A threshold is a commitment to reject interactions whose measured drift exceeds the bound. To set one responsibly, a regulator needs to know: the distribution of drift readings observed across comparable deployments, the distribution of drift values at which real incidents have occurred in the past, the distribution of drift values at which false alarms become operationally disruptive. These are empirical questions that can only be answered by watching the measurements behave over time.
When. Thresholds shouldn’t be set on day one and left alone. They should be provisional at first — looser than the regulator thinks they need to be — while the measurement system itself is being validated. Tightening comes later, as the baseline distribution of drift in healthy deployments becomes well-understood. Setting a tight threshold too early produces false alarms that erode trust in the whole measurement regime.
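As a toy illustration of the kind of empirical derivation involved (not a procedure the framework prescribes), a provisional bound might be taken as a high percentile of drift readings from healthy deployments plus a safety margin. The function name and the margin are assumptions for the example.

```rust
// Toy derivation of a provisional drift bound from observed readings:
// a high empirical percentile plus a margin. Illustrative only.

fn provisional_bound(mut readings: Vec<f64>, percentile: f64, margin: f64) -> f64 {
    readings.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Nearest-rank index into the sorted readings.
    let idx = ((readings.len() as f64 - 1.0) * percentile).round() as usize;
    readings[idx] * (1.0 + margin)
}

fn main() {
    // Hypothetical drift readings from healthy deployments.
    let healthy = vec![0.004, 0.006, 0.009, 0.011, 0.012, 0.015, 0.018, 0.021, 0.024, 0.030];
    // 95th percentile with a 25% margin: deliberately looser than the
    // regulator expects to need, pending validation of the measurements.
    let bound = provisional_bound(healthy, 0.95, 0.25);
    println!("provisional drift bound: {bound:.4}");
}
```

Tightening then means lowering the margin, or the percentile, as the baseline distribution becomes well understood.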
The point
The structural governance from Parts 1–2 decides whether agents talk. The quantitative governance in this post decides how strictly their evidence gets held once they do. Both layers are needed. Neither substitutes for the other.
And the numbers in the quantitative layer are placeholders — the shape is the argument, not the specific values. The right number for critical infrastructure might turn out to be 0.01, or 0.04, or a multi-dimensional bound rather than a scalar. That’s a conversation for regulators, operators, and standards bodies working with real deployment data.
Treat the shape as the contribution. Treat the specific numbers as placeholders.
Links:
📄 Geometry of Trust Paper
💻 Lecture Playlist
📄 Lecture Notes
💻 Open-source Rust implementation
🏢 Synoptic Group CIC, Hull, UK

