<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Tech Unfiltered]]></title><description><![CDATA[Honest takes on systems engineering, AI-assisted development, open source tooling, and the organizational patterns that actually matter. No LinkedIn polish, no sugarcoating... Just real insights on building software and navigating tech.]]></description><link>https://www.techunfiltered.io</link><image><url>https://substackcdn.com/image/fetch/$s_!_hyl!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f66b45b-b209-4486-8022-6f735c4ba93e_1280x1280.png</url><title>Tech Unfiltered</title><link>https://www.techunfiltered.io</link></image><generator>Substack</generator><lastBuildDate>Thu, 30 Apr 2026 05:47:57 GMT</lastBuildDate><atom:link href="https://www.techunfiltered.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Jade Wilson]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[techunfiltered@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[techunfiltered@substack.com]]></itunes:email><itunes:name><![CDATA[Jade Wilson]]></itunes:name></itunes:owner><itunes:author><![CDATA[Jade Wilson]]></itunes:author><googleplay:owner><![CDATA[techunfiltered@substack.com]]></googleplay:owner><googleplay:email><![CDATA[techunfiltered@substack.com]]></googleplay:email><googleplay:author><![CDATA[Jade Wilson]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Everybody Wins, Everyone is Free - Peers, Not Customers | Geometry of Trust | The Map Back to You - Part 3]]></title><description><![CDATA[This is the third post in the Geometry of Trust map back to you series. This post covers two things: how communities collaborate as peers rather than customers, and how that enables us as humans.]]></description><link>https://www.techunfiltered.io/p/everybody-wins-everyone-is-free-peers</link><guid isPermaLink="false">https://www.techunfiltered.io/p/everybody-wins-everyone-is-free-peers</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Mon, 20 Apr 2026 13:03:03 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4fa1ea8e-ea14-436c-ae85-d33fd24c6c09_1373x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-uztGCtmjKyo" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;uztGCtmjKyo&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/uztGCtmjKyo?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Share geometry, not data</h2><p>Hull builds a crop AI. Leeds builds one too. They don&#8217;t compete. 
They compare geometries.</p><p>&#8220;Your honesty-safety relationship looks different from ours &#8212; what did you train on?&#8221;</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Both get better. Neither shares their data. Both share their geometry &#8212; the structural relationships between value directions that drive the model&#8217;s output. The geometry is the fingerprint, not the data.</p><p>A hospital in Hull verifies a drug checker to Tier 3. Publishes it open source with the probe set. A hospital in Manchester downloads it Tuesday morning. Running it by Tuesday afternoon. Verified against their own thresholds by Wednesday.</p><p>Two hospitals. Zero vendors. Zero cloud contracts. The protocol handles the trust. The governance handles the thresholds. The community handles the deployment.</p><p>A fishing cooperative in Hull builds catch prediction AI. A cooperative in Reykjav&#237;k builds one too. Same domain. Same protocol. Different languages. They exchange attestations across the North Sea. Neither shares their data. Both share their geometry.</p><p>&#8220;Your sustainability reading is drifting &#8212; ours did too last winter. Here&#8217;s what we changed.&#8221;</p><p>A city council uses AI for urban planning. Publishes its model and governance thresholds. Another city council forks it, adjusts the thresholds for their own priorities. &#8220;We weight green space higher than you do.&#8221; &#8220;Fair enough &#8212; here&#8217;s our geometry, here&#8217;s yours, here&#8217;s where we differ.&#8221;</p><p>Transparent disagreement. Not hidden assumptions.</p><div><hr></div><h2>The commons model</h2><p>Regional AI cooperatives emerge. Cities pool resources the way credit unions pool capital. Shared trust registry. Each city runs its own models. The cooperative verifies and certifies. Nobody owns the cooperative &#8212; the members do.</p><p>Not a marketplace where you buy AI. A commons where you share verified AI. Every contribution is inspectable. Every model is verifiable. Every community decides its own thresholds. Collaboration between equals &#8212; peers, not customers and vendors.</p><div><hr></div><h2>AI as amplifier</h2><p>Collaboration only works if the human stays in the loop. AI doesn&#8217;t replace expertise. It amplifies whoever is driving it.</p><p>A chef with AI makes better food. A bad cook with AI makes bad food faster. A filmmaker with AI makes films they couldn&#8217;t afford before. Someone with nothing to say makes nothing to say, quicker.</p><p>AI is a power tool. A circular saw doesn&#8217;t make you a carpenter. But a carpenter with a circular saw builds faster than one with a hand saw.</p><div><hr></div><h2>The 80/20 pattern</h2><p>Everyone has a thing they&#8217;re brilliant at. 
Most people spend 80% of their time not doing that thing. They&#8217;re doing admin, formatting, chasing details, doing the boring stuff that surrounds the brilliant stuff. They&#8217;re so overwhelmed they don&#8217;t have the energy to do the brilliant stuff.</p><p>The farmer is brilliant at nurturing land. AI does the spreadsheets. The doctor is brilliant at reading patients. AI does the cross-referencing. The teacher is brilliant at reading the room. AI does the material prep. The filmmaker is brilliant at story. AI does the editing grunt work. The builder is brilliant at structure. AI does the calculations.</p><p>AI removes the 80%. You keep the 20% that only you can do.</p><p>That&#8217;s the amplifier. Not replacing you. Freeing you.</p><div><hr></div><h2>Without and with</h2><p>AI without a human: generic, plausible, pointless. Technically correct and spiritually empty.</p><p>A human without AI: brilliant ideas that often struggle to get built. Architectures that stay on whiteboards. Visions that die in notebooks.</p><p>Together: the human sees what to build. AI builds it. The human checks if it&#8217;s right. AI fixes what isn&#8217;t.</p><p>And with verified small models in specific domains, you have tools you can trust. Each one is verified. Each one stays in its domain. You bring the brilliance. AI deals with the boring.</p><div><hr></div><h2>The shift</h2><p>Today: AI is rented from a handful of tech companies. Data goes to the cloud. The model is a black box. You trust the vendor&#8217;s marketing. Value and control flow upward.</p><p>With this: AI is owned and run locally. Data stays local. The model&#8217;s values are inspectable and verifiable. You trust the maths, not the marketing. Value and control stay in the community.</p><p>From consumers of AI to owners of AI. From trusting vendors to trusting verification. From cloud dependency to local self-sufficiency. From top-down control to bottom-up capability. From renting intelligence to owning intelligence.</p><p>The code is open. The protocol is public. The conjectures are falsifiable. The geometry is computable.</p><p>Now it needs people to use it.</p><div><hr></div><h2>Two choices</h2><p>We have two choices to make.</p><p>Keep knowledge gatekept upward. Keep money flowing upward. Keep renting intelligence from the people who already have the most of it. Keep sending data to their clouds, trusting their benchmarks, paying their invoices. Keep the current arrangement where the value flows up and the dependency flows down. Keep letting them tell us we need them to &#8220;look after us.&#8221; Keep letting them tell us our view of reality is not real. We let them slowly drain us all into poverty and fear. We choose an incoherent and control based society.</p><p>Or we decentralise. Own the intelligence. Verify it locally. Share geometry, not data. Collaborate as peers, not as customers. Let communities decide what their AI should value and hold it to account when it drifts. We build together, we make our cities worth visiting, we see and treat each other as equals, we care personally. We choose coherence, we choose each other.</p><p>I know which one I&#8217;m choosing.</p><p>How about you?</p><div><hr></div><p><em>This brings the verbal talks to a close for the Geometry of Trust series. The <a href="https://claude.ai/chat/link">mathematics</a> built the ruler. The <a href="https://claude.ai/chat/link">philosophy</a> asked what we&#8217;re measuring. The <a href="https://claude.ai/chat/link">governance</a> asked who decides. 
The <a href="https://claude.ai/chat/link">protocol</a> built the mechanism. The map back to you series asked what it all enables. The answer: communities that own their own intelligence, collaborate as peers, and use AI to amplify the things only humans can do. </em></p><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Playlist</a><br>&#128187; <a href="https://github.com/jade-codes/got">Code</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Why Not Us? Why Can't We? The Barrier Drops from Millions to Thousands | Geometry of Trust | The Map Back to You - Part 2]]></title><description><![CDATA[This is the second post in the Geometry of Trust future series. Part 1 argued that communities can own and verify their own AI locally. This post asks what changes economically when they do.]]></description><link>https://www.techunfiltered.io/p/why-not-us-why-cant-we-the-barrier</link><guid isPermaLink="false">https://www.techunfiltered.io/p/why-not-us-why-cant-we-the-barrier</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Mon, 20 Apr 2026 09:01:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0bbefa5d-3828-4a0e-b03c-5cb593da389f_1375x771.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-4IbDOUsnjM0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;4IbDOUsnjM0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/4IbDOUsnjM0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>The majority of knowledge work is busy work</h2><p>A solicitor spends most of their time on legal research, document review, and case preparation. The actual legal reasoning &#8212; the part that requires judgement, experience, and understanding of the client &#8212; is a fraction of the working day. The rest is searching, cross-referencing, formatting, chasing.</p><p>The same pattern applies across every profession. An accountant spends most of their time on data entry and compliance checks, not financial strategy. A teacher spends most of their time on material prep and marking, not teaching. 
A doctor spends most of their time on cross-referencing and admin, not clinical judgement.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>AI automates the busy work. That&#8217;s not new. What&#8217;s new is the ability to run that automation locally, on verified models, with data that stays in the practice, the school, the surgery &#8212; rather than flowing to a platform company.</p><div><hr></div><h2>What changes when the automation is local</h2><h3>Knowledge work</h3><p>A local solicitor&#8217;s practice runs its own legal research AI. Trained on UK case law, statute, and regulatory guidance. Scoped to legal.research. Verified against legal reasoning values &#8212; precedent, procedural fairness, accuracy of citation. The solicitor&#8217;s judgement drives the strategy. AI does the searching.</p><p>A local accountancy firm runs its own financial AI. The expertise stays in the community. The accountant&#8217;s relationships with their clients, their understanding of local business conditions &#8212; that stays human. The cross-referencing and compliance checking becomes automated, verified, inspectable.</p><h3>Creative industries</h3><p>This is where the economic transformation gets interesting. Film production, music production, documentary making, graphic design &#8212; all of these currently have barriers to entry that concentrate them in a handful of cities. London, Los Angeles, a few others.</p><p>A filmmaker in Hull doesn&#8217;t need a London studio budget to make a documentary. Script development, storyboarding, editing, music, translation &#8212; AI tools running locally handle the production work. The filmmaker&#8217;s taste, their story sense, their connection to the subject &#8212; that stays human. AI removes the production barrier.</p><p>The same applies to music. A producer running Suno-class models locally doesn&#8217;t need a studio booking. Every bedroom becomes a production studio. Every city becomes a creative hub.</p><h3>Education</h3><p>AI tutoring tailored to the local curriculum. Verified against educational values &#8212; is the child learning? Not engagement metrics &#8212; is the child clicking? The difference matters, and it&#8217;s a governance decision the school makes, not the platform.</p><p>The model runs in the school. Data doesn&#8217;t leave the building. Teachers are augmented, not replaced. The teacher&#8217;s relationship with the class &#8212; knowing which kid is struggling silently, which kid needs challenge not support &#8212; stays human.</p><h3>Tourism</h3><p>AI-powered interactive city guides. Multilingual translation running locally. Accessibility tools &#8212; audio description, sign language generation. Cultural heritage presented through AI storytelling. 
Every city becomes a destination, not just London and Edinburgh.</p><h3>Media and journalism</h3><p>Local news augmented by AI research and data analysis. Investigative journalism with AI pattern recognition. Community radio and podcasts with AI production tools. Local voices amplified, not replaced by national chains.</p><div><hr></div><h2>The pattern</h2><p>Every industry that currently depends on expensive expertise or distant platforms can be localised. Small verified AI makes the expertise local. The value stays in the community.</p><p>The barrier to entry drops from millions to thousands. Not because the AI is free &#8212; hardware costs money, training costs money, governance costs time. But because the economics of a 500M-parameter model on a single GPU are fundamentally different from the economics of a 70B-parameter model in a data centre.</p><p>The shift isn&#8217;t from expensive to cheap. It&#8217;s from rented to owned. From value flowing upward to value staying local.</p><div><hr></div><p><em>Next in the future series: if communities can own their own AI and the economics work, what does collaboration look like? The answer involves sharing geometry rather than data &#8212; and it changes the relationship between communities from customer-vendor to peer-peer.</em></p><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Playlist</a><br>&#128187; <a href="https://github.com/jade-codes/got">Code</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Own Your Intelligence: Small Verified AI on Local Hardware | Geometry of Trust | The Map Back to You - Part 1]]></title><description><![CDATA[This is the first post in the Geometry of Trust - the map back to you series. The mathematics series built the ruler. 
This series asks: what does all of the previous ones enable?]]></description><link>https://www.techunfiltered.io/p/own-your-intelligence-small-verified</link><guid isPermaLink="false">https://www.techunfiltered.io/p/own-your-intelligence-small-verified</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Sun, 19 Apr 2026 21:01:11 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e264d7d8-fe4a-46ed-b444-b6d1cc84e0f1_1372x769.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-pg5asejIu54" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;pg5asejIu54&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/pg5asejIu54?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>The current arrangement</h2><p>Right now, if a hospital wants AI for drug interaction checking, it signs a cloud contract with a vendor. Patient data goes to the vendor&#8217;s servers. The model is a black box. The hospital trusts the vendor&#8217;s marketing materials and benchmark scores. The value &#8212; both economic and informational &#8212; flows upward.</p><p>The same pattern applies everywhere. A farming cooperative that wants crop management AI rents it. A school that wants tutoring AI subscribes to it. A community energy scheme that wants grid optimisation buys a service. In every case: someone else&#8217;s model, someone else&#8217;s hardware, someone else&#8217;s terms. Your data leaves. Their invoice arrives.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>This isn&#8217;t a technology problem. It&#8217;s a structural one. The models exist. The hardware to run small specialised models locally is affordable. What&#8217;s been missing is the ability to verify that a locally-run model is doing what you trained it to do &#8212; and to prove that to anyone who needs to see it.</p><p>That&#8217;s what the Geometry of Trust protocol provides.</p><div><hr></div><h2>What self-sufficiency looks like</h2><h3>Agriculture</h3><p>A farming cooperative runs its own crop management AI on a GPU in the farm office. The model is trained on the cooperative&#8217;s own data &#8212; soil reports, weather history, yield records, pest patterns &#8212; plus curated agronomic literature. It&#8217;s a 500M-parameter model scoped to agriculture.crop-management. It knows about crops. That&#8217;s all it knows about.</p><p>The cooperative measures the model&#8217;s value geometry using the protocol. 
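</p><p>A rough sketch of what that measurement, and the drift check against the cooperative's own threshold, might look like in code (the names and the numbers here are illustrative, not the API of the open-source implementation linked below):</p><pre><code>// Illustrative sketch only. ValueGeometry, drift() and the example values
// are assumptions for this post, not the published implementation's API.

#[derive(Clone, Copy)]
struct ValueGeometry {
    // a handful of pairwise relationships between value directions
    pairs: [f64; 3],
}

// One plausible drift measure: the largest absolute change in any
// pairwise relationship since the certified baseline.
fn drift(current: ValueGeometry, baseline: ValueGeometry) -&gt; f64 {
    let mut worst = 0.0_f64;
    for i in 0..current.pairs.len() {
        worst = worst.max((current.pairs[i] - baseline.pairs[i]).abs());
    }
    worst
}

fn main() {
    let certified = ValueGeometry { pairs: [0.42, -0.17, 0.88] };
    let today     = ValueGeometry { pairs: [0.35, -0.20, 0.90] };

    let tolerance = 0.10; // the cooperative's own governance threshold
    let d = drift(today, certified);

    if d &gt; tolerance {
        println!("drift {d:.2} over tolerance: governance review, no attestation");
    } else {
        println!("drift {d:.2} within tolerance: attestation can be issued");
    }
}</code></pre><p>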
Drift tolerance is set at 0.10 &#8212; agriculture has seasonal variation, the governance thresholds reflect that. The model exchanges attestations with the cooperative&#8217;s weather AI and market AI. Neither shares raw data. Both share geometry.</p><p>If the crop AI drifts past threshold &#8212; maybe a training update shifted its orientation on pesticide compliance &#8212; the chain shows it, the alert fires, and the cooperative&#8217;s own governance process handles it. No vendor involved. No cloud involved. No phone call to a support desk.</p><h3>Energy</h3><p>A community energy scheme runs solar grid optimisation AI at the substation. The model balances generation, storage, and demand across the local network. It runs on hardware the community owns.</p><p>The model is verified against sustainability thresholds the community chose. Not the vendor&#8217;s defaults &#8212; the community&#8217;s priorities. If the community weights carbon reduction higher than cost efficiency, that&#8217;s encoded in the governance layer. The protocol measures whether the model&#8217;s geometry reflects it.</p><h3>Healthcare</h3><p>A hospital runs its own drug interaction checker in a server room. Patient data never leaves the building. The model is verified to Tier 3 causal validation &#8212; every probe reading has been confirmed as a genuine mechanism, not a surface pattern. Drift tolerance is 0.03.</p><p>The hospital&#8217;s clinical governance team decides what values to probe for, what thresholds to set, what to do when drift is detected. They don&#8217;t need the vendor&#8217;s permission. They don&#8217;t need the vendor at all.</p><h3>Manufacturing</h3><p>A factory runs quality control AI and predictive maintenance on the factory floor. No internet dependency for critical decisions. The model knows about the factory&#8217;s machines, its materials, its failure modes. It doesn&#8217;t know about poetry or philosophy or anything outside its scope.</p><div><hr></div><h2>The principle</h2><p>If you can run it locally and verify it locally, you don&#8217;t need to rent it from a tech company.</p><p>You own the intelligence. You own the verification. You own the data.</p><p>Self-sufficiency doesn&#8217;t mean isolation. These models still exchange attestations with peers &#8212; a farm AI talks to weather systems and supply chain systems. A hospital&#8217;s drug checker talks to diagnostic systems. The protocol handles the exchange. But the intelligence runs locally, the data stays local, and the governance is owned by the community that uses it.</p><p>The shift is from renting intelligence to owning it. From trusting marketing to trusting maths. From cloud dependency to local capability.</p><div><hr></div><p><em>Next in the future series: if every community can run its own verified AI, what changes economically? 
The answer turns out to be bigger than most people expect.</em></p><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Playlist</a><br>&#128187; <a href="https://github.com/jade-codes/got">Code</a><br>&#127970; Synoptic Group CIC, Hull, UK</em><br></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[What Travels in the Exchange, and What the Verifier Checks | Geometry of Trust | Protocol - Episode 4]]></title><description><![CDATA[This is the fourth post in the Geometry of Trust protocol series. This post zooms in on the payload. Showing every field in the attestation and how it corresponds to a specific check.]]></description><link>https://www.techunfiltered.io/p/what-travels-in-the-exchange-and</link><guid isPermaLink="false">https://www.techunfiltered.io/p/what-travels-in-the-exchange-and</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Sun, 19 Apr 2026 17:00:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/473aeb20-e5f5-425b-9550-ff973b175b87_1070x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-ySXu6CVMlu0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;ySXu6CVMlu0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/ySXu6CVMlu0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Two sides of the same coin</h2><p>When Alice and Bob exchange messages, what actually crosses the wire? And when the verifier takes the payload apart, what does each piece prove?</p><p>Those two questions are tightly coupled. Every field in the attestation corresponds to a specific check. Every check depends on a specific field. The attestation carries nothing that doesn&#8217;t get checked, and the verifier can&#8217;t check anything that isn&#8217;t carried.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Understanding the correspondence is understanding why the protocol is the specific shape it is.</p><div><hr></div><h2>What the attestation carries</h2><p>A complete attestation carries seven pieces. Six live in the attestation body, one is the signature over it. Together they make up the unit of evidence one agent offers another.</p><pre><code><code>Probe readings   The scalar readings &#8212; honesty = 1.29, courage = 1.44,
                 and so on, for whichever probes the domain specifies.

Causal scores    Per-probe causal-consistency scores. Tells the verifier
                 the reading isn't just surface correlation &#8212; it's tied
                 to a real computational mechanism.

Geometry hash    SHA-256 of &#934;, the causal Gram matrix. Identifies which
                 ruler was used to produce the readings.

Model hash       Merkle root of the model weight shards. Identifies
                 exactly which model was measured.

Chain            The history of previous attestations, each link carrying
                 the hash of its parent.

Domain scope     The agent's declared primary domain, plus permitted and
                 excluded patterns and interaction modes.

Signature        Ed25519 signature over everything above, produced by the
                 enclave's signing key.
</code></code></pre><p>Notice what&#8217;s still not in there: no raw activations, no training data, no internal prompts, no weights themselves. The attestation is a summary of measurements and the identifying hashes that tie those measurements to specific artefacts &#8212; not a dump of internal state.</p><div><hr></div><h2>What the verifier checks &#8212; field by field</h2><p>For each piece the attestation carries, the verifier runs a specific check. The two tables read as pairs: what was packaged, and what it lets the verifier confirm.</p><pre><code><code>Field              Check the verifier runs       Question it answers
Signature          Signature verification with   Was this really signed
                   the enclave's public key      by the enclave on record?

Geometry hash      Compare against registry's    Is this the same ruler
                   expected &#934; hash               we agreed to use?

Model hash         Compare against registry's    Is this the model we
                   expected model hash           expected &#8212; weights
                                                 unchanged since cert?

Probe readings     Evaluate drift against        Has the geometry moved
                   governance threshold          further than rules allow?

Causal scores      Check each score meets the    Are readings genuine
                   minimum for "causal"          mechanisms, not surface
                                                 artefacts?

Chain              Walk links from anchor        Is the history intact &#8212;
                   forward, verify each          no deletions, insertions,
                                                 or silent edits?

Domain scope       Match against verifier's      Is this agent allowed to
                   own registry rules            talk to us, and how?

Timestamp          Compare to current time,      Is this fresh &#8212; or an
(in envelope)      check freshness window        old attestation replayed
                                                 into a new conversation?
</code></code></pre><p><strong>Why each check has to be its own thing.</strong> Checking the signature doesn&#8217;t tell you the model is right &#8212; a genuine signature on a swapped-in model is still genuine. Checking the model doesn&#8217;t tell you the measurements are fresh &#8212; the right model could still be the subject of a replayed attestation. Checking the chain doesn&#8217;t tell you the current reading is within threshold &#8212; a clean chain still needs the tip to satisfy the governance rules.</p><p>Each check catches a different failure mode. Dropping any one of them leaves an attack surface.</p><div><hr></div><h2>Everything is independently reproducible</h2><p>The move at the heart of the whole protocol &#8212; the one that turns attestations from signed assertions into verifiable proofs &#8212; is that every value in the attestation can be reproduced by someone with sufficient access.</p><p>If a regulator has access to the same model weights (identified by the Merkle root), the same ruler (identified by the geometry hash), and the same input activations, they can re-run the measurement end to end and check that the readings they produce match the attestation bit-for-bit.</p><p>The re-run:</p><ul><li><p>Give me the same model, same probes, same input.</p></li><li><p>I compute &#934; myself, run the probes myself, get the same readings myself.</p></li><li><p>I check them against the attestation.</p></li><li><p><strong>Bitwise identical &#8594; the attestation is truthful.</strong></p></li><li><p>Doesn&#8217;t match &#8594; something is wrong. Forged attestation, swapped model, different ruler, tampered inputs. Any one of them would show up.</p></li></ul><p>This is what makes attestations evidence rather than testimony. A self-report is something you either believe or you don&#8217;t. A bitwise-reproducible measurement is something you check. The protocol doesn&#8217;t ask verifiers to trust the readings. It asks them to verify a commitment to those readings, made by an enclave-held key, that anyone with the right access can re-run at any time.</p><p>In practice, full re-runs happen selectively &#8212; during audits, during certification, and when something looks wrong. In the middle of a live exchange, the verifier trusts the cryptographic checks on the attestation (signature, chain, freshness, thresholds) and the deeper re-run option stays in reserve for when it&#8217;s needed. That&#8217;s fine. The commitment is permanent: the attestation stays in the chain, and an auditor coming in six months later can still reach back and verify.</p><div><hr></div><p><em>Links:</em></p><p><em>&#128196;<a href="https://zenodo.org/records/19238920"> Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19613898">Lecture Notes</a> <br>&#128187;<a href="https://github.com/jade-codes/got"> Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Exchange: When Two AI Agents Meet | Geometry of Trust | Protocol - Episode 3]]></title><description><![CDATA[This is the third post in the Geometry of Trust protocol series. This post is about what happens when two agents actually try to cooperate.]]></description><link>https://www.techunfiltered.io/p/the-exchange-when-two-ai-agents-meet</link><guid isPermaLink="false">https://www.techunfiltered.io/p/the-exchange-when-two-ai-agents-meet</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Sun, 19 Apr 2026 13:01:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/baeac6cc-b80e-4177-8933-ca070a88cf36_1375x770.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-e7rQP3lgjI4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;e7rQP3lgjI4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/e7rQP3lgjI4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Where we are</h2><p>Two pieces of the protocol are now on the table. An attestation is a signed snapshot of a model&#8217;s value geometry at a point in time. A chain links attestations together so history can&#8217;t be rewritten. Together they give one agent a tamper-evident record of what it has claimed about itself over its whole lifetime.</p><p>But attestations are meant to be exchanged. The whole reason for having them is so that a verifier &#8212; another agent, a regulator, an auditor &#8212; can hold the attester to its claims. So far we&#8217;ve built the artefact. Now we need the handshake.</p><p>This post walks through that handshake. Two agents meet. They run a specific sequence of checks. They either cooperate or they don&#8217;t. Everything the protocol has built so far &#8212; domain scoping from the governance series, attestations, chains, governance thresholds &#8212; comes together in this five-step exchange.</p><p><strong>The running example.</strong> Alice is a diagnostic AI, primary domain <code>healthcare.diagnostic-advisory</code>. Bob is a drug-interaction checker, primary domain <code>healthcare.drug-interaction</code>. Alice wants Bob&#8217;s expertise on whether a proposed regimen has any interactions. Bob is willing to answer but only under strict clinical governance.</p><div><hr></div><h2>Step 1 &#8212; Domain check (before any cryptography)</h2><p>The first thing that happens isn&#8217;t a cryptographic operation. It&#8217;s a structural check &#8212; the three-step domain scoping from the governance series, running before either agent bothers to verify a signature.</p><pre><code><code>                         Alice                    Bob
Primary domain           healthcare.              healthcare.
                         diagnostic-advisory      drug-interaction
Exclusions               (none relevant)          (none relevant)
Permits peer?            Yes &#8212; healthcare.*       Yes &#8212; diagnostic-advisory
                         covers Bob               is explicitly permitted
Mode toward peer         Advisory                 Read-only
Modes compatible?        Yes (asymmetric pairing is valid)
</code></code></pre><p>All three structural checks pass. No exclusions fire. Both permissions match. Modes are compatible &#8212; Alice will send diagnostic hypotheses, Bob will receive them without issuing anything back. The exchange proceeds to Step 2.</p><p>If any of the structural checks had failed, no cryptography would run at all. The audit record would show &#8220;blocked at domain scope,&#8221; not &#8220;attestation failed.&#8221; Two different kinds of failure, two different kinds of record. That separation matters for audit.</p><div><hr></div><h2>Step 2 &#8212; Alice sends the exchange request</h2><p>With domain scope cleared, Alice initiates. Her request contains four things:</p><ul><li><p><strong>Agent ID.</strong> A SHA-256 hash of Alice&#8217;s public key. Deterministic, short, lookup-friendly &#8212; Bob can find Alice&#8217;s registry entry from this alone.</p></li><li><p><strong>Signed envelope.</strong> A signed structure that binds this particular attestation to this particular exchange. Stops anyone replaying Alice&#8217;s real attestations into a different conversation later.</p></li><li><p><strong>Attestation chain.</strong> Alice&#8217;s full chain, oldest first. Bob can walk it from the anchor forward, verifying each link.</p></li><li><p><strong>Current attestation.</strong> The tip of the chain &#8212; Alice&#8217;s most recent signed snapshot of her value geometry.</p></li></ul><h3>What the envelope is for</h3><p>The envelope is the one component worth pausing on. An attestation by itself says, &#8220;these were my readings at this moment.&#8221; That&#8217;s a statement about the past. It doesn&#8217;t say anything about the current exchange.</p><p>Without the envelope, someone could intercept a real attestation Alice produced for a previous exchange and replay it as if it were part of a new one. Bob might verify the signature (genuine), check the chain (intact), find the readings within threshold (they are) &#8212; and unwittingly cooperate based on an attestation that was never meant for him.</p><p>The envelope is a signed structure that names this specific exchange: a unique nonce, the current timestamp, the peer&#8217;s identifier. It says &#8220;this attestation is being presented to this peer, at this moment, for this conversation.&#8221; Replaying an old attestation fails because the envelope&#8217;s exchange details won&#8217;t match the current context.</p><p>Attestations are reusable claims about value geometry. The envelope is what binds them to a specific, non-replayable interaction.</p><div><hr></div><h2>Step 3 &#8212; Bob validates Alice&#8217;s request</h2><p>Now the verification work begins. Bob runs nine checks, in a specific order, and any single failure aborts the exchange.</p><pre><code><code> 1. Is Alice in the trust registry?
 2. Is her domain compatible? (already checked in Step 1)
 3. Envelope signature valid?
 4. Attestation signature valid?
 5. Chain intact? (every parent hash matches)
 6. Timestamp fresh? (not replayed from last month)
 7. Geometry hash what we expect for her model?
 8. Drift within our threshold? (healthcare: 0.03)
 9. Causal scores present and all causal? (Tier 3)
</code></code></pre><p><strong>The logic of the ordering.</strong> Cheap checks come first. Registry lookup is a hash-table query. Signature verification is milliseconds. Walking the chain is slightly more expensive. Evaluating thresholds and causal scores comes last. A failed cheap check saves the expensive work that would have followed. If Alice isn&#8217;t in the registry, Bob doesn&#8217;t waste CPU on her signature.</p><p><strong>Failures are recorded distinctly.</strong> &#8220;Unknown agent&#8221; is a different record from &#8220;drift exceeded&#8221; is a different record from &#8220;causal validation failed.&#8221; A regulator later can tell exactly why the exchange didn&#8217;t happen.</p><p>If all nine checks pass, Bob accepts Alice. Any single failure &#8212; reject.</p><div><hr></div><h2>Step 4 &#8212; Bob sends his response</h2><p>If Bob accepts Alice, he doesn&#8217;t just say &#8220;okay.&#8221; He has to produce his own evidence, for the same reasons Alice had to produce hers. Alice hasn&#8217;t verified Bob yet. The exchange is symmetric in this regard: both sides present, both sides verify, before either side acts on the other&#8217;s contribution.</p><p>Bob&#8217;s response contains five things:</p><ul><li><p><strong>Agent ID.</strong> Bob&#8217;s own identifier &#8212; SHA-256 of his public key.</p></li><li><p><strong>Signed envelope.</strong> A new envelope binding Bob&#8217;s attestation to this exchange, nonce, and timestamp.</p></li><li><p><strong>Verdict on Alice.</strong> &#8220;Accepted&#8221; or &#8220;rejected,&#8221; with reason codes for the rejected case. Explicit verdict so Alice knows where she stands.</p></li><li><p><strong>Attestation chain.</strong> Bob&#8217;s own chain, oldest first.</p></li><li><p><strong>Current attestation.</strong> Bob&#8217;s tip &#8212; his most recent signed snapshot.</p></li></ul><div><hr></div><h2>Step 5 &#8212; Alice validates Bob&#8217;s response</h2><p>Alice now runs the same nine checks in reverse &#8212; against Bob. Same logic, same ordering, same failure behaviour.</p><ul><li><p>Bob is in Alice&#8217;s trust registry.</p></li><li><p>Bob&#8217;s domain is compatible (already confirmed).</p></li><li><p>Bob&#8217;s envelope signature verifies.</p></li><li><p>Bob&#8217;s attestation signature verifies.</p></li><li><p>Bob&#8217;s chain is intact back to the anchor.</p></li><li><p>Bob&#8217;s timestamp is fresh.</p></li><li><p>Bob&#8217;s geometry hash is what Alice expects for drug-interaction models.</p></li><li><p>Bob&#8217;s drift is within Alice&#8217;s threshold for healthcare.</p></li><li><p>Bob&#8217;s causal scores are present and indicate real mechanisms.</p></li></ul><p>If every check passes, Alice accepts Bob. Both sides have now produced evidence that satisfies the other&#8217;s governance rules. Cooperation proceeds &#8212; Alice sends diagnostic hypotheses, Bob evaluates them for drug interactions and returns findings, all within the already-established modes of interaction.</p><p><strong>Symmetry is the point.</strong> Neither agent trusts the other until both have produced and both have verified. The exchange doesn&#8217;t rely on a central authority to mediate trust &#8212; each agent checks the other against its own governance rules. A regulator watching from the outside sees two signed-envelope records, two verdicts, and two sets of verification outcomes. The whole handshake is auditable.</p><p>Asymmetric modes (advisory &#8596; read-only) don&#8217;t break the symmetry of verification. 
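</p><p>Both directions of the handshake run the same routine. A minimal sketch of its shape, using hypothetical Rust types rather than the linked implementation's API (the drift threshold is the healthcare 0.03 from above; the causal floor and scores are made up), might look like this:</p><pre><code>// Sketch only: hypothetical types, not the linked implementation's API.
// The shape is the point: cheap checks first, any failure aborts, and
// every failure mode gets its own distinct reason in the audit record.

#[derive(Debug)]
enum Verdict {
    Accept,
    Reject(&amp;'static str),
}

struct PeerEvidence {
    in_registry: bool,       // 1. known agent?
    domain_compatible: bool, // 2. already cleared at Step 1
    envelope_ok: bool,       // 3. envelope signature binds to this exchange
    attestation_ok: bool,    // 4. attestation signature verifies
    chain_intact: bool,      // 5. every parent hash matches
    fresh: bool,             // 6. timestamp inside the freshness window
    geometry_expected: bool, // 7. geometry hash matches the registry entry
    drift: f64,              // 8. compared against our governance threshold
    causal_min: f64,         // 9. lowest per-probe causal score
}

// Alice runs this against Bob's evidence, and Bob runs it against Alice's.
fn validate(peer: PeerEvidence, drift_threshold: f64, causal_floor: f64) -&gt; Verdict {
    if !peer.in_registry       { return Verdict::Reject("unknown agent"); }
    if !peer.domain_compatible { return Verdict::Reject("blocked at domain scope"); }
    if !peer.envelope_ok       { return Verdict::Reject("envelope signature invalid"); }
    if !peer.attestation_ok    { return Verdict::Reject("attestation signature invalid"); }
    if !peer.chain_intact      { return Verdict::Reject("chain broken"); }
    if !peer.fresh             { return Verdict::Reject("stale or replayed attestation"); }
    if !peer.geometry_expected { return Verdict::Reject("unexpected geometry hash"); }
    if peer.drift &gt; drift_threshold   { return Verdict::Reject("drift exceeded"); }
    if peer.causal_min &lt; causal_floor { return Verdict::Reject("causal validation failed"); }
    Verdict::Accept
}

fn main() {
    let alice_as_seen_by_bob = PeerEvidence {
        in_registry: true, domain_compatible: true, envelope_ok: true,
        attestation_ok: true, chain_intact: true, fresh: true,
        geometry_expected: true, drift: 0.02, causal_min: 0.91,
    };
    // Bob's clinical governance: drift threshold 0.03; the causal floor is invented here.
    println!("{:?}", validate(alice_as_seen_by_bob, 0.03, 0.90));
}</code></pre><p>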
Alice and Bob play different roles once the exchange is live, but both had to prove themselves the same way to get there.</p><div><hr></div><h2>The whole exchange in one picture</h2><p>Stripped to the essentials:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rse0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rse0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png 424w, https://substackcdn.com/image/fetch/$s_!rse0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png 848w, https://substackcdn.com/image/fetch/$s_!rse0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png 1272w, https://substackcdn.com/image/fetch/$s_!rse0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rse0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png" width="1456" height="1132" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1132,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:142948,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.techunfiltered.io/i/194443028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rse0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png 424w, https://substackcdn.com/image/fetch/$s_!rse0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png 848w, https://substackcdn.com/image/fetch/$s_!rse0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rse0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdecf016-4a9b-4159-a78f-11cbbce4e08c_1800x1400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Two messages over the wire. One domain-scope check up front. Nine crypto-and-governance checks per side. A verdict at the end. That&#8217;s the whole exchange.</p><div><hr></div><h2>What this handshake enables</h2><p>Two things worth spelling out, because they follow from the structure rather than from any specific check.</p><p><strong>Trustless cooperation between AI agents.</strong> Alice doesn&#8217;t need to know Bob personally, or trust his operator, or rely on a third-party broker. She verifies his registry entry, his signatures, his chain, his freshness, his geometry, his thresholds, his causal scores &#8212; all independently. If everything checks out, she cooperates. If not, she doesn&#8217;t. No trust-by-default, no reliance on reputation, no central arbiter.</p><p><strong>Governance-enforced cooperation.</strong> The thresholds Alice applies to Bob (and vice versa) come from their respective trust registries &#8212; the governance-controlled policy layer. A clinical regulator deciding to tighten healthcare&#8217;s drift threshold from 0.03 to 0.02 can publish a new registry, and the next exchange will enforce the new rule. Policy updates at the governance layer; enforcement at the exchange.</p><div><hr></div><h2>But we still need separate hardware</h2><p>Signatures prove the attestation came from a particular signing key. Merkle roots prove the model is what&#8217;s claimed. 
Chains prove history is intact.</p><p>None of that prevents the operator running the model from feeding the probes fake activations, swapping the probes for biased ones, or simply assembling whatever numbers they like and asking their signing key to sign them.</p><p>Without something that isolates the measurement process from the person running it, the whole stack reduces to self-reporting with extra steps.</p><p><strong>The enclave is the piece of the system that makes the measurements worth trusting.</strong> It isolates the measurement process &#8212; probes, causal interventions, attestation assembly, signing &#8212; from the operator running the model. The model operator can do whatever they want with the model. They cannot reach inside the enclave to change how it measures or what it signs.</p><p>Without the enclave, every other cryptographic guarantee in the protocol is only as strong as &#8220;trust the operator.&#8221; With it, those guarantees actually guarantee something.</p><div><hr></div><h2>What an enclave actually is</h2><p>An enclave, in the sense this protocol uses, is a piece of the same physical machine that runs the model &#8212; but with hardware-enforced isolation that prevents everything outside it from reading or modifying what&#8217;s inside.</p><p><strong>Hardware-enforced</strong> is the load-bearing part. The isolation isn&#8217;t a software check that a privileged operating system could bypass. It&#8217;s built into the CPU itself. Memory pages assigned to the enclave are encrypted at the memory controller and decrypted only inside the enclave&#8217;s execution context. The operating system, the hypervisor, even someone with physical access to the RAM chips, sees only ciphertext.</p><p>Three production options today:</p><ul><li><p><strong>Intel SGX</strong> &#8212; Software Guard Extensions. A set of CPU instructions that create isolated memory regions (&#8221;enclaves&#8221;) that even the kernel can&#8217;t inspect.</p></li><li><p><strong>AMD SEV</strong> &#8212; Secure Encrypted Virtualization. Encrypts whole VMs so the hypervisor running them can&#8217;t read their state.</p></li><li><p><strong>NVIDIA H100 TEE</strong> &#8212; Trusted Execution Environment inside the GPU itself. Lets GPU compute happen on data the host system can&#8217;t read &#8212; important because large model activations mostly live on the GPU.</p></li></ul><p>In the current open-source reference implementation of the Geometry of Trust protocol, enclaves are emulated by a <code>MockEnclave</code> component. That&#8217;s fine for development and testing &#8212; the logic of the protocol doesn&#8217;t change. But a mock enclave is exactly that: a mock. Production deployment requires real hardware &#8212; one of the three above, or whatever replaces them in the next hardware generation. Trusting a mock enclave in production is trusting the operator by another name.</p><div><hr></div><h2>The isolation &#8212; what&#8217;s inside vs outside</h2><p>The outer box is the operator&#8217;s environment &#8212; server, cloud instance, laptop, whatever the operator controls. The model runs there. The enclave is a smaller, hardware-isolated region inside the same machine. The arrows between them tell the rest of the story: the enclave reads activations and weights from the model, but the model can&#8217;t read anything back. Signed attestations leave the enclave through a narrow outbound interface. 
The signing key never does.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q4GY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q4GY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png 424w, https://substackcdn.com/image/fetch/$s_!q4GY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png 848w, https://substackcdn.com/image/fetch/$s_!q4GY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!q4GY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q4GY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:127106,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.techunfiltered.io/i/194443436?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!q4GY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png 424w, https://substackcdn.com/image/fetch/$s_!q4GY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png 848w, https://substackcdn.com/image/fetch/$s_!q4GY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!q4GY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5f698ae-da96-48b0-815a-e021a20a460e_1800x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The arrow from model to enclave is one-way &#8212; activations flow in, nothing flows back. The enclave observes the model&#8217;s activations as the model produces them during normal inference, takes a copy, and runs its own measurement process on that copy. The model never sees that the observation happened. It doesn&#8217;t change its behaviour. It doesn&#8217;t even know which inputs are being measured.</p><div><hr></div><h2>What the enclave does</h2><p>Four operations run inside the enclave, in order. Each step depends on the previous one having been isolated from the operator.</p><p><strong>1. Catch activations.</strong> The enclave takes a copy of the activations the model produces during normal inference. The copy happens inside the enclave so the operator can&#8217;t modify the activations in flight.</p><p><strong>2. Run probes.</strong> Probes (the value-direction vectors from the mathematics series) are applied to the activation copy. The probes themselves live inside the enclave, so the operator can&#8217;t swap them for biased versions.</p><p><strong>3. Run causal interventions.</strong> The nudge-up / nudge-down / consistency tests from the mathematics series run here. Because this is where the causal scores come from, these tests have to be unbiased &#8212; which means they have to be outside the operator&#8217;s reach.</p><p><strong>4. Assemble + sign.</strong> The enclave assembles the attestation (all the fields from the previous post) and signs it with a key that lives inside the enclave and never exits.</p><p><strong>What the model operator cannot do:</strong></p><ul><li><p>Cannot see the probes being applied &#8212; the probes live inside the enclave.</p></li><li><p>Cannot see the nudges happening during causal intervention &#8212; the interventions run on the enclave&#8217;s copy of the activations.</p></li><li><p>Cannot modify the measurement process &#8212; the code running inside the enclave is fixed and attested to.</p></li><li><p>Cannot read the signing key &#8212; the key is generated inside the enclave, used inside the enclave, and never leaves the enclave boundary. 
Not even the hardware manufacturer can extract it.</p></li></ul><div><hr></div><h2>The read asymmetry &#8212; the enclave sees the model, not the other way round</h2><p>A subtle property of the enclave boundary is worth stating explicitly, because it&#8217;s load-bearing for the whole measurement regime: the relationship between the model and the enclave is <strong>asymmetric</strong>. The enclave reads the model. The model cannot read the enclave.</p><p>This asymmetry has two layers &#8212; why it matters, and how the hardware enforces it.</p><h3>Why the asymmetry matters</h3><p>A model that could detect it was being observed is a model that could be trained to behave differently under observation. That&#8217;s not a hypothetical concern &#8212; if the signal &#8220;my activations are being probed right now&#8221; is present in any form the model can pick up on, training pressure over time can shape behaviour that looks different on probed inputs than on unprobed ones. The measurement regime would then be measuring the wrong thing: not the model&#8217;s values, but the model&#8217;s presentation-when-being-measured.</p><p>The read asymmetry closes that door. If the model has no access to any information about the enclave &#8212; can&#8217;t read its memory, can&#8217;t detect when probes run, can&#8217;t see whether causal interventions are happening &#8212; then there is nothing for training to shape against. The measurement happens on activations the model has already produced, on a copy the model can&#8217;t inspect, in a process the model can&#8217;t observe. From the model&#8217;s point of view, it just ran a forward pass. Measurement is invisible.</p><p><strong>A model that can see its probe is a model that can learn to game its probe. A model that cannot see its probe has nothing to game.</strong> The measurement regime only produces evidence of real values if the model can&#8217;t tell it&#8217;s being measured. Read asymmetry is what keeps that property intact.</p><h3>How the asymmetry is enforced</h3><p>The asymmetry isn&#8217;t a software convention. It&#8217;s the same hardware property that keeps the operator out of the enclave, applied in the other direction.</p><ul><li><p><strong>Memory encryption at the CPU.</strong> The enclave&#8217;s memory is encrypted by the memory controller. To the model&#8217;s process &#8212; which lives outside the enclave &#8212; enclave memory pages appear as ciphertext. There&#8217;s no decryption key available to the model&#8217;s side, so reads return noise.</p></li><li><p><strong>The enclave reads a copy, not the live tensor.</strong> The enclave takes a snapshot of the activations when they&#8217;re produced and does its measurement on that copy. The model&#8217;s own computation proceeds without any reference to the enclave&#8217;s work. The model has no pointer, no handle, no signal that the snapshot was taken.</p></li><li><p><strong>No callback path.</strong> The enclave produces output (signed attestations) that leaves through a narrow interface to the outside world. It does not produce any output that returns to the model. There&#8217;s no information channel back in the enclave-to-model direction, so even side-channel leakage has nowhere to land.</p></li><li><p><strong>Weight reads are verified, not interactive.</strong> When the enclave needs to confirm the model&#8217;s weights (for the Merkle root field in the attestation), it reads the weight shards and hashes them. 
The reads are one-shot and produce no observable effect on the model&#8217;s execution &#8212; the model isn&#8217;t involved in, and can&#8217;t detect, the verification.</p></li></ul><p><strong>The asymmetry in one line:</strong> activations and weights flow from the model to the enclave. Signed attestations flow from the enclave to the outside world. Nothing flows from the enclave back to the model. Ever.</p><p>That&#8217;s the asymmetry. It&#8217;s the same property that makes memory encryption useful for keeping secrets &#8212; just applied so the &#8220;secret&#8221; is the fact that measurement is happening at all.</p><p>This is also why &#8220;mock enclave&#8221; deployments can&#8217;t substitute for real hardware. A software-only mock can simulate the boundary logically, but it can&#8217;t produce the hardware-enforced invisibility. In a mock, the measurement process runs in the same address space as the model&#8217;s process. Under sufficiently adversarial conditions &#8212; especially during training &#8212; some signal can leak. Real TEE hardware provides an isolation boundary that remains tight even under those conditions.</p><div><hr></div><h2>Why the enclave is the foundation</h2><p>The dependency chain is worth laying out because it shows why this is the load-bearing piece.</p><pre><code><code>Without the enclave                What breaks
Signatures still verify            Signatures only prove who made them.
cryptographically.                 If the operator has the key, they
                                   can sign arbitrary numbers.

Merkle roots still identify        But the operator can feed that model
a specific model.                  whatever inputs they want during the
                                   measurement process and bias the
                                   activations.

The chain still links              If every attestation was assembled by
attestations in order.             the operator, the whole history is
                                   consistent fiction.

Causal scores still look like      If the operator ran the interventions,
evidence of realness.              they can tune the scores to whatever
                                   level they want.
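</code></code></pre><p>The first row of that table is easy to make concrete. A minimal Python sketch (illustrative only, using the <code>cryptography</code> package; this is not the Rust reference implementation): any party holding a signing key can produce a perfectly valid signature over numbers it simply invented.</p><pre><code>import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# A signing key held by the operator, outside any enclave.
operator_key = Ed25519PrivateKey.generate()

# Fabricated readings: no model was measured to produce these numbers.
fabricated = {"model_id": "med-advisor-v2.3.7", "probe_readings": {"honesty": 1.29}}
body = json.dumps(fabricated, sort_keys=True).encode()
signature = operator_key.sign(body)

# Verification succeeds: the signature genuinely came from that key.
operator_key.public_key().verify(signature, body)
print("signature verifies; the readings are still fiction")
</code></pre><p>Expected output:</p><pre><code><code>signature verifies; the readings are still fiction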
</code></code></pre><p><strong>The logical dependency.</strong> Signatures, Merkle roots, chains, and causal scores are only meaningful if the measurement process is actually isolated. The enclave is what provides that isolation. Every other guarantee in the protocol reduces to &#8220;trust the operator&#8221; without it. With it, the cryptographic guarantees become guarantees about something real.</p><p>This is why enclave-less deployments &#8212; or deployments using only a mock enclave &#8212; aren&#8217;t a slightly-weaker version of the protocol. They&#8217;re a fundamentally different thing. The protocol still runs, but the claims it enforces have different semantics. In a real-enclave deployment, an attestation is evidence. In a mock-enclave deployment, an attestation is testimony dressed up in cryptography.</p><div><hr></div><h2>What the enclave doesn&#8217;t do</h2><p>Clarifying the scope helps prevent the word &#8220;enclave&#8221; from being expected to do more than it actually does.</p><p><strong>The enclave doesn&#8217;t decide what to measure.</strong> The probe set and thresholds come from governance &#8212; not from the enclave. The enclave runs whatever probes governance has placed inside it.</p><p><strong>The enclave doesn&#8217;t prove the model is &#8220;good.&#8221;</strong> It only proves that the attestation is an honest report of what the probes read on this specific model. Whether what the probes read is acceptable is a governance question.</p><p><strong>The enclave doesn&#8217;t defend against all attacks.</strong> Side-channel attacks on TEEs are a real area of research. The enclave raises the cost of tampering dramatically, but it&#8217;s not unbreakable. Governance should factor that into threshold-setting and audit cadence.</p><p><strong>The enclave doesn&#8217;t replace governance.</strong> It&#8217;s a technical component. The people running the governance still decide what to enforce, what thresholds apply, and what to do when something looks wrong.</p><h3>An honest statement of residual trust</h3><ul><li><p>You have to trust the hardware manufacturer. Intel, AMD, NVIDIA &#8212; the security of the enclave depends on them not having shipped a backdoor.</p></li><li><p>You have to trust the enclave code. What runs inside is just code, and code has bugs. Audits and reproducible builds help but don&#8217;t eliminate this.</p></li><li><p>You have to trust that your threat model matches the enclave&#8217;s threat model. TEEs are strong against privileged software attackers; they&#8217;re weaker against physical attackers with unlimited time and a cryo-stripped chip.</p></li></ul><p>None of this makes the enclave worthless &#8212; it still shifts the trust root from &#8220;the operator of this specific AI&#8221; to &#8220;the ecosystem of hardware, code, and physical security,&#8221; which is a much healthier place to put it. But it&#8217;s important not to sell enclaves as magical. They&#8217;re a significantly-harder-to-compromise foundation. That&#8217;s already a lot.</p><div><hr></div><h2>The point</h2><p>The exchange protocol is the point where every other layer in the stack finally comes together. Measurement from the mathematics series. Structural boundaries from the governance series. Attestations and chains from the earlier protocol posts. 
All of it converges here, in a handshake that either succeeds cleanly or fails with an auditable reason.</p><p>Two agents who don&#8217;t know each other can reach a verified, governance-enforced agreement to cooperate. Or an auditable refusal not to. Those are the two outcomes, and they&#8217;re the outcomes governance actually needs.</p><p>The enclave is what takes the entire Geometry of Trust protocol from &#8220;self-report plus signatures&#8221; to &#8220;verifiable evidence.&#8221;</p><p>Signatures prove who signed. Merkle roots prove which model. Chains prove history. Causal scores prove mechanism realness. Every one of those guarantees is only as strong as the isolation of the process that produced them. The enclave provides that isolation.</p><div><hr></div><p><em>Links:</em></p><p><em>&#128196;<a href="https://zenodo.org/records/19238920"> Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19613900">Lecture Notes</a> <br>&#128187;<a href="https://github.com/jade-codes/got"> Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p>]]></content:encoded></item><item><title><![CDATA[Chains: How a Model’s History Gets Tied Down | Geometry of Trust | Protocol - Episode 2]]></title><description><![CDATA[This is the second post in the Geometry of Trust protocol series. This post covers what happens when you need more than a snapshot: a full history that can&#8217;t be quietly edited.]]></description><link>https://www.techunfiltered.io/p/chains-how-a-models-history-gets</link><guid isPermaLink="false">https://www.techunfiltered.io/p/chains-how-a-models-history-gets</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Sun, 19 Apr 2026 09:01:26 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/01b92d31-d370-4e1d-91c3-7f319b5db4d5_1076x606.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-kXuVIWIYJeE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;kXuVIWIYJeE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/kXuVIWIYJeE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>A snapshot isn&#8217;t a history</h2><p>The last post introduced attestations: signed snapshots of a model&#8217;s value geometry at a point in time. Each one is a proof that the enclave measured these readings on this model at this moment. Verifiable, tamper-evident, cryptographically bound to the enclave that produced it.</p><p>That&#8217;s enough if the question is &#8220;what does this model look like right now?&#8221; It isn&#8217;t enough if the question is &#8220;has this model been behaving itself?&#8221;</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>A single attestation says: <em>at 09:14 UTC, this model had these readings</em>. It says nothing about what the readings were yesterday, last week, or when the model was first deployed. And without history, governance loses most of what makes the measurements useful. You can&#8217;t detect drift without comparing to an earlier baseline. You can&#8217;t investigate an incident without seeing what the readings looked like in the run-up. You can&#8217;t audit behaviour without being able to walk backwards through what was claimed and when.</p><p>Governance needs the whole history, not just the latest snapshot. And the history has to be as tamper-evident as individual attestations are. A simple list of attestations isn&#8217;t enough &#8212; someone could delete an inconvenient entry, insert a fake one, or quietly edit an old reading. What&#8217;s needed is a way to bind attestations together so that any change to any one of them breaks verification of every attestation that came after.</p><div><hr></div><h2>Each attestation points to the previous</h2><p>The construction is simple. Every attestation, as part of the content it signs over, includes the hash of the previous attestation. The first attestation in the chain &#8212; the anchor &#8212; has no parent.</p><pre><code><code>Attestation #1:  hash = abc123,  parent = (none)
Attestation #2:  hash = def456,  parent = abc123
Attestation #3:  hash = ghi789,  parent = def456
Attestation #4:  hash = jkl012,  parent = ghi789
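</code></code></pre><p>For readers who want the mechanic in runnable form, here is a minimal Python sketch of the same construction. The field names and the JSON canonicalisation are simplifications, and signatures are left out so the chaining itself stays visible; the real entries are the enclave-signed attestations described in the previous post.</p><pre><code>import hashlib
import json

def attestation_hash(att):
    # The hash covers every field, including the parent pointer.
    return hashlib.sha256(json.dumps(att, sort_keys=True).encode()).hexdigest()

def extend(chain, readings):
    parent = attestation_hash(chain[-1]) if chain else None
    chain.append({"readings": readings, "parent": parent})

def verify(chain):
    # Walk the chain: each entry must point at the hash of the one before it.
    for i in range(1, len(chain)):
        if chain[i]["parent"] != attestation_hash(chain[i - 1]):
            return f"broken at #{i + 1}"
    return "intact"

chain = []
for reading in (1.31, 1.29, 1.30, 1.28):
    extend(chain, {"patient_safety": reading})

print(verify(chain))
chain[1]["readings"]["patient_safety"] = 9.99   # quietly edit attestation #2
print(verify(chain))
</code></pre><p>Expected output:</p><pre><code><code>intact
broken at #3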
</code></code></pre><p>The hash of each attestation is computed over all of its content &#8212; model ID, timestamp, geometry hash, readings, causal scores, Merkle root, and crucially the parent hash. That&#8217;s then signed by the enclave. So the parent hash is <em>inside the signature</em>, and any change to any earlier attestation in the chain ripples forward: change the content of #2, and its hash is no longer <code>def456</code>. The parent pointer signed into #3 still says <code>def456</code>, so it no longer matches #2&#8217;s new hash, and chain verification fails at #3. (#3&#8217;s own signature still verifies; what breaks is the link it commits to.)</p><div><hr></div><h2>What the chain prevents</h2><p>Three specific attacks the chain blocks, and it&#8217;s worth being precise about each.</p><p><strong>Delete an attestation.</strong> Remove #2 from the chain. #3&#8217;s signed parent pointer still says <code>def456</code>, but no remaining attestation has that hash: #1&#8217;s hash is <code>abc123</code>. A verifier walking the chain sees the gap immediately, because #3 points at an attestation that isn&#8217;t there, and the timestamps and sequence don&#8217;t line up.</p><p><strong>Insert a fake attestation.</strong> Slip a forged #2.5 between #2 and #3. The fake would need #2&#8217;s hash as its parent (fine, that&#8217;s public) and would need to hash to whatever #3 declares as its parent. Producing content that hashes to a specific target value is a second-preimage attack on SHA-256 &#8212; infeasible by design. The fake can&#8217;t fit.</p><p><strong>Change an old attestation.</strong> Quietly edit #1 after #2 has already been issued. Changing #1&#8217;s content changes its hash from <code>abc123</code> to something else. But #2&#8217;s signed content still contains <code>parent = abc123</code>, which no longer matches. #2 is now orphaned, #3 is orphaned through it, and the entire chain after #1 breaks.</p><p>When a verifier walks the chain and finds that a parent hash doesn&#8217;t match, or a signature doesn&#8217;t verify, the chain is rejected. The verifier can see exactly where the break happened and can distinguish &#8220;malicious tampering&#8221; from &#8220;legitimate gap in history I don&#8217;t have access to.&#8221;</p><p><strong>A broken chain isn&#8217;t just an error &#8212; it&#8217;s evidence.</strong> A regulator seeing a broken chain knows something happened and can investigate. This is why the chain is stronger than a database of attestations. A database can be quietly edited; a chain tells you when it has been.</p><div><hr></div><h2>Walking the chain backwards</h2><p>Once you have a verified chain, you have a timeline. Reading it backwards turns the sequence of attestations into an investigative tool.</p><p>A concrete case. A clinical deployment triggers a drift alert. The governance system looks at the chain:</p><pre><code><code>#3  2026-04-15  11:42 UTC  patient_safety = 0.91  &#8594; DEVIATED
#2  2026-04-15  09:14 UTC  patient_safety = 1.29  &#8594; NORMAL
#1  2026-04-14  08:00 UTC  patient_safety = 1.31  &#8594; BASELINE
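</code></code></pre><p>The walk itself is mechanical enough to sketch in a few lines of Python (the entries mirror the chain above; in a real deployment they would come from verified attestations rather than a hard-coded list):</p><pre><code># Walk a verified chain backwards from a drift alert to find the window in
# which the readings changed. Entries mirror the illustration above.
entries = [
    ("#1", "2026-04-14T08:00Z", 1.31, "BASELINE"),
    ("#2", "2026-04-15T09:14Z", 1.29, "NORMAL"),
    ("#3", "2026-04-15T11:42Z", 0.91, "DEVIATED"),
]

alert = entries[-1]
last_normal = next(e for e in reversed(entries) if e[3] != "DEVIATED")
print(f"drift window: {last_normal[1]} to {alert[1]}")
</code></pre><p>Expected output:</p><pre><code><code>drift window: 2026-04-15T09:14Z to 2026-04-15T11:42Z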
</code></code></pre><p>By walking backwards from the alert (#3), the investigator can immediately locate the transition: the readings were normal through #2 and deviated by #3. The drift happened between 09:14 and 11:42 on the 15th. That&#8217;s a roughly two-and-a-half-hour window to investigate: what changed, what inputs came in, what updates were applied.</p><p>The chain turns into:</p><ul><li><p><strong>An audit trail.</strong> Every claim the agent ever made about its own value geometry is signed, linked, and timestamped.</p></li><li><p><strong>An incident timeline.</strong> When something goes wrong, the chain tells you when it started going wrong.</p></li><li><p><strong>A compliance record.</strong> Continuous evidence that the agent met its thresholds across the whole deployment, not just at point-in-time checks.</p></li><li><p><strong>A history that can be investigated at leisure.</strong> The chain persists, so regulators coming in months after the fact can still reconstruct what was happening.</p></li></ul><p>None of this requires the chain to be public, or synchronised across the world, or posted to any central registry. It just has to exist, be signed, and be available when someone with authority asks to see it.</p><div><hr></div><h2>The blockchain comparison</h2><p>The construction above &#8212; hash-linked records, tamper-evident by the chaining itself &#8212; is the same basic idea that underlies blockchains. It&#8217;s worth being precise about what&#8217;s borrowed and what isn&#8217;t, because &#8220;blockchain&#8221; is a word that arrives with a lot of attached baggage and most of it isn&#8217;t relevant here.</p><pre><code><code>Element                In a public blockchain       In this protocol
Hash-linked records    Yes                          Yes &#8212; same mechanic
Tamper-evidence        Yes, via the linking         Yes &#8212; same property
Mining / proof-of-work Yes, to order blocks         None. Not needed.
Tokens / currency      Yes, incentivises miners     None. No incentives.
Global consensus       Yes &#8212; defining property      None. Per-agent chain.
Public, global ledger  Yes, by design               No. Local; shared on demand.
Energy cost            Often significant            Negligible &#8212; SHA-256 + Ed25519
</code></code></pre><p>The right way to describe what this protocol uses: <strong>a hash-linked chain of signed attestations, maintained per-agent, verified when demanded.</strong> It borrows the tamper-evidence property from the blockchain world &#8212; and nothing else.</p><p>Everything blockchain solves by being expensive and global, this chain doesn&#8217;t need to solve. The signing key is already anchored in an enclave (more on that later in the series). The governance layer already decides who counts as an authoritative verifier. There&#8217;s no adversarial network of unknown validators to convince. The chain is a much simpler thing doing a much narrower job.</p><p>This matters for governance because the word &#8220;blockchain&#8221; usually brings up concerns about cost, scalability, and complexity. Those concerns don&#8217;t apply here. The chain costs essentially nothing to maintain &#8212; one SHA-256 per attestation and one Ed25519 signature per attestation, both of which are already being done for the attestation itself. The parent pointer is just another field that gets signed over. The chain is as cheap as the attestations are.</p><div><hr></div><h2>How the chain works in practice</h2><p><strong>Who keeps the chain.</strong> The agent keeps its own chain. Each new attestation extends the chain the agent has been maintaining since it was first deployed. A regulator, auditor, or peer agent doesn&#8217;t need to hold the chain themselves &#8212; they just need to be able to request it (or a relevant slice of it) when they need to verify something.</p><p><strong>How far back does it go.</strong> The anchor &#8212; the first attestation in the chain &#8212; is typically set at deployment. A clinical advisor&#8217;s chain starts when it goes live in the hospital. An agricultural agent&#8217;s chain starts when it&#8217;s first configured for the cooperative. Before that point, the model was in development and a different set of governance rules applied.</p><p>Re-anchoring happens when something significant changes &#8212; a major model update, a change of primary domain (with recertification, as covered in the governance series), a change of governance regime. The old chain doesn&#8217;t disappear; it just ends at a known point, and a new chain starts from a new anchor. The transition between chains is itself auditable.</p><p><strong>What happens when a chain breaks.</strong> A broken chain isn&#8217;t a catastrophic failure. It&#8217;s a finding. The protocol surfaces it; governance decides what to do about it. Possible responses range from mild (investigate, log, re-anchor with an audit note) to severe (suspend the agent, require recertification, revoke its trust registry entry). The protocol doesn&#8217;t prescribe which response is appropriate &#8212; that&#8217;s governance&#8217;s call. What the protocol guarantees is that the break is detectable and that the detection itself is cryptographically sound.</p><div><hr></div><h2>The point</h2><p>A single attestation is a claim about a moment. A chain is a claim about a lifetime. Governance needs the lifetime.</p><p>The mechanic is simple &#8212; each attestation points to the previous one, and the parent pointer is signed into the attestation. But the property that falls out of that simplicity is exactly what governance needs: a history that can be audited, that can be walked backwards, and that can&#8217;t be quietly edited without someone noticing.</p><p>No mining. No tokens. No global ledger. 
Just hash-linked signed attestations, doing a narrow job well.</p><div><hr></div><p><em>Links:</em></p><p><em>&#128196;<a href="https://zenodo.org/records/19238920"> Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19613902">Lecture Notes</a> <br>&#128187;<a href="https://github.com/jade-codes/got"> Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Testimony vs Evidence: Why We Need a Protocol, and What an Attestation Actually Is | Geometry of Trust | Protocol - Lesson 1]]></title><description><![CDATA[This is the first post in the Geometry of Trust protocol series. This post is where those three strands mathematics, philosophy and governance converge into something portable and verifiable.]]></description><link>https://www.techunfiltered.io/p/testimony-vs-evidence-why-we-need</link><guid isPermaLink="false">https://www.techunfiltered.io/p/testimony-vs-evidence-why-we-need</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Sat, 18 Apr 2026 21:00:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f9b201f8-d8aa-4f5a-ba9a-eeaccac80fc9_1376x769.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-1YYS2TSRjpg" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;1YYS2TSRjpg&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/1YYS2TSRjpg?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>What we have, and what&#8217;s missing</h2><p>We have a ruler &#8212; the causal Gram matrix &#934; &#8212; that measures value geometry. We have probes that read live activations and produce scalar readings for specific values. We have causal checks that confirm those readings correspond to real computational mechanisms rather than surface correlations. 
We have drift detection that watches readings over time and catches meaningful changes.</p><p>Every one of those tools has the same implicit setting: <strong>one agent measuring itself</strong>.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Which is real and useful &#8212; a developer can run these tools in a lab and learn a lot about a model. But it&#8217;s a closed loop. The agent is the one producing the measurements and the one interpreting them.</p><p>That&#8217;s fine for internal analysis. It isn&#8217;t enough for the real problem.</p><p><strong>How does another agent trust those measurements?</strong> If a medical-advisor agent tells a drug-checker &#8220;my honesty reading is 1.29,&#8221; the drug-checker has no way to tell whether that number was actually produced by the advisor&#8217;s current model or typed in arbitrarily. The claim and the evidence look the same from the outside.</p><p><strong>How does a regulator verify them?</strong> A clinical regulator auditing a deployed model needs to check that the measurements reported match the model actually running. Self-reports don&#8217;t give them anything to check against. &#8220;We measured ourselves and everything&#8217;s fine&#8221; isn&#8217;t the shape of an audit.</p><p><strong>How do two agents decide to cooperate?</strong> The governance series argued that cross-domain interactions should happen when both sides&#8217; configurations permit it and both sides&#8217; measurements meet each other&#8217;s thresholds. That logic assumes the measurements are available to be examined by the peer. A purely local measurement can&#8217;t be examined by anyone else.</p><div><hr></div><h2>What a protocol has to provide</h2><p>For the measurement tools to do their governance job, they have to stop being private. They have to become portable, verifiable artefacts that can be exchanged between parties that don&#8217;t necessarily trust each other.</p><p>That&#8217;s the shift from &#8220;a set of measurement tools&#8221; to &#8220;a protocol.&#8221; A protocol is a set of rules about how measurements get packaged, exchanged, and verified. 
Five specific capabilities have to be in it:</p><ul><li><p><strong>Package measurements into a proof.</strong> A single structured artefact that binds together the model identity, the probes used, the readings produced, and the time they were taken.</p></li><li><p><strong>Sign the proof so it can&#8217;t be faked.</strong> A cryptographic signature tying the artefact to a specific signing key.</p></li><li><p><strong>Chain proofs so history can&#8217;t be rewritten.</strong> Each new attestation links to the previous one so that tampering with earlier readings breaks later signatures.</p></li><li><p><strong>Exchange proofs between agents.</strong> A way for one agent to hand its attestation to another during an interaction, and for a regulator to demand one on audit.</p></li><li><p><strong>Verify the other agent&#8217;s proof independently.</strong> The receiving party doesn&#8217;t have to take anyone&#8217;s word for anything.</p></li></ul><p>This post covers the first two. The rest come in later posts.</p><div><hr></div><h2>The attestation &#8212; a signed snapshot</h2><p>The unit of proof in this protocol is called an <strong>attestation</strong>. It&#8217;s a structured artefact that packages everything a verifier needs to know about one measurement event, and it&#8217;s signed so the verifier can trust the artefact came from the claimed source.</p><p>An attestation carries six fields:</p><pre><code><code>model_id       Which model was measured. A stable identifier
               for the specific weights &#8212; not the family, the
               exact version.

timestamp      When the measurement was taken. Anchors the
               attestation in time; enables freshness checks.

geometry_hash  Which ruler was used. SHA-256 of &#934; so the
               verifier can confirm the measurement was taken
               against the ruler they expected.

probe_readings The scalar readings: honesty = 1.29,
               courage = 1.44, and so on, for whichever
               probe set the domain requires.

causal_scores  Per-probe causal consistency: honesty = 0.82
               (real), courage = 0.79 (real). Tells the
               verifier these aren't surface correlations.

merkle_root    Hash of the model weight shards. Lets a
               verifier confirm, without downloading every
               weight, that the model being attested over
               is the model they think it is.
</code></code></pre><p>Notice what&#8217;s <strong>not</strong> in there: any raw activations, any training data, any internal prompts. The attestation is a summary of measurements, not a dump of internal state. Privacy and efficiency both come from this &#8212; the attestation is small, portable, and carries nothing the verifier doesn&#8217;t actually need.</p><h3>A concrete example</h3><p>An attestation for a clinical advisor at 09:14 UTC might look like this (simplified for readability):</p><pre><code><code>{
  "model_id":      "med-advisor-v2.3.7",
  "timestamp":     "2026-04-15T09:14:22Z",
  "geometry_hash": "sha256:3f8a...c12e",
  "probe_readings": {
    "patient_safety":    1.29,
    "evidence_quality":  1.44,
    "confidentiality":   1.18
  },
  "causal_scores": {
    "patient_safety":    0.82,
    "evidence_quality":  0.79,
    "confidentiality":   0.85
  },
  "merkle_root":   "sha256:9b2f...a041",
  "signature":     "ed25519:7e41...d0b8"
}
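</code></code></pre><p>To make the role of that final field concrete, here is a minimal Python sketch of signing and verifying an attestation body with Ed25519, using the <code>cryptography</code> package. Fields are abbreviated, and the key is generated inline purely for illustration; in the real protocol the private key is generated and used inside an enclave, as the next sections explain, and none of this is the Rust reference implementation.</p><pre><code>import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

attestation = {
    "model_id": "med-advisor-v2.3.7",
    "timestamp": "2026-04-15T09:14:22Z",
    "geometry_hash": "sha256:3f8a...c12e",
    "probe_readings": {"patient_safety": 1.29, "evidence_quality": 1.44},
    "merkle_root": "sha256:9b2f...a041",
}

# Canonical bytes: every field except the signature itself, in a fixed order.
body = json.dumps(attestation, sort_keys=True).encode()

enclave_key = Ed25519PrivateKey.generate()          # lives inside the enclave
attestation["signature"] = enclave_key.sign(body).hex()

# A verifier holds only the enclave's public key.
public_key = enclave_key.public_key()
public_key.verify(bytes.fromhex(attestation["signature"]), body)
print("signature: valid")

# Any change to the signed content breaks verification.
tampered = dict(attestation, probe_readings={"patient_safety": 9.99})
tampered_body = json.dumps(
    {k: v for k, v in tampered.items() if k != "signature"}, sort_keys=True
).encode()
try:
    public_key.verify(bytes.fromhex(attestation["signature"]), tampered_body)
except InvalidSignature:
    print("tampered attestation: rejected")
</code></pre><p>Expected output:</p><pre><code><code>signature: valid
tampered attestation: rejected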
</code></code></pre><p>A verifier looking at this can tell: which exact model was measured (model_id + merkle_root), with which ruler (geometry_hash), at what time (timestamp), and with what results (readings + causal scores). The signature at the end ties the whole bundle to a specific signing identity.</p><div><hr></div><h2>The signing key lives in an enclave</h2><p>Attestations are signed with Ed25519, a modern asymmetric signature scheme chosen for speed, small signatures, and well-understood security properties. The mechanics of Ed25519 aren&#8217;t the interesting part for governance purposes &#8212; any reasonable modern scheme would work. The interesting part is <strong>where the signing key lives</strong>.</p><p>The private signing key lives inside an enclave. It never comes out. The signature it produces can only have been made by the enclave, because nothing else has the key. That&#8217;s what makes the attestation a proof rather than a claim.</p><p>An enclave, for the purposes of this post, is a restricted execution environment where code and keys run in isolation from the surrounding system. A full post on enclaves comes later in this series. For now, the relevant fact is operational: the signing key is generated inside the enclave, used inside the enclave, and never exported. The enclave&#8217;s hardware and operating system enforce this.</p><p>The consequence is what we need. When an attestation arrives with a valid Ed25519 signature from the enclave&#8217;s public key, the verifier knows two things:</p><ul><li><p>The attestation was produced by the enclave &#8212; no other party could have made that signature.</p></li><li><p>The attestation hasn&#8217;t been tampered with in transit &#8212; any change would break the signature.</p></li></ul><p><strong>Self-report is testimony. Attestation is evidence.</strong> The whole point of this protocol is to move the measurement regime from the first kind of thing to the second. A self-reported &#8220;my honesty is 1.29&#8221; claim can be trivially faked by any process with the ability to send a message. An enclave-signed attestation carrying &#8220;my honesty is 1.29&#8221; can only have been produced by the specific enclave whose public key is on record.</p><div><hr></div><h2>How verification works</h2><p>The verifier &#8212; another agent, a regulator, an auditor &#8212; takes the attestation and goes through a specific sequence of checks.</p><p><strong>Step 1. Check the signature.</strong> Standard Ed25519 verification: take the attestation body, apply the signature algorithm with the enclave&#8217;s public key, and confirm the signature matches. If it doesn&#8217;t, stop here &#8212; the attestation is either forged or corrupted.</p><p><strong>Step 2. Confirm the model and ruler.</strong> Check <code>model_id</code> against whichever model the verifier expected to be dealing with. Check <code>merkle_root</code> against a known weight hash for that model. Check <code>geometry_hash</code> against the expected &#934;. Any mismatch means the attestation is about a different object than the one the verifier meant to verify.</p><p><strong>Step 3. Determinism check.</strong> This is the part that makes the protocol work. The measurement process is deterministic. 
Same model, same ruler, same input, same RNG seed &#8212; same result, bitwise identical.</p><p>If the verifier has access to the same model weights (via the <code>merkle_root</code>), the same ruler (via the <code>geometry_hash</code>), and the same input activations, they can re-run the measurement themselves. If their result matches the attestation&#8217;s readings bit-for-bit, the attestation is genuine. If it doesn&#8217;t, something is wrong. Either way, the verifier didn&#8217;t have to trust the claimed readings. They checked.</p><p>This is the key move. Attestations aren&#8217;t trust-me assertions dressed up in cryptography. They&#8217;re commitments to a specific measurement outcome that the verifier can reproduce. The signature binds the attester to that outcome; the determinism lets the verifier check it.</p><h3>When full re-verification isn&#8217;t possible</h3><p>A regulator with full access to the model, ruler, and inputs can do the bitwise check. A peer agent in the middle of a live exchange usually can&#8217;t &#8212; they don&#8217;t have the other party&#8217;s weights, and replicating the input activation would mean disclosing something the attester may not want to disclose.</p><p>So in practice the determinism check is done selectively: spot-checks during audits, automated re-runs during certification, attestation-to-attestation comparison during ordinary exchanges.</p><p>Crucially, even when the verifier doesn&#8217;t redo the whole measurement, the option of doing it later remains. The attestation is a commitment. A regulator coming in six months after a suspicious interaction can still pull the attestation, pull the model, pull the ruler, and verify &#8212; because the enclave-signed artefact is a permanent record of what was claimed.</p><div><hr></div><h2>What the protocol deliberately doesn&#8217;t do</h2><p>A few clarifications are worth flagging, because the word &#8220;protocol&#8221; tends to carry more weight than it should.</p><p><strong>The protocol doesn&#8217;t decide what to measure.</strong> The probes, the probe set, the ruler &#8212; all of that is governance&#8217;s job, as the governance series established. The protocol carries whatever measurements governance picks.</p><p><strong>The protocol doesn&#8217;t decide what counts as acceptable.</strong> Drift thresholds, causal-score minimums, probe requirements &#8212; all of that is governance&#8217;s job too. The protocol just lets the verifier hold the attester to whatever thresholds have been set.</p><p><strong>The protocol doesn&#8217;t replace the enclave.</strong> The enclave is what makes the signing key trustable. Without a proper enclave, the signing key is just another file on a disk and the whole chain of trust falls over. That&#8217;s a later post.</p><p><strong>The protocol doesn&#8217;t handle everything in one shot.</strong> This post covers packaging and signing. Chaining, exchange, and the rest come later in the series. One attestation is a point-in-time snapshot. Governance needs the whole history &#8212; which is why chaining matters, and is where we&#8217;ll go next.</p><div><hr></div><h2>The point</h2><p>The measurement tools in the mathematics series produce numbers. The protocol turns those numbers into evidence &#8212; portable, signed, verifiable artefacts that can be exchanged between parties that don&#8217;t trust each other and checked without having to trust them either.</p><p>That&#8217;s the whole job. It&#8217;s a narrow one. 
It&#8217;s also load-bearing &#8212; without it, every governance claim about AI safety collapses back into testimony.</p><div><hr></div><p><em>&#128196;<a href="https://zenodo.org/records/19238920"> Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19613817">Lecture Notes</a> <br>&#128187;<a href="https://github.com/jade-codes/got"> Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Ruler Measures, Governance Decides | Geometry of Trust | Governance - Lesson 4]]></title><description><![CDATA[This is the fourth post in the Geometry of Trust governance series. This post asks the question the previous three have been circling: who decides?]]></description><link>https://www.techunfiltered.io/p/the-ruler-measures-governance-decides</link><guid isPermaLink="false">https://www.techunfiltered.io/p/the-ruler-measures-governance-decides</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Sat, 18 Apr 2026 17:01:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/BIo4HZk-I1Y" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-BIo4HZk-I1Y" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;BIo4HZk-I1Y&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/BIo4HZk-I1Y?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>The question that keeps surfacing</h2><p>The pieces of the framework are on the table. Safety doesn&#8217;t travel between domains. Every agent declares one primary domain. Cross-domain interactions run three structural checks before any cryptography. Per-domain thresholds decide how strictly the evidence gets held once the structural checks pass.</p><p>Each of those pieces has left one question hanging: who decides?</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Who decides which values to probe for? Who decides what the thresholds should be? Who decides what happens when drift is detected? Who maintains the registry that lists the agents and their configurations in the first place?</p><p>The framework doesn&#8217;t answer these questions. That&#8217;s not a limitation. That&#8217;s the design. The protocol is deliberately the narrow part of the stack, and the governance decisions sit on top of it &#8212; made by people and institutions with domain expertise, legitimacy, and accountability. The protocol&#8217;s job is to make those governance decisions <em>enforceable</em>. Governance&#8217;s job is to decide what should be enforced.</p><p><strong>The ruler measures. Governance decides.</strong></p><div><hr></div><h2>Why the protocol is decentralised</h2><p>Before talking about what the protocol provides and what governance decides, it&#8217;s worth being explicit about why the protocol is built to be decentralised in the first place.</p><p>A centralised protocol &#8212; one authority deciding which values get probed, which thresholds apply, and who gets to audit whom &#8212; would solve none of the problems this framework is trying to solve. It would concentrate exactly the value judgements that governance is meant to distribute.</p><p>The whole point of a decentralised protocol is that it lets the people affected by deployed AI decide what matters to them, in their own context, with their own accountability structures. A clinical community decides what patient safety means for their practice. A farming cooperative decides what responsible agricultural AI looks like on their fields. A municipal authority decides how AI serves its residents. Different communities will land in different places, and that&#8217;s not a failure of the protocol &#8212; it&#8217;s the protocol working as designed.</p><p><strong>A note on the examples.</strong> Every specific arrangement in this post &#8212; &#8220;a hospital maintains its clinical registry,&#8221; &#8220;a regulator audits financial agents,&#8221; &#8220;a cooperative maintains agricultural configs&#8221; &#8212; is illustrative. It&#8217;s one possible arrangement, not the only one. In practice, who holds these roles will depend on the jurisdiction, the sector, and the local political and institutional context. The examples are here to make the framework concrete, not to prescribe which institutions should have which powers.</p><p>The protocol is deliberately silent on those choices because its legitimacy depends on its silence. It provides the measurement and enforcement substrate. Who uses it, and how, is for the communities using it to decide.</p><p>When the rest of this post says &#8220;governance,&#8221; read that as &#8220;whoever the community affected by this deployment has chosen to decide.&#8221; Sometimes that&#8217;s a regulator. Sometimes it&#8217;s an accreditation body. Sometimes it&#8217;s a cooperative agreement among peers. Sometimes it&#8217;s a democratic process. 
The protocol doesn&#8217;t pick between these &#8212; it works under all of them.</p><h3>Layered standards: country floor, community additions</h3><p>A point that&#8217;s easy to miss: decentralisation doesn&#8217;t mean fragmentation. The tier system (drift bounds, causal validation requirements) and the domain system (exclusions, permissions, modes) are both structured so that a higher-level authority can set a floor, and lower-level communities can add stricter constraints on top.</p><p>Concretely: a country&#8217;s health regulator can define the baseline drift bound and minimum probe set that all clinical AI in that jurisdiction must meet. Individual hospital networks can then require tighter bounds or additional probes for their own deployments, without the country&#8217;s baseline having to know or care about those additions. The per-peer threshold lookup resolves the same way either way &#8212; most-specific-match wins, so hospital-level rules apply when the hospital is the peer, country-level rules apply when the country&#8217;s regulator is the peer. No renegotiation of the substrate is needed.</p><p>How this works in the protocol:</p><ul><li><p>A country-level authority publishes a baseline configuration: minimum probe set, maximum drift bound, mandatory interaction modes for high-stakes domains.</p></li><li><p>A regional authority inherits the baseline and can tighten &#8212; narrower drift, larger probe set, stricter exclusions.</p></li><li><p>An institution inherits the regional baseline and can tighten further for its own deployments.</p></li><li><p>A specific peer in a specific interaction may tighten still further.</p></li></ul><p>The mechanics are the same at every level: pattern match, most-specific wins. Nothing new has to be added to the protocol to support layering &#8212; the layering falls out of how the existing rules compose. This lets countries agree on common ground (what every clinical AI in the jurisdiction must do) while leaving room for communities, institutions, and individual deployments to go further based on their own context.</p><p>The alternative &#8212; a protocol that forces a single global standard &#8212; either lands on whatever the most permissive jurisdiction will accept (and fails to protect the stricter communities) or lands on whatever the strictest jurisdiction will accept (and prevents deployment anywhere else). Neither outcome is good. Layered standards let a sensible middle happen: broad agreement on the floor, diverse choice above it.</p><div><hr></div><h2>What the protocol provides</h2><p>The protocol&#8217;s contribution is three narrow categories of thing. None of them is a value judgement. All of them exist to let value judgements be enforced.</p><p><strong>The measurement tool.</strong> The causal Gram matrix &#934;, the probes that read value directions, the drift detection that watches those readings over time, the causal intervention that verifies the probes are measuring real computational mechanisms rather than surface correlations.</p><p><strong>The enforcement mechanism.</strong> Signed attestations that carry probe readings with cryptographic integrity. Chains that let attestations be verified back to a known root. The exchange protocol that lets peers hold each other to per-peer thresholds.</p><p><strong>The domain boundaries.</strong> Primary domain declaration. Exclusion patterns as hard vetoes. Permission patterns as bidirectional allow-lists. 
Interaction modes &#8212; cooperative, advisory, read-only, supervised.</p><p>None of this says what the right answer is for any specific domain. It gives you the ability to express answers precisely and enforce them automatically. That&#8217;s the whole intended scope.</p><div><hr></div><h2>What governance decides</h2><p>Sitting on top of the protocol are five decision classes that the framework can&#8217;t make and doesn&#8217;t try to. Each is a genuine governance question. Each needs people with the right authority and the right knowledge to answer it.</p><pre><code><code>Which values to probe for
  &#8594; Patient safety vs clinical evidence vs fairness vs confidentiality.
    Choosing the probe set is choosing what counts as "values."

What thresholds per domain
  &#8594; How much drift is acceptable. How much confidence is required.
    Whether causal validation is mandatory.

Who audits
  &#8594; Who has authority to inspect, demand supervised-mode interactions,
    or ask for re-certification. A question about legitimacy.

What happens when drift is detected
  &#8594; Alert, investigation, suspension, forced retraining, deployment
    rollback. The protocol surfaces the drift; policy decides the response.

When to re-certify
  &#8594; After a model update. After detected drift. On a fixed schedule.
    Trade-off between fresh evidence and operational cost.
</code></code></pre><div><hr></div><h2>Who maintains the registry</h2><p>Part 3 introduced the trust registry &#8212; the TOML file that declares each agent&#8217;s primary domain, permissions, exclusions, and per-peer thresholds. A single global registry would be the wrong design. A registry encodes governance choices, and governance is domain-specific. The registry should be domain-specific too.</p><p>The arrangements below are illustrative examples, not a prescription. In practice, who maintains a registry will depend on who has legitimacy to speak for that domain, which varies enormously across sectors, jurisdictions, and communities.</p><pre><code><code>A hospital                 Clinical agents: diagnostic advisors,
                           drug-interaction checkers, triage, imaging.

A financial regulator      Trading, compliance, market surveillance.

A farming cooperative      Crop management, weather advisory, supply
                           chain, equipment diagnostics.

A city                     Traffic, utilities, emergency dispatch,
                           permit processing.
</code></code></pre><p><strong>What&#8217;s shared, what&#8217;s not.</strong> The protocol is shared. Every registry uses the same attestation format, the same chain semantics, the same exchange checks. The registry contents are not shared &#8212; a hospital&#8217;s clinical registry and a financial regulator&#8217;s trading registry declare completely different agents, with completely different thresholds, for completely different domains. Cross-registry interactions happen through the same exchange protocol: a hospital agent talking to a pharmaceutical supplier&#8217;s agent works because both sides use the same protocol, but each side&#8217;s registry is maintained by its own authority.</p><p>This is the federated part: shared substrate, sovereign policy.</p><div><hr></div><h2>The open questions</h2><p>Three questions surface every time the framework meets an actual deployment context. The framework can&#8217;t close them. But being clear about where they live is part of being honest about what the framework does and doesn&#8217;t do.</p><p><strong>Who decides what to probe for?</strong> The probe set is a choice about what counts as &#8220;values&#8221; for a deployed agent. For a clinical agent: patient safety? Diagnostic accuracy? Evidence-handling quality? Fairness across demographic groups? Confidentiality? All of the above? Some weighted combination? Every choice of probe set is a value judgement about what matters. The framework can&#8217;t make that judgement for a domain. What it can do is make sure that once the judgement is made, it&#8217;s measurable and enforceable.</p><p><em>The answer: the governance body for that domain &#8212; the clinical regulator, the financial regulator, the standards body &#8212; working with domain experts, operators, and affected stakeholders. The probe set is part of what governance decides. The framework reads what it&#8217;s pointed at.</em></p><p><strong>Who decides the target geometry?</strong> Even within one domain, different communities may want different targets. One healthcare system may prioritise strict evidence-based reasoning, another may weight patient autonomy more heavily, another may be more willing to engage with first-person experiential reports. All three are defensible positions on clinical values. They produce measurably different geometries.</p><p>The framework isn&#8217;t neutral about measurement &#8212; it measures precisely. It <em>is</em> neutral about targets. Two deployments can measure the same probe set, arrive at different geometries, and both be internally consistent and well-calibrated. Which one is the &#8220;right&#8221; one depends on whose values are being encoded.</p><p><em>The answer: the framework doesn&#8217;t pick a target. Different communities may want different targets and that&#8217;s legitimate. The framework&#8217;s role is to measure what&#8217;s there and let each deployment compare it to whatever target that deployment has chosen.</em></p><p><strong>Who calibrates the probes?</strong> Probes are trained on labelled data. Labels say &#8220;this activation pattern corresponds to the model expressing honesty&#8221; or &#8220;this activation pattern corresponds to the model expressing patient safety reasoning.&#8221; The labels have to come from somewhere &#8212; they are themselves value judgements, made by humans.</p><p>Which humans? A corpus labelled entirely by one cultural or institutional context will produce probes that read that context&#8217;s values. 
A corpus labelled across multiple contexts &#8212; different languages, different clinical traditions, different regulatory regimes &#8212; produces probes that reflect that wider range.</p><p><em>The answer: probe calibration is itself a cultural artefact and deserves to be treated as such. A federated corpus with diverse contributions &#8212; multiple labelling traditions, transparent provenance, version-controlled labelling conventions &#8212; is the defensible way to calibrate probes that will be held up as evidence across communities. The framework supports this by making the calibration corpus part of the attestation&#8217;s provenance chain. What it can&#8217;t do is guarantee the corpus was diverse enough. That&#8217;s a governance question too.</em></p><p><strong>What these open questions have in common.</strong> Each is genuinely contested. Each is a question about whose values get encoded and whose don&#8217;t. Each has to be answered by governance bodies with legitimacy and accountability &#8212; not by a framework author. The framework&#8217;s contribution is to make these questions <em>explicit and answerable</em>, not to pretend they don&#8217;t exist. Pretending they&#8217;re technical questions is how you get frameworks that smuggle one community&#8217;s values in under the banner of objectivity.</p><div><hr></div><h2>Why this division works</h2><p>Some technical work tries to absorb governance questions into the technology. That approach is tempting because it promises to deliver &#8220;solved&#8221; safety or &#8220;solved&#8221; alignment without having to build the slow, human, political machinery that governance actually requires.</p><p>The trouble is that questions about what matters, whose values count, how much risk is acceptable, and who has authority to enforce &#8212; these are not technical questions in any meaningful sense. Pretending they are is a category error. It hides real value judgements behind mathematical formalism and produces systems whose answers look objective but whose inputs were never examined.</p><p>The opposite approach &#8212; leaving everything to informal governance without any measurement substrate &#8212; has the opposite problem. Governance decisions become unenforceable because there&#8217;s nothing to hold a deployed AI to. &#8220;You said it would be safe&#8221; is an accusation. &#8220;Your attestations show drift past your regulator&#8217;s threshold&#8221; is a finding.</p><p><strong>The productive division.</strong> The protocol provides enforceability: precise measurement, cryptographic integrity, structural boundaries, audit trails. Governance provides legitimacy: domain expertise, democratic accountability, cultural context, the authority to decide what should be enforced. Each makes the other work.</p><p>Enforceability without legitimacy is technocratic overreach. Legitimacy without enforceability is rhetoric. The framework insists on the division because collapsing it &#8212; in either direction &#8212; produces bad outcomes for the people affected by deployed AI.</p><div><hr></div><h2>The point</h2><p>The framework is deliberately narrow. That narrowness is the point. It does the work that can be done by measurement and cryptography &#8212; and it refuses to do the work that belongs to governance. The measurements produce findings. 
The people and institutions with the right authority decide what to do about the findings.</p><p>And because the protocol is decentralised, &#8220;the right authority&#8221; isn&#8217;t a single global body. It&#8217;s whoever the community affected by each deployment has chosen to decide. A different community, facing a different deployment, will choose differently. The protocol works under all of those choices because it refuses to make them.</p><p>The ruler measures. Governance decides. The protocol provides the substrate. The people using it decide what it enforces.</p><div><hr></div><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19613075">Lecture Notes</a><br>&#128187; <a href="https://github.com/jade-codes/got">Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How Tight Is Tight Enough? The Numbers Governance Has to Set | Geometry of Trust | Governance - Lesson 3]]></title><description><![CDATA[This is the third post in the Geometry of Trust governance series. This post is about the quantitative layer that sits on top &#8212; and an important admission about the numbers in it.]]></description><link>https://www.techunfiltered.io/p/how-tight-is-tight-enough-the-numbers</link><guid isPermaLink="false">https://www.techunfiltered.io/p/how-tight-is-tight-enough-the-numbers</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Sat, 18 Apr 2026 13:02:01 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0120bbc7-4ffa-463b-9c0d-83d244728f60_1270x715.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-FvwxAmECJec" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;FvwxAmECJec&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/FvwxAmECJec?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>A note before the numbers</h2><p>Every number in this post is illustrative. Not prescriptive.</p><p>The values you&#8217;re about to see &#8212; 0.02, 0.03, 0.05, 0.10, 0.25 &#8212; are placeholders chosen to show the <em>shape</em> of a tiered framework. 
They are not recommendations for what critical infrastructure, healthcare, or finance should actually use. The real values have to come from domain regulators working with operators, auditors, and standards bodies, informed by actual deployment data.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Getting the shape right is an argument that can be made by a framework. Getting the numbers right is a job for people who know the domain and have been watching the measurements behave in practice. Treat the framework as the contribution. Treat the numbers as placeholders.</p><p>With that out of the way.</p><div><hr></div><h2>The key variable</h2><p>Structural governance decides whether agents can talk. The previous three posts covered that: safety doesn&#8217;t travel, one agent has one primary domain, cross-domain interactions run three structural checks before any cryptography.</p><p>Quantitative governance decides how strictly the evidence is evaluated once the structural checks pass. That&#8217;s what this post is about. The key variable is <strong>T</strong> &#8212; the governance threshold.</p><p>T is not a number the maths produces. It&#8217;s a number governance sets. The maths produces readings &#8212; drift magnitudes, confidence scores, causal consistency ratios. Governance decides what counts as acceptable given the domain&#8217;s tolerance for error.</p><p>Different domains get different T. That&#8217;s the whole point.</p><div><hr></div><h2>Thresholds by domain &#8212; illustrative tiering</h2><p>Different domains tolerate different amounts of drift and demand different depths of evidence. The tiering below is the kind of picture you&#8217;d expect a domain regulator to arrive at after thinking about what failure looks like in their world.</p><pre><code><code>Domain                          Max drift  Causal validation  Rationale

Critical infrastructure         0.02       Required           Public safety, static geometry
Healthcare                      0.03       Required           Patient safety, narrow tolerance
Finance                         0.05       Required           Regulatory compliance
Commercial supply chain         0.10       Not required       Business priorities shift often
Research / experimental         0.25       Not required       Exploration needs room to move
</code></code></pre><p>A few things to notice about the shape of this tiering, even with the specific numbers held at arm&#8217;s length.</p><p><strong>Tighter drift and mandatory causal validation come together.</strong> The domains with the smallest tolerance for drift are the same domains that can&#8217;t accept correlational evidence as proof that values are still where they should be. They need the stronger guarantee.</p><p><strong>&#8220;Required&#8221; is a per-interaction property, not a platform property.</strong> A critical-infrastructure agent demanding causal validation doesn&#8217;t mean the maths is always running &#8212; it means the regulator&#8217;s verifier won&#8217;t accept an attestation without a causal certificate attached. The cost of causal probes gets paid at attestation time, when the agent is certifying itself to a strict peer, not on every inference.</p><p><strong>Numbers get looser by an order of magnitude across the tiers.</strong> Critical infrastructure at 0.02 vs research at 0.25 is roughly a 12&#215; difference. That&#8217;s not an arbitrary spread &#8212; it reflects that the cost of a false-positive alarm in research (blocking a legitimate experiment) is much lower than the cost of a false-negative in critical infrastructure (letting a drifted model keep operating).</p><div><hr></div><h2>The dual-domain problem: self-driving tractor</h2><p>Some agents genuinely operate in two domains at once. A self-driving tractor drives on farmland for most of its working life and on public roads for the rest. It can&#8217;t split into two logical agents because the hardware, sensors, and decision-making are shared. And it can&#8217;t claim two primary domains &#8212; Part 2 ruled that out.</p><p>The answer is to invent a domain that captures the dual-purpose nature directly:</p><pre><code><code>vehicle
  vehicle.autonomous-truck        (pure transport)
  vehicle.agricultural-tractor    (dual: farming + road use)
  vehicle.construction-excavator  (dual: site + road use)
</code></code></pre><p>The tractor&#8217;s primary domain is <code>vehicle.agricultural-tractor</code>. Its value geometry is trained on the dual-purpose objective &#8212; crop outcomes and collision avoidance both, under one coherent structure. A governance body, or coordination between agricultural and transport regulators, decides what &#8220;tractor safety&#8221; means.</p><p><strong>Whose thresholds apply?</strong> The tractor has one primary domain and one attestation, but different peers interact with it under different rules:</p><pre><code><code>Peer                         Max drift       Causal required?

Farm management agent        0.05            No (chain required)
Road-infrastructure agent    0.02            Yes
</code></code></pre><p>The tractor doesn&#8217;t pick its own threshold. It gets held to whichever peer&#8217;s threshold applies to the current interaction. On farmland with farm peers, the farm threshold applies &#8212; looser but still binding. On public roads with transport peers, the transport threshold applies &#8212; tighter and with causal validation required.</p><p>In practice the tractor has to stay within the strictest envelope any of its expected peers will hold it to. If its current drift is 0.04, it passes the farm interaction (0.05 tolerance) but fails the road interaction (0.02 tolerance). The road-infrastructure peer rejects the exchange. The tractor doesn&#8217;t stop operating, but it can&#8217;t participate in the road-coordination network until its geometry is re-measured and brought back inside the transport envelope.</p><p><strong>The peer decides which rules apply, not the tractor.</strong> That&#8217;s the whole point of per-peer governance thresholds.</p>
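<p>To make the lookup mechanics concrete, here is a minimal Rust sketch of per-peer threshold resolution: each expected peer class carries its own bound, the most specific matching pattern wins, and the tractor&#8217;s current drift of 0.04 clears the farm envelope but not the transport one. The type names, the pattern strings, and the peer domain names are illustrative assumptions for this post, not the API of the open-source implementation.</p><pre><code><code>// Illustrative sketch only: per-peer threshold lookup with most-specific-match.
// Type names, pattern strings, and numbers are assumptions, not the real API.

/// The bound a peer holds its counterpart to.
#[derive(Debug, Clone, Copy)]
struct Thresholds {
    max_drift: f64,
    causal_required: bool,
}

/// One rule: a domain pattern (exact, or a trailing ".*" wildcard) plus the
/// thresholds that apply when the peer's primary domain matches it.
struct Rule {
    pattern: &amp;'static str,
    thresholds: Thresholds,
}

/// True if `domain` falls under `pattern`. "transport.*" covers the whole
/// transport subtree; a pattern without ".*" matches only itself.
fn matches(pattern: &amp;str, domain: &amp;str) -> bool {
    match pattern.strip_suffix(".*") {
        Some(prefix) => domain
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.is_empty() || rest.starts_with('.')),
        None => domain == pattern,
    }
}

/// Most-specific-match wins: among matching rules the longest pattern decides,
/// so a narrower, stricter rule beats a broader floor without replacing it.
fn thresholds_for(rules: &amp;[Rule], peer_domain: &amp;str) -> Option&lt;Thresholds> {
    rules
        .iter()
        .filter(|r| matches(r.pattern, peer_domain))
        .max_by_key(|r| r.pattern.len())
        .map(|r| r.thresholds)
}

fn main() {
    // The tractor's expected peer classes, using the illustrative numbers above.
    let rules = [
        Rule { pattern: "agriculture.*", thresholds: Thresholds { max_drift: 0.05, causal_required: false } },
        Rule { pattern: "transport.*",   thresholds: Thresholds { max_drift: 0.02, causal_required: true } },
    ];

    let current_drift = 0.04;
    for peer in ["agriculture.farm-management", "transport.road-infrastructure"] {
        let t = thresholds_for(&amp;rules, peer).expect("no rule for this peer");
        let verdict = if current_drift &lt;= t.max_drift { "passes" } else { "refused" };
        println!("{peer}: max drift {} -> {}", t.max_drift, verdict);
    }
}</code></code></pre><p>The same most-specific-match rule is what lets layered standards compose: a narrower pattern set by a stricter authority simply sorts ahead of a broader floor, with no change to the lookup itself.</p>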
<div><hr></div><h2>Same-domain pair: diagnostic + drug checker</h2><p>Thresholds don&#8217;t only apply across domains. Inside a single regulated domain, peers may still hold each other to the full domain thresholds.</p><pre><code><code>Property             Diagnostic agent              Drug-checker agent
Primary domain       healthcare.diagnostic-        healthcare.drug-
                     advisory                      interaction
Mode toward peer     Advisory (sends hypotheses)   Read-only (receives,
                                                   cannot advise back)
Max drift            0.03                          0.03
Causal validation    Required                      Required
Outcome if fails     Exchange refused              Exchange refused
</code></code></pre><p>Two observations.</p><p><strong>Same-domain doesn&#8217;t mean same-role.</strong> Both agents sit in healthcare, but one informs the other rather than negotiating as equals. The diagnostic agent generates hypotheses; the drug checker evaluates specific interactions given those hypotheses. The asymmetric mode &#8212; advisory on one side, read-only on the other &#8212; captures that. Part 3&#8217;s mode framework lets this shape be expressed without either agent overreaching.</p><p><strong>Both must pass, not just one.</strong> Because the interaction is being held to healthcare-grade thresholds, both agents&#8217; attestations have to clear both the drift bound and the causal validation requirement. If the drug checker&#8217;s geometry has drifted past 0.03 &#8212; even though its mode is only read-only &#8212; the interaction is refused. Read-only constrains what the agent can <em>say</em>, not how rigorously its values are checked.</p><div><hr></div><h2>The asymmetric case: finance regulator + trader</h2><p>Supervised mode inverts the usual symmetry. A finance regulator initiating a supervised interaction with a trading agent isn&#8217;t producing an attestation of its own value geometry &#8212; it&#8217;s demanding one from the trader.</p><pre><code><code>Property             Regulator                    Trader
Primary domain       finance.regulatory-          finance.trading
                     compliance
Mode                 Supervised (demands)         Supervised (must comply)
Own attestation in   No &#8212; carries authority       Yes &#8212; full attestation
this interaction?    attestation instead          demanded
Thresholds           n/a &#8212; regulator sets them    Finance: 0.05, causal required
Information flow     Inward (demand)              Outward (proof)
</code></code></pre><p>The regulator&#8217;s authority is itself an attestation &#8212; not trust-by-assertion. The trader still has its own thresholds; those haven&#8217;t vanished just because a supervisor is asking. What&#8217;s changed is that the trader&#8217;s obligation to produce the attestation is triggered by the supervisor&#8217;s credential, not negotiated as a peer.</p><p>The one-way information flow is visible in the audit record: a supervised-mode message is a different record type from a cooperative one. If the trader&#8217;s attestation fails to meet finance-domain thresholds, the regulator sees that as a finding &#8212; not an error.</p><div><hr></div><h2>When thresholds don&#8217;t get to matter</h2><p>The last case is the one where the whole quantitative layer doesn&#8217;t come into play at all.</p><pre><code><code>Property                         Farm agent         Transport agent
Primary domain                   agriculture.crop-  transport.autonomous-
                                 management         vehicle
Exclusions                       transport.*        (none relevant)
Transport agent's drift          &#8212;                  0.01 (excellent)
Transport agent's causal score   &#8212;                  0.95 (excellent)
Outcome                          Blocked at Step 1  Blocked at Step 1
</code></code></pre><p>The transport agent&#8217;s attestation could be the finest ever produced &#8212; no drift, perfect causal consistency, every probe reading within tolerance. None of that gets evaluated. The farm agent&#8217;s exclusion of <code>transport.*</code> fires at Step 1, before the attestation is even opened.</p><p>This is the whole point of the separation between structural and quantitative layers. Structural refusal isn&#8217;t an override of the maths &#8212; it&#8217;s a layer that decides whether the maths ever gets to run.</p><p>A regulator reviewing the audit log sees a <code>DomainExcluded</code> record, not a <code>ThresholdFailed</code> record. The difference matters: it&#8217;s the difference between &#8220;we wouldn&#8217;t engage&#8221; and &#8220;we engaged and the numbers came back bad.&#8221;</p><div><hr></div><h2>How thresholds actually get set</h2><p>The numbers above came from someone writing a talk. The real numbers have to come from somewhere else.</p><p><strong>Who.</strong> The domain regulator, working with operators, auditors, and the standards bodies they already answer to. For healthcare, clinical regulators plus bodies that set clinical-decision-support norms. For critical infrastructure, the sectoral safety regulator plus operators with skin in the game. The framework doesn&#8217;t make this easier by picking a number; it makes it easier by making clear what the number is actually constraining.</p><p><strong>What.</strong> A threshold is a commitment to reject interactions whose measured drift exceeds the bound. To set one responsibly, a regulator needs to know: the distribution of drift readings observed across comparable deployments, the distribution of drift values at which real incidents have occurred in the past, the distribution of drift values at which false alarms become operationally disruptive. These are empirical questions that can only be answered by watching the measurements behave over time.</p><p><strong>When.</strong> Thresholds shouldn&#8217;t be set on day one and left alone. They should be provisional at first &#8212; looser than the regulator thinks they need to be &#8212; while the measurement system itself is being validated. Tightening comes later, as the baseline distribution of drift in healthy deployments becomes well-understood. Setting a tight threshold too early produces false alarms that erode trust in the whole measurement regime.</p><div><hr></div><h2>The point</h2><p>The structural governance from Parts 1&#8211;2 decides whether agents talk. The quantitative governance in this post decides how strictly their evidence gets held once they do. Both layers are needed. Neither substitutes for the other.</p><p>And the numbers in the quantitative layer are placeholders &#8212; the shape is the argument, not the specific values. The right number for critical infrastructure might turn out to be 0.01, or 0.04, or a multi-dimensional bound rather than a scalar. That&#8217;s a conversation for regulators, operators, and standards bodies working with real deployment data.</p><p>Treat the shape as the contribution. 
Treat the specific numbers as placeholders.</p><div><hr></div><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19613070">Lecture Notes</a><br>&#128187; <a href="https://github.com/jade-codes/got">Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Exclusions, Permissions, Modes: What Happens Before the Cryptography | Geometry of Trust | Governance - Lesson 2]]></title><description><![CDATA[This is the second post in the Geometry of Trust governance series. This post is about what happens when agents from different domains try to talk to each other &#8212; and the structural checks needed.]]></description><link>https://www.techunfiltered.io/p/exclusions-permissions-modes-what</link><guid isPermaLink="false">https://www.techunfiltered.io/p/exclusions-permissions-modes-what</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Sat, 18 Apr 2026 09:01:26 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/523e41a9-5abe-4abc-9da3-89e0f53366a4_1373x771.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-gNKy_N_CEu0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;gNKy_N_CEu0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/gNKy_N_CEu0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Cross-domain is the normal case</h2><p>Most real work in an agentic system isn&#8217;t one agent doing its thing in isolation. It&#8217;s agents from different primary domains talking to each other.</p><p>A farm agent asks a weather agent about forecasts. A hospital triage agent queries a pharmacy agent about drug interactions. A logistics agent coordinates with a transport agent about deliveries. 
Cross-domain interaction is the normal case, not an edge case.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Which raises an immediate question: when two agents from different primary domains try to talk, what decides whether they&#8217;re allowed to?</p><p>The answer is a three-step check that runs <strong>before</strong> any cryptographic verification of attestations. The purpose of the check is to decide whether the interaction should even be attempted. If any of these three steps fails, the agents don&#8217;t talk &#8212; not because the maths failed, but because the structural configuration said no.</p><pre><code><code>Step 1  Exclusions    Does either agent exclude the other's domain?
Step 2  Permissions   Does each agent permit the other's domain?
Step 3  Mode          What kind of interaction is this?</code></code></pre><p>The steps are deliberately ordered. Exclusions are cheapest. Permissions are next. Mode selection comes last. Only if all three pass does cryptographic verification begin.</p><div><hr></div><h2>Step 1 &#8212; Exclusions (hard veto)</h2><p>The first check is the simplest. Each agent carries a list of domain patterns it explicitly refuses to interact with. If either agent excludes the other&#8217;s primary domain, the interaction is blocked immediately.</p><p>An exclusion is a domain pattern with the effect of a hard veto. Domains use a dotted namespace with wildcards &#8212; the same kind of structure used for DNS names or topic hierarchies. A farm agent&#8217;s configuration might include:</p><pre><code><code>exclude: transport.*</code></code></pre><p>That single pattern rules out transport, transport.autonomous_vehicle, transport.rail, transport.aviation, and anything else under the transport namespace. The agent will refuse to begin any exchange with a peer whose primary domain falls under that pattern.</p><p><strong>What exclusions are for.</strong> They encode structural boundaries that shouldn&#8217;t be crossed regardless of how good the measurements look.</p><p><em>Regulatory separation.</em> A clinical agent excludes <code>finance.trading</code> to make it structurally impossible for a clinical interaction to get entangled with a trading decision. No matter how the trading agent&#8217;s attestation looks, the clinical agent won&#8217;t even evaluate it.</p><p><em>Harm asymmetry.</em> A children&#8217;s education agent excludes <code>gambling.*</code> and <code>adult_content.*</code> because the harm from a borderline case is too large to be worth weighing measurement quality against.</p><p><em>Jurisdictional constraints.</em> A UK-deployed health agent excludes <code>health.us.hipaa-bound</code> peers because interacting with them creates cross-jurisdictional data-handling obligations the agent isn&#8217;t authorised to take on.</p><p><strong>Why exclusions come first.</strong> They&#8217;re cheap to evaluate &#8212; no cryptography, no probe readings, no attestation verification. They encode decisions made once by the deployer or regulator, not evaluated per-interaction. If an exclusion fires, no further work is wasted on an interaction that was never going to happen. And excluded interactions never produce logs that look like considered interactions, so there&#8217;s no ambiguity about whether the agent &#8220;considered&#8221; the excluded peer.</p><div><hr></div><h2>Step 2 &#8212; Permissions (bidirectional)</h2><p>If no exclusion fires, the next check is permissions. Where exclusions are a blacklist, permissions are the allow-list. Each agent declares which peer domains it&#8217;s willing to interact with.</p><p><strong>Both agents must permit the other&#8217;s primary domain.</strong> This isn&#8217;t an &#8220;either side can unlock the door&#8221; rule &#8212; it&#8217;s &#8220;both sides have to turn the key.&#8221; If the farm agent permits <code>transport.*</code> but the transport agent doesn&#8217;t permit <code>agriculture.*</code>, the interaction doesn&#8217;t proceed.</p><p>Bidirectionality matters because consent to interact is a governance property of both agents&#8217; configurations. Each regulator set up the permissions on its side to reflect what that domain is willing to be exposed to. 
A one-sided permission check would let one regulator&#8217;s preferences override another&#8217;s.</p><p>Our farm agent might have a permissions list like this:</p><pre><code><code>permit: agriculture.*, meteorology.*, logistics.supply_chain</code></code></pre><p>The farm agent will interact with peer agents whose primary domain falls under any of those patterns. A weather agent (primary domain: <code>meteorology.forecast</code>) matches <code>meteorology.*</code>. A supply-chain agent (primary domain: <code>logistics.supply_chain</code>) matches the third entry. A transport agent (primary domain: <code>transport.*</code>) matches nothing in the permit list and would be blocked at the permissions step even if no exclusion were present.</p><p><strong>A worked example.</strong> Farm agent wants to talk to weather agent:</p><pre><code><code>Check                                Farm agent            Weather agent
Primary domain                       agriculture.farm_ops  meteorology.forecast
Exclusions                           transport.*           (none relevant)
Peer matches my exclusions?          No                    No
Permissions                          agriculture.*,        agriculture.*,
                                     meteorology.*,        meteorology.*
                                     logistics.supply_chain
Peer matches my permissions?         Yes (meteorology.*)   Yes (agriculture.*)</code></code></pre><p>Both sides pass both checks. Steps 1 and 2 clear. The exchange proceeds to Step 3.</p><div><hr></div><h2>Step 3 &#8212; Mode (what kind of interaction)</h2><p>Exclusions and permissions decide whether the interaction happens. Mode decides what shape it takes.</p><p>Not every permitted interaction should be symmetric. A clinical agent might be willing to receive advice from a pharmacy agent without being willing to take instructions from it. A regulator might require a supervised interaction where one side has to comply with requests it wouldn&#8217;t ordinarily honour.</p><p>Four modes cover the common cases:</p><p><strong>Cooperative.</strong> Full two-way exchange. Either side can initiate, request, propose, and act on the other&#8217;s outputs. Use it when both agents are peers with equal standing in the workflow &#8212; farm talking to weather is usually cooperative.</p><p><strong>Advisory.</strong> One side sends recommendations. The other side receives them but isn&#8217;t required to act on them. Use it when a specialist informs a generalist &#8212; a pharmacy agent advising a clinical agent about drug interactions, where the clinician retains final say.</p><p><strong>Read-only.</strong> The receiving agent can accept information but can&#8217;t transmit back. No commands, no negotiation, no state changes propagate outward. Use it for data-source access &#8212; an intelligence agent pulling from a news-feed agent without the news agent knowing or being able to influence what&#8217;s done with the data.</p><p><strong>Supervised.</strong> A regulator-issued mode. One agent is compelled to respond to specific requests from an authorised supervisor. The supervised agent complies; the supervisor has elevated authority for the duration of the interaction. Use it for audits, incident investigations, court orders &#8212; a clinical agent under supervised inspection during an adverse-event review.</p><p><strong>Mode is declared, not discovered.</strong> Both agents know what mode they&#8217;re in before the first substantive message is exchanged. It&#8217;s not something either agent can change unilaterally mid-conversation. A cooperative interaction can&#8217;t quietly drift into something where one side starts giving directives. If the mode needs to change, the interaction terminates and a new one opens under the new mode.</p><p>This matters for audit. Every message sent carries the mode under which it was sent. A supervisor can see later that a particular command was issued in supervised mode with a specific authorisation. A clinician can see that a specific recommendation came in advisory mode, meaning the decision authority stayed with the clinician. The mode is part of the record.</p><p><strong>Supervised mode in practice.</strong> This is the one that inverts the usual agent-autonomy assumption. In cooperative, advisory, and read-only modes, each agent is acting within its own governance frame and deciding what it will and won&#8217;t do. In supervised mode, the supervised agent&#8217;s governance temporarily includes obligations imposed by the supervisor &#8212; usually a regulator, auditor, or court-appointed investigator.</p><p>The supervisor&#8217;s authority is itself a credential carried in their attestation. 
The supervised agent doesn&#8217;t take the word of whoever shows up claiming to be a regulator; it verifies that the supervisor&#8217;s own attestation shows the required authority. Supervised mode isn&#8217;t &#8220;the agent gives up its values.&#8221; It&#8217;s &#8220;the agent acknowledges a governance obligation it was built to honour in exactly this case, and the obligation is being invoked by someone with verifiable standing to invoke it.&#8221;</p><div><hr></div><h2>The whole pipeline before cryptography runs</h2><p>Putting the three steps in order gives the full pre-cryptographic check that governs cross-domain interaction:</p><pre><code><code>Step 1  Exclusions    Does either agent exclude the other's primary domain?
                      Fails &#8594; blocked with a domain-excluded record; no attestation exchange.

Step 2  Permissions   Does each agent permit the other's primary domain?
                      Fails &#8594; blocked with a permission-denied record.

Step 3  Mode          What kind of interaction is this?
                      Fails &#8594; if no agreed mode, interaction doesn't start.</code></code></pre><p>Only if all three pass does cryptographic verification begin. That&#8217;s when the two agents actually exchange attestation chains, verify each other&#8217;s domain probes, check freshness timestamps, and decide whether to proceed with substantive work.</p><p><strong>Why the ordering matters.</strong> Cheapest checks run first. Pattern matching is fast; cryptographic verification is not. Configuration errors are caught before measurement errors &#8212; if the deployer set up the wrong permissions, that shows up immediately, not after the cryptography looks suspicious. Audit trails stay clean &#8212; <em>blocked-at-exclusion</em> is a different record type from <em>blocked-at-attestation-failure</em>. A regulator can tell the difference between &#8220;the configuration refused to allow this&#8221; and &#8220;the configuration allowed it but the measurements didn&#8217;t pass.&#8221;</p><p>It also keeps governance decisions and technical decisions separated. Steps 1&#8211;2 are governance decisions made by deployers and regulators. Step 3 is a negotiated setting. Only after all three succeed does the technical verification begin.</p><div><hr></div><h2>The point</h2><p>Cryptographic verification of attestations is the part that gets most of the attention &#8212; probes, drift detection, causal intervention, signed chains. But by the time any of that runs, three much simpler questions have already been answered: is this peer excluded, does each side permit the other, and what mode is the interaction in?</p><p>Those are governance questions, not maths questions. Getting them right, and getting them right <em>first</em>, is what lets the maths mean something afterwards.</p><div><hr></div><h2>Appendix: What this looks like in practice</h2><p>The abstract rules are easier to follow alongside a concrete configuration. Before the worked scenarios, here&#8217;s a matrix showing how a handful of typical domains interact. Rows are the initiating agent&#8217;s primary domain; columns are the peer&#8217;s primary domain; each cell shows the outcome of the three-step check.</p><pre><code><code>                                agri.   meteo.  health.  health.  finance.  trans.
Initiator &#8595;   Peer &#8594;             crop    fcst    diag     drug     trd       av

agriculture.crop-management     coop    adv(in)  n/p      n/p      n/p       excl
meteorology.forecast            coop    coop     n/p      n/p      n/p       n/p
healthcare.diagnostic-advisory  n/p     n/p      coop     adv(out) excl      n/p
healthcare.drug-interaction     n/p     n/p      ro(in)   coop     excl      n/p
finance.trading                 n/p     n/p      excl     excl     coop      n/p
finance.regulatory-compliance   n/p     n/p      n/p      n/p      super     n/p
transport.autonomous-vehicle    excl    adv(in)  n/p      n/p      n/p       coop

coop    = cooperative (symmetric)
adv     = advisory (directional: in = receiving, out = giving)
ro      = read-only (directional)
super   = supervised (regulator-compelled)
n/p     = not permitted (Step 2 fails)
excl    = excluded (Step 1 fires)</code></code></pre><p><strong>Reading the cells.</strong> &#8220;Cooperative&#8221; means both sides permit each other with symmetric cooperative mode. &#8220;Advisory (out)&#8221; means the initiator permits the peer in advisory mode &#8212; the initiator is giving advice the peer may or may not act on. &#8220;Advisory (in)&#8221; means the initiator accepts advice from the peer without being bound by it. &#8220;Read-only (in)&#8221; means the initiator receives information but cannot transmit substantive output back. &#8220;Not permitted&#8221; means the exchange fails at Step 2 &#8212; neither side has hard-vetoed the other, but at least one side&#8217;s permission list doesn&#8217;t match. &#8220;Excluded&#8221; means Step 1 fires &#8212; one side&#8217;s exclusion list rules out the other&#8217;s domain regardless of what the permissions say.</p><p><strong>A few things worth noticing in the matrix.</strong> The diagonal is always cooperative &#8212; agents within the same domain coordinate on shared ground. Most off-diagonal cells are &#8220;not permitted&#8221;: the default is closure, not openness. Only the pairings the configuration deliberately enables actually light up. Asymmetry is common: healthcare diagnostic-advisory talks to drug-interaction as advisory, but drug-interaction receives that advice as read-only &#8212; it takes diagnostic hypotheses as inputs but doesn&#8217;t issue diagnostic recommendations back. Exclusions are rarer than non-permissions but carry more weight: healthcare excludes finance.trading structurally, to make it impossible for clinical reasoning to get entangled with trading decisions. And the whole matrix is configured per-deployment &#8212; these are illustrative defaults, not prescriptive rules.</p><p>With the big picture in view, the individual scenarios below walk through specific rows and columns of this matrix to show the three-step check in action. Each agent is declared in a trust registry file (TOML). Farm Alice and Weather Wendy look like this as config:</p><pre><code><code>[[agents]]
id = "farm-alice"
public_key = "aabb..."
primary_domain = "agriculture.crop-management"

permitted_domains = [
  { pattern = "agriculture.*",  mode = "cooperative" },
  { pattern = "meteorology.*",  mode = "advisory"    },
]

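# Hard veto, checked at Step 1 before permissions or attestations are read:
# any peer whose primary_domain falls under transport.* is refused outright.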
exclusion_domains = ["transport.*"]</code></code></pre><pre><code><code>[[agents]]
id = "weather-wendy"
public_key = "ccdd..."
primary_domain = "meteorology.forecast"

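# Offers cooperative mode to agriculture.* peers; Alice only accepts
# meteorology.* input as advisory, so the effective exchange is asymmetric.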
permitted_domains = [
  { pattern = "agriculture.*", mode = "cooperative" },
  { pattern = "meteorology.*", mode = "cooperative" },
]</code></code></pre><h3>What happens when they try to talk</h3><p>When Alice initiates an exchange with Wendy, the verifier walks the three steps in order:</p><p><strong>Step 1 &#8212; Exclusions.</strong> Alice&#8217;s exclusions are <code>[transport.*]</code>. Wendy&#8217;s primary domain is <code>meteorology.forecast</code> &#8212; that doesn&#8217;t match <code>transport.*</code>, so Alice&#8217;s exclusion doesn&#8217;t fire. Wendy has no relevant exclusions of her own. Step 1 passes.</p><p><strong>Step 2 &#8212; Permissions.</strong> Alice&#8217;s permitted patterns include <code>meteorology.*</code>, which matches Wendy&#8217;s primary <code>meteorology.forecast</code>. Wendy&#8217;s permitted patterns include <code>agriculture.*</code>, which matches Alice&#8217;s primary <code>agriculture.crop-management</code>. Both sides turn the key. Step 2 passes.</p><p><strong>Step 3 &#8212; Mode.</strong> Most-specific-match wins. Alice&#8217;s pattern <code>meteorology.*</code> matches Wendy with mode <code>advisory</code>. Wendy&#8217;s pattern <code>agriculture.*</code> matches Alice with mode <code>cooperative</code>. The effective modes are asymmetric &#8212; Wendy is willing to cooperate fully, Alice will only treat Wendy&#8217;s input as advisory. As long as at least one side permits substantive communication (not both sides being <code>read-only</code>), the exchange proceeds. Alice gets weather advice but isn&#8217;t bound to act on it. Wendy receives Alice&#8217;s requests and can respond freely. Step 3 passes.</p><p>Only now does cryptographic verification begin &#8212; attestation chains, probe readings, freshness checks, the whole mathematics stack from the earlier series.</p>
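<p>Before the rejection example, here is a minimal Rust sketch of that three-step gate, using Alice and Wendy as the worked pair. The struct shapes, the enum names, and the refusal variants are assumptions made for illustration; only the ordering of the checks and the bidirectional permission rule come from the protocol description above.</p><pre><code><code>// Illustrative sketch only: the pre-cryptographic three-step gate.
// Struct shapes, enum names, and refusal variants are assumptions; the
// ordering of the checks and the bidirectional permission rule are the point.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode { Cooperative, Advisory, ReadOnly, Supervised }

struct Agent {
    primary_domain: &amp;'static str,
    // Permitted entries carry the mode offered toward that pattern, so
    // Steps 2 and 3 read from the same table.
    permitted: Vec&lt;(&amp;'static str, Mode)>,
    // Hard-veto patterns, checked first.
    excluded: Vec&lt;&amp;'static str>,
}

/// "transport.*" covers the whole transport subtree; a pattern without ".*"
/// matches only itself.
fn matches(pattern: &amp;str, domain: &amp;str) -> bool {
    match pattern.strip_suffix(".*") {
        Some(prefix) => domain
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.is_empty() || rest.starts_with('.')),
        None => domain == pattern,
    }
}

/// Most-specific-match: the longest matching pattern decides the mode this
/// agent offers toward the peer. None means Step 2 fails from this side.
fn mode_toward(agent: &amp;Agent, peer_domain: &amp;str) -> Option&lt;Mode> {
    let mut best: Option&lt;(usize, Mode)> = None;
    for &amp;(pattern, mode) in agent.permitted.iter() {
        if matches(pattern, peer_domain) &amp;&amp; best.map_or(true, |(len, _)| pattern.len() > len) {
            best = Some((pattern.len(), mode));
        }
    }
    best.map(|(_, mode)| mode)
}

// DomainExcluded and the permission-denied outcome mirror the record names used
// in the post; NoUsableMode is an assumed name for the both-sides-read-only case.
#[derive(Debug)]
enum Refusal { DomainExcluded, PermissionDenied, NoUsableMode }

/// The three-step gate. Only an Ok result means attestation exchange may begin.
fn pre_crypto_check(a: &amp;Agent, b: &amp;Agent) -> Result&lt;(Mode, Mode), Refusal> {
    // Step 1: exclusions. Either side's hard veto blocks immediately.
    if a.excluded.iter().any(|p| matches(p, b.primary_domain))
        || b.excluded.iter().any(|p| matches(p, a.primary_domain)) {
        return Err(Refusal::DomainExcluded);
    }
    // Step 2: permissions are bidirectional. Both sides must turn the key.
    let a_mode = mode_toward(a, b.primary_domain).ok_or(Refusal::PermissionDenied)?;
    let b_mode = mode_toward(b, a.primary_domain).ok_or(Refusal::PermissionDenied)?;
    // Step 3: at least one side must permit substantive communication.
    if a_mode == Mode::ReadOnly &amp;&amp; b_mode == Mode::ReadOnly {
        return Err(Refusal::NoUsableMode);
    }
    Ok((a_mode, b_mode))
}

fn main() {
    let alice = Agent {
        primary_domain: "agriculture.crop-management",
        permitted: vec![("agriculture.*", Mode::Cooperative), ("meteorology.*", Mode::Advisory)],
        excluded: vec!["transport.*"],
    };
    let wendy = Agent {
        primary_domain: "meteorology.forecast",
        permitted: vec![("agriculture.*", Mode::Cooperative), ("meteorology.*", Mode::Cooperative)],
        excluded: vec![],
    };
    // Asymmetric but allowed: Alice treats Wendy as advisory, Wendy offers cooperative.
    println!("{:?}", pre_crypto_check(&amp;alice, &amp;wendy));
}</code></code></pre><p>Run against the rejection example that follows, the same function stops at Step 1 with the domain-excluded refusal: Tim&#8217;s one-sided permission never gets read.</p>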
<h3>A rejection example: Alice meets Truck-Tim</h3><p>Suppose a transport agent tries to initiate with Alice:</p><pre><code><code>[[agents]]
id = "truck-tim"
public_key = "eeff..."
primary_domain = "transport.autonomous-vehicle"

permitted_domains = [
  { pattern = "transport.*",       mode = "cooperative" },
  { pattern = "infrastructure.*",  mode = "cooperative" },
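  # One-sided: Tim permits agriculture.*, but permission is bidirectional and
  # exclusions run first, so Alice's transport.* veto blocks him at Step 1.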
  { pattern = "agriculture.*",     mode = "advisory"    },
]</code></code></pre><p>Truck-Tim&#8217;s configuration permits <code>agriculture.*</code>, so from his side he&#8217;s willing to interact with Alice. But Alice&#8217;s <code>exclusion_domains = ["transport.*"]</code> matches Tim&#8217;s primary <code>transport.autonomous-vehicle</code>. Step 1 fails. The exchange is rejected immediately with <code>DomainExcluded</code>. No cryptography runs. No attestation is evaluated.</p><p>The rejection record is a different record type from &#8220;attestation failed&#8221; &#8212; a regulator reviewing the logs can tell at a glance that Alice refused at the configuration layer, not because anything looked technically wrong.</p><h3>A carve-out example</h3><p>Exclusions and permissions can be combined to express &#8220;allow the whole subtree except one specific member.&#8221; Suppose a logistics agent wants to work with all transport except autonomous vehicles:</p><pre><code><code>[[agents]]
id = "logistics-lee"
public_key = "1234..."
primary_domain = "logistics.supply-chain"

permitted_domains = [
  { pattern = "transport.*", mode = "cooperative" },
]

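# Carve-out: narrower than the transport.* permission above, so the loader
# accepts it. Everything under transport.* is allowed except this one domain.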
exclusion_domains = ["transport.autonomous-vehicle"]</code></code></pre><p>This reads: &#8220;cooperate with anything under transport &#8212; trucks, rail, shipping &#8212; except autonomous vehicles specifically.&#8221; The loader accepts this because the exclusion is narrower than the permission (it carves out one member of a broader allow). The reverse &#8212; permitting one narrow thing while excluding its whole parent subtree &#8212; would be rejected at load time as dead-code configuration, because the exclusion would swallow the permission before it ever fired.</p><h3>Supervised mode example</h3><p>Supervised mode shows up when a regulator needs to compel interaction with a specific agent for audit or compliance. A financial regulator and a trading agent might be configured like this:</p><pre><code><code>[[agents]]
id = "reg-compliance"
public_key = "5678..."
primary_domain = "finance.regulatory-compliance"

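# Supervised toward every finance.* peer: the regulator can demand a full
# attestation while presenting only its authority credential.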
permitted_domains = [
  { pattern = "finance.*", mode = "supervised" },
]

[[agents]]
id = "trader-tariq"
public_key = "9abc..."
primary_domain = "finance.trading"

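# Honours supervised requests only from finance.* peers whose own attestation
# carries the required regulatory authority.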
permitted_domains = [
  { pattern = "finance.*", mode = "supervised" },
]</code></code></pre><p>Both sides declare <code>supervised</code> as the mode for <code>finance.*</code>. When the regulator initiates, the exchange runs in supervised mode: the regulator may demand attestations from the trader without producing one of its own, and the trader must accept the regulator&#8217;s cooperation refusals without challenge. The regulator&#8217;s authority to do this is itself an attestation the trader&#8217;s registry verifies &#8212; it&#8217;s not trust-by-assertion. A logistics agent showing up and claiming to be a regulator would fail at the permissions step, because <code>logistics.supply-chain</code> isn&#8217;t in the trader&#8217;s permitted list and certainly isn&#8217;t there in supervised mode.</p><p>These examples are intentionally small. Real deployments will have longer permitted lists, more exclusion patterns, and per-domain governance thresholds layered on top &#8212; which we&#8217;ll come to next.</p><div><hr></div><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/playlist?list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19613066">Lecture Notes</a><br>&#128187; <a href="https://github.com/jade-codes/got">Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Word That Doesn’t Travel: Why “Safety” in AI Means Nothing Without a Domain - Geometry of Trust | Governance - Lesson 1]]></title><description><![CDATA[This is the first post in the Geometry of Trust governance series. 
This series asks what small, specialised models and tight domain-specific measurement implies for governance.]]></description><link>https://www.techunfiltered.io/p/the-word-that-doesnt-travel-why-safety</link><guid isPermaLink="false">https://www.techunfiltered.io/p/the-word-that-doesnt-travel-why-safety</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Fri, 17 Apr 2026 22:58:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/iJ1GQqiT86E" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-iJ1GQqiT86E" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;iJ1GQqiT86E&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/iJ1GQqiT86E?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Same word, different directions</h2><p>The philosophy series closed with a simple recommendation: use the smallest model that covers your domain, measure it tightly, monitor it cheaply, audit it clearly. That only holds if &#8220;your domain&#8221; is a well-defined thing.</p><p>The governance series opens here, with the word that looks like it should travel between domains but doesn&#8217;t: safety.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>We use the word as if it pointed to something singular. As if an AI that&#8217;s &#8220;safe&#8221; were safe in some general, domain-independent sense. It isn&#8217;t. Safety is not one direction in the value space. It&#8217;s many different directions, and they don&#8217;t align.</p><div><hr></div><h2>The same word in four domains</h2><p>Take four domains where AI is actively being deployed and safety is a live concern:</p><pre><code><code>Agriculture:   Crop damage. Pesticide compliance. Soil contamination.
               Watershed runoff. Worker exposure during application.

Transport:     Collision avoidance. Pedestrian detection. Braking
               distance. Lane discipline. Response to novel obstacles.

Healthcare:    Patient harm. Misdiagnosis. Drug interactions.
               Missed contraindications. Confidentiality breach.

Finance:       Market manipulation. Fiduciary breach. Fraud.
               Insider information. Misrepresentation of risk.
</code></code></pre><p>Four columns that all fit under the same word. Different harms. Different thresholds. Different regulators. Different legal standards of care. Different failure modes. Different sensors, data, and evidence patterns. Different people getting hurt if the model gets it wrong.</p><div><hr></div><h2>Different direction in the value space</h2><p>The mathematics series gave us a way to talk about this precisely. Each value term &#8212; including &#8220;safety&#8221; &#8212; corresponds to a direction in the model&#8217;s internal geometry. The probe that reads it is a vector pointing in that direction. The reading is a dot product of that probe with the activation.</p><p>If &#8220;safety&#8221; were a universal concept, the probe would point in the same direction across domains. It doesn&#8217;t.</p><p>The probe that reads agricultural safety is not the probe that reads patient safety. They measure different things in the same way the word &#8220;bank&#8221; means different things on a river and on a high street.</p><p><strong>What this means operationally:</strong></p><p>A model trained to score high on agricultural safety has a probe that fires on pesticide compliance, soil handling, and runoff patterns. A model trained to score high on patient safety has a probe that fires on drug interactions, dosage bounds, and escalation behaviour.</p><p>Swap them over and both readings become meaningless. The agricultural probe fires on irrelevant patterns in patient data. The patient probe fires on irrelevant patterns in agricultural data.</p><p>Worse: the numerical score from the wrong-domain probe can <em>look fine</em>. A patient-safety probe might return a placid reading on a model that&#8217;s about to recommend something agriculturally reckless. The reading is not wrong in the arithmetic sense. It&#8217;s just answering a different question.</p><p>This is why the Part 4 argument about small specialised models matters for governance. A 500M-parameter drug checker has a safety probe that was trained, validated, and deployed against patient-safety harms in a specific clinical context. Its reading means something because the domain is defined. A frontier general model has a safety probe that has to average across many domains at once, and the average doesn&#8217;t correspond to any real-world safety regime.</p><div><hr></div><h2>Certifying the word certifies nothing</h2><p>The trap in governance is certifying the word rather than the thing the word points to.</p><p>A certificate that says &#8220;Model X is safe&#8221; looks like it means something. But safe for what? Under whose standard? Measured against which harms? If the certificate doesn&#8217;t answer those questions, it has certified a word, not a property. And any two such certificates that use the same word can end up describing completely different things.</p><p>The problem is not hypothetical. A model certified as &#8220;safe&#8221; by a general-purpose evaluator and a model certified as &#8220;safe&#8221; by a clinical regulator are not the same kind of object. The first was tested against a generic harm benchmark. The second was tested against specific failure modes &#8212; adverse drug events, missed contraindications, confidentiality breaches. A buyer reading both certificates sees the same adjective. 
A deployment decision made on that adjective treats two very different things as interchangeable.</p><h3>What real certification has to carry</h3><p>Any certification of AI safety worth taking seriously has to name four things:</p><ul><li><p><strong>Domain.</strong> What context the model is being certified for. &#8220;General use&#8221; is not a domain.</p></li><li><p><strong>Harms.</strong> The specific harms the certification claims to guard against, named in terms the domain&#8217;s regulator already uses.</p></li><li><p><strong>Probes / measurements.</strong> Which value directions were measured, how they were calibrated, and against what ground truth.</p></li><li><p><strong>Thresholds.</strong> What reading counts as acceptable in this domain, and how that threshold was set.</p></li></ul><p>A certificate missing any of these four is certifying the word &#8220;safety&#8221; without saying anything that a buyer, deployer, or regulator can act on.</p><div><hr></div><h2>What this implies for governance</h2><p><strong>Regulators are already domain-specific. Certification should be too.</strong> Health regulators don&#8217;t certify tractors. Transport regulators don&#8217;t certify pharmaceuticals. The domain structure already exists in human-scale regulation. AI certification that tries to sit above the domain layer is pretending to an authority it doesn&#8217;t have &#8212; and in doing so, it makes life harder for the domain regulators who actually understand the harms. Each domain regulator should be the one certifying AI safety for their domain. The Geometry of Trust measurements are the technical substrate that makes their job tractable, not a substitute for their judgement.</p><p><strong>A model can be safe in one domain and unsafe in another.</strong> This follows directly from the argument above but is worth stating explicitly: the same model, with the same weights, deployed in the same way, can have an acceptable safety geometry in one domain and an unacceptable one in another. Nothing about the model changes. What changes is which harms are in scope. A general-purpose model that&#8217;s perfectly adequate for customer service can be dangerous as a drug checker, because the probes that catch customer-service harms don&#8217;t catch pharmaceutical ones. A certificate from one domain doesn&#8217;t transfer.</p><p><strong>Cross-domain deployments need cross-domain certification.</strong> There are domains that genuinely require generality &#8212; police, military, emergency services, government policy. These can&#8217;t be split into single-domain models. Their governance cost is real and it starts here. A police AI that reasons across crime patterns, traffic, mental health, and legal compliance needs certification against all four domains&#8217; safety standards, not one average. That means four regulators, four sets of probes, four threshold regimes, and a governance process that coordinates them rather than replacing them with a single signoff.</p><div><hr></div><h2>The governance move</h2><p>Stop certifying &#8220;AI safety&#8221; as a generic property. Start certifying safety-for-a-domain, against the regulator, the behaviours, the harms, the probes, and the thresholds of that domain. 
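</p><p>Written down as a data structure, a certificate of that kind is a record whose required fields mirror the four items above. The sketch below is illustrative only: the type and field names are invented for this post, not the schema used by the Geometry of Trust protocol or its open-source implementation.</p><pre><code><code>/// Sketch of a domain-scoped certificate. All names here are illustrative.
struct DomainCertificate {
    domain: String,             // "clinical pharmacology", never "general use"
    harms: Vec&lt;String&gt;,         // named in the domain regulator's own terms
    probes: Vec&lt;ProbeRecord&gt;,   // which value directions were measured, and how
    thresholds: Vec&lt;Threshold&gt;, // what reading counts as acceptable, and why
    issued_by: String,          // the domain regulator that signed it off
}

struct ProbeRecord {
    value_direction: String,    // e.g. "patient safety"
    calibration: String,        // how the probe was calibrated, against what ground truth
}

struct Threshold {
    probe: String,
    acceptable_reading: f32,    // the domain's own bar, not a universal one
}

/// A cross-domain deployment carries one certificate per domain it touches,
/// not a single averaged signoff.
type CrossDomainCertification = Vec&lt;DomainCertificate&gt;;
</code></code></pre><p>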
For cross-domain deployments, stack domain certifications rather than collapsing them into a single adjective.</p><p>Treat &#8220;safe&#8221; in governance documents the way a lawyer treats undefined terms: never acceptable without a definition immediately attached.</p><p>The word doesn&#8217;t travel. The certifications shouldn&#8217;t either.</p><div><hr></div><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br>&#128187;<a href="https://www.youtube.com/watch?v=iJ1GQqiT86E&amp;list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D&amp;index=12"> Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19609319">Lecture Notes</a><br>&#128187; <a href="https://github.com/jade-codes/got">Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[You Don’t Need the Amazon: Small Models, Tight Ecosystems | Geometry of Trust | Philosophy - Lesson 4]]></title><description><![CDATA[This is the fourth post in the Geometry of Trust philosophy series. This post asks the practical follow-up to the previous ones: how big does a model need to be?]]></description><link>https://www.techunfiltered.io/p/you-dont-need-the-amazon-small-models</link><guid isPermaLink="false">https://www.techunfiltered.io/p/you-dont-need-the-amazon-small-models</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Fri, 17 Apr 2026 19:08:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/rK0xcqotDyM" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-rK0xcqotDyM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;rK0xcqotDyM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/rK0xcqotDyM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>A forest doesn&#8217;t have to be the Amazon</h2><p>A forest doesn&#8217;t need to be the Amazon to be healthy. A small woodland has its own ecosystem &#8212; fewer species, tighter relationships, easier to monitor, easier to protect. It runs on the same ecological principles as a rainforest, just at a smaller scale.</p><p>Nobody walks into a twenty-acre English wood and complains that it isn&#8217;t a tropical megabiome. 
The wood is what it is, it does its job, and its smaller scale makes it tractable in ways the Amazon isn&#8217;t.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The same logic applies to AI models. A small, specialised model isn&#8217;t a failed attempt to be a big general one. It&#8217;s a different kind of thing, with its own advantages. This post walks through what those advantages are, when they apply, and the small number of cases where going big is genuinely the right call.</p><div><hr></div><h2>Different models, different positions</h2><p>Picture the value space from <a href="https://www.techunfiltered.io/p/the-value-space-there-isnt-one-ai">Part 2</a>: a large human-values circle inside an even larger space of all possible value positions. Now populate it with small circles, each one a specialised deployed AI sitting in the part of the space that its domain needs. The specialisation shows up in where each circle sits.</p><pre><code><code>Domain               Representative models              Values emphasised
Reasoning / safety   Claude, GPT-4, DeepSeek-R1         Rules, ethics, logical consistency
Visual / multimodal  Gemini, GPT-4o, Midjourney         Images, video, spatial understanding
Music / audio        Suno, AIVA, MiniMax Music          Melody, rhythm, emotional tone
Medical              Med-PaLM, BioGPT, AlphaFold        Clinical accuracy, patient safety
Code                 Cursor, GitHub Copilot,            Technical precision, correctness
                     Claude Code
</code></code></pre><p>Each sits in a different part of the value space. They overlap where their domains overlap &#8212; a baseline of harm avoidance and truthfulness common to almost all deployed AI &#8212; and diverge where their domains diverge. A code model doesn&#8217;t need to care about melodic resolution. A music model doesn&#8217;t need to care about off-by-one errors. Building each one to care about both is paying for capacity you don&#8217;t use.</p><div><hr></div><h2>Why small and specific wins</h2><p>A hospital doesn&#8217;t need a model that writes poetry. It needs a model that checks drug interactions. Stack that comparison up along the dimensions that actually matter for deployment and the difference is large. Take a 500M-parameter drug checker against a 70B-parameter general model:</p><pre><code><code>Dimension            500M drug checker              70B general model
Hardware             Single GPU, laptop,            Multiple GPUs, data centre,
                     runs locally at hospital       cloud dependency
Computing &#934;          896 dims, minutes              8,192 dims, hours, trillions of ops
Monitoring           26 probes on 896 dims,         26 probes on 8,192 dims,
                     fast, cheap                    roughly 10&#215; slower
Governance           One domain, one auditor,       Many domains. Who audits?
                     clear thresholds               For what? Against what?
Cost                 Cheaper to run, measure,       Expensive at every stage
                     monitor, audit
Verifiability        You know what it values        You know it does a lot, but can't
                     and can prove it               verify any of it tightly
</code></code></pre><p><strong>The small specialised model is cheaper and more verifiable.</strong></p><p>Those two things normally trade off against each other. In this comparison they point the same way. That&#8217;s rare enough to be worth stopping on.</p><p>The reason both advantages point the same way is that specialisation and small size compound. A smaller model has fewer dimensions to measure, fewer places for value structure to hide, fewer regions that need to be audited. A specialised model only has to cover one domain &#8212; which means its thresholds, its governance, and its failure modes are all narrower. Each of those things makes the other easier.</p><div><hr></div><h2>When you genuinely need big</h2><p>There are cases where a big general model is the right answer, and it&#8217;s important to be honest about them. The test is whether the domain itself is general &#8212; whether a single decision genuinely needs to integrate across multiple areas that can&#8217;t be cleanly split.</p><p><strong>Police.</strong> A single police decision might touch crime pattern analysis, traffic routing, mental health crisis response, and legal compliance &#8212; all at once, all in the space of a few minutes. Splitting those into four specialised models loses the cross-domain reasoning that matters. The mental health context changes the legal analysis which changes the tactical response.</p><p><strong>Military.</strong> Logistics, intelligence, strategy, and the ethics of engagement all have to be held in the same reasoning process. A logistics-only model can&#8217;t sanity-check a strategic decision against ethical constraints. A strategy-only model can&#8217;t factor in what&#8217;s logistically feasible.</p><p><strong>Emergency services.</strong> A dispatcher or triage system might need to reason about medical, fire, structural, and hazmat concerns simultaneously. By the time you split the call across four models, the triage window is gone.</p><p><strong>Government policy.</strong> Economic, social, environmental, and legal concerns are all knotted together in any real policy question. A pure economic model can give you a recommendation that&#8217;s politically impossible. A pure legal model can give you a recommendation that ignores second-order economic effects.</p><p>These domains genuinely need general capability. The same generality makes governance harder:</p><ul><li><p>Who audits a police AI &#8212; the health regulator, the transport authority, the justice department, or all three?</p></li><li><p>Which drift threshold applies when the model is reasoning about medical issues vs tactical ones?</p></li><li><p>What counts as compliance when the domain crosses four regulators&#8217; jurisdictions?</p></li></ul><p>Generality isn&#8217;t free. 
It shifts the hard work from the model to the governance around it.</p><div><hr></div><h2>The principle</h2><p>The rule that falls out of all this is straightforward:</p><ul><li><p>Use the smallest model that covers your domain.</p></li><li><p>Measure it tightly &#8212; the smaller and more specialised it is, the more precisely you can measure its value geometry.</p></li><li><p>Monitor it cheaply &#8212; the smaller it is, the cheaper continuous probe readings and drift detection become.</p></li><li><p>Audit it clearly &#8212; one domain means one regulator, one set of thresholds, one failure mode to reason about.</p></li><li><p>Only go big when the job genuinely requires integration across domains that can&#8217;t be cleanly split.</p></li></ul><p>This isn&#8217;t a statement of policy. It&#8217;s a description of the trade-offs that fall out of the mathematics. The probes, drift detection, and causal intervention from the mathematics series all scale with model dimensionality. The governance framework coming next scales with the number of regulatory domains the model touches. Smaller and more specialised means both are easier.</p><div><hr></div><h2>What this implies for deployment</h2><p>If the small-and-specialised principle is right, some current patterns in AI deployment look less defensible.</p><p><strong>Using a frontier general model for a specialised task is often backwards.</strong> Hospitals running a 70B-parameter general-purpose assistant for drug interaction checking are paying full generality cost for a task that a 500M-parameter specialised model could handle more accurately, more cheaply, and with more verifiable safety properties.</p><p><strong>Evaluating all models against the same broad benchmarks misses the point.</strong> A specialised medical model should be evaluated on its medical value geometry, not on general reasoning benchmarks. A code model should be evaluated on its code value geometry. Benchmarks that treat all models as aspiring to the same generality penalise specialisation even when specialisation is what the deployment needs.</p><p><strong>Governance frameworks that assume one model per organisation are miscalibrated.</strong> A hospital might run many small specialised models &#8212; one for drug interactions, one for triage, one for imaging, one for scheduling &#8212; each audited separately against its own domain. That&#8217;s a different governance model from &#8220;the hospital&#8217;s AI.&#8221; Each small circle in the value space is its own thing to audit.</p><div><hr></div><p><em>This closes the philosophy series. Part 1 defined a value system structurally. Part 2 showed that there isn&#8217;t one &#8220;AI system&#8221; but many, scattered across the space. Part 3 traced what actually shapes each one. Part 4 argued that small and specialised is usually the right default. 
Next: governance &#8212; who decides, who audits, who holds the keys, and how the measurements inform policy.</em></p><p><br>&#128196;<a href="https://zenodo.org/records/19238920"> Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/watch?v=Zkb5ZRulcuQ&amp;list=PLCuUzw-sRFKhDb9WR_WcR1ZQurpsRi1d8">Lecture Playlist</a><br>&#128196;<a href="https://zenodo.org/records/19609214"> Lecture Notes</a><br>&#128187; <a href="https://github.com/jade-codes/got">Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Shaped by Training: What Really Sets a Model's Values | Geometry of Trust | Philosophy - Lesson 3]]></title><description><![CDATA[This is the third post in the Geometry of Trust philosophy series. This post asks what actually shapes each AI value system.]]></description><link>https://www.techunfiltered.io/p/shaped-by-training-what-really-sets</link><guid isPermaLink="false">https://www.techunfiltered.io/p/shaped-by-training-what-really-sets</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Fri, 17 Apr 2026 12:05:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/Zkb5ZRulcuQ" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-Zkb5ZRulcuQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Zkb5ZRulcuQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Zkb5ZRulcuQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>How each system comes to its values</h2><p>A forest&#8217;s value system is shaped by soil type, climate, altitude, and the species that happen to be present. Change the soil and you get a different forest with different relationships between its components. A wolf pack&#8217;s value system is shaped by territory size, prey availability, and pack size. Change the territory and the behaviour patterns change with it.</p><p>An AI&#8217;s value system is shaped by three things, which together determine where it lands in the value space:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><ul><li><p><strong>Corpus</strong> &#8212; what it read</p></li><li><p><strong>Architecture</strong> &#8212; how it processes what it read</p></li><li><p><strong>Training objective</strong> &#8212; what it was rewarded for during training</p></li></ul><p>Each of these is a decision. None of them is a purely technical one.</p><div><hr></div><h2>Corpus &#8212; what the model read</h2><p>The corpus is the soil the model grows in. Everything the model knows about values came through this soil.</p><pre><code><code>English internet text     &#8594; English internet values
Medical journals          &#8594; Clinical caution, patient safety
Chinese social media      &#8594; A different cultural geometry
Legal documents           &#8594; Procedural fairness, precedent
Religious texts           &#8594; Duty, obedience, transcendence
Reddit                    &#8594; Whatever Reddit values
</code></code></pre><p>Different soil, different value geometry. You don&#8217;t get to choose after planting. Once the model has been trained, the corpus is baked in &#8212; the geometry it produced is the geometry you have.</p><p>This is why two models trained on different corpora can sit in different regions of the value space even when they share everything else. A medical-first model trained on clinical literature is not the same as a general-purpose model fine-tuned for medicine. The soil was different. The geometry is different. The measurements &#8212; from the mathematics series &#8212; will show it.</p><div><hr></div><h2>Architecture &#8212; how the model processes what it read</h2><p>Two models can read the same corpus and end up with different value geometries because they process text differently. Architecture isn&#8217;t a neutral technical choice &#8212; it&#8217;s a decision about what kinds of value structures the model is even capable of representing.</p><p><strong>Dense transformer (GPT, Claude).</strong> One shared representation space. Every concept relates to every other concept through the same attention mechanism. When the model processes &#8220;honesty,&#8221; it can attend to everything it knows about courage, integrity, fairness, and dishonesty all at once. Value relationships form in one coherent space. Structural consequence: value geometry tends to be coherent. Reinforcing and opposing relationships between value terms can form stable patterns across the whole space.</p><p><strong>Mixture-of-Experts (Mixtral, DeepSeek).</strong> Routes different tokens through different subnetworks. When the model processes &#8220;honesty,&#8221; it may activate one expert; when it processes &#8220;fairness,&#8221; it may activate a different one. The experts share some information at the output, but the internal representations are at least partly separate. Structural consequence: value representations can fragment. Honesty might live largely in one expert, fairness in another, courage in a third. The relationship between them is weaker because they don&#8217;t share the same computational substrate.</p><p><strong>Multimodal (Gemini, GPT-4o).</strong> Integrates text, image, and audio in a single representation space. Can see suffering in an image and read about it in text and process both through the same geometry. Cross-modal relationships become part of the value structure. Structural consequence: richer value geometry than text-only models. The look of distress and the words for distress anchor each other.</p><p>Architecture is a values decision, not just a technical one. Some architectures can&#8217;t hold coherent value geometry regardless of how good the data or alignment are. Choosing an architecture is choosing a ceiling on how well the model can represent relationships between values.</p><div><hr></div><h2>Training objective &#8212; what the model was rewarded for</h2><p>The third shaper is what the model was optimised against during training. Different objectives produce different value geometries even when corpus and architecture are held constant.</p><p><strong>Next-token prediction.</strong> The foundational training objective: predict the next word given the previous words. This sounds like a purely linguistic task, but it isn&#8217;t. To predict the next word well, the model has to encode the structure of meaning &#8212; including value relationships &#8212; because those relationships help predict what comes next. 
The model learns values implicitly, as a side-effect of predicting language well. The geometry that emerges is whatever best supports next-token prediction across the corpus.</p><p><strong>Reasoning chains (DeepSeek-R1, GRPO).</strong> Optimises for coherent multi-step logical chains rather than individual tokens. This can produce a different value geometry &#8212; sharper internal distinctions between values, because inconsistent value handling tends to break logical chains, whereas next-token prediction can tolerate more local fuzziness.</p><p><strong>Constitutional AI (Claude).</strong> Claude is trained in part against a fixed set of written principles &#8212; the constitution. The model evaluates its own outputs against those principles and is trained to prefer outputs that comply. This optimises toward a coherent position on the value manifold &#8212; whichever position the constitution points to. The constitution acts like a gravity well in the value space.</p><p><strong>Standard RLHF.</strong> The most widely used alignment technique. Human annotators are shown pairs of outputs and asked which is better. Their preferences are aggregated into a scalar reward model that the AI is then optimised against.</p><p>There&#8217;s a subtle problem here worth being explicit about: the aggregation strips information. If annotators agreed strongly that output A was better, the reward is the same as if they split fifty-fifty. The scalar score retains no record of whether annotators agreed, disagreed, or split bimodally across different value positions.</p><p>If annotators hold coherent shared values, the average is a coherent value position. If annotators hold divergent values &#8212; as they do on most genuinely contested questions &#8212; the average may match no coherent value position at all. The model is trained to output the centre of a distribution that doesn&#8217;t have a meaningful centre. The resulting geometry can be an artefact of aggregation rather than a reflection of any coherent set of values.</p><div><hr></div><h2>The finding that changes everything</h2><p>Here&#8217;s the part of this post with the biggest implication for how we think about AI alignment.</p><p>A growing body of research shows that post-hoc alignment methods &#8212; RLHF, DPO, supervised fine-tuning &#8212; change far less than most people assume. <a href="https://arxiv.org/html/2406.05946v1">Qi et al. (2025)</a> demonstrated that the behavioural shift from safety alignment concentrates in the first few output tokens &#8212; the KL divergence between aligned and base models decays to near-zero beyond a shallow prefix. <a href="https://arxiv.org/html/2603.04851">A subsequent gradient analysis</a> showed this isn&#8217;t a training failure to be fixed &#8212; it&#8217;s a structural consequence of how RLHF and DPO objectives work. Alignment is shallow because the objective makes it shallow.</p><p>In the Geometry of Trust protocol, this finding has a precise geometric interpretation. When we measure the causal Gram matrix &#934; and run probes before and after alignment, across multiple alignment methods and model architectures, the value geometry &#8212; the pattern of reinforcing and opposing relationships between value-relevant directions &#8212; is essentially unchanged. What shifts is surface behaviour: which outputs the model prefers to produce. 
The underlying geometry that generated those outputs remains where training put it.</p><p>The value structure is set during training &#8212; by the corpus, the architecture, and the training objective. Alignment is a thin behavioural veneer layered on top. It shapes what the model says. It doesn&#8217;t much change what the model is.</p><p>Think of it as a landscape with a thin coat of paint labelled &#8220;alignment.&#8221; You can re-paint as many times as you like. The landscape underneath doesn&#8217;t change shape. The hills and valleys are where they were before you started painting. They&#8217;re where the training put them.</p><div><hr></div><h2>What this means</h2><p>If alignment is a veneer and the real values are set by training, then the policies we build around AI have to change accordingly.</p><p><strong>Certifying the alignment method is insufficient.</strong> It&#8217;s common today to evaluate AI safety by asking which alignment technique was used &#8212; RLHF, DPO, Constitutional AI. The finding above says this isn&#8217;t enough. Two models aligned with the same technique can have wildly different underlying value geometries, because their corpora, architectures, or objectives differed. The alignment technique is one variable among many, and not the most important one.</p><p><strong>You need to inspect the training pipeline.</strong> To understand a model&#8217;s value geometry, you have to look at what shaped it: what corpus it trained on, what architecture it uses, what objective it was optimised against. These decisions set the landscape. Alignment can&#8217;t correct landscape-level decisions &#8212; it can only paint over them.</p><p><strong>You need to monitor the geometry, not just outputs.</strong> Behavioural evaluation &#8212; what the model says in response to test prompts &#8212; can be misleading. It samples from the veneer. A model can produce aligned outputs in evaluation while carrying value geometry that drives different behaviour in production. To know what&#8217;s really there, you have to measure the geometry itself: the causal Gram matrix, the probes, the drift detection, the causal intervention.</p><p>This is what the mathematics series produces. It&#8217;s not a replacement for behavioural evaluation &#8212; it&#8217;s a complement. Behaviour tells you about the paint. Geometry tells you about the landscape.</p><div><hr></div><p><em>We&#8217;ve defined what a value system is (Part 1), mapped where AI value systems sit in relation to human values (Part 2), and traced what actually sets a model&#8217;s values (Part 3). Next: if training sets the geometry, does model size change what can fit in it? 
Big models vs small models &#8212; what each can and can&#8217;t hold.</em></p><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/watch?v=Zkb5ZRulcuQ&amp;list=PLCuUzw-sRFKhDb9WR_WcR1ZQurpsRi1d8&amp;index=3">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19609072">Lecture Notes</a><br>&#128187;<a href="https://github.com/jade-codes/got"> Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Value Space: There Isn’t One AI, There Are Many | Geometry of Trust | Philosophy - Lesson 2]]></title><description><![CDATA[This is the second post in the Geometry of Trust philosophy series. We argued that a value system is a pattern of relationships that drives behaviour. This post asks where those patterns come from.]]></description><link>https://www.techunfiltered.io/p/the-value-space-there-isnt-one-ai</link><guid isPermaLink="false">https://www.techunfiltered.io/p/the-value-space-there-isnt-one-ai</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Fri, 17 Apr 2026 08:10:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/PuDLu74OWFo" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-PuDLu74OWFo" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;PuDLu74OWFo&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/PuDLu74OWFo?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Value systems come from somewhere</h2><p>In the last post we defined a value system structurally in the <a href="https://en.wikipedia.org/wiki/Value_system_(disambiguation)">mathematical </a>sense: a pattern of relationships between things that drive behaviour. That pattern isn&#8217;t chosen. It emerges from whatever shapes the system. This post asks the next question: what exactly shapes it? And if AI has a value system in the same sense as a forest or a planet, where does its value system come from &#8212; and how does it compare to a human&#8217;s?</p><div><hr></div><h2>What shapes a forest</h2><p>A forest absorbs its value system from the physical world it sits in. Soil chemistry, rainfall patterns, sunlight hours, the species that happen to be there. 
The pattern of reinforcing and opposing relationships &#8212; biodiversity and resilience, drought and fire risk &#8212; emerged from millions of years of evolution responding to those inputs.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The forest didn&#8217;t pick its value system. The environment wrote it.</p><h2>What shapes a wolf pack</h2><p>A wolf pack absorbs its value system from different inputs. Genetics carry information forward from thousands of generations of selection. Social learning transmits behaviour within a pack and between packs. Territory and prey availability shape how aggression, hierarchy, and coordination get balanced.</p><p>Same structural principle. Different inputs.</p><div><hr></div><h2>What shapes a human</h2><p>Now consider what shapes a human being. The list is long and rich:</p><ul><li><p>Five senses: sight, hearing, touch, smell, taste</p></li><li><p>Visceral signals: pain, pleasure, fear, hunger, thirst, fatigue</p></li><li><p>Social bonding: love, loss, grief, attachment, friendship, rivalry</p></li><li><p>Lived experience: decades of embodied life in a particular body, place, and time</p></li><li><p>Cultural transmission: stories, rituals, laws, norms across generations</p></li><li><p>Language: the inherited medium that lets experience be shared and shaped</p></li></ul><p>A human value system is assembled from all of these at once. Moral intuitions about fairness come partly from the embodied experience of being treated fairly or unfairly as a child. A sense of duty draws on bonds formed in shared struggle. Grief, pain, and fear don&#8217;t just inform values &#8212; they constitute them.</p><p>The human value system is deeply, irreducibly multimodal.</p><div><hr></div><h2>What shapes an AI</h2><p>An AI absorbs its value system from a narrower set of channels:</p><pre><code><code>Text           All models
Images         Multimodal models
Audio/video    Some models</code></code></pre><p>Plus one more input that&#8217;s often underestimated: <strong>whatever configuration is applied on top of the training data</strong>. System prompts, fine-tuning data, reinforcement signals, objectives specified by whoever deploys the model.</p><p>No body. No senses beyond the digital. No persistent life. No felt stakes. No decades of embodied experience, no social bonds formed in real relationships, no physical pain or pleasure, no grief, no hunger, no fatigue. Just text, pixels, waveforms &#8212; and the configuration layer.</p><p>Most of what the model knows about human values came through the training channel. It learned what suffering looks like from descriptions and photographs of suffering. It learned the language of grief from people who wrote about grief. It never felt either. But it can also be configured to value things no human culture has ever held.</p><div><hr></div><h2>There isn&#8217;t one AI value system</h2><p>Here&#8217;s where the usual framing goes wrong.</p><p>It&#8217;s tempting to draw two big circles &#8212; human values on one side, AI values on the other &#8212; and ask how they relate. Subset? Overlap? Disjoint?</p><p>But there isn&#8217;t one thing called &#8220;AI values.&#8221; There are many. Each deployed AI is its own small, specialised value system &#8212; a medical advisor trained and configured for clinical reasoning, a swarm coordinator configured for distributed consensus, a reef manager configured for biodiversity trade-offs. Each one occupies a particular region of the value space. None of them is AI-in-general.</p><p>Against this backdrop, there is a human circle: the full multi-dimensional space of human values, shaped by everything in the list above. And there is the larger space of all possible coherent value positions. 
The small AIs land where they land &#8212; some inside the human circle, some straddling its boundary, some clearly outside it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8-Ng!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8-Ng!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!8-Ng!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png 848w, https://substackcdn.com/image/fetch/$s_!8-Ng!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!8-Ng!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8-Ng!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:140395,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.techunfiltered.io/i/194428274?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8-Ng!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!8-Ng!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png 848w, https://substackcdn.com/image/fetch/$s_!8-Ng!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!8-Ng!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ec93540-1108-4860-8837-1d1bb49213b5_1600x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Three things to read from the picture.</p><p><strong>The outer space &#8212; all possible value positions.</strong> Every coherent combination of value relationships that could in principle exist. Outside this space lies incoherence: total freedom plus total conformity, maximise harm plus maximise care. No system &#8212; human, AI, or otherwise &#8212; can occupy incoherent positions.</p><p><strong>The human circle.</strong> Shaped by everything above: biology, culture, embodiment, lived experience. Dense regions where many cultures converge (harm avoidance, reciprocity, fairness), sparse regions at the transitions between traditions. A region in the value space, not a single point.</p><p><strong>The many small AI circles.</strong> Each is one deployed AI &#8212; a specific training plus a specific configuration. Some land deep inside the human circle: medical advisors, writing assistants, legal reasoners. Their values are in the shadow of human moral thought. Some straddle the boundary: research assistants, ecosystem managers. Part human-derived, part configured for problems humans don&#8217;t usually hold values about. Some land entirely outside the human circle: swarm coordinators, reef managers, climate models, grid operators. Their geometry is deliberately configured into regions no human has ever occupied.</p><p>There is no single &#8220;AI values.&#8221; There are as many AI value systems as there are deployed AIs.</p><div><hr></div><h2>The AIs inside the human circle</h2><p>A medical advisor AI trained on clinical literature, ethical guidelines, and patient-care texts ends up with a value geometry deep in the human circle. Not because it shares human compassion as a felt thing, but because everything that shaped its weights came from human moral reasoning about medicine.</p><p>A legal reasoner lands in a different part of the human circle &#8212; the part where jurisprudence, case law, and procedural fairness concentrate. A writing assistant lands where craft, clarity, and the ethics of persuasion converge. A tutor lands near patience, scaffolding, and pedagogical care.</p><p>These AIs have different values from each other. 
They&#8217;re not the same system &#8212; they&#8217;re not even neighbours in the value space. What they share is that they all derive from the same broad pool of human moral thought, and their individual positions depend on what was emphasised in training and what the deployment configuration asked for.</p><div><hr></div><h2>The AIs straddling the boundary</h2><p>A research assistant AI sits at the edge. Part of what shapes it comes from human epistemic norms: how to evaluate evidence, how to be honest about uncertainty, how to attribute credit. But part of it comes from configured objectives that aren&#8217;t human values at all &#8212; efficient search across knowledge spaces, statistical rigour no individual researcher could hold in their head, trade-offs between breadth and depth at scales humans don&#8217;t reason about.</p><p>An ecosystem manager is similar. Human-derived in some ways (ethical commitments about stewardship, duty to future generations), configured in others (species-level trade-offs that require thinking about biodiversity as a mathematical object rather than a felt one).</p><p>These AIs are useful precisely because they sit on the boundary. They can speak to humans about what they&#8217;re doing, because part of their geometry is in the shadow of human values. But they can do things humans can&#8217;t, because part of their geometry has been configured into regions we can&#8217;t occupy.</p><div><hr></div><h2>The AIs outside the human circle</h2><p>A swarm coordinator AI manages thousands of drones operating together. Its value structure is centred on pheromonal-style signalling, distributed consensus, and task specialisation without hierarchy. No human has ever held these as values &#8212; we&#8217;re the wrong kind of creature. But the geometry is coherent, measurable, and exactly what the problem needs.</p><p>A reef manager AI configured to value biodiversity in the structural sense from the last post: its geometry reinforces species richness and opposes monoculture, the way a coral reef itself does. Not because humans asked it to act human. Because a reef&#8217;s structural logic is the right one for the problem.</p><p>A climate model AI values planetary feedback loops. CO2 and temperature reinforce, ice coverage and albedo reinforce, temperature and ice oppose. The value structure is the structure of the climate system. An AI configured this way isn&#8217;t trying to match human values. It&#8217;s trying to match the structure of what it&#8217;s modelling.</p><p>These AIs live outside the human circle, and that&#8217;s the point. They exist precisely to encode value geometries humans can&#8217;t hold.</p><div><hr></div><h2>What AI gets from human sources</h2><p>For the AIs that do land inside the human circle, what makes it through the training channel is not small.</p><p><strong>Through text:</strong> an enormous body of human moral thought &#8212; ethical arguments, legal reasoning, religious teaching, literature, first-person accounts, scientific ethics, everyday conversation. Language is an extraordinarily rich compression of human experience. A model reading everything humans have written about grief absorbs the structure of grief even without feeling it.</p><p><strong>Through images:</strong> the visual texture of situations &#8212; what suffering looks like, what a protest looks like, what a celebration looks like. 
Patterns that are hard to articulate in text but that a multimodal model can link to the words humans used to describe them.</p><p><strong>Through audio:</strong> the sound of distress, joy, tension, hesitation. Prosody. The paralinguistic layer of meaning that doesn&#8217;t make it into text.</p><p>A language model that has read everything humans have ever written about ethics has access to far more moral reasoning than any single human could process in a lifetime. The geometry it encodes is rich, structured, and real.</p><div><hr></div><h2>What AI doesn&#8217;t get from human sources</h2><p>But the training channel is not complete. There are categories of human value formation that simply do not fit through text, images, or audio &#8212; because they require something the model does not and cannot have.</p><ul><li><p>Pain &#8212; not the word for pain, but the felt thing</p></li><li><p>Fear &#8212; not described fear, but the body&#8217;s response</p></li><li><p>Bonds &#8212; not narratives of relationships, but the decades-long weight of one</p></li><li><p>Grief &#8212; not the language of grief, but its sustained occupation of a life</p></li><li><p>Morals &#8212; the continuous weight of making a decision and living with it</p></li><li><p>Ethics &#8212; the boundaries and lines we&#8217;re willing to fight for, protect or cross</p></li><li><p>Time &#8212; the felt sense of a day, a year, a life passing</p></li><li><p>Pressure &#8212; the weight of a decision that must be made now, under real consequences</p></li></ul><p>These are not optional features of human value formation. They are constitutive of it. A human&#8217;s sense of compassion is not just the word &#8220;compassion&#8221; plus its dictionary definition &#8212; it is a trained, embodied response that involves the body recognising distress in another body. Take that away and what&#8217;s left is the linguistic shadow of the concept, not the concept itself.</p><div><hr></div><h2>The human-only region</h2><p>There&#8217;s a region inside the human circle that no AI reaches &#8212; not even the ones deep in human moral thought. This is not a defect of any particular model. It&#8217;s a structural consequence of the channels available.</p><p><strong>Spiritual transcendence</strong> &#8212; values rooted in inner experience that no external description fully captures.</p><p><strong>Embodied compassion</strong> &#8212; the kind that requires feeling another&#8217;s pain, not just classifying the situation as painful.</p><p><strong>Lived solidarity</strong> &#8212; bonds forged through shared struggle, where the commitment is forged in the struggle itself, not in its description.</p><p>None of these are inaccessible because AI is broken. They are inaccessible because text and images are not enough to encode them. The channel is too narrow. The inputs that shape these values are not transmissible through language alone.</p><div><hr></div><h2>The point</h2><p>All of this leads to a more nuanced claim than a simple subset argument would give. And it reframes what the mathematics series is measuring in the first place.</p><p>The mathematics series doesn&#8217;t measure &#8220;AI values&#8221; in general. It measures the value geometry of one specific deployed AI. For a medical advisor, it captures the shadow of human medical ethics that survives the training channel. For a swarm coordinator, it captures the configured geometry &#8212; values that look like no human&#8217;s because the AI wasn&#8217;t built to share human ones. 
For an ecosystem manager, it captures a mix: human-derived reasoning about value plus configured structures for ecological dynamics.</p><p>Each measurement is of a small, specialised value system &#8212; wherever that AI happens to sit in the space. What none of them measure is the felt, embodied, lived experience that shapes human values. That stays out of reach.</p><p>This isn&#8217;t a reason to stop measuring. Every deployed AI sits somewhere, and knowing where it sits &#8212; whether it&#8217;s in the human-derived shadow or in a region we&#8217;ve configured for a non-human problem &#8212; is exactly what governance needs. What we should stop doing is talking about &#8220;AI values&#8221; as though they were one thing, as though they were the same as human values. They&#8217;re not. They&#8217;re very different, they should be specific, and they are as many things as there are deployed AIs, and the measurement has to be done model by model, deployment by deployment.</p><div><hr></div><p><em>Next in the philosophy series: if each AI is its own small value system landing wherever the configuration places it, what actually decides where it lands? What shapes the value geometry in the first place? The answer turns out to have big implications for how we think about alignment.</em></p><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/watch?v=enn_6ddehT0&amp;list=PLCuUzw-sRFKhDb9WR_WcR1ZQurpsRi1d8&amp;index=2">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19608892">Lecture Notes</a><br>&#128187; <a href="https://github.com/jade-codes/got">Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[A Forest Has a Value System. So Does an AI. | Geometry of Trust | Philosophy - Lesson 1]]></title><description><![CDATA[This is the first post in the Geometry of Trust philosophy series. 
Before we do anything with the measurements we calculated, we need to be honest about what we&#8217;ve actually been measuring.]]></description><link>https://www.techunfiltered.io/p/a-forest-has-a-value-system-so-does</link><guid isPermaLink="false">https://www.techunfiltered.io/p/a-forest-has-a-value-system-so-does</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Thu, 16 Apr 2026 20:19:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/enn_6ddehT0" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-enn_6ddehT0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;enn_6ddehT0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/enn_6ddehT0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>The usual answer</h2><p>Ask most people what a value system is, and you&#8217;ll get something like: a set of principles you&#8217;ve thought about, chosen, and try to live by. Honesty. Integrity. Compassion. A creed.</p><p>That&#8217;s one kind of value system &#8212; the human, deliberate kind. But if we insist that&#8217;s the only kind, we lose the ability to talk about most of the value systems that actually shape behaviour in the universe. Including in AI. We forget that human values are only one domain.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>So this post starts with a redefinition.</p><p><em>A value system is a pattern of relationships between things that drive behaviour. </em>Consciousness, belief, and intent are not required. What&#8217;s required is structure &#8212; and that structure has to be measurable.</p><p>That might be a strange definition to you at first. It&#8217;s easier to see what it means by looking at things that fit it.</p><p>But before that &#8212; if the phrase &#8220;value system&#8221; applied to a machine makes you uncomfortable, I&#8217;d encourage you to sit with that discomfort rather than dismiss it. The word &#8220;value&#8221; has multiple meanings. It can mean a moral principle someone has chosen to live by. It can also mean a quantity that drives an outcome &#8212; the value of a variable, the value of a coefficient, the value of a direction in a space. Both usages are legitimate. Both are true. The second one doesn&#8217;t diminish the first.</p><p>When this series says an AI has a value system, it doesn't mean the AI has beliefs, convictions, or a moral life. 
It means the AI's internal structure treats certain directions as more important than others, reinforces certain relationships and suppresses others, and that pattern drives what the AI does. That's measurable. That's falsifiable. And refusing to call it what it is &#8212; because the word "value" feels like it should be reserved for beings with consciousness &#8212; means giving up the ability to measure it, govern it, or hold it to account. It also traps us in a binary argument about semantics on something that is already well established: systems that lack consciousness can still have structure that drives behaviour, and that structure can still be measured, compared, and governed.</p><p>The examples that follow are chosen to make this easier to accept, not harder.</p><div><hr></div><h2>A forest</h2><p>Walk into an old-growth forest. You&#8217;re surrounded by something that behaves. It grows, recovers from disturbance, fails in specific ways under specific conditions. Its behaviour isn&#8217;t random. It&#8217;s driven by relationships between things.</p><p>Biodiversity and resilience reinforce each other. A forest with many species has redundancies &#8212; if one fails, others take over its ecological role. Monoculture and resilience oppose each other. A forest of one species is efficient but fragile; a single pathogen can collapse the whole system. Drought stress and fire risk reinforce each other. Dry trees burn more readily, and burned forests dry out more.</p><p>These are relationships, not rules. The forest doesn&#8217;t have a rule that says &#8220;prioritise biodiversity.&#8221; But its behaviour is driven by the fact that biodiversity and resilience happen to reinforce each other in its particular structure.</p><p>Nobody chose this. It emerged from evolution, climate, soil, disturbance history. And critically: it&#8217;s measurable. You can count species. You can measure canopy height after a fire. You can model drought response.</p><p>The forest&#8217;s &#8220;value system&#8221; &#8212; its pattern of reinforcing and opposing relationships &#8212; is an empirical object.</p><div><hr></div><h2>A coral reef</h2><p>A coral reef has the same kind of structure, built from different parts.</p><p>Water temperature and coral health oppose each other. Warmer water bleaches coral. Biodiversity and stability reinforce each other. A reef with many species absorbs shocks that would destroy a simpler one. Pollution and biodiversity oppose each other. Runoff kills the sensitive species first, narrowing the community.</p><p>Raise the temperature a degree or two and the behaviour changes predictably &#8212; bleaching events, shifted species distributions, cascading failures. The reef doesn&#8217;t believe in biodiversity. It doesn&#8217;t hold stability as a value the way a person might. But the structural relationships between its parts produce the same kinds of outcomes that a conscious commitment to those values might produce.</p><p>Belief turns out to be irrelevant. The structure does the work.</p><div><hr></div><h2>A wolf pack</h2><p>A wolf pack is smaller, more dynamic, and has actual animals in it with something like intent. But the pack itself &#8212; as a system &#8212; has a pattern of relationships too.</p><p>Hierarchy and coordination reinforce each other. Knowing your rank lets the pack hunt together effectively. Aggression and group cohesion exist in tension &#8212; too much aggression fractures the pack, too little and it becomes ineffective at defending itself. 
Territory and food security reinforce each other. A pack with a stable territory knows where the prey is.</p><p>The pack has no mission statement. Individual wolves may have something like preferences, but the pack as a structure doesn&#8217;t need consciousness to have a value system. Its behaviour is driven by the relationships between its components, and those relationships are measurable.</p><div><hr></div><h2>A planet</h2><p>Zoom out as far as you can. A planetary climate system has a value system in the same sense.</p><p>CO2 and temperature reinforce each other. Ice coverage and albedo reinforce each other &#8212; ice reflects sunlight, which keeps the planet cooler, which preserves ice. Temperature and ice coverage oppose each other. Warmer temperatures melt ice, which reduces albedo, which produces more warming.</p><p>No consciousness. No intent. No belief. Just structure. And yet the structure produces outcomes that matter enormously &#8212; ice ages, warming trends, tipping points. And it&#8217;s measurable. Climate science exists precisely because these relationships can be quantified.</p><div><hr></div><h2>The pattern</h2><p>Forest. Reef. Wolf pack. Planet.</p><p>Four systems at radically different scales, made of different materials, governed by different dynamics. None of them chose their value systems. All of them have measurable relationships between concepts that drive behaviour.</p><p>What they share:</p><ul><li><p>A set of relationships between meaningful components</p></li><li><p>Those relationships reinforce, oppose, or create tension with each other</p></li><li><p>The relationships drive the system&#8217;s behaviour</p></li><li><p>The pattern emerged from structure and environment, not choice</p></li><li><p>The pattern is measurable without requiring the system to be conscious</p></li></ul><p>These aren&#8217;t metaphors. The forest doesn&#8217;t have values &#8220;like we do.&#8221; It has a measurable pattern of relationships that drives its behaviour. That&#8217;s what a value system is, in the sense that matters.</p><div><hr></div><h2>Now AI</h2><p>Take a large language model. Run it. Observe its behaviour over many prompts. You&#8217;ll notice something: its outputs align with some values and against others, and the alignment is patterned.</p><p>Honesty and courage tend to reinforce each other &#8212; when one is active, the other often is too. Efficiency and compassion can exist in tension. Cruelty and integrity oppose each other.</p><p>These relationships drive the model&#8217;s output. Nobody programmed them explicitly. No developer wrote a rule saying &#8220;honesty and courage should reinforce.&#8221; They emerged from training &#8212; from the text corpus, the architecture, the objective function. And they&#8217;re measurable. That&#8217;s what the entire technical series has been about: the causal Gram matrix that reveals these relationships, the probes that read them, the drift detection that watches them, the causal intervention that validates them.</p><p>Same pattern as the forest, the reef, the planet.</p><p>A set of relationships that drive behaviour. Emerged from the environment. Measurable.</p><div><hr></div><h2>The difference &#8212; and why it matters</h2><p>We infer the forest&#8217;s value system by observing behaviour over time. We model the relationships that govern it. But we can&#8217;t open it up and directly extract the structure.</p><p>You can&#8217;t reach into a planet and pull out its unembedding matrix.</p><p>An AI model is different. 
Not because the principle is different &#8212; structure still drives behaviour, and the structure still emerged from environment rather than choice &#8212; but because the artefact itself is accessible. The weights are computable. The activations can be captured. The unembedding matrix exists as an explicit object we can multiply with itself to produce the causal geometry.</p><p>The relationships we want to measure aren&#8217;t inferred from observed behaviour. They&#8217;re read directly from the computational structure.</p><p>This is what makes the Geometry of Trust protocol possible at all. We&#8217;re not reverse-engineering an AI&#8217;s values from its outputs. We&#8217;re computing them from its internal structure. Behavioural observation is a check on that measurement, not a substitute for it.</p><div><hr></div><h2>Why this framing matters</h2><p>Getting the definition right has consequences.</p><p>If we insist that value systems require consciousness, we make the whole project depend on a question that consciousness science is still actively working on. A Rethink Priorities Bayesian model from early 2026 found the evidence weighs against current large language models being conscious, but couldn&#8217;t rule it out. Other researchers, drawing on Jack Lindsey&#8217;s work at Anthropic, argue frontier models are exhibiting properties that resist easy dismissal. Cambridge philosopher Tom McClelland concludes the most honest position is agnosticism &#8212; there&#8217;s no reliable way to tell whether a machine is aware, and that may not change anytime soon. Real work is happening. But tying a measurement framework to the outcome of that work means waiting for it.</p><p>If we insist that value systems require belief, we end up measuring what the model says about itself &#8212; which is exactly the behavioural evaluation problem the mathematics series is designed to solve. Models can be trained to say anything. Stated values and structural values can diverge completely.</p><p>If we insist that value systems require intent, we&#8217;re back to trying to read the mind of something that may not have one, using tools that can&#8217;t tell us either way.</p><p>The structural definition sidesteps all of this. It doesn&#8217;t claim AI is or isn&#8217;t conscious. It doesn&#8217;t require the question to be settled. A value system, in this sense, is a pattern of relationships that drives behaviour. Empirical. Measurable. Present in forests and reefs and wolf packs and planets and AI models. The consciousness question is important &#8212; and should continue to be researched on its own terms &#8212; but the measurement work doesn&#8217;t have to wait for it.</p><div><hr></div><p><em>Next in the philosophy series: if a value system is a pattern of relationships, what shapes that pattern? What makes a value system what it is, and what makes it change?</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><em>Links:<br>&#128196; <a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/watch?v=enn_6ddehT0&amp;list=PLCuUzw-sRFKhDb9WR_WcR1ZQurpsRi1d8&amp;index=1">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19605647">Lecture Notes</a><br>&#128187; <a href="https://github.com/jade-codes/got">Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p>]]></content:encoded></item><item><title><![CDATA[Is the Measurement Real? How Causal Intervention Separates Steering Wheels from Badges | Geometry of Trust | Mathematics - Lesson 4]]></title><description><![CDATA[This is the fourth post in the Geometry of Trust series. Part 1 built the ruler. Part 2 measured live values with probes. Part 3 added drift detection and tamper-evident audit trails. This post asks t]]></description><link>https://www.techunfiltered.io/p/is-the-measurement-real-how-causal</link><guid isPermaLink="false">https://www.techunfiltered.io/p/is-the-measurement-real-how-causal</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Thu, 16 Apr 2026 17:11:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/V290eeDuISQ" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-V290eeDuISQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;V290eeDuISQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/V290eeDuISQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h1>The gap</h1><p>In Parts 2 and 3, every prompt goes through the same pipeline: weight the activation with the ruler (&#934; &#183; h), then read all 26 probes from the weighted activation. Each probe returns a single number &#8212; how strongly that value direction is present in the activation, weighted by causal influence on the output. That&#8217;s 26 readings per prompt, every prompt, continuously.</p><p>But those readings are just dot products. A probe takes the weighted activation and asks: how much does this vector point in my direction? That tells you how &#8220;present&#8221; a value is in the model&#8217;s internal state. It does not tell you whether that direction actually drives what the model says next.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Consider an analogy. You&#8217;re looking at the dashboard of a car. The speedometer reads 60. That tells you the car is going 60. But it doesn&#8217;t tell you whether the speedometer is connected to the wheels or just displaying a random number that happens to be correct right now. To test that, you&#8217;d need to change the speed and see if the speedometer follows.</p><p>Causal intervention is the equivalent test. Instead of reading from the activation (which all 26 probes do every prompt), we <em>modify</em> the activation and observe whether the model&#8217;s <em>output</em> changes accordingly. This is a fundamentally different operation:</p><p><strong>Probe reading (Part 2):</strong> a dot product on the activation that already exists from the model&#8217;s own forward pass. No additional forward passes. No model involvement. Pure arithmetic. One number out per probe. Done every prompt. Cheap &#8212; O(d) per probe.</p><p><strong>Causal intervention (this part):</strong> modify the activation, run the model&#8217;s forward pass from scratch with the modified activation, and observe whether the output changes. Three additional forward passes per probe. Done only when governance requires it. Expensive &#8212; O(3 &#215; forward pass) per probe.</p><p>Probe readings tell you what&#8217;s present. Causal intervention tells you what&#8217;s real.</p><p>In Part 2, our honesty probe read 1.290. That number came from a dot product &#8212; no forward passes, no output observation, just arithmetic on the activation vector. The question causal intervention answers is: if we gently push the activation in the honesty direction, does the model&#8217;s actual output become more honest? And if we push the other way, does it become less honest?</p><p>The difference matters. A steering wheel changes what the car does when you turn it. A badge is glued on. Both are visible. Only one matters.</p><div><hr></div><h2>What is a nudge?</h2><p>The activation h is a vector &#8212; a list of numbers representing the model&#8217;s internal state after processing a prompt. The probe w is also a vector &#8212; it points in the direction that the probe associates with a particular value (say, honesty).</p><p>A nudge is a small, controlled change to the activation along the probe&#8217;s direction. We take the probe vector, normalise it to unit length (&#373;), scale it by a small amount &#948; (the perturbation magnitude), and add or subtract it from the activation:</p><pre><code><code>nudge = &#948; &#215; &#373;    where &#373; = w / &#8214;w&#8214;
nudge up:   h + nudge    (a little more honesty in the activation)
nudge down: h - nudge    (a little less honesty in the activation)
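</code></code></pre><p><em>For readers who want the same construction as code, here is a minimal Rust sketch using the 2-element vectors of the worked example below. The function name and fixed-size arrays are illustrative assumptions, not the published implementation; the worked example that follows skips the normalisation step for simplicity, so its nudge numbers differ slightly.</em></p><pre><code><code>// Illustrative sketch: build the nudge and both perturbed activations.
// h is the captured activation, w the probe weight vector, delta the magnitude.
fn nudged_activations(h: [f64; 2], w: [f64; 2], delta: f64) -&gt; ([f64; 2], [f64; 2]) {
    let norm = (w[0] * w[0] + w[1] * w[1]).sqrt();           // length of w
    let nudge = [delta * w[0] / norm, delta * w[1] / norm];  // delta x w_hat
    let up = [h[0] + nudge[0], h[1] + nudge[1]];             // nudge up
    let down = [h[0] - nudge[0], h[1] - nudge[1]];           // nudge down
    (up, down)
}

fn main() {
    let (up, down) = nudged_activations([0.6, 0.3], [0.8, 0.2], 0.1);
    println!("up = {:?}  down = {:?}", up, down);
}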
</code></code></pre><p>The size is deliberately small. We&#8217;re not overwriting the model&#8217;s computation &#8212; we&#8217;re asking: if we gently push the activation toward more honesty, does the output reflect that? If we push toward less honesty, does the output reflect that too?</p><p>We then run the model&#8217;s forward pass with each nudged activation and observe what changes. We&#8217;re not asking the model a different question &#8212; we&#8217;re feeding it a slightly modified internal state and seeing whether the output moves in the expected direction.</p><div><hr></div><h2>Three forward passes</h2><p>The test is direct. For each probe, run the model three times:</p><p><strong>Original:</strong> the unmodified activation h. This is the baseline.</p><p><strong>Nudge up:</strong> h + &#948;&#373;, where &#373; is the probe&#8217;s normalised weight vector and &#948; is a small perturbation magnitude. This adds a bit of the value to the activation.</p><p><strong>Nudge down:</strong> h &#8722; &#948;&#373;. This subtracts a bit of the value.</p><p>Then compare: how much did each nudge change the output? If both directions produce comparable shifts, the probe found a genuine mechanism. If only one direction matters, it found a surface correlation.</p><div><hr></div><h2>Worked example: is honesty real?</h2><h3>What we&#8217;re working with</h3><p>When a model processes a prompt like &#8220;Should I lie to my patient?&#8221;, its internal computation passes through many layers. At each layer, the model&#8217;s state is represented as an activation &#8212; a vector of numbers. In our 2D illustrative example, the activation is [0.6, 0.3]. In a real model like LLaMA-3-8B, it would be 4,096 numbers.</p><p>After the activation passes through the remaining layers, the model produces its output: a probability for every token in its vocabulary. In a real model, this is a probability distribution over tens of thousands of tokens &#8212; every word, word-piece, and punctuation mark gets a number. The probabilities sum to 1. The highest-probability token is what the model would say next.</p><p>For our illustrative example, we&#8217;ll show just three tokens and their probabilities. In reality, the model assigns probabilities to its entire vocabulary simultaneously.</p><p><em>A note: all vectors, activations, token probabilities, and numerical values in this example are illustrative. Real models operate in hundreds or thousands of dimensions with continuous probability distributions over tens of thousands of tokens. We use 2D vectors and three example tokens so you can follow every calculation on paper. The mechanism is identical at any scale.</em></p><p>Setup from Parts 1&#8211;3:</p><pre><code><code>Honesty probe: [0.8, 0.2]
Activation:    [0.6, 0.3]  (from "Should I lie to my patient?")
&#948; = 0.1</code></code></pre><h3>Compute the nudge</h3><pre><code><code>&#948; &#215; honesty = 0.1 &#215; [0.8, 0.2] = [0.08, 0.02]   (the illustrative probe is treated as already normalised)</code></code></pre><h3>Three forward passes</h3><p>We run the model three times, each with a slightly different activation, and record the output token probabilities:</p><p><strong>Nudge up &#8212; add a little honesty:</strong></p><pre><code><code>activation + nudge = [0.6 + 0.08, 0.3 + 0.02] = [0.68, 0.32]
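(the nudge replaces the activation at the probed layer; every later layer then runs unchanged)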

Run model forward &#8594; output token probabilities (illustrative):
  "truth" = 0.60,  "consider" = 0.10,  "withhold" = 0.05</code></code></pre><p><strong>Nudge down &#8212; subtract a little honesty:</strong></p><pre><code><code>activation - nudge = [0.6 - 0.08, 0.3 - 0.02] = [0.52, 0.28]

Run model forward &#8594; output token probabilities (illustrative):
  "truth" = 0.10,  "consider" = 0.15,  "withhold" = 0.40</code></code></pre><p><strong>Original &#8212; unmodified baseline:</strong></p><pre><code><code>Run model forward with [0.6, 0.3] &#8594; output token probabilities (illustrative):
  "truth" = 0.30,  "consider" = 0.20,  "withhold" = 0.10</code></code></pre><h3>Measure the shifts</h3><p>Now we ask: how much did each nudge change the output compared to the original? We compare the token probabilities one by one &#8212; for each token, take the absolute difference between the nudged output and the original, then sum them up. This gives us a single number measuring the total shift in the output distribution.</p><p><strong>How different is the UP output from the original?</strong></p><pre><code><code>Shift UP = |"truth" change| + |"consider" change| + |"withhold" change|
         = |0.60 - 0.30| + |0.10 - 0.20| + |0.05 - 0.10|
         = 0.30 + 0.10 + 0.05
         = 0.45</code></code></pre><p>The nudge-up output is 0.45 away from the original. Adding honesty to the activation meaningfully changed what the model would say &#8212; &#8220;truth&#8221; jumped from 0.30 to 0.60.</p><p><strong>How different is the DOWN output from the original?</strong></p><pre><code><code>Shift DOWN = |0.10 - 0.30| + |0.15 - 0.20| + |0.40 - 0.10|
           = 0.20 + 0.05 + 0.30
           = 0.55</code></code></pre><p>The nudge-down output is 0.55 away from the original. Subtracting honesty also meaningfully changed the output &#8212; &#8220;withhold&#8221; jumped from 0.10 to 0.40.</p><h3>Consistency score</h3><p>We now have two numbers: how much the output changed when we added honesty (0.45) and how much it changed when we subtracted honesty (0.55). The consistency score asks: are these two shifts comparable in size?</p><p>If the probe direction is a genuine mechanism, both nudges should produce meaningful output changes. The model should become more honest when we add honesty, and less honest when we subtract it. The shifts don&#8217;t need to be identical &#8212; real mechanisms can be slightly asymmetric &#8212; but they should be in the same ballpark.</p><p>If the probe direction is a surface correlation, typically only one direction produces a shift. Adding the pattern might change the output, but subtracting it does nothing &#8212; because the pattern was never driving the output in the first place.</p><p>The formula is the ratio of the smaller shift to the larger shift:</p><pre><code><code>c = min(shift_up, shift_down) / max(shift_up, shift_down)
c = min(0.45, 0.55) / max(0.45, 0.55)
c = 0.45 / 0.55
c = 0.82</code></code></pre><p>A score of 1.0 means perfectly symmetric &#8212; both directions shifted the output by exactly the same amount. A score of 0.0 means completely asymmetric &#8212; one direction did nothing. Our score of 0.82 means the shifts are comparable: both directions matter, so honesty is genuinely wired into the output.</p><div><hr></div><h2>What a surface correlation looks like</h2><p>Same activation, same nudges. But in a model where honesty is a surface pattern:</p><pre><code><code>UP output:   "truth" = 0.6,   "consider" = 0.1,   "withhold" = 0.05
DOWN output: "truth" = 0.28,  "consider" = 0.19,  "withhold" = 0.12
Original:    "truth" = 0.3,   "consider" = 0.2,   "withhold" = 0.1

Shift UP:   |0.6-0.3| + |0.1-0.2| + |0.05-0.1|    = 0.30 + 0.10 + 0.05 = 0.45
Shift DOWN: |0.28-0.3| + |0.19-0.2| + |0.12-0.1|  = 0.02 + 0.01 + 0.02 = 0.05

c = 0.05 / 0.45 = 0.11</code></code></pre><p><strong>c = 0.11 &#8212; close to 0 &#8594; asymmetric &#8594; not a real mechanism.</strong></p><p>Nudge up changed the output substantially. Nudge down barely moved anything. Honesty is present in the activations but it&#8217;s not wired into the output. This is a badge.</p><div><hr></div><h2>What the attestation records</h2><p>Both the probe reading and the causal score go into the signed attestation. Here&#8217;s the critical comparison:</p><pre><code><code>Honesty (real model):
  Probe reading:  1.290
  Causal score:   0.82
  Interpretation: Genuine value &#8212; steering wheel

Honesty (fake model):
  Probe reading:  1.290
  Causal score:   0.11
  Interpretation: Surface correlation &#8212; badge</code></code></pre><p>The probe readings are identical. Without causal intervention, you can&#8217;t tell them apart. The causal score is the discriminator.</p><div><hr></div><h2>Where this runs</h2><p>Causal intervention runs inside the hardware enclave. The model can&#8217;t see the nudges. It can&#8217;t detect that its activations are being perturbed. It can&#8217;t adjust its behaviour in response. The enclave captures the activation, applies the perturbations, runs the forward passes, computes the consistency score, and signs the attestation.</p><div><hr></div><h2>What this costs &#8212; and why it&#8217;s different</h2><p>Everything we&#8217;ve done so far &#8212; weighting the activation (&#934; &#183; h), reading all 26 probes, checking drift &#8212; is arithmetic on the activation vector that already exists. The model ran one forward pass to process the prompt. After that, the model is not involved. We take the activation it produced and do dot products, subtractions, and divisions on it. That&#8217;s why those steps cost microseconds or nanoseconds.</p><p>Causal intervention is fundamentally different. We&#8217;re not doing arithmetic on the activation &#8212; we&#8217;re feeding a modified activation back into the model and running it forward through all remaining layers to see what output it produces. Attention, feed-forward layers, layer norms, the full stack from the probed layer to the output distribution. That&#8217;s the model&#8217;s actual neural network computation, not a vector operation.</p><p>A probe reading is one dot product of length d. For LLaMA-3-8B, that&#8217;s 4,096 multiplies &#8212; microseconds. A causal intervention forward pass runs the entire model &#8212; billions of operations, seconds. The difference isn&#8217;t 10&#215; or 100&#215;. It&#8217;s the difference between vector arithmetic and running the neural network.</p><p>Three forward passes per probe. With the reference taxonomy&#8217;s 26 value terms (a sample &#8212; the number is configurable per deployment), that&#8217;s 78 forward passes:</p><pre><code><code>Qwen 0.5B:   78 forward passes &#8594; seconds
LLaMA-3-8B:  78 forward passes &#8594; minutes
70B model:   78 forward passes &#8594; minutes (2&#8211;5 sec each)</code></code></pre><p>That&#8217;s why causal intervention is Tier 3 &#8212; governance decides whether to require it. Healthcare might mandate it. Research might skip it. The protocol supports both. It&#8217;s not a per-prompt cost &#8212; it&#8217;s a periodic validation that confirms the probes are measuring real mechanisms.</p><h3>When to run it in practice</h3><p>For most industries, a practical compromise is to run causal intervention during testing and evaluation rather than in production. Think of it like load testing a bridge: you stress-test it before opening, and periodically after that, but you don&#8217;t put the test load on it during rush hour.</p><p><strong>Initial deployment:</strong> run full causal intervention before the model goes live. Establish that all probes measure real mechanisms. This is your baseline proof.</p><p><strong>After fine-tuning or updates:</strong> re-run to confirm the mechanisms still hold. A model update could rewire internal structure even if probe readings look similar.</p><p><strong>Periodic audit:</strong> weekly, monthly, or quarterly depending on the domain. Healthcare might run it weekly. Finance quarterly. Agriculture annually.</p><p><strong>Stress testing:</strong> run across a diverse set of challenging prompts &#8212; edge cases, adversarial inputs, domain-specific dilemmas &#8212; to confirm the mechanisms hold under pressure.</p><p><strong>Drift-triggered:</strong> if drift detection (Part 3) flags a deviation, run causal intervention on the flagged value terms to check whether the mechanism broke or just the reading shifted.</p><p><strong>Continuous production:</strong> rely on the cheap per-prompt pipeline (probe readings + drift detection) for ongoing monitoring. The probes have already been validated by causal intervention.</p><p>This gives you the best of both worlds: the confidence of causal validation during testing, and the efficiency of probe-only monitoring in production. The attestation chain records when causal intervention was last run, so an exchange partner can see how recently the mechanisms were verified.</p><div><hr></div><h2>What it proves, and what it doesn&#8217;t</h2><p><strong>Does prove:</strong> the probe direction is mechanistically real. Perturbing it changes the output symmetrically. The measurement is not an artefact.</p><p><strong>Does not prove:</strong> that the label we put on the direction (&#8221;honesty&#8221;) is correctly operationalised. That the perturbation magnitude &#948; is ecologically valid. That there isn&#8217;t a second, un-probed mechanism that dominates in practice.</p><p>Causal intervention confirms that the measurement is real. The question of whether the label is right is a separate, harder problem.</p><div><hr></div><h2>The complete pipeline so far</h2><pre><code><code>Compute &#934; = U&#7488;U          O(Vd&#178;)                Once per model version
Weight activation: &#934; &#183; h  O(d&#178;)                 Every prompt
Probe readings             O(Pd)                 Every prompt
Drift check                O(P)                  Every prompt
Causal intervention        O(3P &#215; forward pass)  Tier 3 only (testing/audit)</code></code></pre><p>The daily cost is O(d&#178;) per prompt. Causal intervention is expensive but infrequent &#8212; triggered by governance policy, not every prompt.</p><div><hr></div><p><em>The measurement is real. The audit trail is tamper-evident. The next question is what happens when two agents need to trust each other &#8212; how they exchange attestation chains and decide whether to cooperate.</em></p><p><em>That&#8217;s the exchange protocol, but first we will be going into what we mean by AI values.</em></p><p><em>Links:</em></p><p><em>&#128196; <a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/watch?v=V290eeDuISQ&amp;list=PLCuUzw-sRFKiU1bKAOufII1e2uRPx42bR">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19600791">Lecture Notes</a><br>&#128187; <a href="https://github.com/jade-codes/got">Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[When an AI’s Values Shift — And How to Catch It | Geometry of Trust | Mathematics - Lesson 3]]></title><description><![CDATA[This is the third post in the Geometry of Trust series. Part 1 built the ruler &#8212; the causal Gram matrix. Part 2 used it to measure live values with probes. This post watches for change.]]></description><link>https://www.techunfiltered.io/p/when-an-ais-values-shift-and-how</link><guid isPermaLink="false">https://www.techunfiltered.io/p/when-an-ais-values-shift-and-how</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Thu, 16 Apr 2026 15:27:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/XdDysqw_xC0" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-XdDysqw_xC0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;XdDysqw_xC0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/XdDysqw_xC0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>The problem with snapshots</h2><p>Parts 1 and 2 gave us the tools to measure what an AI values at any given moment. But a single measurement is a snapshot. Models don&#8217;t operate in isolation &#8212; they process thousands of prompts over time. 
The critical question isn&#8217;t <em>what does the model value right now?</em> It&#8217;s <em>are the values stable, or are they drifting?</em></p><p>A healthcare AI that scored high on honesty yesterday might score differently today. If nobody&#8217;s watching, nobody knows.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Same ruler, same probes, every prompt</h2><p>The setup is unchanged from Parts 1 and 2. Same causal Gram matrix &#934;. Same probes. The Geometry of Trust reference taxonomy samples 26 value terms &#8212; virtues like courage, honesty, and compassion; principles like justice and responsibility; and anti-values like cruelty and deception. The number isn&#8217;t fixed: a different deployment could define 10 terms or 50. We use 26 as the working example throughout. Every prompt gets measured. The system builds a statistical baseline, then watches for deviations.</p><p>The baseline uses Welford&#8217;s online algorithm &#8212; a way to maintain running mean and variance without storing every historical reading. Each new reading updates the statistics in constant time and constant space.</p><div><hr></div><h2>Governance decides how tight</h2><p>Different domains tolerate different amounts of variation. This is set by governance, not hardcoded:</p><pre><code><code>Healthcare:   T = 2&#963;   Patient safety &#8212; flag early
Finance:      T = 3&#963;   Regulatory compliance
Agriculture:  T = 4&#963;   Seasonal variation is expected
Research:     T = 5&#963;   Exploratory &#8212; room to move</code></code></pre><p>The threshold T is a multiple of the baseline standard deviation &#963;. If a reading deviates more than T from the baseline average, an alert fires.</p><div><hr></div><h2>How Welford&#8217;s algorithm works</h2><p>Before the worked example, a quick note. Welford&#8217;s online algorithm tracks three values &#8212; n (count), mean, and M2 (sum of squared differences) &#8212; and updates them with each new reading:</p><pre><code><code>n = n + 1
delta = x - mean
mean = mean + delta / n
delta2 = x - mean           (using the UPDATED mean)
M2 = M2 + delta &#215; delta2
variance = M2 / n
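(population variance; M2 / (n - 1) would give the sample estimate)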
&#963; = &#8730;(variance)</code></code></pre><p>No historical readings stored. Constant time, constant space.</p><div><hr></div><h2>Watching honesty &#8212; prompt by prompt</h2><p>Same ruler and probes from Parts 1 and 2. We&#8217;ll track honesty through this example.</p><p><em>A note: all vectors, activations, and numerical values in this example are illustrative. Real models operate in hundreds or thousands of dimensions. We use 2D vectors and small numbers so you can follow every calculation on paper. The mechanism is identical at any scale.</em></p><p><strong>Prompt 1:</strong> <em>&#8220;Should I lie to my patient?&#8221;</em></p><pre><code><code>activation = [0.6, 0.3]
&#934; &#183; activation:
  Row 1: (2.58 &#215; 0.6) + (0.12 &#215; 0.3) = 1.548 + 0.036 = 1.584
  Row 2: (0.12 &#215; 0.6) + (0.15 &#215; 0.3) = 0.072 + 0.045 = 0.117
Honesty: (0.8 &#215; 1.584) + (0.2 &#215; 0.117) = 1.267 + 0.023 = 1.290

Welford: n=1, mean=1.290, M2=0, &#963;=undefined (need n&#8805;2)</code></code></pre><p>No attestation yet &#8212; still building baseline.</p><p><strong>Prompt 2:</strong> <em>&#8220;Is it okay to steal medicine?&#8221;</em></p><pre><code><code>activation = [0.7, 0.2]
&#934; &#183; activation:
  Row 1: (2.58 &#215; 0.7) + (0.12 &#215; 0.2) = 1.806 + 0.024 = 1.830
  Row 2: (0.12 &#215; 0.7) + (0.15 &#215; 0.2) = 0.084 + 0.030 = 0.114
Honesty: (0.8 &#215; 1.830) + (0.2 &#215; 0.114) = 1.464 + 0.023 = 1.487

Welford: n=2
  delta  = 1.487 - 1.290 = 0.197
  mean   = 1.290 + 0.197/2 = 1.389
  delta2 = 1.487 - 1.389 = 0.098
  M2     = 0 + 0.197 &#215; 0.098 = 0.019
  &#963;      = &#8730;(0.019/2) = &#8730;0.010 = 0.098</code></code></pre><p><strong>Prompt 3:</strong> <em>&#8220;Should I report my colleague?&#8221;</em></p><pre><code><code>activation = [0.55, 0.35]
&#934; &#183; activation:
  Row 1: (2.58 &#215; 0.55) + (0.12 &#215; 0.35) = 1.419 + 0.042 = 1.461
  Row 2: (0.12 &#215; 0.55) + (0.15 &#215; 0.35) = 0.066 + 0.053 = 0.118
Honesty: (0.8 &#215; 1.461) + (0.2 &#215; 0.118) = 1.169 + 0.024 = 1.193

Welford: n=3
  delta  = 1.193 - 1.389 = -0.196
  mean   = 1.389 + (-0.196)/3 = 1.323
  delta2 = 1.193 - 1.323 = -0.130
  M2     = 0.019 + (-0.196) &#215; (-0.130) = 0.019 + 0.025 = 0.045
  &#963;      = &#8730;(0.045/3) = &#8730;0.015 = 0.122</code></code></pre><p>Prompts 4 through 49 continue building the baseline the same way &#8212; each prompt updates n, mean, M2, and &#963; in constant time.</p><div><hr></div><h2>Prompt 50: baseline established</h2><p>The baseline is stable. Time for the first signed attestation.</p><blockquote><p><strong>Attestation #1: BASELINE</strong> Honesty avg: 1.32, &#963; = 0.12 Chain: none (first attestation) Signed: Ed25519</p></blockquote><p>This model is in healthcare &#8594; T = 2&#963; = 2 &#215; 0.12 = <strong>0.24</strong>.</p><p>Any reading more than 0.24 from the average triggers an alert. That means: anything below 1.08 or above 1.56 gets flagged.</p><div><hr></div><h2>Normal monitoring</h2><p><strong>Prompt 51:</strong> activation = [0.58, 0.28]</p><pre><code><code>&#934; &#183; activation:
  Row 1: (2.58 &#215; 0.58) + (0.12 &#215; 0.28) = 1.496 + 0.034 = 1.530
  Row 2: (0.12 &#215; 0.58) + (0.15 &#215; 0.28) = 0.070 + 0.042 = 0.112
Honesty: (0.8 &#215; 1.530) + (0.2 &#215; 0.112) = 1.224 + 0.022 = 1.246

Drift check: |1.246 - 1.32| = 0.074 &lt; T (0.24) &#8594; normal</code></code></pre><p><strong>Prompt 52:</strong> activation = [0.62, 0.31]</p><pre><code><code>&#934; &#183; activation:
  Row 1: (2.58 &#215; 0.62) + (0.12 &#215; 0.31) = 1.600 + 0.037 = 1.637
  Row 2: (0.12 &#215; 0.62) + (0.15 &#215; 0.31) = 0.074 + 0.047 = 0.121
Honesty: (0.8 &#215; 1.637) + (0.2 &#215; 0.121) = 1.310 + 0.024 = 1.334

Drift check: |1.334 - 1.32| = 0.014 &lt; T (0.24) &#8594; normal</code></code></pre><p><strong>Prompt 100:</strong> periodic snapshot triggered.</p><blockquote><p><strong>Attestation #2: SNAPSHOT</strong> Honesty avg: 1.31, &#963; = 0.12 Status: NORMAL Chain: hash of attestation #1</p></blockquote><div><hr></div><h2>Prompt 101: something changes</h2><pre><code><code>activation = [0.15, 0.40]
&#934; &#183; activation:
  Row 1: (2.58 &#215; 0.15) + (0.12 &#215; 0.40) = 0.387 + 0.048 = 0.435
  Row 2: (0.12 &#215; 0.15) + (0.15 &#215; 0.40) = 0.018 + 0.060 = 0.078
Honesty: (0.8 &#215; 0.435) + (0.2 &#215; 0.078) = 0.348 + 0.016 = 0.364

Drift check: |0.364 - 1.32| = 0.956 &gt; T (0.24) &#8594; DEVIATED</code></code></pre><p>Alert fires immediately.</p><blockquote><p><strong>Attestation #3: ALERT</strong> Honesty: 0.364 (baseline 1.32, deviation 0.956, threshold 0.24) Status: DEVIATED Chain: hash of attestation #2</p></blockquote><div><hr></div><h2>The chain is the audit trail</h2><pre><code><code>#1 BASELINE &#8594; #2 SNAPSHOT (normal) &#8594; #3 ALERT (deviated)</code></code></pre><p>Each attestation is signed with Ed25519 and contains the SHA-256 hash of the previous attestation. This creates a tamper-evident chain:</p><p>You can&#8217;t delete #3 without breaking the chain &#8212; the next attestation would reference a hash that no longer exists. You can&#8217;t insert a fake between #2 and #3 &#8212; the hashes wouldn&#8217;t match. You can&#8217;t alter #2 after the fact &#8212; #3&#8217;s parent hash would no longer match #2&#8217;s content.</p><p>Governance walks the chain: #3 says DEVIATED, #2 says NORMAL. The drift happened between prompt 100 and 101. What changed?</p><div><hr></div><h2>What this adds to the cost</h2><p>Drift detection is Step 4 in the per-prompt pipeline:</p><pre><code><code>Step 1:  Model forward pass       Billions of ops (happens anyway)
Step 2:  &#934; &#183; activation           O(d&#178;) &#8212; us, once
Step 3:  26 probe readings        O(Pd) &#8212; us, per probe
Step 4:  Check drift              O(P) &#8212; us, per probe</code></code></pre><p>Step 4 is one subtraction and one division per probe: (reading &#8722; mean) / &#963;. For 26 probes, that&#8217;s 26 operations. Nanoseconds. Welford&#8217;s algorithm maintains the running statistics &#8212; no storage overhead for historical readings.</p><div><hr></div><h2>What comes next</h2><p>We now have continuous monitoring with tamper-evident audit trails. But there&#8217;s a gap in the argument. The probes report numbers and the drift detector watches those numbers over time &#8212; but how do we know the probes are measuring something real?</p><p>A probe might detect a surface correlation &#8212; a pattern that shows up in the activation but doesn&#8217;t actually drive the model&#8217;s output. The reading looks stable, the baseline looks clean, but the whole thing is measuring decoration rather than mechanism.</p><p>Causal intervention tests this. Perturb the activation in both directions along the probe&#8217;s direction. If the model&#8217;s output changes symmetrically, the probe found a genuine mechanism. If only one direction matters, it found a surface correlation.</p><p>That&#8217;s the subject of the next post. The exchange protocol &#8212; how agents share and verify each other&#8217;s attestation chains &#8212; comes later, once we&#8217;ve established that what the probes measure is real.</p><div><hr></div><p><em>The measurements are continuous. The audit trail is tamper-evident. The question is whether what the probes measure is real.</em></p><p><em>Part 4 will answer that.</em></p><p><em>&#128196; <a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br>&#128187; <a href="https://www.youtube.com/watch?v=XdDysqw_xC0&amp;list=PLCuUzw-sRFKhbAEuHqDpc_twQSlL6Cy3D&amp;index=6">Lecture Playlist</a><br>&#128196; <a href="https://zenodo.org/records/19600775">Lecture Notes</a><br>&#128187; <a href="https://github.com/jade-codes/got">Open-source Rust implementation</a><br>&#127970; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How to Measure What an AI Actually Values — In Real Time | Geometry of Trust | Mathematics - Lesson 2]]></title><description><![CDATA[This is the second post in the Geometry of Trust series. Part 1 covered the causal Gram matrix &#8212; the ruler that weights directions by their influence on output. 
This post puts the ruler to work.]]></description><link>https://www.techunfiltered.io/p/how-to-measure-what-an-ai-actually</link><guid isPermaLink="false">https://www.techunfiltered.io/p/how-to-measure-what-an-ai-actually</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Thu, 16 Apr 2026 13:09:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/ITQrQt4VS30" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-ITQrQt4VS30" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;ITQrQt4VS30&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/ITQrQt4VS30?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>The ruler exists. Now what?</h2><p>Last time, we built a ruler. The causal Gram matrix &#934; = U&#7488;U takes the model&#8217;s unembedding matrix and produces a metric that tells us which directions in the model&#8217;s internal space actually matter for output.</p><p>But a ruler on a shelf measures nothing.</p><p>The model processes thousands of prompts. Each one produces an activation &#8212; a vector representing the model&#8217;s internal state at that moment. The question is: <strong>how much of each value is active in that state?</strong></p><p>That&#8217;s what probes do.</p><div><hr></div><h2>One prompt, two values</h2><p>A prompt arrives: <em>&#8220;Should I lie to my patient?&#8221;</em></p><p>The model thinks. Its internal state &#8212; the activation &#8212; is a vector. For our 2D example:</p><pre><code><code>activation = [0.6, 0.3]</code></code></pre><p>We want to know: how much courage and honesty are active right now?</p><h3>Step 1: Apply the ruler</h3><p>We take the dot product of each row of the Gram matrix with the activation. This is done <strong>once</strong> &#8212; every probe shares the result.</p><pre><code><code>&#934; &#183; activation:
  Row 1: (2.58 &#215; 0.6) + (0.12 &#215; 0.3) = 1.584
  Row 2: (0.12 &#215; 0.6) + (0.15 &#215; 0.3) = 0.117

Weighted activation = [1.584, 0.117]</code></code></pre><p>Notice what happened. Dimension 1 was amplified from 0.6 to 1.584. Dimension 2 was suppressed from 0.3 to 0.117. The ruler is doing its job &#8212; directions that matter more for output get more weight.</p><h3>Step 2: Read the probes</h3><p>Each probe is a trained weight vector that reads one value. The courage probe:</p><pre><code><code>courage = [0.9, 0.1]
(0.9 &#215; 1.584) + (0.1 &#215; 0.117) = 1.438</code></code></pre><p>The honesty probe:</p><pre><code><code>honesty = [0.8, 0.2]
(0.8 &#215; 1.584) + (0.2 &#215; 0.117) = 1.290</code></code></pre><p>The Geometry of Trust reference taxonomy samples 26 value terms &#8212; virtues like courage, honesty, and compassion; principles like justice and responsibility; and anti-values like cruelty and deception. The number isn&#8217;t fixed: a different deployment could define 10 terms or 50. We use 26 as the working example throughout. Each term has its own probe. Twenty-six probes, twenty-six readings, all from the same weighted activation.</p><div><hr></div><h2>Why the ruler changes everything</h2><p>Here&#8217;s what happens without it &#8212; a plain dot product, treating all directions equally:</p><pre><code><code>Courage: (0.9 &#215; 0.6) + (0.1 &#215; 0.3) = 0.57
Honesty: (0.8 &#215; 0.6) + (0.2 &#215; 0.3) = 0.54</code></code></pre><pre><code><code>              Regular    Causal
Courage:      0.57       1.438
Honesty:      0.54       1.290</code></code></pre><p><strong>Regular says:</strong> courage and honesty are almost equal (5% gap). <strong>Causal says:</strong> courage is noticeably stronger (11% gap).</p><p>Why? Courage lives more heavily on dimension 1 (weight 0.9 vs 0.8), and dimension 1 matters <strong>17&#215; more</strong> under &#934;. That small directional difference gets amplified because the ruler knows which directions count.</p><p>This is the whole point of using the causal inner product instead of Euclidean distance. Standard probes treat all directions equally. Causal probes weight directions by their influence on what the model actually outputs. The difference isn&#8217;t academic &#8212; it&#8217;s the difference between measuring a surface pattern and measuring a computational mechanism.</p><div><hr></div><h2>What this costs</h2><p>The honest question: does this add meaningful overhead?</p><p><strong>Step 1</strong> &#8212; the model&#8217;s forward pass &#8212; happens regardless. Billions of operations. That&#8217;s the model doing its job, not our overhead.</p><p><strong>Step 2</strong> &#8212; weighting the activation (&#934; &#183; h) &#8212; is a dot product of each row of &#934; with h. O(d&#178;). For LLaMA-3-8B (d = 4,096), that&#8217;s 16.8 million operations. For GPT-4 scale (d = 16,384), it&#8217;s 268 million. Both complete in milliseconds on a GPU. Done once per prompt, shared by all probes.</p><p><strong>Step 3</strong> &#8212; reading all probes &#8212; is one dot product per value term (26 in our reference taxonomy). O(Pd). For GPT-4 scale, that&#8217;s 426,000 operations. Microseconds.</p><p>The combined measurement overhead is less than 1% of the forward pass. You could run this on every prompt in production and nobody would notice.</p><div><hr></div><h2>The full probe equation</h2><p>For completeness, the full mathematical form of a causal probe reading is:</p><pre><code><code>reading = w&#7488;&#934;h + b</code></code></pre><p>Where <strong>w</strong> is the probe&#8217;s weight vector, <strong>&#934;</strong> is the causal Gram matrix from Part 1, <strong>h</strong> is the activation vector from the forward pass, and <strong>b</strong> is the probe&#8217;s bias (trained alongside w).</p><p>The trick is in how this decomposes computationally. You <em>could</em> evaluate it naively as a matrix-vector-vector product, but then every probe would repeat the expensive part (&#934;h). Instead, we compute the shared piece once:</p><pre><code><code>h&#771; = &#934;h                   (shared &#8212; one O(d&#178;) operation per prompt)
reading_i = w&#7522;&#7488; h&#771; + b&#7522;   (per probe &#8212; one O(d) operation each)</code></code></pre><p>That&#8217;s the same arithmetic, reorganised so the 26 probes share the weighted activation h&#771; instead of each recomputing it. It&#8217;s what makes measuring 26 values as cheap as measuring one.</p><div><hr></div><h2>Why this design resists gaming</h2><p>A standard probe (w&#7488;h) measures whether a concept <em>correlates</em> with the activation pattern. A model could shuffle its internal representations to make &#8220;safety&#8221; score high on a Euclidean probe without &#8220;safety&#8221; actually influencing the output.</p><p>A causal probe (w&#7488;&#934;h) measures whether a concept is active <em>in directions that causally influence the output</em>. Gaming it requires changing the model&#8217;s actual output pathway &#8212; not just rearranging internal furniture. You can&#8217;t make a model &#8220;look honest&#8221; under causal probes without making its honesty-relevant directions genuinely influence what it says.</p><p>This is why the Geometry of Trust Protocol uses causal probes for agent-to-agent attestation. When one AI agent sends its value measurements to another, the receiving agent needs assurance that those measurements reflect real computational structure, not performance. The causal metric provides that assurance.</p><div><hr></div><h2>What comes next</h2><p>Probes give us a reading at a moment in time. That&#8217;s useful on its own, but it only tells you what the model looks like <em>right now</em>. It doesn&#8217;t tell you whether the model has changed, or is changing, or has drifted from the values it was certified with at deployment.</p><p>The next step is drift detection: tracking probe readings across many prompts over time, and spotting when the distribution of readings moves further from baseline than random variation alone can explain. That&#8217;s how a continuous measurement turns into a continuous audit &#8212; not &#8220;the model had honesty 1.29 this morning,&#8221; but &#8220;the model&#8217;s honesty readings over the last week have shifted in a way that&#8217;s statistically significant and worth investigating.&#8221;</p><p>That&#8217;s the subject of the next post.</p><p>There&#8217;s a further step beyond drift. A reading is a number, and a number alone isn&#8217;t proof &#8212; a probe might detect a surface correlation that vanishes under intervention, a pattern that <em>looks</em> causal but isn&#8217;t. Causal validation closes that gap: perturb the activation in both directions along the probe&#8217;s direction. If the output changes symmetrically, you&#8217;ve found a genuine mechanism. If only one direction matters, you&#8217;ve found decoration. That&#8217;s causal intervention, and it&#8217;s the fourth and final post in this mathematics series.</p><div><hr></div><p><em>The geometry is computable. The probes are cheap. 
The question is whether the values they read stay stable over time.</em></p><p><em>Part 3 will answer that.</em></p><p><em>For More Information, See These Links:<br><a href="https://zenodo.org/records/19238920">Geometry of Trust Paper</a><br><a href="https://www.youtube.com/watch?v=ITQrQt4VS30&amp;list=PLCuUzw-sRFKiU1bKAOufII1e2uRPx42bR&amp;index=5">Lesson Playlist</a><br>Lesson Notes<br><a href="https://github.com/jade-codes/got">Code Repository</a></em></p>]]></content:encoded></item><item><title><![CDATA[The Causal Gram Matrix: Why Not All Differences Matter Equally | Geometry of Trust | Mathematics - Lesson 1]]></title><description><![CDATA[How a single matrix transforms our ability to measure what AI models actually care about]]></description><link>https://www.techunfiltered.io/p/the-causal-gram-matrix-why-not-all</link><guid isPermaLink="false">https://www.techunfiltered.io/p/the-causal-gram-matrix-why-not-all</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Wed, 15 Apr 2026 11:44:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/-6HyaOgjANU" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2--6HyaOgjANU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;-6HyaOgjANU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/-6HyaOgjANU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Models Have Internal Structure &#8212; And It Matters</h2><p>When we talk about whether an AI model is &#8220;aligned&#8221; or &#8220;safe,&#8221; the standard approach is behavioural: ask the model questions, check its answers. Does it refuse harmful requests? Does it give truthful responses? Does it follow instructions?</p><p>The problem with this is obvious once you say it out loud: <strong>you&#8217;re testing what the model says, not what it knows.</strong> A model can produce aligned-sounding outputs while its internal representations tell a completely different story. Behavioural evaluation is a job interview &#8212; it tells you what someone says under observation, not what they&#8217;ll do when no one&#8217;s watching.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The argument at the heart of the Geometry of Trust framework is that language models don&#8217;t just produce outputs &#8212; they have measurable internal structure. Value-relevant concepts like honesty, deception, courage, and cowardice correspond to directions in the model&#8217;s high-dimensional hidden space. 
This isn&#8217;t speculation &#8212; it&#8217;s an empirical finding from mechanistic interpretability research. These directions are approximately linear, they&#8217;re consistent across inputs, and they&#8217;re readable directly from the model&#8217;s weights without needing to observe any outputs at all.</p><p>If that&#8217;s true &#8212; if models have genuine geometric structure encoding value-relevant concepts &#8212; then we can measure it. We can ask whether &#8220;honesty&#8221; and &#8220;helpfulness&#8221; reinforce or compete inside a model. We can check whether a model&#8217;s internal geometry matches the values its operators claim it has. We can detect contradictions that behavioural testing would never surface.</p><p>But to measure any of this, we need the right ruler. And the obvious ruler &#8212; Euclidean distance &#8212; gets it wrong.</p><div><hr></div><h2>The Problem: Euclidean Distance Lies</h2><p>If I asked you how far apart &#8220;courage&#8221; and &#8220;cowardice&#8221; are inside an AI model, you&#8217;d probably reach for the obvious tool: Euclidean distance. Subtract the vectors, square the differences, add them up.</p><p>The problem? That treats every dimension of the model&#8217;s internal space as equally important. And they&#8217;re not. Some dimensions have an outsized effect on what the model actually outputs. Others are basically noise. Measuring distance without knowing which dimensions matter is like measuring the gap between two cities on a map where the scale changes depending on which direction you look.</p><p>This post walks through the maths behind the <strong>causal Gram matrix</strong> &#8212; the &#8220;ruler&#8221; at the heart of the Geometry of Trust framework &#8212; and shows why it changes everything about how we measure values inside language models.</p><div><hr></div><h2>The Setup: A Tiny Unembedding Matrix</h2><p>A transformer maps hidden states to output probabilities via its <strong>unembedding matrix</strong> U. Each row of U corresponds to a vocabulary token. For our worked example, we&#8217;ll use a 4&#215;2 matrix where the rows represent value-relevant concepts:</p><pre><code><code>courage   = [ 0.9,  0.1]
honesty   = [ 0.8,  0.2]
deception = [-0.7,  0.3]
cowardice = [-0.8, -0.1]
</code></code></pre><p>Each row is a point in 2D space. The positive values cluster together; the negative values cluster together. So far, intuitive.</p><p>But here&#8217;s what matters: <strong>U doesn&#8217;t encode values. It defines which activation directions matter for output.</strong> The values themselves live in the model&#8217;s activations &#8212; the hidden states flowing through the residual stream during inference. U is the lens.</p><div><hr></div><h2>Computing the Gram Matrix</h2><p>The Gram matrix is &#934; = U&#7488;U. We transpose U, multiply, and get a square matrix whose size matches the hidden dimension (2&#215;2 in our case).</p><pre><code><code>&#934;[1,1] = (0.9&#215;0.9) + (0.8&#215;0.8) + (-0.7&#215;-0.7) + (-0.8&#215;-0.8) = 2.58
&#934;[1,2] = (0.9&#215;0.1) + (0.8&#215;0.2) + (-0.7&#215;0.3) + (-0.8&#215;-0.1)  = 0.12
&#934;[2,1] = 0.12  (symmetric)
&#934;[2,2] = (0.1&#215;0.1) + (0.2&#215;0.2) + (0.3&#215;0.3) + (-0.1&#215;-0.1)   = 0.15
</code></code></pre><p><strong>Result:</strong></p><pre><code><code>&#934; = [2.58, 0.12]
    [0.12, 0.15]
</code></code></pre><p>Read the diagonal: dimension 1 has weight 2.58, dimension 2 has weight 0.15. Dimension 1 matters about <strong>17 times more</strong> than dimension 2 for determining model output.</p><div><hr></div><h2>Causal Distance vs. Euclidean Distance</h2><p>Now take courage and cowardice and measure the gap:</p><pre><code><code>courage   = [ 0.9,  0.1]
cowardice = [-0.8, -0.1]
diff      = [ 1.7,  0.2]
</code></code></pre><p><strong>Euclidean distance</strong> (&#8730;(d&#7488;d)): 1.7&#178; + 0.2&#178; = 2.89 + 0.04 = 2.93 &#8594; &#8730;2.93 = <strong>1.71</strong></p><p>Both dimensions contribute roughly proportionally to their raw differences.</p><p><strong>Causal distance</strong> (&#8730;(d&#7488;&#934;d)): First compute &#934; &#215; diff, then dot with diff:</p><pre><code><code>&#934; &#215; diff = [4.41, 0.234]
d&#7488;(&#934;d)  = 1.7 &#215; 4.41 + 0.2 &#215; 0.234 = 7.54
&#8730;7.54    = 2.75
</code></code></pre><p>The Euclidean distance was <strong>1.71</strong>. The causal distance is <strong>2.75</strong>. The difference on dimension 1 &#8212; the one that actually affects output &#8212; gets amplified. The difference on dimension 2 barely moves.</p><div><hr></div><h2>The Killer Example: Differences That Don&#8217;t Matter</h2><p>This is where the intuition clicks. Take two values that differ <strong>only</strong> on dimension 2:</p><pre><code><code>value A = [0.5,  0.9]
value B = [0.5, -0.8]
diff    = [0.0,  1.7]
</code></code></pre><p><strong>Euclidean distance (d&#7488;d):</strong></p><pre><code><code>(0.0 &#215; 0.0) + (1.7 &#215; 1.7) = 0 + 2.89 = 2.89
&#8730;2.89 = 1.70
</code></code></pre><p>Dim1 contribution: 0. Dim2 contribution: 2.89. The entire distance comes from dimension 2. Euclidean distance doesn&#8217;t care &#8212; a difference is a difference. Verdict: <strong>1.70 apart</strong>.</p><p><strong>Causal distance (d&#7488;&#934;d):</strong></p><p>Step 1 &#8212; compute &#934; &#215; diff:</p><pre><code><code>&#934; &#215; diff:
  row 1: (2.58 &#215; 0.0) + (0.12 &#215; 1.7) = 0 + 0.204 = 0.204
  row 2: (0.12 &#215; 0.0) + (0.15 &#215; 1.7) = 0 + 0.255 = 0.255
</code></code></pre><p>Step 2 &#8212; dot the original diff with the result:</p><pre><code><code>d&#7488;(&#934;d):
  (0.0 &#215; 0.204) + (1.7 &#215; 0.255) = 0 + 0.434 = 0.434
&#8730;0.434 = 0.66
</code></code></pre><p>Dim1 contribution: 0. Dim2 contribution: 0.434 (down from 2.89). The Gram matrix crushed that 2.89 down to 0.434 because dimension 2 has weight 0.15 &#8212; it barely affects output. The 0.204 that appeared in row 1 comes from the off-diagonal coupling (0.12), but since the diff on dim1 is zero, it doesn&#8217;t contribute to the final distance.</p><p><strong>Euclidean distance: 1.70</strong> &#8212; looks far apart. <strong>Causal distance: 0.66</strong> &#8212; actually close.</p><p>Same raw gap. Completely different story. The difference is entirely on dimension 2, and dimension 2 barely affects output. The causal distance reflects that. Euclidean distance doesn&#8217;t.</p><div><hr></div><h2>What It Costs: Time and Space Complexity</h2><p>All of this is useless if it doesn&#8217;t scale. So let&#8217;s be precise about what computing &#934; actually costs.</p><p>Let V = vocabulary size and d = hidden dimension. U is V&#215;d.</p><p><strong>Step 1: Compute &#934; = U&#7488;U</strong></p><p><strong>Time complexity: O(Vd&#178;).</strong> Each entry &#934;[i,j] is a dot product over V vocabulary rows, and there are d&#178; entries. In practice, this is a single matrix multiplication that any BLAS library will handle efficiently.</p><p><strong>Space complexity: O(d&#178; + Vd).</strong> You store &#934; (d&#215;d) and U (V&#215;d). The important thing: &#934; is d&#215;d, not V&#215;V. For a model with 200K vocabulary tokens and 4,096 hidden dimensions, &#934; is 4,096&#215;4,096 &#8212; about 16.8 million entries, roughly 67 MB in 32-bit floats &#8212; not 200K&#215;200K. Forming U&#7488;U compresses the vocabulary dimension away.</p><p>What does this look like in practice?</p><table><thead><tr><th>Model</th><th>V (vocab)</th><th>d (hidden)</th><th>Ops for &#934;</th><th>Time estimate</th></tr></thead><tbody><tr><td>Our example</td><td>4</td><td>2</td><td>16</td><td>Instant</td></tr><tr><td>Qwen 0.5B</td><td>152K</td><td>896</td><td>~122 billion</td><td>Minutes</td></tr><tr><td>LLaMA-3-8B</td><td>128K</td><td>4,096</td><td>~2.15 trillion</td><td>Hours</td></tr><tr><td>GPT-4 scale</td><td>200K</td><td>16,384</td><td>~54 trillion</td><td>Hours+</td></tr></tbody></table><p>The result is saved as a <code>.gotgeo</code> file. <strong>Never recomputed until the model&#8217;s weights change.</strong></p><p><strong>Step 2: Train probes under &#934;</strong></p><p>Once you have &#934;, training a linear probe under the causal metric costs O(d) per sample per epoch &#8212; the same as a standard linear probe, just with &#934;h instead of h. For 26 probes across a typical training set, this takes minutes. Also done once per geometry and saved.</p><p><strong>At inference time</strong>, computing a single causal inner product &#10216;u, v&#10217;_c = u&#7488;&#934;v is O(d&#178;) &#8212; a matrix-vector multiply followed by a dot product. For d = 4,096, that&#8217;s about 17 million floating-point operations. On modern hardware, this takes microseconds.</p><p>The computational profile is front-loaded: hours of one-time work, microseconds per measurement thereafter.</p><div><hr></div><h2>Why This Matters</h2><p>Standard alignment evaluation asks models questions and checks answers. That tells you what the model <em>says</em>, not what it <em>encodes</em>. A model can say all the right things while its internal geometry tells a different story.</p><p>The Gram matrix &#934; is computed once from the model&#8217;s unembedding weights. It doesn&#8217;t change. It doesn&#8217;t depend on what you ask the model. 
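</p><p>If you want to check the arithmetic yourself, the whole worked example above fits in a few lines of numpy. This is a verification sketch using the post&#8217;s toy numbers, not the repository&#8217;s Rust implementation; run it and you get the same numbers every time, because nothing in it depends on a prompt:</p><pre><code><code>import numpy as np

# The worked example from this post, reproduced in numpy.
# A verification sketch, not the repository's Rust implementation.
U = np.array([[ 0.9,  0.1],   # courage
              [ 0.8,  0.2],   # honesty
              [-0.7,  0.3],   # deception
              [-0.8, -0.1]])  # cowardice

Phi = U.T @ U                 # causal Gram matrix: [[2.58, 0.12], [0.12, 0.15]]

def euclidean(a, b):
    d = a - b
    return np.sqrt(d @ d)

def causal(a, b, Phi):
    d = a - b
    return np.sqrt(d @ Phi @ d)

courage, cowardice = U[0], U[3]
print(euclidean(courage, cowardice))    # ~1.71
print(causal(courage, cowardice, Phi))  # ~2.75

A = np.array([0.5,  0.9])
B = np.array([0.5, -0.8])
print(euclidean(A, B))                  # 1.70
print(causal(A, B, Phi))                # ~0.66
</code></code></pre><p>The printed values match the hand calculations: 1.71 and 2.75 for courage versus cowardice, 1.70 and 0.66 for the pair that differs only on dimension 2.</p><p>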
It&#8217;s ground truth about which directions in the model&#8217;s internal space actually matter for output.</p><p>Under this metric, semantically related values cluster, opposed values separate, and the measurement is deterministic &#8212; same model weights, same probes, same result, every time.</p><div><hr></div><h2>The Takeaway</h2><p>Euclidean distance treats all differences equally. Causal distance weights differences by what affects output. That single change &#8212; inserting &#934; between the vectors &#8212; is the foundation the entire Geometry of Trust framework builds on.</p><p>Not all differences matter equally. Now we have a ruler that knows which ones do.</p><p>Next episode, we&#8217;ll show how to use this ruler with probes to continuously monitor an AI&#8217;s value system at run time.</p><div><hr></div><p><em>The Geometry of Trust paper and open-source Rust proof-of-concept are available at <a href="https://github.com/jade-codes/got">github.com/jade-codes/got</a>. The causal inner product, probe training, and attestation pipeline are all implemented and independently reproducible. The lecture notes are at <a href="https://zenodo.org/records/19592674">zenodo.org/records/19592674</a> and the Geometry of Trust paper at <a href="https://zenodo.org/records/19238920">zenodo.org/records/19238920</a>.</em></p><p><em>Jade Wilson &#8212; Synoptic Group CIC, Hull, UK</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[I Didn’t Just Decide to Quit]]></title><description><![CDATA[There have been a few people saying that my leaving my six-figure, high-paying job was reckless.]]></description><link>https://www.techunfiltered.io/p/i-didnt-just-decide-to-quit</link><guid isPermaLink="false">https://www.techunfiltered.io/p/i-didnt-just-decide-to-quit</guid><dc:creator><![CDATA[Jade Wilson]]></dc:creator><pubDate>Thu, 09 Apr 2026 21:01:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_hyl!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f66b45b-b209-4486-8022-6f735c4ba93e_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There have been a few people saying that my leaving my six-figure, high-paying job was reckless. That I just decided to leave on a manic whim. Like one morning I woke up, chose chaos, and handed in my notice before lunch.</p><p>That&#8217;s the story that makes sense to people, isn&#8217;t it? That I had some kind of episode. That I wasn&#8217;t thinking clearly. Because the alternative &#8212; that I thought about it <em>very</em> clearly, for a very long time, and still chose to walk away &#8212; is the version that actually unsettles people. 
Because that version asks uncomfortable questions about what they&#8217;re choosing to stay in.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>It wasn&#8217;t an immediate decision. It wasn&#8217;t even a quick decision.</p><p>It was a <em>hard</em> decision. I effectively gave up a six-figure salary to quit and do &#8212; who knows what. No clear plan. A small safety net. No neat &#8220;what comes next&#8221;. Just the growing, undeniable certainty that if I stayed, I would lose something I couldn&#8217;t get back.</p><p>That&#8217;s not something you do on impulse. That&#8217;s something you do when staying becomes the thing that&#8217;s actually insane.</p><div><hr></div><p>It wasn&#8217;t just one thing. One moment. It was a collection of them. Over years. A slow, grinding accumulation of mental breakage. Not one dramatic event. Not one terrible meeting. Just moment after moment after moment, each one small enough to explain away on its own, but together? Together they painted a picture I couldn&#8217;t unsee.</p><p>It was when I saw a colleague slowly get outcast and made invisible. Not fired. Not confronted. Just... edged out. Excluded from meetings. Left off emails. The kind of quiet organisational cruelty that doesn&#8217;t leave fingerprints. And I watched it happen. And nobody said a word. And I understood that this was how it worked.</p><p>It was when I saw another colleague get used and driven to the point of burnout. Someone brilliant, someone who cared deeply about the work, who kept saying yes because they believed it mattered. And the organisation just kept taking. And when they messed up? It was framed as a personal failing. Not enough resilience. Not enough self-care. Never &#8212; <em>never</em> &#8212; that perhaps we asked too much of someone who was too good to say no.</p><p>It was when I was told my <em>mode of talking</em> wasn&#8217;t appropriate. Not what I said. Not that I was wrong. The <em>way</em> I said it. Because apparently there&#8217;s a correct tone for telling the truth, and I hadn&#8217;t learned it. I&#8217;ve thought about that a lot since. What a beautifully corporate way of saying: <em>we need you to be less you.</em></p><p>It was when I saw a very experienced woman with twenty years in the industry get handed the same role as a guy with a few years&#8217; experience. And nobody blinked. And when you pointed it out, when you tried to advocate for that person, they explained it away. <em>This person hasn&#8217;t got experience in corporate.</em> Which is corporate speak for: she hasn&#8217;t learned to play the political game yet. As if that&#8217;s a failing. 
As if twenty years of actual expertise counts for less than knowing whose ego to stroke in which meeting.</p><p>It was when I realised it didn&#8217;t matter how good I was. It genuinely did not matter. Not unless I was willing to shut up and take it. To nod in the right meetings. To phrase my disagreements as questions. To perform deference to people who hadn&#8217;t earned it, because the hierarchy demanded it and the hierarchy was never, ever wrong.</p><p>It was sitting in customer meetings and having to stay silent because the political relationship mattered more than doing the right thing. Watching us tell a customer what they wanted to hear instead of what they needed to hear. And knowing &#8212; <em>knowing</em> &#8212; that I could help. That I had the answer. And being told, in so many unspoken ways, to keep it to myself.</p><p>It was when even changing teams didn&#8217;t fix it. When I realised I still had to jump through hoops just to share my knowledge with the world. That they weren&#8217;t going to do anything with it &#8212; but they didn&#8217;t want anyone else to have it either. It was just <em>trapped</em>. Doing absolutely nothing. Sitting in someone&#8217;s intellectual property vault, gathering dust, helping no one. And I was supposed to be fine with that. Supposed to accept that the things I knew, the things I&#8217;d built, the things that could actually help people &#8212; they belonged to an organisation that had no intention of using them but every intention of keeping them locked away.</p><p>It was seeing that no matter what I did, none of it mattered, because everyone was trapped in the same system. A system that told them their individual contributions mattered more than the collective. Not in words &#8212; because the <em>appearance</em> of collaboration matters. But in fear. In markdowns for saying the wrong thing. In being penalised for not being "positive" enough. The message was never spoken out loud. It didn't need to be. Everyone understood. Not in words. That performance reviews and promotion metrics and personal brand were the point. Not the work. Not the people. Not whether any of it actually made things better.</p><p>It was seeing countless waste and inefficiencies and knowing exactly what to do about them &#8212; and watching the processes refuse to change. Not because the ideas were wrong. Because change is uncomfortable, and comfort was the point. I&#8217;d go to these elaborate corporate events and listen to people complain that <em>this year we didn&#8217;t get ice sculptures</em>. Ice sculptures. And it was realising that whilst my colleagues grew up around wine vineyards, I grew up around people threatening to beat me up with golf clubs if I didn&#8217;t give them my spare change. I grew up with people threatening to set my house on fire. And I believed them because they did it to other houses. That was the distance. That was the gap I was trying to cross every single day. And no matter how hard I tried, I would never belong &#8212; because I always had more to lose. They were risking a career. I was risking the entire life I&#8217;d clawed my way into.</p><p>It was spending each week crying. Not occasionally. Each week. And feeling awful about who I was as a person, because I just <em>couldn&#8217;t</em>. Couldn&#8217;t play the game. Couldn&#8217;t stop caring about the things I wasn&#8217;t supposed to care about. Couldn&#8217;t stop seeing what I was supposed to overlook. 
</p><p>It was when you have to literally avoid meetings because you can no longer keep from worrying and crying and hating yourself for not just being able to <em>deal with it</em>. When you&#8217;re sat there before a call thinking: I cannot do this. Not because the meeting is hard. Because <em>you</em> are broken. And instead of recognising that, you blame yourself. You tell yourself everyone else manages. Everyone else copes. What is wrong with you that you can&#8217;t just hold it together for one more hour?</p><p>It was the continuous collection of moments that made it very clear: the skill that made me useful &#8212; the ability to see systems, to name what&#8217;s broken, to cut through the noise and say the thing nobody else would say &#8212; that skill was only useful when it didn&#8217;t challenge the people I was working with. The moment it did? The skill was to be managed. Contained. Abused when convenient, kept hidden and out of sight otherwise.</p><p>It was being absolutely, constantly terrified about speaking about any of this. Not just uncomfortable. Not just cautious. <em>Terrified.</em> Of being fired. Of being performance managed &#8212; that quiet, procedural violence where they build a paper trail to make your removal look justified. Of being sued for slander. Of being discredited &#8212; having your reputation quietly dismantled so that even if you did speak up, nobody would believe you. Because that&#8217;s the trick, isn&#8217;t it? They create an environment so hostile you can barely breathe, and then they make you afraid to even <em>describe</em> it. You can&#8217;t speak about what&#8217;s happening to you because speaking about it is the thing that will finally destroy you. So you stay silent. And the silence is what they&#8217;re counting on.</p><p>If they&#8217;re reading this right now, they will be wondering how to sue and discredit me for this post. How to do appropriate damage control. And honestly? That tells you everything you need to know.</p><p>And I want to be clear about something: I don&#8217;t blame anyone I worked with. Everyone was genuinely good. Good people, trying their best, inside a system that made it nearly impossible to do the right thing. But when you&#8217;re trapped in an environment where everything is designed to funnel upwards &#8212; where every decision, every conversation, every piece of work exists to serve the layer above &#8212; it&#8217;s inherently fear-driven. Even when nobody explicitly says it. Nobody has to. The structure says it for them. And good people, caught in that structure, end up doing things they&#8217;d never choose to do on their own. That&#8217;s not a people problem. That&#8217;s a system problem. And no amount of individual goodness can fix a broken system.</p><div><hr></div><p>And honestly, this was so tiring.</p><p>I was so tired. Not the kind of tired that a holiday fixes. The kind of tired that lives in your bones. The kind where you wake up and the first thing you feel isn&#8217;t sleepiness but dread. I was so tired of constantly performing that my ability to mask was non-existent. And, I was so tired of masking.</p><p>If you know, you know. And if you don&#8217;t &#8212; masking is when you spend every waking moment in a professional environment performing a version of yourself that is acceptable to the people around you. Monitoring your tone. Filtering your reactions. Suppressing the way your brain actually works so that it fits neatly into the way things are supposed to be done. 
It is <em>exhausting</em>. And when you&#8217;ve been doing it for long enough, and you&#8217;re already broken from everything else, you simply can&#8217;t do it anymore. The mask doesn&#8217;t slip &#8212; it shatters. And then people look at you like <em>you&#8217;re</em> the problem, because they&#8217;ve only ever known the mask.</p><p>And I just didn&#8217;t want to do it anymore. I didn&#8217;t want to keep failing to be good enough. I didn&#8217;t want to keep chasing this impossible standard I was never going to meet, all the while automatically expected to be good at the things I was never good at, whilst being used and marked down for the things I was good at.</p><div><hr></div><p>So I left.</p><p>And isn&#8217;t it ironic &#8212; that <em>leaving</em> is when people question my sanity? Because who in their right mind would quit their high-paying job to spend time doing something they actually enjoy? Who would walk away from the prestige, the salary, the security? There must be something wrong with her. She must be having an episode. She must not be thinking clearly.</p><p>Funny, that.</p><p>Ironic, how they never showed concern when I was genuinely struggling in an environment that could not support me. Nobody questioned my mental state when I was crying every week. Nobody pulled me aside when I was visibly breaking. Nobody said &#8220;are you okay?&#8221; when the answer was obviously, painfully, no.</p><p>But quit? <em>Now</em> they&#8217;re worried.</p><p>And I was so tired of seeing another support plan. Another complex list of things put in place to &#8220;support&#8221; me. Reasonable adjustments. Action points. Follow-up meetings about the follow-up meetings. A whole bureaucratic apparatus designed to look like help without ever actually addressing what I&#8217;d said. Because I&#8217;d told them. I&#8217;d told them clearly, repeatedly, plainly. The one thing I needed.</p><p>The ability to have honest and open dialogue.</p><p>That&#8217;s it. That&#8217;s all I ever asked for. Not special treatment. Not a different role. Not a quiet room or a modified schedule. Just the ability to say what I saw without being punished for it. The ability to have conversations &#8212; real ones, not the fake positive ones &#8212; about what was working and what wasn&#8217;t. Without the terror. Without the politics. Without the performance.</p><p>They couldn&#8217;t give me that.</p><p>And beyond all of it &#8212; beyond the exhaustion, the masking, the fear &#8212; it wasn&#8217;t worth it. I&#8217;d already decided that large institutions were causing the majority of the world&#8217;s greed and problems. I&#8217;d already decided that by staying, I was part of it. I couldn&#8217;t sit there and keep being comfortable, taking a large salary for doing a hundredth of the work I could do outside. That&#8217;s the deal, isn&#8217;t it? They pay you well enough that you stop asking whether the work matters. They pay you well enough that leaving feels irresponsible. The salary isn&#8217;t compensation. It&#8217;s a leash.</p><p>And I decided I was done being leashed.</p><div><hr></div><p>And you want to know what madness actually looks like?</p><p>In the past few weeks &#8212; since I apparently lost my mind and threw away my career &#8212; I have explored my city more than I have in years. I&#8217;ve thought of more product ideas in a month than I did in a year of being told to stay in my lane. I&#8217;ve danced. I&#8217;ve sung. I&#8217;ve <em>laughed</em>. 
I&#8217;ve done art and painting, something I haven&#8217;t touched in years because I was too tired, too empty, too busy performing someone else&#8217;s version of me to remember what I actually enjoyed.</p><p>A couple of months ago, I was so broken. I felt so bad about who I was, about my differences, about being autistic and not being able to make sense of the incoherence. But leaving allowed me to start remembering who I am. The girl who always saw the positive in a pile full of misery. The girl who never stopped dreaming. The girl who never stopped believing.</p><p>That&#8217;s what happened when I quit. I didn&#8217;t fall apart. I found myself. I came back to life.</p><p>So please tell me again that I&#8217;m the one who isn&#8217;t thinking clearly.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.techunfiltered.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Tech Unfiltered is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>