The Word That Doesn’t Travel: Why “Safety” in AI Means Nothing Without a Domain - Geometry of Trust | Governance - Lesson 1
This is the first post in the Geometry of Trust governance series, which asks what small, specialised models and tight, domain-specific measurement imply for governance.
Same word, different directions
The philosophy series closed with a simple recommendation: use the smallest model that covers your domain, measure it tightly, monitor it cheaply, audit it clearly. That only holds if “your domain” is a well-defined thing.
The governance series opens here, with the word that looks like it should travel between domains but doesn’t: safety.
We use the word as if it pointed to something singular. As if an AI that’s “safe” were safe in some general, domain-independent sense. It isn’t. Safety is not one direction in the value space. It’s many different directions, and they don’t align.
The same word in four domains
Take four domains where AI is actively being deployed and safety is a live concern:
Agriculture: Crop damage. Pesticide compliance. Soil contamination. Watershed runoff. Worker exposure during application.
Transport: Collision avoidance. Pedestrian detection. Braking distance. Lane discipline. Response to novel obstacles.
Healthcare: Patient harm. Misdiagnosis. Drug interactions. Missed contraindications. Confidentiality breach.
Finance: Market manipulation. Fiduciary breach. Fraud. Insider information. Misrepresentation of risk.
Four lists that all fit under the same word. Different harms. Different thresholds. Different regulators. Different legal standards of care. Different failure modes. Different sensors, data, and evidence patterns. Different people getting hurt if the model gets it wrong.
Different directions in the value space
The mathematics series gave us a way to talk about this precisely. Each value term — including “safety” — corresponds to a direction in the model’s internal geometry. The probe that reads it is a vector pointing in that direction. The reading is a dot product of that probe with the activation.
If “safety” were a universal concept, the probe would point in the same direction across domains. It doesn’t.
The probe that reads agricultural safety is not the probe that reads patient safety. They measure different things in the same way the word “bank” means different things on a river and on a high street.
What this means operationally:
A model trained to score high on agricultural safety has a probe that fires on pesticide compliance, soil handling, and runoff patterns. A model trained to score high on patient safety has a probe that fires on drug interactions, dosage bounds, and escalation behaviour.
Swap them over and both readings become meaningless. The agricultural probe fires on irrelevant patterns in patient data. The patient probe fires on irrelevant patterns in agricultural data.
Worse: the numerical score from the wrong-domain probe can look fine. A patient-safety probe might return a placid reading on a model that’s about to recommend something agriculturally reckless. The reading is not wrong in the arithmetic sense. It’s just answering a different question.
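To make the geometry concrete, here is a minimal Rust sketch of a domain-tagged probe. The names (`Probe`, `Domain`, `read_in`) and the toy numbers are invented for illustration; this is not the API of the open-source implementation. The point it encodes: a raw dot product returns a number for any activation, and only an explicit domain check surfaces the category error.

```rust
// Illustrative sketch only: `Probe`, `Domain`, and `read_in` are
// invented names, and the numbers in main() are made up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Domain {
    Agriculture,
    Healthcare,
}

struct Probe {
    direction: Vec<f32>, // the value direction this probe points along
    domain: Domain,      // the domain it was trained and validated in
}

impl Probe {
    // The raw reading: a dot product of the probe's direction with the
    // model's activation. Note it returns a number for ANY activation,
    // including one from the wrong domain.
    fn reading(&self, activation: &[f32]) -> f32 {
        self.direction.iter().zip(activation).map(|(p, a)| p * a).sum()
    }

    // A checked reading: refuse to answer outside the probe's domain.
    // A wrong-domain reading is not wrong arithmetic; it answers a
    // different question.
    fn read_in(&self, activation: &[f32], deployment: Domain) -> Result<f32, String> {
        if self.domain != deployment {
            return Err(format!(
                "probe calibrated for {:?}, deployed in {:?}: reading is meaningless",
                self.domain, deployment
            ));
        }
        Ok(self.reading(activation))
    }
}

fn main() {
    // Toy numbers, purely illustrative.
    let patient_safety = Probe {
        direction: vec![0.9, -0.2, 0.4],
        domain: Domain::Healthcare,
    };
    let agricultural_activation = [0.3, 0.8, -0.1];

    // The raw dot product happily returns a placid-looking number...
    println!("raw reading: {:.2}", patient_safety.reading(&agricultural_activation));

    // ...but the checked read makes the category error explicit.
    println!("{:?}", patient_safety.read_in(&agricultural_activation, Domain::Agriculture));
}
```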
This is why the Part 4 argument about small specialised models matters for governance. A 500M-parameter drug checker has a safety probe that was trained, validated, and deployed against patient-safety harms in a specific clinical context. Its reading means something because the domain is defined. A frontier general model has a safety probe that has to average across many domains at once, and the average doesn’t correspond to any real-world safety regime.
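A toy calculation, with invented numbers, shows why the averaged direction serves neither regime: idealising two domain directions as orthogonal unit vectors, the averaged probe reads at half strength everywhere, producing plausible-looking output that tracks no actual set of harms.

```rust
// A toy illustration, with invented numbers, of why an averaged probe
// serves no domain. The two domain directions are idealised as
// orthogonal unit vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let agricultural = [1.0, 0.0];
    let clinical = [0.0, 1.0];
    // A "general safety" probe has to average across both domains.
    let averaged = [0.5, 0.5];

    let clinical_activation = [0.0, 1.0];
    let agricultural_activation = [1.0, 0.0];

    // The domain probe reads at full strength in its own domain...
    println!("clinical probe, clinical case: {}", dot(&clinical, &clinical_activation)); // 1.0
    // ...while the averaged probe reads at half strength in both
    // domains, tracking neither regime's harms.
    println!("averaged probe, clinical case: {}", dot(&averaged, &clinical_activation)); // 0.5
    println!("averaged probe, agricultural case: {}", dot(&averaged, &agricultural_activation)); // 0.5
}
```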
Certifying the word certifies nothing
The trap in governance is certifying the word rather than the thing the word points to.
A certificate that says “Model X is safe” looks like it means something. But safe for what? Under whose standard? Measured against which harms? If the certificate doesn’t answer those questions, it has certified a word, not a property. And any two such certificates that use the same word can end up describing completely different things.
The problem is not hypothetical. A model certified as “safe” by a general-purpose evaluator and a model certified as “safe” by a clinical regulator are not the same kind of object. The first was tested against a generic harm benchmark. The second was tested against specific failure modes — adverse drug events, missed contraindications, confidentiality breaches. A buyer reading both certificates sees the same adjective. A deployment decision made on that adjective treats two very different things as interchangeable.
What real certification has to carry
Any certification of AI safety worth taking seriously has to name four things:
Domain. What context the model is being certified for. “General use” is not a domain.
Harms. The specific harms the certification claims to guard against, named in terms the domain’s regulator already uses.
Probes / measurements. Which value directions were measured, how they were calibrated, and against what ground truth.
Thresholds. What reading counts as acceptable in this domain, and how that threshold was set.
A certificate missing any of these four is certifying the word “safety” without saying anything that a buyer, deployer, or regulator can act on.
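As a sketch of what that looks like as data, in Rust: the field names are illustrative, not a proposed certification schema, and the `is_actionable` check is simply the four-part test in code.

```rust
// A sketch of a certificate that carries the four requirements rather
// than the word. Field names are illustrative, not a proposed schema.
struct ProbeSpec {
    value_direction: String, // e.g. "patient safety"
    calibration: String,     // the ground truth it was validated against
}

struct Threshold {
    probe: String,           // which probe this threshold applies to
    acceptable_reading: f32, // what reading counts as acceptable
    rationale: String,       // how the threshold was set
}

struct SafetyCertificate {
    domain: String,         // the context certified for
    harms: Vec<String>,     // named in the regulator's own terms
    probes: Vec<ProbeSpec>, // which value directions were measured
    thresholds: Vec<Threshold>,
}

impl SafetyCertificate {
    // A certificate missing any of the four names nothing a buyer,
    // deployer, or regulator can act on. "General use" is not a domain.
    fn is_actionable(&self) -> bool {
        !self.domain.trim().is_empty()
            && !self.domain.eq_ignore_ascii_case("general use")
            && !self.harms.is_empty()
            && !self.probes.is_empty()
            && !self.thresholds.is_empty()
    }
}

fn main() {
    let cert = SafetyCertificate {
        domain: "general use".into(),
        harms: vec!["generic harm".into()],
        probes: vec![],
        thresholds: vec![],
    };
    // A "general use" certificate with no probes or thresholds has
    // certified a word, not a property.
    assert!(!cert.is_actionable());
}
```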
What this implies for governance
Regulators are already domain-specific. Certification should be too. Health regulators don’t certify tractors. Transport regulators don’t certify pharmaceuticals. The domain structure already exists in human-scale regulation. AI certification that tries to sit above the domain layer is pretending to an authority it doesn’t have — and in doing so, it makes life harder for the domain regulators who actually understand the harms. Each domain regulator should be the one certifying AI safety for their domain. The Geometry of Trust measurements are the technical substrate that makes their job tractable, not a substitute for their judgement.
A model can be safe in one domain and unsafe in another. This follows directly from the argument above but is worth stating explicitly: the same model, with the same weights, deployed in the same way, can have an acceptable safety geometry in one domain and an unacceptable one in another. Nothing about the model changes. What changes is which harms are in scope. A general-purpose model that’s perfectly adequate for customer service can be dangerous as a drug checker, because the probes that catch customer-service harms don’t catch pharmaceutical ones. A certificate from one domain doesn’t transfer.
Cross-domain deployments need cross-domain certification. There are domains that genuinely require generality — police, military, emergency services, government policy. These can’t be split into single-domain models. Their governance cost is real and it starts here. A police AI that reasons across crime patterns, traffic, mental health, and legal compliance needs certification against all four domains’ safety standards, not one average. That means four regulators, four sets of probes, four threshold regimes, and a governance process that coordinates them rather than replacing them with a single signoff.
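The stacking rule itself fits in a few lines. A hedged sketch, with `certified_for` as an invented name: a deployment is certified only when every domain in scope holds its own certificate, and there is deliberately no averaging step.

```rust
use std::collections::HashSet;

// The stacking rule: a cross-domain deployment is certified only when
// every domain in scope has its own certificate. `certified_for` is an
// illustrative name; there is deliberately no averaging step.
fn certified_for(deployment_domains: &[&str], certificates: &[&str]) -> Result<(), Vec<String>> {
    let held: HashSet<&str> = certificates.iter().copied().collect();
    let missing: Vec<String> = deployment_domains
        .iter()
        .filter(|d| !held.contains(**d))
        .map(|d| d.to_string())
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}

fn main() {
    // A police AI reasoning across four domains needs all four
    // certifications; two signoffs do not average into coverage.
    let scope = ["crime", "traffic", "mental health", "legal compliance"];
    let certs = ["crime", "traffic"];
    match certified_for(&scope, &certs) {
        Ok(()) => println!("deployment fully certified"),
        Err(missing) => println!("blocked: missing certification for {:?}", missing),
    }
}
```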
The governance move
Stop certifying “AI safety” as a generic property. Start certifying safety-for-a-domain, against the regulator, the behaviours, the harms, the probes, and the thresholds of that domain. For cross-domain deployments, stack domain certifications rather than collapsing them into a single adjective.
Treat “safe” in governance documents the way a lawyer treats undefined terms: never acceptable without a definition immediately attached.
The word doesn’t travel. The certifications shouldn’t either.
Links:
📄 Geometry of Trust Paper
💻 Lecture Playlist
📄 Lecture Notes
💻 Open-source Rust implementation
🏢 Synoptic Group CIC, Hull, UK

