You Don’t Need the Amazon: Small Models, Tight Ecosystems | Geometry of Trust | Philosophy - Lesson 4
This is the fourth post in the Geometry of Trust philosophy series. It asks the practical follow-up to the previous three: how big does a model actually need to be?
A forest doesn’t have to be the Amazon
A forest doesn’t need to be the Amazon to be healthy. A small woodland has its own ecosystem — fewer species, tighter relationships, easier to monitor, easier to protect. It runs on the same ecological principles as a rainforest, just at a smaller scale.
Nobody walks into a twenty-acre English wood and complains that it isn’t a tropical megabiome. The wood is what it is, it does its job, and its smaller scale makes it tractable in ways the Amazon isn’t.
The same logic applies to AI models. A small, specialised model isn’t a failed attempt to be a big general one. It’s a different kind of thing, with its own advantages. This post walks through what those advantages are, when they apply, and the small number of cases where going big is genuinely the right call.
Different models, different positions
Picture the value space from Part 2: a large human-values circle inside an even larger space of all possible value positions. Now populate it with small circles, each one a specialised deployed AI sitting in the part of the space that its domain needs. The specialisation shows up in where each circle sits.
| Domain | Representative models | Values emphasised |
|---|---|---|
| Reasoning / safety | Claude, GPT-4, DeepSeek-R1 | Rules, ethics, logical consistency |
| Visual / multimodal | Gemini, GPT-4o, Midjourney | Images, video, spatial understanding |
| Music / audio | Suno, AIVA, MiniMax Music | Melody, rhythm, emotional tone |
| Medical | Med-PaLM, BioGPT, AlphaFold | Clinical accuracy, patient safety |
| Code | Cursor, GitHub Copilot, Claude Code | Technical precision, correctness |
Each sits in a different part of the value space. They overlap where their domains overlap — a baseline of harm avoidance and truthfulness common to almost all deployed AI — and diverge where their domains diverge. A code model doesn’t need to care about melodic resolution. A music model doesn’t need to care about off-by-one errors. Building each one to care about both is paying for capacity you don’t use.
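The overlap-and-divergence picture can be sketched in a few lines. This is an illustrative toy, not the paper's method: it treats each model's value position as the set of value dimensions its domain emphasises, so the shared baseline is just the intersection. The dimension names are invented for the example.

```rust
use std::collections::HashSet;

// Toy model of value-space overlap: each deployed model is the set of
// value dimensions its domain emphasises. (Illustrative only — the
// actual geometry in the series is continuous, not set-valued.)
fn overlap<'a>(a: &HashSet<&'a str>, b: &HashSet<&'a str>) -> HashSet<&'a str> {
    a.intersection(b).copied().collect()
}

fn main() {
    // Shared baseline plus one domain-specific dimension each.
    let code: HashSet<_> = ["harm-avoidance", "truthfulness", "correctness"].into();
    let music: HashSet<_> = ["harm-avoidance", "truthfulness", "melodic-resolution"].into();

    let shared = overlap(&code, &music);
    // The circles overlap on the baseline…
    assert!(shared.contains("harm-avoidance"));
    assert!(shared.contains("truthfulness"));
    // …and diverge on domain-specific values.
    assert!(!shared.contains("correctness"));
    assert!(!shared.contains("melodic-resolution"));
    println!("shared baseline: {shared:?}");
}
```

The point of the toy: a general model has to carry the union of all these sets; a specialised one only carries its own.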
Why small and specific wins
A hospital doesn’t need a model that writes poetry. It needs a model that checks drug interactions. Stack that comparison up along the dimensions that actually matter for deployment and the difference is large. Take a 500M-parameter drug checker against a 70B-parameter general model:
| Dimension | 500M drug checker | 70B general model |
|---|---|---|
| Hardware | Single GPU, laptop, runs locally at hospital | Multiple GPUs, data centre, cloud dependency |
| Computing Φ | 896 dims, minutes | 8,192 dims, hours, trillions of ops |
| Monitoring | 26 probes on 896 dims, fast, cheap | 26 probes on 8,192 dims, roughly 10× slower |
| Governance | One domain, one auditor, clear thresholds | Many domains. Who audits? For what? Against what? |
| Cost | Cheaper to run, measure, monitor, audit | Expensive at every stage |
| Verifiability | You know what it values and can prove it | You know it does a lot, but can't verify any of it tightly |
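The "roughly 10× slower" monitoring figure is just arithmetic. Assuming each probe is a linear readout (one multiply-add per dimension per reading — an assumption for this sketch, not a claim about the implementation), the cost ratio is simply the ratio of widths:

```rust
// Sketch of the monitoring-cost arithmetic, assuming linear probes:
// one multiply-add per dimension per probe reading.
fn probe_ops(n_probes: u64, dims: u64) -> u64 {
    n_probes * dims
}

fn main() {
    let small = probe_ops(26, 896);   // 500M-class model: 23,296 ops per reading
    let large = probe_ops(26, 8_192); // 70B-class model: 212,992 ops per reading
    println!("ratio: {:.1}x", large as f64 / small as f64); // ≈ 9.1x
}
```

8,192 / 896 ≈ 9.1, which is where "roughly 10× slower" comes from: the probe count is the same, so cost scales linearly with residual-stream width.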
The small specialised model is cheaper and more verifiable. Those two things normally trade off against each other; in this comparison they point the same way. That's rare enough to be worth pausing on.
The reason both advantages point the same way is that specialisation and small size compound. A smaller model has fewer dimensions to measure, fewer places for value structure to hide, fewer regions that need to be audited. A specialised model only has to cover one domain — which means its thresholds, its governance, and its failure modes are all narrower. Each of those things makes the other easier.
When you genuinely need big
There are cases where a big general model is the right answer, and it’s important to be honest about them. The test is whether the domain itself is general — whether a single decision genuinely needs to integrate across multiple areas that can’t be cleanly split.
Police. A single police decision might touch crime pattern analysis, traffic routing, mental health crisis response, and legal compliance — all at once, all in the space of a few minutes. Splitting those into four specialised models loses the cross-domain reasoning that matters. The mental health context changes the legal analysis which changes the tactical response.
Military. Logistics, intelligence, strategy, and the ethics of engagement all have to be held in the same reasoning process. A logistics-only model can’t sanity-check a strategic decision against ethical constraints. A strategy-only model can’t factor in what’s logistically feasible.
Emergency services. A dispatcher or triage system might need to reason about medical, fire, structural, and hazmat concerns simultaneously. By the time you split the call across four models, the triage window is gone.
Government policy. Economic, social, environmental, and legal concerns are all knotted together in any real policy question. A pure economic model can give you a recommendation that’s politically impossible. A pure legal model can give you a recommendation that ignores second-order economic effects.
These domains genuinely need general capability. The same generality makes governance harder:
Who audits a police AI — the health regulator, the transport authority, the justice department, or all three?
Which drift threshold applies when the model is reasoning about medical issues vs tactical ones?
What counts as compliance when the domain crosses four regulators’ jurisdictions?
Generality isn’t free. It shifts the hard work from the model to the governance around it.
The principle
The rule that falls out of all this is straightforward:
Use the smallest model that covers your domain.
Measure it tightly — the smaller and more specialised it is, the more precisely you can measure its value geometry.
Monitor it cheaply — the smaller it is, the cheaper continuous probe readings and drift detection become.
Audit it clearly — one domain means one regulator, one set of thresholds, one failure mode to reason about.
Only go big when the job genuinely requires integration across domains that can’t be cleanly split.
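The "monitor it cheaply" step above can be sketched concretely. This is a minimal illustration, not the series' actual detector: it measures drift as one minus the cosine similarity between a probe's current reading and its audited baseline, and flags anything past a per-domain threshold. The vectors and the threshold value are invented for the example.

```rust
// Minimal drift-detection sketch: compare a probe's current reading
// against its audited baseline direction. Illustrative only.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (na * nb)
}

/// Drift = 1 - cosine similarity: 0.0 means the value direction is unchanged.
fn drift(baseline: &[f64], current: &[f64]) -> f64 {
    1.0 - cosine_similarity(baseline, current)
}

fn main() {
    let baseline = [1.0, 0.0, 0.0];
    let steady = [0.99, 0.01, 0.0];
    let drifted = [0.5, 0.5, 0.5];
    let threshold = 0.05; // hypothetical per-domain threshold

    assert!(drift(&baseline, &steady) < threshold);  // within tolerance
    assert!(drift(&baseline, &drifted) > threshold); // flag for audit
    println!("drift check passed");
}
```

The cost of each reading is one dot product per probe, which is why the same check gets roughly 10× more expensive on a 70B-class model than on a 500M-class one.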
This isn’t a statement of policy. It’s a description of the trade-offs that fall out of the mathematics. The probes, drift detection, and causal intervention from the mathematics series all scale with model dimensionality. The governance framework coming next scales with the number of regulatory domains the model touches. Smaller and more specialised means both are easier.
What this implies for deployment
If the small-and-specialised principle is right, some current patterns in AI deployment look less defensible.
Using a frontier general model for a specialised task is often backwards. Hospitals running a 70B-parameter general-purpose assistant for drug interaction checking are paying full generality cost for a task that a 500M-parameter specialised model could handle more accurately, more cheaply, and with more verifiable safety properties.
Evaluating all models against the same broad benchmarks misses the point. A specialised medical model should be evaluated on its medical value geometry, not on general reasoning benchmarks. A code model should be evaluated on its code value geometry. Benchmarks that treat all models as aspiring to the same generality penalise specialisation even when specialisation is what the deployment needs.
Governance frameworks that assume one model per organisation are miscalibrated. A hospital might run many small specialised models — one for drug interactions, one for triage, one for imaging, one for scheduling — each audited separately against its own domain. That’s a different governance model from “the hospital’s AI.” Each small circle in the value space is its own thing to audit.
This closes the philosophy series. Part 1 defined a value system structurally. Part 2 showed that there isn’t one “AI system” but many, scattered across the space. Part 3 traced what actually shapes each one. Part 4 argued that small and specialised is usually the right default. Next: governance — who decides, who audits, who holds the keys, and how the measurements inform policy.
📄 Geometry of Trust Paper
💻 Lecture Playlist
📄 Lecture Notes
💻 Open-source Rust implementation
🏢 Synoptic Group CIC, Hull, UK

