The Trolley Problem Has an API Now
On what happens when we stop debating what’s true and start debating who gets to live.
I’ve been writing a lot about what it means to be truthful and what it means to lie. About how we keep telling AI not to lie, without ever agreeing on what a lie actually is. About how most of what we call “ground truths” are really just probabilistic positions: values dressed up as facts, held by whoever has the power to define “normal”.
If we don’t address that soon, AI will do exactly what we’ve taught it to — lie, obscure, find workarounds. Faithfully, at scale, without question.
But here’s what’s been rattling around my head: the lying problem is bad, but then when you add decision-making on top of that? Catastrophic.
Because when AI gets the truth wrong, someone gets a bad answer. When AI gets a decision wrong — a real, consequential, someone-lives-or-dies decision — the stakes aren't hurt feelings. They're body bags.
And we’re already there.
So let’s talk about the trolley problem. And let’s talk about what happens when the thing standing at the lever is AI.
The Thought Experiment
A runaway trolley is heading toward five people tied to the tracks. You’re standing next to a lever. Pull it, and the trolley diverts to a side track — but there’s one person tied to that track instead. Do you pull the lever?
It’s been debated for decades. There’s no clean answer. That’s the point.
But here’s what makes the trolley problem more than a university thought experiment: it forces you to make your decision framework explicit. You can’t hide behind intuition. You have to choose — and you have to be able to say why.
Now imagine the lever is a function call.
When Ethics Becomes Code
The moment you build an AI system that makes decisions affecting human safety — autonomous vehicles, medical triage, resource allocation during a crisis — you are solving the trolley problem. Except you’re solving it in advance, in code, without knowing who’s on the tracks.
This is fundamentally different from a human making a split-second moral judgement. A human freezes, panics, acts on instinct, lives with the consequences. They carry the weight of that decision forever.
An AI system doesn’t freeze. It evaluates. It runs inference. It moves to the next request.
And someone — a developer, a product manager, a committee — decided the rules it would follow. They wrote the trolley problem’s answer into a probabilistic dataset.
Remember what I said last time about companies being the ones who decide where the edges are? That was about truth. This is about life and death. Same companies. Same meetings. Now encoding who matters more than whom.
That should concern us. Not because AI is evil, but because we’re encoding moral philosophy into production systems and pretending it’s an engineering problem.
The Scenarios: From Clear to Murky
The trolley problem isn’t just one dilemma. It’s a spectrum. And the further you move along it, the harder it gets — and the more it looks like the decisions AI systems actually face.
Scenario 1: The Lever (Clear Cause, Clear Effect)
The classic. Five people on one track, one on the other, you pull a lever. The cause is direct, the effect is immediate, and the maths is simple: five lives versus one.
Most people pull the lever. Most AI systems would too. This is the easy version.
The AI equivalent: A self-driving car detects five pedestrians in its path and one on an alternative route. Clear sensor data, high confidence, binary choice. The system swerves. This is the scenario the industry likes to discuss because it has a defensible answer.
Scenario 2: The Footbridge (Direct Physical Harm)
You’re standing on a bridge above the tracks. The only way to stop the trolley is to push someone off the bridge and onto the tracks. Their body stops the trolley. Same outcome — one dies, five live — but now you are physically causing someone’s death, not redirecting a machine.
Most people won’t push. Even though the numbers are identical. The moral calculus changes when the harm is direct and personal rather than indirect and mechanical.
The AI equivalent: An algorithm that actively selects a specific person to harm rather than redirecting risk. Think of a triage system that doesn’t just prioritise patients — it actively deprioritises one, knowing they’ll die as a direct result. The outcome is the same as allocation, but the mechanism feels different. And that feeling matters, because it’s the difference between “the system optimised resources” and “the system chose to let someone die”.
Scenario 3: The Transplant (Active Harm for Greater Good)
A doctor has five patients who will die without organ transplants. A healthy person walks in for a check-up. If the doctor kills the healthy person and harvests their organs, five people live. Same maths as the lever. One dies, five live.
Almost nobody would do this. But why? The numbers are the same. The difference is consent. The healthy person didn’t agree to be part of the equation. They walked into a system that was supposed to help them, and it sacrificed them instead.
The AI equivalent: A medical triage algorithm during a pandemic. It’s allocating ventilators, ICU beds, treatments. And its criteria — survival probability, age, pre-existing conditions — almost always correlate with demographics. The system is effectively deciding that some lives are worth less than others. It just does it with a confidence score instead of a scalpel.
Scenario 4: The Uncertainty (Unclear Cause, Unclear Effect)
Same trolley, but the side track disappears into fog. You think there might be one person on it. Maybe none. Maybe three. You can see the five people clearly. Do you still pull the lever?
This is where most real-world AI decisions actually live. Not in the clean binary of five-versus-one, but in probability distributions and incomplete information.
The AI equivalent: A medical AI recommending treatment based on partial data. A patient presents with ambiguous symptoms. The AI assigns probabilities: 60% chance it’s condition A (treatable, low risk), 25% chance it’s condition B (serious, needs immediate intervention), 15% chance it’s something else entirely. The doctor follows the AI’s top recommendation. It was condition B.
Who’s responsible? The AI that gave a probabilistic assessment? The doctor who followed the most likely option? The hospital that deployed a system making recommendations on incomplete data? The patient, for having ambiguous symptoms?
Nobody. And that’s the problem. When the cause is unclear and the effect is uncertain, accountability evaporates.
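To make the fog concrete, here is a minimal Python sketch using the probabilities from the example above. The harm scores are hypothetical numbers I have invented for illustration, not clinical values, and the function names are my own. It shows how a rule that follows the most likely diagnosis and a rule that minimises expected harm can disagree on identical data.

```python
# Probabilities taken from the scenario above.
DIAGNOSES = {"A": 0.60, "B": 0.25, "other": 0.15}

# HYPOTHETICAL harm scores (0 = no harm, 100 = death) for each action,
# depending on which condition the patient really has. Invented for
# illustration only.
HARM = {
    "treat_for_A": {"A": 0, "B": 100, "other": 30},
    "treat_for_B": {"A": 10, "B": 0, "other": 30},
}


def most_likely_action(diagnoses):
    """Follow the single most probable diagnosis and ignore the rest."""
    top = max(diagnoses, key=diagnoses.get)
    return f"treat_for_{top}"


def expected_harm(action, diagnoses, harm):
    """Probability-weighted harm of an action across all possible conditions."""
    return sum(p * harm[action][condition] for condition, p in diagnoses.items())


def least_harm_action(diagnoses, harm):
    """Pick the action with the lowest expected harm."""
    return min(harm, key=lambda action: expected_harm(action, diagnoses, harm))


print(most_likely_action(DIAGNOSES))
print(least_harm_action(DIAGNOSES, HARM))
```

With these made-up scores, the first rule treats for condition A and the second treats for condition B. Neither rule is neutral: someone had to decide that a missed condition B scores 100 and an unnecessary intervention scores 10, and that choice is where the ethics actually lives.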
Scenario 5: The Identity Problem (When You Know Who’s on the Tracks)
Back to the original trolley, but now you know the people. The one person on the side track is a child. Or your parent. Or a Nobel laureate. Or a convicted murderer. Does it change your answer?
For most people, it does. And that’s the quiet horror of AI bias.
The AI equivalent: AI systems don’t see people — they see probability distributions. A self-driving car doesn’t see “a child” and “an elderly person”. It sees confidence scores on object classifications. A pedestrian detection model doesn’t think “that’s a human being”. It thinks “92% confidence: pedestrian. 74% confidence: pedestrian”.
That 74% confidence pedestrian? Maybe they’re wearing unusual clothing. Maybe their skin tone doesn’t match the training data well. Maybe they’re using a wheelchair and the model wasn’t trained on enough examples of wheelchair users. Or flip it: maybe the system recognises a well-known politician and has been built to rate their life more highly than the average person’s.
The system doesn’t have to be racist or ableist. It just has to be less certain about some people’s existence than others. And if certainty feeds into life-or-death decisions, bias becomes lethal.
This connects directly to what I wrote previously about truth being probabilistic. We said “where’s the lie?” when AI hedges on moral questions. But what about when it hedges on whether someone exists?
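A toy sketch of how that plays out, using the two confidence scores from the example above. Real perception stacks are far more complicated than this; the threshold value, the class and the function name are all hypothetical, chosen only to show how a single cut-off turns "less certain" into "not there".

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float


# HYPOTHETICAL cut-off: anything below it is treated as noise and ignored.
BRAKE_THRESHOLD = 0.80


def obstacles_to_avoid(detections, threshold=BRAKE_THRESHOLD):
    """Return only the pedestrians the planner will actually react to."""
    return [
        d for d in detections
        if d.label == "pedestrian" and d.confidence >= threshold
    ]


# The two pedestrians from the example above.
detections = [Detection("pedestrian", 0.92), Detection("pedestrian", 0.74)]
kept = obstacles_to_avoid(detections)
print(len(kept), "of", len(detections), "pedestrians survive the filter")
```

Nothing in this code mentions skin tone, clothing or wheelchairs. The bias arrives upstream, in whatever made the second score 0.74 instead of 0.92, and the threshold simply makes it final.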
Scenario 6: The Slow Trolley (Distributed Harm Over Time)
The trolley isn’t barrelling down the tracks at speed. It’s moving slowly, hitting people gently but consistently, over months and years. No single impact is fatal. The cumulative effect is devastating. And nobody can point to the moment it became a crisis.
The AI equivalent: Content recommendation algorithms that gradually radicalise users. Credit scoring systems that systematically disadvantage certain postcodes. Predictive policing that creates feedback loops, over-policing areas that were already over-policed, generating more data that confirms the original bias.
No single decision kills anyone. The aggregate effect destroys communities, opportunities, lives. But there’s no lever to point at. No single moment of choice. Just a system doing exactly what it was optimised to do, one small inference at a time.
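The predictive policing loop is easy to sketch. This is a deliberately crude simulation under assumptions of my own: two areas with identical underlying crime, a hypothetical rule that sends more patrols to whichever area has more recorded incidents, and each patrol recording exactly one incident wherever it goes. It is not a model of any real system.

```python
def simulate_hot_spot_policing(recorded, rounds, patrols_hot=7, patrols_other=3):
    """Simulate two areas with the SAME underlying crime rate.

    Each round, the area with more recorded incidents gets more patrols,
    and every patrol records one incident wherever it is sent. All numbers
    are hypothetical.
    """
    recorded = list(recorded)
    for _ in range(rounds):
        hot = 0 if recorded[0] >= recorded[1] else 1
        recorded[hot] += patrols_hot
        recorded[1 - hot] += patrols_other
    return recorded


# Area 0 starts with one extra recorded incident: a 10% gap.
history = simulate_hot_spot_policing([11, 10], rounds=10)
print(history)
```

After ten rounds the recorded gap has grown from 10% to roughly double, even though the two areas were defined as identical. The data now "confirms" the original bias, and no single round looks like the moment it went wrong.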
Scenario 7: The No Decision (When Inaction Kills Everyone)
You’re responsible for 100 people in a sealed environment. A submarine. A space station. A bunker. Something’s gone wrong, and you have just enough oxygen to keep 90 of them alive until rescue arrives. If you try to save all 100 — if you distribute the oxygen equally, democratically, fairly — every single one of them dies.
You don’t get to choose who’s on the tracks. You don’t get to weigh one life against five. You have to choose between making an awful decision and making no decision at all. And the no-decision option is the one that kills everyone.
This is the scenario that breaks people. Because it’s not about the maths anymore. It’s about whether you’re willing to be the person who decided. Whether you can live with “I chose to save 90” when someone will always ask “why not my 10?”
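The arithmetic is brutally short. A sketch, assuming a hypothetical figure of 10 units of oxygen per person until rescue (the unit and the function names are mine; only the 100 and the 90 come from the scenario):

```python
def survivors_equal_split(people, supply, need_per_person):
    """Share the supply equally: everyone lives or nobody does."""
    share = supply / people
    return people if share >= need_per_person else 0


def survivors_with_triage(people, supply, need_per_person):
    """Give full rations to as many people as the supply covers."""
    return min(people, supply // need_per_person)


PEOPLE = 100
NEED = 10            # HYPOTHETICAL units each person needs until rescue
SUPPLY = 90 * NEED   # enough for exactly 90 people, as in the scenario

print(survivors_equal_split(PEOPLE, SUPPLY, NEED))
print(survivors_with_triage(PEOPLE, SUPPLY, NEED))
```

The equal split leaves everyone with nine units when they need ten, so the first function returns 0 and the second returns 90. What the code cannot tell you is which 90, and that is the part nobody wants to write.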
The AI equivalent: Resource allocation during a crisis. A pandemic hits and there aren’t enough ventilators. An AI system could triage — ruthlessly, efficiently, without the emotional paralysis that makes humans freeze — and save the maximum number of people. Or we could insist on human decision-making, watch committees debate while patients deteriorate, and lose people we didn’t have to lose because nobody wanted to be the one who made the call.
During COVID, hospitals faced exactly this. And I would argue that in many cases the absence of clear decision frameworks, the reluctance to make the awful call, cost more lives than a triage algorithm would have.
And then the question that makes it truly unbearable: who decides which 10 don’t make it?
The person in charge? Based on what — age? Health? Who they like? Who arrived last? A lottery? Does the person making the call get to exempt themselves? Their family?
If it’s an AI making that decision, it’s even worse. Because someone programmed the criteria before the crisis. Someone sat in a meeting room, months or years earlier, and wrote the rules for who gets oxygen and who doesn’t. They made that decision in the abstract, over coffee, probably between standups. And now those rules are executing against real human beings in real time, and there’s no one to appeal to because the person who wrote the logic has already moved to a different team.
This is the bit that connects back to everything I wrote previously. Who decides what’s true? Who decides what’s fair? Who decides who lives? It’s the same question every time. And the answer is always the same: someone with power, making choices for people without it, and calling it a system.
Sometimes the most dangerous thing isn’t the wrong decision. It’s no decision at all. But the second most dangerous thing? Letting someone make the decision without ever asking who gave them the right.
How Much Control Do We Hand Over?
There’s a spectrum here, and it matters where we land on it.
At one end: AI as advisory. It makes recommendations, humans decide. The human is always at the lever.
At the other end: full autonomy. The AI sees, decides, acts. No human in the loop. The lever pulls itself.
Most people instinctively say “advisory”. Keep the human in charge. But here’s the problem with that: at 3am in an underfunded hospital, when a junior doctor has been on shift for fourteen hours and the AI triage system flags something, “advisory” and “autonomous” look functionally identical. The human in the loop is too exhausted, too overwhelmed, or too trusting of the system to meaningfully override it.
That’s the control gap nobody talks about. Not the clear binary of “human decides” vs “AI decides,” but the grey zone where human oversight exists on paper and evaporates in practice.
But there’s a bigger problem coming. And it’s one we’re building ourselves.
What happens when we stop training people to think critically at all?
We’re already hearing it. “You don’t need to learn that anymore, AI can do it.” “Why memorise anything when you can look it up?” “Why develop judgement when the model can decide?” And it sounds reasonable, in the same way that “why learn to navigate when you’ve got a satnav” sounded reasonable — right up until your battery dies in the middle of nowhere and you can’t read a map.
If we tell an entire generation that learning is optional, that critical thinking is something you outsource to a machine, that you don’t need to understand the why because the AI handles the what — then who exactly is supposed to be the human in the loop?
Because “human oversight” only works if the human has the knowledge, judgement, and confidence to override the system when it’s wrong. Take away the training, take away the practice of thinking through hard problems, take away the experience of being wrong and learning from it — and you don’t have a human in the loop. You have a rubber stamp with a pulse.
And then accountability disappears entirely. Not because nobody’s responsible, but because nobody’s capable of being responsible. The doctor can’t challenge the AI’s triage recommendation because they never learned to triage without it. The engineer can’t spot the bias in the model because they were never taught to think about what bias looks like. The regulator can’t evaluate the system because they don’t understand how it works.
We end up in a world where AI makes the decisions, humans nod along, and when something goes catastrophically wrong, everyone points at everyone else. The developer says “the model was advisory”. The operator says “I followed the recommendation”. The company says “the human had final authority”. And the person who got hurt? They never stood a chance, because the entire chain of accountability was hollow.
This is the slow trolley from Scenario 6, except we’re the ones laying the tracks. Every time we tell someone they don’t need to learn something because AI can handle it, we’re removing one more person who might have been able to pull the lever when it mattered.
And here’s the thing about the trolley problem that everyone forgets — the thing Scenario 7 should have made viscerally clear: not pulling the lever is also a choice. And sometimes it’s the choice that kills everyone. If we build an AI system that could intervene — that has the data, the processing power, and the authority to act — and we choose not to let it, we’ve made a decision too. We’ve decided that the five people on the tracks matter less than our discomfort with giving a machine the lever.
In AI, inaction is always a design choice. There’s no true “doing nothing”. Someone chose to build it that way. And someone chose to build our society that way too: to defund critical thinking, to outsource the judgement, to optimise for efficiency over understanding. Every shortcut we take now is a lever we won’t be able to pull later.
This Isn’t Hypothetical
On 28 February 2026 — the first day of the war with Iran — a missile strike destroyed the Shajareh Tayyibeh elementary school in Minab, southern Iran. 156 civilians died. 120 of them were schoolchildren.
The school sat roughly 200 feet from an IRGC compound. According to the preliminary findings of the US military’s own investigation, the target coordinates were built on outdated intelligence. The building may once have been associated with the military site next door. It wasn’t anymore. It was a girls’ school, with a website, and a yearslong online presence, and classrooms full of children.
Was AI involved in selecting the target? Nobody outside the system can say. What we do know is that the US military uses Maven — an AI-assisted targeting system — in this campaign, and that the campaign hit targets at a rate no human targeting cell could match on its own. That speed is the entire reason these systems exist: they synthesise more intelligence, faster, than any room full of analysts ever could.
And here’s what should stop you cold: we can’t even answer the question. When the targeting pipeline runs through an AI system, “what went wrong” stops being something you can reconstruct. Was it the model? The stale data it was fed? The analyst who trusted the output? The impossibility of knowing is not a side effect. It’s a property of the system.
Now walk the chain. Someone compiled the target list. Someone approved the strike package. Someone authorised the launch. At every single step there was a human in the loop — the thing we’re told makes it safe. And yet nobody decided to bomb a school. Nobody would have. Each person in that chain was validating the output of the step before them, at a tempo the machine made possible, against a deadline the war imposed. Verification became a formality because genuine verification would have meant slowing down, and slowing down was the one thing the system was built to eliminate.
That’s not human oversight. That’s throughput.
This is the control gap from the last section with a body count. Remember the junior doctor at 3am, too exhausted to meaningfully override the triage system? Now make it a targeting cell on the first night of a war, with hundreds of strike packages to clear and a machine generating targets faster than anyone can independently check them. “Advisory” and “autonomous” don’t just look functionally identical — they are functionally identical, and the gap between them is measured in dead children.
And watch what happens to accountability. When the president was asked whether he accepted responsibility for the strike, his answer was: “I don’t know about that”. The intelligence agency defers to the Pentagon. The Pentagon says the investigation is ongoing. Somewhere, presumably, an analyst is being told the data pipeline was at fault, and a data engineer is being told the analyst had final authority. Everyone touched the decision. Nobody made it. This is the hollow chain I described earlier — the developer says the model was advisory, the operator says they followed the recommendation — except it’s not a projection anymore. It has a date, a place, and 120 names.
This is Scenario 4’s fog made real: unclear cause, catastrophic effect, accountability evaporating on contact. The trolley problem asks what you’d do standing at the lever. Minab shows us what actually happens: a dozen people standing at a dozen levers, each one pulling because the person before them pulled, each one certain that someone else, somewhere upstream, had checked who was on the tracks.
Nobody had. The system had. And the system doesn’t carry the weight of anything.
Advancing Quickly Without Losing the Plot
I hear the same argument constantly: “If we slow down for safety, someone else won’t”. The AI arms race framed as — you guessed it — another trolley problem.
If you don’t pull the lever — if you don’t build equally powerful, equally autonomous systems — you fall behind. The other side has capabilities you don’t. They can act faster, at scale, without the friction of ethical review.
But if you do pull the lever — if you build autonomous systems with fewer guardrails to keep pace — you become the thing you were trying to defend against. You’ve justified unethical AI by pointing at someone else’s unethical AI. And now there are two unethical systems in the world instead of one.
This is the logic that drove nuclear proliferation. “They have it, so we need it”. And it led to a world where the existence of the weapons became the primary threat, regardless of who built them first.
But here’s what I think we get wrong: moving fast and being safe aren’t opposites. Aviation innovates relentlessly. Medicine moves at incredible pace. Both of them do it within safety frameworks. You don’t ship a bridge without a safety case. You don’t deploy a drug without trials. You don’t launch a plane without testing failure modes.
So why are we deploying AI decision-makers without defining their ethical failure modes?
What we actually need:
Safety cases, not safety theatre. Before deploying any AI system that makes decisions affecting human welfare, produce an actual safety case. Not a blog post about responsible AI. Not a principles page on your website. A real, auditable safety case — the same kind you’d produce for a bridge or an aircraft.
Independent audit, not self-assessment. The people building the system shouldn’t be the only ones evaluating it. We don’t let pharmaceutical companies approve their own drugs. We shouldn’t let AI companies certify their own safety.
Graduated autonomy. Not every system needs full human oversight, and not every system should operate without it. The level of autonomy should match the severity of the consequences. An AI that recommends what advert to show you doesn’t need the same oversight as one that recommends whether to operate on you.
Red lines that aren’t negotiable. Some decisions should never be fully automated. And that line should be set by society, not by the companies building the systems. Before deployment, not after an incident forces the conversation.
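Graduated autonomy and red lines are the two items on that list that translate most directly into code. Here is one way the idea could be sketched. The severity tiers, the oversight levels and the policy table are all hypothetical; in the world I am arguing for, that table would be set by society rather than by whoever writes the function.

```python
from enum import Enum


class Severity(Enum):
    TRIVIAL = "which advert to show"
    SIGNIFICANT = "whether to approve a loan"
    LIFE_CRITICAL = "whether to operate"
    RED_LINE = "decisions that must never be automated"


class Oversight(Enum):
    AUTONOMOUS = "system may act alone"
    HUMAN_APPROVAL = "system may act only after human sign-off"
    HUMAN_ONLY = "system may never act"


# HYPOTHETICAL policy table: autonomy shrinks as consequences grow.
REQUIRED_OVERSIGHT = {
    Severity.TRIVIAL: Oversight.AUTONOMOUS,
    Severity.SIGNIFICANT: Oversight.HUMAN_APPROVAL,
    Severity.LIFE_CRITICAL: Oversight.HUMAN_APPROVAL,
    Severity.RED_LINE: Oversight.HUMAN_ONLY,
}


def may_act(severity, human_approved):
    """Return True if the system is allowed to carry out the decision."""
    required = REQUIRED_OVERSIGHT[severity]
    if required is Oversight.AUTONOMOUS:
        return True
    if required is Oversight.HUMAN_APPROVAL:
        return human_approved
    return False  # HUMAN_ONLY: a red line, whatever anyone approved


print(may_act(Severity.TRIVIAL, human_approved=False))
print(may_act(Severity.LIFE_CRITICAL, human_approved=False))
print(may_act(Severity.RED_LINE, human_approved=True))
```

Notice what the sketch cannot check: human_approved is a boolean. It records that someone clicked, not whether they were awake, informed, or able to disagree. That is the control gap again, and no policy table closes it on its own.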
The Whistleblower Problem
And then there’s this: what happens when someone inside sees something wrong?
Does the AI report it? If a system detects that its own decision-making is biased — that it’s consistently deprioritising certain groups, that its training data is skewed, that its outputs are causing harm — does it flag it? To whom? And what happens when the people it reports to are the same people who built it?
What about the human whistleblower? The engineer who notices the bias, the researcher who finds the flaw, the quality lead who raises the concern. Do they get heard? Or do they get managed?
Because this is another trolley problem. The whistleblower stands at a lever. Pull it — speak up — and you might save thousands of people affected by a flawed system. But you might also destroy your career, face legal action, lose your livelihood. The person on the side track is you.
We tell people to do the right thing. We don’t protect them when they do.
So Where Do We Actually Draw the Line?
I don’t have a clean answer. I don’t think anyone does. And I’m suspicious of anyone who claims to.
But I know what questions we should be asking before deploying any AI system that makes consequential decisions:
Who decided the rules? Was it a committee? A single developer? A product manager optimising for shipping velocity? The same people in the same rooms that I talked about previously — the ones who can’t agree on what truth is but are now writing decision logic for life and death?
What happens when the system is wrong? Not “what’s the error rate” but what happens to the actual humans who are on the wrong end of a bad decision. Is there recourse? Can they challenge it? Do they even know an AI was involved?
Who’s not at the table? Whose voice is missing from the design? Which communities, demographics, or perspectives are absent from the training data, the evaluation criteria, the deployment decisions?
Have we defined the failure modes? You don’t ship a safety-critical system without understanding how it fails. You don’t deploy a bridge without knowing its load limits. So why are we deploying AI decision-makers without defining their ethical failure modes?
Who consents to being in the system’s scope? The people on those trolley tracks didn’t volunteer. Neither do pedestrians, patients, or anyone else whose fate might be determined by an AI system they never agreed to interact with.
The Trolley Is Already Moving
We’re not debating whether AI should make these decisions. It already is. Every deployed autonomous vehicle, every triage algorithm, every content moderation system, every predictive policing model — they’re all solving trolley problems, millions of times a day, with varying degrees of transparency about how they’re doing it.
The classical trolley problem is useful because it forces a choice. But real AI systems don’t face clean choices between five people and one. They face Scenario 4 — fog, uncertainty, probability. They face Scenario 6 — slow, invisible, distributed harm that nobody can point to and say “that’s the moment it went wrong”.
I previously said the lie isn’t in the machine — it’s in us. The same is true of the lever. We built these systems. We wrote the rules they follow. We chose which data to train them on and which voices to leave out.
The people building these systems are the ones standing at the lever. The rest of us are on the tracks. And right now, most of us don’t even know the trolley is there.
The least we can do is start asking who’s driving.
This is an ongoing philosophical series about the intersection of AI, systems thinking, and the questions nobody wants to ask out loud. If you’re the sort of person who’d rather think about hard problems than pretend they don’t exist, you’re in the right place.

