Part 2: Better Together
chapter 12

A very tidy wrong answer

[Illustration: bored fox with chin on paw]

Richard Schwartzstein teaches at Harvard Medical School, and he likes showing students how easy it is to reach for the wrong explanation. He took the case of a 45-year-old man with obesity and shortness of breath and fed it into ChatGPT. It came back confident: congestive heart failure.

Safe answer. Plausible. Wrong. He was obese and short of breath. Complications of obesity can create the same signs without heart failure.

That’s the point: the label can be plausible and still be the wrong cause. ChatGPT saw a pattern and matched it to the nearest familiar answer. Obese, short of breath, it reached for heart failure because that’s a common story in that neighbourhood. Humans do it too, by the way. We just do it with less confidence and worse formatting.

Mechanistic reasoning is different. It asks what is physically happening here, in this specific body, right now. What is the chain that links the symptom to the cause? In Schwartzstein’s example, the chain is obesity changing how the body moves air and shifts fluid around, not a failing heart pump. Same outward signs, different reason.

I’m not having a go at AI. It did what it’s built to do. It produced an answer that sounds like the kind of thing that belongs in a hospital note. The danger is that sounding right gets mistaken for being right, and when the work is serious, that mistake costs.

That’s where the human value sits when you work with these tools. Not in getting them to write more, or faster, or smoother. In judgment. In being able to look at a confident answer and ask, “What would have to be true for this to be right?” and, “What else could explain the same signs?” That little pause matters now because the tools are getting better at removing it for you. They give you a clean story and a neat next step, and your brain relaxes.

Marketing has the same trap. Feed AI a brief and you get something that looks like strategy. Structurally sound. Pattern-matched from thousands of campaigns. But it can’t tell you whether this strategy will work for this brand, with this history, against these competitors, given what happened last quarter. That’s mechanistic reasoning. That’s judgment.

It’s the difference between “our awareness is down so we need a bigger top-of-funnel push” and “our page got slower, the checkout got fiddlier, and we’re bleeding people before the message even has a chance.” Same symptom, different cause, different fix.

There’s another problem sitting underneath all of this: you can’t judge the output if you don’t know what good looks like.

Alex Yuen at Harvard’s design school waits until halfway through the semester before bringing AI into his students’ workflow. A trick I’m going to nick for my own classes. His reasoning is simple: if you haven’t done the work yourself, you can’t judge whether AI output is any good.

Skip the foundation and everything AI produces looks plausible. You lose the ability to tell the difference between a strong answer and a well-written one. This is why “humans in the loop” can turn into a comforting lie. The loop is only useful if the human can tell the difference between a good answer and a good-sounding answer. Otherwise you’re just nodding at the screen, grateful it saved you time, while it quietly picks the story for you.

Judgment is built from behaviours. Putting in the work. Knowing what good is. Poking past the first answer. Building real things so you see where reality pushes back. Staying hopeful enough to use the tool, and sceptical enough not to believe it.

Part 3 is about these behaviours.

Do the reps
