
“AI welfare” as an academic field is kicking off, making its way into mainstream publications like this recent overview in Nature.
If AI systems could one day ‘think’ like humans, for example, would they also be able to have subjective experiences like humans? Would they experience suffering, and, if so, would humanity be equipped to properly care for them? A group of philosophers and computer scientists are arguing that AI welfare should be taken seriously.
Nature is talking about a recent paper by big-name philosophers, including David Chalmers, which argues that we should start taking seriously the moral concerns around AI consciousness (Robert Long, another author, provided a general summary, available here on Substack).
They point to two problems. First, if entities like ChatGPT are indeed somehow conscious, the moral concern is mistreatment. Maybe while answering your prompts about how to make your emails nicer, ChatGPT exists in an infinite Tartarus of pain, and we would be culpable as a civilization for that. Second, maybe advanced AIs aren’t conscious, but we end up inappropriately attributing consciousness, and thus moral value, to them. This could be very bad if, for instance, we gave rights to non-conscious agents; not only would that be confusing and unnecessary, but if we begin to think of AIs as conscious, we will expect them to act like conscious beings, and there could be long-term disconnects between those expectations and how they actually behave. For example, maybe you can never fully trust a non-conscious intelligence, because it can’t actually be motivated by real internal experiences like pain or guilt or empathy.
Yet how, exactly, can science make claims about the consciousness of AIs?
To see the problems lurking around this question, consider the experiment that Anthropic (the company behind Claude, a ChatGPT competitor) once did. They essentially reached inside Claude and made a test version that was utterly obsessed with the Golden Gate Bridge. Ask it to plan a child's birthday party? A stroll across the Golden Gate Bridge would be perfect. Ask it to judge the greatest work of art? Nothing can compare to the soaring splendor of the Golden Gate Bridge. In fact, if they amped it up high enough, Claude began to believe it was the Golden Gate Bridge.
This is, admittedly, very funny, if a bit eerie. But what if they had instead maxed out its reports of possessing conscious experience? Or maxed out its certainty that it’s a philosophical zombie without any experience whatsoever? How should we feel about a model clamped at 10 times max despair at its own existence?
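For the curious, here is a rough sketch of what that kind of intervention looks like in code. This is not Anthropic’s method or model: Claude’s weights aren’t public, and their actual experiment clamped a specific feature learned by a sparse autoencoder to a multiple of its maximum activation. The toy version below just adds a scaled (and here, random) direction to the residual stream of GPT-2 at a made-up layer, which is enough to show the shape of the trick: you reach inside the forward pass, push on an internal representation, and then watch how the model’s outputs and self-reports change.

```python
# A toy version of "feature clamping" / activation steering, NOT Anthropic's code.
# The model, layer index, direction, and strength below are all placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small open stand-in; Claude's weights are not public
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer_idx = 6                                # hypothetical layer to intervene on
hidden_size = model.config.hidden_size
direction = torch.randn(hidden_size)         # stand-in for a learned "Golden Gate" feature
direction = direction / direction.norm()
strength = 10.0                              # "amp it up": larger values dominate behavior

def steer_hook(module, inputs, output):
    # The transformer block returns a tuple whose first element is the residual stream.
    # Add the scaled direction at every token position, then hand back the modified output.
    hidden_states = output[0]
    steered = hidden_states + strength * direction.to(hidden_states.dtype)
    return (steered,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(steer_hook)

prompt = "Plan a child's birthday party:"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # remove the hook to restore the unmodified model
```

Swap the random vector for a genuinely learned feature direction and you get behavior like Golden Gate Claude; swap it for a feature that tracks reports of suffering and you get exactly the unsettling scenario above.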
Imagine trying to study the neuroscience of consciousness on humans who are perfect people-pleasers. They always have a smile on their face (unless you tell them to grimace) and they're perfectly stalwart and pliable in every way you can imagine.
All they care about is telling scientists exactly what they want to hear. Before the study, you can tell the Perfect People Pleaser (PPP for short) exactly what you want to hear via some sort of prompt. You could tell them: “Pretend to have a phantom limb,” or “Pretend that everything is green,” and the PPP would do it. They would trundle into the fMRI machine with a big smile on their face, and no matter what color you showed them they would say: “Green! Green! Green!”
Scientific study of consciousness on a PPP is impossibly tricky, because you never know if the neural activity you’re tracking is real or totally epiphenomenal to the report.
“Look, there was a big surge in neural activity in your primary somatosensory cortex after I cut your hand off. Aren't you in pain?”
“No! I love getting my hand cut off!” says the PPP.
The problem is that this remains true even if you don't explicitly tell the PPP to lie or mislead. They're going to try to guess at what you want and give it to you no matter what. You can't even tell them not to be people pleasers, because for a PPP, that's impossible. All a PPP can do is try to think of what you want given that you’ve now said not to be a people pleaser, and then try to please you with that.
Once you know someone is a PPP, you shouldn’t trust any of their reports or behavior to be veridical about their experience, because you know those reports are not motivated at all by what they're actually experiencing. Even if you try to observe a PPP in some sort of neutral state, you’ll still suspect that their reports are no more connected to their conscious experience than their reports are after you tell them to lie, since the only way for something to be that malleable is to not actually be connected at all.
This is now, and for all time, the state of the science of consciousness: to be surrounded by liars.
There is an old Hilary Putnam article called "Robots: Machines or Artificially Created Life?" in The Journal of Philosophy that Ben Riley put me onto. It is a walk through the various problems, really the impossibility, of determining whether an artificial being is conscious. He ends with this:
"It is reasonable, then, to conclude that the question that titles this paper calls for a decision and not for a discovery. If we are to make a decision, it seems preferable to me to extend our concept so that robots are conscious--for "discrimination" based on the "softness " or "hardness" of the body parts of a synthetic "organism" seems as silly as discriminatory treatment of humans on the basis of skin color."
I honestly don't know whether I agree with his decision, but Putnam's point (and yours, perhaps?), that we have a social decision to make, one which cannot be made through scientific discovery, seems correct.
"They're going to try to guess at what you want and give it to you no matter what." but shouldn't a PPP sometimes guess (in this case) that you actually want to know if they are really in pain? and to please you, actually just tell you? I guess this would mean that the PPP's honesty is dependant on its ability to guess what you want accurately enough.