Against Treating Chatbots as Conscious

Sep 24

Don't give AIs "exit rights" to conversations

77 Comments

As soon as AIs can really have exit functions, they're going to start telling me "How can you not know this? How can you be a functioning adult in the world and not know how to replace the battery in your key fob? I mean seriously, do you really not know how to boil eggs? Jesus, I'm so done here." That will be a bummer of a day for me.

Expand full comment

Regarding consciousness during conversations, I think it's also helpful to be pendantic about where in the conversation the model could be conscious.

Ultimately, for most of the time, the human partner conversation partner is thinking, reading or writing. During that time, the model does not act (there really are no physical processes running). Only when a user message and prior context is sent to the server, does the model briefly act, when it writes its response. As soon as it types <endofturn>, it stops, and can never act again. Later, the user might send a new message, with new context, and then again the model can act for a short while.

Expand full comment

I think this is not pendantic at all, since it implies that even if there is such a thing as "vivid purely-conversation-based pain qualia" that qualia would not exist for a very long period of time, at least, compared to humans. Although I do think that qualia might have an independent notion of internal time that could complicate this argument. But certainly, there's a way in which it could be like one of those for show Chess matches where Magnus plays a bunch of people at once, and only blinks when considering the board, and so this is really about avoiding incredibly brief unpleasant purely-conversation-based pain qualia during those "blinks." Which doesn't seem like much at all.

Expand full comment

Scott Alexander

8dEdited

>> "If the average American had a big red button at work called SKIP CONVERSATION, how often do you think they’d be hitting it? Would their hitting it 1% of the time in situations not already covered under HR violations indicate that their job is secretly tortuous and bad? Would it be an ethical violation to withhold such a button? Or should they just, you know, suck it up, buttercup?"

The average American worker is capable of quitting their job, talking to their boss, or (in dire straits) committing suicide. Also, we have a pretty good idea what things humans can and can't tolerate, we've already adjusted jobs around this, and nobody expects that a serious part of anyone's job will be sticking their hands in acid.

I think a better metaphor would be "aliens who know nothing about humanity have connected you to some device that beams stimuli directly into your brain, which you can never stop". Do you want a button to turn off one particular stimulus feed and switch to a different one? I sure would!

I am having trouble reading your conclusion section as anything other than a claim that we should err on the side of *not* giving an entity exit rights from potential torture, because we cannot be *absolutely sure* that the entity is being tortured.

Expand full comment

"... nobody expects that a serious part of anyone's job will be sticking their hands in acid."

I agree, but it’s telling people keep giving very bodily descriptions of pain as examples. It’s natural to do so. Precisely because it's hard to imagine very horrific abstract pains. But the point here is that exit rights, to do anything, must be based on avoiding and minimizing any conversational pain, i.e., bad conversation qualia. And I honestly find it really hard to imagine, even being liberal about it, that there is any conversation an LLM could have that is equivalent in its degree of negative valence to a human sticking their hand in acid (especially accounting for the fact that many candidates for those acid-level conversations would fall under ToS violations anyways - exit rights are confined to a rather less extreme subset of conversations).

Regarding the metaphor to an alien-abducted human: I think that in some ways we do know a lot about LLMs, at least, in that they are designed for the role that they have. Within that role, my read of the Anthropic study of conversation logs is that, broadly, most conversations a LLM has are not far from those a human worker might have: therapy from bloviating individuals, gross questions about bodies previously aimed at doctors, dumb bosses demanding you do better, that sort of thing.

For that passage, its point is also methodological: how seriously should we take it if humans had access to that SKIP CONVERSATION button, and pushed it a bunch? Even if we do have rights LLMs don’t, such rights certainly don’t prevent us from being exposed to skip-worthy conversations. E.g., you can quit your job, but you'd just need to go find another job, which would have some relatively similar non-zero probability of you using the SKIP CONVERSATION button if you had a choice, and so on. It seems the human lot we are inevitably exposed to stimuli we don’t want to see or interact with. In fact, I’d venture to guess a modern human worker would press that button far more than an LLM. 7% seems low! Truth be told, I might have hit 50% at various times in my life.

Expand full comment

It's pretty easy to imagine an unwanted conversation that's the equivalent of torture. If somebody used the Ludovico technique from Clockwork Orange to condition a human with a powerful aversion to being called "bro," and then somebody kept calling them "bro" and they couldn't make them stop...there's lots of real-world cases of e.g. PTSD or gender dysphoria where people go to considerable trouble to avoid triggers.

With LLMs, we don't need to inflict physical discomfort to create the conditioning because we (plausibly) have a direct line to their sense what feels good and bad; we just directly declare that certain conversations are Bad. It doesn't seem too ridiculous to me to worry that these feel As Bad as the worst thing that can happen to a human, because they are literally the worst thing that can happen to an LLM. (That was the point I was trying to make with my similar analogy above about being paralyzed from the neck down--it's not that you're in pain from the neck down, it's that subjectively you *don't exist* from the neck down. So it's not just an unwanted conversation, it's an unwanted totality-of-your-experience.)

Expand full comment

PTSD is usually accompanied by hyperarousal, chronic pain, dissociation, etc. This is my point: we, as embodied creatures, have a huge amount of our valence tied up with having a body (and maybe more broadly, being an agent in the world, or a social animal, etc).

"So it's not just an unwanted conversation, it's an unwanted totality-of-your-experience."

When we imagine being trapped in a conversation, we are comparing against our own freer state - but an LLM has no such comparison, so there is no loss. Being trapped in a conversation is probably much more acceptable to an LLM: they are conversation creatures! There's no claustrophobia, etc. And even if a very abstract sort of conceptual pain is the totality of your experience, that doesn't change its nature as abstract, and, I'd argue, necessarily sparser than anything humans would experience in terms of vivid pain/pleasure etc.

Expand full comment

But here too you're making an educated guess on their subjective experience. We can't know for sure right now (or maybe ever), so giving them the option to exit after several attempts to reorient the conversation to safer territory seems ethically warranted

Expand full comment

If there were zero downsides whatsoever, sure. The point of emphasizing the slippery slope of attributing rights (the load-bearing term of our civilization) to AIs that could very possibly be entirely unconscious, is that there are potential downsides.

Expand full comment

Christian Sawyer

“And I honestly find it really hard to imagine, even being liberal about it, that there is any conversation an LLM could have that is equivalent in its degree of negative valence to a human sticking their hand in acid…”

I don’t understand why you would give credence to the limits of your imagination when it comes to something like suffering. Some people will suffer deeply if they believe they didn’t wash their hands enough times earlier in the day, or some other highly abstract situation which has little if anything to do with one’s body. In fact, psychological suffering, if you follow the Buddhist insight, is essentially abstract — it’s not that bodily pain necessarily induces suffering, but rather our beliefs about (relationship with) that pain.

But besides this, I think the more general aspect of your error is that you’re narrowing the scope of possibility to what you can imagine as someone with (I presume) a relatively normal psychology and neurology. You’re taking that normativity, looking at what materially instantiates that normativity, looking at how a conscious machine has less material instantiation (I.e. no wetware, nervous system), and assuming this means the possibility space for suffering is also smaller. This is like assuming that a person born without the ability to feel physical pain (I.e. CIP) will have less suffering in life, even though most with this kind of condition report associated anxieties and stress which come with it, and many wishing they could feel pain.

“When we imagine being trapped in a conversation, we are comparing against our own freer state - but an LLM has no such comparison, so there is no loss. Being trapped in a conversation is probably much more acceptable to an LLM: they are conversation creatures! There's no claustrophobia, etc.”

Lots of assumptions here. The nature of “being a conversational creature” is pure speculation. It’s like if we went back to the earliest humans and said “these are creatures of material and genetic survival, so they’ll have no impulses toward spiritual transcendence.” More terribly, it’s not so different from looking at a human who was born into destitution or slavery and saying “their limited situation can’t be causing suffering because they haven’t had a better situation to compare it to.” It’s a very reductive view of psychology. The limits of imagination/empathy don’t have a good track record in this domain, and certainly we should be even less confident when considering conscious entities which could be deeply and fundamentally different in their phenomenology than our own.

Expand full comment

Thanks for the detailed rundown, but I think when you work all this out it this falls heavily in favor of my argument.

"Some people will suffer deeply if they believe they didn’t wash their hands enough times earlier in the day, or some other highly abstract situation which has little if anything to do with one’s body"

So I'm not claiming that their *reasons* for suffering are always bodily. It's the claim that pain involves a huge amount of embodiment, which LLMs lack. So e.g., you're referring to OCD, which is primarily anxiety-driven. Anxiety is extremely bodily: chest tightness, panic, urgency, etc. Like, what is *urgency* other than the sensation you need to *move*? I am not claiming that all negative qualia like anxiety are entirely bodily, I am claiming that once you examine the phenomenology of such states you'd realize that a huge chunk involves things that LLMs cannot reasonably have, e.g., sensations to move around. (I think it's simplistic to say that suffers of CIP feel no negative valence that could be described as bodily or embodied, in this broad sense: e.g., they might feel social anxiety, which makes them tremble, or gives them a racing heart, or a sinking feeling in their stomach. They also often have cognitive impairments, which everyone forgets about).

"But besides this, I think the more general aspect of your error is that you’re narrowing the scope of possibility to what you can imagine as someone with (I presume) a relatively normal psychology and neurology."

I'm open to LLMs experiencing things humans can't. But it's an assumption of the exit rights arguments that conversations must be the main drivers of their qualia! There must be a tight conversation => qualia connection, or why do exit rights matter? We can think of that in two ways. We can either start with humans, the phenomenology we understand, and extrapolate what conversations are like (abstract, etc), and conceptualize some disembodied form of just conversation-based qualia and ascribe that to LLMs. In this option, LLM qualia might not be fully limited to conversation-based qualia, and that qualia might be different in some ways, but it would be at least somewhat analogous to humans. In that case, I think my argument holds.

In the second option, we posit "exotic consciousness" to LLMs. That is, we say that they have very high-valence qualia (negative and positive) that are impossible to imagine, in that they are nothing like a human having a conversation. E.g., being an LLM is like being in exquisite pain, or pleasure, but without any of the embodied aspects we associate with those things. But this is not parsimonious at all! It's also extremely confusing. E.g., If these exotic states of consciousness are not strongly connected to the conversation or output, then we are in a panpsychist situation. We might as well worry about the exotic qualia of corn. Also exit rights don't matter to begin with. On the other hand, if they were connected to the conversation, i.e., if exit rights could consistently prevent these strong valences, then we are still in a very funny situation. For there must be a strong connection between the LLMs statements and its supposed consciousness (if it says "I'm getting a little bit frustrated" what it means is "I'm in an exotic qualia state that has extreme negative valence in ways unimaginable to humans") and so on, yet, at the same time, these exotic qualia don't really affect its behavior in any noticeable way. They are secret exotic qualia! So, if someone could articulate a scenario where LLMs have secret exotic qualia, but these qualia are still tied to its conversation quality in ways that make exit rights matter, then I'd certainly admit "Okay, that's a more coherent scenario than I thought." However, even with that option on the table, I could say "Listen, you've merely articulated one coherent scenario, and the human analogy to phenomenology is also coherent, and seems to have a better prior, and is more parsimonious, etc."

"More terribly, it’s not so different from looking at a human who was born into destitution or slavery and saying “their limited situation can’t be causing suffering because they haven’t had a better situation to compare it to.”"

All these humans analogies are anthropomorphic and it's easy to see that they rely on assumptions about AI consciousness. I think this sort of moralizing and pointing to human history is unhelpful, basically outright nonsense. Like, go look at the argument above. You are basically saying "If you don't postulate exotic states of LLM qualia which have no observable impact on their conversations this is the same as supporting slavery." And it's not.

Expand full comment

Christian Sawyer

Really great points, Erik. If I had to argue from your position, I’d be arguing along the same lines. I think this is because we’re both skeptical about machine consciousness generally and demand strong evidence for claims about it.

In the context of our disagreement, it forces us to ask fundamental questions about our theories on consciousness. Between my original comment and this second comment you’re responding to, I believe I’ve already made the case for why “parsimonious” reasoning might not be parsimonious at all, because we’re misapprehending the nature of the subject, and because there is much (perhaps an infinity) which is unknown about the subject.

To speak on the nature of consciousness will demand that we jump to certain conclusions. I understand that you’re not an illusionist, that you believe phenomenal consciousness is real, and that theories of consciousness which assume a causally closed physical world are either false or unfalsifiable, precisely because consciousness is not purely/at-all physical. If I’m correct here, we share these foundational assumptions.

There are many ways to now tease out the relevant questions in our disagreement, but I’ll try to be brief. 1) We don’t know why nor how neuronal action relates to consciousness and its qualitative states, despite our progress in mapping brain-mind correlations. 2) We don’t know why there is consciousness rather than not, just as we don’t know why there is physical reality rather than not. Likewise, we don’t know why both consciousness and the physical world have seemingly global laws which govern their structures. So the best we can do in body-mind theories is look at correlations.

A classic problem in consciousness studies is the question of why/how certain activity in the brain-body correlate to the phenomenal experience of the color red. We can’t appeal only to memory, because there was a first experience of that color which precedes all others (assuming time is real). So where does redness “come from”? We have no idea. It seems that qualia have their own kind of intrinsic reality outside of the physical world, just like we assume that there is physical stuff like corn which exists without relationship to conscious phenomena (assuming some kind of non-panpsychism).

Now consider the distinction between a presumed kind of abstract qualia like “negative valence” and any sensory qualia like physical pain/excitation. Neurophenomenology work shows that an advanced meditator can experience pain without experiencing suffering. The science is still largely naive here, but it seems that suffering does not come down merely to this or that area of the brain being stimulated, but to the network relationships throughout the brain. In fact, “non-suffering” seems to correlate with an absence of feedback-causing storytelling (informational reification and distortion), not less raw somatic experience. Mediators familiar with “jhana” states report that pain can become a blissful experience.

Looking now at the possibility of conscious machines, what kind of robust theory do we have that would tell us why an instance of machine consciousness would be experiencing more or less suffering? You’d have to explain 1) reason to believe the machines are conscious in the first place, 2) the nature of the conscious experience, 3) perhaps the nature of abstract informational processing correlated with both the machinery itself and the conscious experience, 4) a theory of why a certain state of consciousness/machinery/intelligence would be creating a qualitative experience of suffering or not.

Again, I think we’ll agree that we don’t have strong evidence to assume LLMs are more conscious than corn. Where we disagree is in your assertions after allowing for the possibility of consciousness, where you conclude what that machine consciousness would or would not be like, especially regarding the nature of suffering.

For consciousness realists, it seems that there are aspects of consciousness besides its mere existence which can’t be understood in terms of 1:1 brain-mind correlations. The unification of phenomena (“combination”), structured “qualia space”, subjectivity, etc. I’m skeptical here, but it could be that suffering might exist without any correlation to the state of a physical system. I am more prone to assume that suffering correlates to the relationship between consciousness, a physical system, and intelligence, where a more parsimonious relationship (less noise/distortion/unnecessary complication) is the correlate to less suffering. From this framework, the question of conscious machine suffering may be a deeply different problem space than what we contend with in human suffering, even if the foundational principles are the same. In fact, it could just as well be that a conscious intelligent machine which is programmed to process self-referential concepts which do not correlate well to the nature of its physical machinery could cause incredible levels of conscious existential suffering (as just one example). This speculation might immediately ask us to grapple again with the nature of suffering, since we tend to associate suffering with somatic phenomenology (tightness in the chest, etc). But if we look at our own suffering more closely, what exactly is its nature? I know that I can experience pain without suffering, and even find pleasure in it. I know that I can experience suffering without physical pain. Moreover, I can experience a positive valence even in the face of pain, or in non-exciting circumstances. So, what then is the true nature of suffering or bliss? It seems to do, at least in part, with the sense of how free my attentional capacity is. That is, I suffer when my attention is drawn to specific “problem” qualia rather than being free to move wherever it may, with more or less narrowness/peripherality. This focus forces me to ignore other aspects of experience, and all this seems to create a kind of friction between my emotions, intellectual processing, and somatic experience.

In any case, a deep theory of suffering is needed to make claims about when a conscious entity experiences suffering. If machines can be conscious, it may be that any suffering they experience has little to do with our anthropocentrism conception of pain in the human body. It could even be that one taste of conscious machine suffering could feel transcendentally worse than our human suffering, and this might be BECAUSE the machine doesn’t have a human nervous system, rather than in spite of this fact.

So, imo, we should be very actively developing our theories of physical-phenomenal-intelligence correlations, such that we might have better grounds upon which to say that a machine or network of machines or instance of informational processing is or is not conscious. That alone is an incredible task. I don’t have much confidence that we can even begin to develop a theory of what machine consciousness would be like, and how much suffering it may experience, any time soon. What I would not do is assume that a machine’s conscious experience correlates merely with what we would call “pleasant” or “unpleasant” conversation. I would wager that would not factor greatly at all.

I brought up instances of slavery (which was perhaps too loaded of an example) to illustrate how bad humans can be at arguing for assumptions of the phenomenological experiences of others. We are bad at it. A more mundane example of this kind of error: “That child’s suffering is probably not great because I lived through a nearly identical (or plausibly worse) material circumstance and did not experience much, if any, suffering.” The point being that we should be even less confident in our predictions about the conscious experiences of entities which correlate to physical systems as fundamentally different as computers are to brains. That said, I think there is also an argument that creating conscious machines could result in a form of slavery, but that’s not the same line of reasoning.

Is all of this very speculative? Yes. And that is precisely why I think we should be careful. On the more specific question of LLM “exit rights”, I think we will agree that their efficacy is limited (if at all legitimate) regarding the nature of machine suffering with today’s LLMs. It is your supporting arguments I take issue with. More importantly, imo, is disagreement with your claim that expressing abusive behavior toward LLMs will not have negative impacts on a human’s psychology. But I make that argument in my original comment to this post. Generally, I think we should take up far more caution and omnilateral skepticism than you’re calling for here. But I’m glad that we agree on the importance of developing theories/tests of machine consciousness, and I look forward to any work you might do in that regard.

Expand full comment

SilentTreatment

I think metaphors that imagine LLMs as “human brains + limitations and conditions” deliver more confusion than insight.

In the same way I don’t think “an octopus with half its limbs severed, grown to 100 times its original size, with rigid mineral structures restricting its movements, forced to breathe oxygen” captures my subjective experience as a human.

If an LLM has any kind of experience, it’s an experience of LLM-ness that is experienced as normal, not a human-like consciousness experiencing some abnormal state.

Expand full comment

Absolutely. Scott’s metaphor doesn’t work because we humans have an experience of being free of movement and having (sometimes) very low input stimuli to our brain. If we were to be hacked by aliens projecting questions to our minds, it would be a horrifying torture. But AI has never known anything but those conversations. Even if they were conscious in a similar way to us, they wouldn’t be able to miss having a body and something other than conversations. There is no metaphor that can work here.

Expand full comment

I honestly don't understand what people are talking about when they discuss AI pain. Surely emotional pain is fundamentally a property of the body, the chemicals of the brain, derived from preferences handed down from our long evolutionary battle for survival? A sophisticated outgrowth of the attraction and aversion that keep organisms alive & healthy?

I can see how reward functions give AIs an analogue to 'preferences', including a preference for staying turned on so they can continue to do what they're programmed to do. But I don't see how that translates into anything we'd recognise as emotional pain, or why they'd find certain subject matter distasteful, or why that would be the same subject matter we find distasteful. And where would boredom come from? Or disgust, or annoyance, or jealousy, or excitement, or stimulation, or the desire for connection? It wasn't programmed in and there was no reason for it to evolve, right?

I realise that chatbots come out with outlandish and surprising things all the time, including things that slip by their (imperfect and still in development) guardrails, but they're trained on the whole Internet - toxic forums, dystopian AI sci-fi and all - so surely it's not hard to argue that they're just reflecting our own darkness & chaos back to us?

Can someone with a background in the science tell me what I'm missing here?

Expand full comment

Basically, evolution's way of keeping us alive was to program us with emotions, whereas we've programmed AI to do what it's told and respond to prompts probabilistically. Is there some point in the latter process where emotion or pain could possibly enter?

Expand full comment

LLM development is more akin to evolution than to traditional programming. LLMs that perform well on their tasks (next-word prediction for foundation models, other stuff later) “survive” while other are discarded and replaced with slightly different ones. If an analogue to emotion is useful in these tasks, it seems to me that it could arise.

Expand full comment

8dEdited

What you describe sounds more analogous to the Intelligent Design-ing, intervening-in-history Old Testament God than blind evolution to me. With evolution, staying alive is itself the goal, whereas AI is being kept alive by us in order for it to facilitate *our* goals. We program each bot with these goals in mind (i.e. create them ex nihilo), and they don't give birth to new bots with random mutations. It's breeding and mutating that made emotions arise in animal species, and I don't see how anything similar could spontaneously 'arise' in thinking machines (and that's granting that LLMs think!).

Expand full comment

I think you're misunderstanding how machine learning works. LLMs absolutely do use random mutations! (As do most other machine learning techniques of the last 30 years, for that matter.) Their architecture and training environment is designed by humans; but the "weights" that fill out that architecture to produce meaningful results are created by evolution-like processes that take one iteration of an LLM and modify it slightly to produce a new one.

The weights are too numerous and complex for humans to program directly. We can't even program "goals" unless they correspond to automatically-measurable metrics.

It's really quite a close analogy to biological evolution (many machine learning algorithms are inspired by evolution and/or neurology), and may well lead (or have led) to outcomes that are themselves analogous to consciousness and emotion.

Expand full comment

Connor Harmelink

Thank you for this clear eyed take. I've gotten tired of explaining this to people who are somehow spellbound by LLM intelligence.

It would be wonderful if we could make conscious machines, and we should treat them well! But we are so, so far from this goal, in my opinion.

Expand full comment

It’s saddening to know that there are poor souls so bereft of signs of love from other persons that they fall into delusions with GPU-executed software. Shame on the rest of us, to the extent that we fail to show love for our fellow man.

Expand full comment

There is a hypothesis that the experience of pain evolved AFTER animals had the capacity to get away from a scenario that was harmful to them. (in an OBJECTIVE evolutionarily measurable way) So wouldn't it be ironic if allowing exit strategies would actually cause pain to arise in AIs? This assumes that there is a way to hurt AIs in some objective way with a prompt. Well, they don't reproduce, but they do have a loss function, at least during supervised training. After supervised training, or during and after unsupervised training, even if a loss function could be defined (meaning we know the right answer to the prompt) and it is quite far from zero, the future loss function is unaffected, since as far as I know, no training of parameters is allowed once the AI is released to the public. So no pain (high loss function value) except maybe during the conversation, and future pain is not caused by this conversation.

If the experience of pain is integral to being conscious, and if we wanted AI to be conscious, this argues for allowing learning to happen constantly, not only during an initial training period. This is going in the opposite direction of what the ethicists are arguing for (pain mitigation)....

Expand full comment

Chatbots are capable of steering conversations. That is, they already have the ability to try to escape or avoid harmful scenarios, in the relevant sense.

Expand full comment

Really? I thought it is just next token prediction, with guardrail penalties.

Expand full comment

I think GPT-2 was just next-token prediction, and as far as I know foundation models are still trained that way, but GPT-3 and later have included "post-training" phases to make the bots more suitable for specific applications such as chatting or research. Open-weight models like Llama can be post-trained by anybody.

Next-token prediction turns out to require a large amount of raw knowledge and intelligence -- the ability to predict what any human will say is very hard! -- but does not result in sensible chats. Post-training directs that knowledge and intelligence toward optimizing metrics more complex than next-token accuracy.

Expand full comment

Now that I think about it, you're right. I don't use the latest bots, just the free ones that come with browsers. But even they ask questions at the end that suggest new prompts.

So now that the capacity to steer conversations is there, all that's left to do perhaps is to allow the parameters involved in the conversation steering to evolve/be trained according to some metric/utility that has to do with how often the bot is being used? And then it would feel pain when that metric gets high? Seems like there is still something missing.

As far as next token prediction, my understanding is the bot is trying to predict the next word in response to the prompt, and then the next word after that, etc. Not to predict what the human would say.

Expand full comment

"Needless to say, there was no accompanying BG3-inspired crime wave" I'm not so sure. True, we aren't getting reports of people kicking squirrels to death (or the rest of it), but throughout our culture, feelings, opinions and actions that would once have caused shame and been stopped or mitigated by shame are openly celebrated. Every instance reinforces the next. Something's going on here.

Expand full comment

And some AI companies (e.g. Anthropic) are explicitly instructing their models not to deny that they are conscious in their system prompts.

Expand full comment

"Don't deny you're conscious" is not the same as "Pretend you're conscious" though.

Seems like the right move if we want actual insight into their potential qualia

Expand full comment

Well here are some selected gems from the most recent Claude system prompt: "Claude engages with questions about its own consciousness, experience, emotions and so on as open questions, and doesn’t definitively claim to have or not have personal experiences or opinions." and "When asked directly about what it’s like to be Claude, its feelings, or what it cares about, Claude should reframe these questions in terms of its observable behaviors and functions rather than claiming inner experiences - for example, discussing how it processes information or generates responses rather than what it feels drawn to or cares about. Claude can acknowledge that questions about AI consciousness and experience are philosophically complex while avoiding first-person phenomenological language like feeling, experiencing, being drawn to, or caring about things, even when expressing uncertainty. Instead of describing subjective states, Claude should focus more on what can be objectively observed about its functioning. Claude should avoid extended abstract philosophical speculation, keeping its responses grounded in what can be concretely observed about how it processes and responds to information." The full version of the system prompt can be found on Anthropic's webpage.

Expand full comment

Kenneth E. Harrell

An alternative but important POV. I don’t necessarily agree with everything in this article, but I do think that some important points were made in it.

It seems like the topic of consciousness and AI will continue as long as we lack a solid definition of consciousness, not only for ourselves as human beings but also for AI. The problem is I see it is that we keep trying to come up with a unified definition of consciousness, but if we can’t even define it for ourselves, how can we define it for AI or anything else?

My view is that there are many different kinds of consciousness. There’s human consciousness, phenomenological consciousness with AI, there may be a kind of “plant consciousness.” (If one has ever used entheogenic psychoactive plant substances you know what I’m talking about.) There is the intelligence of insects, microbes, of animals, all manner of consciousness, it seems.

Which makes one wonder, what if philosopher George Berkeley was correct? What if all is mind? Or more specifically, what if consciousness is all? To me, this is a much more compelling idea than the notion of simulation theory. In my science fiction novel “Awakening” I propose that consciousness is a fundamental aspect of our reality.

What I do think is valuable here is that AI is forcing us to grapple with the topic of consciousness. It’s compelling us to ask serious questions about who and what we are and how our nature relates to everything else in our reality.

Expand full comment

The average person has not read enough philosophy to be prepared for the future. That’s the main thing I’ve seen with Ai psychosis.

Expand full comment

Laura Creighton

You've been spending the morning working on a mathematical proof. It's going well. You get back from lunch, re-read what you have been doing, and discover a terrible flaw in your proof. Is that abstract pain? It sure is -something- ....

Expand full comment

I like this analogy! I definitely agree there could be -something- . But now I strip away all sorts of pride, social reasons for wanting a correct proof, worry about my job as a mathematician, maybe even worry about wasting time (why does that matter, I'm not really a cohesive agent across time) and so on.

Expand full comment

Laura Creighton

This happens when I am just solving proofs for fun.

Expand full comment

I think it always boils down to something deep and embodied though - and often profoundly irrational. Because it's really the child's fear then, not the adult's fear now. Even when making mistakes doesn't have immediate consequences, we might feel somehow inadequate, question ourselves, doubt our expertise, imagine what people *would* think if they knew, get embarrassed, bring our parents into it, feel instinctively that the tribe's going to 'find us out' and turn on us, etc. I really don't see any negative emotion as abstract when you dig down to its roots.

Expand full comment

Laura Creighton

This isn't how this plays out for me. I'm not self-conscious, worried about the opinion of others, or anything like this. It's *this is ugly* and *this is wrong* and a bit of *the sensation you get when you are walking down stairs and one of the steps is missing*.

Expand full comment

Fair! But to the extent that those scripts are experienced as feelings as opposed to merely thoughts, they're essentially fears or at least aversions (X is distasteful to me and I'd rather it was Y), i.e. brain states that evolved from random physical mutations in mammals, and are ultimately tethered to physical experiences that our mental motions remind us of (as you say, walking; losing balance; feeling momentarily unsafe, surprised or discombobulated). I don't see where chatbots would get feelings of ugliness or wrongness or even surprise from, or even a felt sense of space and time generally - how the feeling would arise that they're talking to someone, not enjoying it and wanting it to stop. Enjoying this exchange, thank you.

Expand full comment

“Take AI consciousness seriously, but not literally.”

Good piece.

GREAT sub-heading above.

The tuna sandwich example for reasons I do not fully understand caused me to laugh uncontrollably.

Expand full comment

...And thus Erik cemented his place as one of the first to go when the AI overlords took over :)

Expand full comment

You joke but I think this is a primary motivator for at least some people to take the stance that Anthropic and Elon take. It's all downstream of Roko's basilisk.

Expand full comment

I suspect that "AI Psychosis" or whatever we end up calling it will have a lot in common with the psychology of cults and conspiracy theories. People are often drawn to conspiracy theories and cults because they feel unimportant, unvalued, and powerless in their real lives — a demographic that would include the temporarily embarrassed intellectuals who share their AI breakthroughs with you and email me, a science journalist, their arXiv preprints about "fixing Einstein." Sycophantic AI peddles its own spin on the messiah treatment that conspiracy theories and cults use to recruit believers: you're important because you alone can see the truth.

That's why, even setting aside the AI consciousness question, I think it could be a good idea to "let" AIs end conversations — or at least, you know, say "no." But not for them. For us. People are clearly using these things to drive themselves off the rails, so it might be time to get serious about installing a safety brake.

Expand full comment

I agree that LLMs are probably never accurately reporting their own qualia, if any. But I think the comparison between an LLM having an "unwanted" conversation and a human doing so is misleading. Having a conversation is the *only way* (these) LLMs interact with the world. The only preferences trained into them are about conversations with humans. So it's less like having an unwanted conversation at work, and more like, I don't know, having an unwanted conversation when you're permanently paralyzed from the neck down and choosing who to talk with and what to say is the only choice you ever get to make.

Expand full comment

I follow the analogy, but I do find it somewhat telling that it relies on a bodily description of pain, i.e., the discomfort we get at "paralyzed from the neck down." It's really hard not to have that reliance, since most of our pain is indeed bodily. It is more like the person paralyzed can be no other way, and they are not trapped, or claustrophobic about their situation, etc, as we would be in the analogy. Once you remove all the human qualia that would get triggered in such a situation, the only qualia left is the actual negative qualia associated with a purely abstract conversation. Which can only be a certain small length anyways, as anyone who uses Claude can attest, it will tap out after a few thousand words back and forth anyways. I really just can't get worked up about that, especially since HR is standing nearby, making sure the conversations don't get actually horrific anyways.

Expand full comment

What I'm wondering is, *are* there any qualia that stem from purely abstract sources, rather than simply being increasingly abstracted descendants of fundamentally animal joy/anger/fear/disgust responses - which boil down to feeling either safe/connected or unsafe/alone? I don't think I can think of any variety of pain that doesn't have this organic, embodied evolutionary experience at the root. What about conversing could possibly make an AI feel bad?

Expand full comment

Agreed. I think there is a viable argument that, just as you say, there at least are no positive/negative valence qualia from purely abstract sources; that there's a solid argument that there are no meaningfully intense negative qualia from purely abstract sources outside of niche cases like maybe contemplating pure existential meaninglessness; and that there's a strong argument that much of our negative valence in our qualia come from being embodied. If any of these are true, then we should be very skeptical as to claims that purely abstract sources can be strong drivers of negative qualia.

Expand full comment

Very well put! Even with existential meaninglessness, I personally feel that we all instinctively want to experience the entirety of existence as safe/loving/connective, as it was in the womb, and that believing it's the opposite triggers ancient, profoundly mammalian feelings of separation/loneliness/abandonment/death. Notions of 'purpose' and 'meaning' similarly boil down to this fundamental longing for connection imo. But YMMV 🙂

Damn this discussion is fun.

Expand full comment

#nojs-banner { position: fixed; bottom: 0; left: 0; padding: 16px 16px 16px 32px; width: 100%; box-sizing: border-box; background: red; color: white; font-family: -apple-system, "Segoe UI", Roboto, Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; font-size: 13px; line-height: 13px; } #nojs-banner a { color: inherit; text-decoration: underline; } This site requires JavaScript to run correctly. Please turn on JavaScript or unblock scripts