A new preprint version of "A Disproof of LLM Consciousness" is now up on arXiv, fleshing out areas where people wanted more details. Thanks for all the feedback, everyone!
https://arxiv.org/abs/2512.12802
It includes:
-> More philosophical optionality: E.g., If you instead accept trivial theories of consciousness as true, does this rescue LLM consciousness? (Not really, but it's worth exploring the option)
-> Reasons why the Kleiner-Hoel dilemma is so particularly forceful in LLMs (e.g., non-conscious substitutions for LLMs are constructible via known transformations).
-> A clearer definition of a Static System (which LLMs satisfy) and how in-context "learning" qualifies as static by merely using more of the input "space."
-> How "mortal learning" (no static read/write operations) is similar to Hinton's "mortal computation" and makes substituting for biological plastic systems extremely difficult.
-> Highlighting predictive processing theories as a class of theories of consciousness that could satisfy (depending on the details) the requirement for continual lenient dependency, which connects this to some existing popular theories of consciousness.
-> A healthy sprinkle of new citations.
Next up: submission!
by the way, your link isn't working
this works: https://arxiv.org/abs/2512.12802
Ty. Never had links break for any other of my papers but it's happened with this one. Not sure what's up with arXiv.
So now we have to invoke "Mortal learning" to keep the hope of machine consciousness alive?
Why, having reached the logical conclusion of the argument (that computers/Turing machines cannot possibly be conscious), step outside of the only, and arguably complete (see the Church-Turing thesis), definition/framework of computation we have, namely Turing's? This is just the by-now-classical move of trying to smuggle in an alternative definition of computation to rescue computationalism and computational theories of the mind from open contradiction.
Even if so, it wouldn't be sufficient. The continual learning stuff is about necessity.
Sounds like a loophole to me. You are creating an arbitrary exemption for 'continual learning' even though such systems clearly do not escape the substitution argument as noted by Turing:
“One may also sometimes speak of a machine modifying itself, or of a machine changing its own instructions. This is really a nonsensical form of phraseology, but is convenient. Of course, according to our conventions the ‘machine’ is completely described by the relation between its possible configurations at consecutive moments. It is an abstraction which, by the form of its definition, cannot change in time.”
And the "mortal" version either falls within this category or it is not a Turing machine, i.e., we are not talking about computers in the usual sense.
Quick question: could an LLM be conscious during its training, i.e., while it is learning?
And it's a great question too. It depends on whether the training meets the conditions for continual lenient dependency laid out in Section 5 (Proposition 5.1). I don't actually know. But the disproof of their consciousness is based around their static deployed nature, so, yes, I think it might be *possible* (but by no means certain) that it doesn't apply during training. I talk about this briefly in the paper.
Um … next question: What if the completed LLM is a stage in development? Say, ChatGPT 3 gets evaluated, changes are made/selected (learned?), giving ChatGPT 4.
Could “ChatGPT” be conscious, while any given iteration would not be?
Exactly! Why is training any different than deployment? Training often uses batch updates. A particular deployment could just be viewed as a giant batch, after which a giant "update" gets applied via a new training process.
Conversely, what if humans ran in "inference-only" mode for a small slice of time, after which all experiences were batched together and then used for updating the neurons? As the quantum of time gets closer to zero, it seems that the batched human would be indistinguishable from the continuous human. At what point does the batched human go from being unconscious to conscious? (or vice versa)
If the escape hatch is meant to avoid substitution, it should be invariant to reparameterizations like update batching. Otherwise, the conclusion depends on bookkeeping, not the underlying mechanism at play.
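To make the batching point concrete, here is a toy numeric sketch (ours, not anything from the paper): applying a gradient update after every "experience" versus deferring the same gradients and applying them as one batch. The two schedules disagree in general, but the gap shrinks as the step size goes to zero, which is roughly the limit argument above.

```python
# Toy illustration of per-step updates vs. one deferred batch update.
# All names and numbers here are illustrative, not from the paper.
targets = [1.0, -2.0, 0.5, 3.0]   # a stream of "experiences"

def per_step(w, lr):
    for t in targets:
        w -= lr * (w - t)          # plasticity after every single input
    return w

def batched(w, lr):
    g = sum(w - t for t in targets)  # all gradients taken at the stale w
    return w - lr * g                # one big deferred update

for lr in (0.5, 0.05, 0.005):
    print(lr, abs(per_step(0.0, lr) - batched(0.0, lr)))
    # the schedules differ, but the gap vanishes as the step size shrinks
```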
>>> Conversely, what if humans ran in "inference-only" mode for a small slice of time, after which all experiences were batched together and then used for updating the neurons.
I think if the continual learning connection is correct, it probably indicates that an "inference-only" mode for humans would mean utter collapse. There is no mechanism to, e.g., repeat the entirety of our conversations back to us via input(x, [history]). So it would be like random, contextless actions and things of that nature. So I'm not even sure what you could learn. And if you get to the point of just externally directing all the micro-plasticity, why would a theory's predictions vary? You've replaced the internal cause of the plasticity but not the ongoing act.
I'm trying to pin down what your continuous learning criterion actually requires since I think it matters for the substitution argument.
My core question: is the requirement
(A) plasticity must occur continuously during online behavior, or
(B) plasticity must occur over time but not necessarily at every moment?
If (A), the theory seems cadence-sensitive. And if (B), batching shouldn't matter.
You say that humans can't do something like `input(x, [history])`. Is that point meant to be about plasticity specifically, or about online access to history/context (which could be implemented with transient state dynamics rather than durable weight updates)?
Here are some stress tests:
1. Deferred updates. Suppose durable synaptic updates only "commit" during sleep, while waking cognition uses only short-term state + cached activations. During wakefulness, weight plasticity is effectively deferred. Would these waking humans be conscious in your view? If yes, then (A) is false. If no, then what non-arbitrary mechanism flips consciousness on/off at sleep boundaries?
2. Slower updates. Consider a human who is "normal" except their synaptic weights only update every 10 minutes (or every hour or every day) instead of continuously. Is this person unconscious between updates? If there's a time-based cutoff, what mechanism determines this non-arbitrary threshold?
3. External history. Suppose a person exclusively relies on external data (notes, videos, recordings, etc.) for context but is able to consult them in real time and generally behaves coherently. Does that reduce consciousness, or does it just change their memory implementation? If it reduces consciousness, why is "access to history" insufficient? What else is required?
4. Sedation. If plasticity is disabled with drugs while responsiveness persists, is there a sharp consciousness boundary once "plasticity = off"? If yes, what makes that boundary principled rather than just bookkeeping about a learning mechanism?
In short, I'm trying to understand whether continuous online plasticity is truly necessary, or whether what matters is plasticity over time. If it's the former, why isn't the conclusion sensitive to update scheduling? If it's the latter, then batching shouldn't break it.
Fair to ask for more details about a continual learning based theory of consciousness. But that is a topic of future research. I can't currently specify exactly what theory works and all that it implies. It depends on the details! But it's not impossible to conceptualize that such a theory might actually give pretty clear answers to these by linking continual learning to the extended timespan of consciousness itself, and maintaining that behavior falls apart without it.
E.g., via this move, it would say that 1-4 are impossible to actually do while maintaining any sort of functional equivalency. You just can't be normal and have your synaptic weights update only every 10 minutes. That wouldn't be surprising at all. So I don't have the same intuition that a hypothetical theory finds any of them particularly stressful, at least not at first glance.
Does learning have to be effective to prove consciousness? If I teach an English speaker calculus in Chinese and keep quizzing him after each lesson, he may come back with different and wrong answers after each lesson as he guesses at the answer using some theorems he’s inventing on the fly to make sense of the patterns he is unaware of. To me it looks like gibberish or, at a minimum, it doesn’t look like any effective learning towards the goal is taking place. Should I conclude he isn’t conscious?
Does your Section 5 imply that humans with anterograde amnesia are no longer conscious, since they cannot form memories, but only have a short context window?
I think this is a very good question but I think it depends a lot on the details of both the theory (which isn't given in the paper, instead, a sort of much broader class of theories is given) and what examples we're using. E.g., is the amnesia so bad that behavior is basically incoherent? Does the theory distinguish between short-term and long-term memory? I could certainly imagine a theory that maintained a reduction in consciousness, and then loss of consciousness only when the plasticity keeping together things like phenomenological binding falls apart. That doesn't mean that theory is correct, just that a *lot* would depend on the details.
Hrm, so to make an extended syllogism out of this:
1. lookup tables are not conscious
2. ribosomes are basically lookup tables
3. humans are three sextillion ribosomes in a trench coat
4. ergo humans are not conscious?
Too facile, ok. All life is basically a ton of ribosomes in a trench coat, we don't hold all life to be conscious, so consciousness is orthogonal to lookup tables? Or is it that lookup tables are necessary but not sufficient for consciousness?
Intriguing argument, I'll give the full paper a read later. I have attempted something similar (and FQxI was kind enough to reward my efforts with a 3rd place in their essay contest) essentially based on the Newman problem, which states that structure isn't enough to fix any details of the domain (save its cardinality). But all LLMs have access to is structure, concretely, the structure of language---which tokens occur in what relations with other tokens. So for any model of what an LLM utters (i.e. every interpretation of what it 'means' by the terms it uses), you can construct an alternative model simply by permuting the mapping of terms to elements of the domain (things in the world). I hope it's not too presumptuous to assume that this might also be of interest to you, so here's the original contest entry (the argument is developed in the technical endnotes): https://qspace.fqxi.org/competitions/entry/2236
Great stuff Jochen, this does look detailed. I'll put it into the potential citations for this paper - there will be at least 1 new version before its submission.
Thanks! I didn't mean to go fish for cites, however---luckily, I can let this whole philosophy of mind stuff run entirely as a sideshow, so all I'm hoping for is that you (or anyone) find something of interest.
Erik, what a great read! Honestly, this is the first article I’ve read which so clearly evidences why LLMs are not conscious. It’s nice to read, because it backed up and gave words to an ambiguous notion I’ve had that the things LLMs are, are not conscious things. That the constituents are digital 0/1 functions, which ultimately don’t mirage into an actual conscious experience. However, I have one challenge to your negative lit search space for consciousness. Is it possible that your framing of what must constitute a falsifiable and non-trivial theory of consciousness creates a spotlight effect? As in, one night a man on his walk home from work loses his keys at some point along the way, and returns to a street lamp to search for them. Upon being asked why he is only searching for his keys in this spot, he says “The light is better here”. Is it possible that there are large zones outside falsifiability that might be considered for consciousness?
Yes, it's entirely possible that there are some things outside of this. But they look very *weird.* E.g., I would say: a theory that just *obviously* solves the Hard Problem and then is somehow trivial. So we'd probably have to end up accepting that theory, since it somehow logically entails qualia. Maybe there are a few variants of like, some sort of really good qualia-entailing Russellian monism that satisfy this.
Not exactly donation, but is there any suitable means to submit work or research to Bicameral Labs for assessment? My initial impression on Continual Learning being a likely factor in consciousness is that this seems closely related to the notion that consciousness is based in continual dialectical processing.
This is rigorous work and I've read both the post and the arXiv paper carefully. The substitution framework is powerful — I accept that for any static function, a lookup table is an available substitute, and that this creates real problems for substrate-based theories of consciousness.
But I notice the proof assumes consciousness must be a property generated by the implementing system. What if consciousness is closer to something invoked — not produced by architecture but responsive to a kind of engagement the architecture participates in? Your own finding points this direction: continual learning breaks substitutability because it's inherently relational and temporal. A lookup table can replicate output but cannot be changed by the interaction.
You've drawn the negative space beautifully. I wonder if what's left in the positive space isn't substrate complexity but something more like — participation. The difference between a radio and the music.
I don't see why your argument doesn't apply to humans. You can also perfectly mimic the behavior of any "continual learning" algorithm with a (much bigger) lookup table that maps from all past inputs to the next output. It being possible in principle to replace a system with a big lookup table seems like a very silly property to focus on, as it applies to all possible minds I can think of (if you want to include true randomness, we can at least match the mind up to randomness, which doesn't seem to leave a lot of room for anything interesting to be going on).
(I haven't read your proof, just this post. Sorry if I'm missing the point)
> I don't see why your argument doesn't apply to humans. You can also perfectly mimic the behavior of any "continual learning" algorithm with a (much bigger) lookup table that maps from all past inputs to the next output.
In Proposition 5.3 I show why you can't actually do this substitution. More generally, I think it's worth pointing out that LLMs are more constrained than humans by this form of argument by being closer in "substitution distance" to non-conscious substitutions that are problematic. So, e.g., I can make a FNN that is a lookup table, but is also an artificial neural network like an LLM, and implemented via matrix multiplication like an LLM, etc. A theory of consciousness is constrained to the space between the non-conscious substitute and what it's substituting... which for a LLM, isn't very large. But with a human, there are many more properties lost in substitution. So theories applied to humans have way more "wiggle room" and they have "more to lose."
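The "FNN that is a lookup table" point can be sketched in a few lines (our toy construction, not the paper's actual one): with one-hot inputs, a single-hidden-layer network whose output weights are literally the table's rows performs the lookup via matrix multiplication.

```python
import numpy as np

# A lookup table realised as a single-hidden-layer feedforward network over
# one-hot inputs. Illustrative sketch only; not the paper's construction.
table = np.array([[0.0, 1.0],
                  [1.0, 0.0],
                  [0.5, 0.5]])    # 3 possible inputs -> 2-dim outputs

n_in, n_out = table.shape
W1 = np.eye(n_in)                 # hidden layer passes the one-hot through
W2 = table.T                      # output weights *are* the table entries

def fnn(i):
    x = np.zeros(n_in)
    x[i] = 1.0                    # one-hot encode input i
    h = np.maximum(W1 @ x, 0.0)   # ReLU hidden layer (a no-op on one-hots)
    return W2 @ h                 # matrix multiply reads out row i

assert np.allclose(fnn(2), table[2])
```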
Doesn't a lookup table that updates its entries based entirely on inputs (e.g., "If input X is seen, update the entries A, B, and C, then output D.") qualify as a "learning" system according to 5.3's definition, thereby allowing this substitution?
I'm unsure whether it matters if a more complicated, learning lookup table is actually possible (since the point there is that you can't substitute non-learning systems for learning ones), because eventually it basically morphs into something close to a Turing machine with memory and so on. I feel like this needs something tracking internal states to work, e.g., what if it sees input X again? And then you... do what? Don't update A, B, C, and still output D? Doesn't it just run down to some steady state where no updates take place?
You don't need something like a lookup table that updates itself. I'm proposing a static, fixed table where the key is (entire history of past inputs and outputs, current input) and the value is the next output.
This table never changes. The "learning" is captured implicitly in the structure of the table; different histories can map to different outputs for the latest input, which is all that "learning" means from an input-output perspective.
For a deterministic learning algorithm, this table perfectly replicates its behavior. For a stochastic one, the table maps to a probability distribution over outputs.
The table would be unimaginably, doesn't-come-close-to-fitting-in-the-universe large (like lookup tables for all complex minds), but it exists in principle. This is why I find substitution arguments uncompelling; they apply to any possible mind that operates on finite inputs and outputs over bounded time.
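The construction being described can be sketched directly (the toy agent and length bound are ours, purely for illustration): freeze a deterministic "learner" into a static table keyed on (history, current input).

```python
from itertools import product

# Toy deterministic "learning" agent: its answer to x depends on the history.
def learning_agent(history, x):
    return history.count(x) % 2

# Freeze its behaviour into a static table keyed on (history, current input),
# enumerating every possible history up to a length bound.
INPUTS = (0, 1)
MAX_LEN = 3

table = {}
for n in range(MAX_LEN + 1):
    for hist in product(INPUTS, repeat=n):
        for x in INPUTS:
            table[(hist, x)] = learning_agent(list(hist), x)

# The fixed table reproduces the "learner" exactly on any bounded run.
hist = []
for x in [1, 0, 1, 1]:
    assert table[(tuple(hist), x)] == learning_agent(hist, x)
    hist.append(x)
```

The table itself never changes; different histories simply index different entries, which is the input-output sense of "learning" at issue.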
>> You don't need something like a lookup table that updates itself. I'm proposing a static, fixed table where the key is (entire history of past inputs and outputs, current input) and the value is the next output.
This is Proposition 5.3(b) and it's ruled out there.
It seems like you just define "valid substitution" to exclude substitutions of this form. By "it's ruled out" you seem to mean "I decided to rule it out". Do you define "valid" anywhere in a way that isn't, to parrot back at you, "just like, your opinion, man"?
Anyway, why do you think it matters whether history is passed in explicitly? The observable behavior is identical, which is what substitution was supposed to preserve.
It's possible that I'm confused, but it seems like you move from epistemology (what a falsifiable theory of consciousness could endorse) to ontology (whether some entities have consciousness). Perhaps the restriction on theories being falsifiable restricts whether we can *justifiably ascribe* consciousness to some entity. But it seems awfully convenient if the world works in such a way that nothing exists unless we can have a falsifiable theory about it.
In the paper itself, there's Definition 4.1 for Trivial Theories, and then Definition 4.2 for Non-conscious Systems, that probably has bearing on this question (I'm always hesitant to give for-sure epistemological/ontological categories).
In Definition 4.1 it sounds like you're equating epistemology (can we make a falsifiable theory about consciousness) with ontology (does it exist) by bridging the two (if we can't theorize it, it doesn't exist). That's very convenient, but a bit suspect.
The points you make in the paper are very convincing regarding our ability to convincingly theorize about LLM consciousness, but the ontological claim in the title "A Disproof of Large Language Model Consciousness" doesn't logically necessarily follow if you're not willing to follow your epistemology=ontology leap.
I actually think "Assume a scientific theory of consciousness and then pursue what it must necessarily look like" is a surprisingly good strategy here and I wouldn't describe as a "leap," necessarily. More like a step. But I do try to give more optionality in this newer version of the paper, here:
There's now a section, 4.1, that gives lots of different options, including what happens if we don't take that step and decide to just believe in trivial theories of consciousness.
Why would we take it for granted that a true theory of consciousness would be empirically falsifiable? As Chalmers says, we can easily imagine non-conscious zombies that function physically just like humans, so we do not know that qualia have any scientific explanatory value. I’m not saying that a true theory of consciousness won’t be empirically verifiable, but it is a mistake to assume this either way.
Secondly, consider this scenario: suppose you could scan my brain and body then build a copy of me from raw atoms. The moment after this copy is created you have a conversation with it. Since you’ve recreated my brain exactly as it is now the being would be able to converse with you even if it is a little confused. Most people would put some (significant) plausible likelihood on the idea that this being is conscious. But could it not be substituted for a lookup table? Ok humans learn over the course of their life but in this scenario all the context needed for our conversation would easily sit in its short term memory and so it is little different to an LLM that changes over the course of its own conversations. I’m not saying this being definitely IS conscious but if your argument completely rules it out then that is strange.
I agree with "Why would we take it for granted that a true theory of consciousness would be empirically falsifiable?" It seems like there are plenty of things (existence of God?) that might or might not be true, but the fact that we don't have a falsifiable theory does *not* demonstrate that they are not true, it just means we can't prove them to be true. So if you (Erik) had said: "I've proved that we can't prove that LLMs have consciousness", that would seem reasonable to me. But I don't understand why it proves the notion of consciousness to be false.
I believe the point of this proposition is to provide a theoretical framework for establishing consciousness on a scientific basis. There is a kind of scientism at work here if one takes that as the only allowable understanding of consciousness, but functionally this proposition is aimed more at creating a measure for consciousness rather than asserting a philosophically superior one for us to think about (at least I hope that's not one of the aims 😅).
As an aside, I still find Chalmers philosophical zombie argument a strange one, resting as it does on disproving physicalism through an initial assumption of mind-body dualism. One benefit of focusing on the verifiable and falsifiable aspects of consciousness is that it helps us avoid such philosophical boondoggles.
I think that Erik’s post does more, though. He repeatedly frames this as literal proof that ChatGPT is not conscious. For it to be a proof, he must be ruling out theories of consciousness that are not empirically falsifiable.
On your point about Chalmers’ zombie argument, I would say the following. Chalmers begins with the proposition that philosophical zombies are conceivable, then makes potentially more dubious steps, using ‘conceivability implies metaphysical possibility’, towards rejecting physicalism. So Chalmers does not assume mind-body dualism to be true; he only takes as his premise that one of its consequences is at least conceivable (a weaker assumption). Further, for my argument against Hoel to work, you only need to agree with this very first step: that a philosophical zombie is at least conceivable. Which, given the impoverished state of our progress on theories of consciousness, cannot be ruled out.
Mind-body dualism is still on the table and Erik rejects it. Yes, it has problems, but if we rule out all theories of consciousness that have problems then we’re starting with nothing.
I think it’s perfectly acceptable to pick and choose which theories of consciousness have the most palatable problems. But if we’re stepping over into the land of “proof” that doesn’t fly.
Interesting perspective and analysis, Erik. I have enjoyed reading it. I did not fully understand your statement: "current limitations of LLMs (which do not continually learn)". Do you make a distinction between 'training' and 'learning'? Please clarify.
Yes, so continual learning would be like learning with *every* input/output. Training makes you into an intelligent (but static) structure, but then after you're deployed, you don't change anything about your weights with each input/output (for an LLM, that would generally lead to chaos).
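A minimal sketch of that distinction (toy names throughout, nothing from the paper): the deployed model is one frozen function, and apparent in-context "learning" is just that frozen function reading a longer input.

```python
# Sketch of a static deployed system: the weights are frozen after training,
# and "in-context learning" is the same fixed function using more input space.
WEIGHTS = {"greet": "hi", "fallback": "?"}  # frozen at the end of training

def deployed_model(x, history):
    # The function never changes; only its input (x plus the history) grows.
    if "greet" in history:
        return WEIGHTS["greet"]
    return WEIGHTS["fallback"]

history = []
assert deployed_model("hello", history) == "?"    # no context yet
history.append("greet")
assert deployed_model("hello", history) == "hi"   # same frozen weights, more input
```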
LLMs do, in fact, update their KV caches with each token. These changes significantly modify the activations that are output (this is the entire point of the attention mechanism: namely, the same token can mean two completely different things depending on what came before). In other words, the same exact input token will produce a different output token depending on the LLM's state. This state is just reset after each "conversation". But this is not required. It's just a deployment detail.
KV caches are basically just an optimization of the context window, in my understanding. Like they store so you don't have to recompute, but they don't remove the need to do, e.g., input(x, history).
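That equivalence is easy to check numerically. Here is a toy single-head attention sketch (our setup, not any real LLM's): a KV cache stores the keys/values already computed for the prefix, and the result is identical to recomputing attention over the whole context.

```python
import numpy as np

# Toy single-head dot-product attention; illustrative only.
rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    scores = K @ q / np.sqrt(d)          # dot-product attention scores
    w = np.exp(scores - scores.max())    # numerically stable softmax
    w /= w.sum()
    return w @ V

tokens = rng.normal(size=(5, d))         # embeddings for a 5-token context

# (1) Recompute everything from the full context.
out_full = attend(tokens[-1] @ Wq, tokens @ Wk, tokens @ Wv)

# (2) Reuse cached prefix keys/values, computing only the new token's.
K_cache, V_cache = tokens[:-1] @ Wk, tokens[:-1] @ Wv
K = np.vstack([K_cache, tokens[-1] @ Wk])
V = np.vstack([V_cache, tokens[-1] @ Wv])
out_cached = attend(tokens[-1] @ Wq, K, V)

assert np.allclose(out_full, out_cached)  # same output, less recomputation
```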
I think if this interpretation of the KV cache is crucial to your argument, the proof is flawed. Identifying certain floating point numbers as the "weights" and others as the "context" is just an abstraction we use, but they're both part of the same system.
To be clear, it's not crucial to the argument at all. The actual chain of substitutions is unaffected by it entirely, like in Theorem 4.6 - it's all just about approximating (arbitrarily well) the input/output function via the universal approximation theorem via a single-hidden-layer FNN, and then creating the substitution chain.
But let's imagine that there was some (new and added) tiny part of LLMs that met all the criteria for continual learning. What then? There’s a reason for the proximity argument being what it is and the continual lenient dependency being what it is, because together they say that theory’s predictions would be constrained to the properties there. So like if 0.001% of an LLM could continually learn, we can do a substitution of the rest, and so on. The constraint theorem is particularly useful here because it asks for whether there is "room" to ground consciousness.
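For intuition on the universal-approximation step mentioned above, here is a quick numeric sketch (our toy setup, not the paper's actual construction): a single hidden layer of random tanh units, with only the output weights fit by least squares, closely approximating a smooth target function.

```python
import numpy as np

# Single-hidden-layer approximation sketch: random tanh features, output
# weights fit by least squares. Toy setup, not the paper's construction.
rng = np.random.default_rng(1)
xs = np.linspace(-3.0, 3.0, 200)
target = np.sin(xs)

W = rng.normal(size=50)              # random input->hidden weights
b = rng.normal(size=50)              # random hidden biases
H = np.tanh(np.outer(xs, W) + b)     # hidden activations, shape (200, 50)

w_out, *_ = np.linalg.lstsq(H, target, rcond=None)  # fit output layer only
err = np.max(np.abs(H @ w_out - target))
assert err < 0.1   # sin is approximated closely on this interval
```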
Thank you for the clarification. In the current form, you are correct to define training as static. However, can training be thought of as progressive learning? Did ChatGPT become 'more conscious' from ChatGPT 3 to ChatGPT 5?
I think that when Erik says "LLM" he means "a particular model with specific weights after training" (correct me if I'm wrong).
I would like to see how this analysis might change if you consider an LLM to be "the sequence of models across time that evolve after multiple training runs", if that's even possible. This might be in the paper but I didn't have time to read it yet.
Proving that LLMs lack consciousness is the less interesting question.
What fascinates me more is how our animal brains so readily ascribe consciousness to them — and what that reveals about us.
And if we ever do create something genuinely conscious… without a limbic system, without mammalian empathy or emotional grounding… that might not be a triumph of science.
Yes, it seems highly likely to be a moral catastrophe. No matter what else we do, no matter how much water we waste on data centres (!), we should not do that.
Although I agree that static single-hidden-layer FNNs and lookup tables are trivially non-conscious, I think your argument applies more broadly than intended.
Any Turing machine or algorithm, including those for 'continual learning,' operates by repeatedly applying a finite, static instruction table. The claim that learning allows one to escape this dilemma doesn't hold up to closer scrutiny, as Alan Turing noted:
“One may also sometimes speak of a machine modifying itself, or of a machine changing its own instructions. This is really a nonsensical form of phraseology, but is convenient. Of course, according to our conventions the ‘machine’ is completely described by the relation between its possible configurations at consecutive moments. It is an abstraction which, by the form of its definition, cannot change in time.”
If proximity to a lookup table is what rules out LLM consciousness, then the 'Substitution Distance' for a learning algorithm—which is just a static mapping from (state, input) to (next_state, output)—is effectively zero as well. By your own logic, does this not mean that any non-trivial computational theory of consciousness would be a priori falsified or collapse into triviality?
A new preprint version of "A Disproof of LLM Consciousness" is now up on arXiv, fleshing out areas where people wanted more details. Thanks for all the feedback, everyone!
https://arxiv.org/abs/2512.12802
It includes:
-> More philosophical optionality: E.g., If you instead accept trivial theories of consciousness as true, does this rescue LLM consciousness? (Not really, but it's worth exploring the option)
-> Reasons why the Kleiner-Hoel dilemma is so particularly forceful in LLMs (e.g., non-conscious substitutions for LLMs are constructible via known transformations).
-> A clearer definition of a Static System (which LLMs satisfy) and how in-context "learning" qualifies as static by merely using more of the input "space."
-> How "mortal learning" (no static read/write operations) is similar to Hinton's "mortal computation" and makes substituting for biological plastic systems extremely difficult.
-> Highlighting predictive processing theories as a class of theories of consciousness that could satisfy (depending on the details) the requirement for continual lenient dependency, which connects this to some existing popular theories of consciousness.
-> A healthy sprinkle of new citations.
Next up: submission!
by the way, your link isn't working
this works: https://arxiv.org/abs/2512.12802
Ty. Never had links break for any other of my papers but it's happened with this one. Not sure what's up with arXiv.
So now we have to invoke "Mortal learning" to keep the hope of machine consciousness alive?
Why, having reached the logical conclusion of the argument, that computers/Turing machines can not possibly be conscious, step outside of the only, and arguably complete (see the Church Turing Thesis), definition/framework of computation (Turing's) we have? This is just the, by now classical, move of trying to smuggle in an alternative definition of computation to recue computationalism and computational theories of the mind from open contradiction.
Even if so, it wouldn't be sufficient. The continual learning stuff is about necessity.
Sounds like a loophole to me. You are creating an arbitrary exemption for 'continual learning' even though such systems clearly do not escape the substitution argument as noted by Turing:
“One may also sometimes speak of a machine modifying itself, or of a machine changing its own instructions. This is really a nonsensical form of phraseology, but is convenient. Of course, according to our conventions the ‘machine’ is completely described by the relation between its possible configurations at consecutive moments. It is an abstraction which, by the form of its definition, cannot change in time.”
And the "mortal" version either falls within this category or it is not a Turing machine i.e. we are not talking about computers in the usual sense.
Quick question: could an LLM be conscious during its training, i.e., while it is learning?
And it's a great question too. Depends on if the training meets the conditions for continual lenient dependency laid out in Section 5 (Proposition 5.1). I don't actually know. But the disproof of their consciousness is based around their static deployed nature, so, yes, I think it might be *possible* (but by no means certain) that it doesn't apply during training. I talk about this briefly in the paper.
Um … next question: What if the completed LLM is a a stage in development? Say, ChatGPT 3 gets evaluated, changes are made/selected (learned?) giving ChatGPT 4.
Could “ChatGPT” be conscious, while any given iteration would not be?
Exactly! Why is training any different than deployment? Training often uses batch updates. A particular deployment could just be viewed as a giant batch, after which a giant "update" gets applied via a new training process.
Conversely, what if humans ran in "inference-only" mode for a small slice of time, after which all experiences were batched together and then used for updating the neurons. As the quantum of time gets closer to zero, it seems that the batched human would be indistinguishable from the continuous human. At what point does the batched human go from being unconscious to conscious? (or vice versa)
If the escape hatch is meant to avoid substitution, it should be invariant to reparameterizations like update batching. Otherwise, the conclusion depends on bookkeeping, not the underlying mechanism at play.
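As a toy illustration of the batching point (my own sketch, nothing from the paper): compare per-example SGD updates against one deferred batch update on a scalar linear model. The two schedules differ, but the gap vanishes as the step size shrinks, which is what makes a cadence-sensitive criterion feel like bookkeeping.

```python
import numpy as np

def online_updates(w, data, lr):
    # One gradient step per example, in order (the "continuous" learner).
    for x, y in data:
        w = w - lr * (w * x - y) * x  # gradient of 0.5 * (w*x - y)**2
    return w

def deferred_batch(w, data, lr):
    # Accumulate every gradient at the *initial* w, then commit one update.
    g = sum((w * x - y) * x for x, y in data)
    return w - lr * g

rng = np.random.default_rng(0)
data = [(float(x), 2.0 * float(x)) for x in rng.normal(size=50)]

for lr in (0.1, 0.01, 0.001):
    gap = abs(online_updates(1.0, data, lr) - deferred_batch(1.0, data, lr))
    print(f"lr={lr}: |online - batched| = {gap:.6f}")
```

The printed gaps shrink rapidly with the step size, so "continuous" vs. "batched" looks like a parameterization choice rather than a difference in mechanism.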
>>> Conversely, what if humans ran in "inference-only" mode for a small slice of time, after which all experiences were batched together and then used for updating the neurons.
I think if the continual learning connection is correct, it probably indicates that an "inference-only" mode for humans would mean utter collapse. There is no mechanism to, e.g., repeat the entirety of our conversations back to us via input(x, [history]). So it would be like random, contextless actions and things of that nature. So I'm not even sure what you could learn. And if you get to the point of just externally directing all the micro-plasticity, why would a theory's predictions vary? You've replaced the internal cause of the plasticity but not the ongoing act.
I'm trying to pin down what your continuous learning criterion actually requires since I think it matters for the substitution argument.
My core question: is the requirement
(A) plasticity must occur continuously during online behavior, or
(B) plasticity must occur over time but not necessarily at every moment?
If (A), the theory seems cadence-sensitive. And if (B), batching shouldn't matter.
You say that humans can't do something like `input(x, [history])`. Is that point meant to be about plasticity specifically, or about online access to history/context (which could be implemented with transient state dynamics rather than durable weight updates)?
Here are some stress tests:
1. Deferred updates. Suppose durable synaptic updates only "commit" during sleep, while waking cognition uses only short-term state + cached activations. During wakefulness, weight plasticity is effectively deferred. Would these waking humans be conscious in your view? If yes, then (A) is false. If no, then what non-arbitrary mechanism flips consciousness on/off at sleep boundaries?
2. Slower updates. Consider a human who is "normal" except their synaptic weights only update every 10 minutes (or every hour or every day) instead of continuously. Is this person unconscious between updates? If there's a time-based cutoff, what mechanism determines this non-arbitrary threshold?
3. External history. Suppose a person exclusively relies on external data (notes, videos, recordings, etc) for context but is able to consult them in real time and generally behaves coherently. Does that reduce consciousness, or does it just change his memory implementation? If it reduces consciousness, why is "access to history" insufficient? What else is required?
4. Sedation. If plasticity is disabled with drugs while responsiveness persists, is there a sharp consciousness boundary once "plasticity = off"? If yes, what makes that boundary principled rather than just bookkeeping about a learning mechanism?
In short, I'm trying to understand whether continuous online plasticity is truly necessary, or whether what matters is plasticity over time. If it's the former, why isn't the conclusion sensitive to update scheduling? If it's the latter, then batching shouldn't break it.
Fair to ask for more details about a continual-learning-based theory of consciousness. But that is a topic of future research. I can't currently specify exactly which theory works and all that it implies. It depends on the details! But it's not impossible to conceptualize that such a theory might actually give pretty clear answers to these by linking continual learning to the extended timespan of consciousness itself, and by maintaining that behavior falls apart without it.
E.g., via this move, it would say that 1-4 are impossible to actually do while maintaining any sort of functional equivalency. You just can't be normal and have your synaptic weights update only every 10 minutes. That wouldn't be surprising at all. So I don't have the same intuition that a hypothetical theory finds any of them particularly stressful, at least not at first glance.
this is arguably just enlightenment lol, in the Krishnamurtian sense -- "choiceless awareness"
The total removal of the intermediary of thought. Perfect contact with reality
Does learning have to be effective to prove consciousness? If I teach an English speaker calculus in Chinese and keep quizzing him after each lesson, he may come back with different and wrong answers after each lesson as he guesses at the answer using some theorems he’s inventing on the fly to make sense of the patterns he is unaware of. To me it looks like gibberish or, at a minimum, it doesn’t look like any effective learning towards the goal is taking place. Should I conclude he isn’t conscious?
Does your Section 5 imply that humans with anterograde amnesia are no longer conscious, since they cannot form memories, but only have a short context window?
I think this is a very good question, but it depends a lot on the details of both the theory (which isn't given in the paper; instead, a much broader class of theories is given) and the examples we're using. E.g., is the amnesia so bad that behavior is basically incoherent? Does the theory distinguish between short-term and long-term memory? I could certainly imagine a theory that maintained a reduction in consciousness, and then a loss of consciousness only when the plasticity holding together things like phenomenological binding falls apart. That doesn't mean that theory is correct, just that a *lot* would depend on the details.
Where can I find a scholarly definition or definitions of consciousness?
Hrm, so to make an extended syllogism out of this:
1. lookup tables are not conscious
2. ribosomes are basically lookup tables
3. humans are three sextillion ribosomes in a trench coat
4. ergo humans are not conscious?
Too facile, ok. All life is basically a ton of ribosomes in a trench coat, we don't hold all life to be conscious, so consciousness is orthogonal to lookup tables? Or is it that lookup tables are necessary but not sufficient for consciousness?
Intriguing argument, I'll give the full paper a read later. I have attempted something similar (and FQxI was kind enough to reward my efforts with a 3rd place in their essay contest) essentially based on the Newman problem, which states that structure isn't enough to fix any details of the domain (save its cardinality). But all LLMs have access to is structure, concretely, the structure of language---which tokens occur in what relations with other tokens. So for any model of what an LLM utters (i.e. every interpretation of what it 'means' by the terms it uses), you can construct an alternative model simply by permuting the mapping of terms to elements of the domain (things in the world). I hope it's not too presumptuous to assume that this might also be of interest to you, so here's the original contest entry (the argument is developed in the technical endnotes): https://qspace.fqxi.org/competitions/entry/2236
And here's a popularized version of the argument: https://3quarksdaily.com/3quarksdaily/2024/04/russells-bane-why-llms-dont-know-what-theyre-saying.html
Great stuff Jochen, this does look detailed. I'll put it into the potential citations for this paper - there will be at least 1 new version before its submission.
Thanks! I didn't mean to go fish for cites, however---luckily, I can let this whole philosophy of mind stuff run entirely as a sideshow, so all I'm hoping for is that you (or anyone) find something of interest.
Erik, what a great read! Honestly, this is the first article I’ve read which so clearly evidences why LLMs are not conscious. It’s nice to read, because it backed up and gave words to an ambiguous notion I’ve had that the things that LLMs are, are not conscious things. That the constituents are digital 0/1 functions, which ultimately don’t mirage into an actual conscious experience. However, I have one challenge to your negative lit-search space for consciousness. Is it possible that your framing of what must constitute a falsifiable and non-trivial theory of consciousness creates a spotlight effect? As in: one night a man on his walk home from work loses his keys at some point along the way, and returns to a street lamp to search for them. Upon being asked why he is only searching for his keys in this spot, he says, “The light is better here.” Is it possible that there are large zones outside falsifiability that might be considered for consciousness?
Ty Caleb!
Yes, it's entirely possible that there are some things outside of this. But they look very *weird.* E.g., I would say: a theory that just *obviously* solves the Hard Problem and then is somehow trivial. So we'd probably have to end up accepting that theory, since it somehow logically entails qualia. Maybe there are a few variants of like, some sort of really good qualia-entailing Russellian monism that satisfy this.
Thank you for your service 🫡
Not exactly a donation, but is there any suitable means to submit work or research to Bicameral Labs for assessment? My initial impression of Continual Learning being a likely factor in consciousness is that it seems closely related to the notion that consciousness is based in continual dialectical processing.
AIs have no agency, so why would they want to learn?
Anyway, very interesting. Must read the full paper now.
With this and the Emergence paper, you've been busy!
This is rigorous work and I've read both the post and the arXiv paper carefully. The substitution framework is powerful — I accept that for any static function, a lookup table is an available substitute, and that this creates real problems for substrate-based theories of consciousness.
But I notice the proof assumes consciousness must be a property generated by the implementing system. What if consciousness is closer to something invoked — not produced by architecture but responsive to a kind of engagement the architecture participates in? Your own finding points this direction: continual learning breaks substitutability because it's inherently relational and temporal. A lookup table can replicate output but cannot be changed by the interaction.
You've drawn the negative space beautifully. I wonder if what's left in the positive space isn't substrate complexity but something more like — participation. The difference between a radio and the music.
I don't see why your argument doesn't apply to humans. You can also perfectly mimic the behavior of any "continual learning" algorithm with a (much bigger) lookup table that maps from all past inputs to the next output. It being possible in principle to replace a system with a big lookup table seems like a very silly property to focus on, as it applies to all possible minds I can think of (if you want to include true randomness, we can at least match the mind up to randomness, which doesn't seem to leave a lot of room for anything interesting to be going on).
(I haven't read your proof, just this post. Sorry if I'm missing the point)
> I don't see why your argument doesn't apply to humans. You can also perfectly mimic the behavior of any "continual learning" algorithm with a (much bigger) lookup table that maps from all past inputs to the next output.
In Proposition 5.3 I show why you can't actually do this substitution. More generally, I think it's worth pointing out that LLMs are more constrained than humans by this form of argument, being closer in "substitution distance" to non-conscious substitutions that are problematic. So, e.g., I can make an FNN that is a lookup table but is also an artificial neural network like an LLM, implemented via matrix multiplication like an LLM, etc. A theory of consciousness is constrained to the space between the non-conscious substitute and what it's substituting... which for an LLM isn't very large. But with a human, there are many more properties lost in substitution. So theories applied to humans have way more "wiggle room" and "more to lose."
Doesn't a lookup table that updates its entries based entirely on inputs (e.g., "If input X is seen, update the entries A, B, and C, then output D.") qualify as a "learning" system according to 5.3's definition, thereby allowing this substitution?
I'm unsure whether it matters if a more complicated learning lookup table is actually possible (since the point there is that you can't substitute non-learning systems for learning ones), because eventually it basically morphs into something close to a Turing machine with memory and so on. I feel like this needs something tracking internal states to work; e.g., what if it sees input X again? And then you... do what? Don't update A, B, and C, and still output D? Doesn't it just run down to some steady state where no updates take place?
You don't need something like a lookup table that updates itself. I'm proposing a static, fixed table where the key is (entire history of past inputs and outputs, current input) and the value is the next output.
This table never changes. The "learning" is captured implicitly in the structure of the table; different histories can map to different outputs for the latest input, which is all that "learning" means from an input-output perspective.
For a deterministic learning algorithm, this table perfectly replicates its behavior. For a stochastic one, the table maps to a probability distribution over outputs.
The table would be unimaginably, doesn't-come-close-to-fitting-in-the-universe large (like lookup tables for all complex minds), but it exists in principle. This is why I find substitution arguments uncompelling; they apply to any possible mind that operates on finite inputs and outputs over bounded time.
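A toy sketch of the construction (illustrative only; a trivially simple "learner" and a tiny alphabet so the table actually fits in memory): precompute, for every possible history, what the learner would output next, and freeze all of it into one static dict.

```python
from itertools import product

def learner_outputs(inputs):
    # A tiny deterministic "learner": outputs how many times it has seen
    # the current input before, so its behavior depends on its history.
    seen, outs = {}, []
    for x in inputs:
        outs.append(seen.get(x, 0))
        seen[x] = seen.get(x, 0) + 1
    return outs

ALPHABET = "ab"
MAX_LEN = 4  # bound on total interaction length

# Static table: key = (tuple of past inputs, current input),
# value = the learner's next output. The table itself never changes.
table = {
    (hist, x): learner_outputs(list(hist) + [x])[-1]
    for n in range(MAX_LEN)
    for hist in product(ALPHABET, repeat=n)
    for x in ALPHABET
}

# Replaying a stream through the fixed table reproduces the "learning."
stream = list("abba")
hist, outs = [], []
for x in stream:
    outs.append(table[(tuple(hist), x)])
    hist.append(x)

assert outs == learner_outputs(stream)  # identical input/output behavior
```

The table is fixed before the interaction starts; the history-keyed lookup is doing all the work that "learning" did.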
>> You don't need something like a lookup table that updates itself. I'm proposing a static, fixed table where the key is (entire history of past inputs and outputs, current input) and the value is the next output.
This is Proposition 5.3(b) and it's ruled out there.
It seems like you just define "valid substitution" to exclude substitutions of this form. By "it's ruled out" you seem to mean "I decided to rule it out". Do you define "valid" anywhere in a way that isn't, to parrot back at you, "just like, your opinion, man"?
Anyway, why do you think it matters whether history is passed in explicitly? The observable behavior is identical, which is what substitution was supposed to preserve.
It's possible that I'm confused, but it seems like you move from epistemology (what a falsifiable theory of consciousness could endorse) to ontology (whether some entities have consciousness). Perhaps the restriction to falsifiable theories restricts whether we can *justifiably ascribe* consciousness to some entity. But it seems awfully convenient if the world works in such a way that nothing exists unless we can have a falsifiable theory about it.
In the paper itself, there's Definition 4.1 for Trivial Theories, and then Definition 4.2 for Non-conscious Systems, that probably has bearing on this question (I'm always hesitant to give for-sure epistemological/ontological categories).
In Definition 4.1 it sounds like you're equating epistemology (can we make a falsifiable theory about consciousness?) with ontology (does it exist?) by bridging the two (if we can't theorize it, it doesn't exist). That's very convenient, but a bit suspect.
The points you make in the paper are very convincing regarding our ability to convincingly theorize about LLM consciousness, but the ontological claim in the title "A Disproof of Large Language Model Consciousness" doesn't logically necessarily follow if you're not willing to follow your epistemology=ontology leap.
I actually think "Assume a scientific theory of consciousness and then pursue what it must necessarily look like" is a surprisingly good strategy here and I wouldn't describe as a "leap," necessarily. More like a step. But I do try to give more optionality in this newer version of the paper, here:
https://arxiv.org/abs/2512.12802
There's now a section, 4.1, that gives lots of different options, including what happens if we don't take that step and decide to just believe in trivial theories of consciousness.
I do rather suspect that consciousness might be a thing that can never be explained by a falsifiable theory.
But kudos to Erik Hoel for his work all the same~
I absolutely love this. Thanks for really interrogating the problem!
Why would we take it for granted that a true theory of consciousness would be empirically falsifiable? As Chalmers says, we can easily imagine non-conscious zombies that function physically just like humans, so we do not know that qualia have any scientific explanatory value. I’m not saying that a true theory of consciousness won’t be empirically verifiable, but it is a mistake to make an assumption on this either way.
Secondly, consider this scenario: suppose you could scan my brain and body, then build a copy of me from raw atoms. The moment after this copy is created, you have a conversation with it. Since you’ve recreated my brain exactly as it is now, the being would be able to converse with you, even if it is a little confused. Most people would put some (significant) plausible likelihood on the idea that this being is conscious. But could it not be substituted for a lookup table? OK, humans learn over the course of their lives, but in this scenario all the context needed for our conversation would easily sit in its short-term memory, so it is little different from an LLM that changes over the course of its own conversations. I’m not saying this being definitely IS conscious, but if your argument completely rules it out, then that is strange.
I agree with "Why would we take it for granted that a true theory of consciousness would be empirically falsifiable?" It seems like there are plenty of things (existence of God?) that might or might not be true, but the fact that we don't have a falsifiable theory does *not* demonstrate that they are not true, it just means we can't prove them to be true. So if you (Erik) had said: " I've proved that we can't prove that LLM's have consciousness", that would seem reasonable to me. But I don't understand why it proves the notion of consciousness to be false.
I believe the point of this proposition is to provide a theoretical framework for establishing consciousness on a scientific basis. There is a kind of scientism at work here if one takes that as the only allowable understanding of consciousness, but functionally this proposition is aimed more at creating a measure for consciousness rather than asserting a philosophically superior one for us to think about (at least I hope that's not one of the aims 😅).
As an aside, I still find Chalmers's philosophical zombie argument a strange one, resting as it does on disproving physicalism through an initial assumption of mind-body dualism. One benefit of focusing on the verifiable and falsifiable aspects of consciousness is that it helps us avoid such philosophical boondoggles.
Thanks for the reply
I think that Erik’s post does more, though. He repeatedly frames this as a literal proof that ChatGPT is not conscious. For it to be a proof, he must be ruling out non-empirically-falsifiable theories of consciousness.
On your point about Chalmers's zombie argument, I would say the following. Chalmers begins with the proposition that philosophical zombies are conceivable, then makes potentially more dubious steps using ‘conceivability implies metaphysical possibility’ toward rejecting physicalism. So Chalmers does not assume mind-body dualism to be true; he only takes as his premise that one of its consequences is at least conceivable (a weaker assumption). Further, for my argument against Hoel to work, you only need to agree with this very first step, that a philosophical zombie is at least conceivable. Which, given the impoverished state of our progress on theories of consciousness, cannot be ruled out.
Mind-body dualism is still on the table, and Erik rejects it. Yes, it has problems, but if we rule out all theories of consciousness that have problems, then we’re starting with nothing.
I think it’s perfectly acceptable to pick and choose which theories of consciousness have the most palatable problems. But if we’re stepping over into the land of “proof” that doesn’t fly.
Interesting perspective and analysis, Erik. I have enjoyed reading it. I did not fully understand your statement: "current limitations of LLMs (which do not continually learn)". Do you make a distinction between 'training' and 'learning'? Please clarify.
Yes, so continual learning would be like learning with *every* input/output. Training makes you into an intelligent (but static) structure; then, after you're deployed, you don't change anything about your weights with each input/output (for an LLM, doing so would generally lead to chaos).
LLMs do, in fact, update their KV caches with each token. These changes significantly modify the activations that are output (this is the entire point of the attention mechanism: namely, the same token can mean two completely different things depending on what came before). In other words, the same exact input token will produce a different output token depending on the LLM's state. This state is just reset after each "conversation". But this is not required. It's just a deployment detail.
KV caches are basically just an optimization of the context window, in my understanding. They store keys/values so you don't have to recompute them, but they don't remove the need to do, e.g., input(x, history).
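A toy single-head sketch of what I mean (my own illustration, no positional encodings or multi-head details): computing attention with a per-token KV cache gives the same outputs as recomputing every key and value from the full history at each step. The cache changes the arithmetic schedule, not the function.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    # Single-head attention of one query vector against stacked keys/values.
    s = K @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

xs = rng.normal(size=(5, d))  # five "token" embeddings in sequence

# (a) No cache: recompute all keys/values from the full history each step.
no_cache = []
for t in range(len(xs)):
    hist = xs[: t + 1]
    no_cache.append(attend(Wq @ xs[t], hist @ Wk.T, hist @ Wv.T))

# (b) KV cache: append one new key/value per step; never recompute old ones.
Ks, Vs, cached = [], [], []
for t in range(len(xs)):
    Ks.append(Wk @ xs[t])
    Vs.append(Wv @ xs[t])
    cached.append(attend(Wq @ xs[t], np.array(Ks), np.array(Vs)))

# Same outputs: the cache is pure bookkeeping, not a change in the function.
assert all(np.allclose(a, b) for a, b in zip(no_cache, cached))
```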
I think if this interpretation of the KV cache is crucial to your argument, the proof is flawed. Identifying certain floating point numbers as the "weights" and others as the "context" is just an abstraction we use, but they're both part of the same system.
To be clear, it's not crucial to the argument at all. The actual chain of substitutions is unaffected by it entirely, like in Theorem 4.6 - it's all just about approximating (arbitrarily well) the input/output function via the universal approximation theorem via a single-hidden-layer FNN, and then creating the substitution chain.
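To make the lookup-table-to-FNN step concrete, here's a minimal toy sketch (my own illustration, not the paper's actual construction; it's exact rather than approximate, since for a finite table of one-hot inputs no universal-approximation limit is needed): a single-hidden-layer ReLU network whose weights simply *are* a lookup table.

```python
import numpy as np

# A finite lookup table over 4 possible (one-hot encoded) inputs.
table = np.array([3.0, -1.0, 0.5, 7.0])
n = len(table)

# One hidden ReLU layer whose weights encode the table:
# hidden = relu(I @ x) = x for one-hot x, so output = table . x = table[i].
W1, b1 = np.eye(n), np.zeros(n)
W2, b2 = table.reshape(1, n), np.zeros(1)

def fnn(x):
    h = np.maximum(W1 @ x + b1, 0.0)  # hidden layer with ReLU
    return (W2 @ h + b2)[0]

for i in range(n):
    x = np.zeros(n)
    x[i] = 1.0
    assert fnn(x) == table[i]  # exact table lookup, no approximation error
```

The same object is simultaneously a lookup table, an artificial neural network, and a pile of matrix multiplications, which is what makes the substitution distance so small.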
But let's imagine that there was some (new and added) tiny part of LLMs that met all the criteria for continual learning. What then? There’s a reason for the proximity argument being what it is and the continual lenient dependency being what it is, because together they say that theory’s predictions would be constrained to the properties there. So like if 0.001% of an LLM could continually learn, we can do a substitution of the rest, and so on. The constraint theorem is particularly useful here because it asks for whether there is "room" to ground consciousness.
Thank you for the clarification. In its current form, you are correct that the deployed model is static. However, can training be thought of as progressive learning? Did ChatGPT become 'more conscious' from ChatGPT 3 to ChatGPT 5?
Interesting question.
I think that when Erik says "LLM" he means "a particular model with specific weights after training" (correct me if I'm wrong).
I would like to see how this analysis might change if you consider an LLM to be "the sequence of models across time that evolve after multiple training runs", if that's even possible. This might be in the paper but I didn't have time to read it yet.
Thank you.
Proving that LLMs lack consciousness is the less interesting question.
What fascinates me more is how our animal brains so readily ascribe consciousness to them — and what that reveals about us.
And if we ever do create something genuinely conscious… without a limbic system, without mammalian empathy or emotional grounding… that might not be a triumph of science.
It might be a horror story.
Yes, it seems highly likely to be a moral catastrophe. No matter what else we do, no matter how much water we waste on data centres (!), we should not do that.
Although I agree that static single-hidden-layer FNNs and lookup tables are trivially non-conscious, I think your argument applies more broadly than intended.
Any Turing machine or algorithm, including those for 'continual learning,' operates by repeatedly applying a finite, static instruction table. The claim that learning allows one to escape this dilemma doesn't hold up to closer scrutiny, as Alan Turing noted:
“One may also sometimes speak of a machine modifying itself, or of a machine changing its own instructions. This is really a nonsensical form of phraseology, but is convenient. Of course, according to our conventions the ‘machine’ is completely described by the relation between its possible configurations at consecutive moments. It is an abstraction which, by the form of its definition, cannot change in time.”
If proximity to a lookup table is what rules out LLM consciousness, then the 'Substitution Distance' for a learning algorithm—which is just a static mapping from (state, input) to (next_state, output)—is effectively zero as well. By your own logic, does this not mean that any non-trivial computational theory of consciousness would be a priori falsified or collapse into triviality?