79 Comments
VL:

Bravo! Well done, and beautifully explained. But I confess to some minor annoyance at the end that you defer applying this theory to the question of free will. I hope you can rest from your labors for a bit, then gird your loins and come back to help clarify this controversial issue, which has been muddled in just about every possible way, from the microscale to the macroscale....

Erik Hoel:

Thank you. And fair on the annoyance - I just figured that after 5,000 words I'm pushing the limits of readability.

Basically, I think free will is just when your consciousness (or its neural correlates, if we are to be metaphysically neutral) has a strong causal contribution in the hierarchy of scales. Since presumably it must be relatively macroscale, this means that free will requires a "top-heavy" macroscale (or at least, somewhat top-heavy). Predictions would be that, e.g., the brain is much more top-heavy compared to other parts of the body in its operations.

Is this sufficient for free will? No, I think not. But I think it is necessary for it. But, at some limit of necessary conditions I think you get a sufficient condition. So for instance, if a system were: (a) top-heavy in this causally-emergent fashion in a way that corresponded to its neural correlates of consciousness/mind, and (b) had source irreducibility (it cannot be predicted except by its own actions), and (c) had the ability to consider things like counterfactuals... THEN I think this is probably sufficient for free will.

AlexP:

Under these conditions a bunch of things should have free will: corporations, markets, the universe itself (via the principle of least action).

Peter T Hooper:

Yes. And why would you prefer to suggest these do not, if that's your point?

Matt Sigl:

Great article. My issue with free will and this model is that the inability to predict is not enough. It has to be the case that the choice can’t be understood as absolutely necessary even after the fact. If our actions are determined but unpredictable, they are still determined, just surprisingly so.

Still, I see the causal emergence you argue for here as a way to explain the reality of mental causation, whether determined or free. But freedom requires more: it requires that consciousness have the irreducible causal power to contribute to its own transformations in a way that is neither wholly predictable nor retroactively necessary. Things happen because they are determined by choice, or there is no such thing as choice. Nonetheless, understanding how mental causation is possible (as causally irreducible macro-causes) is a necessary (but not sufficient) condition for freedom.

The big question I have for causal emergence is related to IIT: are all causally irreducible macro-causes associated with conscious experience? Are macroscopic causes the actual identity of mereological wholes, necessarily only real when irreducibly integrated, with all integrated causal wholes being conscious? (Kelvin McQueen has an amazing article about IIT and mereology that argues for just that.) Great work.

VL:

To be clear, I was gently teasing--I'm fully aware that the topic of free will would require considerably more space (and energy) to handle. I agree with your take above, but I think it would be interesting and valuable for your readership to engage the question in a more thorough-going way. Especially since Robert Sapolsky's "Determined" has become rather overwhelmingly popular ...

Peter T Hooper:

Sapolsky is an instance of the dogmatic work of a well-meaning and humane man becoming an engine for great harm.

Surely we are able to see that absolutist determinism puts in place the very foundational argument for hierarchical tyranny, after the manner of far-right “intellectual” apologists like Jordan Peterson.

Phil S:

Another vote here for Erik to expand on the implications for free will in a separate paper. This research seems like the most important thing on the topic of free will I've seen in years (maybe ever!)

Peter T Hooper:

As we do this, let us perhaps distinguish notional “free will” from volition (or an active capability to choose and direct effort bounded by inevitable constraints).

Those of us friendly to the idea of purposeful choice are that way because we prefer not to cast off a relationship-based ethic as we are forced out into the stony desert of absolutist determinism.

Aidan Barbieux:

Gongshi—spirit stones—along with the stone formations of Joshua Tree have been great inspirations for me when studying emergence. Glad to see someone else appreciating this connection.

Foster Roberts:

Bravo! I wonder about this applied to connectome studies.

Thomas F. Varley, PhD:

Is there a proof that ΔCP will always be non-negative on the partition lattice?

Erik Hoel:

Hey Thomas! So ΔCP can be negative or zero. A coarse-graining might make it worse, for instance. It’s only a small subset of scales for which it’s positive, and that’s what gets included in the emergent hierarchy.

Thomas F. Varley, PhD:

Ok, that makes sense (too many years of thinking about PID has clearly infected my brain). So when ΔCP is negative, that means that this particular coarse-graining has lower causal effectiveness?

Abel Jansma:

Since CP (effectiveness) is non-monotonic on the lattice, I don’t think any sensible ΔCP would be non-negative. It’s also *not* the Möbius inversion of CP, so it’s different from PID in many ways. (Though there are other ways to create a non-negative decomposition of causality: https://arxiv.org/abs/2501.11447)

Nicole Marino:

Erik, you’ve sent me into a deep thought spiral with this one, trying to figure out which layers of biology are more causally ‘weighty’.

I feel the molecular bio paradigm biases us to think it’s all epiphenomenal but just ‘too complex to model’ whereas here you’re formalising how different layers can be irreducible to the layers below. It would explain a lot, and suggests avenues for control of biology that aren’t simply bottom-up.

Jared Parmer:

This is fascinating, thanks. I'm interested in understanding emergence and social networks and things for my own research interests (ethics and social epistemology, mainly). Beyond the readings you've mentioned, could you recommend any (additional) primers on information theory, network science, complexity science?

Jared Parmer:

Fantastic, thank you so much! This is perfect. Very excited to dive in.

If I may ask for a bit more 'pop explanation' about your latest work: toward the end of this post you discuss 'irreducible causal contributions' at higher scales that amount to emergent behavior of the system. I can't really see how this is possible, given how the levels of the lattice are constructed. As I understood things, you create higher levels by covering all possible partitions of lower-scale states, and construct corresponding TPMs in such a way that the conditional probabilities of the lower-level TPM are respected (for lack of a better word). In your post, for example, you talk about how the probabilities of state transitions from A to D and to E, respectively, are the same whether or not you are at the more microscale system that includes and differentiates B and C, or the more macroscale system that partitions them (black-box-like) as mu. But if this is how causal contribution is measured, how could any of it be irreducible? One could simply de-partition the relevant 'black box', moving down in scale, and uncover the underlying dynamics that explain the higher-scale dynamics. So it seems to me.

Really enjoyed this post -- you've given me a lot to chew on.

Erik Hoel:

> In your post, for example, you talk about how the probabilities of state transitions from A to D and to E, respectively, are the same whether or not you are at the more microscale system that includes and differentiates B and C, or the more macroscale system that partitions them (black-box-like) as mu.

Gotcha. So, this isn't how the causal contributions are measured. This is how the macroscales are constructed / checked for consistency. The causal contributions are assessed via the determinism and degeneracy. So for instance, take that grouping there: A at the microscale goes to (let's say) 50/50 B and C. That is indeterminism. A might cause B. It might cause C. So we know that the determinism cannot be maximal (1). At the macroscale, A goes to Alpha with p = 1 (determinism). So we have some sort of increase in that relationship. However, both causal descriptions still capture the same dynamics: in the first case, A going to either B or C, and in the second, A going to Alpha.
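
To make the A → {B, C} versus A → Alpha example concrete, here is a minimal sketch with toy transition probability matrices. The TPMs and the simplified determinism measure below are illustrative assumptions of mine, not the paper's exact definitions:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def determinism(tpm):
    """Simplified determinism: 1 minus the average normalized entropy
    of each row's transition distribution (1 = every state's future
    is fully certain). Not the paper's exact formula."""
    n = tpm.shape[1]
    avg_h = np.mean([entropy_bits(row) for row in tpm])
    return 1.0 - avg_h / np.log2(n)

# Toy microscale TPM over states {A, B, C}: from A the system goes to
# B or C with p = 0.5 each (the indeterminism in the example above);
# B and C both return to A.
micro = np.array([
    [0.0, 0.5, 0.5],   # A -> B or C, 50/50
    [1.0, 0.0, 0.0],   # B -> A
    [1.0, 0.0, 0.0],   # C -> A
])

# Macroscale: group {B, C} into a single state alpha. Now A -> alpha
# with p = 1, and alpha -> A with p = 1.
macro = np.array([
    [0.0, 1.0],   # A -> alpha
    [1.0, 0.0],   # alpha -> A
])

print(determinism(micro))  # < 1: A's future is uncertain at the microscale
print(determinism(macro))  # 1.0: fully deterministic at the macroscale
```

Both TPMs describe the same dynamics once {B, C} is treated as one state, yet the macroscale description scores higher on this determinism measure, which is the gain being discussed.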

Jared Parmer:

Ah, that's helpful, thanks. Sorry if you said that already and I missed it :)

So in this example, giving things a more qualitative spin, the 'irreducible causal contribution' of the macroscale is something like invariance vis-a-vis the microscale -- the generic outcome Alpha is invariant across multiple possible microinteractions. I'm sure you could also construct interesting 'homeostases' in this way -- macroscale outcomes that look like causal loops with more complicated microscale outcomes underneath.

Rose:

Hi Erik, thank you and congratulations for this amazing paper!

I have a few clarifying questions to make sure I understood correctly:

1. On the applicability to real-life systems: in order to apply your theory to (for example) the brain, we would need to assign weights and transition probabilities to each of the states and scales you described, right? How could we do that?

2. Let's imagine we managed to define the TPM of the brain. A consequence would be that we now know which state is the most useful to explain (for example) consciousness. Could it mean we could make entire research fields obsolete?

3. Would you say that emergence theories are a sub-field of complexity studies?

Erik Hoel:

Ty!

1. Yes, we’d need a very detailed connectome of the brain to apply this in full. And then we’d still need to rely on heuristics. Getting full microscale models is very difficult. However, you could also find ways to estimate it off real data. That’s sort of just the ideal measurement.

2. I think consciousness will make most neuroscience obsolete anyways, since it renders most neuroscience incommensurate.

3. Yes.

Notaarguello:

This reminds me of how Isaac Asimov’s “psychohistory” science from the Foundation trilogy works: “only for massive populations and cannot predict the actions of individuals because of the noise of individual actions”. The man was truly a visionary…

Erik Hoel:

Great catch! If I'm remembering correctly, I use the same analogy in The World Behind the World too.

Cristhian Ucedo:

Isn't your concept of multiscale descriptions just fractality? People usually conflate fractality with its most famous form, self-similar fractality.

Erik Hoel:

So there's the broad multiscale description (the "un-hewn" set of partitions/scales) and then there's the emergent hierarchy (the "hewn" set of remaining partitions/scales that do causally contribute). Which do you mean here?

I think the closest thing to self-similar fractality would be the case of "literal scale-freeness."

Cristhian Ucedo:

I mean the first one. But I foolishly wrote this comment before reading your whole entry. It's great! Sorry and thanks.

Erik Hoel:

Good instincts though!

Gradatim:

> The degeneracy is trickier to understand but, in principle, quite similar. Degeneracy would be maximal (= 1) if all the states deterministically led to just one state (i.e., all causes always had the same effect in the system). If every cause led deterministically to a unique effect, then degeneracy would be 0.

I have a hard time understanding how degeneracy can be either 1 or 0 for the same situation. Did I miss something?

Erik Hoel:

Ah, yes, the phrasing there is confusing. Technically correct but, on a re-read, confusing (I'll edit to add a bit of context).

So all causes -> [one effect] with p = 1 would mean that degeneracy = 1. Think of it like a die roll but all the sides are painted with the same number.

If all causes -> [a different effect for each] with p = 1, then degeneracy = 0. So think of it like there are as many effects as there are causes, and each cause "owns" a unique effect.

The inverse of degeneracy is called "specificity," and is more intuitive, in that a high specificity means that each cause has a specific effect.
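
The two extremes above (the die with every side painted the same number versus each cause "owning" a unique effect) can be illustrated with toy TPMs. The measure below is a simplified stand-in of mine for the intuition, not the paper's exact definition of degeneracy:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def degeneracy(tpm):
    """Stand-in measure: how concentrated the average effect
    distribution (pooled over all causes) is, relative to uniform.
    1 = all causes converge on one effect; 0 = each cause has its
    own unique effect. An illustrative simplification only."""
    n = tpm.shape[1]
    avg_effect = tpm.mean(axis=0)  # pooled distribution over effects
    return 1.0 - entropy_bits(avg_effect) / np.log2(n)

# "Die with every side painted the same number": all causes -> state 0.
all_same = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
])

# Each cause "owns" a unique effect: an identity (permutation) TPM.
all_unique = np.eye(3)

print(degeneracy(all_same))    # maximal degeneracy (1)
print(degeneracy(all_unique))  # no degeneracy (0): full specificity
```

On this toy measure, high specificity (the inverse) corresponds to the pooled effects being spread out, one per cause, exactly as described above.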

Gradatim:

Thanks for the clarification, and I <3 you!!! You made my day and even more.

Shreyal Gupta:

This is brilliant work! I love your style of explanation too.

The European Polymath:

Very interesting. Curious how it links to a microscale of a 'quantum' sort that could introduce more randomness and, let's say, give some room for free will.

RDM:

Best thing I read all night.

Can the raw lattice be re-cast into regions of activity (or impact, or connectedness) other than simply by partition size?

I am looking for a way to stuff the idea of Markov blanket into this construct of yours, and that would involve routine chaining of distinct nodes (or groups of nodes) and nesting of smaller partitions within larger partitions, or a combination of both...

Vague as hell, I know, sorry, will go back and re-read and ponder. Am looking for a way to characterize dynamics on these 'nets'....

And as if that weren't vague enough, a research question: Is there any way to characterize how sensitive (or stable) these mesoscale sweet spots are with respect to the addition/subtraction of a single node?

Absolutely fascinating stuff. Thanks for your work and the time to post....

Erik Hoel:

Ty!

> Is there any way to characterize how sensitive (or stable) these mesoscale sweet spots are with respect to the addition/subtraction of a single node?

Hmm, I think you could do a sort of robustness analysis. But it depends on what "a single node" means: is it a single micro-node (in which case, the macroscale would be very robust) or a single macro-node (in which case, not so much, probably).

Iuval Clejan:

Cool! I hope you win the Turing prize for this.

1. Practically, you can compute ΔCP for a current-level node in a bottom-up fashion, and only need to look at the lower-level nodes with significant ΔCP, not all of them at the lower level (and certainly not two or more levels below), right?

2. Is there any "physical" isomorphism to the brain, in going from a network of neurons' adjacency matrix to a TPM? Is this part of how the brain does classification?

3. Is there any physical isomorphism to the brain, in going from a lower level of coarse-graining to a higher level and picking out the highest-ΔCP nodes?

4. Is there any physical meaning to the edges between the highest-ΔCP nodes across levels?

Erik Hoel:

Kind words, ty!

1. Yes, you can just look at all the paths to the current node.

2. I'm not sure exactly what an isomorphism would imply here, but I would say I would be very unsurprised if there are strong relationships to things like classification and causal emergence.

3. I would say possibly concept formation, for the same reason as (2) (or maybe it could be worked out with Neural Darwinism).

4. There's "physical meaning" to every edge in that it represents some refinement, i.e., it represents the spatiotemporal structure becoming coarser (as you go "up" the edge) or finer (as you go "down"). But I don't know of any specific physical meaning other than that.

Niles Loughlin:

Really fascinating work, Erik! It’s a shame this wasn’t published a few months earlier. I wrote an essay for the Berggruen Institute competition (the one you were tapped for and shared information about here on your blog) that draws on your Causal Emergence 1.0 paper with effective information. The basic premise argues for a philosophical framework rooted in dialectical materialism to describe consciousness as an emergent process, and it relates the scientific work you’ve done in causal emergence to that framework. It’s not published, obviously, as I know there are restrictions for the competition, but if you were interested in any philosophical work that draws on causal emergence to support its framework, I’d be happy to share the material with you! I think what your work here argues for still supports what I wrote about, and may even enhance my argument further.

Ax Ganto:

This is very interesting as always, but I have a hard time understanding how irreducibility is deduced:

- The definitions of the causal primitives (determinism & degeneracy) lead to a score, ΔCP, for each scale's causal contribution.

- If this measure goes up at a particular scale, we then say that scale has emergent causality that is not reducible.

But on an intuitive level, we can still get the intermediate scale from the microscale.

So how do you interpret the gain in ∆CP at that scale? Is it that it’s simply a better description of the whole system at that scale or that there is really something being explained at that intermediate level that is simply impossible at the microscale?

Erik Hoel:

I think something is being "added" or "explained" - the error-correction itself.

In terms of interpretation, I would say causal emergence is a kind of "non-mysterious irreducibility."

You might very sensibly question if this is a real category!

I think it is. Let's take a completely different scenario, but one that I think inarguably demonstrates "non-mysterious irreducibility": the joint causation of an XOR gate.

An XOR gate's output cannot be reduced to just one of its inputs. In this, it is different in kind than an AND gate, e.g., where you can get some information about the output based on a single input. If one AND input is 0, then the AND gate must be outputting 0. But if just a single XOR's input is known, we cannot, in principle, say anything about the output. The output is totally irreducible to the lone input. But this irreducibility is "non-mysterious" in that, once you understand how an XOR functions, you can trace exactly how it happens.

Analogously (not identically) I think causal emergence is of this same kind of "non-mysterious irreducibility." The macroscale causal relationships really are [stronger / more powerful / offer better explanations / etc.] but they are so via error-correction. You can trace exactly how that happens, how the noise gets minimized, how the specificity increases, etc., but this doesn't shift the gain down to the microscale, it is just the explanation for how the gain happened.
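
The AND/XOR asymmetry described above is easy to check exhaustively. A minimal sketch (the helper name is mine, just for illustration):

```python
def AND(a, b):
    return a & b

def XOR(a, b):
    return a ^ b

def outputs_given_one_input(gate):
    """For each fixed value of input a alone (b unknown), the set of
    outputs the gate could still produce."""
    return {a: {gate(a, b) for b in (0, 1)} for a in (0, 1)}

# AND: knowing a = 0 pins the output to 0, so one input carries
# information about the output.
print(outputs_given_one_input(AND))  # {0: {0}, 1: {0, 1}}

# XOR: either output remains possible no matter which single input
# you know, so a lone input says nothing about the output.
print(outputs_given_one_input(XOR))  # {0: {0, 1}, 1: {0, 1}}
```

This is the "non-mysterious" part: the irreducibility of XOR's output to a lone input is fully traceable from the gate's truth table.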

Ax Ganto:

I guess I would have to think about it more. The first thing that comes to mind (pushing the analogy) is that all propositional connectives are reducible to NOT (~) and OR.

XOR is then:

(A AND ~B) OR (B AND ~A), with AND being defined from OR and NOT.

The gain at the intermediate scale here would be that you can build a combination of connectives from basic ones such that some property, one that was present initially, is no longer valid.

Meaning that initially (with OR and NOT), the input always gives some info on the output, but when we combine them, this is no longer the case. I guess we can call that emergent, but I am not sure about causal.
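
The construction in this comment checks out by truth table. A quick sketch, building everything from NOT and OR alone (AND obtained via De Morgan) and comparing against native XOR:

```python
def NOT(a):
    return 1 - a

def OR(a, b):
    return max(a, b)

def AND(a, b):
    # AND built only from OR and NOT, via De Morgan's law
    return NOT(OR(NOT(a), NOT(b)))

def xor_built(a, b):
    # the construction above: (A AND ~B) OR (B AND ~A)
    return OR(AND(a, NOT(b)), AND(b, NOT(a)))

# verify the identity against the native XOR on all four input pairs
for a in (0, 1):
    for b in (0, 1):
        assert xor_built(a, b) == (a ^ b)
print("XOR identity holds on all inputs")
```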