79 Comments

Aidan Barbieux:

Gongshi (spirit stones), along with the stone formations of Joshua Tree, have been great inspirations for me when studying emergence. Glad to see someone else appreciating this connection.

VL:

Bravo! Well done, and beautifully explained. But I confess to some minor annoyance at the end that you defer applying this theory to the question of free will. I hope you can rest from your labors for a bit, then gird your loins and come back to help clarify this controversial issue, which has been muddled in just about every possible way, from the microscale to the macroscale....

Erik Hoel:

Thank you. And fair on the annoyance - I just figured that after 5,000 words I'm pushing the limits of readability.

Basically, I think free will is just when your consciousness (or its neural correlates, if we are to be metaphysically neutral) has a strong causal contribution in the hierarchy of scales. Since presumably it must be relatively macroscale, this means that free will requires a "top-heavy" macroscale (or at least, somewhat top-heavy). Predictions would be that, e.g., the brain is much more top-heavy compared to other parts of the body in its operations.

Is this sufficient for free will? No, I think not. But I think it is necessary for it, and at some limit of necessary conditions I think you get a sufficient condition. So for instance, if a system were: (a) top-heavy in this causally-emergent fashion in a way that corresponded to its neural correlates of consciousness/mind, and (b) had source irreducibility (it cannot be predicted except by its own actions), and (c) had the ability to consider things like counterfactuals... THEN I think this is probably sufficient for free will.

Alex Pawlowski:

Under these conditions a bunch of things should have free will: corporations, markets, the universe itself (via the principle of least action).

Peter T Hooper:

Yes. And why would you prefer to suggest these do not, if that's your point?

Matt Sigl:

Great article. My issue with free will and this model is that inability to predict is not enough. It has to be the case that the choice can't be understood as absolutely necessary even after the fact. If our actions are determined but unpredictable, they are still determined, just surprisingly so. Still, I see the causal emergence you argue for here as a way to explain the reality of mental causation, whether determined or free. But freedom just requires more; it requires that consciousness have the irreducible causal power to contribute to its own transformations in a way that is neither wholly predictable nor retroactively necessary. Things happen because they are determined by choice, or there is no such thing as choice. But nonetheless, understanding how mental causation is possible (as causally irreducible macro-causes) is a necessary (but not sufficient) condition for freedom.

The big question I have for causal emergence is related to IIT: are all causally irreducible macro-causes associated with conscious experience? Macroscopic causes as the actual identity of mereological wholes, necessarily only real when irreducibly integrated, and all integrated causal wholes being conscious? (Kelvin McQueen has an amazing article about IIT and mereology that argues for just that.) Great work.

VL:

To be clear, I was gently teasing--I'm fully aware that the topic of free will would require considerably more space (and energy) to handle. I agree with your take above, but I think it would be interesting and valuable for your readership to engage the question in a more thorough-going way. Especially since Robert Sapolsky's "Determined" has become rather overwhelmingly popular ...

Peter T Hooper:

Sapolsky is an instance of the dogmatic work of a well-meaning and humane man becoming an engine for great harm.

Surely we are able to see that absolutist determinism puts in place the very foundational argument for hierarchical tyranny after the manner of far-right "intellectual" apologists such as Jordan Peterson.

Phil S:

Another vote here for Erik to expand on the implications for free will in a separate paper. This research seems like the most important thing on the topic of free will I've seen in years (maybe ever!)

Peter T Hooper:

As we do this, let us perhaps distinguish notional “free will” from volition (or an active capability to choose and direct effort bounded by inevitable constraints).

Those of us friendly to the idea of purposeful choice are that way because we don’t prefer to cast off relationship-based ethic as we are forced out into the stony desert of absolutist determinism.

Foster Roberts:

Bravo! I wonder about this applied to connectome studies.

Abel Jansma:

me too!

Thomas F. Varley, PhD:

Is there a proof that ΔCP will always be non-negative on the partition lattice?

Erik Hoel:

Hey Thomas! So ΔCP can be negative or zero. A coarse-graining might make it worse, for instance. It's only a small subset of scales for which it's positive, and that's what gets included in the emergent hierarchy.
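
A minimal sketch of such a case, assuming the "determinism + specificity − 1" score under uniform interventions and a macro TPM built by averaging the micro rows within each group; both conventions are illustrative choices, not the paper's exact code:

```python
import numpy as np

def cp(tpm):
    """CP = determinism + specificity - 1 under uniform interventions
    (the normalized mutual-information form discussed later in this thread)."""
    n = tpm.shape[0]
    def H(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()
    det = 1 - np.mean([H(row) for row in tpm]) / np.log2(n)
    spec = H(tpm.mean(axis=0)) / np.log2(n)
    return det + spec - 1

def coarse_grain(tpm, groups):
    """Macro TPM: average the micro rows inside each group (uniform
    intervention within the group), then sum columns by target group.
    This construction is an illustrative assumption, not the paper's code."""
    k = max(groups) + 1
    macro = np.zeros((k, k))
    for g in range(k):
        mean_row = tpm[[i for i, gi in enumerate(groups) if gi == g]].mean(axis=0)
        for j, gj in enumerate(groups):
            macro[g, gj] += mean_row[j]
    return macro

# A 3-state permutation (0 -> 1 -> 2 -> 0) has maximal CP at the microscale.
micro = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]], dtype=float)

macro = coarse_grain(micro, groups=[0, 0, 1])   # lump states {0, 1} together
print(cp(micro), cp(macro))   # ~1.0 vs ~0.31: this lumping lowers CP
```

Since the micro permutation already scores the maximum, this lumping can only tie or lose, so its ΔCP relative to its micro ancestor comes out negative.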

Thomas F. Varley, PhD:

Ok, that makes sense (too many years of thinking about PID have clearly infected my brain). So when ΔCP is negative, that means that this particular coarse-graining has lower causal effectiveness?

Abel Jansma:

Since CP (effectiveness) is non-monotonic on the lattice, I don't think any sensible ΔCP would always be nonnegative. It's also *not* the Möbius inversion of CP, so it's different from PID in many ways. (Though there are other ways to create a nonnegative decomposition of causality: https://arxiv.org/abs/2501.11447)

Nicole Marino:

Erik, you've sent me into a deep thought spiral with this one, trying to figure out which layers of biology are more causally 'weighty'.

I feel the molecular bio paradigm biases us to think it’s all epiphenomenal but just ‘too complex to model’ whereas here you’re formalising how different layers can be irreducible to the layers below. It would explain a lot, and suggests avenues for control of biology that aren’t simply bottom-up.

Jared Parmer:

This is fascinating, thanks. I'm interested in understanding emergence and social networks and things for my own research interests (ethics and social epistemology, mainly). Beyond the readings you've mentioned, could you recommend any (additional) primers on information theory, network science, complexity science?

Jared Parmer:

Fantastic, thank you so much! This is perfect. Very excited to dive in.

If I may ask for a bit more 'pop explanation' about your latest work: toward the end of this post you discuss 'irreducible causal contributions' at higher scales that amount to emergent behavior of the system. I can't really see how this is possible, given how the levels of the lattice are constructed. As I understood things, you create higher levels by covering all possible partitions of lower-scale states, and construct corresponding TPMs in such a way that the conditional probabilities of the lower-level TPM are respected (for lack of a better word). In your post, for example, you talk about how the probabilities of state transitions from A to D and to E, respectively, are the same whether or not you are at the more microscale system that includes and differentiates B and C, or the more macroscale system that partitions them (black-box-like) as mu. But if this is how causal contribution is measured, how could any of it be irreducible? One could simply de-partition the relevant 'black box', moving down in scale, and uncover the underlying dynamics that explain the higher-scale dynamics. So it seems to me.

Really enjoyed this post -- you've given me a lot to chew on.

Erik Hoel:

> In your post, for example, you talk about how the probabilities of state transitions from A to D and to E, respectively, are the same whether or not you are at the more microscale system that includes and differentiates B and C, or the more macroscale system that partitions them (black-box-like) as mu.

Gotcha. So, this isn't how the causal contributions are measured. This is how the macroscales are constructed / checked for consistency. The causal contributions are assessed via the determinism and degeneracy. So for instance, in that grouping, A at the microscale goes to (let's say) B and C with 50/50 probability. That is indeterminism. A might cause B. It might cause C. So we know that the determinism cannot be maximal (i.e., 1). At the macroscale, A goes to Alpha with p = 1 (determinism). So we have some sort of increase in that relationship. However, both causal descriptions still capture the same dynamics: in the first case, A going to either B or C, and in the second, A going to Alpha.
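
A tiny numerical version of that example, assuming uniform interventions and some made-up transitions (B, C → D and D → A) just to close the toy TPM; the determinism here is the system average, and state A's row is what drives the difference:

```python
import numpy as np

def determinism(tpm):
    """Average determinism: 1 - <H(E | do(c))> / log2(n), assuming a uniform
    intervention distribution over the n states (a sketch of the definition
    discussed in this thread, not the paper's exact code)."""
    n = tpm.shape[0]
    row_entropies = [-sum(p * np.log2(p) for p in row if p > 0) for row in tpm]
    return 1 - np.mean(row_entropies) / np.log2(n)

# Microscale: A goes to B or C with 50/50 odds; the rest is made up to close the TPM.
micro = np.array([
    [0.0, 0.5, 0.5, 0.0],   # A -> B or C (indeterministic)
    [0.0, 0.0, 0.0, 1.0],   # B -> D
    [0.0, 0.0, 0.0, 1.0],   # C -> D
    [1.0, 0.0, 0.0, 0.0],   # D -> A
])

# Macroscale: group {B, C} into a single state Alpha.
macro = np.array([
    [0.0, 1.0, 0.0],        # A -> Alpha with p = 1 (deterministic)
    [0.0, 0.0, 1.0],        # Alpha -> D
    [1.0, 0.0, 0.0],        # D -> A
])

print(determinism(micro), determinism(macro))   # 0.875 vs 1.0
```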

Jared Parmer:

Ah, that's helpful, thanks. Sorry if you said that already and I missed it :)

So in this example, giving things a more qualitative spin, the 'irreducible causal contribution' of the macroscale is something like invariance vis-a-vis the microscale -- the generic outcome Alpha is invariant across multiple possible microinteractions. I'm sure you could also construct interesting 'homeostases' in this way -- macroscale outcomes that look like causal loops with more complicated microscale outcomes underneath.

Rose:

Hi Erik, thank you and congratulations for this amazing paper!

I have a few clarifying questions to make sure I understood correctly:

1. On applicability to real-life systems: in order to apply your theory to (for example) the brain, we would need to assign weights and transition probabilities to each of the states and scales you described, right? How could we do that?

2. Let's imagine we managed to define the TPM of the brain. A consequence would be that we now know which state is the most useful to explain (for example) consciousness. Could it mean we could make entire research fields obsolete?

3. Would you say that emergence theories are a sub-field of complexity studies?

Erik Hoel:

Ty!

1. Yes, we’d need a very detailed connectome of the brain to apply this in full. And then we’d still need to rely on heuristics. Getting full microscale models is very difficult. However, you could also find ways to estimate it off real data. That’s sort of just the ideal measurement.

2. I think consciousness will make most neuroscience obsolete anyways, since it renders most neuroscience incommensurate.

3. Yes.

Joe Canimal:

Nice exposition and some dazzling pictures, but the headline — “I figured out how to engineer emergence” — overpromises what’s actually shown. Two quick passages in the post illustrate the gap. You write that the squishing “can be done cleverly in such a way that both dynamics and the effects of interventions are preserved,” and your footnote says CE 2.0 “provably crops up… across a bunch of choices of P(C).” Those are strong, general claims; the paper’s methods and demos don’t justify them.

To be explicit and concrete: the formula you give for the score—“determinism + specificity − 1,” with determinism = 1 − H(E|C)/log2(n) and specificity = H(E)/log2(n)—is algebraically just mutual information, CP = [H(E) − H(E|C)]/log2(n) = I(C;E)/log2(n). The method is InfoMax over partitions: search the partition (coarse-graining) lattice of a transition matrix and keep those lumpings that most increase how much the “do(C)” tells you about the next state. That’s a useful exploratory objective, but it isn’t a new causal functional or a general recipe for “engineering” emergence.
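
(A quick numerical check of that identity, under a uniform do-distribution over the n states; the random TPM and variable names are purely illustrative:)

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
n = 4
tpm = rng.random((n, n))
tpm /= tpm.sum(axis=1, keepdims=True)           # row-stochastic transition matrix

# Uniform do-distribution over causes C; E is the next state.
p_E = tpm.mean(axis=0)                          # marginal of E under uniform do(C)
H_E_given_C = np.mean([H(row) for row in tpm])  # <H(E | do(c))>

determinism = 1 - H_E_given_C / np.log2(n)
specificity = H(p_E) / np.log2(n)
CP = determinism + specificity - 1

normalized_MI = (H(p_E) - H_E_given_C) / np.log2(n)   # I(C;E) / log2(n)
print(np.isclose(CP, normalized_MI))                  # True
```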

That leads to three specific reasons the rhetoric should be toned down. First, the “macro > micro” effect you highlight often comes from changing the intervention prior. In the paper the authors evaluate the micro channel under one convention (e.g., uniform micro do’s) but re-uniformize at each macro level. If instead the micro is allowed its best input (channel capacity) or the macro interventions are required to be the pushforward of micro interventions, then by the data-processing inequality no deterministic downstream summary can beat the micro. Put simply: macro looks stronger mainly when you compare different channels or give the macro a better prior.

Second, the method does not enforce dynamical validity. A legitimate macro variable should carry its own lawful dynamics (exact or ε-approximate lumpability / probabilistic bisimulation). The search lets you pick groupings that make one-step prediction look prettier but may break multi-step predictability or the Markov property. That’s why the claim the squishing “preserves dynamics and the effects of interventions” reads as an assertion in the blog but is not established by the current implementation.

Third, the bookkeeping is rhetorical, not accounting. The post talks about apportioning causation “as if we were cutting up a pie fairly,” but the paper’s Δ-rule (keep partitions with positive bump over ancestors) is a ranking heuristic, not a conserved decomposition. Turning that rhetoric into an actual budget requires a Möbius/Shapley-style decomposition on the partition lattice (or a PID-style treatment). The authors gesture to that future work, but until someone delivers it you don’t have a pie that sums correctly—only a list of interesting slices.

There are a few smaller but still important caveats worth calling out. Normalizing by log2(n) at each scale biases comparisons across different alphabet sizes. The focus on one-step mutual information is easy to game: you can make one-step transitions look deterministic even while wrecking multi-step structure. And “engineering emergence” here is mostly constructive toy work—designing TPMs so the Information-Maximizer picks a chosen level—rather than an end-to-end control recipe for real, physical systems where intervention feasibility and energy costs matter.

That said, don’t throw the baby out with the bathwater. As tooling, this is useful: it’s a convenient, visual way to scout candidate macroscales and to spot where predictability concentrates. Use it as Step 0 in a responsible pipeline: run the partition search to propose candidates; then gate them with (a) lumpability or causal-abstraction checks, (b) intervention-feasibility (pushforward) and energy/precision costs, (c) a conserved apportionment (Möbius/Shapley or PID) if you want to talk budgets, and (d) out-of-sample prediction/control tests reported per bit and per joule. If the selected macroscales survive those gates and improve held-out performance under realistic constraints, you have something substantive. If not, you’ve got a pretty visualization and useful intuition, not a new theory of agency or emergence.

Erik Hoel:

First of all, thank you for some very detailed critiques. They show an understanding of what’s going on under the hood, so I truly appreciate the thought that went into this.

However, there are strong ripostes to all the central critiques, and the less central critiques turn out to basically just be future research directions we explicitly mentioned.

> "To be explicit and concrete: the formula you give for the score—“determinism + specificity − 1,” with determinism = 1 − H(E|C)/log2(n) and specificity = H(E)/log2(n)—is algebraically just mutual information, CP = [H(E) − H(E|C)]/log2(n) = I(C;E)/log2(n). The method is InfoMax over partitions: search the partition (coarse-graining) lattice of a transition matrix and keep those lumpings that most increase how much the “do(C)” tells you about the next state."

The measure we use is related to the MI, but three things to keep in mind: (a) it's normalized by log2(n), (b) MI is usually calculated from the observed distribution of a system's dynamics, whereas this is an unusual interpretation based on an intervention distribution, and (c) you can get similar results with just the sufficiency and necessity! I'm not sure this is really a critique at all, in fact: one could equally reframe it as "It's cool this research shows how MI (under some assumptions / normalization) has such a close relationship to measures of causation!"

Perhaps most substantially as an objection to this critique: what we are doing is objectively not InfoMaxing. Where is the multiscale structure in that scenario? We’d just be finding the scale with maximal CP. And we're not doing that. So I think this is not much of an “objection from triviality" in the end.

> "That leads to three specific reasons the rhetoric should be toned down. First, the “macro > micro” effect you highlight often comes from changing the intervention prior. In the paper the authors evaluate the micro channel under one convention (e.g., uniform micro do’s) but re-uniformize at each macro level. If instead the micro is allowed its best input (channel capacity) or the macro interventions are required to be the pushforward of micro interventions, then by the data-processing inequality no deterministic downstream summary can beat the micro. Put simply: macro looks stronger mainly when you compare different channels or give the macro a better prior."

Let’s imagine doing as you say, and maximizing the channel capacity down at the microscale. This ends up right back at something just like CE 1.0.

To show this, let's imagine a particular case in which reaching max capacity involves a large amount of "drop-out," wherein we remove many of the microstates from P(C). What do we call that? A macroscale! It's purposefully doing a dimension reduction on interventions. It's equivalent to just a priori not considering microscale counterfactuals (or it's equivalent to putting them in groups of different probabilities). It would be just defining a macroscale causal model and not calling it that. For where does that maximizing structure in the intervention distribution come from at the microscale? Is it reflecting arbitrarily what the observer wants? No, it is a property of the channel itself, and it matches the stronger causes and effects of a macroscale that you can define over the channel. So let's just call it that.

However, just as importantly, you write "often comes" and "mainly" because there are clear cases where a difference in intervention prior isn't necessary (in the first CE 2.0 paper there's such a case shown explicitly). So then this objection seems weak, being over the degree of emergence, not its actual existence. How do you account for cases where that's not necessary? Perhaps more generally: why should causal measures be bounded by the data processing inequality? Simply because one of them has a nice mathematical relationship to a normalized MI following an intervention distribution? I don't see how that logically follows. We could just use the sufficiency and necessity instead and get similar results, etc.

> "Second, the method does not enforce dynamical validity. A legitimate macro variable should carry its own lawful dynamics (exact or ε-approximate lumpability / probabilistic bisimulation). The search lets you pick groupings that make one-step prediction look prettier but may break multi-step predictability or the Markov property. That’s why the claim the squishing “preserves dynamics and the effects of interventions” reads as an assertion in the blog but is not established by the current implementation."

Checking for dynamical consistency (you call it “validity”) is discussed in the paper: it’s an option you can turn on or off in the code. The first CE 2.0 paper enforces it. We didn’t on this one because it’s more to explain and calculate and it didn’t seem to impact the results much. We’re certainly planning on adding in an SI part before submission.

> "Third, the bookkeeping is rhetorical, not accounting. The post talks about apportioning causation “as if we were cutting up a pie fairly,” but the paper’s Δ-rule (keep partitions with positive bump over ancestors) is a ranking heuristic, not a conserved decomposition. Turning that rhetoric into an actual budget requires a Möbius/Shapley-style decomposition on the partition lattice (or a PID-style treatment). The authors gesture to that future work, but until someone delivers it you don’t have a pie that sums correctly—only a list of interesting slices."

True about that sentence in this blog post (although one could say there is 'fairness' vs. 'perfect fairness'), but regarding the paper, this criticism is also equivalent to “This causal apportioning schema isn’t perfect for reasons you yourself point out as future research directions.” Moreover, I think in practice our apportioning method's lack of perfection likely doesn’t matter. In fact, I suspect that a conserved decomposition won’t lead to significantly different emergent hierarchies than the ones we find. Would top vs. bottom heavy ones flip? Probably not. Would the emergent complexity values even be substantially different? Probably not (remember, when those are calculated, the ΔCP values are themselves normalized anyways).

> "The focus on one-step mutual information is easy to game: you can make one-step transitions look deterministic even while wrecking multi-step structure."

This is basically the same point about dynamical consistency, but I will say that, in actuality, it’s harder to game in the way you're thinking, in that the deterministic ones are usually consistent.

> "And “engineering emergence” here is mostly constructive toy work—designing TPMs so the Information-Maximizer picks a chosen level—rather than an end-to-end control recipe for real, physical systems where intervention feasibility and energy costs matter."

Another fine criticism that's really more of a "what about future directions?" point. It's surprising that "balloons" are possible at all, especially given we are doing so many "does this reduce?" comparisons, and that alone deserves its own paper far before something like what you're talking about, and we can only put out one paper at a time.

Again though, I appreciate the depth of thought here.

Joe Canimal:

Thanks for the thoughtful response, and for putting out stimulating work. On “InfoMax”: I wasn’t saying you pick a single arg-max scale; I meant the objective you score at every partition is mutual information (up to normalization). With your definitions, CP = [H(E) − H(E|C)] / log2(n) = I(C;E)/log2(n). Normalizing and using a do-distribution specify how MI is computed; they don’t change that the score itself is MI. The multiscale story then ranks partitions by MI and keeps positive Δ’s.

On capacity, priors, and DPI: maximizing capacity at micro is a policy over existing micro interventions, not a new variable. Setting some actions to zero probability doesn’t coarse-grain; it just chooses how often you press which buttons. If the macro cause is a (possibly stochastic) function of the micro cause, then under the same pushed-forward policy H(E|macro) ≥ H(E|micro), hence I(macro;E) ≤ I(micro;E). So “macro > micro” requires either a policy change (e.g., re-uniformizing at each scale) or different actuators/channels. If you have a case with macro as a deterministic coarse-graining, macro interventions exactly the pushforward of the micro policy, same E, yet I(macro;E) > I(micro;E), I’d love to see it. Swapping CP for “sufficiency/necessity” doesn’t avoid this: they’re built from the same H(E) and H(E|C).
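
(A toy check of that DPI-fair comparison: with a random micro TPM, a uniform micro policy, the same micro effect E, and the macro policy taken as the pushforward of the micro one, the macro mutual information never exceeds the micro. The grouping and TPM below are arbitrary placeholders:)

```python
import numpy as np

def mi(joint):
    """Mutual information (bits) of a joint distribution given as a 2-D array."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return (joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum()

rng = np.random.default_rng(1)
n = 4
tpm = rng.random((n, n))
tpm /= tpm.sum(axis=1, keepdims=True)

# DPI-fair setup: uniform policy over micro causes C, same micro effect E.
joint_CE = tpm / n                          # p(c, e) = (1/n) * p(e | do(c))

# Macro cause M = f(C): lump micro states {0, 1} -> 0 and {2, 3} -> 1,
# with the macro policy taken as the pushforward of the micro policy.
groups = [0, 0, 1, 1]
joint_ME = np.zeros((2, n))
for c, m in enumerate(groups):
    joint_ME[m] += joint_CE[c]

print(mi(joint_ME) <= mi(joint_CE) + 1e-12)   # True: the DPI holds
```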

On dynamics: seems easy to clear up in future revisions.

On “apportioning”: I'll look forward to your future work. Perfectly fair response.

A single figure showing (i) macro interventions as pushforwards of the micro policy, (ii) the lumpability/causal-abstraction gate on versus off, and (iii) a Möbius/Shapley (or PID) decomposition alongside the Δ-ranking would settle the open questions completely.

Thanks again! I'm eager for the follow-up papers.

Erik Hoel:

> “Thanks for the thoughtful response, and for putting out stimulating work.”

Well, thank you again for the thoughtful comments. Let me try to argue the case more, for I do think I understand your position.

> “On capacity, priors, and DPI: maximizing capacity at micro is a policy over existing micro interventions, not a new variable.”

What is a new variable other than a way to intervene on and observe the system? Yes, nothing springs into existence like magic. But what a "macroscale causal model" means is precisely "the dimension-reduced set of interventions and observations that gives us a new set of variables' counterfactuals and relationships, and so determines the causal measures' values (like the determinism/degeneracy or the sufficiency/necessity)."

> “Setting some actions to zero probability doesn’t coarse-grain; it just chooses how often you press which buttons.”

That example was chosen purposefully because, while it isn't a coarse-graining, it is a dimension reduction and so is a macroscale. We call it "black boxing": leaving variables out of your causal model. So all that remains is a choice: keep calling a dimension-reduced intervention distribution and dimension-reduced observation function a microscale causal model… or just call it a macroscale causal model.

> “If the macro cause is a (possibly stochastic) function of the micro cause, then under the same pushed-forward policy H(E|macro) ≥ H(E|micro), hence I(macro;E) ≤ I(micro;E). So “macro > micro” requires either a policy change (e.g., re-uniformizing at each scale) or different actuators/channels. If you have a case with macro as a deterministic coarse-graining, macro interventions exactly the pushforward of the micro policy, same E, yet I(macro;E) > I(micro;E), I’d love to see it.”

Is there not such an example in Figure 4 of this paper, which first introduced the earlier single-path version of CE 2.0?

https://arxiv.org/pdf/2503.13395

Unless by "same E" you mean you coarse-grain the effects for both the microscale and the macroscale? In which case, again, why call this a microscale model? I think it's very evocative/appropriate that the gains can actually come from coarse-graining (or more broadly, dimension-reducing) via grouping either interventions or effects. You can isolate the contribution where the coarse-graining of the effects is what matters by holding the interventions steady, as it were, and cases like Figure 4 are such an example. So it is not "this is just from changing an intervention distribution in a way that corresponds to some grouping"; it is "this is from that, and also from grouping over an observation distribution," which just screams "okay, that's a macroscale causal model" to me.

Joe Canimal:

Thanks for the thoughtful reply—and for pointing at Fig. 4. I think we’re mainly crossing wires on what’s being compared. With your own definitions, the score at any partition is mutual information up to a scale factor:

CP = [H(E) − H(E|C)] / log2(n) = I(C;E) / log2(n).

Under that identity, the Data-Processing Inequality matters only under a specific comparison: same physical channel (same effect E), and the macro variable M is a deterministic coarse-graining of C with the macro prior taken as the push-forward of the micro prior. In that DPI-fair case, I(M;E) ≤ I(C;E). If a macro’s CP looks higher there, the lift must come from normalization (the log2(n) denominator), not extra information.

On Fig. 4 specifically: if “macro” means grouping effects as well (E′ = g(E)) or re-uniformizing the intervention prior at each level, then you’ve designed a new channel. That can absolutely produce a “balloon” (and a higher normalized CP), but the improvement flows from the changed observation map and/or prior plus the smaller denominator—again, not surplus causal power over the same E. Read that way, Fig. 4 is a nice design-fair demo: careful choices of interventions and observation maps yield a clean macro on its own terms. If the stronger claim is intended—that a deterministic coarse-graining out-informs its microparent under the same pushed-forward policy and same E—the minimal way to make that precise (on the Fig. 4 TPM) would be to report raw I(C;E) at micro and raw I(M;E) at macro under that single policy and target. If the point is the engineering one, I’d just say so explicitly and note the role of normalization and channel choice.

For physical agency, two brief clarifiers would align the story with real systems. First, a sentence distinguishing DPI-fair (same pushed-forward policy, same E) from design-fair (new priors and/or new observation maps) comparisons so readers don’t hear “surplus causal power” when the channel has changed. Second, a nod to costs: re-choosing priors or sensors is control work with energy/latency budgets (policy resets, precision, KL/Landauer), so the natural target becomes efficiency (bits or controllability per joule), not raw CP under a freshly chosen channel. Framed that way, your partition search looks like a useful representation/visualization front-end; frameworks like the Free Energy Principle provide a priced, dynamics-respecting back-end. Thanks again for the engagement! I am looking forward to what you post next.

Notaarguello:

This reminds me of how Isaac Asimov's "psychohistory" from the Foundation trilogy works: "only for massive populations and cannot predict the actions of individuals because of the noise of individual actions". The man was truly a visionary…

Erik Hoel:

Great catch! If I'm remembering correctly, I use the same analogy in The World Behind the World too.

Cristhian Ucedo:

Isn't your concept of multiscale descriptions just fractality? People usually conflate fractality with its most famous form, self-similar fractality.

Erik Hoel:

So there's the broad multiscale description (the "un-hewn" set of partitions/scales) and then there's the emergent hierarchy (the "hewn" set of remaining partitions/scales that do causally contribute). Which do you mean here?

I think the closest thing to self-similar fractality would be the case of "literal scale-freeness."

Cristhian Ucedo:

I mean the first one. But I foolishly wrote this comment before reading your whole entry. It's great! Sorry and thanks.

Erik Hoel:

Good instincts though!

Gradatim:

> The degeneracy is trickier to understand but, in principle, quite similar. Degeneracy would be maximal (= 1) if all the states deterministically led to just one state (i.e., all causes always had the same effect in the system). If every cause led deterministically to a unique effect, then degeneracy would be 0.

I have a hard time understanding how degeneracy can be either 1 or 0 for the same situation? Did I miss something?

Erik Hoel:

Ah, yes, the phrasing there is confusing. Technically correct but, on a re-read, confusing (I'll edit to add a bit of context).

So all causes -> [one effect] with p = 1 would mean that degeneracy = 1. Think of it like a die roll but all the sides are painted with the same number.

If all causes -> [a different effect for each] with p = 1, then degeneracy = 0. So think of it like there are as many effects as there are causes, and each cause "owns" a unique effect.

The inverse of degeneracy is called "specificity," and is more intuitive, in that a high specificity means that each cause has a specific effect.
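
A minimal sketch of the two extremes, taking degeneracy as 1 − H(E)/log2(n) under uniform interventions (i.e., the complement of the specificity term; the exact normalization here is an assumption for illustration):

```python
import numpy as np

def degeneracy(tpm):
    """1 - H(E) / log2(n) under uniform interventions, i.e. the complement of
    the specificity term (an assumed normalization for illustration)."""
    n = tpm.shape[0]
    p_E = tpm.mean(axis=0)
    p = p_E[p_E > 0]
    return 1 - (-(p * np.log2(p)).sum()) / np.log2(n)

all_to_one = np.array([[1, 0, 0],     # every cause leads to the same effect
                       [1, 0, 0],
                       [1, 0, 0]], dtype=float)
one_to_one = np.eye(3)                # each cause "owns" a unique effect

print(degeneracy(all_to_one))   # 1.0  (maximal degeneracy)
print(degeneracy(one_to_one))   # 0.0  (no degeneracy)
```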

Gradatim:

Thanks for the clarification, and I <3 you!!! You made my day and even more.

Shreyal Gupta:

This is brilliant work! I love your style of explanation too.

The European Polymath:

Very interesting. Curious how it links to a microscale of a 'quantum' sort that could introduce more randomness but, let's say, give some room for free will.

RDM:

Best thing I read all night.

Can the raw lattice be re-cast into regions of activity (or impact, or connectedness) other than simply by partition size?

I am looking for a way to stuff the idea of Markov blanket into this construct of yours, and that would involve routine chaining of distinct nodes (or groups of nodes) and nesting of smaller partitions within larger partitions, or a combination of both...

Vague as hell, I know, sorry, will go back and re-read and ponder. Am looking for a way to characterize dynamics on these 'nets'....

And as if that weren't vague enough, a research question: Is there any way to characterize how sensitive (or stable) these mesoscale sweet spots are with respect to the addition/subtraction of a single node?

Absolutely fascinating stuff. Thanks for your work and the time to post....

Erik Hoel:

Ty!

> Is there any way to characterize how sensitive (or stable) these mesoscale sweet spots are with respect to the addition/subtraction of a single node?

Hmm, I think you could do a sort of robustness analysis. But it depends on what "a single node" means: is it a single micro-node (in which case, the macroscale would be very robust) or a single macro-node (in which case, not so much, probably).

Iuval Clejan:

Cool! I hope you win the Turing prize for this.

1. Practically, you can compute ΔCP for a current-level node in a bottom-up fashion, and only need to look at the lower-level nodes with significant ΔCP, not all of them at the lower level (and certainly not 2 or more levels below), right?

2. Is there any "physical" isomorphism to the brain, in going from a network of neurons' adjacency matrix to a TPM? Is this part of how the brain does classification?

3. Is there any physical isomorphism to the brain, in going from a lower level of coarse graining to a higher level and picking out the highest-ΔCP nodes?

4. Is there any physical meaning to the edges between the highest-ΔCP nodes between levels?

Erik Hoel:

Kind words, ty!

1. Yes, you can just look at all the paths to the current node.

2. I'm not sure exactly what an isomorphism would imply here, but what I would say is that I would be very unsurprised if there turn out to be strong relationships between things like classification and causal emergence.

3. I would say possibly concept formation, for the same reason as (2) (or maybe it could be worked out with Neural Darwinism).

4. There's "physical meaning" to every edge in that it represents some refinement, i.e., it represents the spatiotemporal structure becoming coarser (as you go "up" the edge) or finer (as you go "down"). But I don't know of any specific physical meaning other than that.

Niles Loughlin:

Really fascinating work, Erik! It's a shame this wasn't published a few months earlier; I wrote an essay for the Berggruen Institute competition (the one you were tapped for and shared information about here on your blog) that draws on your Causal Emergence 1.0 paper and effective information. The basic premise argues for using a philosophical framework rooted in dialectical materialism to describe consciousness as an emergent process, and it relates the scientific work in causal emergence you've done to that. It's not published, obviously, as I know there are restrictions for the competition, but if you were interested in any philosophical work that draws on causal emergence to support its framework, I'd be happy to share the material with you! I think what your work here argues for still supports what I wrote about, if not having the potential to enhance my argument further.
