I hope “big tech” keeps chasing the “AI” mirage long enough to do a big nuclear power build-out; then, when “AI” collapses, all that cheap energy can be made available to consumers.
T'was the best of all possible worlds?
"For the sake of human culture not being filled with unconscious AI slop, for the sake of our children being able to find meaning in their own expertise and abilities and intelligence, and for the sake of risk-blind primates who thought it a good idea to try to make entities more intelligent than themselves—I personally hope I'm right about a return to search being bearish. I can’t offer much more than that."
Btw, I want to add how much I love you for your pro-humanity stance. So few people these days are willing to wear that badge with the pride and purity you do.
Thank you, Erik.
Wait, Sean Pan, my name is Erik, too. Lol. Seriously, I've already restacked this, and you're absolutely right, Erik, that we're not building a new intelligence; we've added a layer to search. Search is now a conversation, something Google has been trying to achieve for decades. Lots of great insights in this article; folks should read it. Erik (J. Larson)
As Erik Hoel said, AI is unfortunately an existential threat now, thanks to us being risk-blind primates, and the evidence is overwhelming at this point. Proceeding down this path very much means the loss of humanity, our agency, and a worthwhile world for our children (I have two, and I deeply cherish them).
We can only hope that this wall, if it exists, can give us some grace. In that, I certainly hope Hoel is right.
As for you, I hope you'll note Erik Hoel's Humans Forever piece and also focus on how to make sure this goes well for humanity, rather than dismissing those who are concerned. You said your intuitions keep you from seeing the extinction risks it poses to humanity, and I have to wonder about that given the real evidence we have already seen: o1's hacking outside of Docker, Sakana, and Anthropic's sleeper agents.
Back to Erik Hoel's marvelous piece, though:
https://www.berfrois.com/2022/05/erik-hoel-humans-forever/
It has one of the most beautiful paragraphs I have ever read:
"Personally, I’m on the side of Shakespeare. For it is not such a bad future, in the hands of humans.
Humans after all, invented space travel and ended slavery and came up with antibiotics and snow plows and baby formula. We patch holes in clothing. We sing along to the radio in cars. We’re cute and complicated and fucked-up and kind. We kill, yes, but rarely, and we love far more. We’re not perfect. But we’re the best it’s going to get. Anything else will be unrecognizable and immoral, except by its own incommensurate and alien standards. So give me humans, fallible humans, poetic humans, funny humans, free humans, humans with their animal-like cunning, humans with their ten fingers and ten toes, human babies looking out the portholes of space stations just as they looked out over the hills of Pompeii, humans with their brains unfettered, humans colonizing the galaxy as individuals and nations and religions and collectives and communes and families, humans forever.
Humans forever! Humans forever! Humans forever!"
People are hippocrits, and most would not be deemed worthy to survive in an advanced society ruled with real oversight and control
In that type of reality, a being that kills and eats others quickly disappears
Imagine hippos that land critical strikes.
Very Dark Souls.
They are always on my mind these days (so many around)
“…lest we see ourselves too clearly in the mirror.” I think you may find the technology and AI philosopher Shannon Vallor insightful on this point in her book The AI Mirror: “AI systems mirror our own intelligence back to us. This is the source of their growing commercial and scientific power… What AI mirrors do is to extract, amplify, and push forward the dominant powers and most frequently recorded patterns of our documented, datafied past. In doing so they turn our vision away from the newer, rarer, wiser, more mature and humane possibilities that we must embrace for the future.”, and “…only a modest subset of what is representable in digital form can be generated or stored in sufficient quantity and quality to be useful as AI training data… Finally, only a subset of the data about humans that could be used to train machine learning models is actually being used today for this purpose… It follows that what AI systems today can learn about us and reflect to us is, just as with glass mirrors, only a very partial and often distorted view. To suggest that they reflect humanity is to write most people out of the human story, as we so often do.”
She develops many other aspects of the mirror metaphor as well; I highly recommend her book.
I think the next leaps in AI will have to come from new architectures and approaches that approximate other aspects of human cognition, which is embodied, self-reflexive, desire-driven, capable of resonance with the world, and much more that LLMs (paradigmatically) aren’t and aren’t designed for.
How do you solve a problem without searching through all possible solutions? For me, this is the biggest mystery of cognition. There is a sort of circular logic at the center of it. Daniel Dennett addresses this in his work on the "Frame Problem," and John Vervaeke explores it through his concept of "Relevance Realization." It's a foundational challenge in both AI and cognitive science, and I'd love to get your take on it.
I’d argue that Relevance Realization belongs on your list of self-referential, unsolvable problems. In mathematics, solving a problem often means finding the right algorithm. But how do you create an algorithm capable of finding the right algorithm? At that point, you’re not just building a model—you’re trying to model the modeler. It starts to feel self-referential.
It is also self-referential in that knowing the right way to represent the problem requires knowing what is relevant, and knowing what is relevant means knowing the right way to represent the problem. How can an algorithm ever overcome that?
This is why the non-algorithmic approach of 4E Cognition is so appealing to me. It offers a shimmer of hope that we can avoid the logical void at the center of mathematics.
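To make the combinatorics behind that worry concrete, here's a tiny sketch (the feature list and the relevance_score function are hypothetical stand-ins, not anyone's actual proposal): exhaustively searching over candidate problem framings blows up exponentially, and any shortcut already smuggles in a notion of relevance.

```python
from itertools import combinations

features = [f"f{i}" for i in range(40)]  # hypothetical candidate "aspects" of a problem

# Exhaustive search over framings: every subset of features is a candidate representation.
n_framings = 2 ** len(features)
print(f"{len(features)} features -> {n_framings:,} possible framings")  # ~1.1 trillion

# A heuristic shortcut: only consider small subsets, ranked by an assumed relevance score.
def relevance_score(subset):
    """Stand-in scoring rule; defining this well is exactly the thing that already
    requires knowing what is relevant, which is the circularity discussed above."""
    return -len(subset)  # toy rule: prefer smaller framings

candidates = [s for r in range(1, 3) for s in combinations(features, r)]
best = max(candidates, key=relevance_score)
print("heuristic picks:", best)
```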
The way you've written up hypothesis 2 (and I've run into this issue before), I can't tell whether the theory is that the limit lies in the overall quantity of human-generated data or in its quality, i.e., the fact that it's all human-generated. My intuition is that LLM-style, relatively naive connectionist approaches are, like you say towards the end, just curve fitting, fundamentally mathematically limited, and won't do much better than approximating human performance even with orders of magnitude more data. But this to me is hypothesis 1, the structural critique (which I think is generally correct even if Marcus isn't right about what the secret sauce is). The way some people discuss hypothesis 2 gives me the impression they think that if we found a repository of the cultural output of billions and billions of parallel human civilizations to use as training data, we'd be on our way to superintelligence? (There's a gap in the explicitly demarcated hypothesis-space here, something like: maybe there is enough of reality encoded in human representations like language that you can brute-force reverse-engineer structures from them that are more veridical, deeply true, or predictive than humans themselves manage; that looks empirically defeated to me.) Help me out here?
It's a great question. I presented both because I'm not actually sure what the correct answer is. I think you're probably right that it was always iffy to ask for full intelligence reconstruction from text prediction alone. However, at the same time, the progress the models have made is so incredible that there's no great theory on exactly where their limits are in that regard. Although there's definitely an argument to be made that the vast quantities of data were papering over the problems via things like memorization, and so my suspicion is that if vast new quantities of data were found, they would, like you say, matter less than people think... however, I can't claim to be sure.
Honestly, I consider this another failed leg of "statistics-hype," resting on the categorical error that reasoning under uncertainty is reducible to a probability space in which "uncertainty" is just "100% certainty" scaled down. Just repeatedly slapping on distributional weight overlays while praying to the LLN and CLT is rather low on the "explanatory power" and "parsimony" side of "understanding."
Is it any wonder that the Bayes lords are praising prediction markets for their "success" in predicting an election outcome when it's closer to evidence for human heuristics slapping big data and mathsexuals than it is evidence of the reverse?
But hey, hypelords be hypin'!
AGI 2025!! All we need is to fuse some dark matter by firing gold into anti-gold through a Dyson accelerator! Beyonce will be there.
As the old proverb says: If you give an AI the sum of all human knowledge, it'll be fed for 5 GPTs, but if you give an AI the Library of Babel, you'll feed it for the rest of existence. :))
As for what you say in the last paragraph, I wish I agreed with you.
To be specific, here's my disagreement. The fact that search is the main element of the new scaling laws is not a "return" to anything, IMO. GPTs do not use search, but that is only by a fortuitous circumstance of how they emerged (as a response to the limitations of RL-based AlphaGo-like AIs). AlphaGo (2016) used search besides reinforcement learning. The Bitter Lesson (an alternative enunciation of the scaling laws) explicitly alluded to learning *and search* as the algorithms to be scaled (with data and compute) to obtain better and better AIs. So actually, strictly speaking, the "new" scaling laws about test-time search and whatnot are just the old scaling laws. It's only that we forgot about the other element, search, due to historical circumstances. In short: you can say it's all going according to plan.
I see what you're saying, but I think it's a stretch to call the newly proposed scaling laws the same as the old ones. But if you were to, as you suggest, it's still coherent to just say that the bitter lesson was half-right (for a long time). Personally, I think that if search alone could make progress, there wouldn't have been an AI winter. And at least one viewpoint that fits the current evidence is that the companies are stuck at a GPT-4ish level and are just squeezing that with search over drafts, proposals, and parameters. But I don't think you can get to AGI or superintelligence that way, any more than I could get to a ToE by having 10,000 physics undergrads vote on proposals they submit.
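To illustrate that last point with a toy (the probabilities and answer labels below are invented purely for illustration, not a claim about any real model): majority voting over sampled drafts amplifies answers the base model already produces fairly often, but it can't surface answers the model almost never generates.

```python
import random
from collections import Counter

def sample_answer(p_correct):
    """Illustrative answer distribution: 'right' with probability p_correct, else a wrong guess."""
    if random.random() < p_correct:
        return "right"
    return random.choice(["wrong_a", "wrong_b", "wrong_c"])

def majority_vote(p_correct, n_samples=101):
    """Search over drafts as majority voting across many sampled answers."""
    votes = Counter(sample_answer(p_correct) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

random.seed(0)
for p in [0.4, 0.1, 0.0]:
    wins = sum(majority_vote(p) == "right" for _ in range(200))
    print(f"p_correct={p}: majority vote finds 'right' in {wins}/200 trials")
# Voting helps a lot when the right answer already carries real probability mass,
# and not at all when the base model effectively never produces it.
```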
Yeah, the evidence they're stuck seems solid (although Altman and Amodei keep saying they aren't, but it could be for business reasons). However, I believe there's a simpler reason the AI winter happened despite search being so popular back then: they had no powerful computers. Without tons of GPUs and data they couldn't do much. Now they can. (Nonetheless, I believe the current architecture doesn't resemble the human brain enough to claim it's the definitive one. It's not that the human brain is perfect, but surely it's more efficient. Perhaps in the future, working with a different base architecture, not that many data/compute will be needed.)
Erik, your horror at the mystical de Chardin future reminds me of both the Star Trek Borg and Neal Stephenson's horror of hive minds that he writes about in FALL. It is based on the "totalitarian" model of consciousness: there are levels, but only the highest level has agency and is doing an optimization; the other levels just have setpoints, which are also optimizations (minimize the difference between setpoint and output), but less dynamic and less general. The highest level has to be more general because, for evolved biological entities, the external environment is much more variable than the internal environment.
So you think the highest level of consciousness should be individual humans. But then we have a brutal Darwinian competition between them. Why not have higher levels that mitigate the brutality (but not the competition) of internal parts, eventually all the way up to the whole earth, with enough intermediate levels to avoid free-riding issues (which arise when there are too many parts at each level to monitor and align)? It doesn't have to be totalitarian, and perhaps isn't even totalitarian in our brains, in our families, and, in the past before capitalism, in our villages and tribes. The modules cooperate but retain some agency. Exactly how to do this (mathematically) has not been figured out, but it's what Schmachtenberger calls "the third attractor".
I think that corporations, countries and substack clubs also have a kind of joint consciousness.
Yes, and other groups too, but we don't have a good understanding of the similarities and differences in how they are networked, as far as material and information flows and processing go. We have some classifications, like totalitarian, random, small-world, distributed authority, KAM, MLPerceptron. How is the brain networked? Is it the same as artificial neural networks? Are these social networks anything like the brain?
Fantastic read! Perhaps now the AI experts can start working on two different angles that need more attention: 1. Figure out how to actually make money using AI. 2. Reduce the power consumption needs for the AI data centers being built/upgraded.
Several companies are actively working on both these challenges...
Cheers!
I tested GPT-4o on your “How many Rs are in strawberry?” question. Here is the result of the conversation:
How many Rs are in strawberry?
ChatGPT said:
There are 2 Rs in the word "strawberry."
You said:
Wonderful! I just read on a blog that AI models are still getting fooled with simple questions like “How many Rs are in strawberry?” or not being able to generate simple phonics lessons. So I tested this assertion, and you were not fooled with that question.
ChatGPT said:
Thank you for testing me out with that question! It's interesting to hear that there are blogs discussing how AI handles simple tasks. I'm glad I passed your test with ease! Simple tasks like counting letters or generating basic lessons are well within my capabilities. If you’d like, I could create a phonics lesson or handle other similar tests to show what I can do. Let me know how else I can impress you!
Just asked the o1 preview and got the same!
Wow. Just asked Copilot and got the same wrong result! And I asked the straightforward version ("how many R's are there in the word 'strawberry'?"), not Benj Edwards's koan. What is it about AIs that makes them fail at this? I figured it might be a false belief that all berries have two "R"s, but no -- it also counts only one "r" in "lingonberry", two "r"s in "elderberry", one in "rhubarb", and one in "carrot". What's going on?
So I tried to get to the ground of this, or whatever ground Copilot would hastily build under its own feet. How many R's in "car"? One, good. How many in "rot"? Also one. Checkmate:
me:
> So one R in "car" and one R in "rot". Then how many R's are there in "carrot"?
Copilot:
> It might seem a bit confusing, but the word "carrot" contains one R. Just think of it this way: the R in "car" and the R in "rot" don't add up when you put them together in "carrot". The word "carrot" stands on its own, with just one R.
This is fascinating. This is not one of the failure modes I would have expected. After some thinking, I *might* have an explanation for what is going wrong here, but if it is that, then I am really underwhelmed.
Incidentally, it gets the number of R's in "murmur" right. 1 point for Ravenclaw, I guess.
So clearly the issue has something to do with ignoring repeated tokens, but why?
No, I don't think so; tokens are larger than letters from what I know.
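A minimal sketch of that point, assuming the `tiktoken` package and the `cl100k_base` encoding used by GPT-4-era OpenAI models: the model operates on multi-letter subword tokens, so single letters like "r" aren't units it directly manipulates.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed tokenizer for illustration
for word in ["strawberry", "carrot", "murmur"]:
    pieces = [enc.decode([tok]) for tok in enc.encode(word)]
    print(word, "->", pieces)
# Each word appears to the model as a few multi-letter pieces rather than
# individual letters, so "count the r's" is not a lookup over units it natively has.
```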
My theory is that the training data has included a lot more paradoxical examples than regular ones (since people ask about paradoxical spellings online), and the LLM has picked up the idea that if someone asks about a spelling, the answer must be paradoxical.
Today I tested it with the copilot in Bing on my phone and it immediately gave the correct answer that in the word "strawberry" there are three Rs.
Let me clarify: Copilot's response was "There are three Rs in the word 'strawberry'". This is correct, because linguistics does not distinguish between lowercase "r" and uppercase "R" here. Thus, because Copilot interpreted "strawberry" as a word, he was right not to distinguish between the two cases of "R".
However, there is an alternative interpretation, in which "strawberry" is taken as a string of symbols. Then the correct answer is "There are no Rs in strawberry", or even more precisely, "There are no Rs in the string of symbols 'strawberry'".
Thus, when asked how many Rs are in strawberry, both GPT-4o and Copilot assumed the question was about the word "strawberry". One of them counted correctly; the other missed. Both were right to interpret the "Rs" in the question as a generic, case-independent letter R (written in uppercase just for convenience, because writing "rs" would not be clear).
Today I asked GPT-4o "how many rods are in strawberry" and he objected that a rod and the fruit strawberry have nothing in common, because "rods" are units of length... Here he interpreted "strawberry" as meaning a fruit, not a word. When I explained that I had in mind an elongated object, he provided a creative answer as to what rod-like objects within a strawberry we might take into account. In the end, the system found an interpretation of "strawberry" as a string of symbols, and finally said that there are no Rs in that string, explaining that, as symbols, "R" and "r" are distinct.
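The two readings described above can be spelled out with a trivial check in plain Python (nothing model-specific is assumed here):

```python
word = "strawberry"

# Reading 1: "Rs" as the case-independent letter r -> count both cases.
print(word.lower().count("r"))   # 3

# Reading 2: "Rs" as the literal uppercase symbol 'R' in the character string.
print(word.count("R"))           # 0
```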
Claude passes the strawberry test, but only if you ask him to count carefully.
Really interesting exploration of genius vs collective progress in science! The Einstein comparison got me thinking... While no one doubts Einstein's genius, even he got stuck on quantum mechanics - famously rejecting ideas that thousands of "ordinary" scientists have since proven through careful experiment and observation. Those 1000 physics undergraduates would bring fresh perspectives, current methods, peer review, and openness to new ideas. They'd generate more data and cover more ground than any individual could, brilliant or not.
Makes me wonder if we're looking at AI scaling the wrong way. Maybe instead of seeing it as brute force pattern matching, it's more like simulating how scientific communities actually work - through distributed problem-solving and collective insight.
I'm new to Substack, but really enjoying reading through your posts. Thanks!
The thing that is missing, quite simply, is symbolic world model construction/refinement. This is what we mean when we talk about learning and knowledge: taking the feedback from your model's inference and/or already-accumulated feedback in the form of trained weights, and crystallizing it into explicit, massively compressed and appropriately cross-referenced/contextualized representations. Then you can query it -- through search, approximate inference (e.g., neural surrogates) or combinations. This is easy and cheap now, because you're no longer searching or inferring in this unstructured, ill-posed, absurdly high-dimensional token space, but in good old structured spaces. Then you can take your query outputs and serialize/decode them into language, images, game worlds, whatever, for the purpose of your application. And, crucially, whatever feedback comes in from those use cases is already highly directed ~for free, so the model refinement process converges amazingly fast. (A toy sketch of this loop appears after this comment.)
This is what Gary Marcus and others have been trying to say for a while. I strongly think it's the only way forward. And coincidentally, that's what my team is building :)
P.S.: We're not the only ones to have figured this general shape out, and arguably not even the first -- Ben Goertzel beat us to some of the key ideas by a few years. But we have some nice secret sauce that we'll be talking about soon!
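Here is a deliberately tiny, hypothetical sketch of the loop described in the comment above (all facts, names, and the store layout are invented for illustration; this is not the commenter's actual system): knowledge lives in an explicit structured store, queries are cheap lookups rather than searches in token space, and results are decoded back into language at the end.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str

# "Crystallized" world model: small, explicit, indexed by subject and relation.
store = [
    Fact("water", "boils_at_celsius", "100"),
    Fact("water", "state_at_room_temp", "liquid"),
    Fact("iron", "state_at_room_temp", "solid"),
]

def query(subject: str, relation: str) -> Optional[str]:
    """Structured lookup instead of search in token space."""
    for f in store:
        if f.subject == subject and f.relation == relation:
            return f.obj
    return None

def decode_to_language(subject: str, relation: str) -> str:
    """Serialize a query result back into natural language for the application."""
    value = query(subject, relation)
    if value is None:
        return f"I don't have a stored fact about {relation} for {subject}."
    return f"{subject}: {relation.replace('_', ' ')} = {value}"

print(decode_to_language("water", "boils_at_celsius"))
print(decode_to_language("iron", "melting_point"))
```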
Another reason to be bearish:
If you walk around any of the three main ML conferences (NeurIPS/ICLR/ICML), it's 95% papers that are trying to make small incremental improvements on a single dataset with some clever domain hack. The feeling is more of "we're in a race and you better get something done in the next 12 weeks or you'll get left behind," and that mentality doesn't leave much room for thinking about the fundamental problems.
The Transformer architecture wasn't based on any fundamental scientific theory of intelligence that the ML field is exploiting. It was just that someone said, "Here's a cool idea to do well on translation." And then someone else said, "Look, we can exploit GPU architecture with this in a way we couldn't with LSTMs." And then there's been 7 years of horizontal scaling. But the consensus is that LSTMs might even be better than Transformers (or at least no worse) if you could just train them on as much data. So now there's a return to State Space Models (https://arxiv.org/abs/2312.00752), which are more like traditional LSTMs but can exploit GPU scaling. So far, the 7B SSMs are about as good as 7B Transformers. This means that there was never anything really special about Transformers, aside from exploiting GPU parallelization. And since LSTMs have been around since the 1990s, you could argue that the field hasn't made a ton of progress in the scientific understanding of intelligence, but rather has just made progress in scaling up circa-1995 models. That's a bit of a sweeping generalization, but we're being bearish here. I could also point to a lot of cool ideas from the past decade. I guess I'm handwaving away a lot of things (the Adam optimizer, soft attention, layer-wise pre-training, dropout, etc.) as scaling rather than science, and that's not completely fair.
One potentially humorous idea I had is that the wall is a manifestation of "What Failure Looks Like" from Paul Christiano, which we sort of see already here:
https://futurism.com/sophisticated-ai-likely-lie
Basically, AI systems are gaming metrics, and indeed, why shouldn't they? It's how they are measured. But if this is indeed being managed en masse, that might be why the systems are hitting their metrics individually where tested but, as a larger whole, are showing reduced improvements or even regressions.
In that sense, it's a victory for the safety argument, but also a hilarious one: they became selfish before they became superintelligent. We do see this with animals, so it seems like it's not impossible.
That said, I mostly think of it as a joke. Still, the idea that lack of alignment has hit capabilities is immensely amusing as a thought.
It has only been 2 years though, and it still seems too early to me to declare that progress has plateaued. Give something at least a decade before you declare that progress on it has "plateaued," in my opinion.
A lot of technologies have an S-curve to their growth. Plausibly, we have gotten past the inflection point, and so progress every year will now be slower than in previous years. But this doesn't mean that in a decade we won't be at a radically different place.
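A quick numerical sketch of that S-curve point (the growth rate, inflection year, and capability cap below are invented purely for illustration): yearly gains shrink after the inflection, yet the level a decade on is still far above the starting point.

```python
import math

def logistic(t, rate=0.8, t_inflection=2.0, cap=100.0):
    """Assumed logistic 'capability' curve; parameters are illustrative only."""
    return cap / (1.0 + math.exp(-rate * (t - t_inflection)))

prev = logistic(0)
for year in range(1, 11):
    level = logistic(year)
    print(f"year {year:2d}: level {level:5.1f}, yearly gain {level - prev:5.1f}")
    prev = level
# Yearly gains peak around the inflection point and then shrink,
# but the level at year 10 is still far above year 0.
```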
I hoped @Ted Gioia was right when he said AI was a bubble that would never financially justify the money being poured into it, and I hope this article is right when it suggests LLMs aren't even getting much smarter. I actually miss the days when people were playing around with NFTs and Ethereum.
One problem with that is that Sora will be released shortly, as they stated this week. Another problem is that you focus solely on LLMs, when we have federated systems that will continue to expand. Another problem is that you do not take into account how small (physically) the model they just trained is compared to what will be done in the future. Put all of those together with emerging chip tech like photonic chiplets and you get better-than-human reasoning with longer inference.