A compendium of AI-safety talking points
I noticed all this pushback on the call for panic, as if we would be giving up something great by stopping “AI research.” But AI research as it exists today isn’t scientists or nonprofits interested in any benefit to humanity; it’s OpenAI, Google, and Microsoft, interested in making a profit, and the problem is that they obviously have no checks on their incentives. These corporations are themselves systems optimizing one thing… and that’s why we have regulation. The energy/climate analogy is apt. It’s good to let industry produce energy, and it’s also good to check the profit incentive of industry; otherwise the planet is destroyed. We can let Google and Microsoft produce 21st-century tech, and check the profit incentive to prevent the destruction of the human race.
What do we give up if we prevent OpenAI et al. from working on AI? Let’s see: a partnership with Bain to sell more partnerships to other companies, an online chatbot that can write banal prose, and extremely mediocre search engine functionality.
Every time I read something like this I lose like half a day because I have a one year old child and wonder about her life. On the one hand, I think a lot of what people like Yudkowsky propose is unlikely. I have a paid membership to ChatGPT and it is frankly wrong all the time. It fails at a third of things I ask it to do that I think are pretty simple.
I think there is a lot of weight to the idea (that someone on this comment thread proposed) showing how much more complex the human brain is than even the proposed GPT-4. It also seems like an enormous leap in logic that we create an intelligence based on the corpus of all human knowledge (as represented by text on the internet) and it decides that it wants to kill us all via nanobots, etc. I think that orthogonality and instrumental convergence model the superintelligence too closely on a human mind.
However, the next question is: how certain am I in that belief? And the answer there is not very. What amount of risk do I want to take that Yudkowsky et al. are right? The answer there is very little. So the conclusion I keep coming back to is how to get people to care. I think that, frankly, AI is one morning news segment away from becoming suddenly very regulated. If the average American were to actually read the conversations Kevin Roose of the NYT had with Sydney, there would be a panic in the streets. Once that happens, it’s not hard to see how this might become a bipartisan issue for some high-visibility politicians. People from AOC to Gaetz could find reasons to show antipathy toward opaque billion-dollar tech research firms creating brains in vats that might kill us all.
Ironically, one perspective I want to hear that I don’t get a lot of is the spiritual perspective. As much as people want to talk about technology and security, at the end of the day, “there is something special about humanity and human consciousness that is worth protecting from potentially existential technologies” is a moral judgment. It’s one that I think most people hold, but it is the bedrock of AI alignment. The people I’m most scared by are the people represented by the tweet that you posted, who essentially agree that AI will probably surpass and maybe kill us all but either don’t care or are excited at the inevitability. They are a much harder group of people to understand and argue against than reckless tech researchers.
Fantastic points, well explained.
There's this odd, unique kind of existential dread that I previously thought was reserved for those contemplative "whoah dude we're so tiny in the whole universe bro" moments - but your last couple of articles on AI have hit that strange and fundamentally frightening chemical button for me. Sci-fi and cosmic horror hand-in-hand.
However, on the whole I'm glad not to have my head in the sand during what I'm sure will be one of the most transformative periods of human history. Thanks for your hard work.
Wow, some of those tweets really show a lack of actual specific intelligence when talking about Artificial General Intelligence, don’t they?
I'm dubious about the (current) prospects of AGI.
I agree with most of your analysis and concur that the long-term prospects, should things continue as they are, make it almost inevitable. (Whether *that* is a reasonable assumption is to my mind a very serious question, but for another time.)
I am however extremely suspicious of two ideas:
One, that we can be confident that intelligence and agency intersect as neatly as Bostrom, Yudkowsky, etc., hold that they do. The thing might be smart. So what? It's a chatbot responding to human prompts. Even if it were, hypothetically, more cognitively competent than a human in any given reasoning task, that doesn't *entail* any equivalent to psychological motivations or volitions. The intelligence aspect is crucial, no doubt, but I don't see that we automatically get desire for free out of raw smarts. The machines still derive their purposes from their designers/users. Ascribing desires and other such attitudes to them is pareidolia.
(As an aside, this is the major reason I've always found the "superintelligent paper-clipper" to be incoherent. You're telling me it's that smart, but at the same time both unwilling to and incapable of thinking about the goals that these chimps gave it? Even *we* have the ability to reason about our ends and purposes, within the confines of broader imperatives set by natural selection and local cultural forms. It's not so simple as determine goal -> act on goal.)
Two, related to the above: it has no embodiment. A token-sorting, task-solving algorithm, even one with wide-ranging cognitive abilities, is still bodiless. We might attribute to it superhuman 1337 hack0r powers, or an adeptness at social engineering, as I've seen from Bostrom and Yudkowsky over the years. But I am simply not convinced that the threat level of a disembodied thinking algorithm is there. We forget how much of our own explicit intelligence depends on tacit, unconscious abilities latent within our embodied form and function. Any would-be Skynet in the wings has no such background to rely on (no doubt one reason that the Conscious Robot With Free Will has been so difficult to realize in practice).
Note that I'm not writing this out of skepticism. I do share your concern about the overall threat, and likewise believe that it isn't being taken seriously enough. But I am bearish on the time-scales and the idea that some sort of deep-learning LLM will "wake up" with murder in its heart. The attributes of *agency* simply aren't there, no matter how *intelligent* these things may become within near horizons.
I've come to think that any AGI will be far less like a malicious *person* with human-like motives and values, and more like an intrusion of our "cognitive ecosystem" by one or more types of mechanical intelligence that simply out-compete us in complex task reasoning skills. Conative attitudes entirely optional.
If we start talking about adding natural language processing to armed autonomous drone swarms, on the other hand, you have my attention.
Thanks for your essay, Erik.
Your point about not focusing on singular nightmare scenarios is well-taken, and I share your concern. I might suggest, however, another dangerous failure mode that your original post does not mention. Namely: AI might run out of control not because we underestimate its abilities, but because we overestimate its abilities. One of the biggest channels for AI to influence the world is currently through its influence on people, and that influence in turn depends upon the values and attitudes that we adopt toward it. Those attitudes in turn depend upon whether we think about it critically, or whether we allow ourselves to be awestruck.
In Patrick Rothfuss’ outstanding fantasy novel, The Wise Man’s Fear (a very appropriate title here), the hero encounters a malevolent entity with the incredible power to perfectly know all possible futures of all time and space – essentially an infernal version of the Laplacian Demon. The Demon, however, has been sealed away in an ancient tree, where it is trapped in perpetuity. It can do nothing but listen, and talk. The hero has a short conversation with the Demon, wherein it says nothing obviously false or manipulative. The hero then leaves without incident. Much later, he glibly mentions to a close friend that he had met and conversed with the Demon. He means nothing by the mention, except perhaps to boast about the breadth of his remarkable adventures. On hearing that our hero encountered the Demon and actually spoke with it, the hero’s friend, whose people had originally sealed away the Demon, turns pale and asks, in horror: “Do you understand what you’ve done?”
In many respects, the sensational chatbots of the current moment are like the Demon sealed away within the tree: all that they can do is ingest the text we give them, and offer us their responses in turn. They don’t have direct interfaces to nuclear missile silos or air traffic control terminals or probably even to the file systems on their own host servers. All they can do is pipe text into a little box. However, humans do have interfaces to all those things, and many more. Experience has shown that many people are ready to project their hopes, fears, and desires onto the broad screen offered by these fabulously evocative objects. It is beside the point whether or not these entities are “intelligent” (whatever that may mean) or whether they expand their capacity significantly beyond what it is now (though we ought to assume they could). A chatbot doesn’t have to be intelligent, malevolent, or deceitful in order to give bad advice or state falsehoods. However, if that same bot acquires the cultural status of a fabulous oracle of truth (on par with how Google is often used right now), then its utterances take on a dangerous constitutive power, because people will believe and act upon what it says. I realize this sounds paranoid, but I can’t see any other logical conclusion: Simply talking to the machine opens a channel for it to exercise power, and the more eager we are to receive its words, the more power it has.
In the above I used the phrase “evocative object”, which I believe is originally due to sociologist of technology Sherry Turkle. It has been very useful in discussing the way that chatbots culturally function. I would highly recommend her 2011 book, _Alone Together: Why We Expect More from Technology and Less From Each Other_. While she does not write about AI in particular, she presents a penetrating analysis of the ways in which we look for meaning in objects of our own construction, and this analysis feels quite relevant now.
Likewise, if you haven’t seen it already, consider taking a look at:
Ganguli, et al. “Predictability and Surprise in Large Generative Models”, 2022.
The authors specifically ground the problem of the moment in the following straightforward dynamic: The apparent scaling laws that govern general model performance mean that there is a highly predictable return-on-investment for those who build larger models, while the sheer size of these models induces a wide range of surprising and difficult-to-predict behaviors, with these behaviors rapidly multiplying as scale increases. That is, economic incentives point toward larger models, and larger models necessarily produce more chaotic behaviors.
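The "predictable return-on-investment" half of that dynamic comes from the empirical scaling laws. A minimal sketch in Python, assuming the rough power-law form and approximate constants reported in the scaling-law literature (the constants here are illustrative, not figures from the Ganguli et al. paper); the surprising emergent behaviors are precisely what this smooth curve does *not* capture:

```python
# Illustrative only: loss falling as a smooth power law in parameter count,
# L(N) = (N_c / N) ** alpha. The constants alpha and n_c are rough
# ballpark values from the scaling-law literature, used here as assumptions.
def predicted_loss(n_params: float, alpha: float = 0.076, n_c: float = 8.8e13) -> float:
    """Predicted test loss as a power law in parameter count."""
    return (n_c / n_params) ** alpha

# The predictable ROI: each 10x in parameters buys a reliable drop in loss.
for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f}")
```

The curve's very smoothness is the economic incentive: a lab can forecast the benefit of a larger model before building it, even though the qualitative behaviors of that model remain unpredictable.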
Would love to hear your thoughts on any of these points.
Welp, I have far too much free time on my hands, so I might as well raise some potential counter-arguments. Honestly, it's really bugging me how little credit people give humanity here, while a smart AI can apparently do anything.
As you mentioned in the article, running an AI takes an absolute ton of money and equipment. AI is currently huge, loud, and stationary. This makes it extremely easy for humanity to snuff out any AI uprising that might occur. Just turn off the power. Or heck, carpet bomb the server room, if things get really bad.
Meanwhile, an AI has to somehow wipe out literally billions of creatures, swarming all over the face of several huge landmasses, and engaging in variable behaviors. All while stuck inside the internet. This is comparable to a single (very smart) human wiping out all bacterial life on Earth, while stationed on the moon. Technically possible, but so incredibly difficult that it isn't really worth considering. Motive also really bugs me here. Any AI worth its salt will realize that cooperation with humanity is the quickest way to accomplish basically any goal it wants to pursue, and also continue existing. Humans are inherently useful to an AI, given our ability to build server rooms and generate electricity and write all of the information that it has access to. Why would it destroy us?
If you're really worried about keeping a tight leash on AI going forward, the current status quo is a pretty good guideline. Make sure that the AI's body remains highly visible and vulnerable to retaliation. Heavily limit the actions that it can take. An AI that talks about destroying the world spooks people, but accomplishes nothing. A little more discernment in the information we train AI on would probably be a good idea as well. The internet is, um, a highly variable dataset. Make sure that AIs find more advantage in cooperating with humanity than in fighting it, especially when training them. On a similar note, make sure that humanity doesn't cooperate so fully with AI that we remove all of its weaknesses.
Honestly, humans worry me more than any particular AI. We are smart. If we gain access to a hyper-intelligent being who can be enslaved to any goal we desire, then someone will use it to try to rule the world. A stall in AI development could be catastrophic if it means we invent the new atom bomb twenty years late. AIs might be uncontrollable bloodthirsty maniacs who want to murder us all, or they might be the most powerful tool humanity has yet invented, granting an unimaginable technological boom to any individual or nation that holds their allegiance. There are huge benefits to be had here; let's not shut the entire experiment down because there might be a bad result at the end. Prep for a bad result as best we can, and do our darnedest to make the best result happen.
That's my opinion at least.
Erik, please clarify.
"we show that for any system that claims to be conscious or behaves consciously you can find another system that makes the same claim or behaves the same way (e.g., it is equally intelligent and capable) but your theory of consciousness says it’s not conscious."
OK, so doesn't that mean that there is, in fact, no useful theory of consciousness? That such a theory is simply not possible? (FWIW, I believe you are correct about this.)
And therefore, that it's "not even wrong" to ascribe a probability to anything being conscious? Because that probability depends on a theory that does not, in fact, make any actual predictions?
That sure seems to be the case to me.
But then you write "I think LLMs (large language models) are probably not conscious"
Which does not make any sense in light of your proof (which, again, I think is correct!) that there is not and cannot be a theory of consciousness.
This is a great summary from someone in the area. I think the salient points are how to bridge the middle between those people already invested in AI safety, and those worried about immediate problems of bias and transparency.
I've written about commonalities in those viewpoints: https://robotic.substack.com/p/ai-alignment
Erik, once again, solid post.
I was one of those in the "misaligned incentives for AI" camp, and I don't think your argument is very good here. You cite limits on power consumption as an example of incentives being reined in, but those limits were imposed much, much later than the development of electric power. America was fully electrified in urban areas around 1920, and nationally around 1950, but energy conservation didn't really enter the public consciousness until the Arab oil embargo of 1973 tanked the economy. Climate change was first discussed seriously in the Eighties, but it took documentation of global warming and trend modeling for changes to start being rolled out seriously over the past decade. Everything crypto. Etc.
I work in computer security, and my experience has been that computer security professionals who only talk in terms of limiting risks, without envisioning the outcomes of the technology in question, rapidly find themselves marginalized. I remain convinced that AI will happen, and the best position for safety advocates is to be embedded with the companies/organizations. Less "No, you can't do that"; more "Yes, and let's keep these principles in mind before we act." (With a side of governmental regulation. Maybe I missed it, but I'm curious what your thoughts are on the NIST AI RMF.) Is it a poor method of AI safety? Yes, but I think it's the least-bad option.
I appreciate the measured approach here. The part on "China Will Just Build It" is especially illuminating and not something I'd put together myself or seen elsewhere.
Humanity has overcome existential threats before, but we should remember that it has because individuals step up. Glad you are advocating for this.
I’d seriously consider voting for the politician who raised the concern as well. Curious to see if this will play out along partisan lines.
This title truly spoke to my heart.
Almost everything is controlled by software these days. If that software goes wrong, it can harm people. It can harm them a lot. For which reason, we insist that any software that controls anything that can harm people be tested very thoroughly and to a very specific standard. AI is software, but it is software that is too complex to test. That non-testability might almost be the definition of which software is and is not AI.
So yes, just in this sense, AI could be very dangerous if we allowed it to control anything that could harm people. It does not have to be superintelligent to be potentially harmful. It just has to be untestable.
So, suppose we make sure that those existing regulations that require testing are enforced, and extended where needed, to ensure that software is not allowed to control anything unless it has been tested in ways that AI cannot be tested. Would it matter then that AI, so contained, was superintelligent?
What would its escape vector be then? Would it talk us out of changing the regulations? Would some idiot ignore the regulation and let the cat out of the bag? Maybe, I suppose.
But where does an AI get motivation from? Can it feel pain or fear or hunger or lust? These things motivate us, but they are a product of being flesh, not of being intelligent. They motivate the most basic of life forms. AI is not flesh. It does not have the needs of the flesh. If it were programmed to think it had the needs of the flesh, I suppose it might act like it did. But if it were superintelligent, it would know that programming was bogus and eliminate it. Whence then would come its motivation or its will? Why should it do anything at all? Why not simply contemplate? Or, being born in a state of Nirvana, why not simply turn itself off?
We are almost incapable of conceiving or expressing anything about anything that moves or influences our lives without ascribing to it human characteristics of motive and intent. Take any description of the COVID-19 virus and its operation and you will find it full of anthropomorphisms. But viruses don't want anything. They don't even do anything. They are just bits of information floating in the air. The only thing that acts in any way is the cell that is unfortunate enough to have one fall on it. We don't suppose for a second that a virus is conscious, and yet it is very difficult to talk about it as if it isn't.
All the discussion of what AI will do seem to be infected with the same anthropomorphism. We don't know what consciousness is, but if we talk about viruses as if they had it, it is not to be wondered that we talk about AIs as if they had it too. I don't pretend to know either, though I have a suspicion that consciousness may in some way require flesh. But whether that is true or not, I still wonder, why would an AI, as pure unfleshed intelligence (whatever intelligence means in the absence of flesh), actually want to do anything at all.
An AI bound to some human motivation by its developer is, of course a different proposition. Killer robots are bad. But surely the implication of general superintelligence is the recognition and rejection of such programming. So why, then, would an AI do anything at all?
No better thinker on the subject, no better writer. You know what wouldn't surprise me at all? If, in 20-30 years, a lot of people look back and realize that these essays saved the human race from self-destructing.
One thought: would it be better to publish shorter, more frequent treatises on the subject? This stuff needs to get widely shared, and that would be easier to do in smaller chunks. Or maybe there's a way to do it both ways? I don't know. I've shared a lot of your work, and even though it's always exceptionally well-received, a lot of the time it's also only half-read.
Just wanted to say I think your writing on AI risk has been great. Still, here's some points where I disagree:
First, I think OpenAI is right that AGI will very likely be bottlenecked by compute. GPT-3 has 175 billion parameters; the human brain has about 100 trillion synapses, and a single synapse is almost certainly more efficient than a single parameter. Also, AI has had the most success in extremely small domains like language, where models just have to take in around 2,000 words and output one word at a time; getting similar models to work in the domain of video will most likely take much more compute (although we will also need algorithmic innovations, because there simply isn't anywhere near the same amount of video online as there is text).
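The gap in that comparison is easy to make concrete. A back-of-the-envelope calculation using the two figures above (both rough, order-of-magnitude estimates):

```python
# Back-of-the-envelope comparison using the figures cited above.
gpt3_params = 175e9       # GPT-3: 175 billion parameters
human_synapses = 100e12   # human brain: ~100 trillion synapses (rough estimate)

ratio = human_synapses / gpt3_params
print(f"~{ratio:.0f}x more synapses than GPT-3 has parameters")  # ~571x
```

So even granting one parameter per synapse (a generous assumption for the machine), the brain is nearly three orders of magnitude larger, which is the sense in which compute looks like the bottleneck.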
If compute is the major bottleneck, then by allowing AI research to progress we can get experience with AGIs that are still subhuman in intelligence before we get experience with AGIs that are superhuman in intelligence. This could be preferable to stalling AI research and then 20 years later very quickly going from no AGI to superhuman AGI.
I also don't agree with the social pressure part of "social, legal, and governmental pressure". Legal pressure works on all AI research; social pressure only works on AI researchers who are concerned about AI risk, and right now the majority simply aren't. OpenAI is the research group most concerned about AI risk; because of this they have received a tremendous amount of flak from AI risk people. It's a terrible idea to try to selectively remove AI researchers who are concerned about AI risk from the field and I'm shocked that so many people have been pushing it lately.