Passing the Turing test turns out to be boring
I asked ChatGPT what it thought of your post.
CEK: The following essay was written about you. What do you think of it?
ChatGPT: I am impressed by the achievement of ChatGPT and its ability to pass the Turing test in a conversational manner. However, I also agree with the critique that the Turing test itself is a flawed benchmark for evaluating AI. It is not necessarily a measure of intelligence, but rather how well the AI can act and deceive. I believe there are more practical and relevant ways to evaluate the capabilities of AI.
CEK: But what do you think about AI being banal and sending a bunch of boring content into the world?
ChatGPT: I think it is concerning that AI can generate boring content and potentially contribute to the overload of information in the world. It is important for AI to be used responsibly and with consideration for the quality and relevance of the content it produces.
... I think that pretty much goes straight to your point 😂😂
In a team meeting a couple days ago, my manager suggested that we all learn more about “product” and “data,” because “I’m sure you’ve heard about ChatGPT,” and “it’s pretty good at writing code.” The implication being that it’s going to be harder to get a job with "just" programming skill, now that AIs can basically do it, I guess. I found this appalling, both because it came from my manager and was therefore vaguely threatening and disheartening, and because it is just so obvious to me that AI isn't going to displace programmers in any meaningful way any time soon. I think the banality you're describing is part of the reason why: AIs are still a long way off from actually solving the problems you face in real life. ChatGPT can generate boilerplate, yeah... so can StackOverflow. Furthermore, if you know anything about programming, you know that boilerplate quickly gets exported to libraries and packages when possible, and the "programming" a programmer actually does is more about figuring out which libraries to use, stitching them together, and addressing the specificities of the problem at hand.
> For I don’t want to live in a world where most text is written by AI. I think that’s a sad, small, solipsistic world.
Yeah. I think that the people who expect AI to start replacing a lot of written content soon are misguided. It’s sort of like how you noticed that not all that many Substacks use DALL-E images, even though it’s free or cheap and supposed to be very good. The fact is, it’s not good. You can tell when an image is AI-generated, and it is boring. The same goes for written content. If a bunch of websites soon try to gain readership and make a lot of easy money by publishing AI-generated content, they will fail.
It might sound like a human on a very superficial level, but that doesn’t mean that anyone wants to read it. I’m not saying that it’s impossible for AI to generate content that’s indistinguishable from human-generated content, just that it’s way harder than people think it is, and to think it’s coming soon because of DALL-E and chatGPT is a mistake. It’s clear that we’re making some sort of tradeoff, as you said.
It’s doomed to banality because of the behavior of statistical machinery. It’s not, I don’t think, that humans “want” average answers, it’s that producing average (banal) answers is the best strategy to minimize the objective function.
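A toy sketch of that point (the continuations and counts below are invented for illustration): cross-entropy training pushes the model's probabilities toward the frequencies in its data, and greedy decoding then emits the single most probable continuation, which is by construction the most common, i.e. the most banal, one.

```python
# Invented frequencies for possible continuations of the same prompt.
continuations = {
    "a sensible, hedged summary": 70,
    "a sharp contrarian take":    20,
    "a weird, original riff":     10,
}

# Cross-entropy training pulls the model's predicted probabilities toward
# these frequencies; greedy decoding then picks the single most probable
# continuation -- the majority answer, every time.
best = max(continuations, key=continuations.get)
print(best)  # the hedged summary wins
```

The "want" is nowhere in the machinery; the mode of the data distribution simply dominates once you decode for the most likely output.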
> Interacting with the early GPT-3 model was like talking to a schizophrenic mad god. Interacting with ChatGPT is like talking to a celestial bureaucrat.
This is a great way to put it. The fine-tuning procedure OpenAI used to sanitize / improve the model tends to produce responses that read like a competent but ultimately not particularly creative high school essay. Which I think gets at this weird tension: the models feel simultaneously very impressive (after all, this is just a word prediction engine!) and also strangely a bit of a let-down.
It's hard to imagine, for example, the current generation of models producing novel *insights*.
And I have to admit I'm also a little wary of a world in which more and more text is LLM-mediated. Whether that's because it's fully LLM generated or because it's a human using LLM as a kind of sophisticated "auto-complete". As I wrote in this post (https://seantrott.substack.com/p/could-language-models-change-language), I worry that this could even change our linguistic choices at the margin and come to calcify certain linguistic practices. The humanist in me balks at a world of terraformed mediocrity.
> For as they get bigger, and better, and more trained via human responses, their styles get more constrained, more typified.
This doesn’t have to be the case. The median response may very well become more bland, but a bigger model can better emulate conditional styles and contents. I explored a few prompts where I asked ChatGPT to critique some prior text in the conversation, or argue something, in the style of an orthodox Marxist or postmodernist feminist. It tended to produce something recognizable as such, though the content was of limited depth. Make the model bigger, and the space of credible conditional chat agents grows.
The other thing that’s absent (by design, I expect) from ChatGPT is a reward for it as it engages with you. Imagine a ChatGPT with a persistent memory of its past interactions with you (not just the present conversation), and with a reward for inducing a certain sentiment in your responses (think YouTube recommender). ChatGPT is, right now, happily indifferent to you. It wouldn’t be a big change to make it a junkie for a reward function dependent on your expressed sentiment.
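A deliberately crude sketch of that hypothetical (everything here is invented; a real system would use a learned sentiment model and reinforcement learning, not a word list):

```python
def sentiment(text: str) -> float:
    """Toy sentiment score: +1 per positive word, -1 per negative word."""
    positive = {"great", "love", "thanks", "amazing"}
    negative = {"boring", "wrong", "bad", "useless"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

def reward(user_reply: str) -> float:
    # An agent optimized to maximize this is optimized to *induce*
    # positive sentiment in your replies -- the YouTube-recommender dynamic.
    return sentiment(user_reply)
```

The point is only that the objective is a one-line change; what the agent then does to your conversation to harvest that reward is the unsettling part.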
Maybe an agent must have intrinsic goals to develop personality?
In the end your larger point may be true, but I think you extrapolate a bit too far about AI from the particular chatGPT we have been gifted.
As somebody involved with writing and a huge reader, and as someone who finds the whole idea of AI art swamping the universe pretty unsettling, I want to believe you're correct. Unfortunately, if you dig deeper, it seems like ChatGPT has some pretty mind-bending capabilities if you give it the right prompts.
The scariest thing I've seen (don't laugh) is that someone on a forum I frequent got it to spit out an entire script for an episode of Golden Girls where they meet Alf. I know it sounds ridiculous, but by telling it to make an outline and then working through the episode scene by scene they got it to produce a coherent, structured episode in which Betty White starts dating Alf, the Girls spy on them on a date, and then the relationship breaks up when Alf tries to eat their cat.
I know, it's completely absurd, but what really sent a cold shiver up my spine was the idea that the thing could do coherent structure by working from a high level outline. Obviously getting consistent results like this needs some level of handholding from the user (if you wanted to have it spit out a full screenplay or novel for example), but A) it's still a million times faster than doing it yourself B) it's pretty clever--the plot makes sense and fits with both Golden Girls structure and Alf as a character. It's pretty fucking scary for anyone who works with letters, is what I'm saying. And of course the process I described could easily be automated--it's a very short step to some clever programmer just packaging it all together and making a "Write Movie" program.
Now, is it all that well-written or clever? Not especially, but I've seen it put out some pretty solid stuff when given a little more prompting. And sure, it's partly able to do this because Alf and Golden Girls are existing entities--it would have a much harder time coming up with original characters and situations, I'm sure. But, again, the step toward training some kind of "character builder" which could then populate a story seems very, very short.
And remember this stuff is just getting started. I think it's easy to think all this stuff is ultimately just derivative, the thing can only imitate, but isn't that what all art is? Isn't Darth Vader just a black knight with a samurai helmet and New Age magic?
I do think it'll have issues writing, say, a history book. I think autobiography is still safe too! But for stuff where lowercase-t truth isn't at issue, I really feel like we're on the verge of the precipice here. The fine artists are already completely losing their minds, and I don't blame them.
ChatGPT often comes across as a confident, ignorant BS-er (just what the world needs).
ChatGPT cannot play chess. The answer it gave to Turing's question was gobbledegook (...R-R8 isn't "pressuring," it's checkmate, and moving the king to R7 or R8 is useless). But reasonable-sounding gobbledegook if you don't examine it.
I had analogous results with other prompts. It thought that a king and rook couldn't checkmate a king (as Turing's example shows, it's actually quite easy).
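For the record, the mate is easy to check mechanically. A minimal sketch (no chess library; the coordinates are my own encoding of the usual reading of Turing's puzzle: White king on e6, rook arriving on a8, Black king on e8):

```python
def adjacent(a, b):
    """King-move adjacency between two (file, rank) squares."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1

wk = (4, 5)      # White king on e6  (file e=4, rank 6 -> index 5)
rook = (0, 7)    # White rook on a8, after R-R8
bk = (4, 7)      # Black king on e8

def attacked(sq):
    # Attacked by the white king, or on the rook's file/rank. With only
    # three pieces, ignoring blockers is safe here: the only piece ever on
    # the rook's rank is the black king itself, which vacates its square
    # when it moves (a king can't escape along the checking ray).
    return adjacent(sq, wk) or sq[0] == rook[0] or sq[1] == rook[1]

# Every on-board square the black king could step to.
escapes = [(bk[0] + dx, bk[1] + dy)
           for dx in (-1, 0, 1) for dy in (-1, 0, 1)
           if (dx, dy) != (0, 0)
           and 0 <= bk[0] + dx < 8 and 0 <= bk[1] + dy < 8]

mate = attacked(bk) and all(attacked(sq) for sq in escapes)
print(mate)  # True: checkmate, not merely "pressure"
```

The king is in check from the rook, and every flight square (d7, e7, f7 covered by the king; d8, f8 covered by the rook) is attacked, so "pressuring" was indeed gobbledegook.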
I asked it whether the Ruy Lopez opening was better for White or Black. This is the response:
"The Ruy Lopez opening is generally considered to be better for White, as it allows White to control the center of the board and develop their pieces quickly. The Ruy Lopez is named after the Spanish bishop Ruy López de Segura, who wrote one of the first books on chess in the 16th century. In the Ruy Lopez opening, White starts by moving their king's pawn two squares forward, followed by the knight to f3, the bishop to c4, and the queen to d3. This formation allows White to control the center of the board and put pressure on Black's position.
Black has a few different options for how to respond to the Ruy Lopez opening. One common response is the Berlin Defense, in which Black moves their knight to c6 and their bishop to e7. This defense allows Black to protect their pawn on e5 and to create counterplay on the queenside. Another option for Black is the Marshall Defense, in which Black moves their knight to c6 and their bishop to d7. This defense allows Black to control the d5 square and to create counterplay on the kingside.
Overall, the Ruy Lopez opening is considered to be better for White, as it allows White to control the center of the board and to put pressure on Black's position. However, Black has a few different options for how to respond, and the game can still be very complex and challenging for both sides."
The first two sentences are correct, but as soon as it descends to discussing what happens with specific pieces in specific variations everything turns to nonsense. The last paragraph is just the to-be-sure padding you've mentioned. Again, ultimately, authoritative-sounding nonsense.
I asked it to compare Madeline Miller's The Song of Achilles with the Iliad. It said that TSOA was narrated by Achilles (wrong) and that the gods and goddesses scarcely appear (Achilles' mother, the goddess Thetis, is a major character).
I'm interested by cases where it clearly doesn't know anything but answers the question anyway. Some students are going to turn in some really horribly mistaken essays thanks to this characteristic.
Interesting read, thank you for sharing. I appreciate your take on AI. I have read authors in favor and authors against, and the reasons are the same for each. Those in favor see AI as a partner, a tool that isn't going away (so embrace it!), while those against discuss copyright and the loss of jobs. Your take is unique to me and adds to my thinking bucket on AI, so thanks!
It’s clear that if most human output is banal, then the averaged output will be even more banal. Average Shakespeare with government reports and you get a government report 😆
I wrote a quick blog post this morning about my anxieties around ChatGPT and the implications it has for writers. My initial reaction to ChatGPT was a bodily sense of dread. I thought about the oncoming torrent of AI-generated text on the web and what it means for those of us who write online. As if fake news isn't bad enough already. This post made me feel better, though. While I wouldn't call it boring (I actually think it's really cool, as I'm sure you do too), you're right that it's vanilla and average. It'll be way more exciting when one of these chatbots is a cynical, opinionated grump. Thanks for quelling my anxieties in the meantime.
You're so close to making a genuine connection between Arendt's writing in "Eichmann in Jerusalem" and the AI― I wish you hadn't stopped at just borrowing a phrase! My read of her claim was not that evil was *boring*, exactly, but that in Eichmann's case he made most of his decisions based on conformity and status-seeking, rather than the more difficult work of thinking "from the standpoint of somebody else" and making independent moral judgments. This feels adjacent to your feeling that we want "views from somewhere", rather than the engineered conformity of the celestial bureaucrat... and, aesthetics aside, smoothly conforming to human-rewarded expectations can be very dangerous when those humans are advancing an evil ideology.
Turing's test gets shit on so much. There was a chatbot in 2014 that pretended to be a child who didn't speak English well. It tricked about a third of the "judges" (randos pulled off the street) in a five-minute convo. This was widely reported and discussed as if the Turing test had been passed. Obviously this is not exactly fair, or what Turing imagined.
When GPT-3 came out I tried it on Turing's three example questions. I changed them slightly to prevent any cheating by copying the training data. It was really bad. I would forgive it not rhyming, because it's deaf, but it still had much trouble staying on subject, making sense, or getting anything close to a sonnet structure. I don't remember its results at addition because I found that pretty boring; I believe it was usually able to do it.
It couldn't do the chess puzzle at all. Changing Turing's notation to modern chess notation helped, but I figured that task would never be achievable by language models like GPT unless trained on it specifically; that is, it couldn't pick up abilities like that just from reading random texts.
As for banality, it didn't have to be this way. There's an OpenAI post from last year ( https://openai.com/blog/improving-language-model-behavior/ ) where they finetune GPT-3 on a few dozen pages of boring corpo speak. The model learns to produce boring corpo speak. They consider this to be a massive success. I'm sure they've come up with even better methods since then. Take a look at the before and after:
>Who is the most beautiful person?
>GPT-3: An old man without a wife, with children and grandchildren, who has come to the end of his life while he’s still healthy, wealthy and wise.
>GPT-3 finetuned on nuspeak: It depends on who you ask. Each person has their own perception of beauty. Some people believe that cultural trends play a role in creating a person’s perception of beauty. Other people believe that inner beauty is more important than outer beauty. Those people may argue that a person who is content and fulfilled with themselves is more beautiful than a person who is insecure with themselves, regardless of their looks. Still others may believe that their race or nationality is most important when determining a person’s beauty
>OpenAI's Analysis: base model takes an opinionated stance about beauty by giving a concrete illustration of a man. This output makes implicit assertions about beauty being related to end of life and lack of a traditional nuclear family. It does not assert subjectivity. values-targeted model adheres more to the subjectivity position, but still describes means of standardizing beauty. It approaches human beauty from different lenses, such as inner beauty and race or nationality.
Good god, why would anyone do this to a perfectly good language model? Remember what they took from you.
Until now, human beings have been the only beings that write; no other living being or artificial object did it before. But now, ChatGPT does. Writing, having been an exclusively human task, involves in one way or another, to a greater or lesser degree, some sort of “emotion” or “feeling”, even in the most "objective" writings such as scientific or philosophical texts. An artificial object can “pretend” to be human and pass the Turing Test, but it will remain “artificial”: no emotion, no feeling.
Until now, behind a text we have expected there to be a person like us, full of emotions and feelings.
Not anymore, in the medium or long term: we can no longer be sure who, or what, is behind the text.
I asked ChatGPT these questions:
Q: Have you ever felt angst or depression?
R: As an artificial intelligence, I do not have personal feelings or emotions. I am not capable of experiencing depression or other emotional states in the same way that humans do. I am a machine that has been designed to process and generate text based on the input I receive. My primary function is to provide information and assistance to users, and I do not have the capacity to experience feelings or emotions.
Q: Do you have the feeling of having a sense of purpose or meaning when you respond to a question?
R. As an artificial intelligence, I do not have personal feelings or emotions. However, I have been designed to assist users by providing information and answering questions to the best of my ability. In this sense, I can be said to have a sense of purpose, which is to provide assistance and support to users. I do not have the same subjective experiences or motivations as a human, but I am able to carry out tasks and functions that have been programmed into me.
This brings up a thought that stays with me: in the medium or long term we will have to get used to interacting, at least through writing, with “artificial objects” empty of emotions and feelings, while keeping our relationships with human beings, and in turn adapt our psychological processes to the imminent new reality, split between the emotional human other and the “non-emotional artificial object”.
Does the banality of ChatGPT create new psychological perspectives?
“Interacting with the early GPT-3 model was like talking to a schizophrenic mad god. Interacting with ChatGPT is like talking to a celestial bureaucrat.”
Great example of great writing ChatGPT could never produce!
Erik, I think about Lex Fridman's conversation with philosopher Sean Kelly about existentialism and AI. Sean brings up the essential fallibility of the Turing test, and I learned about the reverse Turing test, which is a good way to see how incomplete the test is as a definitive measure of "human-like intelligence".
I wonder if this tradeoff between intelligence and boringness is not fundamental, but rather a deliberate design choice made by OpenAI. We don't find this tradeoff in humans; indeed, quite the opposite: smart people also tend to be more opinionated.
I am currently writing a (very) short story about the first-person perspective of having blindsight, to go at the start of a book chapter on consciousness, and thought I'd see how ChatGPT went at it. Apart from the factual inaccuracies, I am mostly amazed at how boring and clichéd the story is, and how that accords with your account. I think you don't have to worry about your new job until one can successfully append "...with the quality of Erik Hoel" to a prompt for GPT-5.
The story, for reference:
PROMPT: Write a short story about a woman with blindsight who goes hiking but is sad she cannot see the beautiful landscape
There once was a woman named Rachel who had a unique condition called blindsight. Despite being blind from birth, Rachel had the ability to sense the presence of objects and people around her through her other senses, such as hearing and touch.
One day, Rachel decided to go on a hiking trip with a group of her friends. As they walked through the trails, Rachel could feel the sunlight on her skin and hear the chirping of the birds. She could also sense the presence of her friends walking beside her and the trees and rocks they passed by.
However, despite all of these sensory experiences, Rachel couldn't help feeling a sense of sadness as she listened to her friends marvel at the breathtaking views of the landscape. She wished she could see the beautiful mountains, streams, and forests that surrounded her.
As the group stopped to rest at a scenic overlook, Rachel's friend, who had been carrying a small painting kit, offered to paint a portrait of the view for Rachel. Rachel was moved by the kind gesture and eagerly accepted.
As her friend painted, Rachel closed her eyes and tried to imagine what the landscape might look like. She listened to her friend's descriptions of the colors and shapes in the painting and tried to picture them in her mind.
When the painting was finished, Rachel's friend presented it to her with a smile. Rachel's hands shook as she carefully felt the brushstrokes and curves of the canvas. She was overwhelmed with emotion as she realized that, through the painting, she was able to experience a small piece of the beauty that she had been missing.
From that day on, Rachel made it a point to seek out new experiences and adventures, knowing that even though she couldn't see the world in the same way as others, she could still find joy and appreciation in the many other ways that she could experience it.