NYT vs. OpenAI lawsuit, why Hemingway drank, Substack's supposed Nazi problem

Desiderata #19: links and commentary

Erik Hoel

Dec 28, 2023

The Desiderata series is a regular roundup of links and thoughts, as well as an open thread and ongoing AMA in the comments for paid subscribers (along with extra links).

1/10. Hope everyone’s holiday lull is going well. Since the last Desiderata, The Intrinsic Perspective published:

Excuse me, but the industries AI is disrupting are not lucrative. Gemini and the supply paradox of AI
(🔒) Screen time for toddlers. Disgust vs. technology
Your EXTREMELY ONLINE holiday gift guide. 2023's unfair snark-fest

2/10. Yesterday The New York Times went public with a bombshell lawsuit against OpenAI, accusing ChatGPT of regurgitating its training data when it autocompletes text—it is, in other words, a plagiarist. It turns out the archive of the Times is one of the major training sets for ChatGPT, and, as described by the Times itself:

The complaint cites several examples when a chatbot provided users with near-verbatim excerpts from Times articles... It asserts that OpenAI and Microsoft placed particular emphasis on the use of Times journalism in training their A.I. programs because of the perceived reliability and accuracy of the material.

This claim of plagiarism is the most powerful thing the Times has on its side, and why I expect this lawsuit to have legs. Here’s an example:

For years people have claimed that Large Language Models like ChatGPT don’t just memorize the training set. How could they, given how much data is in it? It’s supposed to be mathematically implausible. And yet, here are examples people have found from the latest version of Midjourney, an image generator rather than a language model.

Is it possible that as more and more parameters are added, along with more and more training data, the networks do end up secretly memorizing far more than first appears? If so, many of the “Oh my god!” moments you can experience with these models become a bit more… suspicious.

Of course, one outcome is that people like me get their text ripped and fed into OpenAI for Microsoft’s profit, but then larger outlets with legal muscle can get lucrative fees just to be included. In fact, this may already be happening in undisclosed deals. Again according to the Times’ own reporting:

Some news outlets have already reached agreements for the use of their journalism: The Associated Press struck a licensing deal in July with OpenAI, and Axel Springer, the German publisher that owns Politico and Business Insider, did likewise this month. Terms for those deals were not disclosed.

3/10. Relatedly, someone on Reddit came across this prescient poem by Shel Silverstein that may as well have been about generative AI. I found the full version with the illustration and it’s top-form Shel.

Shel Silverstein's “Homework Machine” – The Douglas and Judith Krupp Library — source

I love Silverstein’s art; it’s so strange, so insularly weird. I find contemporary children’s books to be heavy on simple images composed of primitive shapes, characters constructed of perfect circles and lines digitally rendered. It’s all broad swaths of monochrome color; meanwhile, older children’s books are more ornate, rich in features and imagery. One of my favorite illustrators is Jan Brett, author of The Mitten and The Hat, in which the art drips with detail, a painting on each page, with all sorts of hidden activity to ferret out.

The Cookie Train: The Mitten by Jan Brett ( + Activity) — source

Why the change?

4/10. Even if illustrations for children change with the times, the fundamental chord of our relationship to them doesn’t. This affecting image went viral on X about a 4,000-year-old find at a dig site:

It immediately reminded me of the poet Philip Larkin’s most famous line: “What will survive of us is love.”

While the original viral description is indeed poignant, it turns out to be incorrect. Genetic analysis reveals that the child in her arms was not her own. The closest she could be was an aunt, but maybe not even that. It’s possible they weren’t related at all, they were just together at the end. And I think that makes it even more heartbreaking, even more affecting, even more humanity at its finest; for here I am admiring a Bronze Age Chinese woman who did the right thing in her final moments, all those millennia ago.

5/10. Substack (the company itself) was recently under fire in The Atlantic for having a “Nazi problem,” and some writers on Substack started a petition asking Substack to significantly overhaul its moderation policies (although others, even some prominent left-wing writers, refused to sign).

In response to all this, Elle Griffin organized a counter-letter by writers defending Substack’s decision to put moderation in the hands of its writers and communities rather than the company. I was asked if I was interested in signing that counter-letter. First, I wanted to do my homework. Is there actually a Nazi problem on Substack?
