Is 2024 the year of reckoning for academia?
Or, How Steven Pinker plagiarized and why it's fine
2023 was not a good year for the public opinion of elite universities. They were rife with scandals, from controversies around hate speech, to disputes over affirmative action, to multiple high-profile instances of academic misconduct. At this point, public opinion of them has cratered. Harvard’s early admission applications were down 17% this year. Increasingly, our society appears to operate via waves of outrage, each one coming and then cresting and then falling, only to be replaced by another. These waves often start with noble intent, but just as often they overshoot justice. Perhaps 2024 is academia’s turn. There are drums. Drums in the deep.
To start with the triggering scandal: of the dozen or so papers published by Harvard’s president, Claudine Gay, the majority turn out to contain, altogether, almost 50 alleged examples of plagiarism. Gay resigned yesterday afternoon, after Harvard had weathered a month of news about it, at one point even clearing her with an internal investigation and threatening journalists with lawsuits.
The accusation of plagiarism turned out to be an unexpected “scissor issue” among some academics. On the one hand, (a) the alleged examples include sentences where all Gay did was use the same stock phrase, not even the full sentence; on the other hand, (b) some of the excerpts are quite long, and there’s also the sheer number of them. The scholars that Gay allegedly plagiarized were split. Some publicly condemned the acts. Others said it did not amount to plagiarism.
What ultimately brought down Gay, what made the whole thing undeniable, was the sheer number of examples (she resigned shortly after another six were found). Even if many were excusable or were only minor offenses when considered individually, collectively they showed a pattern of lazy (at best) or reckless scholarship, especially the cases involving near-full paragraphs whose copied sources are never cited anywhere (e.g., in the example below, Gay does not cite Palmquist and Voss in the text at all).
It’s for these reasons that an anonymous member of Harvard’s Honor Council, which votes on instances of plagiarism by undergraduates (if found guilty, they can’t get a degree), wrote an op-ed a few days ago saying that Gay’s work would be found guilty by their standards.
Did conservative investigative reporters dig into her academic work in response to Gay’s testimony in Congress about hate speech policies? Yes. Does that excuse so many examples across so many papers? No. Journalists are now asking for tips about plagiarism in the early work of elite college presidents, and the Associated Press’ headline declared that:
The thing is, putting aside the politics and optics of it all, if I am being honest, a few of the alleged examples from Gay are indeed debatable when considered individually. If you cite another scholar’s conclusion, or use a method of analysis similar to theirs, do you have a duty to completely rewrite how that conclusion or method is described? If using the same sentences is obviously bad, what about the same clauses? Same phrases about the same topics? Here’s one of the examples from yesterday’s batch of allegations, concerning a phrase shared by Gay and another scholar, whom she does cite in the paper (although not right at the end of this sentence).
At the end of an otherwise-rewritten sentence, the phrase reads like a pretty stock Wikipedia-esque description of the Tax Reform Act of 1986. By itself, it wouldn’t be national-news-worthy.
What made Gay’s case so problematic were the longer excerpts and their number. It’s going to be easy to forget the importance of that. Already, there is evidence that academic scandals are quick to trigger a lot of outrage, even when the person at the center is actually innocent. Consider the other big academic misconduct scandal of 2023, in which
Stanford’s president was booted without good reason.
Stanford’s President, neuroscientist Marc Tessier-Lavigne, stepped down from his position last summer after a panel investigated claims against five of his papers. The case was covered in The New York Times by the student journalist who popularized the fraud (although it had been discovered years ago on X). Buried in the Times piece we find that:
The Stanford investigation did not find that Dr. Tessier-Lavigne personally altered data or pasted pieces of experimental images together.
In other words, there was a known source to the errors from another scientist, and Tessier-Lavigne was just the co-author. According to Tessier-Lavigne on his blog, he tried to make corrections to some of the papers but the journals themselves ignored him. As he wrote:
Based on information available at the time, Cell declined to publish a correction, stating a correction was “not necessary or appropriate.” Science agreed to publish corrections that I submitted, but then failed to do so.
Perhaps he could have been more pushily proactive, but if the journal itself didn’t want a correction, how much can we blame an individual scientist for not producing one? Instead, the Times found another angle:
What isn’t common, of course, is the “frequency of manipulation of research data and/or substandard scientific practices” in the labs Dr. Tessier-Lavigne ran, the Stanford report concluded.
Yet, is that even true? What’s the base rate of scientific fraud? People who produce or do a lot of “X” are more likely to have done, sometimes even accidentally, “the bad thing in X.” E.g., people who meet a lot of other people, like politicians, are more likely to be accused of bad behavior. In some cases, it’s true, but in others, it’s just that if you work with hundreds of people, and meet thousands more, you are likely to annoy some small percentage, give them the wrong impression, say something they take extremely personally that another would pass over, have them bring a negative perception or narrative to an encounter, or just run into someone who ends up being a crazy person. Similarly, those who produce hundreds of scientific papers with dozens or hundreds of co-authors will, almost inevitably, turn up the bad apple of a fraudulent co-author (and it’s impossible to check the raw data and re-do the analysis all yourself; as a scientist, you simply have to trust your co-authors). Here the Times contradicts itself:
A 2016 study by a handful of prominent research misconduct investigators… showed that around 3.8 percent of published studies include “problematic figures,” with at least half of those showing signs of “deliberate manipulation.”
Yet, if 3.8% of published studies include “problematic figures” with half showing deliberate manipulation, what is half of 3.8% of the 150 papers that Dr. Tessier-Lavigne has authored? The answer is roughly three papers. Now, Tessier-Lavigne did have a total of five papers under question. But the source of the problems in three of those papers was a single person who (ahem, allegedly) fabricated data. Additionally, another individual fabricated data in a paper from 20 years ago (again, not Tessier-Lavigne). For the remaining fifth paper, the issue wasn’t data fabrication, it was just kind of a shitty paper, and instead of correcting it Tessier-Lavigne did follow-up work (essentially correcting it in further papers) in such a way that the Stanford report says:
The Panel also found that the approach of publishing follow-on papers was “within the boundaries of normal scientific practice,” but that it was “suboptimal” not to additionally retract or directly correct the paper.
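For anyone who wants to check the back-of-the-envelope base rate above, here is a minimal sketch of the arithmetic. It assumes the Times' figures apply uniformly to every paper, which is a simplification; the function and variable names are my own and purely illustrative.

```python
def expected_manipulated_papers(total_papers, problematic_rate, deliberate_share):
    """Expected number of papers with deliberately manipulated figures,
    assuming (hypothetically) that the base rates apply evenly to every paper."""
    return total_papers * problematic_rate * deliberate_share


# Figures quoted in the text: 150 papers, 3.8% with "problematic figures",
# and at least half of those showing signs of deliberate manipulation.
expected = expected_manipulated_papers(
    total_papers=150,
    problematic_rate=0.038,
    deliberate_share=0.5,
)

print(f"Expected papers with deliberate manipulation: {expected:.2f}")
```

Since the quoted study says "at least half," this is a floor on the expectation, not a ceiling, and it comes out just under three papers.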
Because none of it was directly Tessier-Lavigne’s fault, the criticism ultimately pivoted to the ephemeral, pinning the crime on “lab culture.”
… [the report] found that he had presided over a lab culture that “tended to reward the ‘winners’ (that is, postdocs who could generate favorable results) and marginalize or diminish the ‘losers’ (that is, postdocs who were unable or struggled to generate such data).”
I paused at this (especially given that many lab members, according to the Stanford investigation itself, reported a very positive lab environment). It’s just an objective truth that a lab rewards the winners! And not just labs, the institutions as well. E.g., for my PhD program in neuroscience at the University of Wisconsin-Madison, you needed a minimum of three papers to graduate. So if you didn’t publish a paper, like if all your experiments failed to find anything interesting and you had nothing publishable, then guess what? You didn’t graduate.
Now, a big part of science is learning when to call off a project, or pivoting to some other idea. Maybe change the experimental design to pursue an effect you did see! There’s plenty of wiggle room. But you have to make use of it. The system is unequivocally set up at an institutional level to reward people who get high-profile papers with big results, and punish those who don’t. Unless we implement some sort of scientific communism, wherein everyone gets equal funding, and failing to get results is just as celebrated as success, by definition the winners of the scientific rat race will be those who get “favorable” results. Tessier-Lavigne stepped down (while still retaining his professorship and lab—as it should have been, given his contributions) based on an idealized version of science that no one actually abides by, for mistakes it’s unclear how he ever could have avoided. The whole investigation elided that
academic scandals are real, but also academics are humans!
Ultimately, smart people sometimes say dumb things or make mistakes, and it’s often forgivable when they do. E.g., I read a review in Current Affairs of Yuval Noah Harari’s work claiming he was a “fraud”—a strong accusation—and one of the pieces of evidence they presented is that:
Harari’s assertion that chimpanzees “hunt together and fight shoulder to shoulder against baboons, cheetahs and enemy chimpanzees” cannot be true because cheetahs and chimpanzees don’t live in the same parts of Africa.
Did you know that? I didn’t. And maybe if the book, Sapiens, were about cheetahs or chimpanzees then this would be a problem. It’s fine to note his mistake, but regardless of what you think of Harari in general, such little errors (dare we call them “hallucinations”?) aren’t broadly reflective of an author’s work, unless there are dozens of them.
Just as you can find mistakes in every thinker’s work, I’d contend that almost all famous academics are plagiarists to some minor degree, and that someone like Gay deserves opprobrium based on whether some aspect of the case shows intent (in her case, the sheer number of examples in proportion to her total work) in ways that go beyond the unavoidable errors that will necessarily occur. How can I demonstrate this? What about picking one of the remaining well-respected famous academics at random (it’s a small pool) and looking for instances that could be alleged as plagiarism, as evidence of some sort of underlying “base rate”? Let’s see… why don’t we ask
is Harvard’s Steven Pinker a plagiarist?
A Google search reveals at least one example already discovered. Large swathes of Chapter 2 in Pinker’s book Enlightenment Now are uncredited paragraphs from an Edge.org article he wrote in 2017. To see the full text (it’s a bunch, not like one sentence but entire pages), check out this obscure blog post from 2018.
I double-checked my own physical copy of the book and indeed there’s no attribution I can find. So here’s eviiiiillll (self-)plagiarism by Steven Pinker. Let’s check his books for more, maybe he has even worse sins. It’ll only take a second… okay, found another example! On pg. 31 of his 2021 book, Rationality, Pinker writes:
The human visual system is one of the wonders of the world.
And yet, that’s also the first sentence in a 2018 book on neural networks and deep learning (one not cited by Pinker).
Got him! It’d be pretty easy to lump these two instances together with far more debatable criticisms, using content cherry-picked from his huge corpus of work, and put a bow on it.
Hopefully I’m coming across as sarcastic. Because here we should pause, and see what we’re doing. Frankly, the process of digging around like that in another scholar’s work made my skin crawl. The one phrase I did find is something one might naturally come up with, or that an AI might autocomplete if given half the sentence. No, Pinker’s not a plagiarist.
Rather, books are big complicated things to publish and get out, difficult in ways that those who’ve never done it don’t fully understand. You’re wrangling hundreds of citations, tens to hundreds of thousands of words, and no one is helping you except Microsoft Word and your note-keeping system where everything is labeled “Final_Final_version_7.” Even for Steven Pinker, there’s no super team of professionals employed by the publisher checking your work.
In the first instance of self-plagiarism, what happened was that Pinker (or his research assistant) did a copy/paste to fill out the chapter because it concerned a subject Pinker had already written an entire essay on. Then, likely months later, after a dozen formatting changes and versions and edits, Pinker either forgot to reach out to Edge to get permission, or did, and simply forgot to include in tiny text at the end of the book: “Parts of Chapter 2 were once published in Edge in 2017” (tiny text no one on Earth would ever read).
In the second instance of the identical sentence, it’s either an obvious-enough phrasing to have occurred independently to Pinker, or, alternatively, some research assistant once wrote it down in their notes somewhere and Pinker kept it in his draft because he liked it, not knowing it was from somewhere else. Finding this out didn’t change my opinion of Steven Pinker. Pinker has published nine popular science books, an immense mass of text. If someone dug deeper than I was willing to and found a couple more attribution errors or similar-enough phrasings spaced out somewhere within that half a million words, I still wouldn’t be worried. Gay’s total scholarly output, small as it was, was plagiarized to a significant degree (like 0.1% or something); even in a worst-case scenario, the plagiarized portion of Pinker’s might be some vanishingly small fraction of his much more massive output. Call it the “Pinker base rate of mistakes” for productive scholars. I don’t know what the base rate really is, but everyone should keep in mind it’s non-zero.
It helps that Pinker himself has been equanimous when people have supposedly plagiarized him. He spoke to Vice about one such instance:
"I’m not particularly upset, and would not press the Globe and Mail for a retraction. The rewordings are close but not exact, and [the author] gives me plenty of credit, so this is borderline plagiarism at worst, more likely falling into the grey zone."
My point here is just that, when it comes to academic scandals, they must go beyond the Pinker base rate of human errors or coincidences or cryptomnesia. So often, either too much leniency is given, or not enough. For sometimes a response like the one Wittgenstein gave to his department after he forgot his sources seems most appropriate:
Dear Moore,
Your letter annoyed me. When I wrote Logik I didn’t consult the Regulations, and therefore I think it would only be fair if you gave me my degree without consulting them so much either! As to a Preface and Notes I think my examiners will easily see how much I have cribbed from Bosanquet. If I’m not worth your making an exception for me even in some stupid details then I may as well go to Hell directly; and if I am worth it and you don’t do it then—by God—you might go there. The whole business is too stupid and too beastly to go on writing about it.
L. W.
So at the risk of being too beastly and too stupid, and going on far too long, I’d like to make clear the standards I wish more held around academic conduct. In wishful thinking, I’ll call them:
Eight better epistemic standards for intellectual life
1. Individual instances of plagiarism, normally treated as the most grievous sin in academic or intellectual work, are not a big deal if they are small, debatable, a vanishingly small part of a large otherwise-original oeuvre, and best explained by honest mistakes in the production process or as coming from alternative sources (e.g., Tessier-Lavigne’s problematic papers or Steven Pinker’s examples). Accidental plagiarism due to lazy rewrites of common topics is likely far more common than people realize, even by respected authors or college presidents. However, many repeated instances, especially across different outputs or projects, indicate a bad scholar or even a scammer. What’s the line? It can’t just be one or two sentences found somewhere that aren’t transformed enough, or one or two missing attributions. I think it requires more like a half-dozen examples, ideally spanning more than one work, and they should be identical hefty chunks of paragraphs (as some of Gay’s examples are), not just a couple of phrases identical to other authors’ spaced out with different original text in-between (as some of Gay’s examples also are). Maybe this seems a high bar to some, but I think it’s necessary to not confuse the base rate “noise” of mistakes we should expect in large corpuses with actual signal.
2. “Self-plagiarism” is kind of a vapid accusation (especially in instances like the methodologies of scientific papers, where you have to describe a technique you’ve used before). It should be avoided if possible, but individual instances aren’t a problem unless there’s a pattern of malicious and haphazard recycling for personal gain, rather than a natural outcome of repeatedly examining the same topics, using the same phrasing, and with fallible human memory.
3. Any work, be it a scientific paper, a book, or an article, can be criticized by painting it in the worst possible light. This is why I don’t do many critical reviews or “contra” posts here. If I wanted to, instead of using Pinker as an innocent example of how small mistakes are unavoidable in a large enough scholarly oeuvre, I could have written a hatchet job. But I could do that to anyone. I could pick names out of a hat and do it—after all, that’s essentially what I did for Pinker! Rarely, someone deserves it, but most don’t. The only reason the majority of intellectual output isn’t matched with its own personal brutal hatchet job is because the time and energy to produce hatchet jobs is limited. While the art is rare, the potential application of the art is not. In turn, the ubiquitous possibility of hatchet jobs makes them uninteresting. It’s better to focus on doing something original.
4. At the same time, no idea is truly original. As Khekheperre-Sonbu wrote during the reign of Senusret II 4,000 years ago:
Would that I had words that are unknown, utterances and sayings in a new language, that hath not yet passed away, and without that which hath been said repeatedly — not an utterance that hath grown stale, what the ancestors have already said.
A lot of people will agree that no idea is truly original, but they ignore what logically follows: that no set of citations in an essay, book, scientific paper, documentary, or anything else, is complete. For any new idea or theory, you can always find someone else who said something kind of similar, at least enough so that some can potentially be outraged. E.g., in “Who invented memes?” I pointed out that, if Richard Dawkins is credited for proposing the idea of cultural units of replication (“memes”) in The Selfish Gene, how come I can find on the back cover of Gregory Bateson’s book Steps to an Ecology of Mind written in huge letters “Is there some sort of natural selection which allows one idea to live, and another to die?” That’s literally stating the idea of memes, and it was published before Dawkins’s book, and Dawkins likely knew who Bateson was.
Yet having read both books, I still think Dawkins deserves the credit for memes. For usually (a) the supposedly absent citation didn’t actually say the idea directly, or they said it obliquely (as Bateson did) or didn’t emphasize it enough, and also (b) often no one knew about the original proposal anyways.
E.g., in my latest book The World Behind the World: Consciousness, Free Will, and the Limits of Science I give what I originally thought was a novel philosophical argument about why philosophical zombies are paradoxically impossible. However, in a thorough search of the literature, I discovered a version of the argument in an undergraduate thesis by a woman who left philosophy. The paper had only two citations, as obscure as obscure gets. Even if you went to graduate school for philosophy of mind, even if you specialized in the zombie argument, you would likely never have heard of that undergraduate thesis. While there were some differences, I acknowledged the similarity in the book (in fact, I reached out to hear her story, but couldn’t get in contact). If I had missed that paper in my search by a hair, as I almost did, would that be significantly blameworthy? Citation and originality are always in conflict, and while that isn’t license to steal others’ ideas, in a world requiring perfect citations that everyone is 100% satisfied with, the incentive for originality withers to almost nothing.
5. Proving someone 100% wrong is usually impossible. People are lawyers for their own ideas, and good lawyers can keep a dispute alive for as long as Dickens’ Jarndyce and Jarndyce.
6. Challenging someone to a debate is the “Let’s step outside” of intellectual interaction. It favors the more aggressive and those with less to lose. Not wanting to debate proves nothing about the strength of someone’s ideas; more often, it’s that debating is a lose-lose proposition, since the side most eager to do it always has the most to gain.
7. Bets between thinkers are just a lesser form of challenging someone to a debate. In theory they can work, but in practice the majority end up being showy and unnecessary (and favoring the already-wealthy).
8. I’ll end on a common myth we all need to move beyond, which is that books you can find in a bookstore are all fact-checked and contain only golden glowing truths. This isn’t true. Except in rare cases, publishers don’t hire fact checkers, and there’s no peer review either, for popular science books. Just think on it: even in the science section, and even in a discipline like physics, you can find one book saying that the universe is infinite sitting on the shelf right next to another saying the universe is finite. Well, which is it? Let me know, would you.
We can step up #2 even further. Plagiarism is taking intellectual credit for someone else's work. Self-plagiarism by definition doesn't exist. There are some other 'academic honesty' crimes other than plagiarism, like citation manipulation, and republishing the same work under different titles can be for monetary or professional gain without being plagiarism. Having the same word for two different problems is... well, something bad, but not plagiarism, thankfully.
Lots to think about here, thank you!
I’m thinking about...
When I was in college, I was assigned a report on the Philippine-American War. I wrote “The Philippine-American War, also known as the Filipino-American War, took place from 1899 to 1902.” The professor had me stay after class. “There are sentences very close to yours on the internet,” he said. “I’m sure,” I replied. “There are only so many ways to state that fact in one sentence...”
Maybe some people are so hyper-aware of plagiarism that they start to see it everywhere?