Your IQ isn't 160. No one's is.

Stratospheric IQs are like leprechauns, unicorns, or mermaids

May 09, 2023

Art for *The Intrinsic Perspective* is by Alexander Naughton

One cannot help but run into people who clearly fantasize about the following scenario: All the great geniuses of the past sit down and take some sort of culture-invariant IQ test, and then we get to line up the numbers and compare them, finally settling once and for all who was the greatest genius of humanity.

In this fantasy they imagine Voltaire in his study, finishing his fortieth cup of coffee (he used to drink around 50 a day), sharpening a #2 pencil at his desk, getting ready to fill in all those little bubbles. Is it A? Or D? Hmm, hasn’t been A in a while. . .

How the great geniuses of the past would do filling in these ovals perennially fascinates. Einstein would get a 160! Darwin a 180! Aristotle, 190! And while speculating about the numerical ranking of the long dead, IQ enthusiasts will refer quite regularly to IQs of 150 or even 200, presumably thinking that intelligence can actually be tracked at those numbers. You can find all sorts of SEO traps that give nonsensical rankings like this:

So pervasive is this thinking that typing into Google “Did Einstein ever take an IQ test?” gives this result as Google’s own sent-to-the-top answer:

Einstein never took a modern IQ test, but it's believed that he had an IQ of 160, the same score as Hawking. Only 1 percent of those who sit the Mensa test achieve the maximum mark, and the average score is 100. A 'genius' test score is generally considered to be anything over 140.

Wow! Except that

IQ numbers for historical figures are made up.

In the above Google-approved quote, the scores for both Einstein and Hawking are imaginary. Here’s from Newsweek:

When asked in a 2004 interview with The New York Times what his IQ is, Hawking gave a curt reply: "I have no idea. People who boast about their IQ are losers."

Now, it’s worth noting that if you want to know your own IQ, and you’ve never taken an official test, just think about whatever your SAT scores were. The two are very well-correlated. If you “test well” then you probably have a high IQ (and people are quite comfortable estimating IQ off of test scores in the modern age).

But this gives us an easy question: did Einstein and other great geniuses of the past “test well?” If they did, then they probably had a high IQ. This method isn’t perfect, but since we lack any actual IQ data on the majority of historical geniuses, it can at least point us in the right direction. E.g., Einstein

did flunk the entrance exam to the Zurich Polytechnic when he first took it — when he was about 1 1/2 years away from graduating high school, at age 16, and hadn’t had a lot of French, the language in which the exam was given. He did fine on the math section but failed the language, botany and zoology sections, according to history.com. A 1984 New York Times story says that the essay Einstein wrote for this exam was “full of errors” but pointed to his later interests.

And yes, he was taking it in a second language, and trying to get into college early, but still, failing botany and zoology? Just “fine” on the math section? Hard to imagine most AP students nowadays getting those sorts of scores. While we can’t possibly know if the standardized tests at the time were as closely correlated to IQ as SAT scores are today, surely they correlated to some degree?

What about Einstein’s grades? Current evidence tells us that grades are “strongly positively correlated” with IQ. If someone gets very high grades, we’d expect them to score high on an IQ test. And while there certainly may be differences between now and then, Einstein was not getting top grades. Here’s his high school report card, with grades spanning from 1-6 (not all sixes):

One reply might be that Einstein was merely a mathematical genius, and so of course he didn’t score well on other subjects. Yet, his grades in college, when he could take mostly mathematical subjects, are around the same—in fact he didn’t get a single maximum grade. This is again on a 1-6 scale:

Einstein seemed to be a mostly B+ student in college, even in math. My guess is this holds for most historical figures now widely considered geniuses, and only in rare cases would the historical record show them consistently acing tests and getting extremely high GPAs in their high schools, things which are, in the modern day, strongly associated with very high IQ.

If we play this game of hypothetical oval-filling, just based on his actual academic record I would estimate that Einstein would get in the 700s on the math section of the SATs, and maybe in the 600s on the verbal section. Ball-parking it, as one must, I think Einstein’s IQ was therefore probably more around 120 or 130 than 160. Indeed very high! But maybe not even “genius level.” He would have scored similarly to Feynman, one of the few geniuses we for sure have a modern IQ for, which was “merely” 125. This conclusion fits well with how

the studies correlating IQ to genius are mostly bad science.

Consider a book from the 1950s, The Making of a Scientist by psychologist and Harvard professor Anne Roe, in which she supposedly measured the IQ of Nobel Prize winners. The book is occasionally dug up and used as evidence that Nobel Prize winners have an extremely high IQ, like 160 plus. But it’s really an example of how many studies of genius are methodologically deeply flawed. In the book, the claims and numbers verge on the obviously ridiculous (e.g., Roe cites someone who claims that Goethe had an IQ of 210, noting that this beat out Leibniz at 205).

Yet Roe never used an official IQ test on her subjects, the Nobel Prize winners. Rather, she made up her own test, simply a timed test that used SAT questions of the day. Why? Because most IQ tests have ceilings (you can only score like a 130 or 140 on them) and Roe thought, without any evidence or testing, that this would be too low for the Nobel Prize winners. And while she got some help with this from the organization that created the SATs, she admits:

The test I used is not one that has been used before, at least in this form.

And furthermore:

I was not particularly concerned at the outset over the fact that I had no norms for this test. That is, I had no idea what any other population would do on the same test.

In other words, she had an untested set of SAT questions that she gave to Nobel Prize winners, not knowing how anyone else would do on them. This is pretty problematic. Normally IQ tests try to achieve some form of group-level neutrality; e.g., many of the major modern IQ tests are constructed from the outset so as not to show any average difference between male and female takers, to be as culturally invariant as possible, etc. And while Roe didn’t publish without any comparison group to her chosen geniuses whatsoever, the comparison that she did use was only a graduating class of PhD students (sample size unknown, as far as I can tell) who also took some other more standard IQ tests of the day, and she basically just converted from their scores on the other tests to scores on her makeshift test of SAT questions. Yet, here are the raw numbers of how the Nobel Prize winners did on the test she created:

Notice anything? The Nobel Prize winners all scored rather average. In fact, pretty low, in some cases. But Roe then goes on to claim that their IQ is extremely high, based on her statistical transformations:

I must caution that these equivalents have been arrived at by a series of statistical transformations based on assumptions which are generally valid for this type of material but which have not been specifically checked for these data. Nevertheless I believe that they are meaningful and a fair guide to what the situation is. The median score of this group on this verbal test is approximately equivalent to an IQ of 166.

Wait a minute. How did this conversion to a median IQ of 166 take place? After all, the scientists are scoring in the middle of the range on the test. They are getting a lot of questions wrong. E.g., Biologists who won the Nobel Prize got a 56.6 on the Verbal but we know that was far from the maximum score, Experimental Physicists got an even lower 46.6, etc. How then did she arrive at the group altogether having an astoundingly-high median verbal IQ of 166? Assuming that those at the upper range of scoring got close to most of the questions right (she mentions this is true, some only missed 4-10 questions at the maximum range), then how can getting only roughly two-thirds of the questions right translate to an IQ in the 160s?

Perhaps these SAT questions were just impossibly hard? Judge for yourself. Here’s one of the two examples she gives of the type of questions the Nobel Prize winners answered:

In each item in the first section, four words were given, and the subject had to pick the two which were most nearly opposite in meaning and underline them.
Here is one of the items: 1. Predictable 2. Precarious 3. Stable 4. Laborious.

This. . . isn’t very hard (spoiler: 2 & 3). So the conclusion of a median verbal IQ of 166 is deeply questionable, and totally reliant on this mysterious conversion she performed.

This sort of experimental setup would never fly today (my guess is the statistical conversion had all sorts of problems, e.g., Roe mentions extraordinarily high IQ numbers for PhD students at the time that don’t make sense, like an avg. IQ of 140). A far more natural reading of her results is to remove the mysterious conversion and look at the raw data, which is that the Nobel-prize-winning scientists scored well but not amazingly on SAT questions, indicating that Nobel Prize winners would get test scores above average but would not ace the SATs, since the average was far below the top of the possible range.

Now that I’ve finished being hard on poor Anne Roe, long deceased and unable to defend herself, an interesting woman who lived and worked when science was more of a Wild West, and who was the 9th tenured female professor at Harvard, it’s worth pointing out that a lot of the other stuff in her book is fascinating, like her examination of Nobel Prize winners’ habits and backgrounds. So to be fair to those once here and now gone, I’ll let Anne end this section in her own words, written in a footnote of The Making of a Scientist:

It is now being considered whether we might not do better to work at the problem in a different way and try to include other factors such as motivation. I strongly endorse this. For some time I have been convinced that there is no such thing as “creative ability” as a unit factor which some people have and some do not, and I strongly suspect that I will soon come to the conclusion that the same thing applies to intellectual ability as a thing apart.

Anne was right, because we now know that

IQ is changeable.

Practice works wonders for IQ tests, just as it does the SATs. There is a limited set of types of questions IQ tests ask, always variations on a theme. The more you familiarize yourself with Raven’s Progressive Matrices, the better you’ll do on them. Even just practice in general problem solving can boost IQ scores. Consider an experiment conducted by R. Kvashchev in former Yugoslavia:

In an effort to improve performance of high school students on intelligence tests, a large-scale study involving 296 students was carried out. Members of the experimental group (N = 149) were given exercises in creative problem solving 3 to 4 times a week over a period of 3 years and performance was assessed on four occasions. . . The test battery contained 28 measures of fluid and crystallized intelligence.

In a reanalysis of the data published in 2020 in the Journal of Intelligence the authors argued that

with the properly defined measures of fluid and crystallized intelligence, the experimental group showed a 15 IQ points higher increase than the control group. We concluded that prolonged intensive training in creative problem-solving can lead to substantial and positive effects on intelligence during late adolescence (ages 18–19).

Regardless of whether IQ numbers really are this changeable so late in development, I think people who supposedly score incredibly high on specialized IQ tests, like Chris Langan, a bouncer with an “IQ of 200,” are simply people who practice IQ tests and know them inside and out, treating such IQ tests much the way Jeopardy contestants treat the show: as a subject of obsession and study. And just as being a good Jeopardy contestant has no connection to real genius, so too with those who score extremely high on IQ tests. Because what’s important to keep in mind is that

IQ gets less defined the higher you go.

You can find all over the internet sites giving some form of this claim: “IQ is one of the most valid and reliable psychological constructs.” And this is true. . . by the standards of psychology. Don’t mistake this for being what a normal person would refer to as “reliable.” In the field of psychology, almost nothing is reliable. Effects regularly cannot be replicated, and those that can inevitably decrease in their effect size, often shrinking to the barely observable. Psychology struggles as a discipline to achieve even close to the same tensile strength in its hypotheses as other scientific fields, like physics or biology. Yet, sometimes IQ is treated as if it rises, miraculously, above these problems.

It doesn’t. As IQ gets higher, it gets less definite. Rankings of Person A and B will swap places depending on what test they take. Meaning that IQ is “valid and reliable” at the level that psychologists care about, which is being able to get significant values for their p-values across large data sets. But here’s how actual scores look for individuals if they take different IQ tests:

J, who received a 101 on one test and an 86 on another, is either completely average or so dumb he almost cannot serve in the army (where the IQ cutoff is 83). L is either ready for her PhD at 124, or, alternatively, she and J are identical (102 vs. 101). There are a few people who are surprisingly stable, like F, but the majority vary by big point spreads. According to some estimates, the standard error of measurement of IQ tests is around seven points, meaning you should regularly expect things like 10-20 point spreads, just as we see here. (Broadly we can think of this error as being either across tests or upon re-taking the same test. The exact error might not even be a definable thing, but what’s relevant is that it can lead to large two-digit spreads, like J’s 15-point spread or L’s 22-point spread or B’s 20-point spread).
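As a rough check on that arithmetic, here is a minimal sketch of how often two tests of the same person would disagree by a given margin. It assumes a deliberately simple model that is not taken from the cited estimates: each score is the person's true score plus independent, normally distributed error whose standard deviation equals the SEM. The function name `spread_probability` is hypothetical.

```python
import math


def spread_probability(sem: float, threshold: float) -> float:
    """P(|score_a - score_b| >= threshold) for two tests of one person.

    Assumption: each score = true score + independent normal error with
    SD = sem, so the difference of two scores is normal with SD = sem * sqrt(2).
    """
    diff_sd = sem * math.sqrt(2)
    # Two-sided tail of a zero-mean normal via the complementary error function.
    return math.erfc(threshold / (diff_sd * math.sqrt(2)))


if __name__ == "__main__":
    # SEM of 7 points is the estimate quoted in the text.
    for threshold in (10, 15, 20):
        p = spread_probability(sem=7.0, threshold=threshold)
        print(f"spread of {threshold}+ points: {p:.1%}")
```

Under these assumptions, a gap of ten or more points between two sittings comes out to roughly three in ten. The model only covers random error on a single construct; different tests also measure somewhat different things, which this sketch ignores, so real cross-test spreads can be wider than it suggests.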

The situation is even worse for IQs of 140 plus. First, the number of tests that have higher ceilings and can reach stratospheric numbers is low. Which means the tests are not as well-established or researched, and instead are often ad hoc or not appropriately normed. The consequence is that the amount of uncertainty described so far, the 20-point spreads, applies only to the normal range of scores, e.g., scores below 125 or 130. Once you start climbing beyond that, the variation in scores gets larger and larger. It doubles. No, it quadruples! This effect has been known for a very long time, at least since 1937. Here’s from the more recent “Identification of Students for Gifted and Talented Services: Theory into Practice:”

The concerns associated with SEMs [standard errors of measurement] are actually substantially worse for scores at the extremes of the distribution, especially when scores approach the maximum possible on a test. . . when students answer most of the items correctly. In these cases, errors of measurement for scale scores will increase substantially at the extremes of the distribution. Commonly the SEM is from two to four times larger for very high scores than for scores near the mean.
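To put the quoted multiplier in concrete terms, the sketch below computes a conventional band of plus or minus 1.96 SEM around a reported score. The base SEM of 7 points is the estimate mentioned earlier, the reported score of 160 is purely illustrative, and `score_band` is a hypothetical helper. This is the textbook band around an observed score and ignores refinements such as regression toward the mean.

```python
def score_band(observed: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high) for observed +/- z * sem, an approximate 95% band."""
    half_width = z * sem
    return (observed - half_width, observed + half_width)


if __name__ == "__main__":
    base_sem = 7.0    # assumed SEM near the mean of the distribution
    reported = 160.0  # illustrative high score, not a real measurement
    for multiplier in (1, 2, 4):
        low, high = score_band(reported, base_sem * multiplier)
        print(f"SEM x{multiplier}: reported {reported:.0f} spans {low:.0f} to {high:.0f}")
```

With the SEM doubled, the band around a reported 160 already reaches down into the low 130s; at four times the base SEM it covers most of the range anyone would care to distinguish.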

Two to four times larger?! This means that spreads of 20-40 points should be the norm, and could get truly crazy beyond that. So we should expect Jack to take a high-ceiling IQ test and get a 160, and then take another high-ceiling IQ test and get a 120. And there’s not an infinite battery of high-ceiling IQ tests we can throw at people to narrow this down (especially since small changes to the tests themselves will likely catapult us along some other axis of variance). This increasing incoherence of higher scores shows up in studies of the real-world impact of IQ, where the

benefits of a high IQ vanish past a certain point.

This was precisely Nassim Taleb’s point when he wrote the anti-IQ screed “IQ is largely a pseudoscientific swindle.” Taleb, never afraid to praise Taleb, recently said in a Tweet that “No piece in history has been more influential in fighting racism, eugenism, & racial mandarinism” (these debates are long-standing and significantly predate Taleb’s essay). In the piece itself he argued, in traditional swing-for-the-fences fashion, that IQ tests are

via negativa not via positiva. Designed for learning disabilities. . . it ends up selecting for exam-takers, paper shufflers, obedient IYIs (intellectuals yet idiots), ill adapted for “real life”.

This means that, instead of defending the motte (IQ at high levels doesn’t tell us about genius; in fact, high-IQ differentials don’t matter much if at all, and are immeasurable anyway), Taleb ended up defending the easily attackable bailey (IQ tells us nothing!). The more tenable position is to be a believer in the first, and a disbeliever in the second. Yet most of Taleb’s critics took him at his word, and considered it a satisfying rebuttal that IQs above 100 scaled in correlation with anything at all (grades, income, whatever), no matter how weakly.

But given its known measurement variance, IQs mattering less and less at higher scales almost has to be true, since the variance alone injects huge amounts of noise into any study. From a statistical level it would be shocking to get really clear results differentiating any real-world factor between IQs of 130 vs. 150, simply because the error is so large, and the number of people even satisfying those conditions is so small (in fact, it’s quite likely that, due to variance and practice, the average Mensa member is far under the actual IQ requirement). Consider a recent example: the debate over a big modern study published this January tracking 59,000 men.

The problem with these debates, and with leaning too much on one study, is that there is a huge literature on IQ, which means that people on either side of the debate can pull up a dozen studies showing whatever they want (there are a lot of fields like this). Yet, from what I can tell, the issue of test variance appears to be a barrier impossible to overcome. E.g., one might counter the above study showing that IQ ceases to matter for income with another, earlier study that claims to answer “Can You Ever Be Too Smart for Your Own Good?” in the negative. In that earlier study they correlated 214 life outcomes (things like educational achievement and income) to IQ, and found that

Given these results, greater cognitive ability does not cease to remain beneficial for individuals with above average ability or with scores greater than IQ = 120.

Well, that settles it, right? Nope. Because as usual, there’s a lot of motte and bailey switching here. What they actually find is that the difference in the hundreds of life outcomes gets tiny at the upper echelons of IQ.

Finally, to check the possibility that only very high intelligence is detrimental, we tested for outcome differences between individuals within the top 10% and top 20% of ability scores. . . We performed a median split within each group (top 10% and top 20%) and compared outcome scores for individuals above or below the median using a simple t test or χ2 test of proportions. In only a minority of cases did we detect a significant difference (p < .05) within the top 10% (20 out of 214 comparisons, 9%) or top 20% (48 out of 214 comparisons, 22%) of cognitive ability scores.

So if we actually look at the numbers, there are statistically significant differences in only 48 out of 214 life outcomes in the top 20%, and only 20 out of 214 in the top 10%, and the effects are small too (despite starting with a very large data set of ~50,000 individuals). Some of the effects are even negative! But this attenuating effect of IQ differentials correlating less and less to outcomes is papered over; instead, victory is declared that there is any detectable difference for IQs above 120 whatsoever.
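For a sense of scale, it helps to ask how many of 214 comparisons would cross p < .05 if there were no real differences at all. The sketch below does that arithmetic under a strong simplifying assumption that is not from the study: the 214 tests are treated as independent, which life outcomes clearly are not. The function names are hypothetical.

```python
import math


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results when every null is true."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float) -> float:
    """P(X >= k) for X ~ Binomial(n_tests, alpha), i.e. independent tests."""
    return sum(
        math.comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


if __name__ == "__main__":
    n_tests, alpha = 214, 0.05
    print(f"expected by chance: {expected_false_positives(n_tests, alpha):.1f}")
    for observed in (20, 48):
        p = prob_at_least(observed, n_tests, alpha)
        print(f"P(at least {observed} by chance) = {p:.2g}")
```

Chance alone would be expected to produce about 11 significant results out of 214, so the 20 found within the top 10% is roughly double the noise floor rather than a resounding signal. Because the outcomes are correlated, the binomial tail probabilities here should be read as loose guides only.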

If we put the increasingly absurd measurement error together with the lack of clear and replicable real-world difference, the simplest explanation when it comes to IQs of, e.g., 150, 160, 170, is that they simply aren’t real. At higher levels, Jack and Jill swap places endlessly, a game of musical chairs as they jump around 30-point spreads with no way to reliably reduce the variance. And what chair they happen to be sitting in for a particular test, ahead or behind, matters not at all to their life outcomes.

So if someone regularly talks about IQs significantly above 140 like these were actual measurable and reliable numbers that have a real-world effect, know that they are talking about a fantasy. And if they make claims that various historical figures possessed such numbers, then they’re talking unscientific nonsense. If they’re bragging about themselves, well. . . it’s like someone talking about their astrological sign. Stratospheric IQs are about as real as leprechauns, unicorns, mermaids—they’re fun to tell tales about, but the evidence for them being a repeatedly measurable phenomenon that matters in any meaningful sense of the word is zip, zero, zilch.

And yet, I am not Stephen J. Gould.

In fact, on re-reading it after originally planning to cite it for this essay, I was struck by Gould’s weak and judgy arguments in his The Mismeasure of Man, a book that supposedly takes down research on IQ. I don’t think that IQ is like measuring skull circumference, nor do I think just talking about IQ, or researching it, is bad, or evil, or inherently racist, or dumb, or whatever other accusation one can throw. I would definitely never say something like “IQ doesn’t matter at all.” I wouldn’t even say “IQ is unimportant.” I think it is important, in that it’s one of the only measurements we have that does an okay job at capturing intelligence, in that it’s not too bad at this when it comes to the center of the distribution, although it gets increasingly bad at it at the tails.

And, from a practical perspective, there is a sense in which I’m actually very pro-IQ tests! I recently bemoaned the quiet dropping of the SAT and GRE from college admissions, writing that:

This nation-wide change being officially enshrined this year troubles me in particular because it’s now no longer possible to get a middling high school GPA at a public school, get a top-notch SAT score, get to choose between a couple good colleges, and then have a successful career afterward. Which is how my life went. Of course, no one can really know the true counterfactuals, but it’s likely that the SAT is why I have a career as a scientist and author at all.

But holding that opinion doesn’t mean that I think really high IQ numbers are actually real, or that IQ is determinate past a certain point in judging “academic potential.”

A simple way of saying it: When I read Tolstoy, what I think is that the man was a genius. If he scored a 120 on an IQ test, that would reflect on IQ tests, not Tolstoy. This holds true in my personal experience as well. I’ve known a couple people in life who got perfect or near-perfect SAT scores and went on to places like Harvard and MIT. I would consider none of them geniuses. They just didn’t have it. On the other hand, I’ve also been lucky enough to be able to meet, and occasionally work alongside, people I would consider scientific geniuses. Yet never once did I feel that an IQ test would capture these people operating at the highest level of intellectual output (who I will avoid the embarrassment of name dropping). Many of these intellectual stars were not even quick-witted. Sure, all of them were smart, obviously so, and all of them would score above average, likely well above average, on an IQ test. But their individual rankings on those tests, when compared to each other, would mean absolutely nothing. It would just be dead information. Far more important was how they were deep creative thinkers with good instincts for what questions were fecund, coupled with an obsessive drive to pursue those questions. They had elegant minds, deep pools of expertise, and often voracious cultural knowledge outside their chosen discipline. They were people on fire with thought. So if you only did pretty good on the SAT, don’t worry too much. The evidence says you can still win the Nobel Prize.

85 Comments

Billy

May 9, 2023

I think it was Charles Murray who compared IQ scores to the weight of offensive lineman - there's a minimum threshold that you need to meet to really compete at the higher levels, but once you pass that threshold, more does not correlate at all with higher performance. Winning at those higher levels requires those things that are difficult or impossible to measure, the intangibles, that being "on fire with thought"... or to carry on the football analogy, being on fire with pushing those other dudes out of the way.

Chasing Ennui

I appreciate your touching on the middle ground between "IQ tests are gospel" and "IQ is made up." It seems pretty clear to me that: a) people differ in their intellectual abilities in ways that are at least somewhat innate (though it's also possible to excel or lag in some areas but not others); b) it's not entirely innate; and c) it's hard to have an entirely unbiased test of intellectual ability, let alone tease out innate from acquired ability, particularly in a given person. However, there seem to be a lot of people who like to use (b) and (c) to "prove" (a) isn't true.
