28 Comments
Dawson Eliasen's avatar

There are also proofs for all provable theorems in the Library of Babel. I'm not seeing how this is meaningfully different from finding one there.

Erik Hoel's avatar

I think it’s really been underrated to what degree certain problems are searchable (as opposed to just verifiable). So some might be extremely amenable to search! But in other cases, the stuff in the Library might be extremely spread out, so even if you search and search and search you don’t get anything.

Robson's avatar

Searchable vs verifiable has been formalized as the P vs NP question for many decades.
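To make the search/verify gap concrete, here is a minimal Python sketch using subset sum, a standard NP-complete problem. Checking a proposed answer is quick, while the naive search may have to try every subset. The function names `verify` and `search` and the sample numbers are illustrative assumptions. The sketch only shows that brute force is slow; whether a fundamentally faster search exists is exactly the open question.

```python
from itertools import combinations

def verify(numbers, target, candidate):
    """Check a proposed solution in polynomial time."""
    pool = list(numbers)
    for x in candidate:
        if x not in pool:
            return False      # candidate uses a number we don't have
        pool.remove(x)        # respect multiplicity
    return sum(candidate) == target

def search(numbers, target):
    """Brute-force search: may try up to 2**len(numbers) subsets."""
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return list(subset)
    return None

numbers = [3, 34, 4, 12, 5, 2]
found = search(numbers, 9)           # exponential in the worst case
print(found, verify(numbers, 9, found))
```

Adding one number to the list doubles the worst-case work for `search` but adds only a little to `verify`.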

James McDermott's avatar

It's different because human mathematicians found it interesting enough to state it and work on it.

Carlos's avatar

Like you said, I think science is the domain where ASI fails to materialize, because there is no amount of intelligence that allows you to skip having to do real world experiments, and I think we're pretty far away from fully automated laboratories. Science really is very different from math and programming.

Ben Schulz's avatar

When simulations do enough to show that they were correct, science fields will also fall. Protein folding will eventually be solved. Chemistry and materials science as well. Biology and Psychology are likely the last to go as there are numerous pitfalls and unknown unknowns.

1123581321's avatar

In the real world, e.g., materials science, simulations can’t “show that they were correct”. Only experiments can. This puts a natural ceiling on possible progress rates. Basically, the Singularity is impossible and we’re not going to colonize the stars.

Graham L's avatar

People "believe" funny things, don't they. "Humans will never achieve heavier than air flight" (just before the Wright Brothers). "Physics has basically all been discovered and all that's left is to tidy up the details" (just before Einstein). "Rockets for space travel are too expensive for non-governmental organizations" (just before Elon). "We're not going to colonize the stars" - yeah, right. If we don't destroy ourselves first, we're going to colonize the stars. Imagination. Bring someone from the 18th century into this world and find out just how many things we already do that he would tell you are "impossible" or "magic". And we're only 8000 years into growing civilization. Wait till it's 20,000 years and see what we're doing.

1123581321's avatar

Ok not within anyone’s lifetime.

But cheers to tech optimism!

Ben Schulz's avatar

As simulations improve, we can perfectly predict certain qualities and performance; obviously checked by real experiments. Superconductivity is a good example. No one predicted the "dancing" found in Cooper pairs. If we can get simulations that would have predicted those movements, we would have made a better atomic lattice. Colonizing star systems doesn't require any huge new additional technological knowledge, by the way. It's basically time, energy, and desire.

1123581321's avatar

Real experiments are the whole point though. Example: I’m sure AI will come up with novel battery chemistries and construction. We'd still need months of charge cycling to know how they perform. Many such cases.

Sure, time and energy. In such massively enormous amounts that new tech is needed, you don’t expect the Starship to actually scale up for a trip to Sirius, do you?

Ben Schulz's avatar

Starship is not an interstellar transport. The engines and hull aren't designed for it. There are numerous books and proposals that have nothing to do with Elon's vision. His goals are not the only way. O'Neill cylinders are just one example.

1123581321's avatar

I mean, the point is that the amount of energy needed is so enormous that chemical rockets are not going to cut it unless we’re expecting to travel for millennia. We don’t have that tech either….

Gordon P's avatar

Your last paragraph for me brings to mind the replication crisis and the similar hype and disappointment experienced around many of those "findings." The tasks of generating new information, and determining whether that information is true, are different if related things. The scientific method provides a framework to help them move together, but as a problem grows in complexity, it becomes easier for methodological errors to hide, and thus takes more time to replicate and verify. Due to limited attention, we have historically leaned on authority and process to mitigate this, with results varying as widely as the politics and cultures of our institutions. But increasing the available attention doesn't remove the problem, it only changes it.

If our benchmarks are saturated, and as AI continues to beat our self-imposed human benchmarks, we rapidly approach the boundaries of knowledge. Alignment questions aside, whether it's humans, humans+AI, or just AI, new knowledge must be dissected and integrated into a larger world model. Is it true? How do you prove it? What resources or underlying knowledge do you need to prove it? What kinds of mistakes might someone make along the way? Is it just a search problem, or are there critical new insights or techniques required? What other knowledge must be updated as a consequence? And so on.

Of course, people like "good enough," and we accept the risks of application failures as a cost of doing business. If it's important enough, we will engineer and iterate our way to success. But if attention is less of a bottleneck, then maybe more can be done before you get to that point. Or perhaps must be done, if you want to avoid a future of smoke and mirrors covering up unverifiable slop.

Alec Pritzos's avatar

The control-group point is the actual hinge of this argument. Demonstrating a frontier capability step-up requires showing the prior production model couldn't surface this disproof given the same search budget and compute envelope. Without that comparison, the announcement reads as a lucky hit across an open-problem set, not an architecture jump. The Gowers and Litt names confirm the disproof itself meets a top-journal standard; neither speaks to whether the previous model class would have surfaced the same construction under identical search ceilings, which is the comparison OpenAI would need to publish to convert this from a marketing release into a measurable capability claim.

Wabi Sabi's avatar

This is the very definition of a strong-link situation, isn't it - if our best mathematicians are still better than AI our whole species feels better, even if we personally can't understand the maths/science/coding problems, let alone the solutions.

Niles Loughlin's avatar

Like all human-made tools based on human-led labor, this is not exactly surprising. Impressive and noteworthy that a novel solution was elicited and produced by AI! But to say AI alone solved this problem would be to say that humans are not the ones who have solved problems or improved on multiple fields of study with computers - that computers are the intelligent recipient.

A verifiable AI solution to a mathematical problem - one rapidly improved upon by non-AI human iteration - proves by a posteriori necessity that such a problem always was solvable. An extremely useful tool in the repertoire that aids our fields of labor, but which itself cannot derive productive value without human input or predicate knowledge.

James McDermott's avatar

For people who want to be AI-bearish, the argument about feathers should not be a successful cope. Progress is just as fast as it seems (but yes, gradual, not a single lightbulb moment). Anyone super surprised by this result has not been paying attention.

Some people think they are creating a rigorous, bright-line experiment when they set up a scenario like "an AI will never be able to X" where in this case X is "produce novel mathematical research (defined as Y) with only a small non-specific prompt Z" etc. But there is no bright line, just a continuum.

Nick Hounsome's avatar

This does sound a bit like when people used to say: "yes well, okay a program can now beat a chess master but it will never beat the best human chess player"

Paul Topping's avatar

Those viewing that chart with the cloud of AGI definition points should realize that it only includes AGI definitions imagined by AI fanboys. The true human-equivalent AGI definition is way off the chart. It's a bit like a solar system diagram with true AGI as Proxima Centauri. If your chart shows Pluto at the edge of the paper, the closest star is about a mile away.
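The scale in that analogy can be checked with a few lines of arithmetic. This is a rough sketch: the 11-inch page with the Sun at one edge and Pluto at the other is my assumption, and the distances are rounded textbook values.

```python
AU_PER_LIGHT_YEAR = 63_241.1   # astronomical units in one light-year
PLUTO_AU = 39.5                # Pluto's average distance from the Sun, in AU
PROXIMA_LY = 4.25              # approximate distance to Proxima Centauri
PAGE_INCHES = 11.0             # assumption: Sun at one edge, Pluto at the other
INCHES_PER_MILE = 63_360

def proxima_distance_on_page_miles(page_inches=PAGE_INCHES):
    """Distance to Proxima Centauri at the scale where Pluto sits page_inches from the Sun."""
    proxima_au = PROXIMA_LY * AU_PER_LIGHT_YEAR
    inches_per_au = page_inches / PLUTO_AU
    return proxima_au * inches_per_au / INCHES_PER_MILE

miles = proxima_distance_on_page_miles()
print(f"{miles:.2f} miles")
```

Under these assumptions the star lands a little over a mile from the page, or about half that if the Sun is drawn at the center of the page instead of the edge.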

Paul Topping's avatar

I'm just saying that the chart seems to imply that current AI is somewhere close to being AGI depending on which definition you like. Current AI is not close to AGI, which is defined as human-level cognition.

Wabi Sabi's avatar

Thanks. The way I'd put it is that AI doesn't even operate *like* the human brain, so that to talk of it thinking would be like talking about a submarine swimming, to borrow Chomsky's (already borrowed?) metaphor.

Hamed Al-khateeb's avatar

What about dissecting Timothy Gowers's words and tone of speaking? What does he want to promote? Either way, mathematicians should study math as they used to; maybe research methods will change. But he, as a respected mathematician, had a disappointing tone.

Matt's avatar

It seems to me that any time I hear (read) someone saying something hyperbolic about AI, once you cut through the hype, there is almost always an element of anthropomorphizing that underlies their blindness. I get it. It is a lot easier to imagine how a consciousness arising in an AI could then have superhuman intelligence and outperform humans at everything - it makes it feel more plausible because we are human! And even when this hyperbolic person states they know what they are experiencing is not real (e.g. AI consciousness), the fact that it feels so real allows them to forget its lack of realness. Can someone with literary clout please write a parable for this pitfall?! I imagine something akin to looking in a magic mirror and seeing a reflection that isn't real but everyone starts to think it's real, and the mirror company says "sure looks real to me, you should buy some...". Or something like that =)

Kafulu's avatar

Dunno. Writing good prose feels like a much harder problem than good math, at least from the perspective of AI. For truly good prose (be it in high literature, journalism or your average subreddit) you need to "get" some subtleties of the context and be able to formulate some insights and ideas which feel meaningful enough for people to do the proverbial "share and upvote". I think that's very hard to do without some level of conscious understanding. Math, in contrast, feels a bit more like a mechanics problem, in the sense that it doesn't have to be interesting or meaningful in the same way, it just has to work.

It just so happens that mediocre prose written by someone who has memorized the entire internet is a lot more useful and valuable than mediocre math (or even mediocre programming code). I mean, mediocre math was already "solved" by computers decades ago. In 2026, who'd possibly be sufficiently impressed to pay premium subscriptions for a calculator that is right 99% of the time?

As for groundbreaking new math: who is going to actually verify new AI proofs? And how will OpenAI programmers be able to tell a bogus proof from the real deal when dealing with serious unsolved problems that actually keep mathematicians awake at night? Mathematical proofs in the 21st century tend to be contentious, to say the least. Somehow, I feel it is very convenient of them to come up with a "disproof" rather than a "proof". Nice and clean, just show that "a" is inferior to "b". Now, what if there is no conjecture "b" and the AI has to prove that the existing conjecture "a" is theoretically "correct" or otherwise unsurpassable or unresolvable? That would be a lot less clear-cut, and I suspect that OpenAI in this process actually came up with many more AI proofs and conjectures whose veracity they can't vouch for without a lot more work - human mathematician work. I wouldn't be worried about losing my livelihood to AI if I was a mathematician working on unsolved math problems; in fact, the odds of landing a lucrative job/contract with a tech company in the next decade probably increased a lot.

Lucas's avatar

It's a big result. However, it seems that it falls in the category of things that computers are good at (are suitable for) and humans are not. Quoted from Bloom (commentary from the published paper): “[the AI’s] success here echoes previous achievements: it often produces the most surprising results by persevering down the paths that a human may have dismissed as not worth their time to explore, combining superhuman levels of patience with familiarity with a vast array of technical machinery”.

Ben Schulz's avatar

Solving math conjectures relies on tools, tricks, logic, calculations, and other proofs. What any human or AI will need is new tools and proofs to solve the remaining conjectures. The logic and computation are now nearly on par. So, what we will see are new conjectures posed by AI that are as interesting and useful as Erdős's.