For years, advanced mathematics has been one of the last bastions of human-only reasoning. That wall is starting to crack. New-generation AI systems are now producing credible solutions to research-grade problems, and independent verification tools are confirming that the arguments hold up, at least in a growing number of cases.
Recent experiments by software engineer and former quant researcher Neel Somani highlight the shift. After giving an OpenAI model extended time to reason through a number theory question inspired by Paul Erdős, he returned to find a complete proof. With the help of Harmonic’s formalization tool Aristotle, the argument was translated into machine-checkable form and verified. The surprise wasn’t just that the answer was right; it was that the approach diverged from known solutions while remaining sound.

A Step-Change in Reasoning for AI Proof Generation
What changed? Models released over the past few weeks combine longer-context reasoning, retrieval across the mathematical literature, and deliberate multi-step search. In Somani’s tests, the system cited classical tools such as Legendre’s formula, Bertrand’s postulate, and the Star of David theorem as it triangulated a path to a proof. It even surfaced a 2013 MathOverflow thread in which Harvard mathematician Noam Elkies outlined a related argument, then proceeded with a different, more general route tailored to the problem at hand.
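For reference, two of those classical tools are easy to state. Bertrand’s postulate guarantees, for every integer n ≥ 1, a prime p with n < p ≤ 2n. Legendre’s formula gives the exact exponent of a prime p in n!:

```latex
\nu_p(n!) \;=\; \sum_{i=1}^{\infty} \left\lfloor \frac{n}{p^i} \right\rfloor
```

The sum is finite in practice, since each term vanishes once p^i exceeds n; results of this kind are the standard levers for controlling prime factorizations of factorials and binomial coefficients.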
These capabilities aren’t appearing in a vacuum. Google-affiliated efforts like the Gemini-powered AlphaEvolve have shown early traction on structured problem sets, and OpenAI’s deep-research features are being used to scan archives like arXiv and MathSciNet. The net result is not merely faster computation but a practical workflow: draft a proof, formalize it, and check it with a proof assistant before a human referee ever reads the first line.
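To make the last step of that workflow concrete, here is a minimal sketch in Lean 4 with mathlib. The lemma `Nat.exists_prime_lt_and_le_two_mul` is mathlib’s statement of Bertrand’s postulate (the name is cited from memory and may differ across library versions):

```lean
import Mathlib

-- Minimal sketch of the "check it" step, assuming a recent mathlib.
-- The kernel either accepts this proof term or rejects it outright;
-- there is no partial credit and no appeal to authority.
-- (`Nat.exists_prime_lt_and_le_two_mul` is cited from memory and
-- may be named differently in other mathlib versions.)
example (n : ℕ) (hn : n ≠ 0) : ∃ p, Nat.Prime p ∧ n < p ∧ p ≤ 2 * n :=
  Nat.exists_prime_lt_and_le_two_mul n hn
```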
Counting the Wins from AI-Assisted Erdős Problems
The scoreboard is starting to reflect the change. Since the holidays, curators of the online Erdős problem list have moved 15 entries from open to solved, with 11 of those explicitly crediting AI participation. UCLA mathematician Terence Tao, who has been tracking the activity, tallies eight cases in which AI made autonomous, substantive progress on an Erdős problem, plus six more where models accelerated discovery by locating and building on prior work.
No one is claiming that large language models can replace mathematicians. Many solutions are narrow, and several rely on stitching together known lemmas in clever ways. But the pace and pattern are notable: scalable systems seem particularly well-suited to the long tail of deceptively simple Erdős-style questions, where persistence and literature coverage matter as much as inspiration.
The Formalization Turn in Lean and Proof Assistants
A key enabler is the surge in formal verification. The Lean proof assistant, originally developed at Microsoft Research, has matured alongside the community-built mathlib library, making it practical to encode complex arguments. Tools like Harmonic’s Aristotle sit on top, attempting to translate informal steps into Lean and flagging gaps for human attention.
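A hypothetical miniature shows what that hand-off looks like. The statement below is illustrative, not one of the Erdős problems: Lean accepts the steps it can verify and flags the `sorry` placeholder as an unproved gap, which is exactly the signal a tool like Aristotle routes to a human reviewer.

```lean
import Mathlib

-- Hypothetical skeleton of the kind a formalization pass might emit:
-- the claim is stated precisely, one step is discharged by a known
-- mathlib lemma, and the unfinished step is marked `sorry`, which
-- Lean reports as a gap rather than accepting the argument on trust.
theorem exists_larger_prime (n : ℕ) : ∃ p, n < p ∧ Nat.Prime p := by
  -- Euclid-style infinitude of primes, already in mathlib.
  obtain ⟨p, hle, hp⟩ := Nat.exists_infinite_primes (n + 1)
  refine ⟨p, ?_, hp⟩
  -- Gap left for review: derive n < p from hle : n + 1 ≤ p.
  sorry
```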

This shift matters because it changes the trust model. Rather than asking mathematicians to accept an AI’s opaque reasoning, the workflow produces formal proof scripts that a deterministic verifier can check line by line. That reduces the risk of confident but wrong “hallucinations,” aligns with the culture of reproducibility, and, crucially, creates artifacts others can extend. As Harmonic’s team notes, adoption by professors and researchers is a better signal than demos: reputations hinge on getting the details right.
Where AI Helps and Where It Stumbles in Mathematics
The sweet spot today is combinatorics, elementary number theory, and inequalities: domains densely covered by lemmas that models can retrieve and recombine. Problems demanding a deep, novel concept or a new object of study remain stubborn. Even when a model proposes the right high-level idea, technical execution can falter without careful human steering, and formalization can expose hidden gaps.
Tao has argued that scale favors models on obscure, easier conjectures, terrain that humans seldom prioritize. That suggests a division of labor: AI clears the underbrush, human mathematicians work the deep woods, and both benefit from a shared, formalized foundation. It is not a fully autonomous future, but it is meaningfully different from earlier “calculator-for-proof” visions.
Implications for Research and Training in Mathematics
For working mathematicians, the near-term payoff is time. Literature triage that once took a week can take an afternoon. Draft proofs can be stress-tested by verifiers before a colleague sees them. Graduate students can learn by comparing informal arguments with their formal counterparts, a process already common in Lean study groups and seminars.
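As a toy example of that comparison, take the classical fact that the first n odd numbers sum to n². A Lean 4 sketch, assuming a recent mathlib (tactic and lemma names can vary by version), mirrors the blackboard induction step for step:

```lean
import Mathlib

-- Informal claim: 1 + 3 + 5 + ... + (2n - 1) = n^2.
-- Formal counterpart: a sketch assuming a recent mathlib; tactic
-- and lemma names can drift between versions.
theorem sum_first_odds (n : ℕ) :
    (Finset.range n).sum (fun k => 2 * k + 1) = n ^ 2 := by
  induction n with
  | zero => norm_num  -- the empty sum equals 0 = 0 ^ 2
  | succ n ih =>
    -- Peel off the last odd number, apply the inductive
    -- hypothesis, and close the algebra with `ring`.
    simp only [Finset.sum_range_succ, ih]
    ring
```

Much of the pedagogical value lies in seeing where the formal script demands an explicit lemma that the informal argument glosses over.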
There are challenges ahead: formal libraries still lack coverage in areas like geometry and analysis, benchmarks lag real research, and editorial standards for AI-assisted work are evolving. But the trajectory is clear. With retrieval, deliberate search, and formal verification in the loop, AI is moving from clever calculator to credible collaborator, and, on the right problems, even a solitary solver.
