DeepMind’s latest: An AI for handling mathematical proofs


Just like AlphaZero, AlphaProof used two main components in most cases. The first was a huge neural network with a few billion parameters that learned to work in a Lean environment through trial and error. He was rewarded for each statement proven or disproven and penalized for each step of reasoning performed, which was a way of encouraging short, elegant proofs.
He was also trained in the use of a second component, a tree search algorithm. This explored all possible actions that could be taken to advance the evidence at each stage. Since the number of possible actions in mathematics can be almost infinite, the neural network’s job was to examine the available branches in the search tree and devote a computational budget to only the most promising ones.
After a few weeks of training, the system was able to perform well in most competitive math tests, based on problems from previous high school competitions, but it still struggled to solve the most difficult of them. To address this, the team added a third component that was not present in AlphaZero. Or elsewhere.
Spark of humanity
The third component, called Test-Time Reinforcement Learning (TTRL), roughly mimics the way mathematicians approach the most difficult problems. The learning part relied on the same combination of neural networks with search tree algorithms. The difference came from what she learned. Instead of relying on a large database of self-formalized problems, AlphaProof working in TTRL mode began its work by generating an entirely new training dataset based on the problem it faced.
The process involved the creation of countless variations of the original statement, some a little more simplified, some more general, and some loosely related to it. The system then attempted to prove or disprove them. This is pretty much what most humans do when faced with a particularly difficult puzzle, the AI equivalent of saying, “I don’t understand, so let’s try a simpler version of this for practice first.” » This allowed AlphaProof to learn on the fly, and it worked incredibly well.



