AI Models Are Starting to Learn by Asking Themselves Questions

Even the smartest artificial intelligence models are essentially copiers. They learn either by consuming examples of human work or by attempting to solve problems set for them by human instructors.
But maybe AI can actually learn in a more human way, by identifying interesting questions to ask and trying to find the right answer. A project from Tsinghua University, the Beijing Institute of Artificial General Intelligence (BIGAI), and Pennsylvania State University shows that AI can learn to reason in this way by playing with computer code.
The researchers designed a system called Absolute Zero Reasoner (AZR) that first uses a large language model to generate difficult but solvable Python coding problems. It then uses the same model to solve those problems, verifying its work by actually running the code. Finally, AZR uses its successes and failures as a signal to refine the original model, improving its ability both to pose better problems and to solve them.
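To make that loop concrete, here is a minimal sketch in Python of how such a propose-solve-verify cycle might look. The names used here (propose_task, solve_task, update_model, and a generated function f) are placeholders standing in for calls to the language model and its training step; this is an illustration of the general idea, not the researchers' actual implementation.

```python
# Minimal sketch of a propose-solve-verify self-play step (illustrative only).
# `propose_task`, `solve_task`, and `update_model` are hypothetical stand-ins
# for calls to the language model and to its reinforcement-learning update.

def run_program(program: str, test_input):
    """Execute a generated Python program on a test input and return its output."""
    namespace = {}
    exec(program, namespace)          # assumes the generated code defines a function `f`
    return namespace["f"](test_input)

def self_play_step(propose_task, solve_task, update_model):
    # 1. The model poses a problem: a small program plus an input to run it on.
    program, test_input = propose_task()

    # 2. Actually running the code yields a ground-truth answer "for free".
    try:
        expected = run_program(program, test_input)
    except Exception:
        return  # broken or unsolvable proposals earn no learning signal

    # 3. The same model then tries to predict the output without executing the code.
    predicted = solve_task(program, test_input)

    # 4. Success or failure becomes the reward used to refine the original model,
    #    which should improve both its problem-posing and problem-solving over time.
    reward = 1.0 if predicted == expected else 0.0
    update_model(reward)
```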
The team found that their approach significantly improved coding and reasoning skills in the 7- and 14-billion-parameter versions of the open-source language model Qwen. Impressively, the model even outperformed some models trained on human-curated data.
I spoke to Andrew Zhao, a doctoral student at Tsinghua University who came up with the original idea for Absolute Zero, as well as Zilong Zheng, a BIGAI researcher who worked on the project with him, via Zoom.
Zhao told me that this approach resembles how human learning goes beyond rote memorization or imitation. “At first you imitate your parents and do like your teachers, but then you basically have to ask your own questions,” he said. “And eventually, you will be able to surpass those who taught you in school.”
Zhao and Zheng noted that the idea of having AI learn in this way, sometimes dubbed “self-play,” goes back several years and has already been explored by Jürgen Schmidhuber, a well-known AI pioneer, and Pierre-Yves Oudeyer, a computer scientist at Inria in France.
According to Zheng, one of the most interesting elements of the project is how the model’s problem-posing and problem-solving skills evolve in tandem. “The level of difficulty increases as the model becomes more powerful,” he said.
One of the main challenges is that, for now, the system only works on problems whose answers are easy to check, such as those involving math or coding. As the project progresses, it might be possible to apply it to agentic AI tasks like web browsing or office work. This could involve the AI model judging whether an agent’s actions are correct.
One fascinating possibility of an approach like Absolute Zero is that it could, in theory, allow models to go beyond human teaching. “Once we have that, it’s sort of a way to achieve superintelligence,” Zheng told me.
There are early signs that the Absolute Zero approach is catching on in some major AI labs.
A project called Agent0, from Salesforce, Stanford, and the University of North Carolina at Chapel Hill, involves a tool-using agent that improves through autonomous play. As with Absolute Zero, the model improves its general reasoning through experimental problem-solving. A recent paper by researchers at Meta, the University of Illinois, and Carnegie Mellon University presents a system that uses a similar type of self-play for software engineering. The authors of that work suggest it represents “a first step toward training paradigms for superintelligent software agents.”
Finding new ways for AI to learn will likely be a major theme in the tech industry this year. As conventional data sources become scarcer and more expensive, and as labs look for new ways to make models perform better, a project like Absolute Zero could lead to AI systems that are less like imitators and more like humans.



