Scientists make AI play Battleship to help it do science better

If artificial intelligence is to revolutionize the way science is done, as many cutting-edge AI labs hope, it must first master board games. That is the lesson of a recent study of AI models' decision-making abilities, tested with the game Battleship. The goal was to find what researcher Valerio Pepe calls "cheap interventions" for information seeking: ways to make models more careful with limited resources.
Science requires many decisions: researchers must choose which hypotheses to pursue and which simulations to run. When resources for experiments are limited, those choices determine the path forward. "You can only get so much data because getting data is expensive or time-consuming," says Pepe, who led work on the project before joining OpenAI. In April, Pepe and his colleagues presented their findings at the International Conference on Learning Representations, an annual meeting dedicated to deep learning.
The researchers designed a collaborative version of Battleship that could be played by humans or AI. In the game, one team member asked questions about the hidden board while another answered them, in a joint effort to locate the ships and sink them. By counting the number of turns needed to sink all the ships, the researchers were able to compare the performance of large language models (LLMs) with that of other LLMs and of the 42 human players the group recruited. Initially, humans consistently won in fewer moves than Llama-4-Scout, Meta's efficiency-focused AI model. OpenAI's reasoning model GPT-5 performed better than both.
The scientists drew inspiration from Bayesian experimental design, which guides decisions by estimating the probabilities of possible outcomes given prior hypotheses. They tuned their models to make moves that maximized their chances of hitting a ship, to ask questions that maximized the information gained from each answer, and to judge at each turn whether to ask another question or take a shot. The scientists also found that accuracy improved when players communicated through snippets of code rather than natural language. With these changes, Llama-4-Scout won in fewer moves than GPT-5 two-thirds of the time, at about one-hundredth of the cost. On average, it also won in seven fewer moves than the human players.
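To make the idea concrete, here is a minimal sketch of Bayesian experimental design applied to a toy Battleship game. It is not the study's code: the 4x4 board, the two-ship fleet, the list of questions, the `hit_threshold` of 0.9 and every function name are hypothetical choices made for illustration. The sketch assumes a uniform prior over all legal boards, noiseless answers and exact enumeration of hypotheses rather than sampling. Questions are written as small code predicates over a candidate board, echoing the finding that code worked better than natural language. A question's value is its expected information gain (for a noiseless yes/no question, the entropy of the predicted answer), and a shot's value is the posterior probability that a ship occupies the cell.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical toy setup: a 4x4 board with one 2-cell ship and one 3-cell ship.
SIZE = 4
SHIP_LENGTHS = (2, 3)


def placements(length):
    """Every horizontal or vertical position of one ship, as a frozenset of (row, col)."""
    out = []
    for r in range(SIZE):
        for c in range(SIZE - length + 1):
            out.append(frozenset((r, c + i) for i in range(length)))  # horizontal
    for r in range(SIZE - length + 1):
        for c in range(SIZE):
            out.append(frozenset((r + i, c) for i in range(length)))  # vertical
    return out


def all_boards():
    """Enumerate every non-overlapping fleet layout; these are the hypotheses."""
    boards = []
    for fleet in itertools.product(*(placements(n) for n in SHIP_LENGTHS)):
        if len(set().union(*fleet)) == sum(SHIP_LENGTHS):
            boards.append(fleet)
    return boards


def occupied(board):
    """All cells covered by some ship on this board."""
    return set().union(*board)


def update(hypotheses, question, answer):
    """Bayesian update with a uniform prior and noiseless answers: keep consistent boards."""
    return [b for b in hypotheses if question(b) == answer]


def expected_information_gain(question, hypotheses):
    """For a noiseless yes/no question, EIG is the entropy (in bits) of the predicted answer."""
    counts = Counter(question(b) for b in hypotheses)
    total = len(hypotheses)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def hit_probability(cell, hypotheses):
    """Posterior probability that a ship occupies this cell."""
    return sum(cell in occupied(b) for b in hypotheses) / len(hypotheses)


def choose_action(hypotheses, fired, questions, hit_threshold=0.9):
    """Shoot the likeliest untried cell unless an informative question is worth asking first."""
    untried = [(r, c) for r in range(SIZE) for c in range(SIZE) if (r, c) not in fired]
    cell = max(untried, key=lambda x: hit_probability(x, hypotheses))
    if questions and hit_probability(cell, hypotheses) < hit_threshold:
        name = max(questions, key=lambda k: expected_information_gain(questions[k], hypotheses))
        if expected_information_gain(questions[name], hypotheses) > 0:
            return "ask", name
    return "shoot", cell


def play(seed=0):
    """Play one game against a hidden board; return (questions asked, shots fired)."""
    hypotheses = all_boards()
    secret = random.Random(seed).choice(hypotheses)  # used only to answer questions and shots
    # Hypothetical questions, written as code evaluated against any candidate board.
    questions = {
        "ship in row 0?": lambda b: any(r == 0 for r, _ in occupied(b)),
        "ship in column 0?": lambda b: any(c == 0 for _, c in occupied(b)),
        "3-cell ship vertical?": lambda b: len({c for _, c in b[1]}) == 1,
        "ship in top-left quadrant?": lambda b: any(r < 2 and c < 2 for r, c in occupied(b)),
    }
    fired, asked = set(), 0
    while not occupied(secret) <= fired:  # until every ship cell has been hit
        action, arg = choose_action(hypotheses, fired, questions)
        if action == "ask":
            question = questions.pop(arg)
            hypotheses = update(hypotheses, question, question(secret))
            asked += 1
        else:
            fired.add(arg)
            shot = lambda b, cell=arg: cell in occupied(b)  # a shot is also a yes/no query
            hypotheses = update(hypotheses, shot, shot(secret))
    return asked, len(fired)


if __name__ == "__main__":
    asked, shots = play(seed=0)
    print(f"questions asked: {asked}, shots fired: {shots}")
```

Exhaustive enumeration keeps the posterior exact on a board this small. A realistic board has far too many layouts for that, so a practical version would approximate the posterior by sampling boards, and it would also need to model imperfect answers from a teammate.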
Battleship is much simpler than many scientific problems: chemical and biological samples, for example, cannot be interpreted as clearly as Battleship boards. But Pepe says the methods AI uses in the game will likely also be applicable to scientific decision-making.
"The framework will be very useful for measuring whether language models are actually progressing" in deciding which hypotheses to pursue among all the possibilities, says Yuanqi Du, a researcher who specializes in AI for chemistry, recently completed his doctorate at Cornell University and was not involved in the study. "Understanding the whole hypothesis space you're searching, that's the hard part."


