Google’s new robotics AI can run without the cloud and still tie your shoes

We sometimes call chatbots like Gemini and ChatGPT “robots,” but generative AI also plays a growing role in real, physical robots. After announcing Gemini Robotics earlier this year, Google DeepMind has now revealed a new on-device VLA (vision-language-action) model to control robots. Unlike the previous version, there’s no cloud component, allowing robots to operate with full autonomy.
Carolina Parada, head of robotics at Google DeepMind, says this approach to robotics AI could make robots more reliable in challenging situations. It’s also the first version of Google’s robotics model that developers can fine-tune for their specific uses.
Robotics is a unique problem for AI because a robot not only exists in the physical world, it also changes its environment. Whether you’re having it move blocks or tie your shoes, it’s hard to predict every scenario a robot might encounter. The traditional approach of training a robot on actions with reinforcement learning was very slow, but generative AI allows for much greater generalization.
“It draws from Gemini’s multimodal world understanding in order to do a completely new task,” explains Carolina Parada. “What that enables is that, in the same way Gemini can produce text, write poetry, summarize an article, write code, and generate images, it can also generate robot actions.”
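To make that concrete, here is a minimal, hypothetical sketch of what a VLA control loop looks like: camera frames and a plain-language instruction go into the model, and short chunks of motor actions come out. The class names, action shapes, and numbers below are illustrative assumptions, not Google’s published interface.

```python
# Hypothetical sketch of a vision-language-action (VLA) control loop.
# All names and shapes here are illustrative stand-ins.

import numpy as np

class DummyVLAPolicy:
    """Stand-in for an on-device VLA model (here it just outputs noise)."""
    def predict(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # A real model would condition on the image and the instruction
        # and return a short "chunk" of actions, e.g. shape (horizon, dof).
        return 0.01 * np.random.randn(8, 7)

class DummyRobot:
    """Stand-in for a robot arm with a wrist camera and 7 joints."""
    def get_camera_frame(self) -> np.ndarray:
        return np.zeros((224, 224, 3), dtype=np.uint8)

    def apply_joint_command(self, action: np.ndarray) -> None:
        pass  # a real robot would move its joints here

def control_loop(policy, robot, instruction: str, steps: int = 20) -> None:
    for _ in range(steps):
        frame = robot.get_camera_frame()                    # observe
        for action in policy.predict(frame, instruction):   # plan a chunk
            robot.apply_joint_command(action)               # act, then re-observe

control_loop(DummyVLAPolicy(), DummyRobot(), "pick up the block")
```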
Generalist robots, no cloud required
In the previous version of Gemini Robotics (which remains the “best” version of Google’s robotics tech), the platform operated as a hybrid system, with a small model on the robot and a larger one running in the cloud. You’ve probably watched chatbots “think” for measurable seconds as they generate an output, but robots need to react quickly. If you tell a robot to pick up and move an object, you don’t want it to pause while each step is generated. The local model allows for quick adaptation, while the server-based model can help with complex reasoning tasks. Google DeepMind is now unleashing the local model as a standalone VLA, and it’s surprisingly robust.
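For a rough sense of why that split matters, here is a hypothetical sketch of a hybrid setup like the one described above: a fast on-device policy keeps the control loop ticking at a steady rate, while a slower cloud model is consulted asynchronously for high-level planning. The function names and timings are assumptions for illustration, not DeepMind’s actual system.

```python
# Hypothetical sketch of a hybrid local/cloud robot controller.
# The local policy must never block on the slow cloud call.

import time
from concurrent.futures import ThreadPoolExecutor

def local_policy_step(observation: str) -> str:
    # On-device model: must answer within the control period (tens of ms).
    return f"motor command for {observation}"

def cloud_reasoner(task: str) -> list[str]:
    # Server-side model: slower, but better at decomposing a complex task.
    time.sleep(1.0)  # stand-in for network + generation latency
    return [f"{task}: step {i}" for i in range(3)]

with ThreadPoolExecutor(max_workers=1) as pool:
    plan = pool.submit(cloud_reasoner, "tidy the desk")  # kicked off once, async
    subgoals = None
    for tick in range(50):
        if subgoals is None and plan.done():
            subgoals = plan.result()  # pick up the plan whenever it arrives
        observation = f"frame {tick}"
        command = local_policy_step(observation)  # never waits on the cloud
        time.sleep(0.02)  # ~50 Hz control loop
```

Cutting the cloud half out entirely, as the new standalone VLA does, means the robot keeps only the fast path, trading some long-horizon reasoning for autonomy and consistent latency.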