Gemini’s task automation is slow, clunky, and super impressive

I tested Gemini’s new task automation on the Pixel 10 Pro and Galaxy S26 Ultra; for the first time, it lets Gemini take the wheel and use apps for you. It’s currently limited to a handful of apps, mostly food delivery and ride-hailing services, and it’s still in beta. It’s slow, it’s clunky at times, and it doesn’t solve any pressing problem you’ve had while using your phone. But it’s undeniably impressive, and I don’t think it’s hyperbole to say it’s a glimpse of the future. We’re still a long way from that future, but this is the first time I’ve seen a real AI assistant actually work on a phone, rather than in a keynote presentation or a carefully controlled convention hall demo.
First things first: Gemini is much slower than you, or me, or just about anyone, at using a phone. If you need to order an Uber right this second, you’re still the best person for the job. Before you write the feature off, though, remember that task automation is designed to run in the background while you do other things on your phone. Best of all, it keeps working even when you’re not looking at your phone, so you’re free to, say, check that your passport is in your bag for the 10th time.
But if you’re curious, like me, you can watch it all unfold. While an automation is running, text appears at the bottom of the screen indicating what Gemini is doing. Stuff like “Select a second serving of Teriyaki Chicken for the combo,” which it did when I asked it to order my dinner on Saturday night. Watching Gemini figure things out on the fly honestly kind of rules. I asked for a chicken combo plate; the menu listed the options as half portions, so it correctly added two half portions of chicken.


It’s for the best that when you start an automation, the default behavior is for Gemini to run it in the background. You have to tap a button and open another window if you want to watch Gemini complete the task. And it can be excruciating. Watching the computer hunt for a green vegetable side dish on an Uber Eats menu when it’s sitting right there at the top of the screen is like watching a horror movie where you know the murderer is in the closet right next to the protagonist. Minus the murder part, I mean. Gemini made a few mistakes while putting together my teriyaki order, which it eventually sorted out on its own, but the whole episode took about nine minutes. Not ideal.
Gemini is meant to carry your task right up to the point where it’s time to press confirm and order your ride or your dinner, so you can double-check its work. This is, I think, the only sensible way to use the feature at the moment, and I don’t mind the added friction of giving the final go-ahead myself. In my testing over the past five days, I never saw it go rogue and complete an order for me. And it’s surprisingly accurate; I had to make very few adjustments to the final order. When it fails, which I saw happen several times, it tends to happen within the first minute or two, when something about the app needs my attention: granting it permission to use my location, say, or changing the delivery address to home rather than Nevada, which was the last place I used the app. In cases like these I had to figure out what the problem was, but once it was resolved I could restart the automation without issue.
Here’s the one that really won me over. I put an event on my calendar for a flight to San Francisco the next day (a fake trip for me, but real flight details). Then I gave Gemini a vague prompt to schedule an Uber that would get me to the airport in time for my flight tomorrow. Since Gemini has access to my email and calendar, it can look up that kind of information. It took a few extra prompts, perhaps because the flight wasn’t in my email as it expected. But with that, it found the flight information, suggested leaving by 11:30 or 11:45 a.m. (sensible timing for a 1:45 p.m. flight, given that I live near the airport), and asked if I wanted to schedule a ride for one of those times. I confirmed the time, and the ride setup took about three minutes without any further intervention on my part.


It’s all the more impressive considering Uber doesn’t even call it scheduling a ride; in the app, you reserve a trip. This is the big difference between the digital assistants we’ve been using and the AI assistants now emerging. Being able to use natural language with the computer makes a huge difference when you’re controlling your smart home or placing your dinner order. If the computer trips up and asks you for clarification because you forgot that the restaurant calls your meal a “plate” and not a “combo,” or because you asked for “coleslaw” instead of “shredded cabbage,” then it’s no more useful than the assistants we’ve spent the past decade using to set timers and play music.
That said, watching Gemini tap and scroll through Uber Eats makes one thing painfully obvious: if you were designing an app for an AI, it would look nothing like the ones we have today. You know, apps designed for humans. An AI assistant won’t be tempted by a big mid-page ad offering 30 percent off your order. An appetizing, well-staged photo of the dish it’s ordering is no more persuasive than a low-quality one. You’d give it a database, not a pile of clutter to wade through; that’s what the industry is working toward with the Model Context Protocol, or MCP.
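To make that concrete, here’s a rough sketch of what “giving the AI a database” could look like: a tiny MCP server that exposes a restaurant menu as structured, searchable data rather than a scrollable page of photos and promos. This is purely an illustration, not how Google or Uber Eats actually build anything; the tool name, menu entries, and aliases are all invented, and it leans on the official @modelcontextprotocol/sdk for TypeScript.

```typescript
// Hypothetical example: an MCP server exposing a menu as plain data.
// Nothing here reflects Google's or Uber Eats' real systems.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Invented menu data. Aliases let an assistant's "coleslaw" match the
// restaurant's "shredded cabbage" without anyone staring at a screen.
const menu = [
  { id: "teriyaki-half", name: "Teriyaki Chicken (half portion)", aliases: [], priceUsd: 6.5 },
  { id: "cabbage", name: "Shredded cabbage", aliases: ["coleslaw"], priceUsd: 3.0 },
];

const server = new McpServer({ name: "menu-demo", version: "0.1.0" });

// A single tool the model can call. It gets back bare JSON: no ads,
// no food photography, nothing designed to persuade a human.
server.tool(
  "search_menu",
  { query: z.string().describe("A dish name or a close synonym") },
  async ({ query }) => {
    const q = query.toLowerCase();
    const hits = menu.filter(
      (item) =>
        item.name.toLowerCase().includes(q) ||
        item.aliases.some((a) => a.toLowerCase().includes(q))
    );
    return { content: [{ type: "text", text: JSON.stringify(hits) }] };
  }
);

// Serve over stdio so a local MCP-capable client can connect.
await server.connect(new StdioServerTransport());
```

An assistant wired to a tool like this could resolve the “coleslaw” versus “shredded cabbage” mix-up in a single call, instead of spending nine minutes scrolling past promos.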
An AI model reasoning its way through an interface built for humans seems like the most impractical, fragile way to place a pizza order. Sometimes it hits a snag, and it’s not very good at telling you why it couldn’t do something. This version of task automation looks like a stopgap until app developers adopt more robust methods: MCP, or Android’s app functions. Sameer Samat, Google’s head of Android, recently told me that Gemini takes the screen-reasoning approach in the absence of the other two. Perhaps this version of task automation is a preview of what’s possible, or a way to nudge developers toward one of the other methods. Either way, it feels like a notable first step toward a new way of using our mobile assistants: clunky, slow, but very promising.
Photography by Allison Johnson/The Verge