AI Agents Are Terrible Freelance Workers

Even the best AI agents are fairly hopeless at online freelance work, according to an experiment that challenges the idea that AI will replace office workers en masse.

The Remote Labor Index, a new benchmark developed by researchers at data annotation firm Scale AI and the nonprofit Center for AI Safety (CAIS), measures the ability of frontier AI models to automate economically valuable work.

The researchers offered several top AI agents a range of simulated freelance jobs and found that even the best ones could do less than 3% of the work, earning $1,810 out of a possible $143,991. The researchers looked at several tools and found the best performer was Manus from a Chinese startup of the same name, followed by xAI’s Grok, Anthropic’s Claude, OpenAI’s ChatGPT and Google’s Gemini.

“I hope this gives much more accurate impressions of what’s happening with AI capabilities,” says Dan Hendrycks, director of CAIS. He adds that even though some agents have improved significantly over the past year, that doesn’t mean progress will continue at the same rate.

The dramatic advances in AI have led to speculation that AI will soon surpass human intelligence and replace large numbers of workers. In March, Anthropic CEO Dario Amodei suggested that 90% of coding work would be automated within a few months.

Previous waves of AI have inspired erroneous predictions about job losses, for example regarding the imminent replacement of radiologists with AI algorithms.

The researchers generated a range of freelance tasks through verified Upwork workers. The tasks cover a range of work including graphic design, video editing, game development and administrative tasks such as data retrieval. The researchers combined a description of each job with a directory of files needed to complete the job and an example of a finished project produced by a human.

Hendrycks says that even though AI models have improved at coding, mathematics and logical reasoning in recent years, they still struggle to use different tools and perform complex tasks involving many steps. “They don’t have long-term memory and can’t continually learn from experiences. They can’t learn skills on the job like humans,” he says.

The analysis offers a counterpoint to GDPval, a benchmark proposed in September by OpenAI that purports to measure economically valuable work. According to GDPval, frontier AI models such as GPT-5 come close to human capabilities on 220 tasks spanning a range of office jobs. OpenAI did not provide any comment.
