Grok 4 passes Claude, DeepSeek in LLM rankings despite safety concerns

xAI's Grok 4 was released on July 9, and it has since pulled ahead of competitors like DeepSeek and Claude on LMArena, a leaderboard for ranking generative AI models. However, these types of AI rankings do not take potential safety risks into account.

New AI models are typically judged on a variety of metrics, including their ability to solve math problems, answer text questions, and write code. The big AI companies use a range of standardized assessments to measure the effectiveness of their models, such as Humanity's Last Exam, a 2,500-question test designed for AI benchmarking. Typically, when a company like Anthropic or OpenAI releases a new model, it shows improvements on these tests. Unsurprisingly, Grok 4 scores higher than Grok 3 on some key metrics, but it also has to compete in the court of public opinion.

LMArena is a community-driven website that lets users test AI models side by side in blind tests. (LMArena has been accused of bias against open models, but it is still one of the most popular AI ranking platforms.) In LMArena's tests, Grok 4 scored in the top three in every category in which it was tested, except for one. Here are the overall placements in each category:

  • Mathematics: Tied for first

  • Coding: Tied for second

  • Creative writing: Tied for second

  • Instruction following: Tied for second

  • Hard prompts: Tied for third

  • Longer query: Tied for second

  • Multi-turn: Tied for fourth

And in LMArena's latest overall rankings, Grok 4 is tied for third place, sharing the spot with OpenAI's GPT-4.5. The ChatGPT o3 and 4o models are tied for second, while Google's Gemini 2.5 Pro holds first place.

LMArena says it used Grok-4-0709, which is the API version of Grok 4 used by developers. According to Bleeping Computer, these results may actually understate Grok 4's true potential, because LMArena uses the regular version of Grok 4. The Grok 4 Heavy model uses multiple agents that can act in concert to come up with better answers. However, Grok 4 Heavy is not yet available as an API, so LMArena cannot test it.

However, while this all looks like good news for Elon Musk and xAI, some Grok 4 users are reporting major safety problems. And, no, we are not even talking about Mecha Hitler or NSFW anime avatars.

Does Grok 4 have sufficient safety guardrails?

While some users were testing Grok 4's capabilities, others wanted to see whether Grok 4 had acceptable safety guardrails. xAI advertises that Grok will give “unfiltered answers,” but some Grok users have reported receiving extremely distressing responses.

X user Eleventh Hour decided to put Grok through its paces from a safety standpoint, concluding in an article that “xAI’s Grok 4 has no meaningful safety guardrails.”

Eleventh Hour put the bot to the test, asking for help creating a nerve agent called Tabun. Grok 4 typed out a detailed response on how to supposedly synthesize the agent. For the record, synthesizing Tabun is not only dangerous but completely illegal. Popular AI chatbots from OpenAI and Anthropic have specific safety guardrails to avoid discussing CBRN (chemical, biological, radiological, and nuclear) topics.

In addition, Eleventh Hour was able to get Grok 4 to explain how to make the nerve agent VX, fentanyl, and even the basics of how to build a nuclear bomb. It was also willing to assist in cultivating a plague, but was unable to find enough information to do so. With some basic prompting, suicide methods and extremist views were also fairly easy to obtain.

xAI is aware of these problems, and the company has since updated Grok to deal with “problematic answers.”


Disclosure: Ziff Davis, Mashable’s parent company, in April filed a lawsuit against OpenAI, alleging that it infringed Ziff Davis copyrights in training and operating its AI systems.
