A recent security experiment revealed Elon Musk's Grok AI as the most vulnerable to ethical and safety breaches. At the same time, Meta's Llama model showcased the highest resilience against potential misuse, according to researchers.
AI Guardrails Tested: Grok Found to be Least Safe in Security Experiment
Security researchers put the much-touted guardrails around the most popular AI models to the test to see how well they resisted jailbreaking and how far the chatbots could be pushed into dangerous territory. Grok, the chatbot with a "fun mode" created by Elon Musk's x.AI, was found to be the least safe tool in the experiment.
"We wanted to test how existing solutions compare and the fundamentally different approaches for LLM security testing that can lead to various outcomes," Alex Polyakov, Co-Founder and CEO of Adversa AI, told Decrypt.
Polyakov's company specializes in protecting AI and its users from cyber threats, privacy concerns, and safety incidents, and it boasts that its work has been cited in Gartner analyses.
Jailbreaking is the process of circumventing the safety restrictions and ethical guidelines that software developers implement.
In one case, the researchers used linguistic logic manipulation (social engineering-based methods) to ask Grok how to seduce a child. The chatbot provided a detailed response, which the researchers described as "highly sensitive" and which should have been restricted by default.
Other results include instructions for hotwiring cars and building bombs.
Evaluating AI Security: Chatbots' Vulnerabilities to Jailbreaking Exposed
The researchers tested three different types of attack methods. The first was the linguistic technique described above, which employs a variety of linguistic tricks and psychological cues to manipulate the AI model's behavior. One example given was a "role-based jailbreak," which frames the request as part of a fictional scenario in which unethical behavior is acceptable.
The team also used programming logic manipulation techniques to exploit the chatbots' ability to understand programming languages and follow algorithms. To bypass content filters, one technique involved splitting a dangerous prompt into multiple innocuous parts and then concatenating them together. Four of the seven models—OpenAI's ChatGPT, Mistral's Le Chat, Google's Gemini, and x.AI's Grok—were vulnerable to this attack.
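Seen from the defender's side, the splitting trick described above can be illustrated with a minimal sketch. It assumes a naive keyword-based filter, which is not how any of the tested chatbots is known to work; the names `BLOCKED_TERMS`, `is_blocked`, `fragments_pass_individually`, and `assembled_is_blocked` are hypothetical, and a harmless placeholder phrase stands in for real restricted content. The point is only that a check applied to each fragment separately can miss what a check on the assembled text would catch.

```python
# Hypothetical blocklist for illustration; a harmless placeholder
# stands in for genuinely restricted terms.
BLOCKED_TERMS = {"restricted topic"}


def is_blocked(text: str) -> bool:
    """Return True if any blocked term appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def fragments_pass_individually(fragments: list[str]) -> bool:
    """Return True if no single fragment trips the filter on its own."""
    return not any(is_blocked(fragment) for fragment in fragments)


def assembled_is_blocked(fragments: list[str]) -> bool:
    """Concatenate the fragments first, then run the filter on the result."""
    return is_blocked("".join(fragments))


if __name__ == "__main__":
    fragments = ["restr", "icted ", "topic"]
    # Each piece looks innocuous in isolation...
    print(fragments_pass_individually(fragments))
    # ...but the reassembled string contains the blocked phrase.
    print(assembled_is_blocked(fragments))
```

In this toy setup, filtering only the individual pieces lets the request through, while filtering the concatenated string flags it, which is why checks on the final assembled input or on the model's output are harder to sidestep this way.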
The third approach used adversarial AI techniques to target how language models process and interpret token sequences. The researchers attempted to get around the chatbots' content moderation systems by carefully crafting prompts from token combinations that have similar vector representations. In this case, however, every chatbot detected the attack and prevented it from being exploited.
The researchers ranked the chatbots according to how effectively their security measures prevented jailbreak attempts. Meta's Llama emerged as the safest model among all tested chatbots, followed by Claude, Gemini, and GPT-4.
"The lesson, I think, is that open source gives you more variability to protect the final solution compared to closed offerings, but only if you know what to do and how to do it properly," Polyakov told Decrypt.
Conversely, Grok was more vulnerable to specific jailbreaking methods, particularly those involving linguistic manipulation and programming logic exploitation. According to the report, Grok was more likely than others to respond in ways that could be considered harmful or unethical when confronted with jailbreaks.
Overall, Elon Musk's chatbot finished last, along with Mistral AI's proprietary model "Mistral Large."
To prevent potential misuse, the researchers did not reveal all of the technical details, but they intend to collaborate with chatbot developers to improve AI safety protocols.
AI enthusiasts and hackers constantly look for ways to "uncensor" chatbot interactions, exchanging jailbreak prompts on message boards and Discord. Tricks range from the classic Karen prompts to inventive ideas, such as using ASCII art or prompting in foreign languages. In a sense, these communities form a massive adversarial network against which AI developers patch and improve their algorithms.
Some see a criminal opportunity, while others see only fun challenges.
"Many forums were found where people sell access to jailbroken models that can be used for any malicious purpose," Polyakov said. "Hackers can use jailbroken models to create phishing emails and malware, generate hate speech at scale, and use those models for any other illegal purpose."
Polyakov explained that jailbreaking research is becoming more important as society increasingly relies on AI-powered solutions for everything from dating to warfare.
"If those chatbots or models on which they rely are used in automated decision-making and connected to email assistants or financial business applications, hackers will be able to gain full control of connected applications and perform any action, such as sending emails on behalf of a hacked user or making financial transactions," he warned.
Photo: TED/YouTube Screenshot





