Why xAI and Anthropic’s New Products Matter
**AI Agenda**
**By Stephanie Palazzolo**
**Feb 25, 2025, 7:00am PST**
If you've been on X at all in the last few days, you've probably come across posts in which people described jailbreaking xAI's new Grok 3 model. For instance, one person asked the chatbot how to make chemical weapons, while another revealed instructions xAI gave the model to stop it from saying that Elon Musk is the top spreader of misinformation on X.
For all the attention these posts have drawn, they're a little beside the point. Most developers I've spoken with say the latest xAI model is nearing the performance of models from bigger rivals like OpenAI and Anthropic on tasks such as coding and math. That's quite an accomplishment, given that xAI is less than two years old. And it strengthens the argument that large language models are becoming a commodity. In other words, in the long term, no single AI developer will be able to race far ahead of everyone else when it comes to quality.
As a result, AI developers need to do something different to stand out. That includes developing applications they can sell to consumers and businesses, not just selling access to models through an application programming interface, a topic we covered in Monday's issue. In tech lingo, this strategy is called “moving up the tech stack.”
xAI has a natural advantage in that Musk has pushed Grok in front of X's hundreds of millions of users. Meta Platforms, similarly, has integrated its AI with Facebook, Instagram, and WhatsApp. By virtue of being the pioneer, OpenAI has turned ChatGPT into an app behemoth, one that surpassed 400 million weekly active users earlier this month. Google's Gemini is being put in front of millions of Gmail users.
OpenAI's archrival, Anthropic, is in an arguably weaker position. It generates the vast majority of its revenue from API sales rather than from its Claude app, and it projects that this trend will continue.
But Anthropic may be hedging its bets too. Yesterday it announced its latest hybrid AI model (more on that in this newsletter, where we first scooped it) and, more interestingly, its new coding assistant app. The coding app is an especially risky move, given that some of Anthropic's largest customers are coding assistant startups like Cursor, whose makers won't be happy about competing with their own model provider.
The immense interest in AI coding could explain why xAI worked hard to make Grok good at coding, science, and math. Coding assistant apps like Cursor are quick to switch AI providers when a better model comes out, so if xAI ever takes the lead from Anthropic in coding quality, or if enough Anthropic customers are upset by its coding assistant, xAI might win a lot of new business.
Musk has claimed that Grok has leapfrogged rivals in coding and other capabilities, but take that with a large grain of salt. xAI landed in some hot water after it revealed that some of Grok's most impressive benchmark results came from asking the model the same question 64 times and then picking the most common answer, which is not typically how you'd evaluate the performance of a large language model.
Since we're on the topic of AI evaluations, Anthropic said its new model made strides on a number of coding benchmarks, including a score of 70.3% on SWE-bench Verified, a benchmark that tests how well LLMs can fix errors in code. OpenAI said in December that its upcoming model, o3, hit 71.7% on SWE-bench Verified, but it still hasn't released that model. So among models people can actually use, it's looking like Anthropic will spend a good week atop the AI coding summit.