GPT-5.5 Tops Academic Benchmarks but Loses to Rivals in Real-User Tests

Mădălin Mihai

For OpenAI, the release of GPT-5.5 a week ago was a big deal. For one thing, it is the first model in a long time to have undergone a complete pre-training run, and it is meant to lay the foundation for everything the company plans around AI agents. In the benchmarks OpenAI published, the new LLM naturally shines in comparison with its two main competitors, Anthropic's Claude Opus 4.7 and Google's Gemini 3.1 Pro. But what do independent tests show? Here, a mixed picture emerges. On Arena.ai,
