News
A hot potato: OpenAI's latest artificial intelligence models, o3 and o4-mini, have set new benchmarks in coding, math, and multimodal reasoning. Yet, despite these advancements, the models are ...
When OpenAI unveiled its o3 “reasoning” AI model in December, the company partnered with the creators of ARC-AGI, a benchmark designed to test highly capable AI, to showcase o3’s capabilities.
A discrepancy between first- and third-party benchmark results for OpenAI’s o3 AI model is raising questions about the company’s transparency and model testing practices. When OpenAI unveiled ...
Epoch found that o3 scored around 10%, well below OpenAI's highest claimed score. That doesn't mean OpenAI lied, per se. The benchmark results the company published in December show a lower-bound ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results