News

When OpenAI unveiled its o3 “reasoning” AI model in December, the company partnered with the creators of ARC-AGI, a benchmark designed to test highly capable AI, to showcase o3’s capabilities.
A discrepancy between first- and third-party benchmark results for OpenAI’s o3 AI model is raising questions about the company’s transparency and model testing practices. When OpenAI unveiled ...
As a result, like older models, o3 and o4-mini show strong and even improved performance in coding, math, and science tasks. However, they also have an important new addition: visual understanding.
A hot potato: OpenAI's latest artificial intelligence models, o3 and o4-mini, have set new benchmarks in coding, math, and multimodal reasoning. Yet, despite these advancements, the models are ...
OpenAI is preparing to launch as many as three new AI models, possibly called "o4-mini", "o4-mini-high" and "o3". Right now, ChatGPT has as many as five models, including GPT 4o (the non-reasoning ...
Epoch found that o3 scored around 10%, well below OpenAI's highest claimed score. That doesn't mean OpenAI lied, per se. The benchmark results the company published in December show a lower-bound ...