In every GPT generation before GPT-4.5, scaling up led to massive jumps in performance across domains including mathematics, writing, and coding. Indeed, OpenAI says that GPT-4.5’s increased ...
Across some of the major benchmarks used to evaluate these models, including competition math (AIME 2024), PhD-level science questions (GPQA Diamond), and coding (SWE-bench Verified), GPT-4.5 ...